CN115049968A - Dynamic programming video automatic cutting method, device, equipment and storage medium - Google Patents

Dynamic programming video automatic cutting method, device, equipment and storage medium

Info

Publication number
CN115049968A
CN115049968A (application number CN202210966159.3A)
Authority
CN
China
Prior art keywords
video
cutting
target
path
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210966159.3A
Other languages
Chinese (zh)
Other versions
CN115049968B (en)
Inventor
沈振冈
龙思敏
周斌
胡波
李艳红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Etah Information Technology Co ltd
Original Assignee
Wuhan Etah Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Etah Information Technology Co ltd filed Critical Wuhan Etah Information Technology Co ltd
Priority to CN202210966159.3A (granted as CN115049968B)
Publication of CN115049968A
Application granted
Publication of CN115049968B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00: Computing arrangements based on biological models
                    • G06N3/02: Neural networks
                        • G06N3/04: Architecture, e.g. interconnection topology
                        • G06N3/08: Learning methods
            • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V10/00: Arrangements for image or video recognition or understanding
                    • G06V10/20: Image preprocessing
                        • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
                    • G06V10/40: Extraction of image or video features
                        • G06V10/56: Extraction of image or video features relating to colour
                    • G06V10/70: Arrangements using pattern recognition or machine learning
                        • G06V10/77: Processing image or video features in feature spaces; data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
                            • G06V10/80: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature-extraction or classification level
                                • G06V10/806: Fusion of extracted features
                        • G06V10/82: Arrangements using neural networks
                • G06V20/00: Scenes; Scene-specific elements
                    • G06V20/40: Scenes; Scene-specific elements in video content
                        • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
                        • G06V20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a dynamic-programming-based automatic video cropping method, together with a corresponding device, equipment, and storage medium. The method detects a source video to obtain the target content in each video frame along with its image features, color histogram, and gray-scale map; fuses the image features, color histogram, and gray-scale map with a log-linear model to obtain video frame data; generates a target video sequence from the video frame data and searches it for an optimal cropping path using the shortest critical path in dynamic programming; and crops the source video along the optimal cropping path to obtain the cropping result. This avoids losing cropped content and removes the need to move the cropping window frequently, so the cropped video stays smooth and its content remains reasonable; it avoids the discontinuous trajectories produced by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic cropping, and improves the user experience.

Description

Dynamic programming video automatic cutting method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of video cropping, and in particular to a dynamic-programming-based method, device, equipment, and storage medium for automatic video cropping.
Background
With the rise of streaming-media applications and the continual upgrading of electronic devices, video has permeated many industries, and expectations for playback quality on different devices keep rising. Display devices have also diversified: liquid-crystal televisions, smartphones, tablet computers, and so on differ in display specifications and suit different browsing habits. To save labor and material costs, multimedia content is usually produced at a fixed size, so video content often fails to match different displays. At the same time, playback must avoid distortion and deformation: the aspect ratio of the video should be adjusted automatically while the important content is preserved, without discontinuities or jerkiness that degrade visual quality. To address this problem, more and more researchers in computer graphics and computer vision have turned to image and video cropping.
Unlike image cropping, video cropping introduces a time dimension and is therefore more challenging, because adjusting each frame must respect temporal continuity with the preceding and following frames. Many computer-vision researchers have accordingly shifted their focus from image cropping to video cropping. Some crop video with a generative adversarial framework whose core idea is a conditional generative model. Others first determine the region of a frame that must be kept, crop according to the importance and saliency of the content, and directly scale the image to the target size when the cropped image does not match it. Others apply dynamic programming to optimize the temporal consistency of the crop region, though the results fall short of expectations. Others fit a fast curve through a group of consecutive frames within a shot to find an optimal cropping sequence, avoiding complex optimization of temporal constraints between adjacent frames. Still others achieve spatial and temporal consistency through an explicit mechanism that reduces jitter and content distortion, or, unlike existing algorithms that generate a thumbnail from a single image, handle stereo saliency detection and thumbnail generation separately.
The above methods all have limitations: most handle only sequences in which the targets move slowly or are few in number. There are two main reasons. (1) Targets in fast-moving video suffer from blur, occlusion, and scene changes, so important information is easily lost and the content of the cropped video is incomplete. (2) For a fast-moving target, the cropping window must be moved frequently, so the cropped video is not smooth enough.
Disclosure of Invention
The main purpose of the invention is to provide a dynamic-programming-based method, device, equipment, and storage medium for automatic video cropping, aimed at the technical problems in the prior art that cropped video content is easily lost, the cropping window must be moved frequently, and the cropped video is not smooth enough.
In a first aspect, the invention provides a dynamic-programming-based automatic video cropping method comprising the following steps:
detecting a source video to obtain the target content in each video frame together with its image features, color histogram, and gray-scale map;
fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data;
generating a target video sequence from the video frame data, and finding an optimal cropping path in the target video sequence using the shortest critical path in dynamic programming;
and cropping the source video along the optimal cropping path to obtain the video cropping result.
Optionally, detecting the source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map comprises:
performing semantic processing on the source video to obtain a video sequence;
performing bounding-box detection on the video sequence, and determining whether the bounding-box size matches the target screen;
when the bounding-box size does not match the target screen, obtaining the position of the user's region of interest from the video sequence;
and determining the range to be cropped from the position of the user's region of interest, and obtaining from that range the target content in each video frame together with its image features, color histogram, and gray-scale map.
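The per-frame extraction described above can be sketched as follows. This is an illustrative stand-in using NumPy only: the patent does not specify the detector or the kind of image feature, so the mean gradient magnitude used as the "feature" here is a hypothetical placeholder.

```python
import numpy as np

def frame_descriptors(frame, bins=16):
    """Per-frame descriptors used by the method: a gray-scale map,
    a color histogram, and a scalar image feature.

    `frame` is an H x W x 3 uint8 RGB array. The real pipeline takes
    image features from a detector; the gradient-magnitude value below
    is purely illustrative.
    """
    f = frame.astype(np.float64)
    # Gray-scale map via the standard luminance weights.
    gray = 0.299 * f[..., 0] + 0.587 * f[..., 1] + 0.114 * f[..., 2]
    # Joint color histogram over the three channels, normalized to sum to 1.
    hist, _ = np.histogramdd(f.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist /= hist.sum()
    # Stand-in "image feature": mean gradient magnitude of the gray map.
    gy, gx = np.gradient(gray)
    feature = float(np.mean(np.hypot(gx, gy)))
    return gray, hist, feature
```

In practice these three quantities would be computed only inside the range to be cropped, i.e. around the detected region of interest.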
Optionally, fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data comprises:
obtaining the color histogram value corresponding to the color histogram and the gray value corresponding to the gray-scale map with the log-linear model;
and fusing the image features, the color histogram values, and the gray values through the following formula to obtain the video frame data:
[The fusion formula appears only as an image in the original and is not reproduced here.]
The quantities it combines are, in order: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame, the mean of these three components, and the local integrated theoretical value (expected frequency value) of the t-th frame; the result is the video frame data.
Optionally, generating a target video sequence from the video frame data and finding an optimal cropping path in the target video sequence with the shortest critical path in dynamic programming comprises:
setting the cropping-window area and the inter-frame window distance according to preset constraints;
screening the video frame data against the cropping-window area and the inter-frame window distance to obtain a target video sequence that satisfies the constraints;
and finding the optimal cropping path in the target video sequence with the shortest critical path in dynamic programming.
Optionally, setting the cropping-window area and the inter-frame window distance according to the preset constraints comprises:
setting the cropping-window area and the inter-frame window distance according to the preset constraints through the following formula:
[The constraint formula appears only as an image in the original and is not reproduced here.]
In it, d(·, ·) is a distance-measure function giving the distance between the cropping windows of adjacent frames, S(W) computes the area of a window W, and the remaining term is the difference in cropping-window area between two adjacent frames.
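The two constraints above, a bounded inter-frame window distance and a bounded change in window area, can be sketched as a simple feasibility check. The window representation and the thresholds are assumptions, since the patent's formula is only available as an image.

```python
import math

def window_transition_ok(w_prev, w_next, max_dist, max_area_diff):
    """Checks the two constraints placed on adjacent-frame crop windows:
    the window centres must not jump farther than `max_dist`, and the
    window areas must not differ by more than `max_area_diff`.

    Windows are (x, y, width, height) tuples; the concrete thresholds
    are application-dependent assumptions.
    """
    def center(w):
        x, y, wd, ht = w
        return (x + wd / 2.0, y + ht / 2.0)

    def area(w):
        return w[2] * w[3]

    (cx1, cy1), (cx2, cy2) = center(w_prev), center(w_next)
    dist = math.hypot(cx2 - cx1, cy2 - cy1)
    return dist <= max_dist and abs(area(w_next) - area(w_prev)) <= max_area_diff
```

Candidate windows failing this check would be screened out before the shortest-path search, which is what keeps the resulting path smooth.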
Optionally, finding the optimal cropping path in the target video sequence with the shortest critical path in dynamic programming comprises:
recasting the dynamic transition trajectory of the cropping window of the target video sequence, from its source position to its target position, as a shortest-critical-path problem under dynamic programming;
obtaining the directed weighted graph formed by the directed edge weights of the cropping window of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path from the transition factors, obtaining the local frame data of the target video sequence, and computing the global frame data of each target position from the local frame data;
and determining the optimal cropping path in the target video sequence from the visual penalty function and the global frame data.
Optionally, cropping the source video along the optimal cropping path to obtain the video cropping result comprises:
obtaining the preset smoothing factor corresponding to the optimal cropping path, and finding the optimal smoothed sequence in the source video according to the preset smoothing factor;
and generating the video cropping result from the optimal smoothed sequence.
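The smoothing step can be illustrated with exponential smoothing of the crop-window centres along the optimal path. The patent does not define its smoothing factor precisely, so `alpha` here is a hypothetical stand-in.

```python
def smooth_path(centers, alpha=0.8):
    """Exponentially smooths the per-frame crop-window centres along the
    optimal cropping path. `alpha` plays the role of the preset smoothing
    factor (its exact definition is not given in the text); a higher
    alpha follows the raw path more closely, a lower one damps motion.
    """
    smoothed = [centers[0]]
    for c in centers[1:]:
        px, py = smoothed[-1]
        smoothed.append((alpha * c[0] + (1 - alpha) * px,
                         alpha * c[1] + (1 - alpha) * py))
    return smoothed
```

Applying this to the window centres before cutting keeps the crop from jittering even when the detected target moves abruptly.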
In a second aspect, to achieve the above object, the invention further provides a dynamic-programming-based automatic video cropping device comprising:
a detection module for detecting the source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map;
a fusion module for fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data;
a path-generation module for generating a target video sequence from the video frame data and finding an optimal cropping path in the target video sequence with the shortest critical path in dynamic programming;
and a cropping module for cropping the source video along the optimal cropping path to obtain the video cropping result.
In a third aspect, to achieve the above object, the invention further provides dynamic-programming-based automatic video cropping equipment comprising a memory, a processor, and an automatic video cropping program stored on the memory and operable on the processor, the program being configured to implement the steps of the automatic video cropping method described above.
In a fourth aspect, to achieve the above object, the invention further provides a storage medium storing an automatic video cropping program which, when executed by a processor, implements the steps of the automatic video cropping method described above.
The invention provides a dynamic-programming-based automatic video cropping method that detects a source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map; fuses these with a log-linear model to obtain video frame data; generates a target video sequence from the video frame data and finds an optimal cropping path in it with the shortest critical path in dynamic programming; and crops the source video along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth and its content reasonable, avoids the discontinuous trajectories caused by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic video cropping, and improves the user experience.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a first embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 4 is a flowchart illustrating a third embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 5 is a flowchart illustrating a fourth embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 6 is a flowchart illustrating a fifth embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 7 is a functional block diagram of an automatic dynamic programming video clipping device according to a first embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The solution of the embodiment of the invention is mainly as follows: detect a source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map; fuse these with a log-linear model to obtain video frame data; generate a target video sequence from the video frame data and find an optimal cropping path in it with the shortest critical path in dynamic programming; and crop the source video along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth and its content reasonable, avoids the discontinuous trajectories caused by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic video cropping, and improves the user experience, thereby solving the technical problems in the prior art that cropped video content is easily lost, the cropping window must be moved frequently, and the cropped video is not smooth enough.
Referring to fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the apparatus may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage; it may also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not limit the apparatus, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a dynamic programming video auto-cropping program.
The device calls a dynamic programming video automatic cutting program stored in a memory 1005 through a processor 1001 and executes the following operations:
detecting a source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map;
fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data;
generating a target video sequence from the video frame data, and finding an optimal cropping path in the target video sequence with the shortest critical path in dynamic programming;
and cropping the source video along the optimal cropping path to obtain the video cropping result.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
performing semantic processing on the source video to obtain a video sequence;
performing bounding-box detection on the video sequence, and determining whether the bounding-box size matches the target screen;
when the bounding-box size does not match the target screen, obtaining the position of the user's region of interest from the video sequence;
and determining the range to be cropped from the position of the user's region of interest, and obtaining from that range the target content in each video frame together with its image features, color histogram, and gray-scale map.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
obtaining the color histogram value corresponding to the color histogram and the gray value corresponding to the gray-scale map with the log-linear model;
fusing the image features, the color histogram values, and the gray values through the following formula to obtain the video frame data:
[The fusion formula appears only as an image in the original and is not reproduced here.]
The quantities it combines are, in order: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame, the mean of these three components, and the local integrated theoretical value (expected frequency value) of the t-th frame; the result is the video frame data.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
setting the cropping-window area and the inter-frame window distance according to preset constraints;
screening the video frame data against the cropping-window area and the inter-frame window distance to obtain a target video sequence that satisfies the constraints;
and finding the optimal cropping path in the target video sequence with the shortest critical path in dynamic programming.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
setting the cropping-window area and the inter-frame window distance according to the preset constraints through the following formula:
[The constraint formula appears only as an image in the original and is not reproduced here.]
In it, d(·, ·) is a distance-measure function giving the distance between the cropping windows of adjacent frames, S(W) computes the area of a window W, and the remaining term is the difference in cropping-window area between two adjacent frames.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
recasting the dynamic transition trajectory of the cropping window of the target video sequence, from its source position to its target position, as a shortest-critical-path problem under dynamic programming;
obtaining the directed weighted graph formed by the directed edge weights of the cropping window of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path from the transition factors, obtaining the local frame data of the target video sequence, and computing the global frame data of each target position from the local frame data;
and determining the optimal cropping path in the target video sequence from the visual penalty function and the global frame data.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
acquiring a preset smoothing factor corresponding to the optimal cutting path, and finding an optimal smoothing sequence in the source video according to the preset smoothing factor;
and generating a video clipping result according to the optimal smooth sequence.
With this scheme, the source video is detected to obtain the target content in each video frame and its image features, color histogram, and gray-scale map; these are fused with a log-linear model into video frame data; a target video sequence is generated from the video frame data and an optimal cropping path is found in it with the shortest critical path in dynamic programming; and the source video is cropped along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth and its content reasonable, avoids the discontinuous trajectories caused by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic video cropping, and improves the user experience.
Based on the hardware structure, the embodiment of the method for automatically cutting the dynamic programming video is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a dynamic programming video auto-cropping method according to the present invention.
In a first embodiment, the method for automatically cropping a dynamically planned video includes the following steps:
and step S10, detecting the source video, and obtaining the target content in each frame of video and the corresponding image characteristics, color histogram and gray level map.
After the source video to be cropped is detected, predetermined important contents, that is, the target content and the corresponding image feature, color histogram and gray scale map thereof, in each frame of video can be obtained.
Step S20: fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data.
It is understood that the image features, the color histogram, and the gray-scale map can be fused with a log-linear model to obtain the corresponding frame-coefficient data, i.e., the video frame data.
Further, step S20 specifically comprises:
obtaining the color histogram value corresponding to the color histogram and the gray value corresponding to the gray-scale map with the log-linear model;
and fusing the image features, the color histogram values, and the gray values through the following formula to obtain the video frame data:
[Formula (1) appears only as an image in the original and is not reproduced here.]   (1)
The quantities it combines are, in order: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame, the mean of these three components, and the local integrated theoretical value (expected frequency value) of the t-th frame; the result is the video frame data.
It should be noted that finding a suitable trajectory is the core of video cropping: the trajectory must traverse the entire video and must contain the important content. Because of camera motion, zooming and background changes, the size of a simple target-detection bounding box is not enough to determine the important region in a video sequence. To measure content relevance in the detection results, a log-linear model is defined. By analysing the output of the previous stage, the grayscale map, frame color histogram and image features of the detected content in each frame are modelled with the log-linear model to generate per-frame data, from which the approximate boundary of each frame's cropping region is found. Given the grayscale map, the frame color histogram and the image appearance, the value at each position (x, y) of the image is computed by fusing the three features according to the formula.
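As a rough illustration of this fusion step, the sketch below assumes an equal-weight log-linear combination (the geometric mean of the three cue values); the patent's actual formula (1) is published only as an image, so both the functional form and the function name are assumptions:

```python
import numpy as np

def fuse_frame_features(hist_val, gray_val, feat_val, eps=1e-8):
    # Hedged sketch: an equal-weight log-linear fusion, i.e. the
    # geometric mean of the three (positive) per-frame cue values.
    # The patent's formula (1) exists only as an image, so this exact
    # form is an assumption; eps guards against log(0).
    vals = np.array([hist_val, gray_val, feat_val], dtype=float) + eps
    return float(np.exp(np.mean(np.log(vals))))

# Illustrative per-frame cue values (histogram, gray, feature)
frame_data = fuse_frame_features(0.8, 0.5, 0.2)
```

Because the combination is taken in log space, a single near-zero cue pulls the fused per-frame value down sharply, which matches the intent of rewarding frames where all three cues agree.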
And step S30, generating a target video sequence according to the video frame data, and searching for the optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming.
It should be understood that a corresponding adapted sequence may be generated from the video frame data, and the optimal cropping trajectory, that is, the optimal cropping path in the target video sequence, is then found using the shortest critical path in dynamic programming.
And step S40, clipping the source video according to the optimal clipping path to obtain a video clipping result.
It can be understood that the source video can be cropped through the optimal cropping path, and a video cropping result is obtained.
According to this scheme, the source video is detected to obtain the target content of each video frame together with its image features, color histogram and grayscale map; the image features, color histogram and grayscale map are fused with a log-linear model to obtain video frame data; a target video sequence is generated from the video frame data, and the optimal cropping path in it is found using the shortest critical path in dynamic programming; the source video is then cropped along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth, preserves the rationality of the cropped content, avoids the discontinuous trajectories produced by cropping directly on detected regions, meets the visual-aesthetic expectations of viewers, and improves the speed, efficiency and user experience of automatic video cropping by dynamic programming.
Further, fig. 3 is a schematic flow chart of a second embodiment of the dynamic programming video automatic clipping method of the present invention, and as shown in fig. 3, the second embodiment of the dynamic programming video automatic clipping method of the present invention is provided based on the first embodiment, and in this embodiment, the step S10 specifically includes the following steps:
and step S11, performing semantic processing on the source video to obtain a video sequence.
It should be noted that, the source video is subjected to semantic processing, and a video sequence after the semantic processing can be obtained.
And step S12, carrying out boundary box detection on the video sequence, and determining whether the size of the boundary box of the video sequence conforms to a target screen.
It is understood that the bounding box detection of the video sequence may determine whether the size of the video sequence conforms to the target screen, i.e., whether the size of the bounding box of the video sequence conforms to the display screen size of the target display device.
And step S13, when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence.
It should be appreciated that the user's region of interest can be obtained from the video sequence when the bounding-box size does not fit the target screen; in general, the position of the user's region of interest in each video frame can be found with the open-source high-performance detector YOLOX.
In a specific implementation, saliency detection can miss targets that appear suddenly in a certain frame, so the YOLOX model is used for detection instead; YOLOX outperforms saliency detection in accuracy, and the detection speed is improved as well.
And step S14, determining a range to be cut according to the position of the user interesting region, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level maps from the range to be cut.
It can be understood that the position of the user's region of interest determines the range to be cropped, from which the important content of each video frame, that is, the target content, together with its corresponding image features, color histogram and grayscale map, is obtained.
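The per-frame cues named in step S14 can be illustrated as follows. This NumPy-only sketch assumes RGB input and standard BT.601 luma weights; the function name and bin count are illustrative, since the patent does not fix a library or implementation, and the region-of-interest detection (e.g. via YOLOX) is assumed to have produced the frame already:

```python
import numpy as np

def frame_cues(frame_rgb, bins=16):
    # Compute two of the per-frame cues used by the method:
    # a grayscale map and a normalised color histogram.
    frame = frame_rgb.astype(float)
    # ITU-R BT.601 luma weights for the grayscale map
    gray = 0.299 * frame[..., 0] + 0.587 * frame[..., 1] + 0.114 * frame[..., 2]
    # Per-channel color histogram, concatenated and normalised to sum to 1
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    hist /= hist.sum()
    return gray, hist

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(48, 64, 3))
gray, hist = frame_cues(frame)
```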
According to the scheme, the video sequence is obtained by performing semantic processing on the source video; carrying out bounding box detection on the video sequence, and determining whether the size of a bounding box of the video sequence meets a target screen; when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence; and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level maps from the range to be cut, so that the image characteristics, the color histograms and the gray level maps of the video frames can be obtained, and the speed and the efficiency of automatic cutting of the video are improved.
Further, fig. 4 is a schematic flow chart of a third embodiment of the method for automatically cropping a dynamically planned video according to the present invention, and as shown in fig. 4, the third embodiment of the method for automatically cropping a dynamically planned video according to the present invention is provided based on the first embodiment, and in this embodiment, the step S30 specifically includes the following steps:
and step S31, setting the clipping window area and the frame window distance according to the preset constraint condition.
It should be noted that the cropping-window area of the video frame data and the window distance between two adjacent frames may be set through a preset constraint condition.
Further, the step S31 specifically includes the following steps:
setting the clipping window area and the frame window distance according to a preset constraint condition by the following formula:
[Formula (2), rendered only as an image in the original publication]    (2)

wherein d(·, ·) is a distance metric function, d(W_t, W_{t+1}) is the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| is the area difference of the cropping windows between two adjacent frames.
It will be appreciated that, given the frame data obtained above, an optimal cropping sequence is sought for the given video sequence so as to obtain a smooth, temporally consistent visual effect. A cropping sequence found directly from the per-frame data may be non-smooth, because the cropping window can move a long way or its area can jump sharply. To obtain a smooth sequence, the cropping windows of two consecutive frames should keep a small relative displacement, and at the same time their areas must not differ too much. Distance and area constraints are therefore added, as defined in formula (2).
And step S32, screening the video frame data according to the cutting window area and the frame window distance to obtain a qualified target video sequence.
It can be understood that, by filtering the video frame data according to the cropping window area and the frame window distance, a video sequence conforming to the cropping window area and the frame window distance can be obtained as a qualified target video sequence.
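The screening of step S32 can be sketched as a predicate over adjacent cropping windows; the window representation (x, y, w, h) and both thresholds are illustrative assumptions, not values from the patent:

```python
import math

def smooth_transition_ok(win_a, win_b, max_dist=20.0, max_area_diff=0.15):
    # Check the two constraints described around formula (2): adjacent
    # cropping windows must stay close (distance metric d) and their
    # areas S(W) must not jump. Thresholds are illustrative.
    (xa, ya, wa, ha), (xb, yb, wb, hb) = win_a, win_b
    dist = math.hypot(xa - xb, ya - yb)      # d(W_t, W_{t+1})
    area_a, area_b = wa * ha, wb * hb        # S(W)
    area_diff = abs(area_a - area_b) / max(area_a, area_b)
    return dist <= max_dist and area_diff <= max_area_diff
```

Frame-data candidates whose window fails this predicate against the previous frame would be dropped, leaving the qualified target video sequence.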
And step S33, searching for the optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming.
It should be understood that the optimal cropping path in the target video sequence can be found using the shortest critical path in dynamic programming.
According to this scheme, the cropping-window area and the frame-window distance are set by a preset constraint condition; the video frame data are screened by the cropping-window area and the frame-window distance to obtain a qualified target video sequence; and the optimal cropping path in the target video sequence is found using the shortest critical path in dynamic programming. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth, preserves the rationality of the cropped content, avoids the discontinuous trajectories produced by cropping directly on detected regions, meets the visual-aesthetic expectations of viewers, and improves the speed, efficiency and user experience of automatic video cropping by dynamic programming.
Further, fig. 5 is a schematic flow chart of a fourth embodiment of the dynamic programming video automatic clipping method of the present invention, and as shown in fig. 5, the fourth embodiment of the dynamic programming video automatic clipping method of the present invention is provided based on the third embodiment, in this embodiment, the step S33 specifically includes the following steps:
And step S331, reconstructing the dynamic transition trajectory of the cropping window of the target video sequence from the source position to the target position as a shortest-critical-path problem according to dynamic programming.
It should be noted that the shortest critical path is determined for the cropping window of the target video sequence by dynamic programming; that is, the dynamic transition trajectory of the cropping window from the source position to the target position is reconstructed as a shortest-path problem.
And S332, acquiring a directed weighted graph corresponding to the directed edge weight of the clipping window of the target video sequence, and acquiring a transition factor of each corresponding edge in the directed weighted graph.
It can be understood that, obtaining a directed weighted graph corresponding to the directed edge weight of the clipping window of the target video sequence can obtain the transition factor of each corresponding edge.
In a specific implementation, although improved temporal continuity benefits the visual effect, directly searching all possible trajectories of the cropping window is too costly, so dynamic programming is used to reconstruct the dynamic transition trajectory of the cropping window from the source position to the target position as a shortest-path problem. The approximate position of each frame's cropping window is found from the local frame data of each frame, and a directed weighted graph is constructed at the same time: the cropping windows are the nodes of the graph, the frame-data values of each frame are the weights of the directed edges, and the graph is built jointly with the tracking of the cropping window.
Step S333, determining the visual penalty function of the shortest key path according to the transition factor, acquiring local frame data of the target video sequence, and calculating global frame data of each target position according to the local frame data.
It should be understood that, by determining the visual penalty function of the shortest critical path through the transition factor, after obtaining the local frame data of the target video sequence, the global frame data of each target position can be calculated according to the local frame data.
In a specific implementation, a shortest-path algorithm is applied to the directed weighted graph to find the globally optimal cropping trajectory. To keep the cropping window smooth while it moves, in addition to the per-frame distance constraint of formula (2), a transition factor δ is defined for each edge of the directed weighted graph:

[Formula (3), rendered only as an image in the original publication]    (3)
When searching for the optimal trajectory, the significant information lost while the cropping window shifts must also be considered. Information loss and cropping-window transition shift are in conflict; to balance the two and optimize visual comfort, a visual penalty function V is defined:

[Visual penalty function, rendered only as an image in the original publication]

wherein L_t is the significant information loss in each frame, δ is the transition offset of the cropping window between adjacent frames, (x_t, y_t) is the position of the top-left node of each frame, P denotes the dynamic trajectory of all cropping windows, λ is a balance parameter between significant information loss and cropping-window offset, and n is the total number of frames.
Step S334, determining an optimal clipping path in the target video sequence according to the visual penalty function and the global frame data.
It is understood that the optimal clipping path in the target video sequence is determined by the visual penalty function and the global frame data.
In a specific implementation, to make the resulting cropping sequence perform best, the visual penalty must be minimized; minimizing V is therefore equivalent to finding the cheapest move from node (x_{i-1}, y_{i-1}) to node (x_i, y_i). The minimum visual penalty is defined as follows:

[Formula (5), rendered only as an image in the original publication]    (5)

wherein s denotes the source node, T_i(k) denotes the shortest path from the source node of frame 1 to the k-th node of the i-th frame, T_{i-1}(j) denotes the shortest path from the source node of frame 1 to the j-th node of frame i-1, and δ(j, k) denotes the offset transition from the j-th node of frame i-1 to the k-th node of frame i.
For the final cropping trajectory, the global frame data of each target position are computed from the local frame data of formula (1); the optimal trajectory is then determined from the global frame data and the visual penalty function, and the obtained candidate values are stored in a backtracking table B_t:

[Formula (6), rendered only as an image in the original publication]    (6)

[Formula (7), rendered only as an image in the original publication]    (7)

wherein M_t is the global frame data of each target position, E_t is the local frame data of each frame, R(x, y) is the set of possible candidate nodes for node (x, y), and B_t(x, y) stores the possible results.
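The recurrence and backtracking described around formulas (5) to (7) can be sketched as a standard dynamic-programming shortest-path search over per-frame candidate windows. All names, the cost layout and the balance parameter `lam` are assumptions, since the patent gives the formulas only as images:

```python
import math

def optimal_crop_path(candidates, local_cost, lam=0.5):
    # candidates : list over frames, each a list of (x, y) nodes
    # local_cost : dict mapping (frame_idx, node_idx) -> cost (lower = better)
    n = len(candidates)
    # M[t][k]: best accumulated cost reaching node k of frame t (cf. formula (6))
    M = [[local_cost[(0, k)] for k in range(len(candidates[0]))]]
    B = [[None] * len(candidates[0])]        # backtracking table (cf. formula (7))
    for t in range(1, n):
        row, back = [], []
        for k, (xk, yk) in enumerate(candidates[t]):
            best, best_j = math.inf, None
            for j, (xj, yj) in enumerate(candidates[t - 1]):
                shift = math.hypot(xk - xj, yk - yj)   # transition offset
                cost = M[t - 1][j] + lam * shift
                if cost < best:
                    best, best_j = cost, j
            row.append(best + local_cost[(t, k)])
            back.append(best_j)
        M.append(row)
        B.append(back)
    # Recover the trajectory by backtracking from the cheapest end node
    k = min(range(len(M[-1])), key=M[-1].__getitem__)
    path = [k]
    for t in range(n - 1, 0, -1):
        k = B[t][k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]

# Toy example: two candidate windows per frame over three frames,
# with the left window always cheaper
cands = [[(0, 0), (10, 0)]] * 3
costs = {(t, k): [0.0, 5.0][k] for t in range(3) for k in range(2)}
path = optimal_crop_path(cands, costs)
```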
According to this scheme, the dynamic transition trajectory of the cropping window of the target video sequence from the source position to the target position is reconstructed as a shortest-critical-path problem by dynamic programming; the directed weighted graph corresponding to the directed edge weights of the cropping window is obtained, together with the transition factor of each edge in it; the visual penalty function of the shortest critical path is determined from the transition factor, the local frame data of the target video sequence are obtained, and the global frame data of each target position are computed from them; and the optimal cropping path in the target video sequence is determined from the visual penalty function and the global frame data. Losing cropped content is thereby avoided, the cropping window does not need to move frequently, the cropped video stays smooth, and the rationality of the cropped content is preserved.
Further, fig. 6 is a schematic flow chart of a fifth embodiment of the dynamic programming video automatic clipping method of the present invention, and as shown in fig. 6, the fifth embodiment of the dynamic programming video automatic clipping method of the present invention is provided based on the first embodiment, in this embodiment, the step S40 specifically includes the following steps:
and step S41, obtaining a preset smoothing factor corresponding to the optimal clipping path, and finding the optimal smoothing sequence in the source video according to the preset smoothing factor.
It should be noted that, after the preset smoothing factor corresponding to the optimal clipping path is obtained, the optimal smoothing sequence in the source video may be found according to the preset smoothing factor.
And step S42, generating a video cutting result according to the optimal smooth sequence.
It should be appreciated that video cropping results may be generated by the optimal smoothing sequence.
In a specific implementation, if the cropping trajectory is not smoothed when the directed weighted graph is constructed in the dynamic programming algorithm, a certain amount of disturbance remains and the final cropping result appears to jitter. To improve visual comfort, a smoothing factor is defined on the resulting cropping trajectory to ensure admissible transitions between subsequent frames, find the best smooth sequence, and make the video more stable in time.
The smoothing factor is designed to improve the smoothness of the cropping trajectory. As in formula (3), the cropping window is smoothed between adjacent frames by the transition factor δ, which can be used to screen out large jumps between two consecutive frames, for example by using the Euclidean distance between the two nodes (x_1, y_1) and (x_2, y_2) of two adjacent frames:

[Formula (8), rendered only as an image in the original publication]    (8)

wherein λ is a weighting parameter that controls smoothness and flexibility.
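A minimal sketch of the jump screening that the smoothing factor performs, assuming the Euclidean distance test of formula (8) and an illustrative threshold in place of the patent's weighting parameter λ:

```python
import math

def smooth_trajectory(path, max_jump=15.0):
    # When two consecutive cropping-window nodes are further apart than
    # max_jump (Euclidean distance, cf. formula (8)), the later node is
    # pulled to the midpoint to damp the jump. Threshold is illustrative.
    out = [path[0]]
    for (x2, y2) in path[1:]:
        x1, y1 = out[-1]
        if math.hypot(x2 - x1, y2 - y1) > max_jump:
            x2, y2 = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        out.append((x2, y2))
    return out
```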
In conventional methods the cropped image is distorted and deformed, because the cropped image regions vary in size and are scaled directly. This phenomenon is corrected by the area constraint in formula (2) together with a scaling factor:

[Formula (9), rendered only as an image in the original publication]    (9)

wherein (w, h) and (w', h') are the sizes of the current and the original cropping areas.
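One way to realize the intent behind formula (9), avoiding the distortion that independent axis scaling causes, is a single uniform scale factor; this is a hedged sketch under that assumption, not the patent's exact formula, and `fit_to_screen` is an invented name:

```python
def fit_to_screen(crop_size, screen_size):
    # Scale the cropped area (w, h) to the target screen with one
    # uniform factor so the aspect ratio, and hence the image, is
    # not distorted.
    (w, h), (sw, sh) = crop_size, screen_size
    s = min(sw / w, sh / h)          # uniform scaling factor
    return round(w * s), round(h * s)
```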
After the video cropping results are obtained, an evaluation may be made:
in the embodiment, the Cropping Rate (CR) refers to the ratio of important contents in the cropped image to the important contents of a source video sequence and is calculated through a frame color histogram, and the distortion Degree (DR) is measured by using SSIM (structural similarity) scores, is commonly used for judging the fidelity and the similarity between two images and can also be used for observing the distortion degree; the Stability (SR) refers to the relative position change of the same target between two continuous frames, namely, the jitter condition of the cutting window is judged; the CR, SSIM, DR and SR calculation expressions are defined as (10), (11), (12) and (13):
[Formula (10), rendered only as an image in the original publication]    (10)

wherein S_A is the important-content area of the source video frame and S_B is the important-content area after cropping.
[Formula (11), rendered only as an image in the original publication]    (11)

[Formula (12), rendered only as an image in the original publication]    (12)

wherein μ_1 is the mean of I_1, μ_2 is the mean of I_2, σ_1² is the variance of I_1, σ_2² is the variance of I_2, σ_12 is the covariance of I_1 and I_2, and b_1 and b_2 are constants that prevent overflow when the denominator of the SSIM approaches zero. The SSIM score lies in [0, 1] and equals 1 when the contents of the two video images are identical.
[Formula (13), rendered only as an image in the original publication]    (13)

In this embodiment, the boundary of the detection box of each frame's important content is taken as the reference, and the degree of change of the cropping window is judged from the left and right distances between the cropping window and the target content in the two consecutive frames, as expressed by formula (13), wherein d_i is the distance from the left border of the cropping window to the target content in the i-th frame and d_{i-1} is that distance in the (i-1)-th frame. A smaller SR indicates less jitter, i.e. better stability.
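The three evaluation measures can be sketched as follows; the SSIM here is the standard single-window form, and the CR and SR implementations follow the verbal definitions above rather than the image-only formulas (10) and (13), so the exact weighting is an assumption:

```python
import numpy as np

def cropping_rate(area_src, area_cropped):
    # CR (cf. formula (10)): share of the source frame's important
    # content that survives cropping.
    return area_cropped / area_src

def ssim_global(i1, i2, b1=1e-4, b2=9e-4):
    # Standard single-window SSIM; b1, b2 guard against zero
    # denominators, as described above.
    i1, i2 = i1.astype(float), i2.astype(float)
    mu1, mu2 = i1.mean(), i2.mean()
    var1, var2 = i1.var(), i2.var()
    cov = ((i1 - mu1) * (i2 - mu2)).mean()
    return ((2 * mu1 * mu2 + b1) * (2 * cov + b2)) / (
        (mu1 ** 2 + mu2 ** 2 + b1) * (var1 + var2 + b2))

def stability(dists):
    # SR sketch (cf. formula (13)): mean absolute change of the
    # window-to-target distance d_i across consecutive frames.
    return float(np.mean(np.abs(np.diff(dists))))
```

A smaller `stability` value means less cropping-window jitter, matching the SR interpretation above.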
According to this scheme, the preset smoothing factor corresponding to the optimal cropping path is obtained, and the best smooth sequence in the source video is found from it; the video cropping result is then generated from that sequence. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth, preserves the rationality of the cropped content, avoids the discontinuous trajectories produced by cropping directly on detected regions, meets the visual-aesthetic expectations of viewers, and improves the speed, efficiency and user experience of automatic video cropping by dynamic programming.
Correspondingly, the invention further provides an automatic cutting device for the dynamic programming video.
Referring to fig. 7, fig. 7 is a functional block diagram of an automatic dynamic programming video clipping device according to a first embodiment of the present invention.
In a first embodiment of the device for automatically cutting a dynamic programming video according to the present invention, the device for automatically cutting a dynamic programming video includes:
the detection module 10 is configured to detect a source video, and obtain target content in each frame of video and corresponding image features, color histograms, and grayscale images.
And a fusion module 20, configured to fuse the image features, the color histogram, and the grayscale map by using a log-linear model to obtain video frame data.
And the path generating module 30 is configured to generate a target video sequence according to the video frame data, and find an optimal clipping path in the target video sequence by using a shortest key path in dynamic programming.
And the cutting module 40 is used for cutting the source video according to the optimal cutting path to obtain a video cutting result.
The detection module 10 is further configured to perform semantic processing on the source video to obtain a video sequence; carrying out bounding box detection on the video sequence, and determining whether the size of a bounding box of the video sequence meets a target screen; when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence; and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level images from the range to be cut.
The fusion module 20 is further configured to obtain a color histogram value corresponding to the color histogram and a gray value corresponding to the gray map by using a log-linear model;
fusing the image features, the color histogram values and the gray values through the following formula to obtain video frame data:
[Formula (1), rendered only as an image in the original publication]

wherein H_t is the color histogram value of each frame image, G_t is the gray value of each frame image, F_t is the feature of each frame image, μ is the average value of the three, E_t is the local integrated theoretical value (expected frequency value) of the t-th frame, and D_t is the video frame data.
The path generating module 30 is further configured to set a clipping window area and a frame window distance according to a preset constraint condition; screening the video frame data according to the cutting window area and the frame window distance to obtain a target video sequence meeting the conditions; and searching the optimal clipping path in the target video sequence by using the shortest key path in the dynamic programming.
The path generating module 30 is further configured to set a clipping window area and a frame window distance according to a preset constraint condition by the following formula:
[Formula (2), rendered only as an image in the original publication]

wherein d(·, ·) is a distance metric function, d(W_t, W_{t+1}) is the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| is the area difference of the cropping windows between two adjacent frames.
The path generating module 30 is further configured to reconstruct a dynamic transition trajectory of a clipping window of the target video sequence from a source position to a target position into a shortest critical path according to dynamic programming; acquiring a directed weighted graph corresponding to the directed edge weight of a cutting window of the target video sequence, and acquiring transition factors of all corresponding edges in the directed weighted graph; determining a visual penalty function of the shortest key path according to the transition factor, acquiring local frame data of the target video sequence, and calculating global frame data of each target position according to the local frame data; and determining an optimal clipping path in the target video sequence according to the visual penalty function and the global frame data.
The cropping module 40 is further configured to obtain a preset smoothing factor corresponding to the optimal cropping path, and find an optimal smoothing sequence in the source video according to the preset smoothing factor; and generating a video clipping result according to the optimal smooth sequence.
The steps implemented by each functional module of the dynamic programming video automatic clipping device can refer to each embodiment of the dynamic programming video automatic clipping method of the present invention, and are not described herein again.
In addition, an embodiment of the present invention further provides a storage medium, where a dynamic programming video automatic cutting program is stored on the storage medium, and when executed by a processor, the dynamic programming video automatic cutting program implements the following operations:
detecting a source video to obtain target content in each frame of video and corresponding image characteristics, color histograms and gray level maps;
fusing the image features, the color histogram and the gray level image by using a logarithmic linear model to obtain video frame data;
generating a target video sequence according to the video frame data, and searching an optimal cutting path in the target video sequence by using a shortest key path in dynamic programming;
and cutting the source video according to the optimal cutting path to obtain a video cutting result.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
performing semantic processing on a source video to obtain a video sequence;
carrying out bounding box detection on the video sequence, and determining whether the size of a bounding box of the video sequence meets a target screen;
when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence;
and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level images from the range to be cut.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
obtaining a color histogram value corresponding to the color histogram and a gray value corresponding to the gray map by using a logarithmic linear model;
fusing the image features, the color histogram values and the gray values through the following formula to obtain video frame data:
[Formula (1), rendered only as an image in the original publication]

wherein H_t is the color histogram value of each frame image, G_t is the gray value of each frame image, F_t is the feature of each frame image, μ is the average value of the three, E_t is the local integrated theoretical value (expected frequency value) of the t-th frame, and D_t is the video frame data.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
setting the area of a cutting window and the distance of a frame window according to a preset constraint condition;
screening the video frame data according to the cutting window area and the frame window distance to obtain a target video sequence meeting the conditions;
and searching the optimal clipping path in the target video sequence by using the shortest key path in the dynamic programming.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
setting the clipping window area and the frame window distance according to a preset constraint condition by the following formula:
[Formula (2), rendered only as an image in the original publication]

wherein d(·, ·) is a distance metric function, d(W_t, W_{t+1}) is the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| is the area difference of the cropping windows between two adjacent frames.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
reconstructing the dynamic transition trajectory of the cropping window of the target video sequence, from the source position to the target position, as a shortest critical path according to dynamic programming;
constructing a directed weighted graph from the directed edge weights between cropping windows of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path according to the transition factors, acquiring local frame data of the target video sequence, and calculating global frame data for each target position from the local frame data;
and determining the optimal cropping path in the target video sequence according to the visual penalty function and the global frame data.
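The search over the directed weighted graph can be sketched with a standard Dijkstra traversal. The edge-cost structure here is a placeholder: in the patent the weights would combine the transition factors, the visual penalty function, and the global frame data, none of which are specified in this excerpt.

```python
import heapq

def shortest_crop_path(edges, source, target):
    """Dijkstra over a directed weighted graph of candidate crop windows.

    `edges` maps node -> list of (neighbor, cost) pairs, where each cost
    is assumed to already aggregate the penalty terms described above.
    Returns (total_cost, path); (inf, []) if target is unreachable."""
    queue = [(0.0, source, [source])]
    best = {}  # cheapest known cost to reach each settled node
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # a cheaper route to this node was already expanded
        best[node] = cost
        for nxt, w in edges.get(node, ()):
            heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []
```

In this reading, each graph node is a candidate cropping-window position in some frame, and the returned path is the window trajectory from the source position to the target position.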
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
obtaining a preset smoothing factor corresponding to the optimal cropping path, and finding an optimal smoothed sequence in the source video according to the preset smoothing factor;
and generating the video cropping result according to the optimal smoothed sequence.
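The smoothing step might look like the following exponential-smoothing sketch; the excerpt does not specify the actual smoothing scheme, so `factor` merely stands in for the "preset smoothing factor" and operates on a one-dimensional trajectory of window positions.

```python
def smooth_path(positions, factor=0.8):
    """Exponentially smooth a crop-window trajectory.

    Each output value blends the previous smoothed value (weight
    `factor`) with the current raw position, damping abrupt window
    jumps between frames. The scheme and default are assumptions."""
    if not positions:
        return []
    smoothed = [positions[0]]
    for p in positions[1:]:
        prev = smoothed[-1]
        smoothed.append(factor * prev + (1 - factor) * p)
    return smoothed
```

A larger `factor` yields a steadier window that reacts more slowly to target motion; a smaller one tracks the raw path more closely.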
According to this scheme, the source video is detected to obtain the target content in each video frame together with the corresponding image features, color histogram, and gray-scale map; the image features, color histogram, and gray-scale map are fused with a log-linear model to obtain video frame data; a target video sequence is generated from the video frame data, and an optimal cropping path is found in the target video sequence using the shortest critical path in dynamic programming; and the source video is cropped along the optimal cropping path to obtain the video cropping result. This avoids loss of cropped content without requiring frequent movement of the cropping window, preserves both the smoothness of the cropped video and the reasonableness of its content, prevents the discontinuous trajectories that arise from cropping directly on the detected region, satisfies viewers' visual aesthetic expectations, and improves the speed, efficiency, and user experience of automatic video cropping.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A dynamic programming video automatic cropping method, characterized by comprising the following steps:
detecting a source video to obtain the target content in each video frame and the corresponding image features, color histogram, and gray-scale map;
fusing the image features, the color histogram, and the gray-scale map by using a log-linear model to obtain video frame data;
generating a target video sequence according to the video frame data, and finding an optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming;
and cropping the source video according to the optimal cropping path to obtain a video cropping result.
2. The method for automatically cropping a dynamically planned video according to claim 1, wherein said detecting the source video to obtain the target content and the corresponding image features, color histogram and gray-scale map in each frame of video comprises:
performing semantic processing on a source video to obtain a video sequence;
performing bounding box detection on the video sequence, and determining whether the bounding box size of the video sequence fits the target screen;
when the bounding box size does not fit the target screen, obtaining the position of the user's region of interest from the video sequence;
and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level images from the range to be cut.
3. The method of claim 1, wherein the fusing the image features, the color histogram, and the gray-scale map using a log-linear model to obtain video frame data comprises:
obtaining a color histogram value corresponding to the color histogram and a gray value corresponding to the gray map by using a logarithmic linear model;
fusing the image features, the color histogram values and the gray values through the following formula to obtain video frame data:
(The fusion formula is given only as an image in the source and is not reproduced here.)
wherein the quantities appearing in the formula are: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame image, the average value of these three components, the local comprehensive theoretical value (i.e., expected frequency value) of the t-th frame, and the resulting video frame data.
4. The method according to claim 1, wherein said generating a target video sequence according to the video frame data and finding an optimal cropping path in the target video sequence using the shortest critical path in dynamic programming comprises:
setting a cropping window area and an inter-frame window distance according to preset constraint conditions;
screening the video frame data according to the cropping window area and the inter-frame window distance to obtain a target video sequence that satisfies the conditions;
and finding the optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming.
5. The method according to claim 4, wherein said setting a cropping window area and an inter-frame window distance according to preset constraint conditions comprises:
setting the cropping window area and the inter-frame window distance according to the preset constraint conditions by the following formula:
(The constraint formula is given only as an image in the source and is not reproduced here.)
wherein d(·,·) is a distance metric function, d(W_t, W_{t+1}) denotes the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| denotes the difference in cropping-window area between two adjacent frames.
6. The method according to claim 4, wherein said finding the optimal cropping path in the target video sequence using the shortest critical path in dynamic programming comprises:
reconstructing the dynamic transition trajectory of the cropping window of the target video sequence, from the source position to the target position, as a shortest critical path according to dynamic programming;
constructing a directed weighted graph from the directed edge weights between cropping windows of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path according to the transition factors, acquiring local frame data of the target video sequence, and calculating global frame data for each target position from the local frame data;
and determining the optimal cropping path in the target video sequence according to the visual penalty function and the global frame data.
7. The method according to claim 1, wherein said cropping the source video according to the optimal cropping path to obtain a video cropping result comprises:
obtaining a preset smoothing factor corresponding to the optimal cropping path, and finding an optimal smoothed sequence in the source video according to the preset smoothing factor;
and generating the video cropping result according to the optimal smoothed sequence.
8. The device for automatically cutting out the dynamic programming video is characterized by comprising the following components:
the detection module is used for detecting the source video to obtain target content in each frame of video and corresponding image characteristics, color histograms and gray level maps;
the fusion module is used for fusing the image characteristics, the color histogram and the gray level image by using a logarithmic linear model to obtain video frame data;
the path generation module is used for generating a target video sequence according to the video frame data and finding an optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming;
and the cutting module is used for cutting the source video according to the optimal cutting path to obtain a video cutting result.
9. An automatic dynamic programming video cropping device, comprising: a memory, a processor, and a dynamic programming video auto-clip program stored on the memory and executable on the processor, the dynamic programming video auto-clip program configured to implement the steps of the dynamic programming video auto-clip method of any one of claims 1 to 7.
10. A storage medium having stored thereon a dynamic programming video auto-cropping program which, when executed by a processor, implements the steps of the dynamic programming video auto-cropping method of any of claims 1 to 7.
CN202210966159.3A 2022-08-12 2022-08-12 Dynamic programming video automatic cutting method, device, equipment and storage medium Active CN115049968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210966159.3A CN115049968B (en) 2022-08-12 2022-08-12 Dynamic programming video automatic cutting method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115049968A true CN115049968A (en) 2022-09-13
CN115049968B CN115049968B (en) 2022-11-11

Family

ID=83166953


Country Status (1)

Country Link
CN (1) CN115049968B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8194101B1 (en) * 2009-04-01 2012-06-05 Microsoft Corporation Dynamic perspective video window
US20140044404A1 (en) * 2012-03-13 2014-02-13 Google Inc. Methods and Systems for Video Retargeting Using Motion Saliency
CN105959707A (en) * 2016-03-14 2016-09-21 合肥工业大学 Static state background video compression algorithm based on motion perception
CN111767923A (en) * 2020-07-28 2020-10-13 腾讯科技(深圳)有限公司 Image data detection method and device and computer readable storage medium
CN112364168A (en) * 2020-11-24 2021-02-12 中国电子科技集团公司电子科学研究院 Public opinion classification method based on multi-attribute information fusion
CN112561840A (en) * 2020-12-02 2021-03-26 北京有竹居网络技术有限公司 Video clipping method and device, storage medium and electronic equipment
CN113269790A (en) * 2021-03-26 2021-08-17 北京达佳互联信息技术有限公司 Video clipping method and device, electronic equipment, server and storage medium
CN114387440A (en) * 2022-01-13 2022-04-22 腾讯科技(深圳)有限公司 Video clipping method and device and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU JIN: "Research on Cropping Methods for Surveillance Video", China Master's Theses Full-text Database, Information Science and Technology Series *
LIU QING: "Encyclopedia of Chinese Medical Statistics: Multivariate Statistics Volume", 30 June 2013 *

Also Published As

Publication number Publication date
CN115049968B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
US10282853B2 (en) Method for tracking object in video in real time in consideration of both color and shape and apparatus therefor
US9240056B2 (en) Video retargeting
US9998685B2 (en) Spatial and temporal alignment of video sequences
US7760956B2 (en) System and method for producing a page using frames of a video stream
US8295683B2 (en) Temporal occlusion costing applied to video editing
US8654181B2 (en) Methods for detecting, visualizing, and correcting the perceived depth of a multicamera image sequence
US7152209B2 (en) User interface for adaptive video fast forward
Fan et al. Looking into video frames on small displays
KR101605983B1 (en) Image recomposition using face detection
US7127127B2 (en) System and method for adaptive video fast forward using scene generative models
US7711210B2 (en) Selection of images for image processing
US20220078358A1 (en) System for automatic video reframing
JP6179889B2 (en) Comment information generation device and comment display device
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
US20110255844A1 (en) System and method for parsing a video sequence
US20070127775A1 (en) Method and apparatus for color-based object tracking in video sequences
US9672866B2 (en) Automated looping video creation
JP2005328105A (en) Creation of visually representative video thumbnail
Zhang et al. Simultaneous camera path optimization and distraction removal for improving amateur video
JP3312105B2 (en) Moving image index generation method and generation device
CN114390201A (en) Focusing method and device thereof
JP2001521656A (en) Computer system process and user interface providing intelligent scissors for image composition
US10062409B2 (en) Automated seamless video loop
CN115049968B (en) Dynamic programming video automatic cutting method, device, equipment and storage medium
JP2004080156A (en) Image processing apparatus, image processing method, program, recording medium, and image processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Dynamic planning video automatic cropping method, device, equipment, and storage medium

Granted publication date: 20221111

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN ETAH INFORMATION TECHNOLOGY Co.,Ltd.

Registration number: Y2024980009498