CN115049968A - Dynamic programming video automatic cutting method, device, equipment and storage medium - Google Patents

Dynamic programming video automatic cutting method, device, equipment and storage medium

Info

Publication number
CN115049968A
CN115049968A (application number CN202210966159.3A)
Authority
CN
China
Prior art keywords
video
cutting
target
path
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210966159.3A
Other languages
Chinese (zh)
Other versions
CN115049968B (en)
Inventor
沈振冈
龙思敏
周斌
胡波
李艳红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Etah Information Technology Co ltd
Original Assignee
Wuhan Etah Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Etah Information Technology Co ltd filed Critical Wuhan Etah Information Technology Co ltd
Priority to CN202210966159.3A (granted as CN115049968B)
Publication of CN115049968A
Application granted
Publication of CN115049968B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00: Computing arrangements based on biological models
                    • G06N3/02: Neural networks
                        • G06N3/04: Architecture, e.g. interconnection topology
                        • G06N3/08: Learning methods
            • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V10/00: Arrangements for image or video recognition or understanding
                    • G06V10/20: Image preprocessing
                        • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
                    • G06V10/40: Extraction of image or video features
                        • G06V10/56: Extraction of image or video features relating to colour
                    • G06V10/70: Arrangements using pattern recognition or machine learning
                        • G06V10/77: Processing image or video features in feature spaces; data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
                            • G06V10/80: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature-extraction or classification level
                                • G06V10/806: Fusion of extracted features
                        • G06V10/82: Arrangements using neural networks
                • G06V20/00: Scenes; Scene-specific elements
                    • G06V20/40: Scenes; Scene-specific elements in video content
                        • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
                        • G06V20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a dynamic-programming-based automatic video cropping method, together with a corresponding device, equipment, and storage medium. The method detects a source video to obtain the target content in each video frame along with its image features, color histogram, and gray-scale map; fuses the image features, color histogram, and gray-scale map with a log-linear model to obtain video frame data; generates a target video sequence from the video frame data and searches it for an optimal cropping path using the shortest critical path in dynamic programming; and crops the source video along the optimal cropping path to obtain the cropping result. This avoids losing cropped content and removes the need to move the cropping window frequently, so the cropped video stays smooth and its content remains reasonable; it avoids the discontinuous trajectories produced by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic cropping, and improves the user experience.

Description

Dynamic programming video automatic cutting method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of video cropping, and in particular to a dynamic-programming-based method, device, equipment, and storage medium for automatic video cropping.
Background
With the rise of streaming-media applications and the continual upgrading of electronic devices, video has permeated many industries, and expectations for playback quality on different devices keep rising. Display devices have also diversified: liquid-crystal televisions, smartphones, tablet computers, and so on differ in display specifications and suit different browsing habits. To save labor and material costs, multimedia content is usually produced at a fixed size, so video content often fails to match different displays. At the same time, playback must avoid distortion and deformation: the aspect ratio of the video should be adjusted automatically while the important content is preserved, without discontinuities or jerkiness that degrade visual quality. To address this problem, more and more researchers in computer graphics and computer vision have turned to image and video cropping.
Unlike image cropping, video cropping introduces a time dimension and is therefore more challenging, because adjusting each frame must respect temporal continuity with the preceding and following frames. Many computer-vision researchers have accordingly shifted their focus from image cropping to video cropping. Some crop video with a generative adversarial framework whose core idea is a conditional generative model. Others first determine the region of a frame that must be kept, crop according to the importance and saliency of the content, and directly scale the image to the target size when the cropped image does not match it. Others apply dynamic programming to optimize the temporal consistency of the crop region, though the results fall short of expectations. Others fit a fast curve through a group of consecutive frames within a shot to find an optimal cropping sequence, avoiding complex optimization of temporal constraints between adjacent frames. Still others achieve spatial and temporal consistency through an explicit mechanism that reduces jitter and content distortion, or, unlike existing algorithms that generate a thumbnail from a single image, handle stereo saliency detection and thumbnail generation separately.
The above methods all have limitations: most handle only sequences in which the targets move slowly or are few in number. There are two main reasons. (1) Targets in fast-moving video suffer from blur, occlusion, and scene changes, so important information is easily lost and the content of the cropped video is incomplete. (2) For a fast-moving target, the cropping window must be moved frequently, so the cropped video is not smooth enough.
Disclosure of Invention
The main purpose of the invention is to provide a dynamic-programming-based method, device, equipment, and storage medium for automatic video cropping, aimed at the technical problems in the prior art that cropped video content is easily lost, the cropping window must be moved frequently, and the cropped video is not smooth enough.
In a first aspect, the invention provides a dynamic-programming-based automatic video cropping method comprising the following steps:
detecting a source video to obtain the target content in each video frame together with its image features, color histogram, and gray-scale map;
fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data;
generating a target video sequence from the video frame data, and finding an optimal cropping path in the target video sequence using the shortest critical path in dynamic programming;
and cropping the source video along the optimal cropping path to obtain the video cropping result.
Optionally, detecting the source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map comprises:
performing semantic processing on the source video to obtain a video sequence;
performing bounding-box detection on the video sequence, and determining whether the bounding-box size matches the target screen;
when the bounding-box size does not match the target screen, obtaining the position of the user's region of interest from the video sequence;
and determining the range to be cropped from the position of the user's region of interest, and obtaining from that range the target content in each video frame together with its image features, color histogram, and gray-scale map.
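The per-frame extraction described above can be sketched as follows. This is an illustrative stand-in using NumPy only: the patent does not specify the detector or the kind of image feature, so the mean gradient magnitude used as the "feature" here is a hypothetical placeholder.

```python
import numpy as np

def frame_descriptors(frame, bins=16):
    """Per-frame descriptors used by the method: a gray-scale map,
    a color histogram, and a scalar image feature.

    `frame` is an H x W x 3 uint8 RGB array. The real pipeline takes
    image features from a detector; the gradient-magnitude value below
    is purely illustrative.
    """
    f = frame.astype(np.float64)
    # Gray-scale map via the standard luminance weights.
    gray = 0.299 * f[..., 0] + 0.587 * f[..., 1] + 0.114 * f[..., 2]
    # Joint color histogram over the three channels, normalized to sum to 1.
    hist, _ = np.histogramdd(f.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist /= hist.sum()
    # Stand-in "image feature": mean gradient magnitude of the gray map.
    gy, gx = np.gradient(gray)
    feature = float(np.mean(np.hypot(gx, gy)))
    return gray, hist, feature
```

In practice these three quantities would be computed only inside the range to be cropped, i.e. around the detected region of interest.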
Optionally, fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data comprises:
obtaining the color histogram value corresponding to the color histogram and the gray value corresponding to the gray-scale map with the log-linear model;
and fusing the image features, the color histogram values, and the gray values through the following formula to obtain the video frame data:
[The fusion formula appears only as an image in the original and is not reproduced here.]
The quantities it combines are, in order: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame, the mean of these three components, and the local integrated theoretical value (expected frequency value) of the t-th frame; the result is the video frame data.
Optionally, generating a target video sequence from the video frame data and finding an optimal cropping path in the target video sequence with the shortest critical path in dynamic programming comprises:
setting the cropping-window area and the inter-frame window distance according to preset constraints;
screening the video frame data against the cropping-window area and the inter-frame window distance to obtain a target video sequence that satisfies the constraints;
and finding the optimal cropping path in the target video sequence with the shortest critical path in dynamic programming.
Optionally, setting the cropping-window area and the inter-frame window distance according to the preset constraints comprises:
setting the cropping-window area and the inter-frame window distance according to the preset constraints through the following formula:
[The constraint formula appears only as an image in the original and is not reproduced here.]
In it, d(·, ·) is a distance-measure function giving the distance between the cropping windows of adjacent frames, S(W) computes the area of a window W, and the remaining term is the difference in cropping-window area between two adjacent frames.
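The two constraints above, a bounded inter-frame window distance and a bounded change in window area, can be sketched as a simple feasibility check. The window representation and the thresholds are assumptions, since the patent's formula is only available as an image.

```python
import math

def window_transition_ok(w_prev, w_next, max_dist, max_area_diff):
    """Checks the two constraints placed on adjacent-frame crop windows:
    the window centres must not jump farther than `max_dist`, and the
    window areas must not differ by more than `max_area_diff`.

    Windows are (x, y, width, height) tuples; the concrete thresholds
    are application-dependent assumptions.
    """
    def center(w):
        x, y, wd, ht = w
        return (x + wd / 2.0, y + ht / 2.0)

    def area(w):
        return w[2] * w[3]

    (cx1, cy1), (cx2, cy2) = center(w_prev), center(w_next)
    dist = math.hypot(cx2 - cx1, cy2 - cy1)
    return dist <= max_dist and abs(area(w_next) - area(w_prev)) <= max_area_diff
```

Candidate windows failing this check would be screened out before the shortest-path search, which is what keeps the resulting path smooth.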
Optionally, finding the optimal cropping path in the target video sequence with the shortest critical path in dynamic programming comprises:
recasting the dynamic transition trajectory of the cropping window of the target video sequence, from its source position to its target position, as a shortest-critical-path problem under dynamic programming;
obtaining the directed weighted graph formed by the directed edge weights of the cropping window of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path from the transition factors, obtaining the local frame data of the target video sequence, and computing the global frame data of each target position from the local frame data;
and determining the optimal cropping path in the target video sequence from the visual penalty function and the global frame data.
Optionally, cropping the source video along the optimal cropping path to obtain the video cropping result comprises:
obtaining the preset smoothing factor corresponding to the optimal cropping path, and finding the optimal smoothed sequence in the source video according to the preset smoothing factor;
and generating the video cropping result from the optimal smoothed sequence.
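The smoothing step can be illustrated with exponential smoothing of the crop-window centres along the optimal path. The patent does not define its smoothing factor precisely, so `alpha` here is a hypothetical stand-in.

```python
def smooth_path(centers, alpha=0.8):
    """Exponentially smooths the per-frame crop-window centres along the
    optimal cropping path. `alpha` plays the role of the preset smoothing
    factor (its exact definition is not given in the text); a higher
    alpha follows the raw path more closely, a lower one damps motion.
    """
    smoothed = [centers[0]]
    for c in centers[1:]:
        px, py = smoothed[-1]
        smoothed.append((alpha * c[0] + (1 - alpha) * px,
                         alpha * c[1] + (1 - alpha) * py))
    return smoothed
```

Applying this to the window centres before cutting keeps the crop from jittering even when the detected target moves abruptly.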
In a second aspect, to achieve the above object, the invention further provides a dynamic-programming-based automatic video cropping device comprising:
a detection module for detecting the source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map;
a fusion module for fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data;
a path-generation module for generating a target video sequence from the video frame data and finding an optimal cropping path in the target video sequence with the shortest critical path in dynamic programming;
and a cropping module for cropping the source video along the optimal cropping path to obtain the video cropping result.
In a third aspect, to achieve the above object, the invention further provides dynamic-programming-based automatic video cropping equipment comprising a memory, a processor, and an automatic video cropping program stored on the memory and operable on the processor, the program being configured to implement the steps of the automatic video cropping method described above.
In a fourth aspect, to achieve the above object, the invention further provides a storage medium storing an automatic video cropping program which, when executed by a processor, implements the steps of the automatic video cropping method described above.
The invention provides a dynamic-programming-based automatic video cropping method that detects a source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map; fuses these with a log-linear model to obtain video frame data; generates a target video sequence from the video frame data and finds an optimal cropping path in it with the shortest critical path in dynamic programming; and crops the source video along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth and its content reasonable, avoids the discontinuous trajectories caused by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic video cropping, and improves the user experience.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a first embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 4 is a flowchart illustrating a third embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 5 is a flowchart illustrating a fourth embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 6 is a flowchart illustrating a fifth embodiment of a method for automatically cropping a dynamic programming video according to the present invention;
FIG. 7 is a functional block diagram of an automatic dynamic programming video clipping device according to a first embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The solution of the embodiment of the invention is mainly as follows: detect a source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map; fuse these with a log-linear model to obtain video frame data; generate a target video sequence from the video frame data and find an optimal cropping path in it with the shortest critical path in dynamic programming; and crop the source video along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth and its content reasonable, avoids the discontinuous trajectories caused by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic video cropping, and improves the user experience, thereby solving the technical problems in the prior art that cropped video content is easily lost, the cropping window must be moved frequently, and the cropped video is not smooth enough.
Referring to fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the apparatus may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage; it may also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not limit the apparatus, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a dynamic programming video auto-cropping program.
The device calls a dynamic programming video automatic cutting program stored in a memory 1005 through a processor 1001 and executes the following operations:
detecting a source video to obtain the target content in each video frame and its image features, color histogram, and gray-scale map;
fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data;
generating a target video sequence from the video frame data, and finding an optimal cropping path in the target video sequence with the shortest critical path in dynamic programming;
and cropping the source video along the optimal cropping path to obtain the video cropping result.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
performing semantic processing on the source video to obtain a video sequence;
performing bounding-box detection on the video sequence, and determining whether the bounding-box size matches the target screen;
when the bounding-box size does not match the target screen, obtaining the position of the user's region of interest from the video sequence;
and determining the range to be cropped from the position of the user's region of interest, and obtaining from that range the target content in each video frame together with its image features, color histogram, and gray-scale map.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
obtaining the color histogram value corresponding to the color histogram and the gray value corresponding to the gray-scale map with the log-linear model;
fusing the image features, the color histogram values, and the gray values through the following formula to obtain the video frame data:
[The fusion formula appears only as an image in the original and is not reproduced here.]
The quantities it combines are, in order: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame, the mean of these three components, and the local integrated theoretical value (expected frequency value) of the t-th frame; the result is the video frame data.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
setting the cropping-window area and the inter-frame window distance according to preset constraints;
screening the video frame data against the cropping-window area and the inter-frame window distance to obtain a target video sequence that satisfies the constraints;
and finding the optimal cropping path in the target video sequence with the shortest critical path in dynamic programming.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
setting the cropping-window area and the inter-frame window distance according to the preset constraints through the following formula:
[The constraint formula appears only as an image in the original and is not reproduced here.]
In it, d(·, ·) is a distance-measure function giving the distance between the cropping windows of adjacent frames, S(W) computes the area of a window W, and the remaining term is the difference in cropping-window area between two adjacent frames.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
recasting the dynamic transition trajectory of the cropping window of the target video sequence, from its source position to its target position, as a shortest-critical-path problem under dynamic programming;
obtaining the directed weighted graph formed by the directed edge weights of the cropping window of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path from the transition factors, obtaining the local frame data of the target video sequence, and computing the global frame data of each target position from the local frame data;
and determining the optimal cropping path in the target video sequence from the visual penalty function and the global frame data.
The device calls the dynamic programming video automatic cutting program stored in the memory 1005 through the processor 1001, and also executes the following operations:
acquiring a preset smoothing factor corresponding to the optimal cutting path, and finding an optimal smoothing sequence in the source video according to the preset smoothing factor;
and generating a video clipping result according to the optimal smooth sequence.
With this scheme, the source video is detected to obtain the target content in each video frame and its image features, color histogram, and gray-scale map; these are fused with a log-linear model into video frame data; a target video sequence is generated from the video frame data and an optimal cropping path is found in it with the shortest critical path in dynamic programming; and the source video is cropped along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth and its content reasonable, avoids the discontinuous trajectories caused by cropping directly to the detection region, satisfies viewers' aesthetic expectations, improves the speed and efficiency of automatic video cropping, and improves the user experience.
Based on the hardware structure, the embodiment of the method for automatically cutting the dynamic programming video is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a dynamic programming video auto-cropping method according to the present invention.
In a first embodiment, the method for automatically cropping a dynamically planned video includes the following steps:
and step S10, detecting the source video, and obtaining the target content in each frame of video and the corresponding image characteristics, color histogram and gray level map.
After the source video to be cropped is detected, predetermined important contents, that is, the target content and the corresponding image feature, color histogram and gray scale map thereof, in each frame of video can be obtained.
Step S20: fusing the image features, the color histogram, and the gray-scale map with a log-linear model to obtain video frame data.
It is understood that the image features, the color histogram, and the gray-scale map can be fused with a log-linear model to obtain the corresponding frame-coefficient data, i.e., the video frame data.
Further, step S20 specifically comprises:
obtaining the color histogram value corresponding to the color histogram and the gray value corresponding to the gray-scale map with the log-linear model;
and fusing the image features, the color histogram values, and the gray values through the following formula to obtain the video frame data:
[Formula (1) appears only as an image in the original and is not reproduced here.]   (1)
The quantities it combines are, in order: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame, the mean of these three components, and the local integrated theoretical value (expected frequency value) of the t-th frame; the result is the video frame data.
It should be noted that finding a suitable trajectory is the core of video cropping: the trajectory must traverse the entire video and must contain the important content. Because of camera motion, zooming and background changes, the size of a simple target-detection bounding box is not enough to determine the important region in a video sequence. To measure content relevance in the detection results, a log-linear model is defined. By analysing the output of the previous stage, the grayscale map, frame color histogram and image features of the detected content in each frame are modelled with the log-linear model to generate per-frame data, from which the approximate boundary of each frame's cropping region is found. Given the grayscale map, the frame color histogram and the image appearance, the value at each position (x, y) of the image is computed by fusing the three features according to the formula.
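As a rough illustration of this fusion step, the sketch below assumes an equal-weight log-linear combination (the geometric mean of the three cue values); the patent's actual formula (1) is published only as an image, so both the functional form and the function name are assumptions:

```python
import numpy as np

def fuse_frame_features(hist_val, gray_val, feat_val, eps=1e-8):
    # Hedged sketch: an equal-weight log-linear fusion, i.e. the
    # geometric mean of the three (positive) per-frame cue values.
    # The patent's formula (1) exists only as an image, so this exact
    # form is an assumption; eps guards against log(0).
    vals = np.array([hist_val, gray_val, feat_val], dtype=float) + eps
    return float(np.exp(np.mean(np.log(vals))))

# Illustrative per-frame cue values (histogram, gray, feature)
frame_data = fuse_frame_features(0.8, 0.5, 0.2)
```

Because the combination is taken in log space, a single near-zero cue pulls the fused per-frame value down sharply, which matches the intent of rewarding frames where all three cues agree.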
And step S30, generating a target video sequence according to the video frame data, and searching for the optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming.
It should be understood that a corresponding adapted sequence may be generated from the video frame data, and the optimal cropping trajectory, that is, the optimal cropping path in the target video sequence, is then found using the shortest critical path in dynamic programming.
And step S40, clipping the source video according to the optimal clipping path to obtain a video clipping result.
It can be understood that the source video can be cropped through the optimal cropping path, and a video cropping result is obtained.
According to this scheme, the source video is detected to obtain the target content of each video frame together with its image features, color histogram and grayscale map; the image features, color histogram and grayscale map are fused with a log-linear model to obtain video frame data; a target video sequence is generated from the video frame data, and the optimal cropping path in it is found using the shortest critical path in dynamic programming; the source video is then cropped along the optimal cropping path to obtain the video cropping result. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth, preserves the rationality of the cropped content, avoids the discontinuous trajectories produced by cropping directly on detected regions, meets the visual-aesthetic expectations of viewers, and improves the speed, efficiency and user experience of automatic video cropping by dynamic programming.
Further, fig. 3 is a schematic flow chart of a second embodiment of the dynamic programming video automatic clipping method of the present invention, and as shown in fig. 3, the second embodiment of the dynamic programming video automatic clipping method of the present invention is provided based on the first embodiment, and in this embodiment, the step S10 specifically includes the following steps:
and step S11, performing semantic processing on the source video to obtain a video sequence.
It should be noted that, the source video is subjected to semantic processing, and a video sequence after the semantic processing can be obtained.
And step S12, carrying out boundary box detection on the video sequence, and determining whether the size of the boundary box of the video sequence conforms to a target screen.
It is understood that the bounding box detection of the video sequence may determine whether the size of the video sequence conforms to the target screen, i.e., whether the size of the bounding box of the video sequence conforms to the display screen size of the target display device.
And step S13, when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence.
It should be appreciated that the user's region of interest can be obtained from the video sequence when the bounding-box size does not fit the target screen; in general, the position of the user's region of interest in each video frame can be found with the open-source high-performance detector YOLOX.
In a specific implementation, saliency detection can miss targets that appear suddenly in a certain frame, so the YOLOX model is used for detection instead; YOLOX outperforms saliency detection in accuracy, and the detection speed is improved as well.
And step S14, determining a range to be cut according to the position of the user interesting region, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level maps from the range to be cut.
It can be understood that the position of the user's region of interest determines the range to be cropped, from which the important content of each video frame, that is, the target content, together with its corresponding image features, color histogram and grayscale map, is obtained.
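The per-frame cues named in step S14 can be illustrated as follows. This NumPy-only sketch assumes RGB input and standard BT.601 luma weights; the function name and bin count are illustrative, since the patent does not fix a library or implementation, and the region-of-interest detection (e.g. via YOLOX) is assumed to have produced the frame already:

```python
import numpy as np

def frame_cues(frame_rgb, bins=16):
    # Compute two of the per-frame cues used by the method:
    # a grayscale map and a normalised color histogram.
    frame = frame_rgb.astype(float)
    # ITU-R BT.601 luma weights for the grayscale map
    gray = 0.299 * frame[..., 0] + 0.587 * frame[..., 1] + 0.114 * frame[..., 2]
    # Per-channel color histogram, concatenated and normalised to sum to 1
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    hist /= hist.sum()
    return gray, hist

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(48, 64, 3))
gray, hist = frame_cues(frame)
```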
According to the scheme, the video sequence is obtained by performing semantic processing on the source video; carrying out bounding box detection on the video sequence, and determining whether the size of a bounding box of the video sequence meets a target screen; when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence; and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level maps from the range to be cut, so that the image characteristics, the color histograms and the gray level maps of the video frames can be obtained, and the speed and the efficiency of automatic cutting of the video are improved.
Further, fig. 4 is a schematic flow chart of a third embodiment of the method for automatically cropping a dynamically planned video according to the present invention, and as shown in fig. 4, the third embodiment of the method for automatically cropping a dynamically planned video according to the present invention is provided based on the first embodiment, and in this embodiment, the step S30 specifically includes the following steps:
and step S31, setting the clipping window area and the frame window distance according to the preset constraint condition.
It should be noted that the cropping-window area of the video frame data and the window distance between two adjacent frames may be set through a preset constraint condition.
Further, the step S31 specifically includes the following steps:
setting the clipping window area and the frame window distance according to a preset constraint condition by the following formula:
[Formula (2), rendered only as an image in the original publication]    (2)

wherein d(·, ·) is a distance metric function, d(W_t, W_{t+1}) is the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| is the area difference of the cropping windows between two adjacent frames.
It will be appreciated that, given the frame data obtained above, an optimal cropping sequence is sought for the given video sequence so as to obtain a smooth, temporally consistent visual effect. A cropping sequence found directly from the per-frame data may be non-smooth, because the cropping window can move a long way or its area can jump sharply. To obtain a smooth sequence, the cropping windows of two consecutive frames should keep a small relative displacement, and at the same time their areas must not differ too much. Distance and area constraints are therefore added, as defined in formula (2).
And step S32, screening the video frame data according to the cutting window area and the frame window distance to obtain a qualified target video sequence.
It can be understood that, by filtering the video frame data according to the cropping window area and the frame window distance, a video sequence conforming to the cropping window area and the frame window distance can be obtained as a qualified target video sequence.
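The screening of step S32 can be sketched as a predicate over adjacent cropping windows; the window representation (x, y, w, h) and both thresholds are illustrative assumptions, not values from the patent:

```python
import math

def smooth_transition_ok(win_a, win_b, max_dist=20.0, max_area_diff=0.15):
    # Check the two constraints described around formula (2): adjacent
    # cropping windows must stay close (distance metric d) and their
    # areas S(W) must not jump. Thresholds are illustrative.
    (xa, ya, wa, ha), (xb, yb, wb, hb) = win_a, win_b
    dist = math.hypot(xa - xb, ya - yb)      # d(W_t, W_{t+1})
    area_a, area_b = wa * ha, wb * hb        # S(W)
    area_diff = abs(area_a - area_b) / max(area_a, area_b)
    return dist <= max_dist and area_diff <= max_area_diff
```

Frame-data candidates whose window fails this predicate against the previous frame would be dropped, leaving the qualified target video sequence.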
And step S33, searching for the optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming.
It should be understood that the optimal cropping path in the target video sequence can be found using the shortest critical path in dynamic programming.
According to this scheme, the cropping-window area and the frame-window distance are set by a preset constraint condition; the video frame data are screened by the cropping-window area and the frame-window distance to obtain a qualified target video sequence; and the optimal cropping path in the target video sequence is found using the shortest critical path in dynamic programming. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth, preserves the rationality of the cropped content, avoids the discontinuous trajectories produced by cropping directly on detected regions, meets the visual-aesthetic expectations of viewers, and improves the speed, efficiency and user experience of automatic video cropping by dynamic programming.
Further, fig. 5 is a schematic flow chart of a fourth embodiment of the dynamic programming video automatic clipping method of the present invention, and as shown in fig. 5, the fourth embodiment of the dynamic programming video automatic clipping method of the present invention is provided based on the third embodiment, in this embodiment, the step S33 specifically includes the following steps:
And step S331, reconstructing the dynamic transition trajectory of the cropping window of the target video sequence from the source position to the target position as a shortest-critical-path problem according to dynamic programming.
It should be noted that the shortest critical path is determined for the cropping window of the target video sequence by dynamic programming; that is, the dynamic transition trajectory of the cropping window from the source position to the target position is reconstructed as a shortest-path problem.
And S332, acquiring a directed weighted graph corresponding to the directed edge weight of the clipping window of the target video sequence, and acquiring a transition factor of each corresponding edge in the directed weighted graph.
It can be understood that, obtaining a directed weighted graph corresponding to the directed edge weight of the clipping window of the target video sequence can obtain the transition factor of each corresponding edge.
In a specific implementation, although improved temporal continuity benefits the visual effect, directly searching all possible trajectories of the cropping window is too costly, so dynamic programming is used to reconstruct the dynamic transition trajectory of the cropping window from the source position to the target position as a shortest-path problem. The approximate position of each frame's cropping window is found from the local frame data of each frame, and a directed weighted graph is constructed at the same time: the cropping windows are the nodes of the graph, the frame-data values of each frame are the weights of the directed edges, and the graph is built jointly with the tracking of the cropping window.
Step S333, determining the visual penalty function of the shortest key path according to the transition factor, acquiring local frame data of the target video sequence, and calculating global frame data of each target position according to the local frame data.
It should be understood that, by determining the visual penalty function of the shortest critical path through the transition factor, after obtaining the local frame data of the target video sequence, the global frame data of each target position can be calculated according to the local frame data.
In a specific implementation, a shortest-path algorithm is applied to the directed weighted graph to find the globally optimal cropping trajectory. To keep the cropping window smooth while it moves, in addition to the per-frame distance constraint of formula (2), a transition factor δ is defined for each edge of the directed weighted graph:

[Formula (3), rendered only as an image in the original publication]    (3)
When searching for the optimal trajectory, the significant information lost while the cropping window shifts must also be considered. Information loss and cropping-window transition shift are in conflict; to balance the two and optimize visual comfort, a visual penalty function V is defined:

[Visual penalty function, rendered only as an image in the original publication]

wherein L_t is the significant information loss in each frame, δ is the transition offset of the cropping window between adjacent frames, (x_t, y_t) is the position of the top-left node of each frame, P denotes the dynamic trajectory of all cropping windows, λ is a balance parameter between significant information loss and cropping-window offset, and n is the total number of frames.
Step S334, determining an optimal clipping path in the target video sequence according to the visual penalty function and the global frame data.
It is understood that the optimal clipping path in the target video sequence is determined by the visual penalty function and the global frame data.
In a specific implementation, to make the resulting cropping sequence perform best, the visual penalty must be minimized; minimizing V is therefore equivalent to finding the cheapest move from node (x_{i-1}, y_{i-1}) to node (x_i, y_i). The minimum visual penalty is defined as follows:

[Formula (5), rendered only as an image in the original publication]    (5)

wherein s denotes the source node, T_i(k) denotes the shortest path from the source node of frame 1 to the k-th node of the i-th frame, T_{i-1}(j) denotes the shortest path from the source node of frame 1 to the j-th node of frame i-1, and δ(j, k) denotes the offset transition from the j-th node of frame i-1 to the k-th node of frame i.
For the final cropping trajectory, the global frame data of each target position are computed from the local frame data of formula (1); the optimal trajectory is then determined from the global frame data and the visual penalty function, and the obtained candidate values are stored in a backtracking table B_t:

[Formula (6), rendered only as an image in the original publication]    (6)

[Formula (7), rendered only as an image in the original publication]    (7)

wherein M_t is the global frame data of each target position, E_t is the local frame data of each frame, R(x, y) is the set of possible candidate nodes for node (x, y), and B_t(x, y) stores the possible results.
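The recurrence and backtracking described around formulas (5) to (7) can be sketched as a standard dynamic-programming shortest-path search over per-frame candidate windows. All names, the cost layout and the balance parameter `lam` are assumptions, since the patent gives the formulas only as images:

```python
import math

def optimal_crop_path(candidates, local_cost, lam=0.5):
    # candidates : list over frames, each a list of (x, y) nodes
    # local_cost : dict mapping (frame_idx, node_idx) -> cost (lower = better)
    n = len(candidates)
    # M[t][k]: best accumulated cost reaching node k of frame t (cf. formula (6))
    M = [[local_cost[(0, k)] for k in range(len(candidates[0]))]]
    B = [[None] * len(candidates[0])]        # backtracking table (cf. formula (7))
    for t in range(1, n):
        row, back = [], []
        for k, (xk, yk) in enumerate(candidates[t]):
            best, best_j = math.inf, None
            for j, (xj, yj) in enumerate(candidates[t - 1]):
                shift = math.hypot(xk - xj, yk - yj)   # transition offset
                cost = M[t - 1][j] + lam * shift
                if cost < best:
                    best, best_j = cost, j
            row.append(best + local_cost[(t, k)])
            back.append(best_j)
        M.append(row)
        B.append(back)
    # Recover the trajectory by backtracking from the cheapest end node
    k = min(range(len(M[-1])), key=M[-1].__getitem__)
    path = [k]
    for t in range(n - 1, 0, -1):
        k = B[t][k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]

# Toy example: two candidate windows per frame over three frames,
# with the left window always cheaper
cands = [[(0, 0), (10, 0)]] * 3
costs = {(t, k): [0.0, 5.0][k] for t in range(3) for k in range(2)}
path = optimal_crop_path(cands, costs)
```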
According to this scheme, the dynamic transition trajectory of the cropping window of the target video sequence from the source position to the target position is reconstructed as a shortest-critical-path problem by dynamic programming; the directed weighted graph corresponding to the directed edge weights of the cropping window is obtained, together with the transition factor of each edge in it; the visual penalty function of the shortest critical path is determined from the transition factor, the local frame data of the target video sequence are obtained, and the global frame data of each target position are computed from them; and the optimal cropping path in the target video sequence is determined from the visual penalty function and the global frame data. Losing cropped content is thereby avoided, the cropping window does not need to move frequently, the cropped video stays smooth, and the rationality of the cropped content is preserved.
Further, fig. 6 is a schematic flow chart of a fifth embodiment of the dynamic programming video automatic clipping method of the present invention, and as shown in fig. 6, the fifth embodiment of the dynamic programming video automatic clipping method of the present invention is provided based on the first embodiment, in this embodiment, the step S40 specifically includes the following steps:
and step S41, obtaining a preset smoothing factor corresponding to the optimal clipping path, and finding the optimal smoothing sequence in the source video according to the preset smoothing factor.
It should be noted that, after the preset smoothing factor corresponding to the optimal clipping path is obtained, the optimal smoothing sequence in the source video may be found according to the preset smoothing factor.
And step S42, generating a video cutting result according to the optimal smooth sequence.
It should be appreciated that video cropping results may be generated by the optimal smoothing sequence.
In a specific implementation, if the cropping trajectory is not smoothed when the directed weighted graph is constructed in the dynamic programming algorithm, a certain amount of disturbance remains and the final cropping result appears to jitter. To improve visual comfort, a smoothing factor is defined on the resulting cropping trajectory to ensure admissible transitions between subsequent frames, find the best smooth sequence, and make the video more stable in time.
The smoothing factor is designed to improve the smoothness of the cropping trajectory. As in formula (3), the cropping window is smoothed between adjacent frames by the transition factor δ, which can be used to screen out large jumps between two consecutive frames, for example by using the Euclidean distance between the two nodes (x_1, y_1) and (x_2, y_2) of two adjacent frames:

[Formula (8), rendered only as an image in the original publication]    (8)

wherein λ is a weighting parameter that controls smoothness and flexibility.
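A minimal sketch of the jump screening that the smoothing factor performs, assuming the Euclidean distance test of formula (8) and an illustrative threshold in place of the patent's weighting parameter λ:

```python
import math

def smooth_trajectory(path, max_jump=15.0):
    # When two consecutive cropping-window nodes are further apart than
    # max_jump (Euclidean distance, cf. formula (8)), the later node is
    # pulled to the midpoint to damp the jump. Threshold is illustrative.
    out = [path[0]]
    for (x2, y2) in path[1:]:
        x1, y1 = out[-1]
        if math.hypot(x2 - x1, y2 - y1) > max_jump:
            x2, y2 = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        out.append((x2, y2))
    return out
```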
In conventional methods the cropped image is distorted and deformed, because the cropped image regions vary in size and are scaled directly. This phenomenon is corrected by the area constraint in formula (2) together with a scaling factor:

[Formula (9), rendered only as an image in the original publication]    (9)

wherein (w, h) and (w', h') are the sizes of the current and the original cropping areas.
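One way to realize the intent behind formula (9), avoiding the distortion that independent axis scaling causes, is a single uniform scale factor; this is a hedged sketch under that assumption, not the patent's exact formula, and `fit_to_screen` is an invented name:

```python
def fit_to_screen(crop_size, screen_size):
    # Scale the cropped area (w, h) to the target screen with one
    # uniform factor so the aspect ratio, and hence the image, is
    # not distorted.
    (w, h), (sw, sh) = crop_size, screen_size
    s = min(sw / w, sh / h)          # uniform scaling factor
    return round(w * s), round(h * s)
```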
After the video cropping results are obtained, an evaluation may be made:
in the embodiment, the Cropping Rate (CR) refers to the ratio of important contents in the cropped image to the important contents of a source video sequence and is calculated through a frame color histogram, and the distortion Degree (DR) is measured by using SSIM (structural similarity) scores, is commonly used for judging the fidelity and the similarity between two images and can also be used for observing the distortion degree; the Stability (SR) refers to the relative position change of the same target between two continuous frames, namely, the jitter condition of the cutting window is judged; the CR, SSIM, DR and SR calculation expressions are defined as (10), (11), (12) and (13):
[Formula (10), rendered only as an image in the original publication]    (10)

wherein S_A is the important-content area of the source video frame and S_B is the important-content area after cropping.
[Formula (11), rendered only as an image in the original publication]    (11)

[Formula (12), rendered only as an image in the original publication]    (12)

wherein μ_1 is the mean of I_1, μ_2 is the mean of I_2, σ_1² is the variance of I_1, σ_2² is the variance of I_2, σ_12 is the covariance of I_1 and I_2, and b_1 and b_2 are constants that prevent overflow when the denominator of the SSIM approaches zero. The SSIM score lies in [0, 1] and equals 1 when the contents of the two video images are identical.
[Formula (13), rendered only as an image in the original publication]    (13)

In this embodiment, the boundary of the detection box of each frame's important content is taken as the reference, and the degree of change of the cropping window is judged from the left and right distances between the cropping window and the target content in the two consecutive frames, as expressed by formula (13), wherein d_i is the distance from the left border of the cropping window to the target content in the i-th frame and d_{i-1} is that distance in the (i-1)-th frame. A smaller SR indicates less jitter, i.e. better stability.
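The three evaluation measures can be sketched as follows; the SSIM here is the standard single-window form, and the CR and SR implementations follow the verbal definitions above rather than the image-only formulas (10) and (13), so the exact weighting is an assumption:

```python
import numpy as np

def cropping_rate(area_src, area_cropped):
    # CR (cf. formula (10)): share of the source frame's important
    # content that survives cropping.
    return area_cropped / area_src

def ssim_global(i1, i2, b1=1e-4, b2=9e-4):
    # Standard single-window SSIM; b1, b2 guard against zero
    # denominators, as described above.
    i1, i2 = i1.astype(float), i2.astype(float)
    mu1, mu2 = i1.mean(), i2.mean()
    var1, var2 = i1.var(), i2.var()
    cov = ((i1 - mu1) * (i2 - mu2)).mean()
    return ((2 * mu1 * mu2 + b1) * (2 * cov + b2)) / (
        (mu1 ** 2 + mu2 ** 2 + b1) * (var1 + var2 + b2))

def stability(dists):
    # SR sketch (cf. formula (13)): mean absolute change of the
    # window-to-target distance d_i across consecutive frames.
    return float(np.mean(np.abs(np.diff(dists))))
```

A smaller `stability` value means less cropping-window jitter, matching the SR interpretation above.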
According to this scheme, the preset smoothing factor corresponding to the optimal cropping path is obtained, and the best smooth sequence in the source video is found from it; the video cropping result is then generated from that sequence. This avoids losing cropped content, removes the need to move the cropping window frequently, keeps the cropped video smooth, preserves the rationality of the cropped content, avoids the discontinuous trajectories produced by cropping directly on detected regions, meets the visual-aesthetic expectations of viewers, and improves the speed, efficiency and user experience of automatic video cropping by dynamic programming.
Correspondingly, the invention further provides an automatic cutting device for the dynamic programming video.
Referring to fig. 7, fig. 7 is a functional block diagram of an automatic dynamic programming video clipping device according to a first embodiment of the present invention.
In a first embodiment of the device for automatically cutting a dynamic programming video according to the present invention, the device for automatically cutting a dynamic programming video includes:
the detection module 10 is configured to detect a source video, and obtain target content in each frame of video and corresponding image features, color histograms, and grayscale images.
And a fusion module 20, configured to fuse the image features, the color histogram, and the grayscale map by using a log-linear model to obtain video frame data.
And the path generating module 30 is configured to generate a target video sequence according to the video frame data, and find an optimal clipping path in the target video sequence by using a shortest key path in dynamic programming.
And the cutting module 40 is used for cutting the source video according to the optimal cutting path to obtain a video cutting result.
The detection module 10 is further configured to perform semantic processing on the source video to obtain a video sequence; carrying out bounding box detection on the video sequence, and determining whether the size of a bounding box of the video sequence meets a target screen; when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence; and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level images from the range to be cut.
The fusion module 20 is further configured to obtain a color histogram value corresponding to the color histogram and a gray value corresponding to the gray map by using a log-linear model;
fusing the image features, the color histogram values and the gray values through the following formula to obtain video frame data:
[Formula (1), rendered only as an image in the original publication]

wherein H_t is the color histogram value of each frame image, G_t is the gray value of each frame image, F_t is the feature of each frame image, μ is the average value of the three, E_t is the local integrated theoretical value (expected frequency value) of the t-th frame, and D_t is the video frame data.
The path generating module 30 is further configured to set a clipping window area and a frame window distance according to a preset constraint condition; screening the video frame data according to the cutting window area and the frame window distance to obtain a target video sequence meeting the conditions; and searching the optimal clipping path in the target video sequence by using the shortest key path in the dynamic programming.
The path generating module 30 is further configured to set a clipping window area and a frame window distance according to a preset constraint condition by the following formula:
[Formula (2), rendered only as an image in the original publication]

wherein d(·, ·) is a distance metric function, d(W_t, W_{t+1}) is the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| is the area difference of the cropping windows between two adjacent frames.
The path generating module 30 is further configured to reconstruct a dynamic transition trajectory of a clipping window of the target video sequence from a source position to a target position into a shortest critical path according to dynamic programming; acquiring a directed weighted graph corresponding to the directed edge weight of a cutting window of the target video sequence, and acquiring transition factors of all corresponding edges in the directed weighted graph; determining a visual penalty function of the shortest key path according to the transition factor, acquiring local frame data of the target video sequence, and calculating global frame data of each target position according to the local frame data; and determining an optimal clipping path in the target video sequence according to the visual penalty function and the global frame data.
The cropping module 40 is further configured to obtain a preset smoothing factor corresponding to the optimal cropping path, and find an optimal smoothing sequence in the source video according to the preset smoothing factor; and generating a video clipping result according to the optimal smooth sequence.
The steps implemented by each functional module of the dynamic programming video automatic clipping device can refer to each embodiment of the dynamic programming video automatic clipping method of the present invention, and are not described herein again.
In addition, an embodiment of the present invention further provides a storage medium, where a dynamic programming video automatic cutting program is stored on the storage medium, and when executed by a processor, the dynamic programming video automatic cutting program implements the following operations:
detecting a source video to obtain target content in each frame of video and corresponding image characteristics, color histograms and gray level maps;
fusing the image features, the color histogram and the gray level image by using a logarithmic linear model to obtain video frame data;
generating a target video sequence according to the video frame data, and searching an optimal cutting path in the target video sequence by using a shortest key path in dynamic programming;
and cutting the source video according to the optimal cutting path to obtain a video cutting result.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
performing semantic processing on a source video to obtain a video sequence;
carrying out bounding box detection on the video sequence, and determining whether the size of a bounding box of the video sequence meets a target screen;
when the size of the bounding box does not accord with the target screen, obtaining the position of the region of interest of the user from the video sequence;
and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level images from the range to be cut.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
obtaining a color histogram value corresponding to the color histogram and a gray value corresponding to the gray map by using a logarithmic linear model;
fusing the image features, the color histogram values and the gray values through the following formula to obtain video frame data:
[Formula (1), rendered only as an image in the original publication]

wherein H_t is the color histogram value of each frame image, G_t is the gray value of each frame image, F_t is the feature of each frame image, μ is the average value of the three, E_t is the local integrated theoretical value (expected frequency value) of the t-th frame, and D_t is the video frame data.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
setting the area of a cutting window and the distance of a frame window according to a preset constraint condition;
screening the video frame data according to the cutting window area and the frame window distance to obtain a target video sequence meeting the conditions;
and searching the optimal clipping path in the target video sequence by using the shortest key path in the dynamic programming.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
setting the clipping window area and the frame window distance according to a preset constraint condition by the following formula:
[Formula (2), rendered only as an image in the original publication]

wherein d(·, ·) is a distance metric function, d(W_t, W_{t+1}) is the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| is the area difference of the cropping windows between two adjacent frames.
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
reconstructing the dynamic transition trajectory of the cropping window of the target video sequence, from the source position to the target position, as a shortest critical path according to dynamic programming;
constructing a directed weighted graph from the directed edge weights between cropping windows of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path according to the transition factors, acquiring local frame data of the target video sequence, and calculating global frame data for each target position from the local frame data;
and determining the optimal cropping path in the target video sequence according to the visual penalty function and the global frame data.
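The search over the directed weighted graph can be sketched with a standard Dijkstra traversal. The edge-cost structure here is a placeholder: in the patent the weights would combine the transition factors, the visual penalty function, and the global frame data, none of which are specified in this excerpt.

```python
import heapq

def shortest_crop_path(edges, source, target):
    """Dijkstra over a directed weighted graph of candidate crop windows.

    `edges` maps node -> list of (neighbor, cost) pairs, where each cost
    is assumed to already aggregate the penalty terms described above.
    Returns (total_cost, path); (inf, []) if target is unreachable."""
    queue = [(0.0, source, [source])]
    best = {}  # cheapest known cost to reach each settled node
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # a cheaper route to this node was already expanded
        best[node] = cost
        for nxt, w in edges.get(node, ()):
            heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []
```

In this reading, each graph node is a candidate cropping-window position in some frame, and the returned path is the window trajectory from the source position to the target position.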
Further, when executed by the processor, the dynamic programming video auto-cropping program further implements the following operations:
obtaining a preset smoothing factor corresponding to the optimal cropping path, and finding an optimal smoothed sequence in the source video according to the preset smoothing factor;
and generating the video cropping result according to the optimal smoothed sequence.
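The smoothing step might look like the following exponential-smoothing sketch; the excerpt does not specify the actual smoothing scheme, so `factor` merely stands in for the "preset smoothing factor" and operates on a one-dimensional trajectory of window positions.

```python
def smooth_path(positions, factor=0.8):
    """Exponentially smooth a crop-window trajectory.

    Each output value blends the previous smoothed value (weight
    `factor`) with the current raw position, damping abrupt window
    jumps between frames. The scheme and default are assumptions."""
    if not positions:
        return []
    smoothed = [positions[0]]
    for p in positions[1:]:
        prev = smoothed[-1]
        smoothed.append(factor * prev + (1 - factor) * p)
    return smoothed
```

A larger `factor` yields a steadier window that reacts more slowly to target motion; a smaller one tracks the raw path more closely.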
According to this scheme, the source video is detected to obtain the target content in each video frame together with the corresponding image features, color histogram, and gray-scale map; the image features, color histogram, and gray-scale map are fused with a log-linear model to obtain video frame data; a target video sequence is generated from the video frame data, and an optimal cropping path is found in the target video sequence using the shortest critical path in dynamic programming; and the source video is cropped along the optimal cropping path to obtain the video cropping result. This avoids loss of cropped content without requiring frequent movement of the cropping window, preserves both the smoothness of the cropped video and the reasonableness of its content, prevents the discontinuous trajectories that arise from cropping directly on the detected region, satisfies viewers' visual aesthetic expectations, and improves the speed, efficiency, and user experience of automatic video cropping.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A dynamic programming video automatic cropping method, characterized by comprising the following steps:
detecting a source video to obtain the target content in each video frame and the corresponding image features, color histogram, and gray-scale map;
fusing the image features, the color histogram, and the gray-scale map by using a log-linear model to obtain video frame data;
generating a target video sequence according to the video frame data, and finding an optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming;
and cropping the source video according to the optimal cropping path to obtain a video cropping result.
2. The method for automatically cropping a dynamically planned video according to claim 1, wherein said detecting the source video to obtain the target content and the corresponding image features, color histogram and gray-scale map in each frame of video comprises:
performing semantic processing on a source video to obtain a video sequence;
performing bounding box detection on the video sequence, and determining whether the bounding box size of the video sequence fits the target screen;
when the bounding box size does not fit the target screen, obtaining the position of the user's region of interest from the video sequence;
and determining a range to be cut according to the position of the region of interest of the user, and obtaining target content in each frame of video and corresponding image characteristics, color histograms and gray level images from the range to be cut.
3. The method of claim 1, wherein the fusing the image features, the color histogram, and the gray-scale map using a log-linear model to obtain video frame data comprises:
obtaining a color histogram value corresponding to the color histogram and a gray value corresponding to the gray map by using a logarithmic linear model;
fusing the image features, the color histogram values and the gray values through the following formula to obtain video frame data:
(The fusion formula is given only as an image in the source and is not reproduced here.)
wherein the quantities appearing in the formula are: the color histogram value of each frame image, the gray value of each frame image, the image feature of each frame image, the average value of these three components, the local comprehensive theoretical value (i.e., expected frequency value) of the t-th frame, and the resulting video frame data.
4. The method according to claim 1, wherein said generating a target video sequence according to the video frame data and finding an optimal cropping path in the target video sequence using the shortest critical path in dynamic programming comprises:
setting a cropping window area and an inter-frame window distance according to preset constraint conditions;
screening the video frame data according to the cropping window area and the inter-frame window distance to obtain a target video sequence that satisfies the conditions;
and finding the optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming.
5. The method according to claim 4, wherein said setting a cropping window area and an inter-frame window distance according to preset constraint conditions comprises:
setting the cropping window area and the inter-frame window distance according to the preset constraint conditions by the following formula:
(The constraint formula is given only as an image in the source and is not reproduced here.)
wherein d(·,·) is a distance metric function, d(W_t, W_{t+1}) denotes the distance between the cropping windows of adjacent frames, S(W) computes the area of window W, and |S(W_t) - S(W_{t+1})| denotes the difference in cropping-window area between two adjacent frames.
6. The method according to claim 4, wherein said finding the optimal cropping path in the target video sequence using the shortest critical path in dynamic programming comprises:
reconstructing the dynamic transition trajectory of the cropping window of the target video sequence, from the source position to the target position, as a shortest critical path according to dynamic programming;
constructing a directed weighted graph from the directed edge weights between cropping windows of the target video sequence, and obtaining the transition factor of each corresponding edge in the directed weighted graph;
determining a visual penalty function for the shortest critical path according to the transition factors, acquiring local frame data of the target video sequence, and calculating global frame data for each target position from the local frame data;
and determining the optimal cropping path in the target video sequence according to the visual penalty function and the global frame data.
7. The method according to claim 1, wherein said cropping the source video according to the optimal cropping path to obtain a video cropping result comprises:
obtaining a preset smoothing factor corresponding to the optimal cropping path, and finding an optimal smoothed sequence in the source video according to the preset smoothing factor;
and generating the video cropping result according to the optimal smoothed sequence.
8. The device for automatically cutting out the dynamic programming video is characterized by comprising the following components:
the detection module is used for detecting the source video to obtain target content in each frame of video and corresponding image characteristics, color histograms and gray level maps;
the fusion module is used for fusing the image characteristics, the color histogram and the gray level image by using a logarithmic linear model to obtain video frame data;
the path generation module is used for generating a target video sequence according to the video frame data and finding an optimal cropping path in the target video sequence by using the shortest critical path in dynamic programming;
and the cutting module is used for cutting the source video according to the optimal cutting path to obtain a video cutting result.
9. An automatic dynamic programming video cropping device, comprising: a memory, a processor, and a dynamic programming video auto-clip program stored on the memory and executable on the processor, the dynamic programming video auto-clip program configured to implement the steps of the dynamic programming video auto-clip method of any one of claims 1 to 7.
10. A storage medium having stored thereon a dynamic programming video auto-cropping program which, when executed by a processor, implements the steps of the dynamic programming video auto-cropping method of any of claims 1 to 7.
CN202210966159.3A 2022-08-12 2022-08-12 Dynamic programming video automatic cutting method, device, equipment and storage medium Active CN115049968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210966159.3A CN115049968B (en) 2022-08-12 2022-08-12 Dynamic programming video automatic cutting method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115049968A true CN115049968A (en) 2022-09-13
CN115049968B CN115049968B (en) 2022-11-11

Family

ID=83166953


Country Status (1)

Country Link
CN (1) CN115049968B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8194101B1 (en) * 2009-04-01 2012-06-05 Microsoft Corporation Dynamic perspective video window
US20140044404A1 (en) * 2012-03-13 2014-02-13 Google Inc. Methods and Systems for Video Retargeting Using Motion Saliency
CN105959707A (en) * 2016-03-14 2016-09-21 合肥工业大学 Static state background video compression algorithm based on motion perception
CN111767923A (en) * 2020-07-28 2020-10-13 腾讯科技(深圳)有限公司 Image data detection method and device and computer readable storage medium
CN112364168A (en) * 2020-11-24 2021-02-12 中国电子科技集团公司电子科学研究院 Public opinion classification method based on multi-attribute information fusion
CN112561840A (en) * 2020-12-02 2021-03-26 北京有竹居网络技术有限公司 Video clipping method and device, storage medium and electronic equipment
CN113269790A (en) * 2021-03-26 2021-08-17 北京达佳互联信息技术有限公司 Video clipping method and device, electronic equipment, server and storage medium
CN114387440A (en) * 2022-01-13 2022-04-22 腾讯科技(深圳)有限公司 Video clipping method and device and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU JIN: "Research on Cropping Methods for Surveillance Video", China Master's Theses Full-text Database, Information Science and Technology Series *
LIU QING: "Encyclopedia of Chinese Medical Statistics: Multivariate Statistics Volume", 30 June 2013 *

Also Published As

Publication number Publication date
CN115049968B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
US10282853B2 (en) Method for tracking object in video in real time in consideration of both color and shape and apparatus therefor
US9240056B2 (en) Video retargeting
US9998685B2 (en) Spatial and temporal alignment of video sequences
US7760956B2 (en) System and method for producing a page using frames of a video stream
US8295683B2 (en) Temporal occlusion costing applied to video editing
US8654181B2 (en) Methods for detecting, visualizing, and correcting the perceived depth of a multicamera image sequence
US7152209B2 (en) User interface for adaptive video fast forward
Fan et al. Looking into video frames on small displays
KR101605983B1 (en) Image recomposition using face detection
US7127127B2 (en) System and method for adaptive video fast forward using scene generative models
US7711210B2 (en) Selection of images for image processing
US20220078358A1 (en) System for automatic video reframing
JP6179889B2 (en) Comment information generation device and comment display device
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
US20110255844A1 (en) System and method for parsing a video sequence
US20070127775A1 (en) Method and apparatus for color-based object tracking in video sequences
US9672866B2 (en) Automated looping video creation
JP2005328105A (en) Creation of visually representative video thumbnail
Zhang et al. Simultaneous camera path optimization and distraction removal for improving amateur video
JP3312105B2 (en) Moving image index generation method and generation device
CN114390201A (en) Focusing method and device thereof
JP2001521656A (en) Computer system process and user interface providing intelligent scissors for image composition
US10062409B2 (en) Automated seamless video loop
CN115049968B (en) Dynamic programming video automatic cutting method, device, equipment and storage medium
JP2004080156A (en) Image processing apparatus, image processing method, program, recording medium, and image processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Dynamic planning video automatic cropping method, device, equipment, and storage medium

Granted publication date: 20221111

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN ETAH INFORMATION TECHNOLOGY Co.,Ltd.

Registration number: Y2024980009498