CN112135188A - Video clipping method, electronic device and computer-readable storage medium - Google Patents

Video clipping method, electronic device and computer-readable storage medium

Info

Publication number
CN112135188A
Authority
CN
China
Prior art keywords
target
video
core
cutting
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010973452.3A
Other languages
Chinese (zh)
Inventor
李琳
周效军
苏毅
吴耀华
李鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010973452.3A
Publication of CN112135188A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiment of the invention discloses a video cropping method, an electronic device and a computer-readable storage medium, belonging to the technical field of video processing. The video cropping method comprises the following steps: acquiring a target video to be processed; respectively determining the core degree of each object in the target video; determining, according to the core degree of each object, an object whose core degree is greater than a preset threshold as a target object corresponding to the target video; cropping, with the target object as a cropping target, video frames in the target video to obtain a plurality of cropped video frames; and generating a cropped video according to the plurality of cropped video frames. In this way, the target object to be cropped is selected based on core degree, so that the core object in the target video is effectively captured by the crop and the video cropping effect is improved.

Description

Video clipping method, electronic device and computer-readable storage medium
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to a video clipping method, electronic equipment and a computer readable storage medium.
Background
In the prior art, when a video is cropped, for example when a horizontal-screen video is cropped into a vertical-screen video, video editing software usually generates the cropped video at a fixed position with a fixed width-to-height ratio. In this case, the cropped video may fail to capture the core character or the like, and the cropping effect may be poor.
Disclosure of Invention
The embodiment of the invention aims to provide a video clipping method, electronic equipment and a computer readable storage medium, so as to solve the problem that the video effect obtained by clipping by the existing video clipping method is poor.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a video cropping method, which is applied to an electronic device, and the method includes:
acquiring a target video to be processed;
respectively determining the core degree of each object in the target video;
determining the object with the core degree larger than a preset threshold value as a target object corresponding to the target video according to the core degree of each object;
cropping, with the target object as a cropping target, video frames in the target video to obtain a plurality of cropped video frames;
and generating a cropped video according to the plurality of cropped video frames.
Optionally, the determining the core degree of each object in the target video respectively includes:
respectively determining the core degree of each object in the target video based on at least one of the following items:
a continuity ranking of the each object in the target video;
whether each object is occluded in the target video;
whether each object is a preset object or not;
whether each of the objects is speaking in the target video;
whether each object has the display of a preset action in the target video or not;
and the emotional expression of each object in the target video.
Optionally, the determining the core degree of each object in the target video respectively includes:
for each of the objects, the following processes are respectively performed:
analyzing the object to obtain a plurality of characteristic values of the object;
and calculating to obtain the core degree of the object according to the plurality of characteristic values and the weight of each characteristic value.
Optionally, the plurality of feature values include a feature value S, a feature value R, a feature value K, a feature value T, a feature value a, and a feature value E; the calculating the core degree of the object according to the plurality of characteristic values and the weight of each characteristic value comprises:
calculating to obtain the core degree I of the object by adopting the following formula:
I=a*S+b*(1/R)+c*K+d*T+e*A+f*E
wherein, the value of S is 0 or 1, 0 represents that the object is shielded in the target video, and 1 represents that the object is not shielded in the target video; r represents the continuity ranking of the object in the target video, and the value of R is a positive integer; k takes the value of 0 or 1, 0 represents that the object is not a preset object, and 1 represents that the object is a preset object; the value of T is 0 or 1, 0 represents that the object does not speak in the target video, and 1 represents that the object speaks in the target video; the value of A is 0 or 1, wherein 0 represents that the object has no display of a preset action in the target video, and 1 represents that the object has the display of the preset action in the target video; the value of E is 0 or 1, wherein 0 represents that the object has no preset emotional expression in the target video, and 1 represents that the object has the preset emotional expression in the target video; a is the weight of S, b is the weight of (1/R), c is the weight of K, d is the weight of T, E is the weight of A, and f is the weight of E.
Optionally, when the number of the target objects is one, the cropping the video frame in the target video to obtain a plurality of cropped video frames includes:
and taking the abscissa of the central position point of the target object as the abscissa of the central point of the cutting frame, and cutting the video frames in the target video to obtain a plurality of cut video frames.
Optionally, when the area of the target object occupying a video frame in the target video is smaller than a preset area threshold, the height of the cropped video frame is determined based on the height of the target object, and a ratio of the height of the target object to the height of the cropped video frame is equal to a preset proportion threshold.
Optionally, when the number of the target objects is multiple, the cropping the video frame in the target video to obtain multiple cropped video frames includes:
under the condition that a first video frame comprising a target object exists in the target video, the abscissa of the central position point of the target object is used as the abscissa of the central point of the cutting frame, and the cutting video frame is obtained by cutting from the first video frame;
and/or,
and in the case that a second video frame comprising a plurality of target objects exists in the target video, cutting out a cut video frame from the second video frame based on the position of a core target object in the plurality of target objects.
Optionally, the core target object satisfies any one of the following conditions:
a core degree is highest among the plurality of target objects;
the center point position of the core target object among the plurality of target objects is closest to the center position of the second video frame.
Optionally, when the core target object has the highest core degree in the target objects, the cropping the second video frame based on the position of the core target object in the target objects to obtain a cropped video frame includes:
determining the distance between the core target object and other target objects according to the position of the core target object;
determining a first target object which can be in a cutting box with the core target object at the same time according to the distance;
selecting a target cutting frame according to a preset condition, and cutting the second video frame by using the target cutting frame to obtain a cut video frame; wherein the target crop box completely covers the core target object and the at least one first target object.
Optionally, when the center point of the core target object is closest to the center position of the second video frame, the cropping a cropped video frame from the second video frame based on the position of the core target object in the plurality of target objects includes:
taking the abscissa of the central position point of the core target object as the abscissa of the central point of the cutting frame, sliding the cutting frame by a preset step length, and cutting the second video frame by using the sliding cutting frame to obtain a cut video frame; wherein the slid crop box completely covers the core target object, and the slid crop box does not have an incomplete second target object covered therein; the second target object is the other target objects except the core target object in the plurality of target objects.
In a second aspect, an embodiment of the present invention provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, which, when executed by the processor, implement the steps of the method described above.
In a third aspect, the present invention provides a computer-readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method as described above.
In the embodiment of the invention, the electronic device can obtain a target video to be processed, respectively determine the core degree of each object in the target video, determine an object with the core degree larger than a preset threshold value as a target object corresponding to the target video according to the core degree of each object, clip video frames in the target video by taking the target object as a clipping target to obtain a plurality of clipping video frames, and generate the clipping video according to the plurality of clipping video frames. Therefore, the clipped target object can be selected based on the core degree, so that the core object in the target video can be effectively clipped, and the video clipping effect is improved.
Drawings
FIG. 1 is a flow chart of a video cropping method of an embodiment of the present invention;
FIG. 2 is a first schematic diagram of cropping in the embodiment of the present invention;
FIG. 3A, FIG. 3B and FIG. 3C are a second set of schematic diagrams of cropping in the embodiment of the present invention;
FIG. 4 is a third schematic diagram of cropping in the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a video cropping device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and the like in the description and in the claims of the present invention are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It will be appreciated that terms so used may be interchanged under appropriate circumstances, so that embodiments of the invention can be practiced in sequences other than those illustrated or described herein. Objects identified as "first", "second", etc. are generally of one class and do not limit the number of objects; for example, a first object may be one or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The video cropping method provided by the embodiment of the invention is described in detail by specific embodiments in the following with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a video cropping method according to an embodiment of the present invention, where the method is applied to an electronic device, and as shown in fig. 1, the method includes the following steps:
step 101: and acquiring a target video to be processed.
In this embodiment, the target video may be a horizontal-screen video or a vertical-screen video. Current video content generally narrates by switching between different shots, and the core characters and their positions generally differ from shot to shot; therefore, for accurate cropping, the target video to be processed may be selected as a video under a single shot.
In one embodiment, after acquiring an original video to be cropped (e.g., a landscape video), if the original video includes a plurality of videos under different shots, the electronic device may split the original video according to the shots by using a shot detection algorithm to obtain target videos in units of shots, and then analyze and crop the target videos under each shot.
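By way of illustration only, the shot-splitting step can be sketched in Python with the open-source PySceneDetect library; the patent does not name a specific shot detection algorithm, so the library choice and the detector threshold below are assumptions:

```python
# Sketch of splitting an original video into per-shot target videos.
# PySceneDetect is an assumed stand-in for the unspecified shot
# detection algorithm; the threshold value is also an assumption.
from scenedetect import ContentDetector, detect, split_video_ffmpeg

def split_into_shots(original_video_path: str):
    """Return (start, end) timecode pairs, one per detected shot."""
    scene_list = detect(original_video_path, ContentDetector(threshold=27.0))
    # Optionally write each shot out as its own file (a "target video"),
    # so that each shot can be analysed and cropped independently.
    split_video_ffmpeg(original_video_path, scene_list)
    return scene_list
```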
In one embodiment, the height-to-width ratio of the target video to be processed may be selected as 9:16 (i.e., a 16:9 horizontal-screen video). In addition, the width-to-height ratio of the cropped video (i.e., the video obtained by cropping) may also be selected as 9:16 (i.e., a vertical-screen video).
Step 102: and respectively determining the core degree of each object in the target video.
In this embodiment, the object in the target video may be a person, an object (for example, a vehicle), or the like, and is not limited herein. Preferably, the object in the target video is a person. The core degree determined in this step can be understood as the importance degree of the corresponding object.
Taking an object in a target video as a character as an example, the core degree of the character can be determined by Artificial Intelligence (AI) algorithms such as character recognition, character tracking, face detection, preset action recognition, and/or character emotion recognition.
Step 103: and determining the object with the core degree larger than a preset threshold value as a target object corresponding to the target video according to the core degree of each object.
Optionally, the preset threshold may be preset based on actual requirements, for example, preset to 60, and the like, which is not limited in this embodiment. The target object determined in this step may be understood as a core object, and the corresponding number may be one or more.
Step 104: and with the target object as a cutting target, cutting the video frames in the target video to obtain a plurality of cut video frames.
In this embodiment, when the video frames in the target video are cropped, the cropping may be performed on each video frame in the target video. In a special case, if a certain video frame in the target video does not include the target object or the included target object is incomplete, the video frame may be skipped, that is, the video frame is not clipped. And the size of the cropped video frames obtained by cropping is preferably uniform.
Step 105: and generating a cutting video according to the plurality of cutting video frames.
Optionally, when generating the cropped video, the plurality of cropped video frames may be synthesized according to the time sequence of the plurality of cropped video frames.
The above-mentioned cropped video may be a vertical-screen video; for example, a horizontal-screen video is cropped into a vertical-screen video.
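As a minimal sketch of step 105 (assuming OpenCV, which the patent does not mandate, and an assumed frame rate), the time-ordered cropped frames can be synthesized into a cropped video as follows:

```python
import cv2

def write_cropped_video(cropped_frames, out_path, fps=25.0):
    """Synthesize a cropped video from time-ordered cropped frames.

    cropped_frames is assumed to be a list of equally sized BGR images
    already sorted by timestamp; the frame rate is an assumption.
    """
    height, width = cropped_frames[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for frame in cropped_frames:
        writer.write(frame)
    writer.release()
```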
It should be noted that, in the embodiment of the present invention, after analyzing and cropping the target video under each shot to obtain the cropped video, the cropped videos under different shots may be merged based on the switching relationship between different shots in the original video, so as to obtain the cropped video corresponding to the original video.
According to the video clipping method provided by the embodiment of the invention, the electronic equipment can obtain the target video to be processed, respectively determine the core degree of each object in the target video, determine the object with the core degree larger than the preset threshold value as the target object corresponding to the target video according to the core degree of each object, clip the video frames in the target video by taking the target object as the clipping target to obtain a plurality of clipping video frames, and generate the clipping video according to the plurality of clipping video frames. Therefore, the clipped target object can be selected based on the core degree, so that the core object in the target video can be effectively clipped, and the video clipping effect is improved.
In an embodiment of the present invention, in consideration of factors affecting importance of objects, such as continuity in the target video, whether the objects are preset objects, whether there is a display of a preset action, and the like, the process of separately determining the core degree of each object in the target video may include:
determining the core degree of each object in the target video respectively based on at least one of the following items:
1) continuity ranking of each object in target video
For the continuity ranking in 1), the tracking coordinate of each object in the target video may be obtained through a tracking detection algorithm (such as a human tracking detection algorithm), and the continuity of each object in the target video is ranked based on the number of the tracking coordinates of each object and the number of breaks. Understandably, the higher the continuity ranking, the higher the core degree.
In one embodiment, taking the above-mentioned object as a person as an example, the person tracking detection algorithm may adopt a human skeleton detection and tracking algorithm to track each person in the target video to obtain person tracking coordinates and person head center point coordinates, and the number of frames in which each tracked person appears is calculated and ranked according to the obtained person tracking coordinates. If two persons appear in the same number of frames, their tracking coordinates can be ranked by the number of breaks: the fewer the breaks, the higher the ranking. The coordinates of the center point of the head of the person can be used to ensure that the head position of the person is captured during subsequent cropping.
For example, suppose a target video under one shot comprises 100 frames spanning 5 seconds, and the head coordinate sequence of each person in each frame is obtained through the person tracking detection algorithm. If person A appears in 90 frames with frame sequence [1-90], person B appears in 90 frames with frame sequence [1-40, 46-95] (one break), and person C appears in 50 frames, the continuity ranking of person A, person B and person C can be as shown in Table 1 below:
TABLE 1

Continuity ranking | Character
1 | A
2 | B
3 | C
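The ranking rule above (more tracked frames ranks higher; fewer breaks wins ties) can be sketched as follows; the input format, a mapping from person ID to the sorted frame indices in which that person is tracked, is a hypothetical one chosen for illustration:

```python
def continuity_rank(tracks):
    """Rank persons by tracked frame count; fewer breaks wins ties.

    tracks: hypothetical mapping from person ID to the sorted list of
    frame indices in which that person was tracked.
    Returns person IDs ordered from continuity rank 1 downwards.
    """
    def breaks(frames):
        # Number of gaps in the sorted frame index sequence.
        return sum(1 for a, b in zip(frames, frames[1:]) if b - a > 1)

    return sorted(tracks, key=lambda p: (-len(tracks[p]), breaks(tracks[p])))

# Example matching Table 1: A and B both appear in 90 frames, but B's
# sequence contains one break, so A ranks first; C has only 50 frames.
tracks = {
    "A": list(range(1, 91)),                        # frames 1-90
    "B": list(range(1, 41)) + list(range(46, 96)),  # frames 1-40, 46-95
    "C": list(range(1, 51)),                        # frames 1-50
}
assert continuity_rank(tracks) == ["A", "B", "C"]
```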
2) Whether each object is occluded in the target video
In 2), the video frames in the target video may be identified by an image identification technique to determine whether the objects therein are occluded. Understandably, the degree of coring of an unoccluded object is typically higher compared to an occluded object.
3) Whether each object is a preset object
In this 3), the video frames in the target video may be identified by an image identification technology to determine whether an object therein is a preset object, and the preset object may be preset based on actual requirements. Understandably, the core degree of the preset object is generally higher than that of the non-preset object.
For example, taking the above-mentioned object as a person, it can be determined whether the person in the target video is a specific person or a known star person by using a face recognition technology.
4) Whether each object is speaking in the target video
In 4), the object may be selected as a person, and it may be determined whether each person in the target video speaks in the corresponding shot through lip motion recognition. Understandably, the core level of a speaking subject is typically higher compared to a subject that is not speaking.
5) Whether each object has display of preset action in target video
In this 5), whether each object in the target video has a preset action in the corresponding shot or not can be determined through action recognition. Wherein the preset action may be preset based on actual demand. Understandably, compared with an object without a preset action, the core degree of the object with the preset action is usually higher.
For example, taking the above-mentioned object as a character, the preset action can be, but is not limited to, extending an arm, kicking a ball, catching a basket, and/or dancing.
6) Emotional expression of each object in target video
In 6), the object can be selected as a person, and whether each person in the target video has an emotional expression in the corresponding shot can be determined through facial expression recognition, wherein the emotional expression can be selected from, but is not limited to, surprise, laugh, cry and the like. Understandably, the core of a person with emotional expression is typically higher compared to a person without emotional expression.
Further, in the embodiment of the present invention, in addition to determining whether there is an emotional expression, a specific emotional expression may be determined, and the importance of the corresponding character may be evaluated based on the specific emotional expression. Alternatively, the specific facial expression recognition may be to mark the emotion of each person by recognizing the facial expression of different frame images of different persons in the target video.
In one embodiment, for a target video segment, frames may be extracted at a certain frequency, a Multi-task Cascaded Convolutional Networks (MTCNN) model may then be used to extract the faces of the corresponding characters appearing in each extracted video frame image, and finally an Xception network model (a depthwise-separable-convolution network model) may be used to classify the facial expressions of all characters, so as to determine whether each character has an emotional expression.
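A hedged sketch of this pipeline is given below; the facenet-pytorch MTCNN implementation, the sampling frequency and the `expression_model` classifier (standing in for the Xception network) are illustrative assumptions, not components mandated by the patent:

```python
import cv2
import torch
from facenet_pytorch import MTCNN  # assumed MTCNN implementation

mtcnn = MTCNN(keep_all=True)

def sample_expressions(video_path, expression_model, every_n_frames=10):
    """Classify facial expressions on frames sampled from a shot.

    expression_model is a hypothetical Xception-style classifier that
    maps a batch of face crops to expression logits; every_n_frames is
    an assumed sampling frequency.
    """
    labels = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            faces = mtcnn(rgb)  # aligned face crops, or None if no face
            if faces is not None:
                with torch.no_grad():
                    labels.extend(expression_model(faces).argmax(1).tolist())
        index += 1
    cap.release()
    return labels
```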
Thus, with the aid of the influencing factors 1) to 6) above, the core degree of the object in the target video can be accurately determined.
For example, taking the core degree of the person in the target video determined based on the above 1), 3) to 6) as an example, if the person in the target video includes person a, person B and person C, the corresponding recognition results can be shown in table 2 below:
TABLE 2
[Table 2 is rendered as an image in the original publication; it lists, for person A, person B and person C, the recognition results for the factors 1) and 3) to 6) above.]
In this embodiment of the present invention, optionally, the process of respectively determining the core degree of each object in the target video may include: for each object, the following processes are performed:
analyzing the object to obtain a plurality of characteristic values of the object;
and calculating to obtain the core degree of the object according to the plurality of characteristic values and the weight of each characteristic value.
Note that, the above-mentioned manner of analyzing the object includes, but is not limited to, the contents of 1) to 6) above, such as: the method comprises the steps of analyzing a continuity ranking of an object in a target video, analyzing whether the object is blocked in the target video, analyzing whether the object is a preset object, analyzing whether the object speaks in the target video, analyzing whether the object has a preset action in the target video, and/or analyzing emotional expression of the object in the target video, and the like, wherein the corresponding characteristic value can be set based on actual requirements. For example, if an object is occluded in the target video, the corresponding feature value may be set to 0, and if an object is not occluded in the target video, the corresponding feature value may be set to 1; and so on.
Further, if the feature values include a feature value S, a feature value R, a feature value K, a feature value T, a feature value a, and a feature value E, the calculating the core degree of the object according to the feature values and the weight of each feature value may include:
calculating to obtain the core degree I of the object by adopting the following formula:
I=a*S+b*(1/R)+c*K+d*T+e*A+f*E
wherein, S is 0 or 1, and 0 represents being shielded, namely the object is shielded in the target video; 1 means not occluded, i.e. the object is not occluded in the target video;
r represents the continuity ranking of the object in the target video, and takes a value of a positive integer, for example, 1, 2, 3 … …;
k is 0 or 1, wherein 0 represents that the object is not a preset object, namely the object is not a preset object; 1 represents a preset object, i.e. the object is a preset object;
the value of T is 0 or 1, 0 represents that no speaking exists, namely the object does not speak in the target video; 1 represents speaking, i.e. the subject is speaking in the target video;
the value of A is 0 or 1, wherein 0 represents that no preset action is displayed, namely the object has no preset action in the target video; 1, showing a preset action, namely showing the preset action of the object in a target video;
the value of E is 0 or 1, wherein 0 represents that no preset emotional expression exists, namely the object has no preset emotional expression in the target video; 1, indicating that the preset emotional expression exists, namely the object has the preset emotional expression in the target video;
a is the weight of S, b is the weight of (1/R), c is the weight of K, d is the weight of T, E is the weight of A, and f is the weight of E.
In one embodiment, c may take the value 20, while a, b, d, e and f each take the value 10. In this case, the calculated core degree is at least 10/R and at most 70. The preset threshold for determining whether an object is a target object may then be set to 60, for example, but is not limited thereto; it may also be set to 50 or 40, and may be adjusted as needed.
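Transcribing the formula with these example weights into code (a sketch; the weights and the threshold of 60 are the illustrative values above, not fixed requirements):

```python
def core_degree(S, R, K, T, A, E, a=10, b=10, c=20, d=10, e=10, f=10):
    """Core degree I = a*S + b*(1/R) + c*K + d*T + e*A + f*E.

    S, K, T, A, E are the 0/1 flags defined above; R is the positive
    integer continuity rank; the default weights are the example values.
    """
    return a * S + b * (1 / R) + c * K + d * T + e * A + f * E

def is_target_object(S, R, K, T, A, E, threshold=60):
    # An object whose core degree exceeds the preset threshold is
    # selected as a target object (threshold 60 per the example above).
    return core_degree(S, R, K, T, A, E) > threshold

# A rank-1, unoccluded preset object that speaks, shows a preset action
# and a preset emotion reaches the maximum core degree of 70.
assert core_degree(S=1, R=1, K=1, T=1, A=1, E=1) == 70
```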
It should be noted that, if the height of a video frame in the target video is h and its width is w (a height-to-width ratio of 9:16), the height of the cropped video frame is h′ and its width is w′ (a width-to-height ratio of 9:16), and the position of the region to be cropped out of the video frame is (xmin, ymin, xmax, ymax) (in a coordinate system whose origin is the upper-left point of the video frame in the target video), the following relationships may exist:

h′ = α*h

w′ = (9/16)*h′

xmin = cx − w′/2

ymin = h − h′

xmax = cx + w′/2

ymax = h

wherein α is the ratio of the height of the cropped video frame to the height of the original video frame (i.e., the video frame in the target video), and may be selected as a positive number less than or equal to 1. The value of α may be preset based on actual requirements and may also be adjusted based on actual conditions. cx is the abscissa of the center point of the crop box, which can also be understood as the abscissa of the center point of the cropped video frame. The cropped video frame is cropped out based on the crop box.
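These relationships translate directly into code; the sketch below mirrors the symbols above and assumes pixel coordinates with the origin at the upper-left point of the frame:

```python
def crop_box(cx, h, alpha=1.0):
    """Compute (xmin, ymin, xmax, ymax) of a 9:16 (width:height) crop
    box centred horizontally at cx and anchored to the bottom edge of
    an original frame of height h, per the relationships above.

    alpha = h'/h is the crop-height-to-frame-height ratio (<= 1).
    """
    h_crop = alpha * h            # h' = alpha * h
    w_crop = 9.0 / 16.0 * h_crop  # w' = (9/16) * h'
    xmin = cx - w_crop / 2
    xmax = cx + w_crop / 2
    ymin = h - h_crop             # bottom-anchored, so ymax = h
    ymax = h
    return int(xmin), int(ymin), int(xmax), int(ymax)

# Usage sketch: crop a frame (a NumPy array) around a head abscissa X.
# xmin, ymin, xmax, ymax = crop_box(cx=X, h=frame.shape[0])
# cropped = frame[ymin:ymax, xmin:xmax]
```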
In the embodiment of the present invention, for the target video to be processed, the number of the corresponding target objects may be one or more. Different video cropping modes can be adopted based on one or more target objects, and the following description is provided.
Scene one: the number of the target objects is one
In this scenario, since the number of the target objects is one, that is, there is one core object to be clipped, the clipping can be performed directly based on the positions of the target objects.
Optionally, the specific cutting method may be: and taking the abscissa of the central position point of the target object as the abscissa of the central point of the cutting frame, and cutting the video frames in the target video to obtain a plurality of cut video frames. And the video frame is obtained by cutting the video frame in the target video based on the cutting frame, and the size of the video frame is the same as that of the cutting frame.
In one embodiment, taking the target object as the person a as an example, the central position point of the target object may be selected as the central point of the head position of the person a.
In an embodiment, taking the target object as character A as an example, when a video frame in the target video is cropped, each frame can be cropped according to the coordinates of the center point of the head position of character A, so as to ensure the integrity of the subsequently synthesized cropped video. For example, referring to FIG. 2, when video frame 1 in the target video is cropped, the abscissa (X) of the center point of the head position of character A is taken as the abscissa (cx) of the center point of the crop box, and the resulting crop of video frame 1 can be as shown in FIG. 2.
It should be noted that the size of the above-mentioned crop box may be preset; for example, the width is w′, the height is h′, the width-to-height ratio is 9:16, and the ratio of the height h′ to the height h of the video frame in the target video is set to α.
In addition, the size of the crop box can be adjusted. Optionally, when an area of a target object corresponding to the target video in a video frame in the target video is smaller than a preset area threshold, a height of a cropped video frame (i.e., a cropping frame) may be determined based on the height of the target object, and a ratio of the height of the target object to the height of the cropped video frame is equal to a preset proportion threshold. The preset area threshold and the preset proportion threshold may be preset based on actual requirements.
For example, taking the target object as a character: if the area occupied by the character in a video frame of the corresponding target video is smaller than 1/8 (the value is adjustable) of the area of the video frame while the preset crop box has the same height as the video frame (that is, α = 1), the crop box may be scaled down so that the character height ph occupies 3/4 (which may be adjusted) of the crop-box height h′. The adjusted ratio α is then:

α = ph/((3/4)*h) = (4*ph)/(3*h)
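Under the stated assumptions (area threshold 1/8 of the frame area, proportion threshold 3/4, both adjustable), the adjusted ratio can be computed as in the following sketch:

```python
def adjusted_alpha(person_h, person_area, frame_h, frame_area,
                   area_threshold=1 / 8, proportion=3 / 4):
    """Shrink the crop box when the person occupies little of the frame.

    If the person's area is below area_threshold of the frame area,
    choose alpha so the person height ph fills `proportion` of the crop
    height h' (ph / h' = 3/4, hence alpha = 4*ph / (3*h)); otherwise
    keep the full-height crop (alpha = 1).
    """
    if person_area < area_threshold * frame_area:
        return min(1.0, (person_h / proportion) / frame_h)
    return 1.0
```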
scene two: the number of the target objects is multiple
In this scenario, when the number of target objects corresponding to the target video is multiple, a video frame in the target video may include only one of the target objects, or may include a plurality of the target objects. Different video cropping modes can then be adopted depending on the number of target objects in the frame, as described below.
Case 1: in the case that a first video frame including one target object exists in the target video, for this first video frame, the abscissa of the center position point of the one target object may be taken as the abscissa of the center point of the cropping frame, and the cropping video frame may be cropped from the first video frame. And the cutting video frame is obtained by cutting the first video frame based on the cutting frame. The first video frame may be any video frame in the target video.
In one embodiment, taking the one target object as the person B as an example, the central position point of the one target object may be selected as the central point of the head position of the person B.
Case 2: in the case where a second video frame including a plurality of target objects exists in the target video, for this second video frame, a cropped video frame may be cropped from the second video frame based on a position of a core target object among the plurality of target objects. The second video frame may be any video frame in the target video.
It is to be understood that case 1 and case 2 above may exist in an "and/or" relationship. That is, when the number of target objects corresponding to the target video is plural, the way video frames in the target video include the target objects may be any one of the following: 1) each video frame includes only one target object, and different video frames may include different target objects; 2) each video frame includes a plurality of target objects; 3) some video frames include only one target object, while other video frames include a plurality of target objects. The embodiments of the invention are not limited thereto.
Optionally, in the above case 2, the core target object may satisfy any one of the following conditions:
(1) the core degree is highest among the plurality of target objects. At this time, the core target object may be located at any position of the second video frame, such as a center position, an edge position, and the like. Therefore, the target object with the highest core degree can be selected preferentially to cut the video, and the effective cutting of the core target object picture is ensured.
In one embodiment, the target objects are core characters, and the core degrees of the core characters in the target video, i.e. the same shot, are higher than the set threshold.
In this (1), the specific video cropping process may include: firstly, determining the distance between the core target object and each other target object according to the position of the core target object; then, determining, according to the distances, the first target objects that can be in a crop box simultaneously with the core target object, where the first target objects may be one or more of the plurality of target objects; and finally, selecting a target crop box according to a preset condition, and cropping the second video frame with the target crop box to obtain a cropped video frame, wherein the target crop box completely covers the core target object and at least one first target object. As for the preset condition, it may be that as many first target objects as possible are in the same target crop box, or that the first target objects in the target crop box meet a preset requirement; the preset requirement may be set based on actual needs, for example a continuity ranking higher than a preset threshold, the display of a preset action, the presence of an emotional expression, and the like.
In one embodiment, in the video cropping process of (1), after the first target object is determined, in the case that the first target object can be completely covered by the cropping frame, the core target object and the first target object can be directly used as the cropping target, and the cropping video frame can be obtained by cropping from the second video frame; alternatively, when the number of the first target objects is plural, but the plural first target objects cannot be covered by the cropping frame, the first target object satisfying the preset requirement may be selected from the plural first target objects, and the core target object and the first target object satisfying the preset requirement may be used as the cropping target to crop the second video frame to obtain the cropped video frame.
For example, taking the plurality of target objects as a plurality of core characters: suppose the core characters include core character D, core character E and core character F, where the central core character (i.e., the core target object) is core character E and core character F displays a preset action. If core character D and core character F can each be in a crop box simultaneously with core character E, but core character D and core character F cannot both be covered by the crop box, then core character F, which displays the preset action, may be selected, and cropping is performed with core character E and core character F as the cropping targets, so that the cropped video frame includes the pictures of core character E and core character F.
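One hedged way to realise this selection is sketched below; the rough fit test based only on horizontal distance, and the `is_preferred` flag standing in for the preset requirement, are simplifying assumptions:

```python
def select_crop_companions(core_x, others, crop_w):
    """Choose first target objects to share the crop box with the core
    target object.

    others: list of (x, is_preferred) pairs, where x is an object's
    centre abscissa and is_preferred is a hypothetical flag for the
    preset requirement (e.g. a preset action or emotional expression).
    The fit test below ignores object widths, a simplification.
    """
    fits = [(x, pref) for x, pref in others if abs(x - core_x) < crop_w / 2]
    preferred = [x for x, pref in fits if pref]
    # Prefer objects meeting the preset requirement; otherwise take all
    # objects that fit alongside the core target object.
    return preferred if preferred else [x for x, _ in fits]
```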
(2) The center point of the core target object is located closest to the center location of the second video frame among the plurality of target objects. In this case, the core degrees of the plurality of target objects may be the same or different. Therefore, the picture of the center position of the second video frame can be selected preferentially to be effectively cut, and the cutting effect is improved.
In one embodiment, taking the target objects as core characters, the core characters have the same or different core degrees. For example, the plurality of core characters are all known star characters, and all of the core characters do not speak, do actions, have emotional expressions and the like in the target video.
Optionally, when the center point of the core target object is closest to the center position of the second video frame, the above process of cropping the second video frame based on the position of the core target object in the plurality of target objects may include: taking the abscissa of the central position point of the core target object as the abscissa of the central point of the cutting frame, sliding the cutting frame by a preset step length, and cutting the second video frame by using the sliding cutting frame to obtain a cut video frame; wherein the slid crop box completely covers the core target object, and the slid crop box does not have an incomplete second target object covered therein; the second target object is the other target objects except the core target object in the plurality of target objects.
In the case of (2) taking the target object closest to the center position of the video frame as the core target object, it is considered that more target objects can be cut as much as possible in addition to the core target object. The specific video clipping process may include:
1) when the abscissa of the center position point of the core target object is taken as the abscissa of the center point of the crop box, the crop box can cover the core target object (in this case, the crop box may cover only the core target object or may cover other target objects besides the core target object), and there is no second target object which is not completely covered in the crop box, the crop video frame can be obtained by cropping from the second video frame based on the crop box.
2) When the abscissa of the central position point of the core target object is taken as the abscissa of the central point of the cropping frame, the cropping frame can cover the core target object (in this case, the cropping frame may cover only the core target object, or may cover other target objects besides the core target object), and there is an incompletely covered second target object in the cropping frame, the cropping frame may be slid by a preset step length, and in a case that the slid cropping frame can cover the core target object and there is no incompletely covered second target object in the slid cropping frame, a cropping video frame is cropped from the second video frame based on the slid cropping frame; or, when the second target object which is not completely covered exists in the sliding cutting frame, the horizontal coordinate of the central position point of the core target object is used as the horizontal coordinate of the central point of the cutting frame, and the cutting video frame is obtained by cutting from the second video frame.
The second target object is other target objects except the core target object in the plurality of target objects. The size of the crop box may be preset based on actual requirements. The preset step length can be preset based on actual requirements, and when the cutting frame is slid, the cutting frame can be slid leftwards or rightwards.
In one embodiment, taking the core target object as the character B, the center position point of the core target object may be selected as the center point of the head position of the character B.
For example, taking the multiple target objects as multiple core characters: if the central core character (i.e., the core target object) is closest to the center of the video frame and it is desired to crop in as many additional core characters as possible, a better cropping position can be found by sliding the crop box left and right around the head center coordinate of the central core character. The abscissa cx of the center point of the crop box is initialized to the abscissa X of the head center position of the central core character; it is then judged whether the central core character is completely covered and whether other core characters are truncated. When the central core character, or a set of core characters including the central core character, is exactly covered, the video is cropped using this value of cx. If some character is only partially covered, the crop box is slid to the right or left by the set step length and coverage is re-judged (requiring throughout that the central core character remains covered); when a position is found at which every covered character is complete, the video is cropped based on the crop box at that position. If no such position is found, the video is cropped based on the initial value of the abscissa cx of the crop-box center point.
For example, as shown in FIG. 3A to FIG. 3C, the video frame 2 includes a core character L, a core character M and a core character N, where the core character M is closest to the center of video frame 2, i.e., core character M is the central core character. When the head center position abscissa X of core character M is initially assigned as the center point abscissa cx of the crop box, the crop box does not completely cover core character L and core character N (as shown in FIG. 3A). The crop box may then be slid to the right (as shown in FIG. 3B) or to the left (as shown in FIG. 3C) to find a suitable cropping position, i.e., one that covers characters without truncating them (as shown in FIG. 3C), and video cropping may be performed based on the crop box at this position (i.e., the center position abscissa of the region containing the entire heads of core character M and core character L is assigned as the center point abscissa cx of the crop box). If no suitable cropping position is found, the abscissa X of the head center position of core character M is assigned as the center point abscissa cx of the crop box for video cropping, as shown in FIG. 4.
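The sliding search described above can be sketched as follows; the representation of objects by their horizontal extents, the step length and the fallback behaviour follow the description above, while the exact bounds checking is an assumption:

```python
def find_crop_center(core_box, other_boxes, crop_w, frame_w, step=10):
    """Slide the crop box so no non-core target object is truncated.

    core_box and other_boxes are (xmin, xmax) horizontal extents.
    Returns the chosen abscissa cx of the crop-box centre point.
    """
    def covers(cx, box):
        return cx - crop_w / 2 <= box[0] and box[1] <= cx + crop_w / 2

    def overlaps(cx, box):
        return box[0] < cx + crop_w / 2 and cx - crop_w / 2 < box[1]

    def acceptable(cx):
        # Core object fully covered and no other object partially cut.
        return covers(cx, core_box) and all(
            covers(cx, b) or not overlaps(cx, b) for b in other_boxes)

    init_cx = (core_box[0] + core_box[1]) / 2  # head centre abscissa X
    if acceptable(init_cx):
        return init_cx
    for direction in (1, -1):  # slide right, then left
        cx = init_cx
        while crop_w / 2 <= cx + direction * step <= frame_w - crop_w / 2:
            cx += direction * step
            if acceptable(cx):
                return cx
    return init_cx  # fall back to the initial centre position
```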
Referring to fig. 5, fig. 5 is a schematic structural diagram of a video cropping device according to an embodiment of the present invention, which is applied to an electronic device, and as shown in fig. 5, the video cropping device 50 may include:
an obtaining module 51, configured to obtain a target video to be processed;
a first determining module 52, configured to determine a core degree of each object in the target video respectively;
a second determining module 53, configured to determine, according to the core degree of each object, an object whose core degree is greater than a preset threshold as a target object corresponding to the target video;
the cropping module 54 is configured to crop video frames in the target video with the target object as a cropping target to obtain a plurality of cropped video frames;
and a generating module 55, configured to generate a clipped video according to the plurality of clipped video frames.
Optionally, the first determining module 52 is specifically configured to:
respectively determining the core degree of each object in the target video based on at least one of the following items:
a continuity ranking of the each object in the target video;
whether each object is occluded in the target video;
whether each object is a preset object or not;
whether each of the objects is speaking in the target video;
whether each object has the display of a preset action in the target video or not;
and the emotional expression of each object in the target video.
Optionally, the first determining module 52 is specifically configured to: for each of the objects, the following processes are respectively performed: analyzing the object to obtain a plurality of characteristic values of the object; and calculating to obtain the core degree of the object according to the plurality of characteristic values and the weight of each characteristic value.
Optionally, the plurality of feature values include a feature value S, a feature value R, a feature value K, a feature value T, a feature value a, and a feature value E; the first determining module 52 is specifically configured to:
aiming at each object, calculating to obtain the core degree I of the object by adopting the following formula:
I=a*S+b*(1/R)+c*K+d*T+e*A+f*E
wherein, the value of S is 0 or 1, 0 represents that the object is shielded in the target video, and 1 represents that the object is not shielded in the target video; r represents the continuity ranking of the object in the target video, and the value of R is a positive integer; k takes the value of 0 or 1, 0 represents that the object is not a preset object, and 1 represents that the object is a preset object; the value of T is 0 or 1, 0 represents that the object does not speak in the target video, and 1 represents that the object speaks in the target video; the value of A is 0 or 1, wherein 0 represents that the object has no display of a preset action in the target video, and 1 represents that the object has the display of the preset action in the target video; the value of E is 0 or 1, wherein 0 represents that the object has no preset emotional expression in the target video, and 1 represents that the object has the preset emotional expression in the target video; a is the weight of S, b is the weight of (1/R), c is the weight of K, d is the weight of T, E is the weight of A, and f is the weight of E.
Optionally, when the number of the target objects is one, the clipping module 54 is specifically configured to:
and taking the abscissa of the central position point of the target object as the abscissa of the central point of the cutting frame, and cutting the video frames in the target video to obtain a plurality of cut video frames.
Optionally, when the area of the target object occupying a video frame in the target video is smaller than a preset area threshold, the height of the cropped video frame is determined based on the height of the target object, and a ratio of the height of the target object to the height of the cropped video frame is equal to a preset proportion threshold.
Optionally, when the number of the target objects is multiple, the cropping module 54 includes:
the first cropping unit is used for cropping the first video frame to obtain a cropped video frame by taking the abscissa of the central position point of one target object as the abscissa of the central point of the cropping frame under the condition that the first video frame comprising the one target object exists in the target video;
and a second cropping unit, configured to crop a cropped video frame from a second video frame based on a position of a core target object of the plurality of target objects, if the second video frame including the plurality of target objects exists in the target video.
Optionally, the core target object satisfies any one of the following conditions:
a core degree is highest among the plurality of target objects;
the center point position of the core target object among the plurality of target objects is closest to the center position of the second video frame.
Optionally, when the core target object has the highest core degree among the plurality of target objects, the second cropping unit includes:
a first determining subunit, configured to determine, according to the position of the core target object, the distances between the core target object and the other target objects;
a second determining subunit, configured to determine, according to the distances, a first target object that can fit in the crop box together with the core target object (a rough fit test is sketched below);
a first cropping subunit, configured to select a target crop box according to a preset condition and crop the second video frame with the target crop box to obtain a cropped video frame; wherein the target crop box completely covers the core target object and at least one first target object.
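An illustrative sketch of the distance-based selection performed by the determining subunits above. Boxes are assumed to be (x, y, w, h) tuples and distance is measured horizontally between box centers; the document does not fix either convention.

```python
def center_x(box):
    return box[0] + box[2] / 2

def companions_in_crop(core_box, other_boxes, crop_w):
    """Return the target objects close enough to the core target object to
    fit with it in a crop box of width crop_w."""
    selected = []
    for box in other_boxes:
        dist = abs(center_x(box) - center_x(core_box))
        # Both boxes fit in one crop box horizontally only if the distance
        # between their centers plus both half-widths is at most crop_w.
        if dist + box[2] / 2 + core_box[2] / 2 <= crop_w:
            selected.append(box)
    return selected

# Example: a 600-pixel-wide crop box around a core target at x=400; the
# nearby target qualifies, the distant one does not.
print(companions_in_crop((400, 0, 100, 200),
                         [(600, 0, 100, 200), (1400, 0, 100, 200)], 600))
```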
Optionally, when the center point of the core target object is closest to the center of the second video frame, the second cropping unit is specifically configured to: take the abscissa of the center position point of the core target object as the initial abscissa of the center point of the crop box, slide the crop box by a preset step size, and crop the second video frame with the slid crop box to obtain a cropped video frame; wherein the slid crop box completely covers the core target object and does not partially cover any second target object, a second target object being any of the plurality of target objects other than the core target object.
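A minimal sketch of this sliding-crop-box strategy, under assumptions the document leaves open: boxes are (x, y, w, h) tuples, sliding is horizontal only, "no incomplete second target object" means every other box is either fully inside or fully outside the crop box, and the 8-pixel step is illustrative.

```python
def fully_inside(x0, crop_w, box):
    x, _, w, _ = box
    return x0 <= x and x + w <= x0 + crop_w

def overlaps(x0, crop_w, box):
    x, _, w, _ = box
    return x < x0 + crop_w and x0 < x + w

def slide_crop(frame_w, crop_w, core_box, other_boxes, step=8):
    """Start with the crop box centered on the core target's abscissa and
    slide outward by `step` until no other target is partially covered."""
    start = min(max(core_box[0] + core_box[2] / 2 - crop_w / 2, 0),
                frame_w - crop_w)
    offsets = [0] + [sign * k for k in range(step, frame_w, step)
                     for sign in (-1, 1)]
    for offset in offsets:
        x0 = min(max(start + offset, 0), frame_w - crop_w)
        if not fully_inside(x0, crop_w, core_box):
            continue  # the slid crop box must still cover the core target
        if all(fully_inside(x0, crop_w, b) or not overlaps(x0, crop_w, b)
               for b in other_boxes):
            return x0  # no second target object is cut in half
    return start  # fall back to the centered position
```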
It can be understood that the video cropping device 50 according to the embodiment of the present invention can implement the processes of the method embodiment shown in fig. 1 and achieve the same technical effects; to avoid repetition, the details are not repeated here.
In addition, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a program or instruction stored in the memory and executable on the processor; when executed by the processor, the program or instruction can implement each process of the method embodiment shown in fig. 1 and achieve the same technical effects. To avoid repetition, details are not repeated here.
Referring to fig. 6, an embodiment of the invention further provides an electronic device 60, which includes a bus 61, a transceiver 62, an antenna 63, a bus interface 64, a processor 65, and a memory 66.
In the embodiment of the present invention, the electronic device 60 further includes a computer program stored on the memory 66 and executable on the processor 65. Optionally, when executed by the processor 65, the computer program may implement the following steps, illustrated by the sketch after the list:
acquiring a target video to be processed;
respectively determining the core degree of each object in the target video;
determining, according to the core degree of each object, an object whose core degree is greater than a preset threshold as a target object corresponding to the target video;
taking the target object as a cropping target, cropping video frames in the target video to obtain a plurality of cropped video frames;
and generating a cropped video from the plurality of cropped video frames.
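A high-level sketch of the five steps above. detect_objects(), crop_frame(), and encode_video() are hypothetical placeholders standing in for the detection, cropping, and encoding stages; they are not APIs defined by this document.

```python
def detect_objects(frames):
    # Placeholder: a real implementation would run detection/tracking and
    # the core-degree scoring; here each object is (object_id, core_degree).
    return [("person_a", 0.7), ("person_b", 0.3)]

def crop_frame(frame, targets):
    return frame  # placeholder for the crop-box logic sketched earlier

def encode_video(frames):
    return frames  # placeholder for assembling the cropped video

def crop_video(frames, threshold=0.5):
    objects = detect_objects(frames)                            # steps 1-2
    targets = [o for o, score in objects if score > threshold]  # step 3
    cropped = [crop_frame(f, targets) for f in frames]          # step 4
    return encode_video(cropped)                                # step 5
```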
It is understood that the computer program, when executed by the processor 65, can implement the processes of the method embodiment shown in fig. 1 and achieve the same technical effects; the details are not repeated here to avoid repetition.
Fig. 6 shows a bus architecture (represented by bus 61). The bus 61 may include any number of interconnected buses and bridges, linking together various circuits including one or more processors, represented by processor 65, and memory, represented by memory 66. The bus 61 may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further here. A bus interface 64 provides an interface between the bus 61 and the transceiver 62. The transceiver 62 may be one element or a plurality of elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 65 is transmitted over a wireless medium via the antenna 63; the antenna 63 also receives data and passes it to the processor 65.
The processor 65 is responsible for managing the bus 61 and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 66 may be used to store data used by the processor 65 when performing operations.
Optionally, the processor 65 may be a CPU, an ASIC, an FPGA, or a CPLD.
The embodiment of the present invention further provides a computer-readable storage medium, on which a program or an instruction is stored, where the program or the instruction, when executed by a processor, can implement the processes of the method embodiment shown in fig. 1 and achieve the same technical effects, and in order to avoid repetition, details are not repeated here.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention, or the portion contributing to the prior art, may essentially be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A video clipping method, applied to an electronic device, characterized by comprising the following steps:
acquiring a target video to be processed;
respectively determining the core degree of each object in the target video;
determining, according to the core degree of each object, an object whose core degree is greater than a preset threshold as a target object corresponding to the target video;
taking the target object as a cropping target, cropping video frames in the target video to obtain a plurality of cropped video frames;
and generating a cropped video from the plurality of cropped video frames.
2. The method of claim 1, wherein the separately determining the core degree of each object in the target video comprises:
respectively determining the core degree of each object in the target video based on at least one of the following items:
a continuity ranking of each object in the target video;
whether each object is occluded in the target video;
whether each object is a preset object or not;
whether each of the objects is speaking in the target video;
whether each object displays a preset action in the target video;
and the emotional expression of each object in the target video.
3. The method of claim 1, wherein the separately determining the core degree of each object in the target video comprises:
for each of the objects, the following processes are respectively performed:
analyzing the object to obtain a plurality of feature values of the object;
and calculating the core degree of the object according to the plurality of feature values and the weight of each feature value.
4. The method according to claim 3, wherein the plurality of feature values include a feature value S, a feature value R, a feature value K, a feature value T, a feature value A, and a feature value E;
the calculating the core degree of the object according to the plurality of feature values and the weight of each feature value comprises:
calculating the core degree I of the object using the following formula:
I = a*S + b*(1/R) + c*K + d*T + e*A + f*E
wherein S takes the value 0 or 1, where 0 indicates that the object is occluded in the target video and 1 indicates that the object is not occluded in the target video; R denotes the continuity ranking of the object in the target video and takes a positive integer value; K takes the value 0 or 1, where 0 indicates that the object is not a preset object and 1 indicates that the object is a preset object; T takes the value 0 or 1, where 0 indicates that the object does not speak in the target video and 1 indicates that the object speaks in the target video; A takes the value 0 or 1, where 0 indicates that the object does not display a preset action in the target video and 1 indicates that the object displays a preset action in the target video; E takes the value 0 or 1, where 0 indicates that the object shows no preset emotional expression in the target video and 1 indicates that the object shows a preset emotional expression in the target video; and a, b, c, d, e, and f are the weights of S, (1/R), K, T, A, and E, respectively.
5. The method according to claim 1, wherein when the number of target objects is plural, the cropping video frames in the target video to obtain a plurality of cropped video frames comprises:
in the case that a first video frame including a single target object exists in the target video, taking the abscissa of the center position point of the target object as the abscissa of the center point of the crop box, and cropping a cropped video frame from the first video frame;
and/or,
in the case that a second video frame including a plurality of target objects exists in the target video, cropping a cropped video frame from the second video frame based on the position of a core target object among the plurality of target objects.
6. The method of claim 5, wherein the core target object satisfies any one of the following conditions:
its core degree is the highest among the plurality of target objects;
its center point is closest to the center of the second video frame among the plurality of target objects.
7. The method of claim 6, wherein, when the core target object has the highest core degree among the plurality of target objects, the cropping of a cropped video frame from the second video frame based on the position of the core target object comprises:
determining, according to the position of the core target object, the distances between the core target object and the other target objects;
determining, according to the distances, a first target object that can fit in the crop box together with the core target object;
and selecting a target crop box according to a preset condition, and cropping the second video frame with the target crop box to obtain a cropped video frame; wherein the target crop box completely covers the core target object and at least one first target object.
8. The method of claim 6, wherein, when the center point of the core target object is closest to the center of the second video frame, the cropping of a cropped video frame from the second video frame based on the position of the core target object comprises:
taking the abscissa of the center position point of the core target object as the initial abscissa of the center point of the crop box, sliding the crop box by a preset step size, and cropping the second video frame with the slid crop box to obtain a cropped video frame; wherein the slid crop box completely covers the core target object and does not partially cover any second target object, the second target object being any of the plurality of target objects other than the core target object.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the video clipping method of any one of claims 1 to 8.
10. A computer-readable storage medium, on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the video clipping method of any one of claims 1 to 8.
CN202010973452.3A 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium Pending CN112135188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010973452.3A CN112135188A (en) 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010973452.3A CN112135188A (en) 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN112135188A true CN112135188A (en) 2020-12-25

Family

ID=73845807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010973452.3A Pending CN112135188A (en) 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112135188A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112967288A (en) * 2021-02-03 2021-06-15 咪咕文化科技有限公司 Multimedia data processing method, communication equipment and readable storage medium
CN113436072A (en) * 2021-06-24 2021-09-24 湖南快乐阳光互动娱乐传媒有限公司 Video frame clipping method and device
CN114155255A (en) * 2021-12-14 2022-03-08 成都索贝数码科技股份有限公司 Video horizontal screen-vertical screen conversion method based on specific figure space-time trajectory
CN114302226A (en) * 2021-12-28 2022-04-08 北京中科大洋信息技术有限公司 Intelligent cutting method for video picture
CN114339031A (en) * 2021-12-06 2022-04-12 深圳市金九天视实业有限公司 Picture adjusting method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102124727A (en) * 2008-03-20 2011-07-13 无线电技术研究学院有限公司 A method of adapting video images to small screen sizes
US20120127329A1 (en) * 2009-11-30 2012-05-24 Shane Voss Stabilizing a subject of interest in captured video
US20130287301A1 (en) * 2010-11-22 2013-10-31 JVC Kenwood Corporation Image processing apparatus, image processing method, and image processing program
US20140105573A1 (en) * 2012-10-12 2014-04-17 Nederlandse Organisatie Voor Toegepast-Natuurwetenschappelijk Onderzoek Tno Video access system and method based on action type detection
US20180007286A1 (en) * 2016-07-01 2018-01-04 Snapchat, Inc. Systems and methods for processing and formatting video for interactive presentation
CN109145784A (en) * 2018-08-03 2019-01-04 百度在线网络技术(北京)有限公司 Method and apparatus for handling video
US20190130165A1 (en) * 2017-10-27 2019-05-02 Avigilon Corporation System and method for selecting a part of a video image for a face detection operation
CN110189378A (en) * 2019-05-23 2019-08-30 北京奇艺世纪科技有限公司 A kind of method for processing video frequency, device and electronic equipment
CN110223306A (en) * 2019-06-14 2019-09-10 北京奇艺世纪科技有限公司 A kind of method of cutting out and device of image
CN111277915A (en) * 2018-12-05 2020-06-12 阿里巴巴集团控股有限公司 Video conversion method and device

Similar Documents

Publication Publication Date Title
CN112135188A (en) Video clipping method, electronic device and computer-readable storage medium
CN109325933B (en) Method and device for recognizing copied image
US10825187B2 (en) Method and system for object tracking
US20180268559A1 (en) Method for tracking object in video in real time in consideration of both color and shape and apparatus therefor
US20160210530A1 (en) Fast object detection method based on deformable part model (dpm)
CN106529406B (en) Method and device for acquiring video abstract image
CN109214403B (en) Image recognition method, device and equipment and readable medium
CN111524145A (en) Intelligent picture clipping method and system, computer equipment and storage medium
WO2020238374A1 (en) Method, apparatus, and device for facial key point detection, and storage medium
CN107959798B (en) Video data real-time processing method and device and computing equipment
KR20180102639A (en) Image processing apparatus, image processing method, image processing program, and storage medium
CN111429338B (en) Method, apparatus, device and computer readable storage medium for processing video
US8983188B1 (en) Edge-aware smoothing in images
CN111985419B (en) Video processing method and related equipment
CN110991385A (en) Method and device for identifying ship driving track and electronic equipment
CN111667504A (en) Face tracking method, device and equipment
CN110418148B (en) Video generation method, video generation device and readable storage medium
CN108010058A (en) A kind of method and system that vision tracking is carried out to destination object in video flowing
CN113689440A (en) Video processing method and device, computer equipment and storage medium
JP2007293559A (en) Device and method for detecting non-stationary image, and program mounted with the method
US11647294B2 (en) Panoramic video data process
CN112131984A (en) Video clipping method, electronic device and computer-readable storage medium
CN111353330A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113515978B (en) Data processing method, device and storage medium
US20230368576A1 (en) Image processing apparatus, image processing method, and non-transitory storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201225