CN112135188A - Video clipping method, electronic device and computer-readable storage medium - Google Patents

Video clipping method, electronic device and computer-readable storage medium

Info

Publication number
CN112135188A
Authority
CN
China
Prior art keywords
target
video
core
frame
cropping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010973452.3A
Other languages
Chinese (zh)
Inventor
李琳
周效军
苏毅
吴耀华
李鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202010973452.3A
Publication of CN112135188A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of the present invention disclose a video cropping method, an electronic device and a computer-readable storage medium, which belong to the technical field of video processing. The video cropping method includes: acquiring a target video to be processed; determining the coreness of each object in the target video; determining, according to the coreness of each object, an object whose coreness is greater than a preset threshold as a target object corresponding to the target video; cropping video frames in the target video with the target object as the cropping target, to obtain a plurality of cropped video frames; and generating a cropped video from the plurality of cropped video frames. In this embodiment, the target object to crop around is therefore selected based on coreness, which ensures that the core object in the target video is effectively cropped and improves the quality of the cropped video.

Figure 202010973452

Description

Video cropping method, electronic device and computer-readable storage medium

Technical field

The invention belongs to the technical field of video processing, and in particular relates to a video cropping method, an electronic device and a computer-readable storage medium.

Background

In the prior art, cropping a video, for example cropping a landscape video into a portrait video, is usually done in video editing software, which generates the cropped video at a fixed position and with a fixed aspect ratio. In this case, the cropped video may fail to include the core person or other key content, so the cropping result is poor.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a video cropping method, an electronic device and a computer-readable storage medium, so as to solve the problem that existing video cropping methods produce poor results.

To solve the above technical problem, the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a video cropping method applied to an electronic device. The method includes:

acquiring a target video to be processed;

determining the coreness of each object in the target video;

determining, according to the coreness of each object, an object whose coreness is greater than a preset threshold as a target object corresponding to the target video;

cropping video frames in the target video with the target object as the cropping target, to obtain a plurality of cropped video frames;

generating a cropped video from the plurality of cropped video frames.

Optionally, determining the coreness of each object in the target video includes:

determining the coreness of each object in the target video based on at least one of the following:

the continuity ranking of each object in the target video;

whether each object is occluded in the target video;

whether each object is a preset object;

whether each object is speaking in the target video;

whether each object displays a preset action in the target video;

the emotional expression of each object in the target video.

Optionally, determining the coreness of each object in the target video includes:

performing the following process for each object:

analyzing the object to obtain a plurality of feature values of the object;

calculating the coreness of the object from the plurality of feature values and the weight of each feature value.

Optionally, the plurality of feature values include feature value S, feature value R, feature value K, feature value T, feature value A and feature value E; calculating the coreness of the object from the plurality of feature values and the weight of each feature value includes:

calculating the coreness I of the object with the following formula:

I = a*S + b*(1/R) + c*K + d*T + e*A + f*E

Here S is 0 or 1, where 0 means the object is occluded in the target video and 1 means it is not occluded; R is the continuity ranking of the object in the target video and is a positive integer; K is 0 or 1, where 0 means the object is not a preset object and 1 means it is a preset object; T is 0 or 1, where 0 means the object is not speaking in the target video and 1 means it is speaking; A is 0 or 1, where 0 means the object displays no preset action in the target video and 1 means it displays a preset action; E is 0 or 1, where 0 means the object shows no preset emotional expression in the target video and 1 means it shows a preset emotional expression; a is the weight of S, b is the weight of (1/R), c is the weight of K, d is the weight of T, e is the weight of A, and f is the weight of E.
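
The formula above can be sketched in Python as follows. This is a minimal illustration only: the class name, field names and especially the weight values are assumptions, since the text gives no numeric values for a to f.

```python
from dataclasses import dataclass


@dataclass
class ObjectFeatures:
    """Feature values of one object (field names are hypothetical)."""
    unoccluded: int        # S: 1 if not occluded, 0 if occluded
    continuity_rank: int   # R: positive integer, 1 is the best rank
    is_preset: int         # K: 1 if the object is a preset object
    is_speaking: int       # T: 1 if the object is speaking
    has_action: int        # A: 1 if a preset action is displayed
    has_emotion: int       # E: 1 if a preset emotional expression is shown


# Assumed weights (a, b, c, d, e, f); the source does not specify them.
WEIGHTS = (10.0, 30.0, 20.0, 20.0, 10.0, 10.0)


def coreness(x: ObjectFeatures, weights=WEIGHTS) -> float:
    """Compute I = a*S + b*(1/R) + c*K + d*T + e*A + f*E."""
    if x.continuity_rank < 1:
        raise ValueError("continuity rank R must be a positive integer")
    a, b, c, d, e, f = weights
    return (a * x.unoccluded
            + b * (1.0 / x.continuity_rank)
            + c * x.is_preset
            + d * x.is_speaking
            + e * x.has_action
            + f * x.has_emotion)
```

Because R enters as 1/R, the object ranked first for continuity receives the full weight b, and lower-ranked objects receive progressively less.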

Optionally, when there is one target object, cropping the video frames in the target video to obtain a plurality of cropped video frames includes:

using the abscissa of the center point of the target object as the abscissa of the center point of the cropping box, and cropping the video frames in the target video to obtain the plurality of cropped video frames.
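
A minimal sketch of this centering step, assuming a full-height cropping box with a 9:16 aspect ratio. The function name is hypothetical, and clamping the box to the frame edges is an added assumption that the text does not state.

```python
def centered_crop_box(center_x, frame_w, frame_h, aspect_w=9, aspect_h=16):
    """Return (left, top, width, height) of a cropping box whose center
    abscissa follows the target object's center abscissa."""
    crop_h = frame_h
    crop_w = min(frame_w, crop_h * aspect_w // aspect_h)
    left = center_x - crop_w // 2
    # Assumption: keep the box inside the frame when the target is near an edge.
    left = max(0, min(left, frame_w - crop_w))
    return left, 0, crop_w, crop_h
```

For a 1920x1080 frame the box is 607 pixels wide, so a target centered at x = 960 gives a box starting at x = 657.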

Optionally, when the area that the target object occupies in a video frame of the target video is smaller than a preset area threshold, the height of the cropped video frame is determined from the height of the target object, such that the ratio of the height of the target object to the height of the cropped video frame equals a preset ratio threshold.
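
The height rule can be illustrated as below. This is a sketch under assumptions: the area threshold is treated as a fraction of the frame area, the default values are invented for illustration, and falling back to the full frame height for large targets is not stated in the text.

```python
def crop_height(target_h, target_area, frame_w, frame_h,
                area_threshold=0.05, height_ratio=0.5):
    """Return the cropped-frame height for one target object.

    area_threshold and height_ratio are hypothetical preset values.
    """
    if target_area / (frame_w * frame_h) >= area_threshold:
        # Assumption: a large enough target keeps the full frame height.
        return frame_h
    # Small target: choose crop_h so that target_h / crop_h == height_ratio.
    crop_h = round(target_h / height_ratio)
    return min(frame_h, crop_h)
```

With these assumed values, a person 200 pixels tall who covers under 5% of a 1920x1080 frame yields a cropped frame 400 pixels high, which enlarges the person in the output.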

Optionally, when there are multiple target objects, cropping the video frames in the target video to obtain a plurality of cropped video frames includes:

when the target video contains a first video frame that includes one target object, using the abscissa of the center point of that target object as the abscissa of the center point of the cropping box, and cropping the first video frame to obtain a cropped video frame;

and/or,

when the target video contains a second video frame that includes multiple target objects, cropping the second video frame based on the position of the core target object among the multiple target objects, to obtain a cropped video frame.

Optionally, the core target object satisfies any one of the following conditions:

it has the highest coreness among the multiple target objects;

among the multiple target objects, its center point is closest to the center of the second video frame.
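
The two selection conditions can be sketched as follows. The dictionary keys and the `by` parameter are hypothetical names, and Euclidean distance to the frame center is an assumed reading of "closest".

```python
def pick_core_target(targets, frame_w, frame_h, by="coreness"):
    """Pick the core target object from a list of target objects.

    Each target is a dict with keys "id", "coreness", "cx" and "cy"
    (center point coordinates); these keys are illustrative.
    """
    if by == "coreness":
        return max(targets, key=lambda t: t["coreness"])
    # Otherwise choose the target whose center is nearest the frame center.
    fx, fy = frame_w / 2, frame_h / 2
    return min(targets, key=lambda t: (t["cx"] - fx) ** 2 + (t["cy"] - fy) ** 2)
```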

Optionally, when the core target object has the highest coreness among the multiple target objects, cropping the second video frame based on the position of the core target object among the multiple target objects to obtain a cropped video frame includes:

determining, from the position of the core target object, the distance between the core target object and each of the other target objects;

determining, from the distances, the first target objects that can be inside the cropping box at the same time as the core target object;

selecting a target cropping box according to a preset condition, and using the target cropping box to crop the second video frame to obtain a cropped video frame, where the target cropping box completely covers the core target object and at least one first target object.
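
One possible reading of these three steps is sketched below, using only horizontal extents. The preset condition is not specified in the text, so preferring the nearest first target object and splitting the spare margin evenly are assumptions, as are the function and parameter names.

```python
def choose_cover_box(core, others, crop_w, frame_w):
    """Return (left, covered), where left is the cropping box's left edge.

    core and each entry of others are (left, right) horizontal extents.
    covered is the first target object placed in the box, or None.
    """
    core_c = (core[0] + core[1]) / 2
    # First target objects: close enough to share one box with the core.
    firsts = [o for o in others
              if max(core[1], o[1]) - min(core[0], o[0]) <= crop_w]
    if not firsts:
        left = int(core_c) - crop_w // 2
        return max(0, min(left, frame_w - crop_w)), None
    # Assumed preset condition: prefer the first target nearest the core.
    chosen = min(firsts, key=lambda o: abs((o[0] + o[1]) / 2 - core_c))
    lo, hi = min(core[0], chosen[0]), max(core[1], chosen[1])
    # Split the spare width evenly so both objects stay fully covered.
    left = lo - (crop_w - (hi - lo)) // 2
    return max(0, min(left, frame_w - crop_w)), chosen
```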

Optionally, when the center point of the core target object is closest to the center of the second video frame, cropping the second video frame based on the position of the core target object among the multiple target objects to obtain a cropped video frame includes:

using the abscissa of the center point of the core target object as the abscissa of the center point of the cropping box, sliding the cropping box by a preset step, and using the slid cropping box to crop the second video frame to obtain a cropped video frame, where the slid cropping box completely covers the core target object and contains no partially covered second target object; the second target objects are the target objects other than the core target object among the multiple target objects.
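
The sliding search can be illustrated as follows. The search order (alternating right and left at growing offsets) and the step of 10 pixels are assumptions; the text only requires that the final box fully covers the core target object and cuts through no second target object.

```python
def partially_covered(box_l, box_r, obj):
    """True if obj = (left, right) overlaps the box without being fully inside."""
    l, r = obj
    overlaps = l < box_r and r > box_l
    inside = l >= box_l and r <= box_r
    return overlaps and not inside


def slide_crop_box(core, others, crop_w, frame_w, step=10):
    """Return the left edge of a valid cropping box, or None if none is found."""
    start = (core[0] + core[1]) // 2 - crop_w // 2
    for k in range(crop_w // step + 1):
        shifts = (0,) if k == 0 else (k * step, -k * step)
        for shift in shifts:
            left = start + shift
            right = left + crop_w
            if left < 0 or right > frame_w:
                continue  # box would leave the frame
            if left > core[0] or right < core[1]:
                continue  # core target object not fully covered
            if any(partially_covered(left, right, o) for o in others):
                continue  # a second target object would be cut
            return left
    return None
```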

In a second aspect, an embodiment of the present invention provides an electronic device that includes a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method described above.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing a program or instructions that, when executed by a processor, implement the steps of the method described above.

In the embodiments of the present invention, the electronic device can acquire a target video to be processed, determine the coreness of each object in the target video, determine, according to the coreness of each object, an object whose coreness is greater than a preset threshold as a target object corresponding to the target video, crop video frames in the target video with the target object as the cropping target to obtain a plurality of cropped video frames, and generate a cropped video from the plurality of cropped video frames. The target object to crop around is therefore selected based on coreness, which ensures that the core object in the target video is effectively cropped and improves the quality of the cropped video.

Description of drawings

FIG. 1 is a flowchart of a video cropping method according to an embodiment of the present invention;

FIG. 2 is a first schematic diagram of cropping in an embodiment of the present invention;

FIG. 3A, FIG. 3B and FIG. 3C are second schematic diagrams of cropping in an embodiment of the present invention;

FIG. 4 is a third schematic diagram of cropping in an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a video cropping apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The terms "first", "second" and the like in the description and claims of the present invention are used to distinguish similar objects, not to describe a specific order or sequence. Terms used in this way are interchangeable where appropriate, so that the embodiments of the invention can be practiced in orders other than those illustrated or described here. Objects distinguished by "first", "second" and the like are usually of one type, and their number is not limited; for example, there may be one first object or several. In addition, "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.

The video cropping method provided by the embodiments of the present invention is described in detail below through specific embodiments, with reference to the accompanying drawings.

Refer to FIG. 1, which is a flowchart of a video cropping method provided by an embodiment of the present invention. The method is applied to an electronic device and, as shown in FIG. 1, includes the following steps:

Step 101: Acquire the target video to be processed.

In this embodiment, the target video may be a landscape video or a portrait video. Current video content generally tells its story by switching between shots, and the core persons and their positions usually differ from shot to shot. For accurate cropping, the target video to be processed may therefore be the video of a single shot.

In one implementation, after the electronic device acquires the original video to be cropped (for example a landscape video), if the original video contains several different shots, a shot detection algorithm can be used to split the original video by shot into per-shot target videos, and each per-shot target video is then analyzed and cropped.
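
The text does not name a shot detection algorithm. As an illustration only, the sketch below swaps in a common technique, thresholding the difference between normalized histograms of consecutive frames; the function name, the histogram input and the threshold value are all assumptions.

```python
def split_into_shots(frame_histograms, threshold=0.5):
    """Split a video into shots given one normalized histogram per frame.

    Returns a list of (start, end) frame index ranges, end exclusive.
    """
    if not frame_histograms:
        return []
    shots = []
    start = 0
    for i in range(1, len(frame_histograms)):
        prev, cur = frame_histograms[i - 1], frame_histograms[i]
        # Half the L1 distance between normalized histograms lies in [0, 1].
        diff = sum(abs(p - c) for p, c in zip(prev, cur)) / 2.0
        if diff > threshold:
            shots.append((start, i))  # a cut lies between frames i-1 and i
            start = i
    shots.append((start, len(frame_histograms)))
    return shots
```

Each returned range would then be treated as one target video for the remaining steps.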

In one implementation, the aspect ratio (width to height) of the target video to be processed may be 9:16. The aspect ratio of the cropped video may also be 9:16.

Step 102: Determine the coreness of each object in the target video.

In this embodiment, an object in the target video may be a person, a thing (for example a vehicle) and so on, which is not limited here. Preferably, the objects in the target video are persons. The coreness determined in this step can be understood as the importance of the corresponding object.

Taking persons as the objects in the target video, the coreness of a person can be determined with artificial intelligence (AI) algorithms such as person recognition, person tracking, face detection, preset action recognition and/or person emotion recognition.

Step 103: According to the coreness of each object, determine an object whose coreness is greater than a preset threshold as a target object corresponding to the target video.

Optionally, the preset threshold can be set in advance according to actual requirements, for example to 60; this embodiment places no limit on it. The target objects determined in this step can be understood as core objects, and there may be one or more of them.
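
Step 103 amounts to a simple filter. In the sketch below the function name and the input mapping are hypothetical; the default threshold of 60 is the example value from the text.

```python
def select_target_objects(coreness_by_object, threshold=60.0):
    """Return the objects whose coreness is strictly greater than the threshold."""
    return [obj for obj, score in coreness_by_object.items() if score > threshold]
```

Note that the comparison is strict: an object whose coreness equals the threshold is not selected.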

Step 104: With the target object as the cropping target, crop video frames in the target video to obtain a plurality of cropped video frames.

In this embodiment, every video frame in the target video may be cropped. As a special case, if a video frame in the target video does not include the target object, or includes it only incompletely, that video frame can be skipped, that is, it is not cropped. The cropped video frames are preferably all the same size.

Step 105: Generate a cropped video from the plurality of cropped video frames.

Optionally, when generating the cropped video, the plurality of cropped video frames can be assembled in their temporal order.
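
Steps 104 and 105 can be sketched together as below. Frames are modeled as lists of pixel rows and cropping boxes as (left, top, width, height) tuples; this representation and the function name are assumptions made for illustration.

```python
def crop_frames(frames, boxes):
    """Crop frames and return them in temporal order.

    frames: list of (timestamp, frame), where frame is a list of pixel rows.
    boxes: dict mapping timestamp to (left, top, width, height); a missing
    or None entry means the frame has no complete target object.
    """
    cropped = []
    for ts, frame in frames:
        box = boxes.get(ts)
        if box is None:
            continue  # skip frames without a complete target object
        left, top, w, h = box
        cropped.append((ts, [row[left:left + w] for row in frame[top:top + h]]))
    # Assemble the cropped frames in their original temporal order.
    cropped.sort(key=lambda item: item[0])
    return cropped
```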

The cropped video may be a portrait video, for example when a landscape video is cropped into a portrait video.

It should be noted that, in the embodiments of the present invention, after the target video of each shot has been analyzed and cropped into a cropped video, the cropped videos of the different shots can be merged according to the switching relationship between the shots in the original video, to obtain the cropped video corresponding to the original video.

With the video cropping method of the embodiments of the present invention, the electronic device can acquire a target video to be processed, determine the coreness of each object in the target video, determine, according to the coreness of each object, an object whose coreness is greater than a preset threshold as a target object corresponding to the target video, crop video frames in the target video with the target object as the cropping target to obtain a plurality of cropped video frames, and generate a cropped video from the plurality of cropped video frames. The target object to crop around is therefore selected based on coreness, which ensures that the core object in the target video is effectively cropped and improves the quality of the cropped video.

本发明实施例中,考虑到影响对象重要性的因素,比如在目标视频中的连续性、是否为预设对象、是否有预设动作的展示等等,上述分别确定目标视频中每个对象的核心度的过程可以包括:In the embodiment of the present invention, considering the factors affecting the importance of the object, such as continuity in the target video, whether it is a preset object, whether there is a preset action display, etc., the above-mentioned determination of each object in the target video The core degree process can include:

基于以下至少一项,分别确定目标视频中每个对象的核心度:Determine the coreness of each object in the target video separately based on at least one of the following:

1)每个对象在目标视频中的连续性排名1) Continuity ranking of each object in the target video

对于1)中的连续性排名,可以通过跟踪检测算法(比如人物跟踪检测算法)获得目标视频中每个对象的跟踪坐标,并基于每个对象的跟踪坐标的个数以及间断次数,对每个对象在目标视频中的连续性进行排名。可理解的,连续性排名越高,核心程度越高。For the continuity ranking in 1), the tracking coordinates of each object in the target video can be obtained through a tracking detection algorithm (such as a person tracking detection algorithm), and based on the number of tracking coordinates of each object and the number of interruptions, for each object Objects are ranked by their continuity in the target video. Understandably, the higher the continuity rank, the higher the core.

一种实施方式中,以上述对象为人物为例,人物跟踪检测算法可以采用人体骨骼检测及跟踪算法,对目标视频中每个人物进行跟踪,以获取人物跟踪坐标以及人物头部中心点坐标,并根据获取到的人物跟踪坐标,计算跟踪到的每个人物出现的帧数并进行排名。而若两个人物出现的帧数相同,则可以进一步以跟踪坐标间断次数来排名,间断次数越少排名越高。其中该人物头部中心点坐标可用于保证后续裁剪时能够截取到人物头部位置In one embodiment, taking the above-mentioned object as a person as an example, the person tracking detection algorithm can use a human skeleton detection and tracking algorithm to track each person in the target video to obtain the tracking coordinates of the person and the coordinates of the center point of the person's head, And according to the obtained character tracking coordinates, the number of frames that each tracked character appears in is calculated and ranked. If the two characters appear in the same number of frames, the ranking can be further based on the number of interruptions in the tracking coordinates. The less the number of interruptions, the higher the ranking. The coordinates of the center point of the character's head can be used to ensure that the position of the character's head can be intercepted during subsequent cropping

例如,一个镜头下的目标视频包括5秒,共100帧,并通过人物跟踪检测算法获得每一个人物对应每帧的头部坐标序列。比如,如果人物A出现90帧,帧序列为[1-90],人物B出现90帧,帧序列为[1-40,45-95],人物C出现50帧,则人物A、人物B和人物C的连续性排名可如下表1所示:For example, the target video under one shot includes 5 seconds and a total of 100 frames, and the head coordinate sequence of each character corresponding to each frame is obtained through the character tracking detection algorithm. For example, if character A appears in 90 frames, the frame sequence is [1-90], character B appears in 90 frames, the frame sequence is [1-40, 45-95], and character C appears in 50 frames, then character A, character B and The continuous ranking of character C can be shown in Table 1 below:

Table 1

Continuity ranking    Person
1                     A
2                     B
3                     C
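The ranking rule described above (more frames first, ties broken by fewer interruptions) can be sketched as follows. This is only an illustrative sketch: the function name `continuity_ranking` and the dictionary layout are hypothetical, and person B's second run is taken here as frames 46-95 so that B is counted in exactly 90 frames, as the text states.

```python
from typing import Dict, List, Tuple


def continuity_ranking(tracks: Dict[str, List[int]]) -> List[Tuple[int, str]]:
    """Rank objects by frame count (descending), then by interruptions (ascending)."""

    def interruptions(frames: List[int]) -> int:
        # An interruption is a gap between two consecutive tracked frame indices.
        ordered = sorted(set(frames))
        return sum(1 for prev, cur in zip(ordered, ordered[1:]) if cur - prev > 1)

    def sort_key(name: str) -> Tuple[int, int]:
        frames = tracks[name]
        return (-len(set(frames)), interruptions(frames))

    ordered_names = sorted(tracks, key=sort_key)
    return [(rank, name) for rank, name in enumerate(ordered_names, start=1)]


# Hypothetical tracking result for a 100-frame shot (frame indices are 1-based).
tracks = {
    "A": list(range(1, 91)),                         # 90 frames, no interruption
    "B": list(range(1, 41)) + list(range(46, 96)),   # 90 frames, one interruption
    "C": list(range(1, 51)),                         # 50 frames
}
ranking = continuity_ranking(tracks)
print(ranking)
```

With these inputs, A and B tie on frame count and A ranks first because its track is never interrupted, which matches Table 1.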

2) Whether each object is occluded in the target video

In 2), the video frames of the target video can be analyzed with image recognition technology to determine whether an object in them is occluded. It can be understood that an unoccluded object usually has a higher coreness than an occluded one.

3) Whether each object is a preset object

In 3), the video frames of the target video can be analyzed with image recognition technology to determine whether an object in them is a preset object. The preset object can be set in advance based on actual requirements. It can be understood that a preset object usually has a higher coreness than a non-preset object.

For example, taking the object as a person, face recognition technology can be used to determine whether a person in the target video is a specific person or a known celebrity.

4) Whether each object is speaking in the target video

In 4), the object may be a person, and lip movement recognition can be used to determine whether each person in the target video is speaking in the corresponding shot. It can be understood that an object that is speaking usually has a higher coreness than one that is not.

5) Whether each object displays a preset action in the target video

In 5), action recognition can be used to determine whether each object in the target video displays a preset action in the corresponding shot. The preset action can be set in advance based on actual requirements. It can be understood that an object that displays a preset action usually has a higher coreness than one that does not.

For example, taking the object as a person, the preset action may be, but is not limited to, stretching an arm, kicking a ball, dunking, and/or dancing.

6) The emotional expression of each object in the target video

In 6), the object may be a person, and facial expression recognition can be used to determine whether each person in the target video shows an emotional expression in the corresponding shot. The emotional expression may be, but is not limited to, surprise, laughter or crying. It can be understood that a person who shows an emotional expression usually has a higher coreness than one who does not.

Further, in this embodiment of the present invention, besides determining whether there is an emotional expression, the specific emotional expression can also be determined, and the importance of the corresponding person can be evaluated based on it. Optionally, facial expression recognition may label each person's emotion by recognizing the facial expressions of the different persons in different frames of the target video.

In one embodiment, frames are first sampled from the target video segment at a certain frequency. A Multi-Task Convolutional Neural Network (MTCNN) model is then used to extract the faces of the persons appearing in the sampled video frames. Finally, an Xception network model (a depthwise separable convolutional network model) is used to classify the facial expressions of all persons and to determine whether each person shows an emotional expression.

In this way, the influencing factors 1) to 6) above allow the coreness of the objects in the target video to be determined accurately.

For example, taking the case in which the coreness of the persons in the target video is determined based on 1) and 3) to 6) above, if the persons in the target video include person A, person B and person C, the corresponding recognition results can be as shown in Table 2 below:

Table 2

[Table 2 is available only as an image in the original publication: Figure BDA0002684923370000091]

In this embodiment of the present invention, optionally, the process of separately determining the coreness of each object in the target video may include performing the following for each object:

analyzing the object to obtain a plurality of feature values of the object;

calculating the coreness of the object from the plurality of feature values and the weight of each feature value.

It should be noted that the ways of analyzing an object include, but are not limited to, the content of 1) to 6) above, for example: analyzing the object's continuity ranking in the target video, whether the object is occluded in the target video, whether the object is a preset object, whether the object is speaking in the target video, whether the object displays a preset action in the target video, and/or the object's emotional expression in the target video. The corresponding feature values can be set based on actual requirements. For example, if an object is occluded in the target video, the corresponding feature value may be 0, and if it is not occluded, the corresponding feature value may be 1; and so on.

Further, if the plurality of feature values includes feature values S, R, K, T, A and E, the process of calculating the coreness of the object from the feature values and their weights may include:

calculating the coreness I of the object with the following formula:

I = a*S + b*(1/R) + c*K + d*T + e*A + f*E

where S is 0 or 1: 0 means that the object is occluded in the target video, and 1 means that it is not occluded;

R is the continuity ranking of the object in the target video and is a positive integer, for example 1, 2, 3, ...;

K is 0 or 1: 0 means that the object is not a preset object, and 1 means that it is a preset object;

T is 0 or 1: 0 means that the object is not speaking in the target video, and 1 means that it is speaking;

A is 0 or 1: 0 means that the object does not display a preset action in the target video, and 1 means that it does;

E is 0 or 1: 0 means that the object shows no preset emotional expression in the target video, and 1 means that it does;

a is the weight of S, b is the weight of (1/R), c is the weight of K, d is the weight of T, e is the weight of A, and f is the weight of E.

In one embodiment, c may be 20 while a, b, d, e and f are each 10. In this case the calculated coreness has a minimum of 10/R (when S, K, T, A and E are all 0) and a maximum of 70 (when R is 1 and S, K, T, A and E are all 1). The preset threshold used to decide whether an object is a target object can then be set as high as 60, but it is not limited to this value: it can also be preset to 50, 40 and so on, and can be adjusted as required.
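As a rough illustration of the weighted sum and the example weights above, the calculation can be sketched as below. The class name `Features`, the function names and the sample persons are hypothetical; only the formula, the weights (c = 20, the others 10) and the "greater than the threshold" rule come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Features:
    S: int  # 1 if the object is not occluded, 0 if occluded
    R: int  # continuity ranking, a positive integer (1 is best)
    K: int  # 1 if the object is a preset object
    T: int  # 1 if the object is speaking
    A: int  # 1 if the object displays a preset action
    E: int  # 1 if the object shows a preset emotional expression


# Example weights from the embodiment: c = 20, all others = 10.
WEIGHTS = {"a": 10, "b": 10, "c": 20, "d": 10, "e": 10, "f": 10}


def coreness(x: Features, w: Dict[str, float] = WEIGHTS) -> float:
    """I = a*S + b*(1/R) + c*K + d*T + e*A + f*E"""
    if x.R < 1:
        raise ValueError("continuity ranking R must be a positive integer")
    return (w["a"] * x.S + w["b"] * (1 / x.R) + w["c"] * x.K
            + w["d"] * x.T + w["e"] * x.A + w["f"] * x.E)


def select_targets(objects: Dict[str, Features], threshold: float) -> List[str]:
    """Keep the objects whose coreness is greater than the preset threshold."""
    return [name for name, feats in objects.items() if coreness(feats) > threshold]


# Hypothetical recognition results for three persons.
people = {
    "A": Features(S=1, R=1, K=1, T=1, A=1, E=1),  # 70
    "B": Features(S=1, R=2, K=1, T=1, A=0, E=0),  # 10 + 5 + 20 + 10 = 45
    "C": Features(S=0, R=3, K=0, T=0, A=0, E=0),  # 10/3
}
targets = select_targets(people, threshold=40)
print(targets)
```

Lowering or raising `threshold` changes how many objects qualify as target objects, which is the adjustment the embodiment describes.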

It should be noted that if a video frame in the target video has height h, width w and an aspect ratio of 9:16, the cropped video frame has height h', width w' and an aspect ratio of 9:16, and the region to be cropped from the video frame in the target video is (xmin, ymin, xmax, ymax) (in a coordinate system whose origin is the upper-left corner of the video frame in the target video), then the following relationships can hold:

[Equation available only as an image in the original publication: Figure BDA0002684923370000101]

[Equation available only as an image in the original publication: Figure BDA0002684923370000102]

[Equation available only as an image in the original publication: Figure BDA0002684923370000103]

ymin = h - h'

[Equation available only as an image in the original publication: Figure BDA0002684923370000104]

ymax = h

Here, the ratio symbol (shown only as inline images BDA0002684923370000111 and BDA0002684923370000112 in the original publication) is the ratio of the height of the cropped video frame to the height of the original video frame (that is, the video frame in the target video), and may be a positive number less than or equal to 1. This ratio can be preset based on actual requirements or adjusted according to the actual situation. cx is the abscissa of the center point of the cropping frame, which can also be understood as the abscissa of the center point of the cropped video frame. The cropped video frame may be obtained by cropping based on the cropping frame.
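The geometry above can be illustrated with a small sketch. Only ymin = h - h', ymax = h, the 9:16 aspect ratio of the cropped frame and the meaning of the height ratio and cx are taken from the text. The horizontal part (xmin and xmax placed symmetrically about cx, then shifted to stay inside the frame) is an assumption, because those equations survive only as images. The name `ratio` and the 1920x1080 frame in the usage lines are hypothetical.

```python
from typing import Tuple


def crop_region(w: float, h: float, cx: float,
                ratio: float = 1.0, aspect: float = 9 / 16) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) with the origin at the upper-left corner.

    ratio  -- height of the cropped frame divided by the source height, in (0, 1]
    aspect -- width divided by height of the cropped frame (9:16 in the text)
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    h_crop = ratio * h                  # h'
    w_crop = min(h_crop * aspect, w)    # w', never wider than the source frame
    # Assumption: center the box on cx, then shift it back inside the frame.
    xmin = cx - w_crop / 2
    xmin = max(0.0, min(xmin, w - w_crop))
    xmax = xmin + w_crop
    ymin = h - h_crop                   # from the text: ymin = h - h'
    ymax = h                            # from the text: ymax = h
    return (xmin, ymin, xmax, ymax)


# Hypothetical 1920x1080 source frame, crop centered on x = 960.
print(crop_region(1920, 1080, cx=960))
# Same frame, crop half as tall, centered near the left edge.
print(crop_region(1920, 1080, cx=100, ratio=0.5))
```

Shifting the box back into the frame is one possible way to handle a center point near the edge; the original may treat that case differently.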

In this embodiment of the present invention, the target video to be processed may correspond to one target object or to multiple target objects, and a different video cropping method can be used in each case, as described below.

Scenario 1: there is one target object

In this scenario there is one target object, that is, one core object to be cropped, so cropping can be performed directly based on the position of that target object.

Optionally, the cropping may be performed as follows: the abscissa of the center point of the target object is used as the abscissa of the center point of the cropping frame, and the video frames in the target video are cropped to obtain a plurality of cropped video frames. Each cropped video frame is obtained by cropping a video frame of the target video with the cropping frame, and has the same size as the cropping frame.

In one embodiment, taking the target object as person A, the center point of the target object may be the center point of person A's head.

In one embodiment, taking the target object as person A, when the video frames in the target video are cropped, each frame can be cropped according to the coordinates of the center point of person A's head, so that the subsequently synthesized cropped video is complete. For example, referring to FIG. 2, when video frame 1 of the target video is cropped, the abscissa (x') of the center point of person A's head equals the abscissa (cx) of the center point of the cropping frame; the cropping of video frame 1 is illustrated in FIG. 2.

It should be noted that the size of the cropping frame may be preset: for example, the width is w', the height is h', the aspect ratio is 9:16, and the ratio of the height h' to the height h of the video frame in the target video is the ratio value shown only as an inline image in the original publication (Figure BDA0002684923370000113).

In addition, the size of the cropping frame can also be adjusted. Optionally, when the area that the target object occupies in a video frame of the target video is smaller than a preset area threshold, the height of the cropped video frame (that is, of the cropping frame) can be determined from the height of the target object, such that the ratio of the target object's height to the height of the cropped video frame equals a preset ratio threshold. The preset area threshold and the preset ratio threshold can be set in advance based on actual requirements.

For example, taking the target object as a person: if the area the person occupies in the corresponding video frame of the target video is less than 1/8 of that video frame's area (this value is adjustable), and the preset cropping frame has the same size as the video frame, that is, the height ratio (inline image BDA0002684923370000121 in the original publication) is 1, then the cropping frame can be shrunk so that the person's height ph takes up 3/4 (this value is adjustable) of the cropping frame height h', which reduces the height of the cropped video frame. In this case the height ratio (inline image BDA0002684923370000122) is given by the expression shown only as an image in the original publication (Figure BDA0002684923370000123).
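The adjustment described above can be sketched as follows. The closed form used here, ratio = (ph / (3/4)) / h, is inferred from the sentence "the person's height ph takes up 3/4 of the cropping frame height h'" and is an assumption, since the original expression is only available as an image. The function and parameter names are hypothetical.

```python
def adjusted_height_ratio(person_area: float, person_height: float,
                          frame_w: float, frame_h: float,
                          preset_ratio: float = 1.0,
                          area_threshold: float = 1 / 8,
                          height_share: float = 3 / 4) -> float:
    """Shrink the cropping frame when the person is small relative to the frame.

    Returns the ratio of the cropping-frame height h' to the source height h.
    """
    frame_area = frame_w * frame_h
    if person_area < area_threshold * frame_area:
        # Assumed relation: ph / h' = height_share, so h' = ph / height_share.
        h_crop = person_height / height_share
        return min(h_crop / frame_h, preset_ratio)
    return preset_ratio


# Hypothetical 1920x1080 frame with a small person (200 x 405 pixels).
small = adjusted_height_ratio(200 * 405, 405, 1920, 1080)
# A large person keeps the preset ratio.
large = adjusted_height_ratio(800 * 900, 900, 1920, 1080)
print(small, large)
```

Here the small person leads to a cropping frame half the height of the source frame, so the person fills three quarters of the cropped picture.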

Scenario 2: there are multiple target objects

In this scenario, when the target video corresponds to multiple target objects, a given video frame of the target video may include only one target object or may include multiple target objects. Depending on the number of target objects in the frame, different video cropping methods can be used, as described below.

Case 1: when the target video contains a first video frame that includes one target object, then for this first video frame the abscissa of the center point of that target object can be used as the abscissa of the center point of the cropping frame, and a cropped video frame is cropped from the first video frame. The cropped video frame is obtained by cropping the first video frame with the cropping frame. The first video frame may be any video frame in the target video.

In one embodiment, taking the one target object as person B, the center point of that target object may be the center point of person B's head.

Case 2: when the target video contains a second video frame that includes multiple target objects, then for this second video frame a cropped video frame can be cropped from the second video frame based on the position of the core target object among the multiple target objects. The second video frame may be any video frame in the target video.

It can be understood that Case 1 and Case 2 are in an and/or relationship. That is, when the target video corresponds to multiple target objects, the video frames of the target video may fall into any one of the following situations: 1) each video frame includes only one target object, and different video frames may include different target objects; 2) each video frame includes multiple target objects; 3) some video frames include only one target object while other video frames include multiple target objects. This embodiment of the present invention places no limitation on this.

Optionally, in Case 2, the core target object may satisfy either of the following conditions:

(1) It has the highest coreness among the multiple target objects. In this case the core target object may be located anywhere in the second video frame, for example at the center or near an edge. In this way, the target object with the highest coreness is given priority when cropping the video, which ensures that the picture of the core target object is cropped effectively.

In one embodiment, taking the multiple target objects as multiple core persons, the coreness of each of these core persons in the target video, that is, in the same shot, is higher than the set threshold.

Under (1), the video cropping process may include the following. First, the distance between the core target object and each other target object is determined from the position of the core target object. Then, based on these distances, the first target objects that can be inside the cropping frame at the same time as the core target object are determined; the first target objects may be one or more of the multiple target objects. Finally, a target cropping frame is selected according to a preset condition and used to crop a cropped video frame from the second video frame, where the target cropping frame completely covers the core target object and at least one first target object. The preset condition may be that as many first target objects as possible are inside the target cropping frame, or that the first target objects meeting a preset requirement are inside the target cropping frame. The preset requirement can be set in advance based on actual requirements, for example a continuity ranking higher than a preset threshold, the display of a preset action, or an emotional expression.

In one embodiment, in the video cropping process of (1), after the first target objects are determined, if they can be completely covered by the cropping frame, the core target object and the first target objects can be used directly as the cropping targets and a cropped video frame is cropped from the second video frame. Alternatively, if there are multiple first target objects that cannot all be covered by the cropping frame, a first target object that meets the preset requirement can be selected from them, and the core target object and that first target object are used as the cropping targets to crop a cropped video frame from the second video frame.

For example, taking the multiple target objects as multiple core persons, suppose they include core persons D, E and F, the central core person (that is, the core target object) is core person E, and core person F is displaying a preset action. If core person D and core person F can each be inside the cropping frame together with core person E, but D and F cannot both be covered by the cropping frame at once, core person F, who is displaying the preset action, can be selected, and cropping is performed with core persons E and F as the cropping targets so that the cropped video frame contains both E and F.
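One possible reading of the selection procedure for condition (1) is sketched below. It works only with horizontal extents and ignores the frame borders. The data layout (`Obj` with `left`, `right`, `coreness` and a `preferred` flag standing for "meets the preset requirement"), the fallback order and all coordinates are hypothetical, chosen to reproduce the D/E/F example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Obj:
    name: str
    left: float              # left edge of the object in the frame (pixels)
    right: float             # right edge of the object in the frame (pixels)
    coreness: float
    preferred: bool = False  # hypothetical flag: meets the preset requirement


def fits(group: List[Obj], crop_w: float) -> bool:
    """True if one cropping frame of width crop_w can fully cover the group."""
    return max(o.right for o in group) - min(o.left for o in group) <= crop_w


def choose_targets(objs: List[Obj], crop_w: float) -> Tuple[List[str], float]:
    """Return the names to keep and the abscissa cx of the cropping-frame center."""
    core = max(objs, key=lambda o: o.coreness)
    # First target objects: those that can share the cropping frame with the core.
    firsts = [o for o in objs if o is not core and fits([core, o], crop_w)]
    group = [core]
    if firsts:
        preferred = [o for o in firsts if o.preferred]
        nearest = min(firsts, key=lambda o: abs(o.left - core.left))
        # Try everyone, then the preferred ones, then the nearest companion.
        for extra in (firsts, preferred, [nearest]):
            if extra and fits([core] + extra, crop_w):
                group = [core] + extra
                break
    left = min(o.left for o in group)
    right = max(o.right for o in group)
    return sorted(o.name for o in group), (left + right) / 2


people = [
    Obj("D", 500, 600, coreness=50),
    Obj("E", 900, 1000, coreness=60),                  # highest coreness: core
    Obj("F", 1300, 1400, coreness=55, preferred=True), # displays a preset action
]
names, cx = choose_targets(people, crop_w=600)
print(names, cx)
```

With a 600-pixel-wide cropping frame, D and F each fit together with E but not all three at once, so the sketch keeps E and F and centers the frame on their joint extent.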

(2) Among the multiple target objects, the center point of the core target object is the closest to the center of the second video frame. In this case the coreness values of the multiple target objects may or may not be equal. In this way, the picture at the center of the second video frame is given priority and cropped effectively, which improves the cropping result.

In one embodiment, taking the multiple target objects as multiple core persons, the coreness values of these core persons may be the same or different. For example, all of them are known celebrities, and none of them speaks, performs an action or shows an emotional expression in the target video.

Optionally, when the center point of the core target object is the closest to the center of the second video frame, the process of cropping a cropped video frame from the second video frame based on the position of the core target object among the multiple target objects may include: using the abscissa of the center point of the core target object as the abscissa of the center point of the cropping frame, sliding the cropping frame with a preset step size, and using the slid cropping frame to crop a cropped video frame from the second video frame. The slid cropping frame completely covers the core target object and contains no incompletely covered second target object, where a second target object is any of the multiple target objects other than the core target object.

Under (2), when the target object closest to the center of the video frame is taken as the core target object, the aim is to capture as many other target objects as possible in addition to the core target object. The video cropping process may include:

1) When the abscissa of the center point of the core target object is used as the abscissa of the center point of the cropping frame, and the cropping frame covers the core target object (in which case it may cover only the core target object, or other target objects as well) and contains no incompletely covered second target object, a cropped video frame can be cropped from the second video frame with that cropping frame.

2) When the abscissa of the center point of the core target object is used as the abscissa of the center point of the cropping frame, and the cropping frame covers the core target object (in which case it may cover only the core target object, or other target objects as well) but contains an incompletely covered second target object, the cropping frame can be slid with a preset step size. If the slid cropping frame covers the core target object and contains no incompletely covered second target object, a cropped video frame is cropped from the second video frame with the slid cropping frame. Otherwise, if every slid cropping frame still contains an incompletely covered second target object, the abscissa of the center point of the core target object is used as the abscissa of the center point of the cropping frame, and a cropped video frame is cropped from the second video frame.

Here, a second target object is any of the multiple target objects other than the core target object. The size of the cropping frame and the preset step size can both be set in advance based on actual requirements, and the cropping frame can be slid to the left or to the right.

In one embodiment, taking the core target object as person B, the center point of the core target object may be the center point of person B's head.

For example, taking the multiple target objects as multiple core persons, if the central core person (that is, the core target object) is the closest to the center of the video frame and as many core persons as possible should be captured, the cropping frame can be slid left and right from the head-center coordinate of the central core person to look for a better cropping position. The abscissa cx of the center point of the cropping frame is initialized to the abscissa X of the head center of the central core person. It is then checked whether the central core person is fully captured and whether other core persons are covered. If the cropping frame exactly covers the central core person, or several core persons including the central core person, the current value of cx is used for cropping. If some person is incompletely covered, the cropping frame is slid to the right or to the left by the set step size and the coverage is checked again (the default central core person must remain covered). As soon as the covered persons are complete, with nobody partially cut off, the video is cropped at the current position of the cropping frame. If no such position exists, the video is cropped using the initial value of cx.

For example, referring to FIGS. 3A to 3C, suppose video frame 2 includes core persons L, M and N, and core person M is close to the center of video frame 2, so that M is the central core person. When the abscissa X of M's head center is initially assigned to the abscissa cx of the center point of the cropping frame, the cropping frame does not completely cover core persons L and N (as shown in FIG. 3A). The cropping frame can then be slid to the right (as shown in FIG. 3B) or to the left (as shown in FIG. 3C) to find a suitable cropping position, that is, one in which the covered persons are complete with nobody partially cut off (as shown in FIG. 3C). The video is cropped with the cropping frame at that position, that is, cx is assigned the abscissa of the center of the region containing the entire heads of core persons M and L. If no suitable cropping position is found, the abscissa X of M's head center is assigned to cx and the video is cropped accordingly, as shown in FIG. 4.
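The sliding search described above can be sketched as follows, again using only horizontal extents. The order in which offsets are tried (right first, then left, with growing distance), the rule that the frame must stay inside the picture, and all coordinates are assumptions made for illustration; the text only requires that the core person stays covered and that no other person is partially cut off.

```python
from typing import List, Tuple

Extent = Tuple[float, float]  # (left, right) of a person's head region, in pixels


def slide_crop_center(core: Extent, others: List[Extent],
                      crop_w: float, frame_w: float, step: float) -> float:
    """Return the abscissa cx of the cropping-frame center."""
    if step <= 0:
        raise ValueError("step must be positive")

    def acceptable(cx: float) -> bool:
        left, right = cx - crop_w / 2, cx + crop_w / 2
        if core[0] < left or core[1] > right:
            return False                      # the core person must be fully covered
        for o_left, o_right in others:
            inside = o_left >= left and o_right <= right
            outside = o_right <= left or o_left >= right
            if not (inside or outside):
                return False                  # somebody would be partially cut off
        return True

    cx0 = (core[0] + core[1]) / 2             # initial value: center of the core person
    if acceptable(cx0):
        return cx0
    # Sliding further than this would push the core person out of the frame.
    max_offset = (crop_w - (core[1] - core[0])) / 2
    offset = step
    while offset <= max_offset:
        for cand in (cx0 + offset, cx0 - offset):   # try right, then left
            in_picture = cand - crop_w / 2 >= 0 and cand + crop_w / 2 <= frame_w
            if in_picture and acceptable(cand):
                return cand
        offset += step
    return cx0                                 # fall back to the initial position


# Hypothetical layout: M is central, L and N are cut off by the initial frame.
M, L, N = (900, 1000), (600, 700), (1200, 1300)
found = slide_crop_center(M, [L, N], crop_w=600, frame_w=1920, step=50)
# Layout in which no slid position avoids a partial person.
fallback = slide_crop_center(M, [(300, 880), (1020, 1600)],
                             crop_w=600, frame_w=1920, step=50)
print(found, fallback)
```

In the first layout a 50-pixel slide to the right leaves L fully outside and N fully inside, so that position is accepted; in the second layout every position cuts somebody off and the initial center is kept.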

Referring to FIG. 5, which is a schematic structural diagram of a video cropping apparatus provided by an embodiment of the present invention, the apparatus is applied to an electronic device. As shown in FIG. 5, the video cropping apparatus 50 may include:

an acquisition module 51, configured to acquire a target video to be processed;

a first determination module 52, configured to separately determine the coreness of each object in the target video;

a second determination module 53, configured to determine, according to the coreness of each object, the objects whose coreness is greater than a preset threshold as the target objects corresponding to the target video;

a cropping module 54, configured to crop the video frames in the target video with the target objects as cropping targets to obtain a plurality of cropped video frames;

a generation module 55, configured to generate a cropped video from the plurality of cropped video frames.

Optionally, the first determination module 52 is specifically configured to:

determine the core degree of each object in the target video based on at least one of the following:

the continuity ranking of each object in the target video;

whether each object is occluded in the target video;

whether each object is a preset object;

whether each object is speaking in the target video;

whether each object displays a preset action in the target video;

the emotional expression of each object in the target video.

Optionally, the first determination module 52 is specifically configured to perform the following process for each object: analyze the object to obtain a plurality of feature values of the object; and calculate the core degree of the object according to the plurality of feature values and the weight of each feature value.

Optionally, the plurality of feature values include feature value S, feature value R, feature value K, feature value T, feature value A and feature value E; the first determination module 52 is specifically configured to:

calculate, for each object, the core degree I of the object using the following formula:

I = a*S + b*(1/R) + c*K + d*T + e*A + f*E

where S is 0 or 1, 0 indicating that the object is occluded in the target video and 1 indicating that it is not occluded; R is the continuity ranking of the object in the target video and is a positive integer; K is 0 or 1, 0 indicating that the object is not a preset object and 1 indicating that it is a preset object; T is 0 or 1, 0 indicating that the object is not speaking in the target video and 1 indicating that it is speaking; A is 0 or 1, 0 indicating that the object does not display a preset action in the target video and 1 indicating that it does; E is 0 or 1, 0 indicating that the object shows no preset emotional expression in the target video and 1 indicating that it does; a is the weight of S, b is the weight of (1/R), c is the weight of K, d is the weight of T, e is the weight of A, and f is the weight of E.
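The formula above can be evaluated directly. The sketch below is illustrative only: the `Features` and `Weights` containers and the example weight values are assumptions, since the text does not specify how the weights a to f are chosen.

```python
from dataclasses import dataclass


@dataclass
class Features:
    S: int  # 1 if not occluded, 0 if occluded
    R: int  # continuity ranking, positive integer (1 is the top rank)
    K: int  # 1 if the object is a preset object
    T: int  # 1 if the object is speaking
    A: int  # 1 if the object displays a preset action
    E: int  # 1 if the object shows a preset emotional expression


@dataclass
class Weights:
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float


def core_degree(x: Features, w: Weights) -> float:
    """Compute I = a*S + b*(1/R) + c*K + d*T + e*A + f*E."""
    if x.R < 1:
        raise ValueError("R must be a positive integer")
    return (w.a * x.S + w.b * (1 / x.R) + w.c * x.K
            + w.d * x.T + w.e * x.A + w.f * x.E)


# Hypothetical weights, chosen only to make the example concrete.
example_weights = Weights(a=0.2, b=0.3, c=0.1, d=0.2, e=0.1, f=0.1)
example = core_degree(Features(S=1, R=2, K=0, T=1, A=0, E=1), example_weights)
```

Because R enters as 1/R, a higher continuity ranking (a smaller R) contributes more to the core degree, while the five binary features each add their full weight when present.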

Optionally, when there is one target object, the cropping module 54 is specifically configured to:

crop the video frames in the target video, taking the abscissa of the center point of the target object as the abscissa of the crop box center, to obtain the plurality of cropped video frames.
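A minimal sketch of the single-target case follows. Centering the crop box on the target's abscissa comes from the text; clamping the box so that it stays inside the frame is an added assumption, as is the function name `crop_box_for_single_target`.

```python
from typing import Tuple


def crop_box_for_single_target(target_cx: float, crop_w: float,
                               frame_w: float) -> Tuple[float, float]:
    """Return (left, right) of a crop box centered on target_cx.

    The box is shifted back inside [0, frame_w] when the target is near
    an edge (an assumption; the text does not describe edge handling).
    """
    if crop_w > frame_w:
        raise ValueError("crop box cannot be wider than the frame")
    left = target_cx - crop_w / 2
    left = max(0.0, min(left, frame_w - crop_w))
    return left, left + crop_w
```

With a 1920-pixel-wide frame and a 608-pixel-wide crop box, a target centered at x = 960 yields the box (656, 1264), while a target at x = 100 yields (0, 608) because of the clamping.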

Optionally, when the area that the target object occupies in a video frame of the target video is smaller than a preset area threshold, the height of the cropped video frame is determined based on the height of the target object, such that the ratio of the height of the target object to the height of the cropped video frame equals a preset ratio threshold.
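The height rule can be sketched as below. Several points are assumptions rather than statements from the text: the area threshold is treated as a fraction of the frame area, the crop height is capped at the frame height, and the full frame height is used when the target is not small. The function name `crop_height` is hypothetical.

```python
def crop_height(target_w: float, target_h: float,
                frame_w: float, frame_h: float,
                area_threshold: float, ratio: float) -> float:
    """Return the cropped-frame height.

    If the target occupies less than `area_threshold` of the frame area
    (assumed to be a fraction), choose the height so that
    target_h / crop_h == ratio; otherwise keep the full frame height.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    area_fraction = (target_w * target_h) / (frame_w * frame_h)
    if area_fraction < area_threshold:
        return min(frame_h, target_h / ratio)
    return frame_h
```

For a 100 x 200 target in a 1920 x 1080 frame with an area threshold of 0.05 and a ratio of 0.5, the target is small, so the cropped frame is 400 pixels high and the target fills half of it.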

Optionally, when there are multiple target objects, the cropping module 54 includes:

a first cropping unit, configured to, when the target video contains a first video frame that includes one target object, crop a cropped video frame from the first video frame, taking the abscissa of the center point of that target object as the abscissa of the crop box center;

a second cropping unit, configured to, when the target video contains a second video frame that includes multiple target objects, crop a cropped video frame from the second video frame based on the position of a core target object among the multiple target objects.

Optionally, the core target object satisfies any one of the following conditions:

it has the highest core degree among the multiple target objects;

among the multiple target objects, its center point is closest to the center of the second video frame.
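The two alternative conditions for choosing the core target object can be sketched as two small selectors. The `Target` container and function names are hypothetical, and the second selector measures closeness along the horizontal axis only, which is an assumption made because the crop box is positioned by its abscissa.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str
    core_degree: float
    cx: float  # abscissa of the target's center point


def core_by_degree(targets: List[Target]) -> Target:
    """Condition 1: the target with the highest core degree."""
    return max(targets, key=lambda t: t.core_degree)


def core_by_center(targets: List[Target], frame_w: float) -> Target:
    """Condition 2: the target whose center is closest to the frame center
    (horizontal distance only, by assumption)."""
    return min(targets, key=lambda t: abs(t.cx - frame_w / 2))


targets = [Target("L", 0.9, 300.0), Target("M", 0.7, 950.0), Target("N", 0.6, 1500.0)]
```

In this example the two conditions select different objects: L has the highest core degree, whereas M is closest to the center of a 1920-pixel-wide frame.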

Optionally, when the core target object has the highest core degree among the multiple target objects, the second cropping unit includes:

a first determination subunit, configured to determine the distance between the core target object and each other target object according to the position of the core target object;

a second determination subunit, configured to determine, according to the distance, a first target object that can be inside the crop box at the same time as the core target object;

a first cropping subunit, configured to select a target crop box according to a preset condition and to crop a cropped video frame from the second video frame using the target crop box; wherein the target crop box completely covers the core target object and at least one first target object.
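The three subunits can be sketched as follows. The text leaves the "preset condition" open, so this sketch assumes one possible condition: include the qualifying object nearest to the core target object and center the crop box on their joint span. Objects are modeled as horizontal `(left, right)` spans, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (left, right) horizontal extent of an object


def center(s: Span) -> float:
    return (s[0] + s[1]) / 2


def first_target_objects(core: Span, others: List[Span], crop_w: float) -> List[Span]:
    """Objects close enough to fit in one crop box together with the core object."""
    return [o for o in others
            if max(core[1], o[1]) - min(core[0], o[0]) <= crop_w]


def target_crop_box(core: Span, others: List[Span], crop_w: float,
                    frame_w: float) -> Optional[Tuple[float, float]]:
    """Pick a crop box covering the core object and its nearest companion.

    Returns None when no other object can share the box with the core object.
    """
    candidates = first_target_objects(core, others, crop_w)
    if not candidates:
        return None
    # Assumed preset condition: prefer the companion nearest to the core object.
    nearest = min(candidates, key=lambda o: abs(center(o) - center(core)))
    lo, hi = min(core[0], nearest[0]), max(core[1], nearest[1])
    left = (lo + hi) / 2 - crop_w / 2
    left = max(0.0, min(left, frame_w - crop_w))  # keep the box inside the frame
    return left, left + crop_w
```

Because the joint span is no wider than the crop box, centering on it (and clamping to the frame) always leaves both objects fully covered.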

Optionally, when the center point of the core target object is closest to the center of the second video frame, the second cropping unit is specifically configured to: take the abscissa of the center point of the core target object as the abscissa of the crop box center, slide the crop box by a preset step size, and crop a cropped video frame from the second video frame using the slid crop box; wherein the slid crop box completely covers the core target object and contains no partially covered second target object; the second target objects are the target objects other than the core target object among the multiple target objects.

It can be understood that the video cropping apparatus 50 of this embodiment of the present invention can implement each process of the method embodiment shown in FIG. 1 and achieve the same technical effect. To avoid repetition, details are not described here again.

In addition, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement each process of the method embodiment shown in FIG. 1 and achieve the same technical effect. To avoid repetition, details are not described here again.

Referring to FIG. 6, an embodiment of the present invention further provides an electronic device 60, including a bus 61, a transceiver 62, an antenna 63, a bus interface 66, a processor 65 and a memory 66.

In this embodiment of the present invention, the electronic device 60 further includes a computer program stored in the memory 66 and executable on the processor 65. Optionally, when executed by the processor 65, the computer program implements the following steps:

acquiring a target video to be processed;

determining the core degree of each object in the target video;

determining, according to the core degree of each object, an object whose core degree is greater than a preset threshold as a target object corresponding to the target video;

cropping the video frames in the target video with the target object as the cropping target, to obtain a plurality of cropped video frames;

generating a cropped video according to the plurality of cropped video frames.

It can be understood that the computer program, when executed by the processor 65, can implement each process of the method embodiment shown in FIG. 1 and achieve the same technical effect. To avoid repetition, details are not described here again.

In FIG. 6, the bus architecture is represented by the bus 61. The bus 61 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by the processor 65, and memory, represented by the memory 66. The bus 61 may also link together various other circuits such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. The bus interface 66 provides an interface between the bus 61 and the transceiver 62. The transceiver 62 may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 65 is transmitted over a wireless medium through the antenna 63; the antenna 63 also receives data and passes it to the processor 65.

The processor 65 is responsible for managing the bus 61 and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory 66 may be used to store data used by the processor 65 when performing operations.

Optionally, the processor 65 may be a CPU, an ASIC, an FPGA or a CPLD.

An embodiment of the present invention further provides a computer-readable storage medium storing a program or instructions which, when executed by a processor, implement each process of the method embodiment shown in FIG. 1 and achieve the same technical effect. To avoid repetition, details are not described here again.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should be noted that, in this document, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes the element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a service classification device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A video cropping method, applied to an electronic device, characterized by comprising:
acquiring a target video to be processed;
determining the core degree of each object in the target video;
determining, according to the core degree of each object, an object whose core degree is greater than a preset threshold as a target object corresponding to the target video;
cropping video frames in the target video with the target object as the cropping target, to obtain a plurality of cropped video frames;
and generating a cropped video according to the plurality of cropped video frames.
2. The method of claim 1, wherein the determining the core degree of each object in the target video comprises:
determining the core degree of each object in the target video based on at least one of the following:
the continuity ranking of each object in the target video;
whether each object is occluded in the target video;
whether each object is a preset object;
whether each object is speaking in the target video;
whether each object displays a preset action in the target video;
and the emotional expression of each object in the target video.
3. The method of claim 1, wherein the determining the core degree of each object in the target video comprises:
performing the following process for each of the objects:
analyzing the object to obtain a plurality of feature values of the object;
and calculating the core degree of the object according to the plurality of feature values and the weight of each feature value.
4. The method according to claim 3, wherein the plurality of feature values include a feature value S, a feature value R, a feature value K, a feature value T, a feature value A, and a feature value E;
the calculating the core degree of the object according to the plurality of feature values and the weight of each feature value comprises:
calculating the core degree I of the object using the following formula:
I = a*S + b*(1/R) + c*K + d*T + e*A + f*E
wherein S is 0 or 1, 0 indicating that the object is occluded in the target video and 1 indicating that the object is not occluded in the target video; R is the continuity ranking of the object in the target video and is a positive integer; K is 0 or 1, 0 indicating that the object is not a preset object and 1 indicating that the object is a preset object; T is 0 or 1, 0 indicating that the object is not speaking in the target video and 1 indicating that the object is speaking in the target video; A is 0 or 1, 0 indicating that the object does not display a preset action in the target video and 1 indicating that the object displays the preset action in the target video; E is 0 or 1, 0 indicating that the object shows no preset emotional expression in the target video and 1 indicating that the object shows the preset emotional expression in the target video; a is the weight of S, b is the weight of (1/R), c is the weight of K, d is the weight of T, e is the weight of A, and f is the weight of E.
5. The method according to claim 1, wherein when there are a plurality of target objects, the cropping video frames in the target video to obtain a plurality of cropped video frames comprises:
in the case that a first video frame including one target object exists in the target video, cropping a cropped video frame from the first video frame, taking the abscissa of the center point of the one target object as the abscissa of the crop box center;
and/or,
in the case that a second video frame including a plurality of target objects exists in the target video, cropping a cropped video frame from the second video frame based on the position of a core target object among the plurality of target objects.
6. The method of claim 5, wherein the core target object satisfies any one of the following conditions:
the core target object has the highest core degree among the plurality of target objects;
among the plurality of target objects, the center point of the core target object is closest to the center of the second video frame.
7. The method of claim 6, wherein when the core target object has the highest core degree among the plurality of target objects, the cropping a cropped video frame from the second video frame based on the position of the core target object among the plurality of target objects comprises:
determining the distance between the core target object and each other target object according to the position of the core target object;
determining, according to the distance, a first target object that can be inside the crop box at the same time as the core target object;
selecting a target crop box according to a preset condition, and cropping a cropped video frame from the second video frame using the target crop box; wherein the target crop box completely covers the core target object and at least one first target object.
8. The method of claim 6, wherein when the center point of the core target object is closest to the center of the second video frame, the cropping a cropped video frame from the second video frame based on the position of the core target object among the plurality of target objects comprises:
taking the abscissa of the center point of the core target object as the abscissa of the crop box center, sliding the crop box by a preset step size, and cropping a cropped video frame from the second video frame using the slid crop box; wherein the slid crop box completely covers the core target object and contains no partially covered second target object; the second target objects are the target objects other than the core target object among the plurality of target objects.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the video cropping method of any of claims 1 to 8.
10. A computer-readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the video cropping method of any one of claims 1 to 8.
CN202010973452.3A 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium Pending CN112135188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010973452.3A CN112135188A (en) 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010973452.3A CN112135188A (en) 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN112135188A true CN112135188A (en) 2020-12-25

Family

ID=73845807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010973452.3A Pending CN112135188A (en) 2020-09-16 2020-09-16 Video clipping method, electronic device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112135188A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102124727A (en) * 2008-03-20 2011-07-13 无线电技术研究学院有限公司 Methods for adapting video images to small screen sizes
US20120127329A1 (en) * 2009-11-30 2012-05-24 Shane Voss Stabilizing a subject of interest in captured video
US20130287301A1 (en) * 2010-11-22 2013-10-31 JVC Kenwood Corporation Image processing apparatus, image processing method, and image processing program
US20140105573A1 (en) * 2012-10-12 2014-04-17 Nederlandse Organisatie Voor Toegepast-Natuurwetenschappelijk Onderzoek Tno Video access system and method based on action type detection
US20180007286A1 (en) * 2016-07-01 2018-01-04 Snapchat, Inc. Systems and methods for processing and formatting video for interactive presentation
CN109145784A (en) * 2018-08-03 2019-01-04 百度在线网络技术(北京)有限公司 Method and apparatus for handling video
US20190130165A1 (en) * 2017-10-27 2019-05-02 Avigilon Corporation System and method for selecting a part of a video image for a face detection operation
CN110189378A (en) * 2019-05-23 2019-08-30 北京奇艺世纪科技有限公司 A kind of method for processing video frequency, device and electronic equipment
CN110223306A (en) * 2019-06-14 2019-09-10 北京奇艺世纪科技有限公司 A kind of method of cutting out and device of image
CN111277915A (en) * 2018-12-05 2020-06-12 阿里巴巴集团控股有限公司 Video conversion method and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112967288A (en) * 2021-02-03 2021-06-15 咪咕文化科技有限公司 Multimedia data processing method, communication equipment and readable storage medium
CN112967288B (en) * 2021-02-03 2024-12-17 咪咕文化科技有限公司 Multimedia data processing method, communication equipment and readable storage medium
CN113436072A (en) * 2021-06-24 2021-09-24 湖南快乐阳光互动娱乐传媒有限公司 Video frame clipping method and device
CN114339031A (en) * 2021-12-06 2022-04-12 深圳市金九天视实业有限公司 Picture adjusting method, device, equipment and storage medium
CN114155255A (en) * 2021-12-14 2022-03-08 成都索贝数码科技股份有限公司 Video horizontal screen-vertical screen conversion method based on specific figure space-time trajectory
CN114302226A (en) * 2021-12-28 2022-04-08 北京中科大洋信息技术有限公司 Intelligent cutting method for video picture
WO2024114569A1 (en) * 2022-11-30 2024-06-06 华为技术有限公司 Video processing method, and electronic device
CN115942046A (en) * 2022-12-08 2023-04-07 北京中科闻歌科技股份有限公司 Method for intelligently cutting video and storage medium
CN115942046B (en) * 2022-12-08 2024-05-31 北京中科闻歌科技股份有限公司 Method for intelligently cutting video and storage medium
CN118474482A (en) * 2024-07-12 2024-08-09 银河互联网电视(浙江)有限公司 Video cover determination method, device, equipment and storage medium
CN118474482B (en) * 2024-07-12 2024-12-03 银河互联网电视(浙江)有限公司 Video cover determination method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20201225