WO2020187086A1 - Video editing method, apparatus, device, and storage medium - Google Patents

Video editing method, apparatus, device, and storage medium

Info

Publication number: WO2020187086A1
Application number: PCT/CN2020/078548
Authority: WIPO (PCT)
Prior art keywords: video, edited, content, content element, identifier
Other languages: English (en), French (fr)
Inventor
朱晓龙
黄生辉
梅利健
陈卫东
林少彬
王一同
季兴
范杰
罗敏
黄婉瑜
方圆
陈仁健
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2020187086A1
Priority to US17/314,231 (published as US11715497B2)

Classifications

    • G11B 27/031: Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B 27/034: Electronic editing of digitised analogue information signals on discs
    • G06N 20/00: Machine learning
    • G06N 5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G11B 27/28: Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information signals recorded by the same method as the main recording
    • G11B 27/34: Indicating arrangements
    • G11B 27/36: Monitoring, i.e. supervising the progress of recording or reproducing
    • H04N 21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • This application relates to the field of data processing, and in particular to a video editing method, device, equipment and storage medium.
  • Video editing technology is a video processing method that combines objects to be edited, such as static images and dynamic videos, into a clipped video by editing. It is often used in video editing scenes such as short video production and video collections.
  • The traditional video editing method is to use a fixed editing template: the user selects an editing template suitable for the object to be edited, and the edited video is synthesized automatically.
  • However, editing templates contain fixed materials, such as fixed music, special effects, and rendering effects. As a result, edited videos synthesized from different objects to be edited through the same editing template are basically the same in overall style; this homogeneity brings a poor user experience.
  • the present application provides a video editing method, device, equipment, and storage medium.
  • the resulting edited video has a lower degree of homogeneity and improves the user experience.
  • In a first aspect, an embodiment of the present application provides a video editing method, the method including:
  • acquiring an object to be edited;
  • determining a content element used for video editing in the object to be edited, the content element having a corresponding content type identifier;
  • determining a material collection identifier corresponding to the content type identifier according to first behavior tree logic;
  • determining a video clip material collection corresponding to the content type identifier according to the material collection identifier; and
  • synthesizing the clip video according to the content element and the video clip material collection.
  • an embodiment of the present application provides a video editing device, which includes an acquisition unit, a first determination unit, a second determination unit, and a synthesis unit:
  • the acquiring unit is used to acquire the object to be edited
  • the first determining unit is configured to determine a content element used for video editing in the object to be edited, and the content element has a corresponding content type identifier;
  • the second determining unit is configured to determine the material collection identifier corresponding to the content type identifier according to the logic of the first behavior tree;
  • the second determining unit is further configured to determine a video clip material set corresponding to the content type identifier according to the material set identifier;
  • the compositing unit is used to synthesize the clip video according to the content element and the set of video clip material.
  • an embodiment of the present application provides a device for video editing, the device includes a processor and a memory:
  • the memory is used to store at least one piece of program code and transmit the at least one piece of program code to the processor;
  • the processor is configured to execute the video editing method described in the first aspect according to instructions in the at least one piece of program code.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium being used to store at least one piece of program code, and the at least one piece of program code being used to execute the video editing method described in the first aspect.
  • FIG. 1 is a schematic diagram of an application scenario of a video editing method provided by an embodiment of this application
  • FIG. 2 is a flowchart of a video editing method provided by an embodiment of the application.
  • FIG. 3 is an example diagram of determining content elements for dynamic videos provided by an embodiment of the application
  • Fig. 4 is an example diagram of using a first behavior tree logic to determine a video clip material set corresponding to a content type identifier provided by an embodiment of the application;
  • FIG. 5 is a flowchart for determining a set of video clip materials provided by an embodiment of the application
  • FIG. 6 is a flowchart of a video editing method provided by an embodiment of the application.
  • FIG. 7 is an interface diagram of the homepage of a video editing software provided by an embodiment of the application.
  • FIG. 8 is an example diagram of a video editing method provided by an embodiment of this application.
  • FIG. 9 is a structural diagram of a video editing device provided by an embodiment of the application.
  • FIG. 10 is a structural diagram of a video editing device provided by an embodiment of this application.
  • FIG. 11 is a structural diagram of a video editing device provided by an embodiment of the application.
  • FIG. 12 is a structural diagram of a device for video editing provided by an embodiment of this application.
  • FIG. 13 is a structural diagram of a server provided by an embodiment of this application.
  • For example, suppose the material in an editing template includes special effect A, and the editing template specifies that special effect A appears from the 3rd to the 4th second of the edited video. Then, no matter which object to be edited the template is applied to, the resulting edited video will have special effect A from the 3rd to the 4th second; that is, edited videos synthesized from different objects to be edited through the same editing template are basically the same in overall style.
  • Embodiments of this application provide a video editing method, which can be applied to terminal devices such as smart terminals, computers, personal digital assistants (PDAs), tablet computers, and other devices with video editing capabilities.
  • the video editing method can also be applied to a server.
  • the server can be a device that provides a video editing service to a terminal device.
  • For example, the terminal device can upload the object to be edited to the server, the server obtains the clip video using the video editing method provided in the embodiments of this application, and the server returns the clip video to the terminal device.
  • the server can be an independent server or a server in a cluster.
  • the following describes the video editing method provided in the embodiments of the present application by taking a terminal device as an example in combination with actual application scenarios.
  • FIG. 1 is a schematic diagram of an application scenario of a video editing method provided by an embodiment of the application.
  • This application scenario includes a terminal device 101, and the terminal device 101 can obtain the object to be edited.
  • the object to be edited is an object used to synthesize the edited video by means of editing, and the object to be edited includes static images and/or dynamic videos.
  • the object to be edited may also include a dynamic image composed of a group of static images.
  • the object to be edited includes content elements, which can reflect the main content included in the object to be edited.
  • the content element may be at least one of tag information, scene information, object information (for example, human-related information, animal-related information, and plant-related information, etc.) in the object to be edited, voice information, and location information of the object.
  • If the object to be edited is a dynamic video, the determined content elements can be information extracted from the dynamic video itself or certain segments of it, such as long-shot segments or fighting segments; these video segments can also be called highlight moments. If the object to be edited is a static picture, the determined content elements are, for example, the human bodies, animals, buildings, and weather included in the static picture itself.
  • For example, if the object to be edited is a dynamic video in which a man, a dog, and a car appear, and a longer shot is the dog's shot, the determined content elements are the man, the dog, the car, and so on.
  • If the object to be edited is a static picture that includes a woman sitting on a chair holding a cat, the determined content elements include the woman, the cat, the chair, and so on.
  • the terminal device 101 can determine the content element used for the video clip from the object to be clipped.
  • Each content element in the object to be edited has a corresponding content type identifier, which can identify the content characteristics of the corresponding content element. For example, if the object to be edited is a landscape picture, the content type corresponding to the content element included in the object to be edited is identified as a landscape, and the terminal device 101 can recognize that the content element is a landscape through the content type identification. For another example, when the object to be edited is a picture or video that includes a cat or a dog, the content type corresponding to the cat and dog is identified as a cute pet, and the terminal device 101 can identify the content element as a cute pet through the content type identification.
  • After determining the content elements, the terminal device 101 can use the first behavior tree logic to determine the material collection identifier corresponding to the content type identifier, and then determine the video clip material collection corresponding to the content type identifier according to the material collection identifier.
  • The video clip material collection includes various materials required for synthesizing the clip video with the content elements, such as stickers, filters, special effects, music, subtitles, opening credits, ending credits, and the like.
  • the material collection identifier is used to identify the type of the video clip material in the video clip material collection. For example, if the material collection identifier is a cute pet, the video clip material included in the video clip material collection corresponding to the material collection identifier is related to the cute pet.
  • For example, if the content type identifier of the content element is cute pet, the determined material collection identifier is cute pet, and the material included in the video clip material set determined according to the material collection identifier is cute-pet-related material; if the content type identifier of the content element is landscape, the determined material collection identifier is landscape, and the material included in the video clip material set determined according to the material collection identifier is landscape-related material, and so on.
  • Because the content type identifier identifies the content characteristics of the corresponding content element, the determined video clip material set is consistent with the content characteristics embodied by the content element, and the video clip material sets determined according to content elements with different content characteristics are different, so the synthesized edited videos have different characteristics in overall style.
  • In addition, due to the randomness of the behavior tree logic, the diversity of the video clip material collection is further increased, so that clip videos synthesized according to content elements with similar content characteristics can also differ in overall style. It can be seen that, by using the video editing method provided in the embodiments of the present application, the degree of homogeneity of the obtained edited video is lower, and the user experience is improved.
  • Figure 2 shows a flowchart of a video editing method, the method includes:
  • S201: The terminal device obtains an object to be edited.
  • There may be multiple ways of obtaining the object to be edited. In one way, the terminal device obtains the object to be edited from stored static images and/or dynamic videos.
  • The stored static images and/or dynamic videos may be collected by the terminal device or obtained by the terminal device through downloading. If the stored static images and/or dynamic videos are collected by the terminal device, the static images and/or dynamic videos are generally stored in the gallery of the terminal device.
  • When the user needs to perform video editing, the terminal device can prompt the user to select static images and/or dynamic videos from the gallery, and can obtain the static images and/or dynamic videos selected by the user according to the user's selection operation. In this case, the terminal device uses the static images and/or dynamic videos selected by the user as the object to be edited.
  • Another way of obtaining the object to be edited is real-time collection when the user needs to perform video editing through the terminal device. That is, when the user uses the video editing function of the terminal device, the terminal device can prompt the user to collect static images and/or dynamic videos. After the collection is completed, the terminal device can acquire the currently collected static images and/or dynamic videos and use them as the object to be edited.
  • S202: The terminal device determines a content element used for video editing in the object to be edited.
  • the automatic editing algorithm may mainly include a two-layer architecture.
  • the first-layer architecture is a decision element extraction module, which is used to perform the step of S202 to determine content elements for video editing from the objects to be edited.
  • the second layer of architecture is the editing strategy implementation module, which is used to execute the steps of S203 and S204, and determine the set of video clip material that matches the content characteristics embodied by the content element according to the logic of the first behavior tree.
  • The automatic editing algorithm may be implemented based on artificial intelligence (AI).
  • a possible implementation of S202 is that the terminal device can extract the structured information of the object to be edited to determine the content element in the object to be edited and the content type identifier corresponding to the content element.
  • the terminal device can extract the structured information of the object to be edited based on the timestamp, and then determine the content element and the content type identifier.
  • the structured information may include: face detection and tracking information, such as face key point information, facial expression information, face attribute information, etc.; human body detection and tracking information, such as gesture information, motion information, etc.; pet detection And tracking information, such as key points of pet faces, pet type information, etc.; and audio information obtained through Voice Activity Detection (VAD).
  • the object to be edited includes dynamic video A, and an example diagram of content elements obtained for dynamic video A is shown in FIG. 3.
  • the decision element extraction module obtains the dynamic video A, it extracts structured information based on the timestamp, thereby determining the content elements in the object to be edited, so that each content element has a corresponding timestamp.
  • FIG. 3 shows the correspondence between the content elements determined from the dynamic video A and the time stamp.
  • Figure 3 exemplarily shows five content elements determined from dynamic video A, namely smile, wave, jump, and scenery, where the timestamps corresponding to smile are 3"-4" and 23"-27", the timestamp corresponding to waving is 7"-10", the timestamp corresponding to jumping is 11"-12", and the timestamp corresponding to scenery is 15"-19".
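  • For illustration only, the following sketch shows one possible way to represent the extracted content elements and their timestamps in code; the ContentElement type and its fields are assumptions made for this example and are not defined by the application.

```python
from dataclasses import dataclass

@dataclass
class ContentElement:
    content_type_id: str   # identifies the content characteristic, e.g. "smile" or "scenery"
    start: float           # start timestamp within the object to be edited, in seconds
    end: float             # end timestamp, in seconds

# The five content elements extracted from dynamic video A in the FIG. 3 example.
elements = [
    ContentElement("smile", 3, 4),
    ContentElement("smile", 23, 27),
    ContentElement("wave", 7, 10),
    ContentElement("jump", 11, 12),
    ContentElement("scenery", 15, 19),
]
```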
  • the dynamic video may include many video frames, some video frames can reflect the main content of the object to be edited, and some video frames reflect other content.
  • the object to be edited is a dynamic video of a TV series, and the dynamic video generally contains related content of the TV series and advertisement screens that are not related to the TV series. Then, a video segment that reflects related content of a TV drama in a dynamic video can be used as a content element.
  • In one possible implementation of determining content elements, a specific video frame can be used as a key frame; the key frame is the segmentation point for segmenting the object to be edited, so that the object to be edited is divided into multiple video segments according to the key frame. It is then determined, according to the key frame, whether each video segment reflects the main content of the object to be edited, and the content element of the object to be edited is determined accordingly.
  • the key frame may include one video frame, or may include multiple consecutive video frames.
  • the object to be edited includes video frame 1, video frame 2, video frame 3... video frame 100, and the 20th to 75th frame in the object to be edited is an advertisement screen that has nothing to do with a TV series.
  • If the key frames are video frame 20 and video frame 75, the key frames are used to divide the object to be edited into multiple video segments: the first video segment is composed of video frame 1 to video frame 19, the second video segment is composed of video frame 20 to video frame 75, and the third video segment is composed of video frame 76 to video frame 100.
  • the terminal device recognizes according to the key frame that the first video segment and the third video segment can reflect the main content of the object to be edited, and the second video segment is other content that has nothing to do with the main content of the object to be edited. Therefore, the terminal device determines that the first video segment and the third video segment are content elements.
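  • The following is a hedged sketch of the key-frame segmentation just described: key frames mark split points, the object to be edited is divided into segments, and segments that do not reflect the main content are discarded. The frame numbers follow the TV-series example, the convention that a key frame starts a new segment is an assumption, and reflects_main_content stands in for whatever recognition step actually makes that judgment.

```python
def split_by_key_frames(num_frames, key_frames):
    """Split frame indices 1..num_frames into segments; in this sketch a key frame
    is taken to be the first frame of a new segment."""
    points = [1] + sorted(key_frames) + [num_frames + 1]
    return [(points[i], points[i + 1] - 1) for i in range(len(points) - 1)]

def reflects_main_content(segment):
    # Placeholder for the recognition step; frames 20-75 are the advertisement in the example.
    start, end = segment
    return not (start >= 20 and end <= 75)

segments = split_by_key_frames(100, key_frames=[20, 76])
content_elements = [seg for seg in segments if reflects_main_content(seg)]
print(content_elements)  # -> [(1, 19), (76, 100)]: the first and third segments are kept
```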
  • S203: The terminal device determines the material collection identifier corresponding to the content type identifier according to the first behavior tree logic.
  • S204: The terminal device determines a video clip material set corresponding to the content type identifier according to the material collection identifier.
  • the terminal device may determine the material collection identifier corresponding to the content type identifier from the video clip material library according to the first behavior tree logic, and then determine the video clip material collection according to the material collection identifier.
  • the source of the video clip material in the video clip material library can include multiple sources.
  • For example, the video clip material can come from a basic material library; the video clip material included in the basic material library is the material necessary for video editing. The video clip material can also come from a one-click library (Media Editing Asset), which includes video editing materials such as music, opening credits, ending credits, subtitles, stickers, special effects, and filters; the video clip material in the one-click library is preset.
  • video clip materials can also come from a basic material library and a one-click library.
  • For example, the music in the one-click library has different playing durations, which can fit the content elements;
  • the opening and ending credits can be visual title stickers and special effects;
  • the subtitle content corresponds to the content type identifier corresponding to the content element;
  • the stickers can be panoramic atmosphere stickers, used to match the atmosphere of the scene obtained through scene recognition, or target-following stickers that dynamically follow objects such as people or animals;
  • the special effects can be transition effects used to connect different content elements, and can be implemented based on a graphics processing unit (GPU);
  • the filters can be filter effects obtained through scene recognition, such as a sunset filter, a retro film filter, and the like.
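  • Purely as an illustration of how a material collection identifier could map to a video clip material set, the following sketch lays out a hypothetical one-click library; all field names and sample values are assumptions, not the application's actual asset format.

```python
one_click_library = {
    "cute pet": {
        "music":     [{"file": "pet_theme.mp3", "duration": 15}],   # durations chosen to fit the content elements
        "titles":    ["paw_title_sticker"],                          # visual opening/ending title stickers and effects
        "subtitles": ["Cute pet time!"],                             # subtitle text matching the content type identifier
        "stickers":  {"ambience": ["panorama_paws"],                 # panoramic atmosphere stickers from scene recognition
                      "follow":   ["bone_follow"]},                  # target-following stickers for people or animals
        "effects":   ["gpu_transition_wipe"],                        # GPU-based transition effects connecting content elements
        "filters":   ["warm_filter"],
    },
    "landscape": {
        "music":     [{"file": "calm_theme.mp3", "duration": 20}],
        "filters":   ["sunset_filter", "retro_film_filter"],         # filter effects obtained through scene recognition
    },
}

# Looking up a material collection identifier yields the corresponding video clip material set.
video_clip_material_set = one_click_library["cute pet"]
```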
  • Behavior Tree (Behaviour Tree) logic refers to a tree-like structure that represents an agent's approach from environment perception to logic execution, and is often used to describe the behavior of non-player characters (NPCs) in games.
  • In the embodiments of this application, the behavior tree logic is applied to video editing to determine the set of video clip material that matches the content characteristics of the content element.
  • the behavior tree may include different nodes, such as sequence (Sequence) node, condition (Condition) node, and action (Action) node.
  • A sequence (Sequence) node executes all of its child nodes and returns success; if any child node fails, it returns failure.
  • For example, suppose the main task to be performed using the behavior tree is to select a dynamic video with a length of 10", which can be decomposed into three subtasks with lengths of 3", 3", and 4"; each subtask can be expressed as a child node of a sequence node.
  • A condition (Condition) node returns success or failure based on the comparison result of its condition. For example, a condition node may indicate whether the length of the time period corresponding to a content element in the dynamic video is 3"; if so, the dynamic video corresponding to that time period is selected; if not, the dynamic video corresponding to a random 3" time period is selected.
  • An action (Action) node returns success, failure, or running according to the result of the action.
  • The action node is responsible for implementing the dynamic video editing strategy; for example, selecting a segment of the dynamic video corresponds directly to calling the application programming interface (API) of the underlying video software development kit (SDK).
  • The behavior tree logic used in S203 is the first behavior tree logic, and the first behavior tree logic is used to determine the video clip material collection corresponding to the content type identifier; see FIG. 4 for an example.
  • the content element in FIG. 4 is the content element obtained in FIG. 3.
  • the content element can be input into the first behavior tree shown in FIG. 4, and the first behavior tree logic can determine the video clip material set according to the content type identifier.
  • Action 1, Action 2, and Action 3 are used to represent the video clip material determined for the content element. For example, if action 1 is "add a cute pet sticker", the determined video clip material is a cute pet sticker, and then a collection of video clip materials is obtained.
  • For example, if the child nodes of the sequence node indicate that video clips of 3" and 4" are to be selected, then the video clip with the timestamp 3"-5", the video clip with the timestamp 23"-27", and the video clip with the timestamp 7"-10" return success, action 1 is waving, and action 2 is smiling; the child node of the condition node indicates whether the length of a video clip is 3": because the length of the video clip with the timestamp 3"-5" is 3", action 2 is smiling, and because the length of the video clip with the timestamp 11"-12" is 2", action 3 is jumping.
  • Behavior tree logic, such as the first behavior tree logic, can be expressed in the form of a domain-specific language (DSL), which is convenient for storage and parsing; specifically, it can be described in Extensible Markup Language (XML).
  • Because the embodiments of this application use the first behavior tree logic for video editing, a behavior tree such as the first behavior tree needs to be created beforehand, and it must be ensured that the terminal device can use the first behavior tree logic. To this end, developers need to develop behavior tree development tools, such as an editor, which is used to create and modify behavior tree nodes, edit subtrees, and so on. Developers also need to develop an interpreter that reads the behavior tree expressed in XML, dynamically compiles it into code logic, and loads it into the editing strategy implementation module.
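  • As a hedged sketch of such an interpreter, the following reads a behavior tree expressed in XML and builds the Sequence, Condition, and Action nodes from the preceding sketch; the XML tags and attributes shown are assumptions, since the application does not specify a concrete schema.

```python
import xml.etree.ElementTree as ET

# Assumed XML form of a behavior tree; the application only states that the tree
# is described in XML, not this exact schema.
BEHAVIOR_TREE_XML = """
<sequence>
  <condition name="has_3s_span"/>
  <action name="pick_matching"/>
</sequence>
"""

# Registries mapping names used in the XML to the Python callables defined earlier.
CONDITIONS = {"has_3s_span": has_3s_span}
ACTIONS = {"pick_matching": pick_matching}

def build(element):
    """Recursively turn an XML element into Sequence/Condition/Action nodes."""
    if element.tag == "sequence":
        return Sequence(*[build(child) for child in element])
    if element.tag == "condition":
        return Condition(CONDITIONS[element.get("name")])
    if element.tag == "action":
        return Action(ACTIONS[element.get("name")])
    raise ValueError(f"unknown node tag: {element.tag}")

first_behavior_tree = build(ET.fromstring(BEHAVIOR_TREE_XML))
```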
  • The one-click library and the created first behavior tree are packaged as resources and distributed by the back-end resource file management system (Content Management System, CMS), so that when video editing needs to be performed on an object to be edited, the first behavior tree logic can be used to determine the video clip material set.
  • S205: Synthesize the clip video according to the content element and the video clip material collection.
  • the actual shooting time of the content element can be used as the basis for sorting the video clip materials in the video clip material collection on the time axis, so that the content element and the video clip material collection are synthesized on the time axis to obtain the clip video.
  • the content element has a corresponding time stamp.
  • the content element in Figure 3 is "smile", its corresponding time stamp is 3"-4", and the duration of the content element is 1".
  • The duration allocated to the content element in the clip video may be different from the duration of the content element itself; the allocated duration may be greater than or less than the duration of the content element. Therefore, in one implementation, the terminal device can adjust the time of the content element according to third behavior tree logic, so that the adjusted duration of the content element conforms to the duration allocated in the clip video, thereby ensuring that the synthesis of the content element and the video clip material collection is more reasonable and accurate.
  • The manner in which the terminal device adjusts the time of the content element according to the third behavior tree logic depends on the relationship between the duration allocated to the content element in the clip video and the duration of the content element. If the allocated duration is greater than the duration of the content element, the terminal device extends the duration of the content element according to the third behavior tree logic. For example, if the duration of the content element is 1" and the allocated duration in the clip video is 2", the terminal device extends the duration of the content element to 2" according to the third behavior tree logic, so that the adjusted duration conforms to the allocated duration; the terminal device can use 0.5x speed playback to stretch the 1" content element to 2", or use repeated playback to extend the 1" content element to 2".
  • If the allocated duration is less than the duration of the content element, the terminal device shortens the duration of the content element according to the third behavior tree logic. For example, if the duration of the content element is 1" and the allocated duration in the clip video is 0.5", the duration of the content element needs to be shortened to 0.5" according to the third behavior tree logic, so that the adjusted duration conforms to the allocated duration; the terminal device can use 2x speed playback to shorten the 1" content element to 0.5".
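  • A minimal sketch of this duration-adjustment rule, assuming the adjustment is done purely by changing playback speed (the repeated-playback alternative for stretching is omitted); the function name and values are illustrative only.

```python
def playback_speed(element_duration, allocated_duration):
    """Speed factor that stretches or compresses a content element so that its
    adjusted duration matches the duration allocated to it in the clip video."""
    return element_duration / allocated_duration

print(playback_speed(1.0, 2.0))   # -> 0.5: a 1" element allocated 2" plays at 0.5x speed (stretched)
print(playback_speed(1.0, 0.5))   # -> 2.0: a 1" element allocated 0.5" plays at 2x speed (shortened)
```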
  • the content element used for video editing in the object to be edited is determined, and each content element has a corresponding content type identifier, which can identify the content characteristics of the corresponding content element.
  • the first behavior tree logic can determine the material collection ID corresponding to the content type ID, and determine the video clip material collection corresponding to the content type ID according to the material collection ID, that is, the determined video clip material collection and content The content characteristics embodied by the elements are consistent.
  • In this way, the overall style of the clip video synthesized according to the content element and the video clip material collection conforms to the content characteristics of the content element, and clip videos synthesized according to content elements with different content characteristics have different characteristics in overall style.
  • the randomness of behavior tree logic can further increase the diversity of video clip material collections, so that clip videos synthesized based on content elements with similar content characteristics can also be different in overall style.
  • the resulting edited video has a lower degree of homogeneity, which improves the user experience.
  • In addition, the terminal device can implement video editing automatically, without the user having to cut the material to obtain the content elements, which saves user interaction costs and improves video editing efficiency.
  • If the terminal device determines one video clip material set according to the content type identifier, that video clip material set can be used directly when S204 is executed. If multiple video clip material sets are determined, then because only one video clip material set is generally used when S204 is executed, and the overall style of the video clip material set used should best match the content characteristics of the content element, one video clip material set needs to be selected as the video clip material set corresponding to the content type identifier.
  • S204 includes:
  • the terminal device determines a target material collection identifier from the multiple material collection identifiers.
  • the material collection identifier corresponding to the content type identifier determined by the terminal device may include one or more. For example, if there are multiple content type identifiers, multiple material set identifiers may be determined.
  • In one way, the terminal device may determine the target material collection identifier by counting the frequency of each material collection identifier among the multiple material collection identifiers and using the material collection identifier with the highest frequency as the target material collection identifier. Because the target material collection identifier appears most frequently, the video clip material collection corresponding to the target material collection identifier best conforms, in overall style, to the content characteristics of the content element.
  • For example, suppose the object to be edited is a dynamic video that is divided into 10 video clips according to timestamps, the content type identifier of the content elements of 8 video clips is cute pet, and the content type identifier of the content elements of 2 video clips is landscape. The multiple material collection identifiers determined according to the content type identifiers then include cute pet and landscape, where cute pet appears 8 times and landscape appears 2 times. It can be seen that the video clip material collection whose material collection identifier is cute pet better conforms to the content characteristics of the content elements, so cute pet can be used as the target material collection identifier (see the sketch below).
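  • A minimal sketch of this frequency-based selection, using Python's collections.Counter on the identifiers from the 10-clip example; the identifier strings are illustrative.

```python
from collections import Counter

# Material collection identifiers determined for the 10 video clips in the example above.
material_ids = ["cute pet"] * 8 + ["landscape"] * 2

target_material_id, frequency = Counter(material_ids).most_common(1)[0]
print(target_material_id, frequency)  # -> cute pet 8: the most frequent identifier becomes the target
```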
  • the terminal device uses the video clip material set corresponding to the target material set identifier as the video clip material set corresponding to the content type identifier.
  • In this way, when there are multiple material collection identifiers, the terminal device can reasonably select the video clip material set corresponding to the content type identifier from the multiple video clip material sets, so that the video clip material set used for synthesizing the clip video best conforms, in overall style, to the content characteristics of the content element.
  • The video clip material collection includes many video clip materials. Some video clip materials match the content type identifier, while others do not. Video clip materials that match the content type identifier better conform to the content characteristics of the content element, and when they are synthesized with the content element, the resulting clip video is more coordinated.
  • Therefore, in one implementation, the video clip material matching the content type identifier can be determined from the video clip material set according to second behavior tree logic, so that when S205 is executed, the terminal device synthesizes the clip video according to the content element and the matched video clip material.
  • For example, if the video clip material collection includes a sticker with a dog pattern and a sticker with a rabbit pattern, and the content type identifier is dog, the terminal device can determine the matching video clip material from the video clip material collection, namely the sticker with the dog pattern, and obtain the clip video by synthesizing the content element "dog" with the sticker with the dog pattern.
  • In this way, the overall style of the video clip material used to synthesize the clip video better conforms to the content characteristics of the content element, and the clip video synthesized from the matched video clip material and the content element is more coordinated.
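  • The following sketch illustrates filtering the video clip material set down to the materials matching the content type identifier, using the dog-sticker example above; the tag-based matching scheme is an assumption standing in for the second behavior tree logic.

```python
material_set = [
    {"name": "dog_sticker",    "tags": {"dog", "cute pet"}},
    {"name": "rabbit_sticker", "tags": {"rabbit", "cute pet"}},
]

def match_materials(materials, content_type_id):
    """Keep only the materials whose tags include the content type identifier."""
    return [m for m in materials if content_type_id in m["tags"]]

print(match_materials(material_set, "dog"))  # -> only the dog-pattern sticker is used for synthesis
```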
  • FIG. 6 shows a flowchart of a video editing method, including:
  • The interface of the video editing software homepage can be seen at 701 in FIG. 7.
  • The interface 701 includes a shooting option 704, an editing option 705, an AI editing option 706, and a template recommendation area 707.
  • The template recommendation area 707 exemplarily shows the four currently available templates: template 1, template 2, template 3, and template 4.
  • the user selects an AI editing option in the video editing software.
  • The video editing interface 702 includes a shooting button 708, a selection area 709 for the object to be edited, a video playback area 710, and a confirmation option 711, such as "Selected". At least one of the shooting button 708 and the selection area 709 is displayed to prompt the user to shoot or select the object to be edited.
  • S603 The user selects or photographs the object to be edited according to the prompt on the terminal device.
  • the object to be edited selected by the user includes two static pictures and three dynamic videos.
  • S604 The terminal device obtains the object to be edited.
  • For the implementation manner, refer to the above step S201, which will not be repeated here.
  • the terminal device determines a content element used for video editing in the object to be edited, and the content element has a corresponding content type identifier.
  • the terminal device determines that the content elements used for video editing in the object to be edited are static picture A, dynamic video segment B, static picture C, dynamic video segment D, and dynamic video segment E, respectively.
  • The dynamic video segment B, the dynamic video segment D, and the dynamic video segment E may be extracted from the aforementioned three dynamic videos.
  • For the determination method of the content element, refer to step S202, which will not be repeated here.
  • the terminal device determines the material collection identifier corresponding to the content type identifier according to the logic of the first behavior tree.
  • For the implementation manner, refer to step S203, which will not be repeated here.
  • the terminal device determines a video clip material set corresponding to the content type identifier according to the material set identifier.
  • The terminal device determines the set of video clip materials corresponding to the content type identifier from the one-click library.
  • The terminal device cuts the content elements, and the content elements finally used for synthesis with the video clip material set are static picture A', dynamic video segment B', dynamic video segment D', and dynamic video segment E'.
  • static picture A' is static picture A itself
  • dynamic video segment B' is part of dynamic video segment B
  • dynamic video segment D' is part of dynamic video segment D
  • dynamic video segment E' is dynamic video segment E itself.
  • For the implementation manner, refer to step S204, which will not be repeated here.
  • the terminal device synthesizes the clip video according to the content element and the video clip material collection.
  • the obtained clip video is shown in FIG. 8.
  • the clip video includes a video stream and an audio stream.
  • The video stream is a combination of the above-mentioned dynamic video segment B', static picture A', dynamic video segment D', and dynamic video segment E'; the theme is obtained from the one-click library.
  • The audio stream is the audio material in the one-click library.
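  • For illustration, the following hedged sketch shows how such a composition could be performed with the moviepy 1.x library (a library not named in the application): the video stream concatenates the selected segments and picture in order, and the audio stream is a music asset; all file names are placeholders.

```python
from moviepy.editor import (AudioFileClip, ImageClip, VideoFileClip,
                            concatenate_videoclips)

# Video stream: dynamic segment B', static picture A', dynamic segment D', dynamic segment E' in order.
video_stream = concatenate_videoclips([
    VideoFileClip("segment_B.mp4"),
    ImageClip("picture_A.jpg", duration=3),   # a static picture is held for a fixed duration
    VideoFileClip("segment_D.mp4"),
    VideoFileClip("segment_E.mp4"),
], method="compose")

# Audio stream: a music asset taken from the one-click library, fitted to the video length.
audio_stream = AudioFileClip("one_click_library/music.mp3").set_duration(video_stream.duration)
clip_video = video_stream.set_audio(audio_stream)
clip_video.write_videofile("clip_video.mp4", fps=30)
```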
  • After the terminal device obtains the clip video, it can display the obtained clip video to the user.
  • For the display interface, see 703 in FIG. 7.
  • The display interface 703 includes a video playback area 710, a reproduction option 712, and an output option 713. If the user is satisfied with the clip video, the user can click the output option 713, namely the "Output" button, to output the clip video; if the user is not satisfied with the clip video, the user can click the reproduction option 712 in 703, namely the "Change" button, to edit the video again.
  • an embodiment of the present application also provides a video editing device.
  • the device includes an acquisition unit 901, a first determination unit 902, a second determination unit 903, and a synthesis unit 904:
  • the obtaining unit 901 is used to obtain the object to be edited
  • the first determining unit 902 is configured to determine a content element used for video editing in the object to be edited, and the content element has a corresponding content type identifier;
  • the second determining unit 903 is configured to determine the material collection identifier corresponding to the content type identifier according to the logic of the first behavior tree;
  • the second determining unit 903 is further configured to determine a video clip material set corresponding to the content type identifier according to the material set identifier;
  • the compositing unit 904 is configured to synthesize the clip video according to the content element and the video clip material collection.
  • the second determining unit 903 is configured to:
  • the target material collection identifier is determined from the multiple material collection identifiers
  • the video clip material set corresponding to the target material set identifier is used as the video clip material set corresponding to the content type identifier.
  • the apparatus further includes a third determining unit 905:
  • the third determining unit 905 is configured to determine the video clip material matching the content type identifier from the video clip material set according to the second behavior tree logic;
  • the synthesis unit is used to synthesize the clipped video according to the content elements and the matched video clip material.
  • the apparatus further includes an adjustment unit 906:
  • the adjusting unit 906 is configured to adjust the time of the content element according to the third behavior tree logic, so that the adjusted duration of the content element conforms to the assigned duration in the clip video.
  • the first determining unit 902 is configured to determine the content element in the object to be edited and the content type identifier corresponding to the content element by extracting the structured information of the object to be edited.
  • the first determining unit 902 is configured to determine a content element for video editing from the object to be edited according to the key frame of the object to be edited.
  • the content element used for video editing in the object to be edited is determined, and each content element has a corresponding content type identifier, which can identify the content characteristics of the corresponding content element.
  • the first behavior tree logic can determine the material collection ID corresponding to the content type ID, and determine the video clip material collection corresponding to the content type ID according to the material collection ID, that is, the determined video clip material collection and content The content characteristics embodied by the elements are consistent.
  • In this way, the overall style of the clip video synthesized according to the content element and the video clip material collection conforms to the content characteristics of the content element, and clip videos synthesized according to content elements with different content characteristics have different characteristics in overall style.
  • the randomness of behavior tree logic can further increase the diversity of video clip material collections, so that clip videos synthesized based on content elements with similar content characteristics can also be different in overall style.
  • the resulting edited video has a lower degree of homogeneity, which improves the user experience.
  • an embodiment of the present application provides a device 1000 for video editing.
  • the device 1000 may also be a terminal device.
  • The terminal device may include a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA for short), a point-of-sale (Point of Sales, POS for short) terminal, an in-vehicle computer, and other smart terminals. The following takes the terminal device being a mobile phone as an example:
  • FIG. 12 shows a block diagram of a part of the structure of a mobile phone related to a terminal device provided in an embodiment of the present application.
  • The mobile phone includes: a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, and a power supply 1090.
  • The structure of the mobile phone shown in FIG. 12 does not constitute a limitation on the mobile phone, and the mobile phone may include more or fewer components than those shown in the figure, or combine some components, or have a different arrangement of components.
  • The RF circuit 1010 can be used for receiving and sending signals during the process of sending and receiving information or during a call. In particular, after downlink information from a base station is received, it is handed to the processor 1080 for processing; in addition, uplink data is sent to the base station.
  • the RF circuit 1010 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA for short), a duplexer, and the like.
  • the RF circuit 1010 can also communicate with the network and other devices through wireless communication.
  • The above-mentioned wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
  • the memory 1020 may be used to store software programs and modules.
  • the processor 1080 runs the software programs and modules stored in the memory 1020 to execute various functional applications and data processing of the mobile phone.
  • the memory 1020 may mainly include a program storage area and a data storage area.
  • The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data (such as audio data or a phone book) created according to the use of the mobile phone.
  • the memory 1020 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the input unit 1030 can be used to receive inputted digital or character information, and generate key signal input related to user settings and function control of the mobile phone.
  • the input unit 1030 may include a touch panel 1031 and other input devices 1032.
  • The touch panel 1031, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1031 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program.
  • the touch panel 1031 may include two parts: a touch detection device and a touch controller.
  • The touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 1080, and can receive and execute commands sent by the processor 1080.
  • the touch panel 1031 can be realized by various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the input unit 1030 may also include other input devices 1032.
  • other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, joystick, and the like.
  • the display unit 1040 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
  • the display unit 1040 may include a display panel 1041.
  • the display panel 1041 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD for short), Organic Light-Emitting Diode (OLED for short), etc.
  • The touch panel 1031 can cover the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, the operation is sent to the processor 1080 to determine the type of the touch event, and the processor 1080 then provides corresponding visual output on the display panel 1041 according to the type of the touch event.
  • Although in FIG. 12 the touch panel 1031 and the display panel 1041 are used as two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 1031 and the display panel 1041 can be integrated to implement the input and output functions of the mobile phone.
  • the mobile phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor can include an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 1041 according to the brightness of the ambient light.
  • The proximity sensor can turn off the display panel 1041 and/or the backlight when the mobile phone is moved close to the ear.
  • The accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that identify the mobile phone's posture (such as switching between horizontal and vertical screens, related games, and magnetometer posture calibration), vibration-recognition-related functions (such as a pedometer or tapping), and the like. Other sensors, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, can also be configured in the mobile phone and will not be repeated here.
  • the audio circuit 1060, the speaker 1061, and the microphone 1062 can provide an audio interface between the user and the mobile phone.
  • On the one hand, the audio circuit 1060 can transmit the electrical signal converted from the received audio data to the speaker 1061, and the speaker 1061 converts it into a sound signal for output; on the other hand, the microphone 1062 converts the collected sound signal into an electrical signal, which the audio circuit 1060 receives and converts into audio data. The audio data is then processed by the processor 1080 and sent via the RF circuit 1010 to, for example, another mobile phone, or output to the memory 1020 for further processing.
  • WiFi is a short-distance wireless transmission technology.
  • the mobile phone can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 1070. It provides users with wireless broadband Internet access.
  • FIG. 12 shows the WiFi module 1070, it is understandable that it is not a necessary component of the mobile phone and can be omitted as needed without changing the essence of the invention.
  • The processor 1080 is the control center of the mobile phone. It uses various interfaces and lines to connect the various parts of the entire mobile phone, and performs various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1020 and calling the data stored in the memory 1020, thereby monitoring the mobile phone as a whole.
  • the processor 1080 may include one or more processing units; preferably, the processor 1080 may integrate an application processor and a modem processor, where the application processor mainly processes the operating system, user interface, application programs, etc. , The modem processor mainly deals with wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 1080.
  • the mobile phone also includes a power supply 1090 (such as a battery) for supplying power to various components.
  • the power supply can be logically connected to the processor 1080 through a power management system, so that functions such as charging, discharging, and power management can be managed through the power management system.
  • the mobile phone may also include a camera, a Bluetooth module, etc., which will not be repeated here.
  • The processor 1080 included in the terminal device also has the following functions:
  • an object to be edited is obtained;
  • a content element used for video editing in the object to be edited is determined, the content element having a corresponding content type identifier;
  • a material collection identifier corresponding to the content type identifier is determined according to first behavior tree logic;
  • a video clip material set corresponding to the content type identifier is determined according to the material collection identifier; and
  • the clip video is synthesized according to the content element and the video clip material collection.
  • the processor is also used to execute:
  • the target material collection identifier is determined from the multiple material collection identifiers
  • the video clip material set corresponding to the target material set identifier is used as the video clip material set corresponding to the content type identifier.
  • the processor is also used to execute:
  • the clip video is synthesized according to the content element and the matched video clip material.
  • the processor is also used to execute:
  • the processor is also used to execute:
  • by extracting the structured information of the object to be edited, the content element in the object to be edited and the content type identifier corresponding to the content element are determined.
  • the processor is also used to execute:
  • the content element used for the video clip is determined from the object to be edited.
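Putting the functions listed above together, a minimal sketch of the overall pipeline could look like the following Python outline; the helper names (determine_content_elements, run_first_behavior_tree, resolve_material_set, synthesize) and the data shapes are assumptions made for illustration, not functions defined by this application.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class ContentElement:
        content: object        # a still image or a video segment
        content_type_id: str   # e.g. "pet" or "scenery"
        start: float           # start timestamp in the source object, in seconds
        end: float             # end timestamp in the source object, in seconds

    def edit_video(object_to_edit,
                   determine_content_elements: Callable,
                   run_first_behavior_tree: Callable,
                   resolve_material_set: Callable,
                   synthesize: Callable):
        # Determine the content elements used for editing; each carries a content type identifier.
        elements: List[ContentElement] = determine_content_elements(object_to_edit)
        # The first behavior tree logic maps the content type identifiers to a material set identifier.
        material_set_id: str = run_first_behavior_tree([e.content_type_id for e in elements])
        # Resolve the identifier to a concrete collection of editing materials (music, stickers, effects, ...).
        material_set: Dict[str, list] = resolve_material_set(material_set_id)
        # Synthesize the edited video from the content elements and the material set.
        return synthesize(elements, material_set)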
  • FIG. 13 is a structural diagram of the server 1100 provided in an embodiment of the application.
  • the server 1100 may vary considerably due to differences in configuration or performance.
  • it may include one or more central processing units (CPUs) 1122 (for example, one or more processors), memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) that store application programs 1142 or data 1144.
  • the memory 1132 and the storage medium 1130 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1130 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processing unit 1122 may be configured to communicate with the storage medium 1130, and execute a series of instruction operations in the storage medium 1130 on the server 1100.
  • the server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input and output interfaces 1158, and/or one or more operating systems 1141, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the steps performed by the server in the foregoing embodiment may be based on the server structure shown in FIG. 13.
  • the CPU 1122 is used to perform the following steps:
  • the clip video is synthesized according to the content element and the video clip material collection.
  • the processor is also used to execute:
  • the target material set identifier is determined from the multiple material set identifiers;
  • the video clip material set corresponding to the target material set identifier is used as the video clip material set corresponding to the content type identifier.
  • the processor is also used to execute:
  • the clip video is synthesized according to the content element and the matched video clip material.
  • the processor is also used to execute:
  • the processor is also used to execute:
  • by extracting the structured information of the object to be edited, the content element in the object to be edited and the content type identifier corresponding to the content element are determined.
  • the processor is also used to execute:
  • the content element used for the video clip is determined from the object to be edited.
  • a computer-readable storage medium is also provided, which is applied to a terminal or a server, for example a memory including instructions that can be executed by a processor to complete the video editing method in the foregoing embodiments.
  • the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • "At least one (item)" refers to one or more, and "multiple" refers to two or more.
  • "and/or" is used to describe the association relationship between associated objects and indicates that three kinds of relationships can exist; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
  • the character “/” generally indicates that the associated objects are in an “or” relationship.
  • "at least one of the following items" or similar expressions refers to any combination of these items, including any combination of a single item or of multiple items.
  • for example, "at least one of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be singular or plural.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of this application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Television Signal Processing For Recording (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The embodiments of this application disclose a video editing method: for an object to be edited, the content elements used for video editing in the object to be edited are determined; a material set identifier corresponding to the content type identifier is determined through first behavior tree logic; a video editing material set corresponding to the content type identifier is determined according to the material set identifier; and an edited video is synthesized according to the content elements and the video editing material set.

Description

一种视频剪辑方法、装置、设备和存储介质
本申请要求于2019年03月21日提交的申请号为201910217779.5、发明名称为“一种视频剪辑方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及数据处理领域,特别是涉及一种视频剪辑方法、装置、设备和存储介质。
背景技术
视频剪辑技术是一种将待剪辑对象例如静态图像、动态视频通过剪辑的方式合成一段剪辑视频的视频处理方式,常应用于短视频制作、视频集锦等视频剪辑场景。
传统视频剪辑方式是采用固定的剪辑模板,用户可以选择适合待剪辑对象的剪辑模板自动合成出剪辑视频。
然而剪辑模板都具有固定的素材,例如固定的音乐、特效、渲染特效等,导致针对不同的待剪辑对象,通过同一个剪辑模板合成得到的剪辑视频在整体风格上基本相同,具有同质性,带来不好的用户使用体验。
发明内容
为了解决上述技术问题,本申请提供了一种视频剪辑方法、装置、设备和存储介质,得到的剪辑视频同质性的程度更低,提高了用户的使用体验。
本申请实施例公开了如下技术方案:
一方面,本申请实施例提供一种视频剪辑方法,所述方法包括:
获取待剪辑对象;
确定所述待剪辑对象中用于视频剪辑的内容元素,所述内容元素具有对应的内容类型标识;
根据第一行为树逻辑确定所述内容类型标识对应的素材集合标识;
根据所述素材集合标识确定与所述内容类型标识对应的视频剪辑素材集合;
根据所述内容元素和视频剪辑素材集合合成得到剪辑视频。
另一方面,本申请实施例提供一种视频剪辑装置,所述装置包括获取单元、第一确定单元、第二确定单元和合成单元:
所述获取单元,用于获取待剪辑对象;
所述第一确定单元,用于确定所述待剪辑对象中用于视频剪辑的内容元素,所述内容元素具有对应的内容类型标识;
所述第二确定单元,用于根据第一行为树逻辑确定所述内容类型标识对应的素材集合标识;
所述第二确定单元,还用于根据所述素材集合标识确定与所述内容类型标识对应的视频剪辑素材集合;
所述合成单元,用于根据所述内容元素和视频剪辑素材集合合成得到剪辑视频。
另一方面,本申请实施例提供一种用于视频剪辑的设备,所述设备包括处理器以及存储器:
所述存储器用于存储至少一段程序代码,并将所述至少一段程序代码传输给所述处理器;
所述处理器用于根据所述至少一段程序代码中的指令执行第一方面所述的视频剪辑方法。
另一方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质用于存储至少一段程序代码,所述至少一段程序代码用于执行第一方面所述的视频剪辑方法。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的一种视频剪辑方法的应用场景示意图;
图2为本申请实施例提供的一种视频剪辑方法的流程图;
图3为本申请实施例提供的针对动态视频确定内容元素的示例图;
图4为本申请实施例提供的利用第一行为树逻辑确定内容类型标识对应的 视频剪辑素材集合的示例图;
图5为本申请实施例提供的一种确定视频剪辑素材集合的流程图;
图6为本申请实施例提供的一种视频剪辑方法的流程图;
图7为本申请实施例提供的一种视频剪辑软件首页的界面图;
图8为本申请实施例提供的一种视频剪辑方法的示例图;
图9为本申请实施例提供的一种视频剪辑装置的结构图;
图10为本申请实施例提供的一种视频剪辑装置的结构图;
图11为本申请实施例提供的一种视频剪辑装置的结构图;
图12为本申请实施例提供的一种用于视频剪辑的设备的结构图;
图13为本申请实施例提供的一种服务器的结构图。
具体实施方式
下面结合附图,对本申请的实施例进行描述。
传统视频剪辑方式中,由于同一剪辑模板具有固定的素材,导致针对不同的待剪辑对象,通过同一个剪辑模板合成得到的剪辑视频在整体风格上基本相同,具有同质性,用户使用体验不好。
例如,一个剪辑模板中的素材包括特效A,该剪辑模板设定的特效A位于剪辑视频的第3″到第4″。这样,当针对不同的待剪辑对象选择该剪辑模板后,得到的剪辑视频都会在第3″到第4″出现特效A,即不同的待剪辑视频通过同一个剪辑模板合成得到的剪辑视频在整体风格上基本相同。
为了解决上述技术问题,本申请实施例提供一种视频剪辑方法,该方法可以应用到终端设备中,终端设备例如可以是智能终端、计算机、个人数字助理(Personal Digital Assistant,简称PDA)、平板电脑等具有视频剪辑功能的设备。
该视频剪辑方法还可以应用到服务器中,服务器可以是向终端设备提供视频剪辑服务的设备,终端设备可以将待剪辑对象上传给服务器,服务器利用本申请实施例提供的视频剪辑方法得到剪辑视频,并将剪辑视频返回给终端设备。其中,服务器可以是独立的服务器,也可以是集群中的服务器。
为了便于理解本申请的技术方案,下面结合实际应用场景,以终端设备为例对本申请实施例提供的视频剪辑方法进行介绍。
参见图1,图1为本申请实施例提供的视频剪辑方法的应用场景示意图。该应用场景中包括终端设备101,终端设备101可以获取待剪辑对象。其中,待剪 辑对象为用于通过剪辑的方式以合成剪辑视频所涉及的对象,待剪辑对象包括静态图像和/或动态视频。待剪辑对象还可以包括一组静态图像组成的动态图像。
待剪辑对象中包括内容元素,内容元素可以体现出待剪辑对象所包括的主要内容。内容元素可以为待剪辑对象中的标签信息、场景信息、物体信息(例如人的相关信息、动物的相关信息以及植物的相关信息等)、语音信息以及物体的位置信息中的至少一种。
一般情况下,若待剪辑对象为动态视频,则确定出的内容元素可以为从动态视频本身或部分片段中提取到的信息,部分片段例如为动态视频中的长镜头片段、打斗片段等较为精彩的视频片段,这些视频片段也可以称为高光时刻;若待剪辑对象为静态图片,则确定出的内容元素为静态图片本身中包括的人体、动物、建筑以及天气等。
例如,待剪辑对象为动态视频时,该动态视频中包括一个男人、一只狗以及一辆车,且有较长的镜头为这只狗的镜头,则确定出的内容元素为男人、狗以及车辆等。在待剪辑对象为静态图片时,该静态图片包括一个女人坐在椅子上抱着猫,则确定出的内容元素包括女人、猫以及椅子等。
终端设备101可以从待剪辑对象中确定用于视频剪辑的内容元素。待剪辑对象中的每个内容元素具有对应的内容类型标识,可以标识所对应内容元素的内容特点。例如,待剪辑对象为风景图片,则该待剪辑对象中包括的内容元素对应的内容类型标识为风景,通过内容类型标识终端设备101可以识别出内容元素为风景。再如,待剪辑对象为包括猫或者狗的图片或者视频时,猫和狗对应的内容类型标识为萌宠,通过内容类型标识终端设备101可以识别出内容元素为萌宠。
为了得到与内容元素所体现的内容特点相符的视频剪辑元素集合,以便合成的剪辑视频在整体风格上根据内容元素的不同而有所不同,终端设备101在确定内容元素后,可以通过第一行为树逻辑确定内容类型标识对应的素材集合标识,进而根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合。其中,视频剪辑素材集合中包括用于与内容元素合成剪辑视频所需的各种素材,例如贴纸、滤镜、特效、音乐、字幕、片头、片尾等。素材集合标识用于标识视频剪辑素材集合中视频剪辑素材的类型,例如,素材集合标识为萌宠,则该素材集合标识对应的视频剪辑素材集合中包括的视频剪辑素材与萌宠相关。
例如,内容元素的内容类型标识为萌宠,那么,确定出来的素材集合标识 为萌宠,进而根据素材集合标识确定出的视频剪辑素材集合中包括的素材是与萌宠相关的素材;若内容元素的内容类型标识为风景,那么,确定出来的素材集合标识为风景,进而根据素材集合标识确定出的视频剪辑素材集合中包括的素材是与风景相关的素材,等等。
由于内容类型标识可以标识所对应内容元素的内容特点,因此,确定出的视频剪辑素材集合与内容元素所体现的内容特点相符,根据不同内容特点的内容元素所确定出的视频剪辑素材集合不同,相应的,所合成的剪辑视频在整体风格上具有不同特点。另外,由于行为树逻辑自身具有随机性,从而进一步提高了视频剪辑素材集合的多样性,使得根据类似内容特点的内容元素所合成的剪辑视频在整体风格上也能有所区别。可见,利用本申请实施例提供的视频剪辑方法,得到的剪辑视频同质性的程度更低,提高了用户的使用体验。
接下来,将结合附图对本申请实施例提供的视频剪辑方法进行详细介绍。
参见图2,图2示出了一种视频剪辑方法的流程图,所述方法包括:
S201、终端设备获取待剪辑对象。
在本实施例中,待剪辑对象的获取方式可以包括多种,其中,一种获取方式可以是终端设备从自身已存储的静态图像和/或动态视频中获取的,已存储的静态图像和/或动态视频可以是终端设备采集的,也可以是终端设备通过下载的方式获得的。若已存储的静态图像和/或动态视频是终端设备采集的,那么,静态图像和/或动态视频一般存储在终端设备的图库中,当用户需要进行视频剪辑时,终端设备可以提示用户从图库中选择静态图像和/或动态视频,当用户完成选择操作后,终端设备可以根据用户的选择操作,获取用户选择的静态图像和/或动态视频,此时,终端设备将用户选择的静态图像和/或动态视频作为待剪辑对象。
另一种获取方式可以是用户通过终端设备在需要进行视频剪辑时实时采集的,即当用户使用终端设备中的视频剪辑功能时,终端设备可以提示用户采集静态图像和/或动态视频,当用户利用终端设备完成静态图像和/或动态视频的采集后,终端设备可以获取当前采集到的静态图像和/或动态视频,此时,当前采集到的静态图像和/或动态视频作为待剪辑对象。
S202、终端设备确定待剪辑对象中用于视频剪辑的内容元素。
需要说明的是,根据获取的待剪辑对象利用自动剪辑算法对待剪辑对象进行自动剪辑是本申请实施例提供的视频剪辑方法的核心。如图1所示,自动剪 辑算法主要可以包括两层架构,第一层架构为决策元素提取模块,其用于执行S202的步骤,从待剪辑对象中确定用于视频剪辑的内容元素。第二层架构为剪辑策略实施模块,其用于执行S203和S204的步骤,根据第一行为树逻辑确定与内容元素所体现的内容特点相符的视频剪辑素材集合。
可以理解的是,若待剪辑对象为动态视频,由于动态视频实际上是一种非结构化数据,为了保证终端设备可以识别出待剪辑对象中的内容元素,可以先利用人工智能(Artificial Intelligence,简称AI)将待剪辑对象转换成结构化数据,即利用AI为待剪辑对象中可以作为内容元素的目标添加内容类型标识,从而将待剪辑对象转换成终端设备可以识别的结构化数据。
在这种情况下,S202的一种可能实现方式为:终端设备可以通过提取待剪辑对象的结构化信息,从而确定待剪辑对象中的内容元素,以及内容元素对应的内容类型标识。一般情况下,终端设备可以基于时间戳,提取待剪辑对象的结构化信息,进而确定内容元素以及内容类型标识。其中,结构化信息可以包括:人脸检测与跟踪信息,如人脸关键点信息、人脸表情信息、人脸属性信息等;人体检测与跟踪得信息,如手势信息、动作信息等;宠物检测与跟踪信息,如宠物脸部关键点、宠物种类信息等;以及通过语音活动检测(Voice Activity Detection,简称为VAD)得到的音频信息等。
例如,待剪辑对象包括动态视频A,针对动态视频A得到内容元素的示例图如图3所示。决策元素提取模块获取到动态视频A后,基于时间戳提取结构化信息,从而确定待剪辑对象中的内容元素,这样,每个内容元素具有对应的时间戳。图3中示出了从动态视频A中确定的内容元素与时间戳的对应关系。图3示例性的示出了从动态视频A中确定的5种内容元素,分别为微笑、挥手、跳跃、风景,其中微笑对应的时间戳为3″-4″和23″-27″,挥手对应的时间戳为7″-10″,跳跃对应的时间戳为11″-12″,风景对应的时间戳为15″-19″。
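As a minimal illustration of the timestamped structure described for Figure 3, the extracted content elements could be represented as label/interval records; the field names below are assumptions for illustration, and the intervals simply mirror the example values above.

    # Content elements extracted from dynamic video A, keyed by timestamp intervals in seconds.
    content_elements = [
        {"label": "smile",   "start": 3.0,  "end": 4.0},
        {"label": "smile",   "start": 23.0, "end": 27.0},
        {"label": "wave",    "start": 7.0,  "end": 10.0},
        {"label": "jump",    "start": 11.0, "end": 12.0},
        {"label": "scenery", "start": 15.0, "end": 19.0},
    ]

    def elements_overlapping(elements, t0, t1):
        """Return the content elements whose interval overlaps the window [t0, t1]."""
        return [e for e in elements if e["start"] < t1 and e["end"] > t0]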
需要说明的是,若待剪辑对象为动态视频,动态视频中可能包括很多视频帧,有一些视频帧能够体现出待剪辑对象的主要内容,而有一些视频帧体现其他内容。例如,待剪辑对象为电视剧的动态视频,该动态视频中一般会存在电视剧的相关内容,以及与电视剧无关的广告画面。那么,动态视频中体现电视剧的相关内容的视频片段可以作为内容元素。
在这种情况下,为了提高从待剪辑对象中确定内容元素的效率,可以将特定的视频帧作为关键帧,关键帧为对待剪辑对象进行分割的分割点,从而根据 关键帧将待剪辑视频分割成多个视频片段,进而根据关键帧确定每个视频片段是否能够体现待剪辑对象的主要内容,实现根据关键帧确定待剪辑对象的内容元素。其中,关键帧可以包括一帧视频帧,也可以包括多帧连续的视频帧。
例如,待剪辑对象包括视频帧1、视频帧2、视频帧3……视频帧100,待剪辑对象中第20帧至第75帧之间为与电视剧无关的广告画面。若关键帧为视频帧20和视频帧75,则利用关键帧对待剪辑对象进行分割得到多个视频片段,第一个视频片段为由视频帧1至视频帧19所构成的视频片段,第二个视频片段为由视频帧20至视频帧75所构成的视频片段,第三个视频片段为由视频帧76至视频帧100所构成的视频片段。终端设备根据关键帧识别出第一个视频片段和第三个视频片段可以体现待剪辑对象的主要内容,而第二视频片段为与待剪辑对象的主要内容无关的其他内容。故,终端设备确定出第一个视频片段和第三个视频片段为内容元素。
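A minimal sketch of this keyframe-based segmentation, assuming that each keyframe simply opens a new segment (this boundary convention is an assumption; with cut points at frames 20 and 76 the sketch reproduces the three segments of the example above):

    def split_by_keyframes(num_frames, cut_points):
        """Split frames 1..num_frames into segments, starting a new segment at every cut point."""
        segments, start = [], 1
        for cut in sorted(cut_points):
            if cut > start:
                segments.append((start, cut - 1))
            start = cut
        segments.append((start, num_frames))
        return segments

    print(split_by_keyframes(100, [20, 76]))   # [(1, 19), (20, 75), (76, 100)]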
S203、终端设备根据第一行为树逻辑确定内容类型标识对应的素材集合标识。
S204、终端设备根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合。
终端设备可以根据第一行为树逻辑从视频剪辑素材库中确定内容类型标识对应的素材集合标识,进而根据素材集合标识确定视频剪辑素材集合。视频剪辑素材库中的视频剪辑素材的来源可以包括多种,例如,视频剪辑素材可以来自于基本素材库,基本素材库中包括的视频剪辑素材为进行视频剪辑必要的素材;视频剪辑素材也可以来自于一键库(Media Editing Asset),一键库中包括如音乐、片头、片尾、字幕、贴纸、特效、滤镜等视频剪辑素材。一键库中的视频剪辑素材为预先设置的。例如,视频剪辑素材还可以来自于基本素材库和一键库等。其中,一键库中的音乐具有不同的播放时长,可以与内容元素相契合;片头片尾可以为视觉标题贴纸和特效;字幕内容与内容元素对应的内容类型标识相对应;贴纸可以为全景氛围贴纸,用于配合场景识别得到的场景的氛围,还可以为目标跟随贴纸,动态跟随人或者动物等物体;特效可以为转场特效,用于连接不同的内容元素,可以是基于图形处理器(Graphics Processing Unit,简称为GPU)渲染的shader(着色器,一种用于渲染图形的技术)语言实现的视觉动画效果;滤镜可以为配合场景识别得到的滤镜效果,如晚霞滤镜、复古电影滤镜等。
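As a minimal sketch, such a material library could be organized as a dictionary keyed by the material set identifier; the identifiers, file names and grouping below are purely illustrative assumptions.

    # One entry per material set identifier; each set groups materials of the kinds listed above.
    material_library = {
        "pet": {
            "music":    ["playful_theme.mp3"],
            "stickers": ["dog_sticker.png", "cat_sticker.png"],
            "filters":  ["warm_tone"],
            "titles":   ["pet_opening_title"],
        },
        "scenery": {
            "music":    ["ambient_theme.mp3"],
            "stickers": ["panorama_atmosphere_sticker.png"],
            "filters":  ["sunset", "vintage_film"],
            "titles":   ["scenery_opening_title"],
        },
    }

    def resolve_material_set(material_set_id):
        """Look up the video editing material set for a given material set identifier."""
        return material_library[material_set_id]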
行为树(Behaviour Tree)逻辑,是指用树状结构表示智能体从环境感知到逻辑执行的方法,常用于描述游戏中的非玩家角色(Non-Player Character,简称NPC)行为。在本实施例中,将行为树逻辑应用到视频剪辑中,以确定与内容元素的内容特点相符的视频剪辑素材集合。
行为树可以包括不同的节点,例如顺序(Sequence)节点、条件(Condition)节点、动作(Action)节点。顺序(Sequence)节点执行所有子节点则返回成功,如果某个子节点失败则返回失败。例如,利用行为树需要执行的主任务为选取长度为10″的动态视频,可以将其分解成选取3个长度为分别为3″、3″、4″动态视频片段的子任务,每个子任务可以表示为顺序节点的子节点。
条件(Condition)节点可以根据条件的比较结果,返回成功或失败。例如条件节点表示判断动态视频中一个内容元素对应的时间段的长度是否为3″。如果是,选择该时间段所对应的动态视频;如果不是,选择随机3″时间段所对应的动态视频。
动作(Action)节点可以根据动作结果返回成功、失败或运行。动作节点负责动态视频剪辑策略的实施,如选择动态视频中的片段并返回,直接对应底层视频软件开发工具包(Software Development Kit,简称SDK)的应用程序编程接口(Application Programming Interface,简称API)。
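A minimal sketch of these three node types in Python (the tick/SUCCESS/FAILURE interface is an assumption made for illustration; the application does not prescribe an implementation):

    from typing import Callable, List

    SUCCESS, FAILURE = "success", "failure"

    class Sequence:
        """Runs its children in order and fails as soon as one child fails."""
        def __init__(self, children: List):
            self.children = children
        def tick(self, context) -> str:
            for child in self.children:
                if child.tick(context) == FAILURE:
                    return FAILURE
            return SUCCESS

    class Condition:
        """Returns success or failure according to a predicate evaluated on the context."""
        def __init__(self, predicate: Callable):
            self.predicate = predicate
        def tick(self, context) -> str:
            return SUCCESS if self.predicate(context) else FAILURE

    class Action:
        """Carries out one editing operation, e.g. selecting a segment or adding a sticker."""
        def __init__(self, operation: Callable):
            self.operation = operation
        def tick(self, context) -> str:
            self.operation(context)
            return SUCCESS

    # Example: only add a pet sticker when a pet content element is present.
    tree = Sequence([
        Condition(lambda ctx: "pet" in ctx["content_type_ids"]),
        Action(lambda ctx: ctx["materials"].append("pet_sticker")),
    ])
    context = {"content_type_ids": ["pet", "scenery"], "materials": []}
    tree.tick(context)   # context["materials"] is now ["pet_sticker"]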
可以理解的是,不同的行为树可以实现不同的逻辑,在本实施例中,S203中所使用的行为树逻辑为第一行为树逻辑,利用第一行为树逻辑确定内容类型标识对应的视频剪辑素材集合的示例图参见图4所示。其中,图4中的内容元素为图3得到的内容元素,内容元素可以输入到图4所示的第一行为树中,第一行为树逻辑可以根据内容类型标识确定出视频剪辑素材集合。
其中,动作1、动作2、动作3所示的节点用于表示针对内容元素确定的视频剪辑素材。例如,动作1为“加个萌宠贴纸”,则确定出的视频剪辑素材为萌宠贴纸,进而得到视频剪辑素材集合。再如,顺序节点的子节点表示选择3″和4″的视频片段,则时间戳为3″-5″的视频片段、时间戳为23″-27″的视频片段以及时间戳为7″-10″的视频片段返回成功,则动作1为挥手,动作2为微笑;选择节点的子节点表示视频片段的长度是否为3″,由于时间戳为3″-5″的视频片段的长度为3″,则动作2为微笑,由于时间戳为11″-12″的视频片段的长度为2″,则动作3为跳跃。
需要说明的是,一般情况下行为树逻辑例如第一行为树逻辑等可以利用领 域描述语言(Domain Specific Language,简称DSL)的形式表述出来,便于存储与解析。具体的,以可扩展标记语言(Extensible Markup Language,简称XML)形式描述。
本申请实施例将第一行为树逻辑用于视频剪辑,那么,在此之前,需要创建行为树例如第一行为树,并且保证终端设备可以使用第一行为树逻辑。为此,需要研发人员开发行为树开发工具,例如编辑器,从而利用编辑器实现行为树节点的创建、修改、子树编辑等等。同时还需要研发人员开发解释器,从而读取以XML形式表示的行为树,然后将其动态编译成代码逻辑,加载到剪辑策略实施模块中去。
在视频剪辑素材来自于一键库的情况下,一键库与创建的第一行为树被打包成资源的形式,利用后台的资源文件管理系统(Content Management System简称CMS)进行分发,从而在需要对待剪辑对象进行视频剪辑时,使用第一行为树逻辑确定出视频剪辑素材集合。
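A minimal sketch of reading such an XML description into a structure that an interpreter could later compile into executable node logic; the tag and attribute names are assumptions, since no concrete schema is given here.

    import xml.etree.ElementTree as ET

    # A hypothetical XML description of a small behavior tree.
    BEHAVIOR_TREE_XML = """
    <sequence>
        <condition name="has_pet_element"/>
        <action name="add_pet_sticker"/>
    </sequence>
    """

    def parse_node(element):
        """Recursively convert an XML element into a nested dictionary describing the tree."""
        return {
            "type": element.tag,
            "name": element.get("name"),
            "children": [parse_node(child) for child in element],
        }

    tree_spec = parse_node(ET.fromstring(BEHAVIOR_TREE_XML))
    # tree_spec["type"] == "sequence"; its two children describe the condition and the action.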
S205、根据所述内容元素和视频剪辑素材集合合成得到剪辑视频。
在本实施例中,可以按照内容元素的实际拍摄时间作为视频剪辑素材集合中视频剪辑素材在时间轴上的排序依据,从而在时间轴上将内容元素与视频剪辑素材集合合成得到剪辑视频。
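A minimal sketch of laying the content elements (and the materials attached to them) out on the timeline in shooting-time order, assuming for the moment that every element keeps its original duration; the adjustment described in the following paragraphs handles the case where the allotted duration differs.

    def lay_out_timeline(elements):
        """Place content elements on the edited video's timeline in the order of their original shooting time."""
        timeline, cursor = [], 0.0
        for element in sorted(elements, key=lambda e: e["start"]):
            duration = element["end"] - element["start"]
            timeline.append((element["label"], cursor, cursor + duration))
            cursor += duration
        return timeline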
需要说明的是,内容元素具有对应的时间戳,例如,图3中内容元素为“微笑”,其对应的时间戳为3″-4″,该内容元素的时长为1″。然而,在一些情况下,该内容元素在剪辑视频中被分配的时长可能与内容元素的时长不同,内容元素在剪辑视频中被分配的时长可能大于内容元素的时长,也可能小于内容元素的时长。因此,在一种实现方式中,终端设备可以根据第三行为树逻辑对内容元素进行时间调整,使得内容元素调整后的时长符合剪辑视频中被分配的时长,从而保证将内容元素和视频剪辑素材集合进行合成时更加合理、准确。
终端设备根据第三行为树逻辑对内容元素进行时间调整的方式取决于内容元素在剪辑视频中被分配的时长与内容元素的时长的大小关系。若内容元素在剪辑视频中被分配的时长大于内容元素的时长,则终端设备可以根据第三行为树逻辑拉长内容元素的时长,实现对内容元素的时间调整。例如,内容元素的时长为1″,内容元素在剪辑视频中被分配的时长为2″,则需要终端设备根据第三行为树逻辑将内容元素的时长拉长为2″,使得内容元素调整后的时长符合在剪辑视频中被分配的时长。终端设备可以采用0.5倍速播放的方式,将时长为1″ 的内容元素的时长拉长为2″,还可以采用重复播放的方式,将时长为1″的内容元素的时长拉长为2″。
若内容元素在剪辑视频中被分配的时长小于内容元素的时长,则终端设备根据第三行为树逻辑缩短内容元素的时长,实现对内容元素的时间调整。例如,内容元素的时长为1″,内容元素在剪辑视频中被分配的时长为0.5″,则需要根据第三行为树逻辑将内容元素的时长缩短为0.5″,使得内容元素调整后的时长符合在剪辑视频中被分配的时长。终端可以采用2倍速播放的方式,将时长为1″内容元素的时长缩短为0.5″。
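The stretching and shortening described above can be expressed as choosing a playback-rate factor; the following is a minimal sketch of that arithmetic (the repeat-playback alternative mentioned above is not modelled here).

    def playback_rate(element_duration, allotted_duration):
        """Playback-rate factor that makes an element fill its allotted slot in the edited video.

        A 1-second element allotted 2 seconds plays at 0.5x speed (stretched);
        a 1-second element allotted 0.5 seconds plays at 2x speed (shortened)."""
        if allotted_duration <= 0:
            raise ValueError("allotted duration must be positive")
        return element_duration / allotted_duration

    print(playback_rate(1.0, 2.0))   # 0.5
    print(playback_rate(1.0, 0.5))   # 2.0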
由上述技术方案可以看出,针对待剪辑对象,确定该待剪辑对象中用于视频剪辑的内容元素,每一个内容元素都具有对应的内容类型标识,可以标识所对应内容元素的内容特点。在确定内容元素后,可以通过第一行为树逻辑确定内容类型标识对应的素材集合标识,并根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合,即确定出的视频剪辑素材集合与内容元素所体现的内容特点相符,从而,根据内容元素和视频剪辑素材集合合成得到的剪辑视频在整体风格上符合该内容元素的内容特点,根据不同内容特点的内容元素所合成的剪辑视频在整体风格上具有不同特点。而且,行为树逻辑所具有的随机性可以进一步提高视频剪辑素材集合的多样性,使得根据类似内容特点的内容元素所合成的剪辑视频在整体风格上也能有所区别,相对于传统视频剪辑,得到的剪辑视频同质性的程度更低,提高了用户的使用体验。
另外,利用本申请实施例提供的视频剪辑方法,用户只需要选择最原始的静态图片和/或动态视频作为待剪辑对象,之后,终端设备便可以自动实现视频剪辑,无需用户自己剪裁得到内容元素,节省用户交互成本,提高视频剪辑效率。
可以理解的是,终端设备在根据内容类型标识确定视频剪辑素材集合时,若确定出一个视频剪辑素材集合,那么,在执行S204时可以直接利用该视频剪辑素材集合。若确定出多个视频剪辑素材集合,由于在执行S204时一般仅使用一个视频剪辑素材集合,且所使用的视频剪辑素材集合在整体风格上最为符合该内容元素的内容特点,此时,需要从中挑选出一个视频剪辑素材集合作为最终确定的内容类型标识对应的视频剪辑素材集合。
接下来,将在素材集合标识为多个的情况下,对S204进行介绍。参见图5, S204包括:
S501、若素材集合标识包括多个,终端设备从多个素材集合标识中确定出目标素材集合标识。
可以理解的是,终端设备确定出的内容类型标识对应的素材集合标识可以包括一个,也可以包括多个。例如,若内容类型标识包括多个时,可能确定出多个素材集合标识。
在本实施例中,确定目标素材集合标识的方式可以包括多种。在一种可能的实现方式中,为了保证最终确定出的内容类型标识对应的视频剪辑素材集合在整体风格上最符合内容元素的内容特点,终端设备确定目标素材集合标识的方式可以是:统计多个素材集合标识中每个素材集合标识的频次,将多个素材集合标识中频次最高的素材集合标识作为目标素材集合标识。由于目标素材集合标识出现的频次最高,则目标素材集合标识对应的视频剪辑素材集合在整体风格上最符合内容元素的内容特点。
例如,待剪辑对象为动态视频,该动态视频按照时间戳被划分为10个视频片段,其中8个视频片段的内容元素的内容类型标识为萌宠,2个视频片段的内容元素的内容类型标识为风景,则根据内容类型标识确定的多个素材集合标识包括萌宠和风景,其中,萌宠的频次为8次,而风景的频次为2次,可见,素材集合标识为萌宠的视频剪辑素材集合更为符合内容元素的内容特点,则可以将萌宠作为目标素材集合标识。
S502、终端设备将目标素材集合标识对应的视频剪辑素材集合作为内容类型标识对应的视频剪辑素材集合。
通过本实施例提供的方法,终端设备可以在素材集合标识包括多个的情况下,从多个视频剪辑素材集合中合理地选择出内容类型标识对应的视频剪辑素材集合,从而统一合成剪辑视频所要使用的视频剪辑素材集合,使得用于合成剪辑视频所使用的视频剪辑素材集合在整体风格上最为符合内容元素的内容特点。
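A minimal sketch of this frequency-based selection, using Python's Counter; reproducing the example above, eight "pet" identifiers and two "scenery" identifiers yield "pet" as the target material set identifier.

    from collections import Counter

    def pick_target_material_set_id(material_set_ids):
        """Return the material set identifier that occurs most often among the candidates."""
        counts = Counter(material_set_ids)
        target_id, _ = counts.most_common(1)[0]
        return target_id

    candidate_ids = ["pet"] * 8 + ["scenery"] * 2
    print(pick_target_material_set_id(candidate_ids))   # "pet"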
在一些情况下,视频剪辑素材集合中包括很多视频剪辑素材,有些视频剪辑素材与内容类型标识相匹配,而有些视频剪辑素材与内容类型标识不匹配,与内容类型标识相匹配的视频剪辑素材更加符合内容元素的内容特点,与内容元素进行合成时,得到的剪辑视频更加协调。
因此,在一种实现方式中,在执行S204后,可以根据第二行为树逻辑从视 频剪辑素材集合中确定出与内容类型标识匹配的视频剪辑素材,从而在执行S205时,终端设备可以根据内容元素和匹配的视频剪辑素材合成得到剪辑视频。
例如,素材集合标识为萌宠的视频剪辑素材集合中包括图案为狗的贴纸、图案为兔子的贴纸,内容类型标识为狗,则终端设备可以从视频剪辑素材集合确定出匹配的视频剪辑素材,即图案为狗的贴纸,从而根据内容元素“狗”和图案为狗的贴纸合成得到剪辑视频。
通过本实施例提供的方法,使得用于合成剪辑视频的视频剪辑素材在整体风格上更加符合内容元素的内容特点,匹配的视频剪辑素材与内容元素合成得到的剪辑视频更加协调。
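In the application this matching is performed through the second behavior tree logic; as a simplified stand-in, a plain tag-intersection filter illustrates the effect (the tag representation and file names are assumptions).

    def match_materials(materials, content_type_ids):
        """Keep only the materials whose tags intersect the content type identifiers."""
        wanted = set(content_type_ids)
        return [m for m in materials if wanted & set(m.get("tags", []))]

    stickers = [
        {"name": "dog_sticker.png",    "tags": ["dog", "pet"]},
        {"name": "rabbit_sticker.png", "tags": ["rabbit", "pet"]},
    ]
    print(match_materials(stickers, ["dog"]))   # only the dog sticker remains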
接下来,将结合实际应用场景对本申请实施例提供的视频剪辑方法进行介绍。在该应用场景中,用户利用终端设备上的视频剪辑软件进行视频剪辑,待剪辑对象由用户从终端设备的图库中选择,视频剪辑素材来自于一键库。参见图6,图6示出了一种视频剪辑方法的流程图,包括:
S601、用户打开终端设备上的视频剪辑软件。
视频剪辑软件首页的界面可以参见图7中701所示,该界面701包括拍摄选项704、编辑选项705、AI剪辑选项706以及模版推荐区域707,该模版推荐区域707示例性的示出了当前可用的四个模板:模版1、模板2、模板3以及模板。
S602、用户选定视频剪辑软件中的AI剪辑选项。
用户选定AI剪辑选项后,进入视频剪辑界面,视频剪辑界面参见图7中702所示。视频剪辑界面702上包括拍摄按键708、待剪辑对象的选择区域709、视频播放区域710以及确认选项711,如“选好了”,拍摄按键708和待剪辑对象的选择区域709示出的至少一个待剪辑对象用于提示用户拍摄或选择待剪辑对象。
S603、用户根据终端设备上的提示选择或拍摄待剪辑对象。
参见图8,图8以用户从图库中选择待剪辑对象为例对后续根据待剪辑对象实现视频剪辑进行介绍。其中,用户选择的待剪辑对象包括两个静态图片和三个动态视频。
S604、终端设备获取待剪辑对象。
实现方式可以参见上述步骤S201,在此不再赘述。
S605、终端设备确定待剪辑对象中用于视频剪辑的内容元素,内容元素具有对应的内容类型标识。
参见图8,终端设备确定待剪辑对象中用于视频剪辑的内容元素分别为静态图片A、动态视频片段B、静态图片C、动态视频片段D和动态视频片段E。动态视频片段B’、动态视频片段D’和动态视频片段E’可以是根据前述三个动态视频提取得到的。内容元素的确定方式可以参见步骤S202,在此不再赘述。
S606、终端设备根据第一行为树逻辑确定内容类型标识对应的素材集合标识。
实现方式可以参见步骤S203,在此不再赘述。
S607、终端设备根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合。
参见图8,终端设备从一键库中确定内容类型标识对应的视频剪辑素材集合。终端设备通过对内容元素进行裁剪,最终得到用于与视频剪辑集合进行合成的内容元素分别为静态图片A’、动态视频片段B’、动态视频片段D’和动态视频片段E’。其中,静态图片A’为静态图片A本身,动态视频片段B’为动态视频片段B的部分片段,动态视频片段D’为动态视频片段D的部分片段,动态视频片段E’为动态视频片段E本身。实现方式可以参见步骤S204,在此不再赘述。
S608、终端设备根据内容元素和视频剪辑素材集合合成得到剪辑视频。
得到的剪辑视频参见图8所示,该剪辑视频包括视频流和音频流,其中,视频流由上述动态视频片段B’、静态图片A’、动态视频片段D’以及动态视频片段E’结合一键库中的主题得到。音频流为一键库中的音频素材。
终端设备得到剪辑视频后,可以将得到的剪辑视频展示给用户,其展示界面参见图7中703所示。展示界面703中包括视频播放区域710、重制选项712以及输出选项713。若用户对该剪辑视频满意则可以点击输出选项713,即“输出”按键,输出该剪辑视频;若用户对该剪辑视频不满意,则可以点击703中重制选项712,即“换一换”按键,重新剪辑视频。
基于前述实施例提供的一种视频剪辑方法,本申请实施例还提供一种视频剪辑装置,参见图9,装置包括获取单元901、第一确定单元902、第二确定单元903和合成单元904:
获取单元901,用于获取待剪辑对象;
第一确定单元902,用于确定待剪辑对象中用于视频剪辑的内容元素,内容元素具有对应的内容类型标识;
第二确定单元903,用于根据第一行为树逻辑确定内容类型标识对应的素材集合标识;
第二确定单元903,还用于根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合;
合成单元904,用于根据内容元素和视频剪辑素材集合合成得到剪辑视频。
在一种可能的实现方式中,第二确定单元903,用于:
若素材集合标识包括多个,从多个素材集合标识中确定出目标素材集合标识;
将目标素材集合标识对应的视频剪辑素材集合作为内容类型标识对应的视频剪辑素材集合。
在一种可能的实现方式中,参见图10,装置还包括第三确定单元905:
第三确定单元905,用于根据第二行为树逻辑从视频剪辑素材集合中确定出与内容类型标识匹配的视频剪辑素材;
合成单元,用于根据内容元素和匹配的视频剪辑素材合成得到剪辑视频。
在一种可能的实现方式中,参见图11,装置还包括调整单元906:
调整单元906,用于根据第三行为树逻辑对内容元素进行时间调整,使得内容元素调整后的时长符合剪辑视频中被分配的时长。
在一种可能的实现方式中,第一确定单元902,用于通过提取待剪辑对象的结构化信息,确定待剪辑对象中的内容元素,以及内容元素对应的内容类型标识。
在一种可能的实现方式中,若待剪辑对象为动态视频,第一确定单元902,用于根据待剪辑对象的关键帧从待剪辑对象中确定用于视频剪辑的内容元素。
由上述技术方案可以看出,针对待剪辑对象,确定该待剪辑对象中用于视频剪辑的内容元素,每一个内容元素都具有对应的内容类型标识,可以标识所对应内容元素的内容特点。在确定内容元素后,可以通过第一行为树逻辑确定内容类型标识对应的素材集合标识,并根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合,即确定出的视频剪辑素材集合与内容元素所体现的内容特点相符,从而,根据内容元素和视频剪辑素材集合合成得到的剪辑视频在整体风格上符合该内容元素的内容特点,根据不同内容特点的内容元素所合 成的剪辑视频在整体风格上具有不同特点。而且,行为树逻辑所具有的随机性可以进一步提高视频剪辑素材集合的多样性,使得根据类似内容特点的内容元素所合成的剪辑视频在整体风格上也能有所区别,相对于传统视频剪辑,得到的剪辑视频同质性的程度更低,提高了用户的使用体验。
本申请实施例还提供了一种用于视频剪辑的设备,下面结合附图对用于视频剪辑的设备进行介绍。请参见图12所示,本申请实施例提供了一种用于视频剪辑的设备1000,该设备1000还可以是终端设备,该终端设备可以为包括手机、平板电脑、个人数字助理(Personal Digital Assistant,简称PDA)、销售终端(Point of Sales,简称POS)、车载电脑等任意智能终端,以终端设备为手机为例:
图12示出的是与本申请实施例提供的终端设备相关的手机的部分结构的框图。参考图12,手机包括:射频(Radio Frequency,简称RF)电路1010、存储器1020、输入单元1030、显示单元1040、传感器1050、音频电路1060、无线保真(wireless fidelity,简称WiFi)模块1070、处理器1080、以及电源1090等部件。本领域技术人员可以理解,图12中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图12对手机的各个构成部件进行具体的介绍:
RF电路1010可用于收发信息或通话过程中,信号的接收和发送,特别地,将基站的下行信息接收后,给处理器1080处理;另外,将设计上行的数据发送给基站。通常,RF电路1010包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(Low Noise Amplifier,简称LNA)、双工器等。此外,RF电路1010还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(Global System of Mobile communication,简称GSM)、通用分组无线服务(General Packet Radio Service,简称GPRS)、码分多址(Code Division Multiple Access,简称CDMA)、宽带码分多址(Wideband Code Division Multiple Access,简称WCDMA)、长期演进(Long Term Evolution,简称LTE)、电子邮件、短消息服务(Short Messaging Service,简称SMS)等。
存储器1020可用于存储软件程序以及模块,处理器1080通过运行存储在存储器1020的软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器1020可主要包括存储程序区和存储数据区,其中,存储程序区可存储操 作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器1020可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
输入单元1030可用于接收输入的数字或字符信息,以及产生与手机的用户设置以及功能控制有关的键信号输入。具体地,输入单元1030可包括触控面板1031以及其他输入设备1032。触控面板1031,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板1031上或在触控面板1031附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板1031可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器1080,并能接收处理器1080发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板1031。除了触控面板1031,输入单元1030还可以包括其他输入设备1032。具体地,其他输入设备1032可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
显示单元1040可用于显示由用户输入的信息或提供给用户的信息以及手机的各种菜单。显示单元1040可包括显示面板1041,可选的,可以采用液晶显示器(Liquid Crystal Display,简称LCD)、有机发光二极管(Organic Light-Emitting Diode,简称OLED)等形式来配置显示面板1041。进一步的,触控面板1031可覆盖显示面板1041,当触控面板1031检测到在其上或附近的触摸操作后,传送给处理器1080以确定触摸事件的类型,随后处理器1080根据触摸事件的类型在显示面板1041上提供相应的视觉输出。虽然在图12中,触控面板1031与显示面板1041是作为两个独立的部件来实现手机的输入和输入功能,但是在某些实施例中,可以将触控面板1031与显示面板1041集成而实现手机的输入和输出功能。
手机还可包括至少一种传感器1050,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板1041的亮度,接近传感器可在手 机移动到耳边时,关闭显示面板1041和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
音频电路1060、扬声器1061,传声器1062可提供用户与手机之间的音频接口。音频电路1060可将接收到的音频数据转换后的电信号,传输到扬声器1061,由扬声器1061转换为声音信号输出;另一方面,传声器1062将收集的声音信号转换为电信号,由音频电路1060接收后转换为音频数据,再将音频数据输出处理器1080处理后,经RF电路1010以发送给比如另一手机,或者将音频数据输出至存储器1020以便进一步处理。
WiFi属于短距离无线传输技术,手机通过WiFi模块1070可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图12示出了WiFi模块1070,但是可以理解的是,其并不属于手机的必须构成,完全可以根据需要在不改变发明的本质的范围内而省略。
处理器1080是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器1020内的软件程序和/或模块,以及调用存储在存储器1020内的数据,执行手机的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器1080可包括一个或多个处理单元;优选的,处理器1080可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器1080中。
手机还包括给各个部件供电的电源1090(比如电池),优选的,电源可以通过电源管理系统与处理器1080逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
尽管未示出,手机还可以包括摄像头、蓝牙模块等,在此不再赘述。
在本实施例中,该终端设备所包括的处理器1080还具有以下功能:
获取待剪辑对象;
确定待剪辑对象中用于视频剪辑的内容元素,内容元素具有对应的内容类型标识;
根据第一行为树逻辑确定内容类型标识对应的素材集合标识;
根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合;
根据内容元素和视频剪辑素材集合合成得到剪辑视频。
可选的,处理器还用于执行:
若素材集合标识包括多个,从多个素材集合标识中确定出目标素材集合标识;
将目标素材集合标识对应的视频剪辑素材集合作为内容类型标识对应的视频剪辑素材集合。
可选的,处理器还用于执行:
根据第二行为树逻辑从视频剪辑素材集合中确定出与内容类型标识匹配的视频剪辑素材;
根据内容元素和匹配的视频剪辑素材合成得到剪辑视频。
可选的,处理器还用于执行:
根据第三行为树逻辑对内容元素进行时间调整,使得内容元素调整后的时长符合剪辑视频中被分配的时长。
可选的,处理器还用于执行:
通过提取待剪辑对象的结构化信息,确定待剪辑对象中的内容元素,以及内容元素对应的内容类型标识。
可选的,若待剪辑对象为动态视频,处理器,还用于执行:
根据待剪辑对象的关键帧,从待剪辑对象中确定用于视频剪辑的内容元素。
本申请实施例提供的用于视频剪辑的设备可以是服务器,请参见图13所示,图13为本申请实施例提供的服务器1100的结构图,服务器1100可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(Central Processing Units,简称CPU)1122(例如,一个或一个以上处理器)和存储器1132,一个或一个以上存储应用程序1142或数据1144的存储介质1130(例如一个或一个以上海量存储设备)。其中,存储器1132和存储介质1130可以是短暂存储或持久存储。存储在存储介质1130的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器1122可以设置为与存储介质1130通信,在服务器1100上执行存储介质1130中的一系列指令操作。
服务器1100还可以包括一个或一个以上电源1126,一个或一个以上有线或 无线网络接口1150,一个或一个以上输入输出接口1158,和/或,一个或一个以上操作系统1141,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述实施例中由服务器所执行的步骤可以基于该图13所示的服务器结构。
其中,CPU 1022用于执行如下步骤:
获取待剪辑对象;
确定待剪辑对象中用于视频剪辑的内容元素,内容元素具有对应的内容类型标识;
根据第一行为树逻辑确定内容类型标识对应的素材集合标识;
根据素材集合标识确定与内容类型标识对应的视频剪辑素材集合;
根据内容元素和视频剪辑素材集合合成得到剪辑视频。
可选的,处理器还用于执行:
若素材集合标识包括多个,从多个素材集合标识中确定出目标素材集合标识;
将目标素材集合标识对应的视频剪辑素材集合作为内容类型标识对应的视频剪辑素材集合。
可选的,处理器还用于执行:
根据第二行为树逻辑从视频剪辑素材集合中确定出与内容类型标识匹配的视频剪辑素材;
根据内容元素和匹配的视频剪辑素材合成得到剪辑视频。
可选的,处理器还用于执行:
根据第三行为树逻辑对内容元素进行时间调整,使得内容元素调整后的时长符合剪辑视频中被分配的时长。
可选的,处理器还用于执行:
通过提取待剪辑对象的结构化信息,确定待剪辑对象中的内容元素,以及内容元素对应的内容类型标识。
可选的,若待剪辑对象为动态视频,处理器,还用于执行:
根据待剪辑对象的关键帧,从待剪辑对象中确定用于视频剪辑的内容元素。
在本申请实施例中,还提供了一种计算机可读存储介质,应用于终端或者服务器,例如包括指令的存储器,上述指令可由处理器执行以完成上述实施例中的视频剪辑方法。例如,该计算机可读存储介质可以是只读存储器(Read-Only  Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)、磁带、软盘和光数据存储设备等。
本申请的说明书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中, 也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,简称ROM)、随机存取存储器(Random Access Memory,简称RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (19)

  1. 一种视频剪辑方法,其特征在于,所述方法包括:
    获取待剪辑对象;
    确定所述待剪辑对象中用于视频剪辑的内容元素,所述内容元素具有对应的内容类型标识;
    根据第一行为树逻辑确定所述内容类型标识对应的素材集合标识;
    根据所述素材集合标识确定与所述内容类型标识对应的视频剪辑素材集合;
    根据所述内容元素和视频剪辑素材集合合成得到剪辑视频。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述素材集合标识确定与所述内容类型标识对应的视频剪辑素材集合,包括:
    若所述素材集合标识包括多个,从多个素材集合标识中确定出目标素材集合标识;
    将所述目标素材集合标识对应的视频剪辑素材集合作为所述内容类型标识对应的视频剪辑素材集合。
  3. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    根据第二行为树逻辑从所述视频剪辑素材集合中确定出与所述内容类型标识匹配的视频剪辑素材;
    所述根据所述内容元素和视频剪辑素材集合合成得到剪辑视频,包括:
    根据所述内容元素和所述匹配的视频剪辑素材合成得到剪辑视频。
  4. 根据权利要求1-3任意一项所述的方法,其特征在于,所述方法还包括:
    根据第三行为树逻辑对所述内容元素进行时间调整,使得所述内容元素调整后的时长符合所述剪辑视频中被分配的时长。
  5. 根据权利要求1-3任意一项所述的方法,其特征在于,所述确定所述待剪辑对象中用于视频剪辑的内容元素,包括:
    通过提取所述待剪辑对象的结构化信息,确定所述待剪辑对象中的所述内容元素,以及所述内容元素对应的内容类型标识。
  6. 根据权利要求1-3任意一项所述的方法,其特征在于,若所述待剪辑对象为动态视频,所述确定所述待剪辑对象中用于视频剪辑的内容元素,包括:
    根据所述待剪辑对象的关键帧,从所述待剪辑对象中确定用于视频剪辑的内容元素。
  7. 一种视频剪辑装置,其特征在于,所述装置包括获取单元、第一确定单元、第二确定单元和合成单元:
    所述获取单元,用于获取待剪辑对象;
    所述第一确定单元,用于确定所述待剪辑对象中用于视频剪辑的内容元素,所述内容元素具有对应的内容类型标识;
    所述第二确定单元,用于根据第一行为树逻辑确定所述内容类型标识对应的素材集合标识;
    所述第二确定单元,还用于根据所述素材集合标识确定与所述内容类型标识对应的视频剪辑素材集合;
    所述合成单元,用于根据所述内容元素和视频剪辑素材集合合成得到剪辑视频。
  8. 根据权利要求7所述的装置,其特征在于,所述第二确定单元,用于:
    若所述素材集合标识包括多个,从多个素材集合标识中确定出目标素材集合标识;
    将所述目标素材集合标识对应的视频剪辑素材集合作为所述内容类型标识对应的视频剪辑素材集合。
  9. 根据权利要求7所述的装置,其特征在于,所述装置还包括第三确定单元:
    所述第三确定单元,用于根据第二行为树逻辑从所述视频剪辑素材集合中确定出与所述内容类型标识匹配的视频剪辑素材;
    所述合成单元,用于根据所述内容元素和所述匹配的视频剪辑素材合成得到剪辑视频。
  10. 根据权利要求7-9任意一项所述的装置,其特征在于,所述装置还包括调整单元:
    所述调整单元,用于根据第三行为树逻辑对所述内容元素进行时间调整,使得所述内容元素调整后的时长符合所述剪辑视频中被分配的时长。
  11. 根据权利要求7-9任意一项所述的装置,其特征在于,所述第一确定单元,用于通过提取所述待剪辑对象的结构化信息,确定所述待剪辑对象中的所述内容元素,以及所述内容元素对应的内容类型标识。
  12. 根据权利要求7-9任意一项所述的装置,其特征在于,若所述待剪辑对 象为动态视频,所述第一确定单元,用于根据所述待剪辑对象的关键帧,从所述待剪辑对象中确定用于视频剪辑的内容元素。
  13. 一种用于视频剪辑的设备,其特征在于,所述设备包括处理器以及存储器:
    所述存储器用于存储至少一段程序代码,并将所述至少一段程序代码传输给所述处理器;
    所述处理器用于根据执行:
    获取待剪辑对象;
    确定所述待剪辑对象中用于视频剪辑的内容元素,所述内容元素具有对应的内容类型标识;
    根据第一行为树逻辑确定所述内容类型标识对应的素材集合标识;
    根据所述素材集合标识确定与所述内容类型标识对应的视频剪辑素材集合;
    根据所述内容元素和视频剪辑素材集合合成得到剪辑视频。
  14. 根据权利要求13所述的用于视频剪辑的设备,其特征在于,所述处理器还用于执行:
    若所述素材集合标识包括多个,从多个素材集合标识中确定出目标素材集合标识;
    将所述目标素材集合标识对应的视频剪辑素材集合作为所述内容类型标识对应的视频剪辑素材集合。
  15. 根据权利要求13所述的用于视频剪辑的设备,其特征在于,所述处理器还用于执行:
    根据第二行为树逻辑从所述视频剪辑素材集合中确定出与所述内容类型标识匹配的视频剪辑素材;
    根据所述内容元素和所述匹配的视频剪辑素材合成得到剪辑视频。
  16. 根据权利要求13-15任意一项所述的用于视频剪辑的设备,其特征在于,所述处理器还用于执行:
    根据第三行为树逻辑对所述内容元素进行时间调整,使得所述内容元素调整后的时长符合所述剪辑视频中被分配的时长。
  17. 根据权利要求13-15任意一项所述的用于视频剪辑的设备,其特征在于,所述处理器还用于执行:
    通过提取所述待剪辑对象的结构化信息,确定所述待剪辑对象中的所述内容元素,以及所述内容元素对应的内容类型标识。
  18. 根据权利要求13-15任意一项所述的用于视频剪辑的设备,其特征在于,若所述待剪辑对象为动态视频,所述处理器,还用于执行:
    根据所述待剪辑对象的关键帧,从所述待剪辑对象中确定用于视频剪辑的内容元素。
  19. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质用于存储至少一段程序代码,所述至少一段程序代码用于执行权利要求1-6所述的视频剪辑方法。
PCT/CN2020/078548 2019-03-21 2020-03-10 一种视频剪辑方法、装置、设备和存储介质 WO2020187086A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/314,231 US11715497B2 (en) 2019-03-21 2021-05-07 Video editing method, apparatus, and device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910217779.5 2019-03-21
CN201910217779.5A CN109819179B (zh) 2019-03-21 2019-03-21 一种视频剪辑方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/314,231 Continuation US11715497B2 (en) 2019-03-21 2021-05-07 Video editing method, apparatus, and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020187086A1 true WO2020187086A1 (zh) 2020-09-24

Family

ID=66609926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/078548 WO2020187086A1 (zh) 2019-03-21 2020-03-10 一种视频剪辑方法、装置、设备和存储介质

Country Status (3)

Country Link
US (1) US11715497B2 (zh)
CN (1) CN109819179B (zh)
WO (1) WO2020187086A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460455A (zh) * 2022-09-06 2022-12-09 上海硬通网络科技有限公司 一种视频剪辑方法、装置、设备及存储介质
CN116896672A (zh) * 2023-09-11 2023-10-17 北京美摄网络科技有限公司 视频特效处理方法、装置、电子设备及存储介质

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109599079B (zh) * 2017-09-30 2022-09-23 腾讯科技(深圳)有限公司 一种音乐的生成方法和装置
CN109819179B (zh) * 2019-03-21 2022-02-01 腾讯科技(深圳)有限公司 一种视频剪辑方法和装置
CN110139159B (zh) * 2019-06-21 2021-04-06 上海摩象网络科技有限公司 视频素材的处理方法、装置及存储介质
CN110381371B (zh) * 2019-07-30 2021-08-31 维沃移动通信有限公司 一种视频剪辑方法及电子设备
CN110418196B (zh) * 2019-08-29 2022-01-28 金瓜子科技发展(北京)有限公司 视频生成方法、装置及服务器
CN112988671A (zh) * 2019-12-13 2021-06-18 北京字节跳动网络技术有限公司 媒体文件处理方法、装置、可读介质及电子设备
CN113079326A (zh) * 2020-01-06 2021-07-06 北京小米移动软件有限公司 视频剪辑方法及装置、存储介质
CN111263241B (zh) * 2020-02-11 2022-03-08 腾讯音乐娱乐科技(深圳)有限公司 媒体数据的生成方法、装置、设备及存储介质
CN111541936A (zh) * 2020-04-02 2020-08-14 腾讯科技(深圳)有限公司 视频及图像处理方法、装置、电子设备、存储介质
CN111866585B (zh) * 2020-06-22 2023-03-24 北京美摄网络科技有限公司 一种视频处理方法及装置
CN113840099B (zh) 2020-06-23 2023-07-07 北京字节跳动网络技术有限公司 视频处理方法、装置、设备及计算机可读存储介质
CN111541946A (zh) * 2020-07-10 2020-08-14 成都品果科技有限公司 一种基于素材进行资源匹配的视频自动生成方法及系统
CN111756953A (zh) * 2020-07-14 2020-10-09 北京字节跳动网络技术有限公司 视频处理方法、装置、设备和计算机可读介质
CN111859017A (zh) * 2020-07-21 2020-10-30 南京智金科技创新服务中心 基于互联网大数据的数字视频制作系统
CN111741331B (zh) * 2020-08-07 2020-12-22 北京美摄网络科技有限公司 一种视频片段处理方法、装置、存储介质及设备
CN112118397B (zh) * 2020-09-23 2021-06-22 腾讯科技(深圳)有限公司 一种视频合成的方法、相关装置、设备以及存储介质
CN112132931B (zh) * 2020-09-29 2023-12-19 新华智云科技有限公司 一种模板化视频合成的处理方法、装置及系统
CN112969035A (zh) * 2021-01-29 2021-06-15 新华智云科技有限公司 一种可视化视频制作方法及制作系统
CN113079405B (zh) * 2021-03-26 2023-02-17 北京字跳网络技术有限公司 一种多媒体资源剪辑方法、装置、设备及存储介质
US11551397B1 (en) * 2021-05-27 2023-01-10 Gopro, Inc. Media animation selection using a graph
CN113315883B (zh) * 2021-05-27 2023-01-20 北京达佳互联信息技术有限公司 调整视频组合素材的方法和装置
CN113181654A (zh) * 2021-05-28 2021-07-30 网易(杭州)网络有限公司 游戏画面生成方法、装置、存储介质及电子设备
CN113473204B (zh) * 2021-05-31 2023-10-13 北京达佳互联信息技术有限公司 一种信息展示方法、装置、电子设备及存储介质
CN115546855A (zh) * 2021-06-30 2022-12-30 脸萌有限公司 图像处理方法、装置及可读存储介质
CN113794930B (zh) * 2021-09-10 2023-11-24 中国联合网络通信集团有限公司 视频生成方法、装置、设备及存储介质
CN113852767B (zh) * 2021-09-23 2024-02-13 北京字跳网络技术有限公司 视频编辑方法、装置、设备及介质
CN113905189A (zh) * 2021-09-28 2022-01-07 安徽尚趣玩网络科技有限公司 一种视频内容动态拼接方法及装置
CN113784167B (zh) * 2021-10-11 2023-04-28 福建天晴数码有限公司 一种基于3d渲染的互动视频制作和播放的方法及终端
CN114222191B (zh) * 2021-11-24 2023-09-08 星际数科科技股份有限公司 一种聊天换装视频播放的系统
CN114302224B (zh) * 2021-12-23 2023-04-07 新华智云科技有限公司 一种视频智能剪辑方法、装置、设备及存储介质
TWI824453B (zh) * 2022-03-24 2023-12-01 華碩電腦股份有限公司 影像剪輯方法及其系統
CN116866545A (zh) * 2023-06-30 2023-10-10 荣耀终端有限公司 摄像头模组的映射关系调整方法、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120195573A1 (en) * 2011-01-28 2012-08-02 Apple Inc. Video Defect Replacement
CN106899809A (zh) * 2017-02-28 2017-06-27 广州市诚毅科技软件开发有限公司 一种基于深度学习的视频剪辑方法和装置
CN107770626A (zh) * 2017-11-06 2018-03-06 腾讯科技(深圳)有限公司 视频素材的处理方法、视频合成方法、装置及存储介质
CN109819179A (zh) * 2019-03-21 2019-05-28 腾讯科技(深圳)有限公司 一种视频剪辑方法和装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000173246A (ja) * 1998-11-30 2000-06-23 Sony Corp 編集装置及び編集方法
JP2003018580A (ja) * 2001-06-29 2003-01-17 Matsushita Electric Ind Co Ltd コンテンツ配信システムおよび配信方法
JP4225124B2 (ja) * 2003-06-06 2009-02-18 ソニー株式会社 データ処理方法およびそのシステム
JP2005341064A (ja) * 2004-05-25 2005-12-08 Sony Corp 情報送出装置、情報送出方法、プログラム及び記録媒体並びに表示制御装置及び表示方法
JP2006042317A (ja) * 2004-06-23 2006-02-09 Matsushita Electric Ind Co Ltd アーカイブ管理装置、アーカイブ管理システム及びアーカイブ管理プログラム
CN106250100B (zh) * 2016-08-15 2018-05-11 腾讯科技(深圳)有限公司 系统逻辑控制方法及装置
CN106973304A (zh) * 2017-02-14 2017-07-21 北京时间股份有限公司 基于云端的非线性剪辑方法、装置及系统
CN107147959B (zh) * 2017-05-05 2020-06-19 中广热点云科技有限公司 一种广播视频剪辑获取方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120195573A1 (en) * 2011-01-28 2012-08-02 Apple Inc. Video Defect Replacement
CN106899809A (zh) * 2017-02-28 2017-06-27 广州市诚毅科技软件开发有限公司 一种基于深度学习的视频剪辑方法和装置
CN107770626A (zh) * 2017-11-06 2018-03-06 腾讯科技(深圳)有限公司 视频素材的处理方法、视频合成方法、装置及存储介质
CN109819179A (zh) * 2019-03-21 2019-05-28 腾讯科技(深圳)有限公司 一种视频剪辑方法和装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460455A (zh) * 2022-09-06 2022-12-09 上海硬通网络科技有限公司 一种视频剪辑方法、装置、设备及存储介质
CN115460455B (zh) * 2022-09-06 2024-02-09 上海硬通网络科技有限公司 一种视频剪辑方法、装置、设备及存储介质
CN116896672A (zh) * 2023-09-11 2023-10-17 北京美摄网络科技有限公司 视频特效处理方法、装置、电子设备及存储介质
CN116896672B (zh) * 2023-09-11 2023-12-29 北京美摄网络科技有限公司 视频特效处理方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
US11715497B2 (en) 2023-08-01
US20210264952A1 (en) 2021-08-26
CN109819179B (zh) 2022-02-01
CN109819179A (zh) 2019-05-28

Similar Documents

Publication Publication Date Title
WO2020187086A1 (zh) 一种视频剪辑方法、装置、设备和存储介质
CN109819313B (zh) 视频处理方法、装置及存储介质
TWI592021B (zh) 生成視頻的方法、裝置及終端
TWI732240B (zh) 視頻檔案的生成方法、裝置及儲存媒體
WO2020078299A1 (zh) 一种处理视频文件的方法及电子设备
CN110582018B (zh) 一种视频文件处理的方法、相关装置及设备
WO2016177296A1 (zh) 一种生成视频的方法和装置
CN113453040B (zh) 短视频的生成方法、装置、相关设备及介质
US11653069B2 (en) Subtitle splitter
CN110213504B (zh) 一种视频处理方法、信息发送方法及相关设备
KR102071576B1 (ko) 콘텐트 재생 방법 및 이를 위한 단말
CN110662090B (zh) 一种视频处理方法和系统
CN108966004A (zh) 一种视频处理方法及终端
CN112261481B (zh) 互动视频的创建方法、装置、设备及可读存储介质
CN108055587A (zh) 图像文件的分享方法、装置、移动终端及存储介质
CN107948729B (zh) 富媒体处理方法、装置、存储介质和电子设备
CN112118397B (zh) 一种视频合成的方法、相关装置、设备以及存储介质
CN111491123A (zh) 视频背景处理方法、装置及电子设备
CN114637890A (zh) 在图像画面中显示标签的方法、终端设备及存储介质
CN114697742A (zh) 一种视频录制方法及电子设备
CN114880062B (zh) 聊天表情展示方法、设备、电子设备及存储介质
CN110908638A (zh) 一种操作流创建方法及电子设备
CN114979785A (zh) 视频处理方法和相关装置
CN110784762B (zh) 一种视频数据处理方法、装置、设备及存储介质
CN111915744A (zh) 增强现实图像的交互方法、终端和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20774304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20774304

Country of ref document: EP

Kind code of ref document: A1