CN114731458A - Video processing method, video processing apparatus, terminal device, and storage medium

Video processing method, video processing apparatus, terminal device, and storage medium

Info

Publication number
CN114731458A
Authority
CN
China
Prior art keywords
video
template
pit
determining
matching
Prior art date
Legal status
Pending
Application number
CN202080075426.7A
Other languages
Chinese (zh)
Inventor
刘志鹏
李熠宸
朱高
朱梦龙
蒋金峰
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Application filed by SZ DJI Technology Co Ltd
Publication of CN114731458A


Classifications

    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/262: Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N 21/472: End-user interface for requesting content, additional data or services; end-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A video processing method, a video processing apparatus, a terminal device, and a storage medium. The method includes: determining a template according to video information of video material to be processed, wherein the template comprises at least one video pit position (S301); determining, according to pit position information of the video pit positions in the template, the video clip corresponding to each video pit position to obtain a matching relation corresponding to the template, wherein the video clips are segments of the video material to be processed (S302); and filling the video clips into the corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video (S303).

Description

Video processing method, video processing apparatus, terminal device, and storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video processing method, a video processing apparatus, a terminal device, and a storage medium.
Background
At present, video content has become a mainstream form of self-media, and users share their daily lives by shooting short videos. To obtain short videos with rich content, a user can freely combine various video materials shot by different capture devices (an unmanned aerial vehicle, a handheld gimbal, a camera, or a mobile phone), clipping and combining multiple video materials into one video to be published on a social network. Currently, most video clipping schemes still require user involvement, and an effective solution for automatically clipping videos is lacking.
Disclosure of Invention
Based on this, the present application provides a video processing method, a video processing apparatus, a terminal device, and a storage medium. The video processing method processes video material to be processed based on a template, with the aim of reducing the user's video-editing workload and providing diversified recommended videos.
In a first aspect, the present application provides a video processing method, including:
determining a template according to video information of video material to be processed, wherein the template comprises at least one video pit position;
determining a video segment corresponding to the video pit position according to the pit position information of the video pit position in the template to obtain a matching relation corresponding to the template, wherein the video segment is a segment in the video material to be processed;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
In a second aspect, the present application further provides a video processing method, including:
constructing a flow network graph according to the video clips of the video material to be processed and the video pit positions of the template;
determining a matching relation between the video clips and the video pit positions based on the flow network graph;
filling the video clips into the corresponding video pit positions of the template according to the matching relation to obtain a recommended video;
wherein the flow network graph comprises a plurality of nodes, and each node corresponds to a matching relation between one video clip and one video pit position.
In a third aspect, the present application further provides a video processing method, including:
acquiring a plurality of templates, wherein each template comprises at least one video pit position;
matching video clips to the video pit positions of each template to obtain a matching relation corresponding to each template, and determining a matching score of the matching relation corresponding to each template, wherein the video clips are clips of the video material to be processed;
determining a recommended template from the plurality of templates according to the matching score;
and filling the video clips into corresponding video pit positions of the recommendation template according to the matching relation corresponding to the recommendation template to obtain a recommendation video.
In a fourth aspect, the present application further provides a video processing method, including:
according to video information of a video material to be processed, segmenting the video material to be processed to generate a plurality of video segments;
determining video clips of all video pit positions to be filled in the template according to the pit position information of the video pit positions of the template to obtain a matching relation corresponding to the template;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
In a fifth aspect, the present application further provides a video processing apparatus comprising a processor and a memory;
the memory is used for storing a computer program;
the processor, configured to execute the computer program and when executing the computer program, implement:
determining a template according to video information of video material to be processed, wherein the template comprises at least one video pit position;
determining a video segment corresponding to the video pit position according to the pit position information of the video pit position in the template to obtain a matching relation corresponding to the template, wherein the video segment is a segment in the video material to be processed;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
In a sixth aspect, the present application further provides a video processing apparatus comprising a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement:
constructing a flow network graph according to the video clips of the video material to be processed and the video pit positions of the template;
determining a matching relation between the video clips and the video pit positions based on the flow network graph;
filling the video clips into the corresponding video pit positions of the template according to the matching relation to obtain a recommended video;
wherein the flow network graph comprises a plurality of nodes, and each node corresponds to a matching relation between one video clip and one video pit position.
In a seventh aspect, the present application further provides a video processing apparatus, which includes a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement:
acquiring a plurality of templates, wherein each template comprises at least one video pit position;
matching video clips to the video pit positions of each template to obtain a matching relation corresponding to each template, and determining a matching score of the matching relation corresponding to each template, wherein the video clips are clips of the video material to be processed;
determining a recommended template from the plurality of templates according to the matching score;
and filling the video clips into corresponding video pit positions of the recommendation template according to the matching relation corresponding to the recommendation template to obtain a recommendation video.
In an eighth aspect, the present application further provides a video processing apparatus comprising a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement:
according to video information of a video material to be processed, segmenting the video material to be processed to generate a plurality of video segments;
determining video clips of all video pit positions to be filled in the template according to the pit position information of the video pit positions of the template to obtain a matching relation corresponding to the template;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
In a ninth aspect, the present application further provides a terminal device, where the terminal device includes a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement:
determining a template according to video information of video material to be processed, wherein the template comprises at least one video pit position;
determining a video segment corresponding to the video pit position according to the pit position information of the video pit position in the template to obtain a matching relation corresponding to the template, wherein the video segment is a segment in the video material to be processed;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
In a tenth aspect, the present application further provides a terminal device, which includes a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement:
constructing a flow network graph according to the video clips of the video material to be processed and the video pit positions of the template;
determining a matching relation between the video clips and the video pit positions based on the flow network graph;
filling the video clips into the corresponding video pit positions of the template according to the matching relation to obtain a recommended video;
wherein the flow network graph comprises a plurality of nodes, and each node corresponds to a matching relation between one video clip and one video pit position.
In an eleventh aspect, the present application further provides a terminal device, which includes a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement:
acquiring a plurality of templates, wherein each template comprises at least one video pit position;
matching video clips to the video pit positions of each template to obtain a matching relation corresponding to each template, and determining a matching score of the matching relation corresponding to each template, wherein the video clips are clips of the video material to be processed;
determining a recommended template from the plurality of templates according to the matching score;
and filling the video clips into corresponding video pit positions of the recommendation template according to the matching relation corresponding to the recommendation template to obtain a recommendation video.
In a twelfth aspect, the present application further provides a terminal device, where the terminal device includes a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement:
according to video information of a video material to be processed, segmenting the video material to be processed to generate a plurality of video segments;
determining video clips of all video pit positions to be filled in the template according to the pit position information of the video pit positions of the template to obtain a matching relation corresponding to the template;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
In a thirteenth aspect, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the video processing method as described above.
The embodiments of the present application provide a video processing method, a video processing apparatus, a terminal device, and a storage medium, which quickly segment and assemble video material to be processed into a finished video, reducing the workload of video editing, increasing the diversity of recommended videos, and improving the user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating steps of a video processing method according to an embodiment of the present application;
FIG. 2 is a flow diagram of sub-steps of a video processing method provided in FIG. 1;
FIG. 3 is a flowchart illustrating the steps of clustering segments of a first video segment according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of training an image feature network model provided by an embodiment of the present application;
FIG. 5 is a diagram illustrating similarity calculation results provided in an embodiment of the present application;
FIG. 6 is a flow chart illustrating steps of another video processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a flow network graph constructed in accordance with an embodiment of the present application;
FIG. 8 is a flow chart illustrating steps of another video processing method according to an embodiment of the present application;
fig. 9 is a schematic diagram of sources of video material to be processed according to an embodiment of the present application;
FIG. 10 is a flow diagram of sub-steps of a video processing method provided in FIG. 8;
FIG. 11 is a flowchart of steps for determining a plurality of matching templates from video tags according to an embodiment of the present application;
fig. 12 is a flowchart illustrating steps for filling video clips into corresponding video pit positions according to an embodiment of the present application;
FIG. 13 is a flow chart illustrating steps of another video processing method according to an embodiment of the present application;
fig. 14 is a flowchart illustrating steps for matching video clips to video pit positions to obtain a matching relation according to an embodiment of the present application;
FIG. 15 is a flowchart of the steps provided by an embodiment of the present application to determine a recommended template based on a match score;
FIG. 16 is a flowchart of sub-steps provided in FIG. 15 to determine a recommendation template based on the match scores;
FIG. 17 is a flow diagram of sub-steps of a video processing method provided in FIG. 13;
FIG. 18 is a flowchart providing steps for determining a recommended template based on template type according to embodiments of the present application;
FIG. 19 is a schematic diagram of selecting a recommended template from a set of templates provided by an embodiment of the present application;
fig. 20 is a schematic block diagram of a video processing apparatus provided in an embodiment of the present application;
fig. 21 is a schematic block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a video processing method according to an embodiment of the present application. The video processing method can be applied to a terminal device or a cloud device, and is used for combining video material to be processed with a preset template. The terminal device includes a mobile phone, a tablet, a notebook computer, and the like.
Specifically, as shown in fig. 1, the video processing method includes steps S101 to S103.
S101, according to video information of a video material to be processed, the video material to be processed is divided to generate a plurality of video segments.
The video material to be processed is combined with a preset template to generate a recommended video for the user, and the preset template includes at least one video pit position to be filled with a video clip. The video material to be processed is segmented according to its video information to obtain a plurality of video segments, which are then filled into the video pit positions of the template to generate the recommended video.
In one embodiment, the video material to be processed may include video material from multiple sources, for example, video material captured through a handheld terminal, video material captured through a movable platform, video material obtained from a cloud server, video material obtained from a local server, and the like.
The handheld terminal may be, for example, a mobile phone, a tablet, or an action camera, and the movable platform may be, for example, an unmanned aerial vehicle (UAV). The UAV may be a rotary-wing UAV, for example a quadrotor, hexarotor, or octorotor UAV, or a fixed-wing UAV. The UAV carries a camera device.
The video materials from different sources are gathered together and edited into one piece through mixed clipping, which increases the diversity of the resulting recommended video.
In an embodiment, the video information includes at least one of a camera-movement direction and scene information. Referring to fig. 2, step S101 includes step S1011 and step S1012.
S1011, segmenting the video material to be processed according to the video information of the video material to be processed to obtain a plurality of first video segments.
When the video material to be processed is first segmented according to its camera-movement direction, the camera-movement direction within each resulting first video segment is the same or similar, with no change of direction inside a segment.

Specifically, the video material to be processed may be segmented wherever the camera-movement direction changes, generating a plurality of first video segments. For example, if the video material contains a continuous forward movement followed by a backward movement, it may be divided into two first video segments at the point where the direction changes: in one segment the camera moves forward, in the other it moves backward, and the camera-movement direction within each first video segment is uniform. In a specific implementation, changes of the camera-movement direction in the video material can be detected by camera-movement detection algorithms.

When the video material to be processed is first segmented according to its scene information, the scenes within each resulting first video segment are similar. For example, if the video material contains snow-mountain footage, it may be divided into a plurality of first video segments according to whether each part of the material contains the snow mountain or visually similar imagery.

In addition, the video material to be processed can also be segmented according to a subject that recurs in the material, where the subject may be a target person, a pet, or the like. For example, if a cat appears continuously in the video material, the material may be segmented into a plurality of first video segments according to whether the cat is present in each shot.
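As an illustration of the first-pass segmentation, the sketch below splits a material wherever a per-frame camera-movement direction label changes. The label source (a camera-movement detection algorithm) and the Python representation are assumptions for illustration, not part of the original disclosure.

def split_by_direction(direction_per_frame):
    """First-pass segmentation: cut wherever the camera-movement direction changes.

    direction_per_frame: one direction label per frame (e.g. 'forward', 'backward'),
    assumed to be produced by a camera-movement detection algorithm.
    Returns lists of frame indices, one list per first video segment.
    """
    if not direction_per_frame:
        return []
    segments, current = [], [0]
    for i in range(1, len(direction_per_frame)):
        if direction_per_frame[i] != direction_per_frame[i - 1]:
            segments.append(current)   # direction changed: close the current segment
            current = [i]
        else:
            current.append(i)
    segments.append(current)
    return segments

# Example: forward for 3 frames, then backward for 2 frames -> two first video segments.
print(split_by_direction(['forward'] * 3 + ['backward'] * 2))  # [[0, 1, 2], [3, 4]]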
And S1012, clustering and segmenting the first video segments to obtain a plurality of second video segments.
After the first video segments are obtained, each first video segment may be segmented a second time to obtain a plurality of second video segments, and the second video segments are used as the video clips to fill the video pit positions of the template. Cluster segmentation here means clustering frames with similar scenes together.

It should be noted that the first video segments may be cluster-segmented selectively: a first video segment is cluster-segmented only when it meets a certain condition, and otherwise it may be left as is.

In an embodiment, before cluster-segmenting the first video segments into a plurality of second video segments, it is determined whether any of the first video segments has a video duration greater than a preset duration; if such a first video segment exists, the cluster-segmentation step is performed on it.
Before clustering and segmenting the first video segments, whether the first video segments with the video duration larger than the preset duration exist in the obtained multiple first video segments is judged. The preset duration is an empirical value and can be adjusted according to experience or the duration of the video pit position of the template.
And if at least one first video segment with the video time length larger than the preset time length exists in the plurality of first video segments, performing second segmentation on the first video segments with the video time length larger than the preset time length. And for the first video segment with the video duration less than or equal to the preset duration, the first video segment can not be subjected to second segmentation.
In an embodiment, the step of clustering and segmenting the first video segment with reference to fig. 3 specifically includes steps S1012a to S1012 c.
And S1012a, determining a sliding window and a clustering center.
When clustering segmentation is carried out on a first video segment, firstly, a sliding window and a clustering center are determined, wherein the sliding window is used for determining a current video frame to be processed, and the clustering center is used for determining a video segmentation point of the first video segment. In the process of clustering and segmenting the first video segment, the clustering center comprises the image characteristics of the first frame video frame of the first video segment.
Wherein the image characteristics of the first frame video frame of the first video segment may include image coding characteristics of the first frame video frame of the first video segment. The image characteristics are obtained according to a pre-trained image characteristic network model. The image feature network model is capable of outputting image features for respective video frames in the first video segment.
Specifically, for a first video segment, the larger the sliding window is, the fewer video frames are subjected to similar clustering processing during clustering segmentation, and the faster the segmentation speed is; the smaller the sliding window is, the more video frames are subjected to similar clustering processing when clustering segmentation is performed, and the slower the segmentation speed is. Therefore, the size of the sliding window can be set based on this principle. In one embodiment, the size of the sliding window is equal to 1.
In an embodiment, the size of the sliding window is related to the duration of the first video segment. When the duration of the first video segment is longer, in order to perform cluster segmentation on the first video segment quickly, a larger value may be set for the sliding window to increase the speed of cluster segmentation, for example, the size of the sliding window is set to 3. And when the duration of the first video segment is short, a small value may be set for the sliding window, for example, the size of the sliding window is set to 1.
In one embodiment, the size of the sliding window is related to a desired segmentation speed set by the user. When the user desires to rapidly segment the first video segment, a larger value can be set for the sliding window, so that the segmentation speed is increased. When the user desires to slowly segment the first video segment, a smaller value may be set for the sliding window, reducing the segmentation speed.
In one embodiment, the size of the sliding window is related to the fineness of the segmentation. The smaller the sliding window, the more video frames are processed when clustering segmentation is performed, the finer the segmentation is, and therefore, the size of the sliding window can be set according to the requirement on the degree of segmentation fineness.
In one embodiment, the image feature network model may be trained in advance. The training flow chart may be as shown in fig. 4, and the training process may be:
and preparing a training set, labeling the video materials in the training set, and labeling the video materials in the same scene in the training set into one type. During training, three video materials are respectively input into three convolutional neural networks, wherein the convolutional neural networks can be CNN networks, two of the three video materials are in the same class, and one video material is in other classes.
The image characteristics of the three video materials are generated through a convolutional neural network, and the triplet loss is utilized to measure whether the image distance between the video materials of the same class is smaller than the image distance between the video materials of different classes, so that corresponding loss values are generated. Iterative training is carried out on the convolutional neural network based on the loss value, and the weight of the convolutional neural network is continuously adjusted, so that the convolutional neural network can learn discriminative image characteristics.
And after the training of the convolutional neural network is finished, taking the trained convolutional neural network as an image feature network model for outputting the image features of all the video frames in the first video segment.
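The following is a minimal PyTorch-style sketch of the triplet training described above; the backbone architecture, feature dimension, margin, and optimizer settings are illustrative assumptions rather than the patent's actual configuration.

import torch
import torch.nn as nn

class EmbeddingCNN(nn.Module):
    """Small convolutional backbone producing an image feature vector
    (a stand-in for the patent's image feature network model)."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

model = EmbeddingCNN()
criterion = nn.TripletMarginLoss(margin=1.0)   # same-class distance should be smaller
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(anchor, positive, negative):
    """One iteration: anchor and positive share a scene class, negative does not."""
    optimizer.zero_grad()
    loss = criterion(model(anchor), model(positive), model(negative))
    loss.backward()
    optimizer.step()
    return loss.item()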
S1012b, based on the clustering center, performing clustering analysis on the video frames of the first video clip according to the sliding window, and determining video segmentation points.
After the clustering center is determined, the current video frame to be processed is determined according to the sliding window, and then the current video frame to be processed and the clustering center are subjected to clustering analysis, so that a video segmentation point is determined.
For example, in the first clustering pass, the cluster center is set to the image feature of the first video frame, and the first slide is performed. When the sliding window is N, the current video frame to be processed on the first slide is the (N+1)-th video frame.
In one embodiment, step S1012b specifically includes: determining a current video frame according to the sliding window, and determining the similarity between the image characteristics of the current video frame and the clustering center; if the similarity is smaller than a preset threshold value, taking the current video frame as a video segmentation point, and re-determining a clustering center; and continuously determining the video segmentation point according to the re-determined clustering center until the last video frame of the first video clip.
Specifically, when a first video segment is clustered for the first time, the cluster center is initialized and a first slide is performed. The cluster center is set to the image feature of the first frame of the first video segment, denoted C_0. The size of the sliding window is denoted N; the current video frame to be processed on the first slide is then the (N+1)-th frame of the first video segment, and the current video frame to be processed on the m-th slide is the (m×N+1)-th frame of the first video segment, where m is the number of slides of the sliding window.

After the first slide, the current video frame to be processed, i.e. the (N+1)-th frame of the first video segment, is input into the pre-trained image feature network model to obtain the image feature F_(N+1) of the current frame, and the similarity between the image feature C_0 of the cluster center and the image feature F_(N+1) of the current frame is calculated.

If the similarity is smaller than the preset threshold, the video content is considered to have changed substantially between the first frame and the (N+1)-th frame of the first video segment; the (N+1)-th frame is then used as a video segmentation point, and the frames before it are split off as a second video segment. The image feature F_(N+1) of the (N+1)-th frame is then taken as the re-determined cluster center, and the next video segmentation point is determined in the same way, until the last video frame of the first video segment is reached.

The similarity may be the cosine similarity between image features: the computed cosine similarity between the cluster-center feature C_0 and the current-frame feature F_(N+1) is used as their similarity. The preset threshold may be an empirical value set in advance.
In an embodiment, the re-determining the cluster center includes: and taking the image characteristics of the current video frame as the re-determined clustering center.
And when the similarity is smaller than a preset threshold value, taking the current video frame as a video segmentation point, and segmenting the video frame before the current video frame into a second video segment. At this time, when performing cluster analysis on the remaining first video segment, since the current video frame is the first frame of the remaining first video segment, the image feature of the current video frame is taken as the newly determined cluster center.
In an embodiment, after determining the similarity between the image feature of the current video frame and the clustering center, if the similarity is greater than or equal to a preset threshold, updating the clustering center; and continuously determining the similarity between the image characteristics of the current video frame and the updated clustering center according to the updated clustering center.
If, after the first slide, the computed similarity between the cluster-center feature C_0 and the current-frame feature F_(N+1) is greater than or equal to the preset threshold, the video content is considered not to have changed much between the first frame and the (N+1)-th frame of the first video segment; the cluster center is then updated, and on the next slide the similarity between the image feature of the new current video frame and the updated cluster center is determined.

For example, after the first slide, the average of the image features of the first frame and the (N+1)-th frame of the first video segment may be calculated, and the calculated average may be used as the image feature of the updated cluster center.
In an embodiment, the updating the cluster center includes: acquiring image characteristics of the current video frame; and determining an updated clustering center according to the image characteristics of the current video frame and the clustering center.
And acquiring the image characteristics of the current video frame, then calculating the average value of the image characteristics from the first frame video frame to the current video frame according to the image characteristics of the clustering center and the image characteristics of the current video frame, and taking the calculated average value as the updated image characteristics of the clustering center.
Wherein the formula for determining the updated cluster center may be:

C_m = (C_(m-1) × m × N + F_(m×N+1)) / (m × N + 1)

where m is the clustering (update) count, N is the size of the sliding window, C_m is the image feature of the cluster center C after the m-th clustering update, and F_(m×N+1) is the image feature of the current video frame at the m-th clustering.
For example, suppose the first video segment has 10 video frames and the sliding window N is 1. On the first clustering pass, the cluster center C is initialized to the first frame, whose image feature is C_0, and the first slide is performed. The current video frame determined by the first slide is the second frame, with image feature F_2. The image feature C_0 of the cluster center is compared with the image feature F_2 of the second frame; if the similarity is greater than or equal to the preset threshold, the cluster center is updated, and the cluster-center feature after the first clustering is C_1 = (C_0 × 1 × 1 + F_2) / 2.

Then the second clustering pass starts and the second slide is performed. The current video frame determined by the second slide is the third frame, with image feature F_3. The cluster-center feature C_1 after the first clustering is compared with the image feature F_3 of the third frame; if the similarity is again greater than or equal to the preset threshold, the cluster center is updated, and the cluster-center feature after the second clustering is C_2 = (C_1 × 2 × 1 + F_3) / 3. This continues until the 10th video frame of the first video segment is reached.
And S1012c, performing video segmentation on the first video segment according to the video segmentation points.
And after the video segmentation point is determined, video segmentation is carried out on the first video segment according to the video segmentation point to obtain a plurality of second video segments. When the video is divided, the current video frame is taken as a video dividing point, and the video frame before the current video frame is taken as a second video segment to be divided.
For example, fig. 5 shows a diagram of a similarity calculation result. The first video segment has 2300 frames in total, and the 490th and 1100th frames are video segmentation points. During segmentation, the first video segment is split at the 490th frame and the 1100th frame to obtain three second video segments: frames 1 to 489 form one second video segment, frames 490 to 1099 form another, and frames 1100 to 2300 form the third.
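A minimal sketch of the sliding-window cluster segmentation of steps S1012a to S1012c is given below. It follows the cosine-similarity test and the cluster-center update formula above; the feature representation (NumPy vectors) and the default threshold value are assumptions for illustration.

import numpy as np

def cluster_segment(features, window=1, threshold=0.8):
    """Cluster-segment one first video segment into second video segments.

    features: one image feature vector per frame, assumed to come from the
    pre-trained image feature network model.
    Returns the frame indices used as video segmentation points.
    """
    split_points = []
    center = np.asarray(features[0], dtype=float)   # C_0: feature of the first frame
    m = 0                                           # update count within the current cluster
    idx = window                                    # first slide lands on the (N+1)-th frame
    while idx < len(features):
        f = np.asarray(features[idx], dtype=float)
        sim = float(np.dot(center, f) /
                    (np.linalg.norm(center) * np.linalg.norm(f)))  # cosine similarity
        if sim < threshold:
            split_points.append(idx)                # content changed: new segmentation point
            center, m = f, 0                        # re-determine the cluster center
        else:
            m += 1                                  # C_m = (C_(m-1)*m*N + F_(m*N+1)) / (m*N + 1)
            center = (center * m * window + f) / (m * window + 1)
        idx += window
    return split_points

# Splitting the first video segment at the returned points then yields the second video
# segments, e.g. split points at frames 490 and 1100 give three second video segments.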
S102, according to the pit information of the video pits of the template, determining the video segments of all the video pits to be filled in the template, and obtaining the matching relation corresponding to the template.
The matching relation corresponding to the template is the corresponding relation between the video pit positions in the template and the corresponding video clips.
After the template corresponding to the video material to be processed is determined, since the template comprises at least one video pit position, the video clip corresponding to each video pit position can be determined according to the pit position information of that video pit position, so as to obtain the matching relation corresponding to the template.

Specifically, the pit position information includes at least one of pit music and a pit position label. In some embodiments, the pit position labels may be preset, with one pit position label preset for each video pit position.

The pit position label includes at least one of: a camera-movement direction, a scene type, a pit position theme, a video theme of the video clip to fill the video pit position, the size and position of a target object in a single video frame of that clip, the size and position of a target object across consecutive video frames of that clip, and the similarity of adjacent video frames in that clip.
The pit music of the video pit can be a fragment of the template music of the whole template, and the template comprises a plurality of video pit positions which are sequentially combined into one template, so that the template music of the template can be split according to the sequence and the duration of the video pit positions, and the pit music corresponding to each video pit position is obtained.
In one embodiment, the template may be divided into a plurality of video pit bits, and each video pit bit needs to be filled with one video clip. The template can be segmented according to template music of the template, and the template can be segmented according to a plurality of segmented video clips.
For example, the template music of the template may be divided into multiple segments at certain time intervals, each segment of music corresponds to one video pit, and the duration of the segment of music is equal to the duration of the video pit.
For another example, the template music may be divided into a plurality of sections according to its musical structure, such as verse, chorus, transition phrases, hook phrases, interludes, and the like; each section of music corresponds to one video pit position, and the duration of the section of music equals the duration of the video pit position.
For another example, the music may be divided into multiple sections according to the rhythm of the template music of the template, where each section of music corresponds to a video pit, and the duration of the section of music is equal to the duration of the video pit.
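As an illustration of splitting the template according to its template music, the sketch below cuts the music at given boundary times (fixed intervals, musical-structure boundaries, or detected beats). The boundary source and the time units are assumptions for illustration.

def split_template_music(total_duration, boundaries):
    """Split template music into pit-position segments.

    boundaries: sorted cut times in seconds, e.g. produced by fixed intervals,
    musical-structure analysis, or beat detection (all assumed inputs).
    Returns (start, end) pairs; each pair defines one video pit position whose
    duration equals the duration of its music segment.
    """
    cuts = [0.0] + [b for b in boundaries if 0.0 < b < total_duration] + [total_duration]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]

# Example: a 30-second template cut at 5 s and 20 s -> pit positions of 5 s, 15 s and 10 s.
print(split_template_music(30.0, [5.0, 20.0]))   # [(0.0, 5.0), (5.0, 20.0), (20.0, 30.0)]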
Specifically, since the time lengths of the obtained multiple video segments may be different when the video material to be processed is divided, the template may be divided according to the time lengths of the video segments, so that the time length of the video pit bit obtained by dividing can be exactly equal to the time length of the video segment.
For example, when the video material to be processed is divided, 30 seconds of the video material to be processed is divided into three video segments, and the time lengths of the three video segments are 5 seconds, 15 seconds and 10 seconds respectively. At this time, the template may be divided into corresponding video pit positions according to the duration of the three video clips, so as to fill the video clips into the corresponding video pit positions.
When the template is segmented according to the duration of the video clips, the template can be segmented according to the wonderful level of the video clips due to the fact that a plurality of video clips exist.
For example, at least one video clip with the highest highlight level is filled in one template, so that at least one video pit with the same duration as the video clip with the highest highlight level can be segmented in the template for filling the video clip with the highest highlight level.
In another embodiment, the template may be segmented according to its shot, scene or scene, etc.
In an embodiment, determining the video clip corresponding to a video pit position according to the pit position information of the video pit positions in the template includes: determining the video clip corresponding to the video pit position according to the pit music of the video pit position in the template.

Since the pit music of a video pit position has its own sense of tempo (for example, a slow tempo or a sudden, intense burst), a video clip can be matched to the pit position according to the rhythm of the pit music and the video content of the clip.

For example, a sudden burst in the music matches well with a forceful picture, such as an erupting fountain, in a video clip. A slow rhythm suits slow character movement, while a grand, sweeping passage suits large-scene time-lapse photography, and so on.
In an embodiment, the determining, according to pit music of video pits in the template, a video segment corresponding to the video pit includes: determining the matching degree of pit music of the video pits in the template and the video clips; and determining the video clip corresponding to the video pit position in the template according to the matching degree.
Matching the rhythm of pit music of the video pit with the video clip, and determining the matching degree between the pit music of the video pit and the video clip, thereby determining the video clip corresponding to the video pit in the template according to the matching degree.
Specifically, a matching-degree threshold may be set, and when the matching degree exceeds the threshold, the video pit position is determined to match the video clip; alternatively, the video clip with the highest matching degree for a video pit position may be selected as the video clip corresponding to that video pit position.
In an embodiment, the matching degree of the pit music of the video pit in the template and the video segment is obtained by using a pre-trained music matching model, and the music matching model can output the matching degree score of the pit music of the video pit in the template and the video segment.
A neural network is trained to learn the matching degree between pit music and video clips, and the trained network is then used as the music matching model to output the matching degree between the pit music of a video pit position and candidate video clips.
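A sketch of using such a matching degree to pick the clip for one pit position is shown below; the score interface of the music matching model, the threshold, and the clip representation are hypothetical.

def pick_clip_for_pit(pit_music, clips, music_match_model, threshold=None):
    """Choose the video clip for one video pit position from music/clip match scores.

    music_match_model is assumed to expose score(pit_music, clip) -> float,
    the matching-degree score described above (a hypothetical interface).
    """
    scored = [(music_match_model.score(pit_music, clip), clip) for clip in clips]
    if threshold is not None:
        scored = [sc for sc in scored if sc[0] >= threshold]   # keep clips above threshold
        if not scored:
            return None                                        # no clip matches this pit position
    return max(scored, key=lambda sc: sc[0])[1]                # highest-scoring clip wins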
In an embodiment, after the video clips corresponding to the video pit positions are determined according to the pit position information of the video pit positions in the template, the shooting quality of the plurality of video clips corresponding to a video pit position in the template is determined; the optimal video clip corresponding to the video pit position is determined according to the shooting quality of the plurality of video clips; and the matching relation corresponding to the template is obtained from the optimal video clip corresponding to each video pit position in the template.
Wherein the shooting quality of the video clip is determined according to the image content of the video clip and the video clip evaluation. The image content includes shot stability, color saturation, whether there are main shooting objects and the information content in the shot, and the like, and the evaluation of the video clip includes the aesthetic scoring of the video clip.
In particular, the aesthetic scoring of the video segments may take into account factors such as color, composition, mirror motion, and scene type to aesthetically score the video segments. And scoring the shooting quality of the video clip based on the image content of the video clip and the video clip evaluation, wherein the larger the score is, the higher the shooting quality of the video clip is.
And selecting a video clip with the highest shooting quality from the video clips as the optimal video clip corresponding to the video pit position in the template according to the shooting quality of the plurality of video clips, thereby obtaining the matching relation corresponding to the template.
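A minimal sketch of selecting the optimal clip by shooting quality follows; the per-factor scores, attribute names, and weights are illustrative assumptions.

def best_clip_by_quality(clips, weights=(0.3, 0.2, 0.2, 0.3)):
    """Pick the candidate clip with the highest shooting-quality score.

    Each clip is assumed to carry pre-computed scores in [0, 1] for shot stability,
    color saturation, subject presence, and aesthetics (hypothetical attributes).
    """
    def quality(clip):
        factors = (clip.stability, clip.saturation, clip.subject_score, clip.aesthetic_score)
        return sum(w * f for w, f in zip(weights, factors))   # weighted sum as the score

    return max(clips, key=quality)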
In an embodiment, after the video clips corresponding to the video pit positions are determined according to the pit position information of the video pit positions in the template, the matching degree between the video clips corresponding to two adjacent video pit positions in the template is determined; the optimal video clips corresponding to the video pit positions are determined according to this matching degree; and the matching relation corresponding to the template is obtained from the optimal video clips corresponding to the video pit positions.

The matching degree between the video clips corresponding to two adjacent video pit positions is determined according to the continuity of the camera-movement direction of the clips, the incremental or decremental progression of the scene scale, and match cutting.

Continuity of the camera-movement direction means that the camera-movement directions of the clips filling two adjacent video pit positions are the same, avoiding joining clips whose camera movements are opposite. The incremental or decremental progression of the scene scale includes, for example, going from a long shot to a medium shot to a close-up, from a close-up to a medium shot to a long shot, or directly from a long shot to a close-up, and so on. Match cutting joins two shots through similar actions, graphics, colors, and the like, so that the transition between the two video clips reads as a coherent, fluent narrative.
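The sketch below scores the compatibility of clips filled into two adjacent pit positions using the three cues just described; the attribute names, scene-scale ranks, and weights are assumptions.

def adjacency_score(prev_clip, next_clip):
    """Compatibility score for clips in two adjacent video pit positions.

    Each clip is assumed to expose a camera-movement direction label, a scene-scale
    rank (0 = long shot, 1 = medium shot, 2 = close-up) and a set of match-cut
    features; all three attributes are hypothetical.
    """
    score = 0.0
    if prev_clip.move_direction == next_clip.move_direction:
        score += 1.0                                  # consistent camera-movement direction
    if prev_clip.scene_rank != next_clip.scene_rank:
        score += 1.0                                  # scene scale progresses (long/medium/close)
    score += 0.5 * len(prev_clip.match_features & next_clip.match_features)  # match-cut cues
    return score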
In an embodiment, determining the video clip corresponding to a video pit position according to the pit position information of the video pit positions in the template includes: determining the video clip corresponding to the video pit position according to the pit position label of the video pit position in the template.

Each video pit position in the template carries a pit position label; the pit position label of the video pit position can be matched against the labels of the video clips, and a successfully matched video clip is used as the video clip corresponding to that video pit position.
In an embodiment, determining the video clip corresponding to the video pit position according to the pit position label of the video pit position in the template includes: determining the video labels of the video clips, and using the video clip whose video label matches the pit position label of the video pit position as the video clip to fill that video pit position.

Label extraction is performed on the video clips to determine their video labels, and the video clips corresponding to the video pit positions are then determined according to the pit position labels of the video pit positions of the template.
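A tiny sketch of label matching is given below; modelling both the pit position label and the clips' video labels as sets of strings is an assumption for illustration.

def clips_matching_pit_label(pit_label, clips):
    """Return the clips whose extracted video labels overlap a pit-position label.

    pit_label: set of label strings for one video pit position.
    Each clip is assumed to carry a set attribute video_labels (hypothetical).
    """
    return [clip for clip in clips if pit_label & clip.video_labels]

# Example with hypothetical labels:
# clips_matching_pit_label({'long shot', 'snow mountain'}, clips)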
S103, filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
And after the matching relation is obtained, respectively filling the video clips into corresponding video pit positions of the template according to the matching relation, and carrying out video synthesis to obtain a recommended video. And recommend the recommended video to the user.
In an embodiment, step S103 specifically includes determining whether the video duration of a video clip is greater than the duration of its video pit position; if the video duration of the video clip is greater than the duration of the video pit position, segment extraction is performed on the video clip to obtain a selected segment.

When a video clip is filled into its corresponding video pit position according to the matching relation, it is determined whether the video duration of the clip is greater than the duration of the video pit position. When it is, the video clip cannot be filled into the video pit position directly, and a segment of the corresponding duration needs to be extracted from the clip and filled into the video pit position.

The video duration of the selected segment is less than or equal to the duration of the video pit position. In a specific implementation, to fill the selected segment into the corresponding video pit position while ensuring the completeness of the resulting recommended video, the video duration of the selected segment may be made equal to the duration of the video pit position.
In an embodiment, the extracting the video segments to obtain the selected segments includes: and according to the video elements of the video clips, carrying out clip extraction on the video clips to obtain selected clips.
When the video clip is extracted, the video clip can be extracted according to the video elements of the video clip to obtain the selected clip.
The video elements include at least one of a smiling face picture, laughing audio, character motion, clear human voice, picture composition, and an aesthetic score. When extracting the selected clip, a more highlight-worthy segment may be extracted from the video clip according to the video elements, for example, a segment containing a smiling face picture or a segment with a higher aesthetic score.
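For illustration only, the following minimal Python sketch extracts a selected clip of the pit duration by sliding a window over a longer clip and keeping the highest-scoring window; the per-second element scores, the one-second step, and the function names are assumptions rather than part of the described method.

```python
def extract_selected_clip(clip_duration, pit_duration, per_second_score, step=1.0):
    """Return (start, end), in seconds, of the selected clip to fill the pit position."""
    if clip_duration <= pit_duration:
        return 0.0, clip_duration                      # the whole clip is used
    best_start, best_score = 0.0, float("-inf")
    start = 0.0
    while start + pit_duration <= clip_duration:
        # Sum the assumed per-second element scores (smiling faces, aesthetics, ...)
        # inside the candidate window.
        score = sum(per_second_score(t) for t in range(int(start), int(start + pit_duration)))
        if score > best_score:
            best_start, best_score = start, score
        start += step
    return best_start, best_start + pit_duration

# Example: a 12 s clip for a 4 s pit position; seconds 5-8 contain a smiling face.
scores = {t: (2.0 if 5 <= t <= 8 else 0.5) for t in range(12)}
print(extract_selected_clip(12.0, 4.0, scores.get))    # -> (5.0, 9.0)
```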
In an embodiment, step S103 is specifically to fill the video segment into a corresponding video pit of the template according to the matching relationship corresponding to the template, so as to obtain an initial video; and performing image optimization on the initial video based on the template requirement of the template to obtain a recommended video.
And filling the video clips into corresponding video pit positions according to the matching relation corresponding to the template to obtain an initial video, then carrying out image optimization on the initial video according to the template requirement, and recommending the video after the image optimization to a user as a recommended video. The template requirements comprise at least one of transition setting, acceleration and deceleration setting and mapping special effect setting.
In an embodiment, the source of the video material to be processed may be aerial video shot by an unmanned aerial vehicle. During aerial shooting the distance between the camera and the shot object is relatively long, so the picture changes only slowly. Therefore, when the aerial video is filled into the corresponding video pit position, the speed of picture change can be automatically identified, the playback speed of the aerial video can be automatically adjusted according to the speed of picture change, and the speed-adjusted aerial video is then filled into the corresponding video pit position. The speed of picture change can be obtained by analyzing a plurality of consecutive frames within a preset period.
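For illustration only, the following minimal Python sketch estimates the picture-change speed of an aerial clip from consecutive frames and maps it to a playback speed factor; the thresholds, the 4x/2x factors, and the frame-difference measure are assumptions, not part of the described method.

```python
import numpy as np

def picture_change_speed(frames):
    """frames: list of greyscale frames as numpy arrays of equal shape."""
    diffs = [np.mean(np.abs(b.astype(float) - a.astype(float)))
             for a, b in zip(frames, frames[1:])]
    return float(np.mean(diffs))

def speed_factor(change_speed, slow=2.0, fast=10.0):
    if change_speed < slow:
        return 4.0        # very static aerial footage: play it 4x faster
    if change_speed < fast:
        return 2.0
    return 1.0            # already dynamic: keep the original speed

# Four nearly identical frames -> small change per frame -> speed up 4x.
frames = [np.full((4, 4), v, dtype=np.uint8) for v in (10, 11, 12, 13)]
print(speed_factor(picture_change_speed(frames)))   # -> 4.0
```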
In one embodiment, when determining the matching relationship between the video pit positions of the template and the video clips, for the identified aerial videos, the aerial videos can be placed in the first few video pit positions and/or the last few video pit positions of the template, so that the quality of the obtained recommended videos is improved.
The video processing method provided in the above embodiment obtains a plurality of video segments by segmenting the video material to be processed, then determines, according to the pit position information of the video pit positions of a template, the video segment to be filled into each video pit position of the template to obtain the matching relationship corresponding to the template, and finally fills the video segments into the corresponding video pit positions according to the matching relationship to obtain a recommended video. Segmenting the video material to be processed into a plurality of video segments of shorter duration allows the video segments to be smoothly filled into the video pit positions to synthesize the recommended video.
Referring to fig. 6, fig. 6 is a flowchart illustrating steps of another video processing method according to an embodiment of the present disclosure.
Specifically, as shown in fig. 6, the video processing method includes steps S201 to S203.
S201, constructing a stream network diagram according to the video clips of the video material to be processed and the video pit positions of the template.
The video material to be processed comprises a plurality of video segments, the template comprises at least one video pit position, and each video pit position is filled with one video segment.
In an embodiment, the video segmentation method may also be used to segment a video material to be processed into a plurality of video segments.
The stream network graph comprises a plurality of nodes, and each node corresponds to the matching relation between one video clip and one video pit bit.
The stream network graph is constructed according to the video segments and the template pit positions, as shown in fig. 7, which is a schematic diagram of the constructed stream network graph. The left vertical axis Cm represents the video clips, and the upper horizontal axis Sn represents the video pit positions of the template. A source node and a destination node are added to the flow network diagram: the S node is the source node and the T node is the destination node. In the flow network diagram, coordinates (x, y) denote the node at horizontal position x and vertical position y. For example, N(1, 1) represents the upper-left node (S1, C1), and N(n, m) represents the lower-right node (Sn, Cm).
The number of templates may be one or more. When the number of the templates is one, a flow network graph is constructed, and when the number of the templates is multiple, a flow network graph is constructed for each template.
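For illustration only, the following minimal Python sketch builds such a flow network graph as plain tuples, with one node per (pit position, clip) pair plus a source and a destination; the data structures and names are assumptions, not part of the described method.

```python
# One node per (video pit position S_x, video clip C_y) pair, plus the source
# node S and the destination node T as in fig. 7. Edges go from S into the
# first pit-position column, between every pair of nodes in adjacent columns,
# and from the last column to T.
def build_flow_network(num_pits, num_clips):
    nodes = ["S"] + [(x, y) for x in range(1, num_pits + 1)
                             for y in range(1, num_clips + 1)] + ["T"]
    edges = []
    for y in range(1, num_clips + 1):
        edges.append(("S", (1, y)))              # source -> first column N(1, y)
        edges.append(((num_pits, y), "T"))       # last column N(n, y) -> destination
    for x in range(1, num_pits):                 # adjacent columns N(x, *) -> N(x+1, *)
        for y1 in range(1, num_clips + 1):
            for y2 in range(1, num_clips + 1):
                edges.append(((x, y1), (x + 1, y2)))
    return nodes, edges

nodes, edges = build_flow_network(num_pits=3, num_clips=4)
print(len(nodes), len(edges))   # -> 14 nodes, 40 edges for 3 pit positions and 4 clips
```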
S202, determining the matching relation between the video clip and the video pit bit based on the stream network diagram.
After the flow network graph is drawn, the matching relationship between the video clips in the template and the video pit positions can be determined based on each node in the flow network graph. And taking a source node in the flow network diagram as a starting point of a path, wherein the path passes through a node corresponding to each video pit bit, and taking an end point of the flow network diagram as an end point of the path, wherein the path is a matching relation between the video clip in the template and the video pit bit.
In an embodiment, the determining, based on the stream network map, a matching relationship between the video segment and the video pit bit includes: matching appropriate video clips for the video pit positions of the template based on a maximum flow algorithm to obtain an optimal path; and taking the corresponding relation between the video clip and the video pit position in the optimal path as the matching relation between the video pit position of the template and the video clip.
Referring to fig. 7, the arrows in fig. 7 indicate the paths between two adjacent nodes in the flow network diagram. One path from the source node S to the destination node T is a matching relationship between the video clip and the video pit bit.
The maximum flow algorithm means that, for a template, the n video pit positions S1 to Sn are filled in order with appropriate video segments so that the total energy value of the whole path from the source node S to the destination node T is maximized. Matching appropriate video clips to the video pit positions of the template with the maximum flow algorithm converts the problem of selecting video clips into the problem of finding the maximum-energy path from the source node S to the destination node T.
In an embodiment, the matching of the video pit bits of the template with appropriate video segments based on the max flow algorithm to obtain an optimal path includes: and determining the optimal path corresponding to the template according to the energy value between two adjacent nodes in the flow network graph.
The energy value of any segment of a path in the flow network graph can be denoted V, where such a segment connects two adjacent nodes. For example, V((x, y), (x+1, y+k)) represents the energy value from node (x, y) to node (x+1, y+k). The optimal path corresponding to the template is determined by calculating the energy value between every two adjacent nodes on each path from the source node S to the destination node T and taking the path with the maximum total energy value as the optimal path.
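Because every path visits exactly one node per pit-position column, the maximum-energy path can be found column by column. The following minimal Python sketch illustrates this, assuming an energy callback that supplies the V values and treating the final hop into the destination node T as zero energy; the function and variable names are illustrative only.

```python
def best_path(num_pits, num_clips, energy):
    # best[y] = (total energy of the best partial path ending at node (x, y), that path)
    best = {y: (energy("S", (1, y)), [(1, y)]) for y in range(1, num_clips + 1)}
    for x in range(2, num_pits + 1):
        new_best = {}
        for y in range(1, num_clips + 1):
            candidates = [(e + energy((x - 1, py), (x, y)), path + [(x, y)])
                          for py, (e, path) in best.items()]
            new_best[y] = max(candidates, key=lambda c: c[0])
        best = new_best
    # The hop from the last column into the destination node T is taken as zero energy here.
    return max(best.values(), key=lambda c: c[0])   # (total energy, optimal path)

# Toy example with 2 pit positions and 2 clips.
toy = {("S", (1, 1)): 3, ("S", (1, 2)): 1,
       ((1, 1), (2, 1)): 0, ((1, 1), (2, 2)): 4,
       ((1, 2), (2, 1)): 2, ((1, 2), (2, 2)): 1}
print(best_path(2, 2, lambda a, b: toy[(a, b)]))   # -> (7, [(1, 1), (2, 2)])
```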
In one embodiment, the energy value between two adjacent nodes may be determined according to the energy value influence factor of each of the nodes.
The energy value influence factor comprises at least one of the shooting quality of the video segment corresponding to each video pit bit, the matching degree of each video pit bit and the corresponding video segment, and the matching degree of the video segments corresponding to two adjacent video pit bits.
Specifically, the shooting quality of the video segment corresponding to each video pit bit is determined according to the image content of the video segment and the video segment evaluation.
The image content includes shot stability, color saturation, whether there is a main shooting subject, the amount of information in the shot, and the like; the video clip evaluation includes an aesthetic score of the video clip. The aesthetic score of a video segment may take into account factors such as color, composition, mirror motion, and scene. The shooting quality of the video clip is scored based on the image content of the video clip and the video clip evaluation, and the larger the score, the higher the shooting quality of the video clip.
Specifically, the matching degree of each video pit and the corresponding video segment is determined according to the matching degree of the pit music of the video pit and the video segment.
Template music is preset in the template, and the pit music of a video pit position can be a fragment of the template music of the whole template. Because the template comprises a plurality of video pit positions combined in sequence, the template music of the template can be split according to the order and duration of the video pit positions to obtain the pit music corresponding to each video pit position.
The music itself has rhythm, and the section of pit music corresponding to each video pit in the template also has corresponding rhythm, for example, the pit music is a slow rhythm or a burst rhythm. Therefore, the template pit bit and the video clip can be matched according to the rhythm of the pit bit music and the video content of the video clip.
For example, if a piece of music has a burst rhythm and a video clip contains a powerful picture such as a fountain, the two match well. By contrast, a slow rhythm suits slow movement of a character, and a grand, sweeping picture suits large-scene time-lapse photography, and so on.
In an embodiment, the matching degree of each video pit and the corresponding video segment is obtained by using a pre-trained music matching model, and the music matching model is capable of outputting a matching degree score of the pit music of the video pit and the video segment.
The matching degree between the pit music and the video clip can be learned through training the neural network, and then the trained neural network is used as a music matching model to output the matching degree score between the pit music and the video clip of the video pit.
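For illustration only, the following minimal Python sketch shows how such a music matching model might be queried once trained; the feature normalisation and the logistic scorer are placeholder assumptions standing in for the pre-trained neural network described above.

```python
import math

def tempo_feature(pit_music_bpm):
    return pit_music_bpm / 200.0                        # crude normalisation (assumed)

def motion_feature(clip_mean_optical_flow):
    return min(clip_mean_optical_flow / 10.0, 1.0)      # crude normalisation (assumed)

def music_match_score(bpm, mean_flow, w=4.0, b=-2.0):
    # Higher score when fast music meets fast-moving footage (and vice versa).
    agreement = 1.0 - abs(tempo_feature(bpm) - motion_feature(mean_flow))
    return 1.0 / (1.0 + math.exp(-(w * agreement + b)))  # squash to a 0..1 score

print(round(music_match_score(bpm=170, mean_flow=9.0), 2))  # burst rhythm + fountain-like motion -> about 0.86
print(round(music_match_score(bpm=60, mean_flow=9.0), 2))   # slow rhythm + fast motion -> about 0.40
```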
The matching degree of the video segments corresponding to two adjacent video pit positions is determined according to the consistency of the mirror moving direction of the video segments, the incremental and decremental relationship of the scene level, and the matching clip.
Consistency of the mirror moving direction means that the mirror moving directions of the video clips corresponding to two adjacent video pit positions are the same, so that video clips with opposite mirror moving directions are not joined together. The incremental and decremental relationship of the scene level includes, for example, progressing from a distant view to a middle view to a near view, from a near view to a middle view to a distant view, or directly from a distant view to a near view, and so on. A matching clip joins two shots by similar actions, graphics, colors, and the like, so that the transition between the two video segments forms a coherent, fluent narrative.
In an embodiment, the matching degree of the video segments corresponding to the two adjacent video pit positions is obtained by using a pre-trained segment matching model, and the segment matching model can output the matching degree of the video segments filled in the two adjacent video pit positions.
The matching degree between the video segments corresponding to the two adjacent video pit positions can be learned through training the neural network, and then the neural network obtained through training is used as a segment matching model to output the matching degree score between the video segments corresponding to the two adjacent video pit positions.
In an embodiment, the determining an energy value between two adjacent nodes according to the energy value influence factor of each node includes: obtaining an evaluation score and a preset weight of the energy value influence factor; and determining the energy value between two adjacent nodes according to the evaluation score and the preset weight of the energy value influence factor.
For each node in the flow network graph, the evaluation score and the corresponding preset weight of the energy value influence factor of each node are obtained, and therefore the energy value between two adjacent nodes is determined. The preset weight includes weights corresponding to different energy value influence factors, and may be preset according to an empirical value.
For example, the energy value V in any path in the flow network graph is calculated by the following formula:
V = a*Eclip + b*Etemplate + c*Ematch
wherein Eclip represents the score of the shooting quality of the video segment corresponding to the video pit position, and a represents the preset weight corresponding to that score; Etemplate represents the matching degree of the video pit position and the corresponding video segment, and b represents the preset weight corresponding to that matching degree; Ematch represents the sum of the matching degrees of the video segments corresponding to two adjacent video pit positions, and c represents the preset weight corresponding to that sum.
It should be noted that, when calculating the energy value between two adjacent nodes, if the duration of a video segment is less than the duration of a filled video pit bit, the energy value of the node is set to 0.
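For illustration only, the following minimal Python sketch computes such a per-node energy value, including the rule that a clip shorter than its pit position gets zero energy; the weight values and example scores are assumptions.

```python
def node_energy(e_clip, e_template, e_match, clip_duration, pit_duration,
                a=0.4, b=0.3, c=0.3):
    # A clip shorter than its pit position cannot fill it: the node energy is 0.
    if clip_duration < pit_duration:
        return 0.0
    return a * e_clip + b * e_template + c * e_match

# A well-shot, well-matched clip versus a clip that is too short for its pit position.
print(node_energy(0.9, 0.8, 0.7, clip_duration=6, pit_duration=4))   # -> about 0.81
print(node_energy(0.9, 0.8, 0.7, clip_duration=3, pit_duration=4))   # -> 0.0
```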
And S203, filling the video segments into corresponding video pit positions of the template according to the matching relation to obtain a recommended video.
After the matching relation between the video clip and the video pit positions is determined, the video clip is filled into the corresponding video pit positions according to the matching relation, so that a recommended video is synthesized, and the recommended video is recommended to a user.
In an embodiment, step S203 is specifically to determine whether a video duration of the video clip is greater than a duration of the video pit bit; and if the video duration of the video clip is greater than the duration of the video pit bit, performing clip extraction on the video clip to obtain a selected clip.
When the video clip is filled into the corresponding video pit position according to the matching relation, it is first judged whether the video duration of the video clip is greater than the duration of the video pit position. If the video duration of the video clip is greater than the duration of the video pit position, the video clip cannot be directly filled into the corresponding video pit position, and a clip of the corresponding duration needs to be extracted from the video clip and filled into the corresponding video pit position.
The video duration of the selected clip is less than or equal to the duration of the video pit position. In a specific implementation, to ensure that the selected clip fills the corresponding video pit position and the obtained recommended video is complete, the video duration of the selected clip can be made equal to the duration of the video pit position.
In an embodiment, the extracting the video segments to obtain the selected segments includes: and according to the video elements of the video clips, carrying out clip extraction on the video clips to obtain selected clips.
When the video clip is extracted, the video clip can be extracted according to the video elements of the video clip to obtain the selected clip.
The video elements include at least one of a smiling face picture, laughing audio, character motion, clear human voice, picture composition, and an aesthetic score. When extracting the selected clip, a more highlight-worthy segment may be extracted from the video clip according to the video elements, for example, a segment containing a smiling face picture or a segment with a higher aesthetic score.
In an embodiment, step S203 specifically includes filling the video segment into a corresponding video pit of the template according to the matching relationship, so as to obtain an initial video; and performing image optimization on the initial video based on the template requirement of the template to obtain a recommended video.
And filling the video clips into corresponding video pit positions according to the matching relation to obtain an initial video, then performing image optimization on the initial video according to the template requirement, and recommending the video subjected to image optimization to a user as a recommended video. The template requirements comprise at least one of transition setting, acceleration and deceleration setting and mapping special effect setting.
In an embodiment, the source of the video material to be processed may be aerial video shot by an unmanned aerial vehicle. During aerial shooting the distance between the camera and the shot object is relatively long, so the picture changes only slowly. Therefore, when the aerial video is filled into the corresponding video pit position, the speed of picture change can be automatically identified, the playback speed of the aerial video can be automatically adjusted according to the speed of picture change, and the speed-adjusted aerial video is then filled into the corresponding video pit position. The speed of picture change can be obtained by analyzing a plurality of consecutive frames within a preset period.
In one embodiment, when determining the matching relationship between the video pit positions of the template and the video clips, for the identified aerial videos, the aerial videos can be placed in the first few video pit positions and/or the last few video pit positions of the template, so that the quality of the obtained recommended videos is improved.
The embodiment provides a video processing method, which constructs a stream network graph from the video segments and the template pit positions, determines the matching relationship between the video segments and the video pit positions based on the stream network graph, and finally fills the video segments into the corresponding video pit positions of the template according to the matching relationship to obtain a recommended video. Determining the matching relationship by constructing a stream network graph models the video clip selection problem as a maximum flow problem, which improves the accuracy and convenience of the determined matching relationship between the video clips and the video pit positions.
Referring to fig. 8, fig. 8 is a flowchart illustrating steps of another video processing method according to an embodiment of the present disclosure.
Specifically, as shown in fig. 8, the video processing method includes steps S301 to S303.
S301, determining a template according to the video information of the video material to be processed.
The template at least comprises one video pit position, the video information of the video material to be processed comprises the video content of the video material to be processed, and the corresponding template is determined according to the video content of the video material to be processed. In a specific implementation process, the number of the templates may be one or multiple.
Specifically, as shown in fig. 9, the to-be-processed video material may include video materials from various sources, such as video material captured through a handheld terminal, video material captured through a movable platform, video material acquired from a cloud server, video material acquired from a local server, and the like.
The handheld terminal can be, for example, a mobile phone, a tablet, or an action camera, and the movable platform can be, for example, an unmanned aerial vehicle. The unmanned aerial vehicle can be a rotary-wing unmanned aerial vehicle, for example a quadrotor, hexarotor, or octorotor unmanned aerial vehicle, or a fixed-wing unmanned aerial vehicle. The unmanned aerial vehicle carries camera equipment.
Video materials from different sources are aggregated and edited together through mixed clipping, which increases the diversity of the obtained recommended video.
In an embodiment, the source of the video material to be processed may be aerial video shot by the unmanned aerial vehicle. When the video material to be processed includes aerial video and the template is determined according to its video information, whether the video material to be processed is aerial video material may be identified first; when it is identified as aerial video material, a template with an aerial theme is matched for it, thereby completing the determination of the template.
In an embodiment, when the template is determined according to the video information of the video material to be processed, the template features of the template and the video features of the video material to be processed may be extracted, and the template is determined for the video material to be processed by matching the template features with the video features.
The template features may be a template theme, a mirror moving direction, a scene type of the template, a size and a position of a target object of a single video frame in the to-be-processed video material to be filled in the template, a size and a position of a target object of a continuous video frame in the to-be-processed video material to be filled in the template, and a similarity between adjacent video frames in the to-be-processed video material to be filled in the template.
The video characteristics of the video material to be processed may include a video theme, a moving direction of a mirror, a scene type, a size and a position of a target object of a single video frame in the video material to be processed, a size and a position of a target object of consecutive video frames in the video material to be processed, a similarity of adjacent video frames in the video material to be processed, and the like.
For example, the feature matching model may be trained in advance to match the template features with the video features, and the template may be determined for the to-be-processed video material according to the matching result output by the feature matching model.
In an embodiment, referring to fig. 10, step S301 includes step S3011 and step S3012.
S3011, determining a video label of the video material to be processed according to video information of the video material to be processed.
And extracting the video label of the video material to be processed, and determining the video label of the video material to be processed according to the video information of the video material to be processed.
The video label comprises at least one of a moving direction of the mirror, a scene, the size and the position of a target object of a single video frame in the video material to be processed, the size and the position of a target object of a continuous video frame in the video material to be processed, and the similarity of adjacent video frames in the video material to be processed.
Specifically, for the mirror moving direction, the change of the mirror moving direction in the video material to be processed can be judged with algorithms for detecting the mirror moving direction. The scene type depends on the video content of the video material to be processed: for characters, the scene types include a panorama, a middle scene, a close scene, and a close-up; for objects, the scene types include a long scene and a close scene. The size and position of the target object in a single video frame of the video material to be processed are determined with an object detection algorithm or a saliency detection algorithm. The size and position of the target object in consecutive video frames of the video material to be processed are determined with a pre-trained neural network model.
S3012, determining a plurality of templates matched with the video tags according to the video tags of the video material to be processed.
The template includes a template tag, which may be pre-set when the template is set. After the video tags of the video materials to be processed are obtained, matching can be carried out according to the video tags and the template tags of the video materials to be processed, and therefore the template is matched for the video materials to be processed.
In one embodiment, referring to fig. 11, the step of determining a plurality of matching templates according to the video tag includes steps S3012a and S3012 b. S3012a, determining a video theme corresponding to the video material to be processed according to the video label; s3012b, determining a plurality of templates matched with the video subjects according to the video subjects of the video materials to be processed.
And determining a video theme corresponding to the video material to be processed according to the extracted video label. The video theme can be determined by video tags of a single video frame and/or consecutive video frames in the video material to be processed, for example, if the target object in the consecutive video frames is a tower, the video theme corresponding to the video material to be processed is determined to be a trip.
The video theme may be a theme major class, or may include minor classes under the theme major class. For example, a video topic may be travel, gourmet, parent, etc., and under the broad category of travel topics, a video topic may also be travel-nature scene, travel-city, travel-cultural heritage, etc.
The template tag may be a template theme. After the theme of the video material to be processed is determined, the theme of the video material to be processed is matched with the template theme, and therefore a plurality of templates matched with the video theme are determined.
In an embodiment, if the video theme corresponding to the to-be-processed video material cannot be determined, a preset template is selected as the template corresponding to the to-be-processed video material.
The preset template can be a universal template, that is, a template applicable to various scenes. The case where the video theme corresponding to the video material to be processed cannot be determined includes the case where the video theme is not clearly identified, and the case where no template corresponds to the video theme of the video material to be processed.
In an embodiment, the determining, according to the video topic of the video material to be processed, a plurality of templates matching with the video topic includes: and determining a template corresponding to the video material to be processed from a plurality of templates matched with the video theme according to the template influence factors of the templates.
And after the templates are roughly screened according to the video topics and a plurality of templates matched with the video topics are determined, secondarily screening the templates according to the template influence factors of the templates, and thus determining the templates corresponding to the video materials to be processed. The template corresponding to the video material to be processed determined by the secondary screening may be one or multiple.
The template influence factor comprises at least one of a music matching degree, a template popularity, and a user preference degree.
Specifically, template music is preset in each template, the template music is matched with the video materials to be processed, and the matching degree score between the template music and the video materials to be processed is determined. The music matching degree is obtained according to a pre-trained music recommendation network model, and the music recommendation network model can output matching degree scores of template music of a plurality of templates matched with the video subjects and the video materials to be processed.
The template popularity is determined according to the frequency and/or the number of praise used by a plurality of templates matching the video theme. The usage frequency and/or the number of praise of all the users using the templates for each template can be obtained, and the template popularity is determined according to the selection condition of all the users using the templates for the templates.
The user preference is determined according to the frequency and/or satisfaction degree of the user on selecting a plurality of templates matched with the video theme. After the user uses the template for multiple times, the use frequency and/or the satisfaction degree of the user on each template are/is obtained according to the use habit of the user, and therefore the user preference degree is determined.
In an embodiment, the determining, according to the template influence factor of the template, a template corresponding to the to-be-processed video material from a plurality of templates matched with the video theme includes: obtaining an evaluation score and a preset weight of the template influence factor; determining template scores of a plurality of templates matched with the video theme according to the evaluation scores and preset weights of the template influence factors; and determining a template corresponding to the video material to be processed according to the template score.
And obtaining the evaluation score and the corresponding preset weight of the template influence factor of each template so as to determine the template score of each template, and determining the template corresponding to the video material to be processed from the plurality of templates according to the template score of each template. The preset weight includes weights corresponding to different template influence factors, and may be preset according to an empirical value.
For example, the formula for calculating the template score for each template is:
M = A*Emusic + B*Etemplate + C*Euser
wherein Emusic represents the score of the music matching degree, and A represents the preset weight corresponding to the music matching degree score; Etemplate represents the template popularity, and B represents the preset weight corresponding to the template popularity; Euser represents the user preference degree, and C represents the preset weight corresponding to the user preference degree.
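For illustration only, the following minimal Python sketch scores candidate templates with this formula and picks the highest-scoring one; the weights, template names, and component scores are assumed values.

```python
def template_score(e_music, e_template, e_user, A=0.5, B=0.3, C=0.2):
    # Weighted sum of music matching degree, template popularity, and user preference.
    return A * e_music + B * e_template + C * e_user

candidates = {
    "travel-nature": {"e_music": 0.9, "e_template": 0.6, "e_user": 0.4},
    "travel-city":   {"e_music": 0.5, "e_template": 0.9, "e_user": 0.8},
}
best = max(candidates, key=lambda name: template_score(**candidates[name]))
print(best)   # -> 'travel-nature', the template with the highest weighted score
```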
In one embodiment, material selection is performed on video material to be processed.
Specifically, materials to be edited are selected from the video materials to be processed by material selection of the video materials to be processed, so that a template is determined according to the selected materials to be edited, and a recommended video is generated.
In one embodiment, the selecting the video material to be processed includes: selecting the materials according to the material parameters of the video materials to be processed; the material parameters comprise at least one of shooting time, shooting place and shooting target object.
Specifically, when material selection is performed based on the material parameters, the material parameters may be set by the user. For example, the user may want to select, from the video materials to be processed, the video materials shot during the three days from May 1 to May 3, the video materials shot between six and ten in the evening, the video materials shot at a shooting spot such as Xishuangbanna, or the video materials in which a cat was shot.
When material selection is performed according to the shooting time, the shooting time recorded by the handheld terminal or the movable platform when the video was shot can be read, for example. The selection can also be made from the shot content itself, by inferring the approximate time period of the shooting from cues in the captured material such as ambient lighting and whether lights or billboards are lit.
When the video material to be processed is selected according to the shooting place, for example, under the condition that the GPS positioning service is started, the GPS information during shooting can be recorded during shooting the video, and the video material to be processed can be screened according to the GPS information recorded during shooting.
When the video materials to be processed are selected according to the material parameters, the user can input a time or place, and the materials are selected according to the input time or place. Alternatively, the user can select one or more video materials to be processed, the correlation among the selected video materials is analyzed, and the materials are selected according to the analyzed correlation.
In one embodiment, the selecting the video material to be processed includes: clustering the video materials to be processed according to the material parameters of the video materials to be processed so as to realize material selection; wherein the clustering comprises at least one of time clustering, place clustering and target object clustering.
When the material selection is carried out according to the material parameters of the video material to be processed, the material selection can be carried out in a clustering mode. For example, clustering selection can be performed through at least one clustering mode of time clustering, place clustering and target object clustering, so that a user can conveniently and quickly select a large batch of materials, and the time for the user to select the materials is saved.
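For illustration only, the following minimal Python sketch groups materials by time clustering using a simple gap threshold; the two-hour threshold, the file names, and the timestamps are assumed values.

```python
from datetime import datetime, timedelta

def cluster_by_time(materials, max_gap=timedelta(hours=2)):
    """materials: list of (name, capture_time); returns a list of clusters."""
    ordered = sorted(materials, key=lambda m: m[1])
    clusters, current = [], [ordered[0]]
    for item in ordered[1:]:
        if item[1] - current[-1][1] <= max_gap:
            current.append(item)          # close in time: same cluster (e.g. one outing)
        else:
            clusters.append(current)
            current = [item]
    clusters.append(current)
    return clusters

shots = [("beach.mp4", datetime(2021, 5, 1, 9, 0)),
         ("lunch.mp4", datetime(2021, 5, 1, 10, 30)),
         ("sunset.mp4", datetime(2021, 5, 3, 19, 0))]
print([len(c) for c in cluster_by_time(shots)])   # -> [2, 1]
```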
In one embodiment, the selecting the video material to be processed includes: and selecting the video material to be processed according to the selection operation of the user.
The video materials to be edited can be selected according to the self-defined selection of the user. In the specific implementation process, all the materials to be processed can be presented to a user, and the user can select the materials needing to be edited in a self-defined mode from all the materials to be processed.
In addition, after the video materials to be processed are selected or clustered according to the material parameters, the selected or clustered results can be presented to the user, and the user can select the materials to be edited in a self-defined manner from the selected or clustered results.
In addition, in an embodiment, the video materials can be selected or clustered according to the selection preference of the user history on the video materials, so that personalized use experience is provided for the user.
In an embodiment, the video material to be processed may be further segmented according to video information of the video material to be processed to generate a plurality of video segments. The video segmentation approach provided by the above embodiments may be employed, for example.
In an embodiment, the image quality of the video material to be processed can also be obtained, and waste footage can be removed from the video material to be processed according to its image quality.
The image quality includes at least one of picture shake, picture blur, picture overexposure, picture underexposure, no clear scene in the image, and no clear subject in the image. After the video material to be processed is obtained, quality detection is performed on its video images to judge whether at least one of picture shake, picture blur, picture overexposure, picture underexposure, no clear scene in the image, or no clear subject in the image appears; if so, that part is regarded as waste footage and is removed from the video material to be processed.
It should be noted that, for a video segment in the video material to be processed, the video segment may also be discarded according to the image quality of the video segment.
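For illustration only, the following minimal Python sketch discards segments flagged by such quality checks; the flag names and the upstream detectors that would produce them are assumptions.

```python
BAD_FLAGS = {"shaky", "blurred", "overexposed", "underexposed",
             "no_clear_scene", "no_clear_subject"}

def remove_waste(segments):
    """segments: list of dicts like {"name": ..., "flags": set_of_quality_flags}."""
    # Keep only the segments with none of the waste-footage quality flags.
    return [s for s in segments if not (s["flags"] & BAD_FLAGS)]

segments = [{"name": "a.mp4", "flags": set()},
            {"name": "b.mp4", "flags": {"blurred"}},
            {"name": "c.mp4", "flags": {"overexposed", "shaky"}}]
print([s["name"] for s in remove_waste(segments)])   # -> ['a.mp4']
```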
In an embodiment, the video material to be processed may also be subjected to deduplication processing. Wherein the deduplication process comprises similar material clustering.
When shooting video materials, the same scene is usually shot multiple times in order to obtain a satisfactory take, so the video materials to be processed contain many repeated materials. Similar materials are therefore clustered into one class, and the material with the longest video duration is selected from each class. Alternatively, the material with the best image quality, for example a clear image with high color saturation, can be selected from each class.
It should be noted that, for a video segment in the video material to be processed, the video segment may also be subjected to deduplication processing.
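For illustration only, the following minimal Python sketch performs such deduplication by grouping takes of the same scene and keeping the longest take; the scene identifier stands in for a real visual-similarity clustering step and is an assumption.

```python
from collections import defaultdict

def deduplicate(takes):
    """takes: list of (name, scene_id, duration_seconds)."""
    groups = defaultdict(list)
    for take in takes:
        groups[take[1]].append(take)
    # Keep the longest take per scene; image quality could serve as a tie-breaker.
    return [max(group, key=lambda t: t[2]) for group in groups.values()]

takes = [("temple_1.mp4", "temple", 8.0),
         ("temple_2.mp4", "temple", 14.0),
         ("river_1.mp4", "river", 6.0)]
print([t[0] for t in deduplicate(takes)])   # -> ['temple_2.mp4', 'river_1.mp4']
```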
S302, determining a video clip corresponding to the video pit according to the pit information of the video pit in the template, and obtaining a matching relation corresponding to the template.
The video clips are clips of the video material to be processed. After the template corresponding to the video material to be processed is determined, since the template comprises at least one video pit position, the video clip corresponding to each video pit position can be determined according to the pit position information of the video pit positions in the template, so as to obtain the matching relationship corresponding to the template.
Specifically, the pit information includes at least one of pit music and a pit label. In some embodiments, the pit bit tags may be preset, and one pit bit tag may be preset for each video pit bit.
The pit music of the video pit can be a fragment of the template music of the whole template, and the template comprises a plurality of video pit positions which are sequentially combined into one template, so that the template music of the template can be split according to the sequence and the duration of the video pit positions, and the pit music corresponding to each video pit position is obtained.
In an embodiment, the determining, according to the pit bit information of the video pit bits in the template, a video segment corresponding to the video pit bits includes: and determining the video clip corresponding to the video pit according to the pit music of the video pit in the template.
Since the pit music itself of a video pit has a tempo, for example, the pit music of the video pit is a slow tempo or a burst tempo, a suitable video clip can be matched for the video pit according to the tempo of the pit music and the video content of the video clip.
For example, if a piece of music has a burst rhythm and a video clip contains a powerful picture such as a fountain, the two match well. By contrast, a slow rhythm suits slow movement of a character, and a grand, sweeping picture suits large-scene time-lapse photography, and so on.
In an embodiment, the determining, according to pit music of video pits in the template, a video segment corresponding to the video pit includes: determining the matching degree of pit music of the video pit in the template and the video clip; and determining the video clip corresponding to the video pit position in the template according to the matching degree.
Matching the rhythm of pit music of the video pit with the video clip, and determining the matching degree between the pit music of the video pit and the video clip, thereby determining the video clip corresponding to the video pit in the template according to the matching degree.
Specifically, a threshold of the matching degree may be set, and when the matching degree exceeds the threshold, the video pit position is determined to match the video clip; alternatively, the video clip with the highest matching degree with the video pit position may be selected, according to the matching degree, as the video clip corresponding to that video pit position.
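For illustration only, the following minimal Python sketch shows both selection strategies; the clip names, scores, and the 0.6 threshold are assumed values.

```python
def clips_above_threshold(match_scores, threshold=0.6):
    """match_scores: dict of clip name -> matching degree with the pit music."""
    return [clip for clip, score in match_scores.items() if score > threshold]

def best_matching_clip(match_scores):
    # Pick the single clip with the highest matching degree.
    return max(match_scores, key=match_scores.get)

scores = {"fountain.mp4": 0.82, "walk.mp4": 0.55, "street.mp4": 0.64}
print(clips_above_threshold(scores))   # -> ['fountain.mp4', 'street.mp4']
print(best_matching_clip(scores))      # -> 'fountain.mp4'
```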
In an embodiment, the matching degree of the pit music of the video pit in the template and the video segment is obtained by using a pre-trained music matching model, and the music matching model can output the matching degree score of the pit music of the video pit in the template and the video segment.
The matching degree between the pit music and the video clips is learned through training the neural network, and then the neural network obtained through training is used as a music matching model to output the matching degree score between the pit music and the video clips of the video pits.
In an embodiment, after determining the video segments corresponding to the video pit bits according to the pit bit information of the video pit bits in the template, determining the shooting quality of a plurality of video segments corresponding to the video pit bits in the template; determining an optimal video clip corresponding to a video pit position in the template according to the shooting quality of the plurality of video clips; and obtaining the matching relation corresponding to the template according to the optimal video clip corresponding to the video pit position in the template.
Wherein the shooting quality of the video clip is determined according to the image content of the video clip and the video clip evaluation. The image content includes shot stability, color saturation, whether there are main shooting objects and the information content in the shot, and the like, and the evaluation of the video clip includes the aesthetic scoring of the video clip.
In particular, the aesthetic scoring of the video segments may take into account factors such as color, composition, mirror motion, and scene type to aesthetically score the video segments. And scoring the shooting quality of the video clip based on the image content of the video clip and the video clip evaluation, wherein the larger the score is, the higher the shooting quality of the video clip is.
And selecting a video clip with the highest shooting quality from the video clips as the optimal video clip corresponding to the video pit position in the template according to the shooting quality of the plurality of video clips, thereby obtaining the matching relation corresponding to the template.
In an embodiment, after determining the video segments corresponding to the video pit bits according to the pit bit information of the video pit bits in the template, determining a matching degree between the video segments corresponding to two adjacent video pit bits in the template; determining an optimal video clip corresponding to the video pit bit according to the matching degree; and obtaining the matching relation corresponding to the template according to the optimal video clip corresponding to the video pit position.
The matching degree of the video segments corresponding to two adjacent video pit positions is determined according to the consistency of the mirror moving direction of the video segments, the incremental and decremental relationship of the scene level, and the matching clip.
Consistency of the mirror moving direction means that the mirror moving directions of the video clips corresponding to two adjacent video pit positions are the same, so that video clips with opposite mirror moving directions are not joined together. The incremental and decremental relationship of the scene level includes, for example, progressing from a distant view to a middle view to a near view, from a near view to a middle view to a distant view, or directly from a distant view to a near view, and so on. A matching clip joins two shots by similar actions, graphics, colors, and the like, so that the transition between the two video segments forms a coherent, fluent narrative.
In an embodiment, the determining, according to the pit bit information of the video pit bits in the template, a video segment corresponding to the video pit bits includes: and determining a video clip corresponding to the video pit according to the pit label of the video pit in the template.
And each video pit position in the template is provided with a pit position label, and the labels can be matched according to the pit position label of the video pit position in the template and the label of the video clip, so that the successfully matched video clip is used as the video clip corresponding to the video pit position.
In an embodiment, the determining, according to the pit bit label of the video pit bit in the template, a video clip corresponding to the video pit bit includes: and determining a video label of the video clip, and taking the video clip corresponding to the video label matched with the pit bit label of the video pit bit as the video clip to be filled in the video pit bit.
And extracting labels of the video clips to determine the video labels of the video clips, and then determining the video clips corresponding to the video pit bits according to the pit bit labels of the video pit bits of the template.
And S303, filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
And after the matching relation is obtained, respectively filling the video clips into corresponding video pit positions of the template according to the matching relation, and carrying out video synthesis to obtain a recommended video. And recommend the recommended video to the user.
In an embodiment, referring to fig. 12, the step of filling the video segment into the corresponding video pit bit specifically includes steps S3031 to S3032.
S3031, determining whether the video time length of the video clip is greater than the time length of the video pit bit.
When the video clip is filled into the corresponding video pit position according to the matching relation, it is first judged whether the video duration of the video clip is greater than the duration of the video pit position. If the video duration of the video clip is greater than the duration of the video pit position, the video clip cannot be directly filled into the corresponding video pit position, and a clip of the corresponding duration needs to be extracted from the video clip and filled into the corresponding video pit position.
S3032, if the video time length of the video clip is larger than the time length of the video pit bit, the video clip is subjected to clip extraction to obtain a selected clip.
The video duration of the selected clip is less than or equal to the duration of the video pit position. In a specific implementation, to ensure that the selected clip fills the corresponding video pit position and the obtained recommended video is complete, the video duration of the selected clip can be made equal to the duration of the video pit position.
In an embodiment, the extracting the video segments to obtain the selected segments includes: and according to the video elements of the video clips, carrying out clip extraction on the video clips to obtain selected clips.
When the video clip is extracted, the video clip can be extracted according to the video elements of the video clip to obtain the selected clip.
The video elements include at least one of a smiling face picture, laughing audio, character motion, clear human voice, picture composition, and an aesthetic score. When extracting the selected clip, a more highlight-worthy segment may be extracted from the video clip according to the video elements, for example, a segment containing a smiling face picture or a segment with a higher aesthetic score.
In one embodiment, step S303 includes: filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain an initial video; and performing image optimization on the initial video based on the template requirement of the template to obtain a recommended video.
And filling the video clips into corresponding video pit positions according to the matching relation corresponding to the template to obtain an initial video, then carrying out image optimization on the initial video according to the template requirement, and recommending the video after the image optimization to a user as a recommended video. The template requirements comprise at least one of transition setting, acceleration and deceleration setting and mapping special effect setting.
In an embodiment, the source of the video material to be processed may be aerial video shot by an unmanned aerial vehicle. During aerial shooting the distance between the camera and the shot object is relatively long, so the picture changes only slowly. Therefore, when the aerial video is filled into the corresponding video pit position, the speed of picture change can be automatically identified, the playback speed of the aerial video can be automatically adjusted according to the speed of picture change, and the speed-adjusted aerial video is then filled into the corresponding video pit position. The speed of picture change can be obtained by analyzing a plurality of consecutive frames within a preset period.
In one embodiment, when determining the matching relationship between the video pit positions of the template and the video clips, for the identified aerial videos, the aerial videos can be placed in the first few video pit positions and/or the last few video pit positions of the template, so that the quality of the obtained recommended videos is improved.
In the video processing method provided by the above embodiment, the template is determined according to the video information of the video material to be processed, the video clip corresponding to the video pit is determined according to the pit information of the video pit, the matching relationship corresponding to the template is obtained, and finally the video clip is filled into the corresponding video pit of the template according to the matching relationship corresponding to the template, so as to obtain the recommended video. The template is determined according to the video information, the diversity of the generated recommended videos is improved, the matching relation between the video pit positions and the video clips is determined, the recommended videos are synthesized according to the matching relation, the workload of editing materials by a user is reduced, and the threshold of clipping is lowered.
Referring to fig. 13, fig. 13 is a flowchart illustrating steps of another video processing method according to an embodiment of the present disclosure.
Specifically, as shown in fig. 13, the video processing method includes steps S401 to S404.
S401, obtaining a plurality of templates, wherein the templates at least comprise one video pit bit.
And acquiring a plurality of templates, wherein the templates are used for being synthesized with the video material to be processed to obtain a recommended video, and each template at least comprises one video pit position.
Specifically, the plurality of templates may be arbitrarily acquired from a preset template library, may be acquired according to a selection operation of a user on the template in the template library, and may be acquired according to a template frequently used when the user historically synthesizes a video.
In one embodiment, the material to be processed is divided into a plurality of video segments.
And segmenting the material to be processed to generate a plurality of video segments, wherein the generated video segments are used for filling video pit positions in the template so as to synthesize the recommended video.
In an embodiment, the segmenting the material to be processed to generate a plurality of video segments includes: and according to the video information of the video material to be processed, segmenting the video material to be processed to generate a plurality of video segments. The video segmentation approach provided by the above embodiments may be employed, for example.
S402, matching video clips for the video pit positions of each template to obtain the matching relation corresponding to each template, and determining the matching score of the matching relation corresponding to each template.
The video clips are clips of video materials to be processed. In some embodiments, the video segments may be video segments obtained by segmenting the video material to be processed.
For each video pit in one template, matching the video clip for the video pit in each template respectively to obtain the matching relationship between the video pit and the video clip, taking the matching relationship between the video pit and the video clip as the matching relationship corresponding to the template, and calculating the matching score of the matching relationship.
In an embodiment, please refer to fig. 14, the step of obtaining the matching relationship for the video pit matching video clip specifically includes step S4021 and step S4022.
S4021, constructing a plurality of stream network graphs according to the video clips and the video pit positions of each template.
The flow network graph comprises a plurality of nodes, and each node corresponds to the matching relationship between one video clip and one video pit position. A stream network diagram is constructed according to the video pit positions and video clips of each template, in which the left vertical axis Cm represents the video clips and the upper horizontal axis Sn represents the video pit positions of the template.
S4022, determining the matching relation between the video pit bit of each template and the video clip based on the plurality of stream network graphs.
Each template determines a matching relationship between a respective video pit bit and a video clip based on a respective stream network map.
In an embodiment, the determining a matching relationship between the video pit bit of each of the templates and the video clip based on the plurality of stream network maps includes: matching a proper video clip for the video pit position of each template based on a maximum flow algorithm to obtain an optimal path; and taking the corresponding relation between the video clips and the video pit positions in the optimal path as the matching relation between the video pit positions of each template and the video clips. The matching approach provided by the above embodiments may be employed, for example.
In an embodiment, the determining a matching score of the matching relationship between the video pit bit of each template and the video clip includes: and determining the matching score of the matching relation between the video pit position of each template and the video clip according to the energy value between every two adjacent nodes in the optimal path.
The energy values between every two adjacent nodes in the optimal path are added, and the sum of the energy values along the optimal path is taken as the matching score of the matching relationship.
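To make the path-based matching concrete, the following Python sketch is a simplified dynamic-programming stand-in for the maximum-flow / optimal-path search: each node is a (video clip, video pit position) pair, and the score of a template is the sum of the energies along the chosen path. The energy callback, its signature and the absence of a clip-reuse constraint are simplifying assumptions, not part of the application.

def match_clips_to_pits(energy, num_clips, num_pits):
    # best[c] holds (accumulated energy, path of clip indices) for paths that
    # end with clip c filling the current pit position.
    best = {c: (0.0, [c]) for c in range(num_clips)}
    for pit in range(1, num_pits):
        new_best = {}
        for c in range(num_clips):
            score, path = max(
                ((s + energy(prev_c, c, pit), p) for prev_c, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            new_best[c] = (score, path + [c])
        best = new_best
    total, path = max(best.values(), key=lambda t: t[0])
    # the matching score of the template is the sum of energies along the path
    return total, path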
In an embodiment, the matching the video clip for the video pit position of each template to obtain the matching relationship corresponding to each template includes: classifying the video clips according to the pit position labels of the video pit positions of the template or the template labels of the template to obtain classified video clips; and determining the matching relation corresponding to the template according to the classified video clips.
Specifically, each video clip has a corresponding video tag, and the pit bit tag of each video pit bit of the template is matched with the video tag corresponding to the video clip, so that the video clips are classified according to the matching degree of the video tags and the pit bit tags, and the video clips are divided into a plurality of categories.
In addition, the template labels of the template can be matched with the video labels corresponding to the video clips, so that the video clips are classified according to the matching degree of the video labels and the template labels, and the video clips are divided into a plurality of categories.
And then determining the video clip corresponding to the video pit position according to the category of the video clip to obtain the matching relation corresponding to the template.
In an embodiment, the classifying the video clips according to the pit bit labels of the video pit bits of the template or the template labels of the template includes: and grading the video clips according to the pit bit labels of the video pit bits or the template labels of the template to obtain video clips of multiple grades.
Wherein the plurality of hierarchical categories of video segments include at least a first category of video segments, a second category of video segments, and a third category of video segments; the highlight level of the video clips of the first category is greater than the highlight level of the video clips of the second category, and the highlight level of the video clips of the second category is greater than the highlight level of the video clips of the third category.
Specifically, when the video clips are classified, they may be classified by highlight level. Therefore, when video pit positions are matched with video clips, the most highlight-worthy video clip can be selected as the video clip corresponding to a video pit position, thereby obtaining the matching relationship of the template.
Wherein the highlight level is determined according to picture content and audio content of the video clip. For example, the picture content includes a picture composition, whether there is an explicit subject, a moving direction, and a scene, and the like. The audio content includes whether there is clear human voice, laughter, cheering, and the like.
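The following Python sketch illustrates one way such a highlight grading could be implemented from pre-computed picture and audio analysis results; the field names, weights and thresholds are illustrative assumptions, not part of the application.

def grade_highlight_level(clip_info):
    # clip_info is a dict of pre-computed analysis results for one clip.
    score = 0.0
    # picture content: composition, clear subject, movement direction, scene
    score += clip_info.get("composition_score", 0.0)
    score += 1.0 if clip_info.get("has_clear_subject") else 0.0
    score += 0.5 if clip_info.get("has_stable_motion_direction") else 0.0
    # audio content: clear human voice, laughter, cheering
    score += 1.0 if clip_info.get("has_clear_voice") else 0.0
    score += 1.5 if clip_info.get("has_laughter_or_cheering") else 0.0
    if score >= 3.0:
        return "first category"    # highest highlight level
    if score >= 1.5:
        return "second category"
    return "third category"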
In an embodiment, the determining, according to the classified video clips, a matching relationship between a video pit bit of each template and the video clip includes: sorting the classified video clips according to the pit position labels of the video pit positions or the template labels of the template; and determining the matching relation between the video pit position of each template and the video clip according to the sequencing result.
Specifically, the classified video clips are sorted according to the pit bit labels of the video pit bits and the classification levels of the video clips. In the sorting, for each pit bit label, the video clips can be sorted from high to low according to the classification level. And for each video pit position of each template, selecting the video clip with the top sequence as the video clip to be filled in the video pit position, thereby obtaining the matching relation between the video pit positions and the video clips of the template.
For example, the pit position labels are of three kinds, A, B and C, and the video labels and classification levels of the video clips are A1, A2, A3, B1, B3, C1 and C2. After classification, the video clips under the A label category are A1, A2 and A3, those under the B label category are B1 and B3, and those under the C label category are C1 and C2.
When selecting a video clip to be filled into a video pit position, if the pit position label of the video pit position is A, the video clip A1 is selected; if the pit position label is B, the video clip B1 is selected; and if the pit position label is C, the video clip C1 is selected.
And similarly, sequencing the classified video clips according to the template labels of the template and the classification levels of the video clips. In sorting, for each template tag, the video segments may be sorted from high to low according to their classification level. And for each video pit position of each template, selecting the video clip with the top sequence as the video clip to be filled in the video pit position, thereby obtaining the matching relation between the video pit positions and the video clips of the template.
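A minimal Python sketch of this tag-based sorting and selection, using the A/B/C example above, might look as follows; the (video_tag, level) representation of the classified clips is an assumption for illustration.

def fill_pits_by_tag(pit_tags, clips):
    # clips: list of (video_tag, level) pairs, e.g. ("A", 1) for clip A1,
    # where a smaller level number means a higher classification level.
    matching = {}
    for tag in pit_tags:
        candidates = sorted((c for c in clips if c[0] == tag), key=lambda c: c[1])
        if candidates:
            matching[tag] = candidates[0]    # the top-ranked clip fills this pit
    return matching

# fill_pits_by_tag(["A", "B", "C"],
#                  [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 3), ("C", 1), ("C", 2)])
# returns {"A": ("A", 1), "B": ("B", 1), "C": ("C", 1)}, i.e. clips A1, B1 and C1.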
In an embodiment, the determining, according to the classified video clips, a matching relationship between a video pit bit of each template and the video clip includes: and distributing video clips to the video pit positions of the templates according to the sequencing result, and determining the matching relation between the video pit position of each template and the video clip.
And for each video pit position of each template, selecting the video clip with the top sequence as the video clip to be filled in the video pit position, thereby obtaining the matching relation between the video pit positions and the video clips of the template.
And S403, determining a recommended template from the plurality of templates according to the matching score.
After the matching score corresponding to the matching relationship of the templates is obtained, a recommended template can be determined from the obtained multiple templates, wherein the recommended template is a template for synthesizing a recommended video.
In an embodiment, referring to fig. 15, the step of determining the recommended template according to the matching score specifically includes steps S4031 to S4033.
S4031, determining a preset number of templates from the plurality of templates according to the matching scores.
First, a preset number of templates is determined from the plurality of templates according to the matching score, wherein the preset number of templates may be set according to an empirical value.
In a specific implementation, a preset number of templates may be selected from the plurality of templates from high to low according to the matching score.
S4032, arbitrarily selecting a target number of templates from the preset number of templates and combining them to obtain a plurality of template groups, wherein each template group comprises the target number of templates.
The target number is the number of recommended templates to be recommended to the user. A target number of templates are arbitrarily selected from the preset number of templates and combined to obtain a plurality of template groups, where the number of templates in each template group is the target number.
In a specific implementation, a target number of templates may be selected by enumerating combinations to obtain the template groups. For example, if the preset number is n and the target number is k, the number of template groups that can be formed is:

C(n, k) = n! / (k! (n - k)!)
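For illustration, the enumeration of template groups can be sketched in Python as follows; the template names and the values of n and k are placeholders.

from itertools import combinations
from math import comb

templates = ["T1", "T2", "T3", "T4", "T5"]        # assume the preset number n = 5
k = 3                                             # assume the target number k = 3
groups = list(combinations(templates, k))         # every possible template group
assert len(groups) == comb(len(templates), k)     # C(n, k) = n! / (k! (n - k)!)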
S4033, determining a recommended template group from the plurality of template groups according to the template types of the target number of templates in each template group, and taking the target number of templates in the recommended template group as the recommended templates.
The template types of the target number of templates included in each template group are respectively determined, and a recommended template group is determined from the template groups according to the template types. The richness of the templates is thus taken into account, so that the templates recommended to the user cover a variety of styles for the user to choose from.
In an embodiment, referring to fig. 16, step S4033 includes step S4033a and step S4033b.
S4033a, obtaining template types of a target number of templates in the plurality of template groups, and determining a combination score corresponding to the plurality of template groups according to the template types and the matching scores.
For each template group, template types of the target number of templates in the template group are obtained, then template richness scores of the template group are calculated according to the template types, and matching scores of the target number of templates in the template group are obtained. And determining a combined score corresponding to each template group according to the matching score and the richness score.
In an embodiment, the determining a combined score corresponding to a plurality of the template groups according to the template types and the matching scores includes: determining template richness among a target number of templates in the plurality of template groups according to the template types; and determining the combined score of the plurality of template groups according to the template richness among the templates with the number of the targets in the template group and the sum of the matching scores of the templates with the number of the targets in the template group.
And acquiring the template type ID of each template in the template group, wherein the IDs corresponding to different template types are different, and judging the template richness among the templates in the template group based on the different IDs.
For example, in determining template richness in a template set, the following formula may be used:
E1 = Σ f(ai, aj), where the sum runs over the pairs of templates (i, j) in the template group
wherein E1 represents the template richness of the template group, ai represents the template type of the i-th template, aj represents the template type of the j-th template, and the value of f(ai, aj) indicates whether the template type of the i-th template and the template type of the j-th template are the same.
When the template type of the i-th template and the template type of the j-th template are the same, the value of f(ai, aj) is 0; when they are different, the value of f(ai, aj) is 1. The larger the value of E1, the more abundant the template types in the template group.
After the template richness in the template group is obtained, the combination score of the template group can be determined according to the matching score of each template in the template group.
For example, the following formula may be referred to:
E = a * E1 + E2
wherein E is the combined score of the template group, E1 is the template richness of the template group, a is the preset weight of the template richness, and E2 is the sum of the matching scores of the templates in the template group.
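A minimal Python sketch of the combined score defined above might look as follows; the data structures (dicts of matching scores and template types keyed by template name) and the weight value are assumptions for illustration.

from itertools import combinations

def combined_score(group, match_scores, template_types, a=1.0):
    # E1: number of template pairs in the group whose template types differ
    e1 = sum(1 for i, j in combinations(group, 2)
             if template_types[i] != template_types[j])
    # E2: sum of the matching scores of the templates in the group
    e2 = sum(match_scores[t] for t in group)
    return a * e1 + e2

def pick_recommended_group(groups, match_scores, template_types, a=1.0):
    # the recommended template group is the one with the highest combined score
    return max(groups, key=lambda g: combined_score(g, match_scores, template_types, a))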
S4033b, determining a recommended template set from the plurality of template sets according to the combined score.
After the combination score of each template group is obtained, a recommended template group can be selected from the plurality of template groups according to the combination score of each template group, and the template in the recommended template group is the recommended template recommended to the user.
In a specific implementation process, the template group with the highest combination score may be selected from the plurality of template groups as the recommended template group according to the combination score.
In an embodiment, there are a plurality of recommended videos, and the plurality of recommended videos are obtained according to the target number of templates in the recommended template group.
In one embodiment, a plurality of the recommended videos are recommended to the user for selection by the user.
For the target number of templates in the recommended template group, the video clips are filled into the video pit positions of each template according to the matching relationship corresponding to that template, so as to generate a plurality of recommended videos. The plurality of recommended videos are recommended to the user together, so that the user can select the video to be used finally.
In an embodiment, referring to fig. 17, step S403 includes step S4031' and step S4032'.
S4031', determining a preset number of templates from the plurality of templates according to the matching scores, and forming a template group.
And determining a preset number of templates from the plurality of templates according to the matching scores of the plurality of templates, and forming a template group by the preset number of templates.
Specifically, a preset number of templates may be selected from high to low according to the respective matching scores of the templates to form a template group. For example, five templates may be sequentially selected from the top to the bottom as a group according to the matching scores of the templates, thereby forming a template group.
S4032', determining the template type of the template in the template group, and determining a recommended template from the template group according to the template type.
And determining the template type of the templates in the template group, so as to determine the recommended template from the template group according to the template type.
In an embodiment, referring to fig. 18, the step of determining the recommended template according to the template type specifically includes steps S4032'a to S4032'c.
S4032'a, determining whether the number of the template types is larger than a preset type threshold value.
Determining whether the number of template types in the template group is greater than a preset type threshold, wherein the preset type threshold is the number of template types expected to be recommended for a user. The preset type threshold may be preset.
S4032'b, if the number of the template types is larger than the preset type threshold, determining recommended templates according to the template types and the matching scores of templates of the same template type.
And when the number of the template types in the template group is larger than a preset type threshold value, determining a recommended template according to the template types and the matching scores of the templates of the same template type, namely selecting the template with the highest matching score from the templates of the same template type as the recommended template.
In one embodiment, the determining a recommended template according to the matching scores of the template type and templates of the same template type includes: performing type division on the templates in the template group according to the template types to obtain a plurality of types of templates; determining a plurality of types of optimal templates according to the matching scores, wherein the types of optimal templates are the templates with the highest matching scores in each template type; and selecting the template with the highest matching score from a plurality of types of optimal templates as a recommended template.
The templates in the template group are divided by type according to their template types, so that templates of the same template type fall into the same category. Then, for each template type, the template with the highest matching score among the templates of that type is selected as the type-optimal template of that type. Finally, the recommended templates are selected from the type-optimal templates according to their matching scores.
Referring to fig. 19, fig. 19 is a diagram illustrating the selection of recommended templates from a template group. The template group contains four template types, A, B, C and D, and the preset type threshold is 3, that is, templates of three types need to be selected from the template group and recommended to the user.
As shown in fig. 19, under template type A there are n templates A1 to An, with matching scores of A1 = 99 points, A2 = 98 points, A3 = 96 points, and so on. Under template type B there are m templates B1 to Bm, with matching scores of B1 = 97 points, B2 = 94 points, B3 = 89 points, and so on. Under template type C there are x templates C1 to Cx, with matching scores of C1 = 99 points, C2 = 95 points, C3 = 90 points, and so on. Under template type D there are y templates D1 to Dy, with matching scores of D1 = 95 points, D2 = 94 points, D3 = 93 points, and so on.
For each template type, the type-optimal template is selected according to the matching scores of its templates: the type-optimal template of template type A is A1, the type-optimal template of template type B is B1, the type-optimal template of template type C is C1, and the type-optimal template of template type D is D1.
Since only templates of three types need to be selected and recommended to the user, the matching scores of template A1, template B1, template C1 and template D1 are compared, where A1 = 99 points, B1 = 97 points, C1 = 99 points and D1 = 95 points. According to the matching scores, three templates, namely template A1, template B1 and template C1, are selected from template A1, template B1, template C1 and template D1, and template A1, template B1 and template C1 are recommended to the user as the recommended templates.
S4032'c, if the number of the template types is smaller than or equal to the preset type threshold, selecting the template with the highest matching score from the plurality of templates of the same template type in the template group as a recommended template.
If the number of the template types is smaller than the preset type threshold, it is indicated that the template types in the template group do not meet the requirement of the preset type threshold, so that the template with the highest matching score can be directly selected as the recommended template from the templates of the template types owned by the template group.
For example, if the preset type threshold is 3 and the template group contains only two template types, A and B, the template with the highest matching score under template type A is template A1 and the template with the highest matching score under template type B is template B1; in this case, template A1 and template B1 may be used as the recommended templates.
If the number of the template types is equal to the preset type threshold, it is indicated that the template types in the template group just meet the requirement of the preset type threshold, and therefore, the template with the highest matching score can be directly selected as the recommended template.
For example, if the preset type threshold is 3 and the template types in the template group are A, B and C, the template with the highest matching score under template type A is template A1, the template with the highest matching score under template type B is template B1, and the template with the highest matching score under template type C is template C1; in this case, template A1, template B1 and template C1 may be used as the recommended templates.
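Steps S4032'a to S4032'c can be sketched as follows in Python; the (name, template_type, match_score) tuple representation of the templates in the group is an assumption for illustration.

def recommend_by_type(templates, type_threshold):
    # templates: list of (name, template_type, match_score) tuples
    best_per_type = {}
    for name, ttype, score in templates:
        if ttype not in best_per_type or score > best_per_type[ttype][2]:
            best_per_type[ttype] = (name, ttype, score)      # type-optimal template
    type_best = sorted(best_per_type.values(), key=lambda t: t[2], reverse=True)
    if len(best_per_type) > type_threshold:
        return type_best[:type_threshold]     # more types than needed: keep the best ones
    return type_best                          # otherwise recommend the best of every type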
In an embodiment, step S403 specifically includes obtaining template types of the templates; and determining a recommended template according to the template types and the matching scores of the plurality of templates.
For the plurality of templates, the template types of the templates are obtained, so that a recommended template can be determined according to the template types and the matching scores of the templates, and the recommended template of the template types can be recommended to a user.
In one embodiment, the determining a recommended template according to the template types and the matching scores of the plurality of templates includes: dividing the templates into a plurality of template groups according to the template types, wherein each template group at least comprises one template; determining templates meeting the number of type requirements from the plurality of type template groups according to the matching scores of the templates; and selecting templates from the remaining templates in the plurality of templates according to the matching scores of the templates, wherein the number of the selected templates meets the number of the template requirements.
For the plurality of templates, when the number of template types among the plurality of templates is larger than a preset first threshold, a first-threshold number of templates of different types are acquired as the first recommended templates; each first recommended template is the template with the highest matching score within its template type, and the first threshold is the required number of types.
From the templates other than the first recommended templates among the plurality of templates, second recommended templates are obtained according to the matching scores; the total number of the first recommended templates and the second recommended templates is a second threshold, and the second threshold is the required number of templates.
Classifying the templates according to the template types of the templates to obtain a plurality of template groups of various types, wherein each template group of various types corresponds to one template type, and each template group of various types comprises at least one template.
Templates meeting the required number of types are then selected from the type template groups according to the matching scores of the templates, and further templates are selected from the remaining templates according to their matching scores until the number of selected templates meets the required number of templates.
For example, as shown in fig. 19, if there are 24 templates, the number of template requirements is 5, and the number of type requirements is 3.
And obtaining the template types of the 24 templates, and classifying the 24 templates according to different template types to obtain A, B, C, D four template groups, wherein each template group corresponds to one template type, and each template group comprises six templates.
Under template type A there are 6 templates A1 to A6, with matching scores of A1 = 99 points, A2 = 98 points, A3 = 96 points, and so on. Under template type B there are 6 templates B1 to B6, with matching scores of B1 = 97 points, B2 = 94 points, B3 = 89 points, and so on. Under template type C there are 6 templates C1 to C6, with matching scores of C1 = 99 points, C2 = 95 points, C3 = 90 points, and so on. Under template type D there are 6 templates D1 to D6, with matching scores of D1 = 95 points, D2 = 94 points, D3 = 93 points, and so on.
The highest matching score among the templates under template type A is A1 = 99 points, the highest under template type B is B1 = 97 points, the highest under template type C is C1 = 99 points, and the highest under template type D is D1 = 95 points.
According to the highest matching scores of the templates under the four template types A, B, C and D, the three templates with the highest matching scores, namely template A1, template B1 and template C1, are selected from the four template types, so that the three selected templates meet the required number of types.
At this time, the number of the selected templates is 3, and the number of the template requirements is 5, so that 2 templates can be selected according to the matching score of each of the remaining 21 templates, so that the number of the selected templates meets the number of the template requirements.
After the templates meeting the required number of types are selected, the remaining 21 templates comprise templates A2 to A6, B2 to B6, C2 to C6 and D1 to D6. Template A2 and template A3 are selected according to the matching scores of these 21 templates. At this time, the number of selected templates meets the required number of templates, and the selected templates meeting the required number of templates are used as the recommended templates, that is, the five templates A1, A2, A3, B1 and C1 serve as the recommended templates.
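The selection procedure illustrated by this example can be sketched in Python as follows; the tuple representation of the templates is an assumption, and ties are broken by list order rather than by the type-diversity rule mentioned next.

def recommend_templates(templates, type_need, template_need):
    # templates: list of (name, template_type, match_score) tuples
    best_per_type = {}
    for name, ttype, score in templates:
        if ttype not in best_per_type or score > best_per_type[ttype][2]:
            best_per_type[ttype] = (name, ttype, score)
    # step 1: the highest-scoring templates of `type_need` different types
    chosen = sorted(best_per_type.values(), key=lambda t: t[2], reverse=True)[:type_need]
    chosen_names = {t[0] for t in chosen}
    # step 2: fill the remaining slots from the other templates by matching score
    rest = sorted((t for t in templates if t[0] not in chosen_names),
                  key=lambda t: t[2], reverse=True)
    chosen.extend(rest[:max(0, template_need - len(chosen))])
    return [t[0] for t in chosen]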
In addition, when the matching scores of at least two templates are the same, a template whose type differs from that of the already selected templates can be chosen, taking the diversity of the templates into account.
In one embodiment, the determining a recommended template according to the template types and the matching scores of the plurality of templates includes: sequentially selecting templates from the plurality of templates according to the matching scores of the templates until the selected template type meets the type demand quantity; and selecting templates from the rest templates in the plurality of templates according to the matching scores of the templates until the number of the selected templates meets the number of the required templates.
And selecting a template from the plurality of templates according to the matching score of the template, and determining the type of the selected template, so that the selected template types are different in each selection until the selected template types meet the type requirement quantity. And then, selecting templates from the rest templates according to the matching scores of the rest templates until the number of the selected templates meets the required number of the templates.
For example, as shown in fig. 19, if there are 24 templates, the number of template requirements is 5, and the number of type requirements is 3.
Under template type A there are 6 templates A1 to A6, with matching scores of A1 = 99 points, A2 = 98 points, A3 = 96 points, and so on. Under template type B there are 6 templates B1 to B6, with matching scores of B1 = 97 points, B2 = 94 points, B3 = 89 points, and so on. Under template type C there are 6 templates C1 to C6, with matching scores of C1 = 99 points, C2 = 95 points, C3 = 90 points, and so on. Under template type D there are 6 templates D1 to D6, with matching scores of D1 = 95 points, D2 = 94 points, D3 = 93 points, and so on.
When selecting templates, the template with the highest matching score is first selected from the 24 templates. The matching scores of template A1 and template C1 are the same and are both the highest score, so either of template A1 and template C1 can be taken as the first selected template; for example, the first selected template is template A1, whose template type is A.
Then, after removing template A1, the template with the highest matching score is selected from the remaining 23 templates; the template selected the second time is template C1, whose template type is C. At this time, the number of selected template types is two, which does not yet meet the required number of types, so the selection continues. After removing template A1 and template C1, the template with the highest matching score among the remaining 22 templates is template A2, whose template type is A; however, the two templates already selected (template A1 and template C1) include template A1 of the same template type, so, from the remaining 22 templates after removing template A1 and template C1, the template whose matching score differs least from that of A2 is selected instead, which is template B1, whose template type is B. Since the type of template B1 differs from those of the selected templates A1 and C1, template B1 is taken as the template selected the third time. The selected templates are now template A1, template C1 and template B1, the number of template types is three, and the required number of types is met.
Then, according to the matching scores of the templates, the template with the highest matching score is selected from the 21 templates remaining after removing template A1, template C1 and template B1; the selected template is template A2. At this time, four templates have been selected, which does not yet meet the required number of templates.
Next, the template with the highest matching score is selected from the remaining 20 templates; the selected template is template A3. At this time, five templates have been selected, which meets the required number of templates. The selected templates meeting the required number of templates are used as the recommended templates, that is, the five templates A1, A2, A3, B1 and C1 serve as the recommended templates.
S404, filling the video clips into corresponding video pit positions of the recommendation template according to the matching relation corresponding to the recommendation template to obtain the recommendation video.
And filling the video clips into the corresponding video pit positions according to the matching relation corresponding to the recommended template, namely the matching relation between the video pit positions and the video clips in the recommended template, so as to obtain the recommended video.
In an embodiment, step S404 is specifically to determine whether the video duration of the video segment is greater than the duration of the video pit bit; and if the video duration of the video clip is greater than the duration of the video pit bit, performing clip extraction on the video clip to obtain a selected clip.
When a video clip is filled into the corresponding video pit position according to the matching relationship, whether the video duration of the clip is greater than the duration of the video pit position is first determined. When the video duration of the clip is greater than the duration of the pit position, the clip cannot be filled into the pit position directly, and a segment of the corresponding duration needs to be extracted from the clip and filled into the pit position.
The video duration of the selected segment is less than or equal to the duration of the video pit position. In a specific implementation, to ensure that the selected segment fills the corresponding video pit position and the resulting recommended video is complete, the video duration of the selected segment may be made equal to the duration of the video pit position.
In an embodiment, the extracting the video segments to obtain the selected segments includes: and according to the video elements of the video clips, carrying out clip extraction on the video clips to obtain selected clips.
When the video clip is extracted, the video clip can be extracted according to the video elements of the video clip to obtain the selected clip.
Wherein the video elements include at least one of a smiling face picture, a smiling audio, a character motion, a clear human voice, a picture composition, and an aesthetic score. In extracting the selection segment, a more brilliant segment may be extracted from the video segments according to the video elements as the selection segment, for example, a segment including a smiling face picture or a segment with a higher aesthetic score as the selection segment.
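A minimal Python sketch of this segment extraction might look as follows; the element_score callback, the scan step and the use of plain durations instead of clip objects are assumptions for illustration.

def extract_selected_segment(clip_duration, pit_duration, element_score, step=0.5):
    # element_score(start, end) returns a highlight score for that sub-range,
    # based on video elements such as smiling faces, laughter or aesthetics.
    if clip_duration <= pit_duration:
        return 0.0, clip_duration                 # the clip already fits the pit
    best_start, best_score = 0.0, float("-inf")
    start = 0.0
    while start + pit_duration <= clip_duration:
        score = element_score(start, start + pit_duration)
        if score > best_score:
            best_start, best_score = start, score
        start += step
    # the selected segment's duration equals the pit duration, as noted above
    return best_start, best_start + pit_duration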
In an embodiment, step S404 is specifically to fill the video clip into the corresponding video pit position of the recommended template according to the matching relationship corresponding to the recommended template, so as to obtain an initial video; and performing image optimization on the initial video based on the template requirement of the recommended template to obtain a recommended video.
And filling the video clips into corresponding video pit positions according to the matching relation corresponding to the recommended template to obtain an initial video, then carrying out image optimization on the initial video according to the template requirement, and recommending the video after the image optimization to a user as the recommended video. The template requirements comprise at least one of transition setting, acceleration and deceleration setting and mapping special effect setting.
In an embodiment, the video material to be processed may be aerial video shot by an unmanned aerial vehicle. Since the distance between the camera and the shot object is relatively long during aerial shooting, the picture changes only slightly. Therefore, when the aerial video is filled into the corresponding video pit position, the speed of picture change can be automatically identified, the playback speed of the aerial video can be automatically adjusted according to the speed of picture change, and the speed-adjusted aerial video can then be filled into the corresponding video pit position. The speed of picture change can be obtained by analyzing a plurality of consecutive frames within a preset time.
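One possible way to estimate the speed of picture change and derive a speed-adjustment factor is sketched below in Python; the frame-difference measure and the threshold-to-factor mapping are assumptions, not part of the application.

import numpy as np

def picture_change_speed(frames):
    # frames: grayscale frames as float numpy arrays within a preset time window
    diffs = [float(np.mean(np.abs(frames[i + 1] - frames[i])))
             for i in range(len(frames) - 1)]
    return float(np.mean(diffs)) if diffs else 0.0

def speed_factor_for_aerial(change_speed, slow_threshold=2.0, fast_threshold=8.0):
    # slowly changing aerial footage is sped up before being filled into a pit
    if change_speed < slow_threshold:
        return 2.0      # speed up
    if change_speed > fast_threshold:
        return 0.5      # slow down
    return 1.0          # keep the original speed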
In one embodiment, when determining the matching relationship between the video pit positions of the template and the video clips, for the identified aerial videos, the aerial videos can be placed in the first few video pit positions and/or the last few video pit positions of the template, so that the quality of the obtained recommended videos is improved.
In the video processing method provided by this embodiment, a plurality of templates are obtained, a video clip is matched to each video pit position of each template to obtain the matching relationship corresponding to each template, the matching score of each matching relationship is determined, a recommended template is determined from the plurality of templates according to the matching scores, and finally a recommended video is synthesized according to the matching relationship corresponding to the recommended template. By determining the recommended template according to the matching scores and synthesizing the recommended video based on it, a suitable recommended template can be determined automatically for the video clips, which reduces the user's workload in video editing and improves the diversity of the synthesized recommended videos.
It should be noted that, according to actual needs, the above embodiments and their steps may be executed alone or in combination, and the specific execution order and combination are not specifically limited.
Referring to fig. 20, fig. 20 is a schematic block diagram of a video processing apparatus according to an embodiment of the present disclosure. As shown in fig. 20, the video processing apparatus 500 includes one or more processors 501 and a memory 502.
The Processor 501 may be, for example, a Micro-controller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
The memory 502 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB flash disk, or a removable hard disk.
The memory 502 is used for storing computer programs; the processor 501 is configured to execute the computer program and, when executing the computer program, perform the video processing method according to any one of the embodiments of the present application, so as to reduce the workload of a user in video editing and provide diversified recommended videos.
Referring to fig. 21, fig. 21 is a schematic block diagram of a terminal device according to an embodiment of the present application. As shown in fig. 21, the terminal device 600 includes one or more processors 601 and a memory 602.
The terminal device includes terminals such as a mobile phone, a remote controller, a personal computer (PC), and a tablet computer.
The Processor 601 may be, for example, a Micro-controller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
The memory 602 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB flash disk, or a removable hard disk.
The memory 602 is used for storing computer programs; the processor 601 is configured to execute the computer program and, when executing the computer program, perform the video processing method provided in any one of the embodiments of the present application, so as to reduce the workload of a user in video editing and provide diversified recommended videos.
In an embodiment of the present application, a computer-readable storage medium is further provided, where the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the processor executes the program instructions to implement the steps of the video processing method provided in any one of the foregoing embodiments.
The computer-readable storage medium may be an internal storage unit of the terminal device in any of the foregoing embodiments, for example, a memory or an internal memory of the terminal device. The computer readable storage medium may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal device.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (144)

1. A video processing method, comprising:
determining a template according to video information of a video material to be processed, wherein the template at least comprises a video pit position;
determining a video segment corresponding to the video pit position according to the pit position information of the video pit position in the template to obtain a matching relation corresponding to the template, wherein the video segment is a segment in the video material to be processed;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
2. The method of claim 1, wherein determining the template from the video information of the video material to be processed comprises:
determining a video label of a video material to be processed according to video information of the video material to be processed;
and determining a plurality of templates matched with the video tags according to the video tags of the video material to be processed.
3. The method of claim 2, wherein determining a plurality of templates matching the video tags of the video material to be processed according to the video tags comprises:
determining a video theme corresponding to the video material to be processed according to the video label;
and determining a plurality of templates matched with the video theme according to the video theme of the video material to be processed.
4. The method of claim 2, wherein the video tags comprise at least one of a direction of motion of the mirror, a scene, a size and a position of an object of a single video frame in the video material to be processed, a size and a position of an object of consecutive video frames in the video material to be processed, and a similarity of adjacent video frames in the video material to be processed.
5. The method of claim 4, wherein the size and location of the object of a single video frame in the video material to be processed is determined using an object detection algorithm or a saliency detection algorithm.
6. The method of claim 4, wherein the size and location of the objects in successive video frames of the video material to be processed is determined based on a pre-trained neural network model.
7. The method of claim 3, further comprising:
and if the video theme corresponding to the video material to be processed cannot be determined, selecting a preset template as the template corresponding to the video material to be processed.
8. The method of claim 3, wherein determining a plurality of templates matching the video topic of the video material to be processed according to the video topic comprises:
and determining a template corresponding to the video material to be processed from a plurality of templates matched with the video theme according to the template influence factors of the templates.
9. The method according to claim 8, wherein the determining the template corresponding to the video material to be processed from a plurality of templates matching the video topic according to the template influence factor of the template comprises:
obtaining an evaluation score and a preset weight of the template influence factor;
determining template scores of a plurality of templates matched with the video theme according to the evaluation scores and preset weights of the template influence factors;
and determining a template corresponding to the video material to be processed according to the template score.
10. The method of claim 8, wherein the template impact factors include at least one of music match, template popularity, and user preference.
11. The method according to claim 10, wherein the music matching degree is obtained according to a pre-trained music recommendation network model capable of outputting matching degree scores of template music of a plurality of templates matching with the video theme and the video material to be processed.
12. The method of claim 10, wherein the template popularity is determined according to a frequency and/or a number of praise used for a plurality of templates matching the video theme.
13. The method of claim 10, wherein the user preference is determined based on a frequency and/or satisfaction score of the user with respect to a plurality of templates matching the video topic.
14. The method according to claim 1, wherein the pit bit information includes at least one of pit bit music and a pit bit tag.
15. The method of claim 1, wherein the determining a video segment corresponding to a video pit bit according to pit bit information of the video pit bit in the template comprises:
and determining the video clip corresponding to the video pit according to the pit music of the video pit in the template.
16. The method of claim 15, wherein determining the video segment corresponding to the video pit bit according to the pit bit music of the video pit bit in the template comprises:
determining the matching degree of pit music of the video pit in the template and the video clip;
and determining the video clip corresponding to the video pit position in the template according to the matching degree.
17. The method according to claim 16, wherein the degree of matching of pit music of video pits in the template with video segments is obtained by using a pre-trained music matching model, and the music matching model is capable of outputting a score of matching degree of pit music of video pits in the template with video segments.
18. The method according to claim 1, wherein after determining the video segment corresponding to the video pit bit according to the pit bit information of the video pit bit in the template, further comprising:
determining the shooting quality of a plurality of video clips corresponding to the video pit positions in the template;
determining an optimal video clip corresponding to a video pit position in the template according to the shooting quality of the plurality of video clips;
and obtaining the matching relation corresponding to the template according to the optimal video clip corresponding to the video pit position in the template.
19. The method of claim 18, wherein the quality of the video segment is determined based on the image content of the video segment and the video segment rating.
20. The method of claim 19, wherein the image content comprises at least one of whether there is a main subject, an amount of information within a lens, lens stability, and color saturation.
21. The method of claim 19, wherein the video segment rating comprises an aesthetic score of the video segment.
22. The method according to claim 1, wherein after determining the video segment corresponding to the video pit bit according to the pit bit information of the video pit bit in the template, further comprising:
determining the matching degree between the video clips corresponding to two adjacent video pit positions in the template;
determining an optimal video clip corresponding to the video pit bit according to the matching degree;
and obtaining the matching relation corresponding to the template according to the optimal video clip corresponding to the video pit position.
23. The method of claim 22, wherein the matching degree of the video segments corresponding to the two adjacent video pit positions is determined according to the continuity of the moving direction of the video segments, the incremental and decreasing relationships of the scenes, and the matching clips.
24. The method according to claim 23, wherein the matching degree of the video segments corresponding to the two adjacent video pit bits is obtained by using a pre-trained segment matching model, and the segment matching model is capable of outputting the matching degree of the video segments filled by the two adjacent video pit bits.
25. The method according to claim 1, wherein the determining a video segment corresponding to the video pit bit according to the pit bit information of the video pit bit in the template comprises:
and determining a video clip corresponding to the video pit according to the pit label of the video pit in the template.
26. The method of claim 25, wherein determining the video segment corresponding to the video pit bit according to the pit bit tag of the video pit bit in the template comprises:
and determining a video label of the video clip, and taking the video clip corresponding to the video label matched with the pit bit label of the video pit bit as the video clip to be filled in the video pit bit.
27. The method according to any one of claims 1 to 26, wherein said filling the video segments into the corresponding video pit positions of the template according to the matching relationship corresponding to the template comprises:
determining whether the video duration of the video clip is greater than the duration of the video pit bit;
if the video duration of the video clip is larger than the duration of the video pit bit, performing clip extraction on the video clip to obtain a selected clip;
and selecting the video time length of the fragment to be less than or equal to the time length of the video pit bit.
28. The method of claim 27, wherein said extracting segments of said video segment to obtain selected segments comprises:
and according to the video elements of the video clips, carrying out clip extraction on the video clips to obtain selected clips.
29. The method of claim 28, wherein the video elements comprise at least one of a smiley face picture, laughter audio, character action, clear human voice, picture composition, and aesthetic score.
30. The method according to any one of claims 1 to 26, wherein the filling the video segments into corresponding video pit positions of the template according to the matching relationship corresponding to the template to obtain a recommended video comprises:
filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain an initial video;
and performing image optimization on the initial video based on the template requirement of the template to obtain a recommended video.
31. The method of claim 30, wherein the template requirements include at least one of a transition setting, an acceleration and deceleration setting, and a map special effects setting.
32. The method according to any one of claims 1 to 26, characterized in that it comprises:
and carrying out duplicate removal processing on the video material to be processed.
33. The method of claim 32, wherein the deduplication process comprises similar material clustering.
34. The method according to any one of claims 1 to 26, characterized in that it comprises:
acquiring the image quality of the video material to be processed;
and carrying out scrap removal on the video material to be processed according to the image quality of the video material to be processed.
35. The method of claim 34, wherein the image quality comprises at least one of picture jitter, picture blur, picture overexposure, picture underexposure, no definite scene in the image, or no definite subject in the image.
36. The method of any of claims 1 to 26, wherein the video material to be processed comprises at least one of video material captured through a handheld terminal, video material captured through a movable platform, video material obtained from a cloud server, and video material obtained from a local server.
37. The method according to any one of claims 1 to 26, characterized in that it comprises:
and selecting the video material to be processed.
38. The method of claim 37, wherein the selecting the video material to be processed comprises:
selecting the materials according to the material parameters of the video materials to be processed;
the material parameters comprise at least one of shooting time, shooting place and shooting target object.
39. The method of claim 37, wherein the selecting the video material to be processed comprises:
and selecting the video material to be processed according to the selection operation of the user.
40. The method of claim 37, wherein said selecting the material for the video material to be processed comprises:
clustering the video materials to be processed according to the material parameters of the video materials to be processed so as to realize material selection;
wherein the clustering comprises at least one of time clustering, place clustering and target object clustering.
41. The method of any one of claims 1 to 26, further comprising:
and according to the video information of the video material to be processed, segmenting the video material to be processed to generate a plurality of video segments.
42. The method of claim 41, wherein the video information comprises at least one of a direction of motion of a mirror and scene information.
43. The method according to claim 41, wherein said segmenting the video material to be processed into a plurality of video segments comprises:
segmenting the video material to be processed according to the video information of the video material to be processed to obtain a plurality of first video segments;
clustering and segmenting the first video segments to obtain a plurality of second video segments;
and taking the second video segment as a video segment of the video pit bit to be filled in the template.
44. The method of claim 43, wherein prior to cluster partitioning the first video segment, the method further comprises:
determining whether a first video clip with a video duration larger than a preset duration exists in the plurality of first video clips;
and if a first video segment with the video time length larger than the preset time length exists, executing the step of clustering and segmenting the first video segment.
45. The method of claim 43, wherein clustering the first video segment comprises:
determining a sliding window and a clustering center, wherein the sliding window is used for determining a current video frame to be processed, and the clustering center is used for determining a video segmentation point of the first video segment;
based on the clustering center, performing clustering analysis on the video frames of the first video clip according to the sliding window to determine video segmentation points;
and performing video segmentation on the first video segment according to the video segmentation point.
46. The method of claim 45, wherein the cluster center comprises image features of a first frame video frame of the first video segment.
47. The method of claim 46, wherein the image features of the video frames of the first video segment are obtained according to a pre-trained image feature network model, and the image feature network model is capable of outputting the image features of the video frames of the first video segment.
48. The method of claim 45, wherein the size of the sliding window is related to the duration of the first video segment; alternatively, the size of the sliding window is related to a desired segmentation speed set by a user.
49. The method of claim 45, wherein the size of the sliding window is equal to 1.
50. The method of claim 45, wherein performing cluster analysis on the video frames of the first video segment according to the sliding window based on the cluster center to determine video segmentation points comprises:
determining a current video frame according to the sliding window, and determining the similarity between the image characteristics of the current video frame and the clustering center;
if the similarity is smaller than a preset threshold value, taking the current video frame as a video segmentation point, and re-determining a clustering center;
and continuously determining the video segmentation point according to the re-determined clustering center until the last video frame of the first video segment.
51. The method of claim 50, wherein determining the similarity of the image feature of the current video frame to the cluster center comprises:
and determining the cosine similarity between the image characteristics of the current video frame and the clustering center.
52. The method of claim 50, wherein said re-centering comprises:
and taking the image characteristics of the current video frame as the re-determined clustering center.
53. The method of claim 50, wherein after determining the similarity of the image feature of the current video frame to the cluster center, the method further comprises:
if the similarity is larger than or equal to a preset threshold value, updating the clustering center;
and continuously determining the similarity between the image characteristics of the current video frame and the updated clustering center according to the updated clustering center.
54. The method of claim 53, wherein said updating the cluster center comprises:
acquiring image characteristics of the current video frame;
and determining an updated clustering center according to the image characteristics of the current video frame and the clustering center.
55. A video processing method, comprising:
constructing a stream network diagram according to the video clips of the video material to be processed and the video pit positions of the template;
determining a matching relation corresponding to the video clip and the video pit bit based on the stream network diagram;
filling the video segments into corresponding video pit positions of the template according to the matching relation to obtain a recommended video;
the stream network graph comprises a plurality of nodes, and each node corresponds to the matching relation between one video clip and one video pit bit.
56. The video processing method according to claim 55, wherein said determining a matching relationship between the video clip and the video pit bit based on the stream network map comprises:
matching appropriate video clips for the video pit positions of the template based on a maximum flow algorithm to obtain an optimal path;
and taking the corresponding relation between the video clip and the video pit position in the optimal path as the matching relation between the video pit position of the template and the video clip.
57. The method according to claim 56, wherein said matching the video pit bits of the template with the appropriate video segments based on the max flow algorithm to obtain the optimal path comprises:
and determining the optimal path corresponding to the template according to the energy value between two adjacent nodes in the flow network graph.
58. The method of claim 57, wherein the method further comprises:
and determining the energy value between two adjacent nodes according to the energy value influence factor of each node.
59. The method of claim 58, wherein determining an energy value between two adjacent nodes according to the energy value influence factor of each node comprises:
obtaining an evaluation score and a preset weight of the energy value influence factor;
and determining the energy value between two adjacent nodes according to the evaluation score and the preset weight of the energy value influence factor.
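Claims 55 to 59 describe scoring adjacent nodes of the flow network graph with a weighted sum of factor scores and then finding an optimal path. As a rough illustration only, the sketch below replaces the maximum-flow formulation with a simple dynamic program that maximizes the summed edge energy; the field names, weights, and score functions are assumptions, and this simplification does not prevent a clip from being reused across pit positions.

def edge_energy(prev_node, cur_node, weights=(0.4, 0.3, 0.3)):
    # Claim 59: energy between two adjacent nodes as a weighted sum of factor scores.
    # Claim 60: factors may include shooting quality, pit/clip match degree, and adjacency match.
    prev_clip, _prev_pit = prev_node
    cur_clip, cur_pit = cur_node
    w_quality, w_pit_fit, w_adjacent = weights
    return (w_quality * cur_clip["quality"]
            + w_pit_fit * cur_pit["music_fit"].get(cur_clip["id"], 0.0)
            + w_adjacent * prev_clip["continuity"].get(cur_clip["id"], 0.0))

def best_path(pits, clips, weights=(0.4, 0.3, 0.3)):
    # dp[i][c] = best accumulated energy when pit i is filled with clip c.
    n = len(pits)
    dp = [dict() for _ in range(n)]
    back = [dict() for _ in range(n)]
    for clip in clips:
        dp[0][clip["id"]] = 0.0
        back[0][clip["id"]] = None
    for i in range(1, n):
        for cur in clips:
            best_score, best_prev = float("-inf"), None
            for prev in clips:
                score = dp[i - 1][prev["id"]] + edge_energy((prev, pits[i - 1]), (cur, pits[i]), weights)
                if score > best_score:
                    best_score, best_prev = score, prev["id"]
            dp[i][cur["id"]], back[i][cur["id"]] = best_score, best_prev
    last = max(dp[-1], key=dp[-1].get)
    total = dp[-1][last]
    assignment = {}
    i = n - 1
    while last is not None:
        assignment[i] = last           # pit index -> clip id along the optimal path
        last, i = back[i][last], i - 1
    return total, assignment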
60. The method of claim 58, wherein the energy value impact factor comprises at least one of a shooting quality of a video clip corresponding to each of the video pit bits, a degree of matching of each of the video pit bits with the corresponding video clip, and a degree of matching of video clips corresponding to two adjacent video pit bits.
61. The method according to claim 60, wherein the shooting quality of the video clip corresponding to each of the video pit locations is determined according to the image content of the video clip and the video clip rating.
62. The method according to claim 60, wherein the degree of matching of each of the video pit bits with the corresponding video segment is determined according to the degree of matching of pit bit music of the video pit bit with the video segment.
63. The method according to claim 62, wherein the degree of matching of each of the video pit bits with the corresponding video segment is obtained by using a pre-trained music matching model, and the music matching model is capable of outputting a pit bit music matching score of the video pit bit with the video segment.
64. The method of claim 60, wherein the matching degree of the video segments corresponding to the two adjacent video pit positions is determined according to the continuity of the camera movement direction of the video segments, the progressive or regressive relationship of the scene types, and the compatibility between the clips.
65. The method according to claim 64, wherein said matching degree of video segments corresponding to two adjacent video pit bits is obtained by using a pre-trained segment matching model, and said segment matching model is capable of outputting the matching degree of video segments filled with two adjacent video pit bits.
66. A video processing method, comprising:
acquiring a plurality of templates, wherein the templates at least comprise one video pit bit;
matching video clips for the video pit positions of each template to obtain a matching relation corresponding to each template, and determining a matching score of the matching relation corresponding to each template, wherein the video clips are clips of video materials to be processed;
determining a recommended template from the plurality of templates according to the matching score;
and filling the video clips into corresponding video pit positions of the recommendation template according to the matching relation corresponding to the recommendation template to obtain a recommendation video.
67. The method of claim 66, further comprising:
and dividing the material to be processed to generate a plurality of video segments.
68. The method of claim 67, wherein the segmenting the material to be processed to generate a plurality of video segments comprises:
according to the video information of the video material to be processed, the video material to be processed is segmented to generate a plurality of video segments.
69. The method of claim 68, wherein the video information comprises at least one of a camera movement direction and scene information.
70. The method according to claim 68, wherein said segmenting the video material to be processed into a plurality of video segments comprises:
segmenting the video material to be processed according to the video information of the video material to be processed to obtain a plurality of first video segments;
clustering and segmenting the first video segments to obtain a plurality of second video segments;
and taking the second video segment as a video segment of the video pit bit to be filled in the template.
71. The method of claim 70, wherein prior to cluster partitioning the first video segment, the method further comprises:
determining whether a first video segment with a video duration greater than a preset duration exists among the plurality of first video segments;
and if such a first video segment exists, performing the step of clustering and segmenting the first video segment.
72. The method of claim 70, wherein clustering the first video segment comprises:
determining a sliding window and a clustering center, wherein the sliding window is used for determining a current video frame to be processed, and the clustering center is used for determining a video segmentation point of the first video segment;
based on the clustering center, performing clustering analysis on the video frames of the first video clip according to the sliding window to determine video segmentation points;
and performing video segmentation on the first video segment according to the video segmentation point.
73. The method of claim 72, wherein the cluster center comprises image features of a first frame video frame of the first video segment.
74. The method of claim 73, wherein the image features of the video frames of the first video segment are obtained according to a pre-trained image feature network model, and wherein the image feature network model is capable of outputting the image features of the video frames of the first video segment.
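Claim 74 only requires some pre-trained image feature network; it does not name a specific backbone. A torchvision ResNet-18 with its classification head removed is one commonly used stand-in, sketched below purely as an assumption.

import torch
import torchvision.models as models
import torchvision.transforms as T

# Drop the classification layer so the backbone outputs a 512-dimensional feature per frame.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def frame_image_features(frame_rgb):
    # frame_rgb: an H x W x 3 uint8 array decoded from the first video segment.
    with torch.no_grad():
        batch = preprocess(frame_rgb).unsqueeze(0)
        return backbone(batch).squeeze(0).numpy()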
75. The method of claim 72, wherein the size of the sliding window is related to the duration of the first video segment; alternatively, the size of the sliding window is related to a desired segmentation speed set by a user.
76. The method of claim 72, wherein the size of the sliding window is equal to 1.
77. The method of claim 72, wherein said performing a cluster analysis on video frames of said first video segment according to said sliding window based on said cluster center to determine video segmentation points comprises:
determining a current video frame according to the sliding window, and determining the similarity between the image characteristics of the current video frame and the clustering center;
if the similarity is smaller than a preset threshold value, taking the current video frame as a video segmentation point, and re-determining a clustering center;
and continuing to determine video segmentation points according to the re-determined clustering center until the last video frame of the first video segment is reached.
78. The method of claim 77, wherein the determining the similarity between the image feature of the current video frame and the cluster center comprises:
and determining the cosine similarity between the image characteristics of the current video frame and the clustering center.
79. The method of claim 77, wherein said re-determining a clustering center comprises:
and taking the image characteristics of the current video frame as the re-determined clustering center.
80. The method of claim 77, wherein after determining the similarity of the image feature of the current video frame to the cluster center, the method further comprises:
if the similarity is greater than or equal to the preset threshold value, updating the clustering center;
and continuing to determine the similarity between the image characteristics of the current video frame and the updated clustering center.
81. The method of claim 80, wherein said updating the cluster center comprises:
acquiring image characteristics of the current video frame;
and determining an updated clustering center according to the image characteristics of the current video frame and the clustering center.
82. The method according to claim 66, wherein said matching video clips for the video pit bits of each of the templates to obtain the corresponding matching relationship of each of the templates comprises:
constructing a plurality of flow network graphs according to the video clips and the video pit positions of each template, wherein each flow network graph comprises a plurality of nodes, and each node corresponds to the matching relation between one video clip and one video pit position;
and determining the matching relation of the video pit position of each template and the video clip based on a plurality of the flow network graphs.
83. The method according to claim 82, wherein said determining a matching relationship between the video pit bit of each of the templates and the video clip based on the plurality of the stream network maps comprises:
matching a proper video clip for the video pit position of each template based on a maximum flow algorithm to obtain an optimal path;
and taking the corresponding relation between the video clips and the video pit positions in the optimal path as the matching relation between the video pit positions of each template and the video clips.
84. The method according to claim 83, wherein said matching the video pit bits of each of the templates with the appropriate video segment based on max flow algorithm to obtain the optimal path comprises:
and determining the optimal path corresponding to each template according to the energy value between two adjacent nodes in the flow network graph.
85. The method of claim 84, wherein said determining a matching score for the matching relationship of the video pit bit of each of said templates with said video clip comprises:
and determining the matching score of the matching relation between the video pit position of each template and the video clip according to the energy value between every two adjacent nodes in the optimal path.
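Claims 82 to 85 apply the per-template path search and turn the summed edge energy along each optimal path into that template's matching score. A minimal sketch, reusing the illustrative best_path helper above (its name and data layout are assumptions, not part of the claims):

def score_templates(templates, clips):
    scored = []
    for template in templates:
        # Claim 84: an optimal path is found per template; claim 85: its summed energy is the match score.
        match_score, assignment = best_path(template["pits"], clips)
        scored.append({"template": template, "match_score": match_score, "assignment": assignment})
    # Claim 66: the recommended template can then be chosen from the highest-scoring entries.
    scored.sort(key=lambda entry: entry["match_score"], reverse=True)
    return scored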
86. The method of claim 84, wherein the method further comprises:
and determining the energy value between two adjacent nodes according to the energy value influence factor of each node.
87. The method according to claim 86, wherein said determining an energy value between two adjacent nodes according to an energy value influence factor of each of said nodes comprises:
obtaining an evaluation score and a preset weight of the energy value influence factor;
and determining the energy value between two adjacent nodes according to the evaluation score and the preset weight of the energy value influence factor.
88. The method of claim 86, wherein the energy value impact factors comprise at least one of a shooting quality of a video segment corresponding to each of the video pit bits, a degree of matching of each of the video pit bits with a corresponding video segment, and a degree of matching of video segments corresponding to two adjacent video pit bits.
89. The method according to claim 88, wherein the shooting quality of the video clip corresponding to each of the video pit positions is determined based on the image content of the video clip and the video clip rating.
90. The method according to claim 88, wherein said degree of matching of each of said video pit bits with a corresponding video clip is determined according to a degree of matching of pit bit music of said video pit bit with said video clip.
91. The method according to claim 90, wherein the degree of matching of each of said video pit bits with the corresponding video segment is obtained by using a pre-trained music matching model, and said music matching model is capable of outputting a pit bit music matching score of said video pit bit with said video segment.
92. The method of claim 88, wherein the matching degree of the video segments corresponding to the two adjacent video pit positions is determined according to the continuity of the camera movement direction of the video segments, the progressive or regressive relationship of the scene types, and the compatibility between the clips.
93. The method according to claim 92, wherein said matching degree of video segments corresponding to two adjacent video pit bits is obtained by using a pre-trained segment matching model, and said segment matching model is capable of outputting the matching degree of video segments filled with two adjacent video pit bits.
94. The method of claim 66, wherein said matching video clips for the video pit bits of each of the templates to obtain the matching relationship corresponding to each of the templates comprises:
classifying the video clips according to the pit position labels of the video pit positions of the template or the template labels of the template to obtain classified video clips;
and determining the matching relation corresponding to the template according to the classified video clips.
95. The method according to claim 94, wherein said classifying the video clips according to the pit bit labels of the video pit bits of the template or the template labels of the template comprises:
and grading the video clips according to the pit position labels of the video pit positions or the template labels of the template to obtain video clips of a plurality of categories.
96. The method of claim 95, wherein the plurality of categories of video clips include at least a first category of video clips, a second category of video clips, and a third category of video clips;
the highlight level of the video clips of the first category is greater than that of the video clips of the second category, and the highlight level of the video clips of the second category is greater than that of the video clips of the third category.
97. The method of claim 96, wherein the highlight level is determined based on picture content and audio content of the video segment.
98. The method according to claim 94, wherein said determining a matching relationship between a video pit bit of each of said templates and said video clip according to said classified video clip comprises:
sorting the classified video clips according to the pit position labels of the video pit positions or the template labels of the template; and
determining the matching relationship between the video pit positions of each template and the video clips according to the sorting result.
99. The method according to claim 98, wherein said determining the matching relationship between the video pit positions of each of said templates and said video clips according to said classified video clips comprises:
allocating video clips to the video pit positions of the templates according to the sorting result, and determining the matching relationship between the video pit positions of each template and the video clips.
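Claims 94 to 99 grade clips into highlight categories, sort them, and hand them out to pit positions in order. The sketch below is one hedged reading of that flow; the picture/audio weighting and the even three-way split into categories are assumptions.

def grade_and_assign(clips, pits):
    def highlight(clip):
        # Claim 97: highlight level from picture content and audio content (placeholder weights).
        return 0.6 * clip["picture_score"] + 0.4 * clip["audio_score"]

    ranked = sorted(clips, key=highlight, reverse=True)  # claim 98: sort the classified clips
    third = max(1, len(ranked) // 3)
    categories = {
        "first": ranked[:third],             # claim 96: highest highlight level
        "second": ranked[third:2 * third],
        "third": ranked[2 * third:],
    }
    # Claim 99: allocate clips to pit positions according to the sorted order.
    assignment = {pit["id"]: clip["id"] for pit, clip in zip(pits, ranked)}
    return categories, assignment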
100. The method of claim 66, wherein determining a recommended template from the plurality of templates based on the match scores comprises:
determining a preset number of templates from the plurality of templates according to the matching score;
combining a target number of templates from the preset number of templates to obtain a plurality of template groups, wherein each template group comprises the target number of templates;
and determining a recommended template group from a plurality of template groups according to the template types of the target number of templates in the template group, and taking the target number of templates in the recommended template group as recommended templates.
101. The method according to claim 100, wherein determining a recommended set of templates from a plurality of sets of templates based on template types for a target number of templates in the set of templates comprises:
obtaining template types of a target number of templates in a plurality of template groups, and determining combination scores corresponding to the plurality of template groups according to the template types and the matching scores;
determining a recommended template set from a plurality of the template sets according to the combination score.
102. The method according to claim 101, wherein said determining the combination scores corresponding to the plurality of template groups according to the template types and the matching scores comprises:
determining the template richness among the target number of templates in each of the plurality of template groups according to the template types;
and determining the combination score of each of the plurality of template groups according to the template richness among the target number of templates in the template group and the sum of the matching scores of the target number of templates in the template group.
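Claims 100 to 102 shortlist templates by matching score, enumerate groups of a target size, and score each group by its template richness together with the sum of its matching scores. A minimal sketch, where counting distinct template types as the richness measure and the additive combination rule are assumptions (the match_score and type fields are assumed to be attached to each template, e.g. by the earlier scoring sketch):

from itertools import combinations

def recommend_template_group(templates, preset_count=6, target_size=3, richness_weight=1.0):
    # Claim 100: shortlist a preset number of templates by matching score, then form groups of the target size.
    shortlist = sorted(templates, key=lambda t: t["match_score"], reverse=True)[:preset_count]
    best_group, best_score = None, float("-inf")
    for group in combinations(shortlist, target_size):
        richness = len({t["type"] for t in group})           # claim 102: richness among the group's types
        combined = richness_weight * richness + sum(t["match_score"] for t in group)
        if combined > best_score:
            best_group, best_score = group, combined
    return best_group, best_score                            # claim 101: pick the recommended group by combination score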
103. The method according to claim 100, wherein the recommended video comprises a plurality of recommended videos, and the plurality of recommended videos are obtained according to the target number of templates in the recommended template group.
104. The method of claim 103, further comprising:
recommending the plurality of recommended videos to the user for the user to select from.
105. The method of claim 66, wherein determining a recommended template from the plurality of templates based on the match scores comprises:
acquiring template types of the templates;
and determining a recommended template according to the template types and the matching scores of the plurality of templates.
106. The method of claim 105, wherein determining a recommended template based on the template types and the match scores for the plurality of templates comprises:
dividing the templates into a plurality of template groups according to the template types, wherein each template group comprises at least one template;
determining, from the plurality of template groups, templates that satisfy a required number of each type according to the matching scores of the templates; and
selecting templates from the remaining templates in the plurality of templates according to the matching scores of the templates until the number of selected templates satisfies the required number of templates.
107. The method of claim 106, wherein determining a recommended template based on the template types and the match scores for the plurality of templates comprises:
sequentially selecting templates from the plurality of templates according to the matching scores of the templates until the selected template types satisfy the required number of types;
and selecting templates from the remaining templates in the plurality of templates according to the matching scores of the templates until the number of selected templates satisfies the required number of templates.
108. The method of any of claims 66 to 107, wherein the video material to be processed comprises at least one of video material captured via a handheld terminal, video material captured via a mobile platform, video material obtained from a cloud server, and video material obtained from a local server.
109. The method of claim 67, wherein the video segmentation of the video material to be processed comprises:
performing material selection on the video material to be processed, and performing video segmentation on the selected video material to be processed.
110. The method of claim 109, wherein said selecting material for the video material to be processed comprises:
selecting the materials according to the material parameters of the video materials to be processed;
the material parameters comprise at least one of shooting time, shooting place and shooting target object.
111. The method of claim 109, wherein said selecting material for the video material to be processed comprises:
selecting the video material to be processed according to a selection operation of the user.
112. The method of claim 109, wherein said selecting material for the video material to be processed comprises:
clustering the video materials to be processed according to the material parameters of the video materials to be processed so as to realize material selection;
wherein the clustering comprises at least one of time clustering, place clustering and target object clustering.
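Claim 112 mentions time, place, and target-object clustering as ways to select material. As an example of the time case only, the sketch below groups materials whose shooting times fall within a fixed gap of each other; the one-hour gap and the field names are assumptions.

def cluster_by_shoot_time(materials, max_gap_seconds=3600):
    ordered = sorted(materials, key=lambda m: m["shoot_time"])
    clusters = []
    current = [ordered[0]] if ordered else []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["shoot_time"] - prev["shoot_time"] <= max_gap_seconds:
            current.append(cur)          # same shooting session
        else:
            clusters.append(current)     # gap too large: start a new time cluster
            current = [cur]
    if current:
        clusters.append(current)
    return clusters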
113. The method of any one of claims 66 to 107, wherein the method further comprises:
acquiring the image quality of the video material to be processed;
and removing defective footage from the video material to be processed according to the image quality of the video material to be processed.
114. The method of claim 113, wherein the image quality comprises at least one of picture jitter, picture blur, picture overexposure, picture underexposure, no definite scene in the image, or no definite subject in the image.
115. The method according to any one of claims 66 to 107, wherein said filling the video clip into the corresponding video pit of the recommended template according to the matching relationship corresponding to the recommended template comprises:
determining whether the video duration of the video clip to be filled into the video pit position is greater than the duration of the video pit position;
if the video duration of the video clip to be filled into the video pit position is greater than the duration of the video pit position, performing clip extraction on the video clip to obtain a selected clip;
wherein the video duration of the selected clip is less than or equal to the duration of the video pit position.
116. The method of claim 115, wherein said performing clip extraction on the video clip to obtain the selected clip comprises:
performing clip extraction on the video clip according to the video elements of the video clip to obtain the selected clip.
117. The method of claim 116 wherein the video elements comprise at least one of a smiley face picture, laughter audio, character action, clear human voice, picture composition, and aesthetic score.
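Claims 115 to 117 trim a clip that is longer than its pit position by extracting a selected clip based on video elements such as smiling faces, laughter, or an aesthetic score. One hedged way to do that is a sliding-window sum over per-frame element scores, sketched below; how those per-frame scores are produced is assumed, not specified by the claims.

def extract_selected_clip(frame_scores, clip_fps, pit_duration):
    # frame_scores: one combined video-element score per frame of the over-long clip.
    window = int(pit_duration * clip_fps)        # number of frames that fit in the pit position
    if window >= len(frame_scores):
        return 0, len(frame_scores)              # claim 115: only trim when the clip exceeds the pit duration
    best_start = 0
    best_sum = current = sum(frame_scores[:window])
    for start in range(1, len(frame_scores) - window + 1):
        current += frame_scores[start + window - 1] - frame_scores[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_start + window       # frame range of the selected clip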
118. The method according to any one of claims 66 to 107, wherein said filling the video clip into the corresponding video pit of the recommended template according to the matching relationship corresponding to the recommended template comprises:
filling the video clips into the corresponding video pit positions of the recommended template according to the matching relationship between the video pit positions of the recommended template and the video clips to obtain an initial video;
and performing image optimization on the initial video based on the template requirement of the recommended template to obtain a recommended video.
119. The method of claim 118, wherein the template requirements comprise at least one of a transition setting, an acceleration and deceleration setting, and a map special effects setting.
120. The method of any one of claims 66 to 107, wherein the method further comprises:
performing duplicate removal processing on the video material to be processed.
121. The method of claim 120, wherein the deduplication process comprises similar material clustering.
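Claims 120 and 121 describe duplicate removal via similar-material clustering. A minimal sketch, under the assumption that each material has a feature vector and that a cosine similarity above 0.95 marks a near-duplicate:

import numpy as np

def deduplicate_materials(materials, features, threshold=0.95):
    kept, kept_features = [], []
    for material, feature in zip(materials, features):
        feature = np.asarray(feature, dtype=np.float64)
        feature = feature / (np.linalg.norm(feature) + 1e-8)
        # Keep the material only if it is not too similar to anything already kept.
        if all(float(np.dot(feature, other)) < threshold for other in kept_features):
            kept.append(material)
            kept_features.append(feature)
    return kept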
122. A video processing method for synthesizing a video material to be processed with a preset template, the method comprising:
according to video information of a video material to be processed, segmenting the video material to be processed to generate a plurality of video segments;
determining video clips of all video pit positions to be filled in the template according to the pit position information of the video pit positions of the template to obtain a matching relation corresponding to the template;
and filling the video segments into corresponding video pit positions of the template according to the matching relation corresponding to the template to obtain a recommended video.
123. The method of claim 122, wherein the video information comprises at least one of a camera movement direction and scene information.
124. The method according to claim 122, wherein said segmenting the video material to be processed into a plurality of video segments comprises:
segmenting the video material to be processed according to the video information of the video material to be processed to obtain a plurality of first video segments;
clustering and segmenting the first video segments to obtain a plurality of second video segments;
and taking the second video segment as a video segment of the video pit bit to be filled in the template.
125. The method of claim 124, wherein prior to cluster partitioning the first video segment, the method further comprises:
determining whether a first video segment with a video duration greater than a preset duration exists among the plurality of first video segments;
and if such a first video segment exists, performing the step of clustering and segmenting the first video segment.
126. The method of claim 124, wherein clustering the first video segment comprises:
determining a sliding window and a clustering center, wherein the sliding window is used for determining a current video frame to be processed, and the clustering center is used for determining a video segmentation point of the first video segment;
based on the clustering center, performing clustering analysis on the video frames of the first video clip according to the sliding window to determine video segmentation points;
and performing video segmentation on the first video segment according to the video segmentation point.
127. The method of claim 126, wherein the cluster center comprises image features of a first frame video frame of the first video segment.
128. The method of claim 127, wherein the image features of the video frames of the first video segment are obtained according to a pre-trained image feature network model, and wherein the image feature network model is capable of outputting the image features of the video frames of the first video segment.
129. The method of claim 126, wherein the size of the sliding window is related to the duration of the first video segment; alternatively, the size of the sliding window is related to a desired segmentation speed set by a user.
130. The method of claim 126, wherein the size of the sliding window is equal to 1.
131. The method of claim 126, wherein said performing a cluster analysis on video frames of said first video segment according to said sliding window based on said cluster center to determine video segmentation points comprises:
determining a current video frame according to the sliding window, and determining the similarity between the image characteristics of the current video frame and the clustering center;
if the similarity is smaller than a preset threshold value, taking the current video frame as a video segmentation point, and re-determining a clustering center;
and continuing to determine video segmentation points according to the re-determined clustering center until the last video frame of the first video segment is reached.
132. The method according to claim 131, wherein said determining similarity of image features of the current video frame to the cluster center comprises:
and determining the cosine similarity between the image characteristics of the current video frame and the clustering center.
133. The method of claim 131, wherein said re-determining a cluster center comprises:
and taking the image characteristics of the current video frame as the re-determined clustering center.
134. The method of claim 131, wherein after determining the similarity of the image feature of the current video frame to the cluster center, the method further comprises:
if the similarity is greater than or equal to the preset threshold value, updating the clustering center;
and continuing to determine the similarity between the image characteristics of the current video frame and the updated clustering center.
135. The method of claim 134, wherein said updating the cluster center comprises:
acquiring image characteristics of the current video frame;
and determining an updated clustering center according to the image characteristics of the current video frame and the clustering center.
136. A video processing apparatus, characterized in that the video processing apparatus comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 1 to 54.
137. A video processing apparatus, characterized in that the video processing apparatus comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 55 to 65.
138. A video processing apparatus, characterized in that the video processing apparatus comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 66 to 121.
139. A video processing apparatus, characterized in that the video processing apparatus comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 122 to 135.
140. A terminal device, characterized in that the terminal device comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 1 to 54.
141. A terminal device, characterized in that the terminal device comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 55 to 65.
142. A terminal device, characterized in that the terminal device comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 66 to 121.
143. A terminal device, characterized in that the terminal device comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement: a video processing method as claimed in any of claims 122 to 135.
144. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the steps of the video processing method according to any one of claims 1 to 135.
CN202080075426.7A 2020-12-31 2020-12-31 Video processing method, video processing apparatus, terminal device, and storage medium Pending CN114731458A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/142432 WO2022141533A1 (en) 2020-12-31 2020-12-31 Video processing method, video processing apparatus, terminal device, and storage medium

Publications (1)

Publication Number Publication Date
CN114731458A true CN114731458A (en) 2022-07-08

Family

ID=82229974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080075426.7A Pending CN114731458A (en) 2020-12-31 2020-12-31 Video processing method, video processing apparatus, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN114731458A (en)
WO (1) WO2022141533A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100725B (en) * 2022-08-23 2022-11-22 浙江大华技术股份有限公司 Object recognition method, object recognition apparatus, and computer storage medium
CN116866498B (en) * 2023-06-15 2024-04-05 天翼爱音乐文化科技有限公司 Video template generation method and device, electronic equipment and storage medium
CN116980717B (en) * 2023-09-22 2024-01-23 北京小糖科技有限责任公司 Interaction method, device, equipment and storage medium based on video decomposition processing
CN117278801B (en) * 2023-10-11 2024-03-22 广州智威智能科技有限公司 AI algorithm-based student activity highlight instant shooting and analyzing method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9154813B2 (en) * 2011-06-09 2015-10-06 Comcast Cable Communications, Llc Multiple video content in a composite video stream
CN104735468B (en) * 2015-04-03 2018-08-31 北京威扬科技有限公司 A kind of method and system that image is synthesized to new video based on semantic analysis
CN110730381A (en) * 2019-07-12 2020-01-24 北京达佳互联信息技术有限公司 Method, device, terminal and storage medium for synthesizing video based on video template

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080304806A1 (en) * 2007-06-07 2008-12-11 Cyberlink Corp. System and Method for Video Editing Based on Semantic Data
CN110324676A (en) * 2018-03-28 2019-10-11 腾讯科技(深圳)有限公司 Data processing method, media content put-on method, device and storage medium
CN111357277A (en) * 2018-11-28 2020-06-30 深圳市大疆创新科技有限公司 Video clip control method, terminal device and system
CN110532426A (en) * 2019-08-27 2019-12-03 新华智云科技有限公司 It is a kind of to extract the method and system that Multi-media Material generates video based on template

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134646A (en) * 2022-08-25 2022-09-30 荣耀终端有限公司 Video editing method and electronic equipment
CN115695944A (en) * 2022-12-30 2023-02-03 北京远特科技股份有限公司 Vehicle-mounted image processing method and device, electronic equipment and medium
CN115695944B (en) * 2022-12-30 2023-03-28 北京远特科技股份有限公司 Vehicle-mounted image processing method and device, electronic equipment and medium

Also Published As

Publication number Publication date
WO2022141533A1 (en) 2022-07-07

Similar Documents

Publication Publication Date Title
CN114731458A (en) Video processing method, video processing apparatus, terminal device, and storage medium
CN111683209B (en) Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium
CN113569088B (en) Music recommendation method and device and readable storage medium
CN113709561B (en) Video editing method, device, equipment and storage medium
CN111428088A (en) Video classification method and device and server
CN107222795B (en) Multi-feature fusion video abstract generation method
AU2021231754A1 (en) Systems and methods for automating video editing
US10248865B2 (en) Identifying presentation styles of educational videos
CN112511854B (en) Live video highlight generation method, device, medium and equipment
CN111930994A (en) Video editing processing method and device, electronic equipment and storage medium
CN112533051A (en) Bullet screen information display method and device, computer equipment and storage medium
JP2011215963A (en) Electronic apparatus, image processing method, and program
KR20210118437A (en) Image display selectively depicting motion
CN113094552A (en) Video template searching method and device, server and readable storage medium
CN113641859B (en) Script generation method, system, computer storage medium and computer program product
CN112203140B (en) Video editing method and device, electronic equipment and storage medium
CN111586466B (en) Video data processing method and device and storage medium
CN110879974A (en) Video classification method and device
CN111432206A (en) Video definition processing method and device based on artificial intelligence and electronic equipment
CN112040273A (en) Video synthesis method and device
CN112004138A (en) Intelligent video material searching and matching method and device
CN116595438A (en) Picture creation method, device, equipment and storage medium
CN112800263A (en) Video synthesis system, method and medium based on artificial intelligence
Xu et al. Fast summarization of user-generated videos: exploiting semantic, emotional, and quality clues
CN113660526B (en) Script generation method, system, computer storage medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination