WO2023030098A1 - Video editing method, electronic device and storage medium

Video editing method, electronic device and storage medium

Info

Publication number
WO2023030098A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
editing
model
edited
clipping
Prior art date
Application number
PCT/CN2022/114268
Other languages
English (en)
French (fr)
Inventor
李扬
李雪晨
东巍
朱洲
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023030098A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content

Definitions

  • The present invention relates to the field of intelligent terminals, and in particular to a video editing method, an electronic device, and a storage medium.
  • Embodiments of the present application provide a video editing method, an electronic device, and a storage medium.
  • The method is implemented through a video editing model that edits user-input material according to the editing style selected by the user.
  • The video editing part of the model first performs a preset number of rounds of simulated editing training on the user-input material.
  • The scoring part of the video editing model scores the edited video segments according to the editing style selected by the user and feeds the scores back to the video editing part, prompting it to adjust its network parameters, optimize its editing strategy, and determine a better simulated editing operation for the next video segment to be edited.
  • Through the preset rounds of simulated editing training, the video editing model finds the optimal editing strategy suited to the user-input material, and can thus edit a short video that matches the editing style selected by the user with a relatively professional result.
  • The optimal editing strategy also lowers the technical threshold for operating the video editing process, which helps improve the user experience.
  • An embodiment of the present application provides a video editing method applied to an electronic device.
  • The method includes: editing the material to be edited a first time using a first editing model to obtain a first edited video; evaluating the first edited video to obtain a first feedback value; determining a second editing model according to the first feedback value, and editing the material to be edited a second time using the second editing model to obtain a second edited video; evaluating the second edited video to obtain a second feedback value, wherein the second feedback value is higher than the first feedback value; and taking the second edited video as the output edited video of the material to be edited.
  • The video editing method provided by the present application can use an initial first editing model to edit the material to be edited multiple times (for example, multiple rounds of simulated editing), find the optimal editing model during those simulated editing rounds (that is, the second editing model, such as the editing strategy network 220 under the network parameters corresponding to the optimal editing strategy described below), and then take the video obtained by editing the material with the second editing model (that is, the second edited video) as the output edited video, completing the editing of the material to be edited.
  • The first editing model can be, for example, the editing strategy network 220 under the initial network parameters described in step 311t below, and the first editing and the second editing correspond, for example, to the preset multiple rounds of simulated editing training described in the embodiments below.
  • The first editing corresponds to the first round of simulated editing, and the second editing corresponds to the last round of simulated editing when the preset number threshold is reached.
  • The second editing model corresponds to the editing strategy network 220 after its network parameters have been adjusted to the network parameters corresponding to the optimal editing strategy. It can therefore be understood that, between the first editing and the second editing, the material to be edited is repeatedly simulated-edited multiple times, so that the second editing model is finally determined as the optimal editing model and the second edited video is taken as the output edited video.
  • In each simulated editing round, for example the first simulated editing round corresponding to the first editing above, the results of the simulated editing (for example, each edited video segment of that round, corresponding to the first edited video) are evaluated to obtain the first feedback value used to adjust the first editing model. For example, the value network 210 described in steps 314t to 315t below scores each edited video segment and feeds back a corresponding reward value, and the editing strategy network 220 calculates a loss function based on the reward value and adjusts its network parameters. Each time a round of simulated editing training is completed, the editing strategy network 220 accordingly completes one round of network parameter adjustment.
  • In this way, the editing strategy network 220 finds the optimal editing strategy, and its network parameters are correspondingly adjusted to the network parameters of the optimal editing strategy.
  • The editing strategy network 220 under these network parameters is the optimal editing model, that is, the second editing model above; the actual editing of the material to be edited is performed with this optimal editing model, and the second edited video thus obtained is the output edited video.
  • The video segments described in the embodiments below are the segments obtained by dividing the material to be edited, so the first feedback value or second feedback value obtained by evaluating the first or second edited video can be understood as the accumulated value of the reward values fed back by the value network 210 for each edited video segment.
  • The above method further includes: determining the second editing model according to the first feedback value specifically means adjusting the parameters of the first editing model to the parameters of the second editing model according to the first feedback value.
  • That is, the second editing model is determined by adjusting the network parameters of the editing model according to the first feedback value, for example according to the accumulated value of the reward values fed back by the value network 210 for the scores of the edited video segments.
  • The parameters of the first editing model are, for example, the initial network parameter θ1 of the editing strategy network 220 described below, and the parameters of the second editing model are, for example, the network parameters corresponding to the optimal editing strategy described below.
  • Adjusting the parameters of the first editing model to the parameters of the second editing model according to the first feedback value includes adjusting the initial network parameter θ1 of the editing strategy network 220 according to the first feedback value; for example, after the first round of simulated editing training, the network parameters of the editing strategy network 220 are adjusted to θ100, and the editing strategy network 220 under the network parameter θ100 is then used to perform the second round of simulated editing training on the material to be edited.
  • During this process, the reward values fed back by the value network 210 for the scores of the edited video segments are continuously used to adjust the network parameters of the editing strategy network 220, so that after the preset number of rounds of simulated editing training the network parameters of the editing strategy network 220 are finally adjusted to the network parameters corresponding to the optimal editing strategy.
  • The above method further includes: evaluating the first edited video to obtain the first feedback value includes evaluating the first edited video through an evaluation model to determine the first feedback value.
  • The above evaluation model is, for example, the value network 210 described in the embodiments below. It can therefore be understood that the evaluation model is used to evaluate the editing ability of the first editing model or the second editing model, for example by evaluating the first edited video obtained by editing with the first editing model to obtain the first feedback value.
  • The process in which the value network 210 scores the edited video segments produced by the editing strategy network 220 during simulated editing and feeds back reward values is the process of evaluating the editing ability of the editing strategy network 220.
  • The evaluation model may also be another evaluation system or evaluation algorithm with the same scoring and feedback function as the value network 210 described below, which is not limited here.
  • The above method further includes: the evaluation model includes scoring rules corresponding to multiple editing styles.
  • The above method further includes: evaluating the first edited video with the evaluation model to determine the first feedback value includes, in response to the editing style selected by the user, the evaluation model scoring the first edited video using the scoring rule corresponding to the selected editing style to determine the first feedback value.
  • That is, the evaluation model can be preset with scoring rules corresponding to multiple editing styles for evaluating the first edited video.
  • Because the evaluation model uses the scoring rule corresponding to the editing style selected by the user, the resulting first feedback value also corresponds to the selected editing style; the parameters of the first editing model are then adjusted according to the first feedback value, so that the determined second editing model can produce a second edited video conforming to the editing style selected by the user.
  • The evaluation model is, for example, the value network 210 described in the embodiments below.
  • The value network 210 scores the edited video segments according to the scoring rule corresponding to the editing style selected by the user and feeds back reward values for adjusting the editing strategy.
  • The optimal editing strategy finally found by the editing strategy network 220 after parameter adjustment is therefore the optimal editing strategy corresponding to the editing style selected by the user and to the material to be edited.
  • For the scoring rules corresponding to multiple editing styles preset in the value network 210, reference may be made to the related description in step 301 below, and details are not repeated here.
  • The above method further includes: evaluating the second edited video to obtain the second feedback value includes scoring the second edited video through the evaluation model to determine the second feedback value, wherein the evaluation model scores the second edited video using the scoring rule corresponding to the editing style selected by the user.
  • That is, the evaluation model evaluates the videos obtained in the first editing, the second editing, and the other simulated editing rounds using the scoring rule of the same editing style (the editing style selected by the user); the evaluation model scores the second edited video with the scoring rule corresponding to the selected editing style and then determines the second feedback value based on that score.
  • For this process, reference may be made to the description after step 303 below of the value network 210 scoring the edited video segments with the scoring rule corresponding to the editing style selected by the user, and of the video editing model 200 converting the scores into reward values input to the editing strategy network 220; it is not repeated here.
  • The above method further includes: the second feedback value being higher than the first feedback value includes the evaluation model's score for the second edited video being higher than the evaluation model's score for the first edited video.
  • The first feedback value is determined based on the evaluation model's score for the first edited video, and the second feedback value is determined based on the evaluation model's score for the second edited video.
  • The score of a segment and the reward value fed back to the editing strategy network 220 may have a linear conversion relationship; for example, a score of 87 may correspond to a reward value of 8.7, and a higher score is fed back as a larger reward value. It can therefore be understood that the higher the evaluation model's score, the higher the corresponding feedback value.
  • The above method further includes: the multiple editing styles include one or more of Hong Kong style, childhood, comic, suspense, Chinese style, cute, and beauty.
  • The above method further includes: the editing strategy adopted by the first editing model or the second editing model for editing the material to be edited includes at least one of the following: dividing the material to be edited into video segments to be edited; determining an editing action for each divided video segment to be edited; and performing an editing operation on each video segment to be edited using the determined editing action.
  • The first editing model is, for example, the editing strategy network 220 under the initial network parameter θ1, and the editing strategy corresponding to the first editing model is the editing strategy corresponding to the network parameter θ1; the editing strategy corresponding to the second editing model corresponds to the optimal editing strategy described below.
  • The editing strategy adopted by the editing strategy network 220 for the material to be edited includes dividing the material to be edited into video segments with coherent content, and the computation for determining an editing action for each video segment.
  • The above method further includes: the editing action includes any one or a combination of more of the following: adding a transition effect; adding a dynamic effect; applying a filter; marking as a highlight; splicing shots; variable-speed processing; adding background music; adding special-effect audio; and adjusting sound.
  • The above method further includes: the material to be edited includes pictures and/or videos to be edited.
  • That is, the material to be edited can be a collection of pictures, a collection of videos, or a combination of pictures and videos, which is not limited here.
  • An embodiment of the present application provides an electronic device including one or more processors and one or more memories; the one or more memories store one or more programs which, when executed by the one or more processors, cause the electronic device to execute the above video editing method.
  • the embodiment of the present application provides a computer-readable storage medium, on which instructions are stored, and when the instructions are executed on a computer, the computer executes the above video editing method.
  • an embodiment of the present application provides a computer program product, which includes a computer program or instruction; when the computer program or instruction is executed by a processor on a computer, the computer executes the above video editing method.
  • FIGS. 1A to 1E are schematic diagrams of an operation interface of a video editing solution.
  • FIG. 2 is a schematic structural diagram of a video editing model provided by an embodiment of the present application.
  • FIG. 3A is a schematic diagram of a training process of a value network 210 provided by an embodiment of the present application.
  • FIG. 3B-1 is a schematic diagram showing a process of performing simulated editing training on sample data to be edited by a video editing model 200 provided in the embodiment of the present application.
  • FIG. 3B-2 is a schematic diagram of a process in which a video editing model 200 provided by an embodiment of the present application edits the sample data to be edited using the optimal editing strategy found through the simulated editing training process shown in FIG. 3B-1.
  • FIG. 4 is a schematic diagram of an implementation flow in which the mobile phone 100 provided by an embodiment of the present application executes the video editing method of the present application.
  • FIGS. 5A to 5E are schematic diagrams of operation interfaces of the mobile phone 100 provided by an embodiment of the present application for executing the video editing method of the present application.
  • FIG. 6 is a schematic structural diagram of a mobile phone 100 provided by an embodiment of the present application.
  • FIG. 7 is a block diagram of a software structure of a mobile phone 100 provided by an embodiment of the present application.
  • Illustrative embodiments of the present application include but are not limited to a video editing method, electronic equipment, storage media, and the like.
  • FIGS. 1A to 1E show schematic diagrams of an operation interface of a video clipping solution.
  • The user taps the Clipping™ 110 icon on the desktop of the mobile phone 100' and enters the editing function interface 101 shown in FIG. 1B, where Clipping™ 110 is a video editing application installed on the mobile phone 100'.
  • To edit manually, the user needs to tap the editing button 021 on the interface 102 shown in FIG. 1C to trim the length of the video material, and tap the audio button 022, the text button 023, the picture-in-picture button 024, the special effect button 025, and the filter button 026 respectively to add audio, add text, add a video played picture-in-picture, add transitions and dynamic special effects, add filter effects, and so on.
  • From the editing function interface 101 shown in FIG. 1B, the user can also enter the material selection interface 103 shown in FIG. 1D. After selecting the videos or pictures to be added on the material selection interface 103, the user taps the next button 031 in the lower right corner of the interface 103 to enter the template selection interface 104 shown in FIG. 1E.
  • the template selection interface 104 includes a template recommendation area 041, and the user can select a template of interest in the template recommendation area 041, for example, click "template 1" to edit with one click to get a short video of the same style as "template 1".
  • the template selection interface 104 shown in FIG. 1E also includes a video preview area 042, and the user can preview the short video effect generated by clipping in the video preview area 042 after selecting a corresponding template. After finishing the video clip, the user can click the export button 043 in the upper right corner of the template selection interface 104 to export the clipped short video.
  • The operation interface provided for the one-key film button 012 shown in FIGS. 1D to 1E is more convenient for users to edit quickly.
  • The function of the "one-key film" button 012 can greatly lower the threshold of creation.
  • The present application provides a video editing method. Specifically, the video editing method is implemented based on a video editing model that can edit the user-input material according to the editing style selected by the user.
  • The video editing model includes a part that performs the video editing (for example, the editing strategy network 220 below) and a scoring part that scores the output of the video editing part to drive its continuous optimization (for example, the value network 210 below).
  • The video editing part of the video editing model can first perform a preset number of rounds of simulated editing training on the user-input material.
  • During this training, the scoring part scores the video segments for which the video editing part has completed simulated editing (for example, the first edited video segment) and feeds the scores back to the video editing part, prompting it to adjust its network parameters based on the feedback, optimize its editing strategy, and determine a better simulated editing operation for the next segment to be edited (for example, the second segment). In this way, the video editing model can find, through the preset rounds of simulated editing training, the optimal editing strategy suited to the user-input material, and thus edit a short video that matches the editing style selected by the user with a relatively high degree of professionalism; for example, the professionalism of the short video can approach that of a short video of the same style produced by a professional editor.
  • The two parts of the video editing model can be implemented based on the same type of neural network model or based on different neural network models.
  • For example, both the scoring part (i.e., the value network 210 below) and the video editing part (i.e., the editing strategy network 220 below) can be implemented based on a convolutional neural network (CNN), or the scoring part can be implemented based on a recurrent neural network (RNN) while the video editing part is implemented based on a CNN. As an example, in the embodiments of the present application the video editing part of the video editing model 200 (i.e., the editing strategy network 220 below) can be trained as a deep Q-network (DQN), where a DQN is an end-to-end, CNN-based neural network model that combines deep learning and reinforcement learning to go from perception to action.
  • In other embodiments, the scoring part and the video editing part can also be implemented based on other neural network models, which is not limited here.
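  • As a purely illustrative sketch (not the patent's implementation), a CNN-based editing strategy network in the DQN style could map an input video segment to one Q value per preset editing action; all layer sizes, names, and the frame-stacking assumption below are made up for illustration.

```python
# Minimal sketch of a CNN-based editing strategy network in the spirit of DQN:
# it maps an input video segment to one Q value per preset editing action.
import torch
import torch.nn as nn

class ClippingPolicyNet(nn.Module):
    def __init__(self, num_actions: int, frames: int = 16):
        super().__init__()
        # Treat a segment as `frames` stacked RGB frames (3 * frames input channels).
        self.features = nn.Sequential(
            nn.Conv2d(3 * frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)), nn.Flatten(),
        )
        self.q_head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),  # one Q value per preset editing action
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, 3 * frames, H, W) -> (batch, num_actions)
        return self.q_head(self.features(segment))
```

  • The editing action with the largest predicted Q value would then be taken as the current best action for the segment, mirroring the max-Q selection described later for step 311t.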
  • Fig. 2 shows a scene 20 of editing a video by a video editing model according to some embodiments of the present application.
  • the scene 20 includes a mobile phone 100 and a server 300 .
  • The mobile phone 100 can use the video editing model 200 trained by the server 300 to realize the video editing function: the user inputs the material to be edited and selects a video editing style, the video editing model 200 first performs the preset number of rounds of simulated editing training on the material to be edited based on the editing style selected by the user, finally finds the optimal editing strategy suited to the material to be edited, edits the material, and outputs the finished edited short video.
  • The material to be edited can be divided into a preset number of consecutive video segments to be edited based on a preset video segment division strategy; the video segments described in the context of the embodiments of the present application refer to the segments obtained by dividing the material to be edited.
  • The video segment division strategy preset in the video editing model 200 can divide the material to be edited into a reasonable number of video segments whose contents are related to each other, so that the content transitions between consecutive video segments remain smooth.
  • The video segment division strategy preset in the video editing model 200 can also be obtained by training a neural network model on video clips of various editing styles, which is not repeated here.
  • The following describes the training process of the value network 210 and the editing strategy network 220 in the video editing model 200 provided by the present application, and the specific process by which the two cooperate to implement the video editing method of the present application through the video editing model 200.
  • After the video editing model 200 divides the material to be edited into the preset multiple video segments, in each round of simulated editing training on the material to be edited the multiple video segments are simulated-edited in sequence.
  • For example, the video editing model 200 can first perform simulated editing on the first video segment input to it through the editing strategy network 220, and then score the resulting edited segment through the value network 210 and feed the score back, prompting the editing strategy network 220 of the video editing model 200 to adjust its network parameters and thereby optimize the simulated editing operation on the second video segment input to the video editing model 200.
  • The video editing model 200 then performs the simulated editing operation on the second input video segment and provides scoring feedback on the edited second segment through the value network 210, prompting the editing strategy network 220 of the video editing model 200 to continue adjusting its network parameters and thereby optimize the simulated editing operation on the third video segment input to the video editing model 200. This cycle continues until the video editing model 200 has completed the simulated editing of all the divided video segments.
  • The network parameters of the editing strategy network 220 therefore undergo a continuous adjustment process while the video editing model 200 performs simulated editing on each video segment; that is, the editing strategy corresponding to the network parameters of the editing strategy network 220 after a round of simulated editing training may be better than the editing strategy corresponding to its network parameters before that round.
  • After the simulated editing training is completed, the video editing model 200 uses the editing strategy network 220 to edit the material to be edited and can obtain the video segments or short video corresponding to the optimal editing strategy.
  • Specifically, the video editing model 200 uses the optimal editing strategy through the editing strategy network 220 to perform the corresponding editing processing on each of the divided video segments of the material to be edited, correspondingly obtaining the first edited video segment, the second edited video segment, ..., the (N-1)th edited video segment, and the Nth edited video segment. When the video editing model 200 has finished processing all of the material to be edited, it outputs a finished edited short video composed of the above edited video segments, which are the result of editing the material to be edited with the editing strategy obtained after the multiple rounds of simulated editing training.
  • The process of generating the edited short video from the edited video segments may be, for example, splicing the edited video segments in the order in which they were edited to obtain the finished edited short video.
  • Because, before the video editing model 200 performs simulated editing training on the material, the material to be edited has already been reasonably divided into multiple video segments based on the preset video segment division strategy, the relevance and smoothness of the content is preserved between consecutive edited video segments; therefore, after the video editing model 200 finishes editing the video segments to be edited, the edited video segments can be spliced in sequence to obtain the edited short video.
  • In other embodiments, the edited video segments may also be processed in other ways to obtain the edited short video, which is not limited here.
  • The server 300 can use sample data to train the value network 210 (i.e., the scoring part of the above video editing model), which can score edited video segments, and then embed the value network 210 into the video editing model 200; for example, the embedding process may be a data docking setup such as connecting the input-layer data interface of the value network 210 to the output interface for edited video segment data in the video editing model 200.
  • The server 300 then inputs sample data to be edited into the video editing model 200 embedded with the value network 210 and trains the editing strategy network 220 of the video editing model 200 (i.e., the above video editing part).
  • The trained video editing model 200 can then be ported to the mobile phone 100 to perform video editing tasks. It can be understood that, after the video editing model 200 is ported to the mobile phone 100, the process of performing a video editing task may itself be an optimization training process for the editing strategy network 220 of the video editing model 200.
  • The editing strategy network 220 of the video editing model 200 can be preset with a set of editing actions for the sample data to be edited. The editing actions making up the set can include, but are not limited to, one or more of actions such as adding transition effects, adding dynamic effects, applying filters, camera movement, marking as a highlight, shot splicing, variable-speed processing, adding background music, adding special-effect audio, and adjusting sound.
  • An editing action can be described by one or more video editing parameters; it can be understood that the video editing parameters include, but are not limited to, video speed-change parameters, transition special-effect parameters, dynamic special-effect parameters, background special-effect parameters, background music parameters, sound adjustment parameters, and so on.
  • The edited video segments can thus be video segments obtained through editing operations such as video speed change, addition of background music, sound adjustment, and transitions, special effects, and filters that fit the video content well.
  • For example, an editing action may be an accelerated speed-change process applied to a certain video segment of the sample data to be edited. It can be understood that a speed-change process determined by a single fixed video speed-change parameter is the conventional constant-multiple speed processing.
  • In other embodiments, a video editing parameter can also be a continuously changing function; for example, the speed-change process for a certain video segment can be a curve speed change, such as a process of alternating fast and slow playback, which is not repeated here.
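  • As an illustrative sketch only (the field names below are assumptions, not taken from the patent), an editing action described by video editing parameters, including both a fixed speed multiplier and a time-varying curve speed function, might be modeled as follows:

```python
# Illustrative editing-action data structure. A speed parameter may be a single
# fixed multiplier (conventional multi-speed processing) or a function of time
# (curve speed change, e.g. alternating fast and slow playback).
import math
from dataclasses import dataclass
from typing import Callable, Optional, Union

@dataclass
class EditingAction:
    transition: Optional[str] = None        # transition effect, e.g. "fade"
    dynamic_effect: Optional[str] = None    # dynamic effect name
    filter_name: Optional[str] = None       # filter to apply
    mark_highlight: bool = False            # mark the segment as a highlight
    background_music: Optional[str] = None  # background music track
    sound_gain_db: float = 0.0              # sound adjustment
    speed: Union[float, Callable[[float], float]] = 1.0  # fixed multiplier or speed curve f(t)

# Conventional fixed speed-up: the whole segment plays at 2x.
fast_cut = EditingAction(filter_name="vivid", speed=2.0)

# Curve speed change: the speed alternates between roughly 0.5x and 2.5x over time.
wave_cut = EditingAction(speed=lambda t: 1.5 + 1.0 * math.sin(2 * math.pi * t / 4.0))
```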
  • The server 300 that trains the value network 210 of the above video editing model 200 can also be a laptop computer, a desktop computer, a tablet computer, a mobile phone, a wearable device, a head-mounted display, a mobile email device, a portable game console, a portable music player, a reader device, a television with one or more processors embedded in or coupled to it, or another electronic device capable of accessing a network.
  • The above mobile phone 100 to which the video editing model 200 is ported for video editing processing can also be a tablet computer, a desktop computer, a laptop, a handheld computer, a netbook, an augmented reality (AR) / virtual reality (VR) device, a smart TV, a smart watch, or another electronic device capable of accessing a network; or another electronic device with a shooting function, such as a camera or a handheld gimbal device, which is not limited here.
  • The value network 210 of the video editing model can be embedded into the video editing model 200 containing the editing strategy network 220, for example by connecting the output layer of the value network 210 to the input layer of the editing strategy network 220 (for example, converting the score output by the value network 210 into a reward value that is input to the editing strategy network 220) and connecting the output layer of the editing strategy network 220 to the input layer of the value network 210 (for example, the value network 210 obtains the edited video segments output by the editing strategy network 220); such data docking and related compilation complete the embedding of the value network 210. A minimal sketch of this feedback loop is given below.
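  • The following rough sketch only illustrates the feedback loop implied by this data docking: the editing strategy network edits a segment, the value network scores it, the score is converted into a reward, and the reward drives a parameter adjustment. The method names (simulate_clip, score, update_from_reward) and the score-to-reward division by 10 are assumptions for illustration, not interfaces defined by the patent.

```python
def simulated_editing_round(segments, policy_net, value_net, style):
    """One simulated editing round over the divided video segments (sketch only)."""
    total_reward = 0.0
    for segment in segments:                           # segments are edited in sequence
        edited = policy_net.simulate_clip(segment)     # video editing part (editing strategy network 220)
        score = value_net.score(edited, style=style)   # scoring part (value network 210), e.g. 0-100
        reward = score / 10.0                          # e.g. a score of 87 becomes a reward of 8.7
        policy_net.update_from_reward(reward)          # adjust the network parameters from the feedback
        total_reward += reward
    return total_reward                                # accumulated reward for this round
```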
  • The video editing model 200 is then trained, through the editing strategy network 220, to edit the material to be edited, so that the editing strategy network 220 acquires a certain editing prediction ability and decision-making ability; for example, the editing strategy network 220 in the trained video editing model 200 can find, through multiple rounds of simulated editing training, the optimal editing strategy suited to the material to be edited, and edit the material accordingly.
  • Afterwards, the video editing model 200 can be ported from the server 300 to the mobile phone 100, so that the user can perform video editing on the mobile phone 100 with the video editing method of the present application.
  • FIG. 3A shows a schematic diagram of a training process of a value network 210 according to an embodiment of the present application.
  • the training process includes the following steps:
  • the server 300 acquires a sample database for training the value network 210.
  • the sample database may be a database corresponding to various editing styles.
  • Specifically, a large number of edited video clips can be collected and scored, so as to obtain a large amount of sample data, each consisting of a video clip and its corresponding score, which together form the sample database for training the value network 210.
  • That is, each sample datum includes a video clip and a score corresponding to that clip. The score is used in the subsequent training step of the value network 210 to fit scoring rules, and during later video editing by the video editing model it serves as the output of the value network 210 fed back to the editing strategy network 220 in the video editing model; this is described in the corresponding steps of the video editing process below and is not repeated here.
  • The sample database acquired by the server 300 may be a database corresponding to multiple editing styles, so that scoring rules corresponding to each editing style can be fitted in subsequent steps.
  • The scoring rules corresponding to different editing styles are different; this is introduced in detail below and not repeated here.
  • In some embodiments, professional editors can be asked to score the collected video clips, and the rules used to score each video clip can be set by the professional editors.
  • For example, a video clip to be evaluated can be scored separately along multiple dimensions: after watching a video clip, a professional editor can score it separately for dimensions such as creativity, interest, artistry, and narrative ability. In the resulting sample database, the video clip in one sample datum can therefore correspond to scores in multiple different dimensions.
  • In other embodiments, when scoring each video clip, a professional editor may instead determine a single comprehensive score for the clip after watching it, by evaluating the clip as a whole. In the resulting sample database, the video clip in one sample datum then corresponds to one comprehensive score. It can be understood that the collected video clips may be short videos edited by professional editors, or the video clips making up such short videos, and so on, which is not limited here.
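  • As a purely illustrative sketch (the field names and paths are assumptions, not defined by the patent), one record of such a sample database might look like the following, covering both the per-dimension variant and the comprehensive-score variant:

```python
# Illustrative sample-data records for training the value network 210.
sample_multi_dim = {
    "style": "Hong Kong style",
    "clip_path": "samples/hk_0001.mp4",
    "scores": {"creativity": 80, "interest": 75, "artistry": 90, "narrative": 85},
}

sample_comprehensive = {
    "style": "Hong Kong style",
    "clip_path": "samples/hk_0002.mp4",
    "comprehensive_score": 87,
}
```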
  • The server 300 inputs the sample data into the value network model to be trained and fits the scoring rules.
  • In some embodiments, when training the value network 210, the server 300 first performs feature extraction on the sample data used for training and then inputs the result into the value network model to be trained in order to fit the scoring rules. The server 300 performs feature extraction on the sample data so as to convert the image data of the video clips in the sample data, and the score corresponding to each video clip, into sets of feature vectors or matrices that the value network model can read.
  • The video clips in the sample data and their corresponding scores are generally unstructured data of differing structure, with high dimensionality, varied forms of expression, and much redundant information; it is therefore necessary to extract feature vectors that can characterize these sample data. It can be understood that these initial feature vectors can be one-dimensional or multi-dimensional.
  • In some embodiments, one video clip corresponds to multiple scores, that is, a professional editor as described above scores the clip along multiple dimensions such as creativity, interest, artistry, and narrative ability; the score corresponding to that clip is then jointly represented by the scores of the four dimensions. In other words, the score feature vector corresponding to the clip has four dimensions: the creativity score, the interest score, the artistry score, and the narrative-ability score.
  • In other embodiments, a video clip corresponds to one comprehensive score, that is, the comprehensive score determined by the professional editor after watching and evaluating the clip as a whole; the score corresponding to the clip can then be represented by that comprehensive score. In other words, the score feature vector corresponding to the clip has one dimension, namely the comprehensive score.
  • A video clip can be composed of several frames of image data, and the image data can include matrix data of multiple color channels, such as image data in RGB format. Therefore, when performing feature extraction on the video clips in the sample data, the server 300 can represent the image data making up a clip by matrices of order n: an image in RGB format can be expressed as a third-order matrix, so a video clip composed of multiple consecutive frames can be represented as a set of matrices. It can be understood that the matrix set extracted by the server 300 for the video clip in a sample datum corresponds one-to-one with the score feature vector of that clip.
  • The server 300 then inputs the feature vectors or matrix sets extracted from the sample data into the value network 210 to be trained: the matrix data of the video clip in each sample datum serves as the input of the value network 210, and the score feature vector of that clip serves as the target output of the value network 210.
  • For example, if the matrix data of video clip A is input to the value network 210 and the professional editor's score for it is B, then when training the value network its parameters can be adjusted according to whether the output score of the value network equals B, or whether the difference between the output score and B is lower than a preset difference.
  • In this way, the server 300 can fit a scoring rule.
  • In some embodiments, the server 300 can perform feature extraction on the sample data in the sample database one by one and input them into the value network 210 to be trained for training, or it can perform feature extraction batch by batch and input the batches into the value network 210 to be trained for training, which is not limited here.
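  • The following is a minimal, hypothetical sketch of this fitting process, framed as score regression with PyTorch; the function name, the tolerance check, and the assumption that the dataset yields (clip feature tensor, editor score tensor) pairs are illustrative and not taken from the patent.

```python
import torch
import torch.nn as nn

def train_value_network(value_net: nn.Module, dataset, epochs: int = 10,
                        lr: float = 1e-4, tolerance: float = 1.0):
    """Fit the scoring rule by regressing the editors' scores (sketch only)."""
    optimizer = torch.optim.Adam(value_net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for clip_tensor, editor_score in dataset:   # processed one by one or batch by batch
            with torch.no_grad():
                predicted = value_net(clip_tensor)
            # Only adjust the parameters when the predicted score still differs
            # from the editor's score B by more than the preset threshold.
            if (predicted - editor_score).abs().max() < tolerance:
                continue
            loss = loss_fn(value_net(clip_tensor), editor_score)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return value_net
```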
  • As mentioned above, the value network 210 trained by the server 300 can score the edited video segments according to the editing style selected by the user and feed the scores back to the editing strategy network 220 (i.e., the video editing part). It can therefore be understood that, if the trained value network 210 is to include scoring rules for multiple editing styles, the training process of steps 301 to 302 above can be carried out for each editing style to obtain the scoring rule corresponding to that style, so that the trained value network 210 can output scores for different editing styles using the scoring rule corresponding to each style.
  • For example, if the value network 210 is required to include a scoring rule for video clips whose editing style is "Hong Kong style", and the trained value network 210 outputs one comprehensive score per video clip, then the large number of video clips collected in step 301 above may all be short videos or clips based on Hong Kong films or on excerpts from Wong Kar Wai films.
  • When the professional editors score the collected video clips, they may also score them based on the scoring weights of the four dimensions (creativity, interest, artistry, and narrative ability) associated with the "Hong Kong style" editing style, thus obtaining sample data of the "Hong Kong style" editing style for training the value network 210.
  • Based on such sample data, the server 300 can train the value network 210 to fit the scoring rule corresponding to the "Hong Kong style" editing style.
  • When collecting sample data, the video clip samples corresponding to each editing style can be screened and collected manually for scoring, or screened and collected by computer for scoring.
  • A large amount of sample data can be collected in step 301 above to form the sample database; for example, more than 500 sample data can be collected for each video editing style, which is not limited here.
  • The server 300 obtains the trained value network 210.
  • The server 300 can train a value network 210 with scoring rules corresponding to the various editing styles. Since the trained value network 210 has scoring rules corresponding to multiple editing styles, the video editing model 200 containing the value network 210 trained in this step can perform stylized video editing on the material to be edited selected by the user according to the editing style selected by the user, and output an edited video that matches the selected editing style. The specific process by which the user selects the video editing style and an exemplary operation interface are described in detail below and are not repeated here.
  • It should be noted that the value network 210 trained in steps 301 to 302 above operates under the scoring rules corresponding to each editing style. If the score output by the value network 210 for an input video clip is a comprehensive score, and the value range of that comprehensive score does not match the value range of the reward value that the editing strategy network 220 is set up to receive, then when the value network 210 is embedded into the video editing model 200, a conversion module needs to be added between the value network 210 and the editing strategy network 220 to convert the comprehensive score output by the value network 210 into the reward value fed back to the editing strategy network 220.
  • In other embodiments, the score output by the value network 210 can also be fed back directly to the editing strategy network 220; in this case there is no need to convert the score output by the value network 210.
  • If what the value network 210 trained in steps 301 to 302 above outputs for a video clip to be evaluated is multiple score values, then when the value network 210 is embedded into the video editing model 200, a calculation module needs to be added between the value network 210 and the editing strategy network 220 to convert the multiple scores output by the value network 210 into the reward value fed back to the editing strategy network 220.
  • For example, the calculation module can first weight the multiple scores output by the value network 210 to obtain a comprehensive score, and then, based on a linear correspondence between the comprehensive score and the reward value, convert the comprehensive score into the reward value fed back to the editing strategy network 220.
  • The calculation module obtains the comprehensive score from the multiple scores output by the value network 210 by weighting; the weighting calculation can refer, for example, to the following formula (1):

    E = w_1·s_1 + w_2·s_2 + … + w_M·s_M = Σ_{i=1}^{M} w_i·s_i      (1)

  • Here E represents the comprehensive evaluation (Evaluation), M represents the number of score dimensions output by the value network 210, s_i represents the score of the i-th dimension, and w_i is the weight coefficient corresponding to the score of the i-th dimension. It can be understood that the sum of the weight coefficients corresponding to the scores of all dimensions is 1.
  • For example, the comprehensive score E obtained in this way might be 75.
  • It can be understood that the scoring dimensions involved in formula (1) above and the weight coefficients corresponding to the score values of each dimension can be set reasonably by professional editors according to the different video editing styles.
  • For example, the scoring dimensions can be the four dimensions of creativity, interest, artistry, and narrative ability, and the weight coefficients corresponding to the score values of each dimension are, for example: creativity 0.2 (or 20%), interest 0.1 (or 10%), artistry 0.4 (or 40%), and narrative ability 0.3 (or 30%), which is not limited here.
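  • The following is a worked example of formula (1) under the illustrative weights above, with the linear score-to-reward conversion described next applied at the end; the dimension scores themselves are made-up numbers for illustration only.

```python
# Worked example of formula (1) and the linear score-to-reward conversion.
def comprehensive_score(scores: dict, weights: dict) -> float:
    assert abs(sum(weights.values()) - 1.0) < 1e-9   # weight coefficients sum to 1
    return sum(weights[d] * scores[d] for d in weights)

weights = {"creativity": 0.2, "interest": 0.1, "artistry": 0.4, "narrative": 0.3}
scores = {"creativity": 80, "interest": 70, "artistry": 95, "narrative": 85}

E = comprehensive_score(scores, weights)   # 0.2*80 + 0.1*70 + 0.4*95 + 0.3*85 = 86.5
reward = E / 10.0                          # linear conversion to the feedback reward: 8.65
```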
  • In this way, the scores obtained by the value network 210 from comprehensively scoring the edited video segments can be converted into reward values according to the preset conversion relation and fed back to the editing strategy network 220.
  • For example, if the comprehensive score obtained by the value network 210 is 87 points, then based on the above conversion relation the converted reward value is 8.7, so the reward value fed back to the editing strategy network 220 is 8.7.
  • In other embodiments, the correspondence between the comprehensive score and the reward value may also be a preset correspondence table between comprehensive scores and reward values, which is not limited here.
  • As mentioned above, the trained value network 210 can be embedded into the video editing model 200 containing the editing strategy network 220. Then, on the server 300, a large amount of sample data to be edited is input into the video editing model 200 that contains the editing strategy network 220 and the trained value network 210, and the editing strategy network 220 in the video editing model 200 (i.e., the video editing part of the video editing model 200) is trained so that the video editing model 200 acquires a certain editing strategy prediction ability and decision-making ability for the input data to be edited.
  • After the server 300 completes the training of the video editing model 200 and the trained video editing model 200 is ported to the mobile phone 100 to edit the material to be edited selected by the user, it can edit and generate short videos with high viewing value, improving the user experience.
  • It can be understood that the process of training the video editing model 200 to edit the material to be edited through the editing strategy network 220 is completed during the server 300's training of the video editing model 200.
  • The following describes, with reference to the schematic diagram of the simulated editing training process shown in FIG. 3B-1 and the schematic diagram of the editing process shown in FIG. 3B-2, the process by which the server 300 trains the video editing model 200 to edit the material to be edited through the editing strategy network 220.
  • The process by which the server 300 trains the video editing model 200 to edit the material to be edited includes two stages. The first stage is shown in FIG. 3B-1: the video editing model 200 performs simulated editing training (train) on the input sample data to be edited in order to find the optimal editing strategy suited to that sample data.
  • It can be understood that the process shown in FIG. 3B-1 is the process of adjusting the network parameters of the editing strategy network 220 to the optimal parameters.
  • The second stage is shown in FIG. 3B-2: the video editing model 200 uses the optimal editing strategy found in the simulated editing training process shown in FIG. 3B-1 to perform the actual editing processing (that is, the inference process) on the sample data to be edited.
  • It can be understood that the editing strategy network 220 updates its network parameters during the simulated editing training process: the video segments edited by the editing strategy network 220 obtain scoring feedback from the value network 210, and as training proceeds the accumulated reward value increases accordingly, that is, the value network 210's evaluation of the editing ability of the editing strategy network 220 improves. When the video editing model 200 has completed the preset multiple rounds of simulated editing training (for example, exceeding the preset number-of-rounds threshold), the reward value fed back for the scores of the video segments edited by the editing strategy network 220 with its updated network parameters may reach a maximum, and at this point it can be determined that the editing strategy network 220 has found the optimal editing strategy.
  • The video editing model 200 then uses the optimal editing strategy found through the simulated editing training to perform the actual editing processing on the sample data to be edited, so as to obtain a short video with high viewing value.
  • It can be understood that the cumulative reward value (Q value) finally obtained after the multiple rounds of simulated editing training is the expected value of the cumulative reward; the video editing model 200 can perform multiple rounds of simulated editing to determine the optimal editing strategy suited to the user-input material, that is, the final editing strategy that can produce the highest Q value.
  • the process of the server 300 training the video editing model 200 to edit the input sample data to be edited will be described in detail below with reference to FIG. 3B-1 and FIG. 3B-2 .
  • FIG. 3B-1 shows a schematic diagram of the process of video clipping model 200 performing simulation clipping training on sample data to be clipped according to an embodiment of the present application. It can be understood that FIG. 3B-1 shows a process in which the video editing model 200 performs a simulated editing training on the sample data to be edited.
  • When the video editing model 200 performs the next (for example, the second) round of simulated editing training, the initial network parameters of the editing strategy network 220 are the network parameters of the editing strategy network 220 as updated at the end of the first round of simulated editing training.
  • After the video editing model 200 has carried out the simulated editing training process shown in FIG. 3B-1 on the sample data to be edited for the preset number of rounds, it can be determined that the video editing model 200 has found the optimal editing strategy suited to the sample data to be edited.
  • the process includes the following steps:
  • The video editing model 200 acquires the nth video segment of the sample data to be edited. It can be understood that, when editing the sample data to be edited, the video editing model 200 may divide the sample data into m video segments to be edited and edit them segment by segment, where m ≥ n.
  • In some embodiments, a large amount of sample data to be edited can be collected and input to the server 300 for training the editing strategy network 220 of the video editing model 200, where the sample data to be edited can include image data and/or video clips and the like.
  • The server 300 first needs to extract feature vectors from the sample data to be edited that is used for training, for example representing each sample datum to be edited by a set of matrices; for details, refer to the related description in step 302 above, which is not repeated here.
  • After the server 300 completes the feature extraction of the sample data to be edited, it can input the sample data to be edited into the video editing model 200 for editing.
  • In some embodiments, a sample datum to be edited can be preset to be divided into multiple video segments, and the sample datum is then simulated-edited segment by segment. For example, a sample datum to be edited may be preset in the video editing model 200 to be divided into 100 video segments, and the video editing model 200 can acquire in sequence the first video segment, the second video segment, ..., the nth video segment, ..., and the 100th video segment of the sample datum to be edited and perform simulated editing processing on each of them.
  • It can be understood that each video segment (for example, the first video segment) acquired by the video editing model 200 is first input into the editing strategy network 220 of the video editing model 200 for editing processing.
  • the clipping strategy network 220 in the video clipping model 200 determines the current optimal clipping action that can be taken for the nth video clip under the network parameter ⁇ n .
• Specifically, based on the input nth video segment, the clipping strategy network 220 can predict, for each of the preset clipping actions, the cumulative reward value (hereinafter denoted Qn) that would correspond to clipping the nth video segment with that action. For example, if x kinds of editing actions are preset in the clipping strategy network 220, the clipping strategy network 220 predicts x Qn values based on the input nth video segment, and then determines the editing action corresponding to the maximum of the predicted x Qn values as the current optimal editing action that can be taken for the nth video segment. The cumulative reward value Qn can be understood as the cumulative value of the scoring-feedback reward values that the editing strategy network 220 would obtain after editing the nth video segment and all video segments after the nth segment, assuming it takes the given preset editing action on the nth segment.
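• As a minimal formalization of this description, the cumulative reward value can be written as

\[ Q_n \;=\; \sum_{k=n}^{m} \gamma^{\,k-n}\, r_k , \qquad 0 < \gamma \le 1, \]

where r_k denotes the reward value fed back for the kth edited video segment, m is the total number of video segments in the sample data to be edited, and the discount factor γ is an assumption not stated in the embodiment (γ = 1 gives the plain cumulative sum described here).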
• The network parameter θn of the clipping strategy network 220 is the network parameter in effect when the clipping strategy network 220 predicts Qn for the input nth video segment. For example, the editing strategy network 220 can use the editing strategy corresponding to the initial network parameter θ1 to predict Q1 for the first video segment. For each input video segment, if x kinds of editing actions are preset, the clipping strategy network 220 correspondingly predicts x Qn values and selects the maximum value to determine the optimal editing action for the nth input video segment.
• For example, if 1000 kinds of editing actions are preset, the clipping strategy network 220 can predict 1000 Q1 values based on the input first video segment and determine the clipping action corresponding to the maximum of the predicted 1000 Q1 values as the current optimal editing action that can be taken for the first video segment. Similarly, the editing strategy network 220 can predict 1000 Q2 values based on the second input video segment and determine the editing action corresponding to the maximum of the predicted 1000 Q2 values as the current optimal editing action that can be taken for the second video segment.
• It should be noted that when the editing strategy network 220 predicts Q2 for the second video segment, its network parameters may already have changed.
  • the network parameter change of the clipping policy network 220 will be described in detail in the following step 315t, and will not be repeated here.
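• As a minimal sketch of the prediction-and-selection in step 311t: the clipping strategy network is modeled below as a PyTorch module whose output layer produces one Q value per preset editing action, and the action with the largest predicted Q value is chosen. The class and variable names (ClipPolicyNet, segment_features, NUM_ACTIONS) and the layer sizes are illustrative assumptions, not details specified by the embodiment.

```python
import torch
import torch.nn as nn

NUM_ACTIONS = 1000  # x preset editing actions (1000 in the example above)

class ClipPolicyNet(nn.Module):
    """Illustrative clipping strategy network: maps a segment feature vector
    to one predicted cumulative reward value Qn per preset editing action."""
    def __init__(self, feature_dim: int, num_actions: int = NUM_ACTIONS):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(feature_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_actions),  # one Qn value per editing action
        )

    def forward(self, segment_features: torch.Tensor) -> torch.Tensor:
        return self.layers(segment_features)  # shape: (num_actions,)

def select_action(policy: ClipPolicyNet, segment_features: torch.Tensor) -> int:
    """Step 311t sketch: predict the x Qn values for the nth video segment and
    return the index of the editing action whose predicted Qn is largest."""
    with torch.no_grad():
        q_values = policy(segment_features)
    return int(torch.argmax(q_values).item())
```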
• Step 312t: the clipping strategy network 220 in the video clipping model 200 performs simulated clipping processing on the nth video segment using the determined current optimal clipping action. As described above, the video editing model 200 needs to perform a preset number of passes of simulated clipping training on the sample data to be clipped through the editing strategy network 220; during these passes, the clipping processing in which the clipping strategy network 220 takes clipping actions on each video segment of the sample data to be clipped is referred to as simulated clipping processing. That is, the clipping strategy network 220 in the video clipping model 200 performs simulated clipping on the nth video segment in the sample data to be clipped based on the current optimal clipping action corresponding to the nth video segment determined in the above step 311t.
• Step 313t: the clipping strategy network 220 in the video clipping model 200 outputs the nth video segment that has been clipped. That is, the clipping strategy network 220 uses the determined current optimal clipping action corresponding to the nth video segment to perform simulated clipping on the nth video segment, obtains the nth clipped video segment, and outputs it from the clipping strategy network 220. For example, the clipping strategy network 220 takes the current optimal clipping action corresponding to the first video segment, completes the simulated clipping of the first video segment, obtains the first clipped video segment, and outputs it; the video clipping model 200 can temporarily store the first clipped video segment output by the clipping strategy network 220.
• After step 313t, the value network 210 of the video clipping model 200 can continue to execute the following step 314t, and the video clipping model 200 can continue to execute the following step 316t.
  • the execution order of steps 314t and 316t is not limited, and may be performed simultaneously or sequentially.
• Step 314t: the value network 210 in the video editing model 200 scores the nth edited video segment output by the editing strategy network 220, and the video editing model feeds back a corresponding reward value to the editing strategy network 220 based on the score. Specifically, the value network 210 in the video editing model 200 can obtain the nth edited video segment output by the editing strategy network 220, score that edited video segment and output the score, and the video editing model 200 may then convert the score output by the value network 210 for the nth edited video segment into a corresponding reward value and feed it back to the editing strategy network 220. The scoring rule adopted by the value network 210 to score the edited video segment may be the scoring rule corresponding to any one of a variety of editing styles. For the process of converting the value network 210's score of the nth edited video segment into a reward value, reference may be made to the relevant description after step 303 above, and details will not be repeated here.
• For example, the value network 210 in the video editing model 200 obtains the first edited video segment output by the editing strategy network 220 as described above and scores it. The value network 210 obtained through the training process of the above steps 301 to 302 can include scoring rules for video clips of various editing styles; therefore, before or when the simulated clip training process including this step is started (for example, before or when the above step 310t starts), an editing style can be preset for the video clipping model 200. When the value network 210 in the video editing model 200 scores the first edited video segment, it therefore uses the scoring rule corresponding to the preset editing style. For the process of converting the value network 210's score of the first edited video segment into a reward value, reference may be made to the relevant description after step 303 above, and details will not be repeated here.
• The video editing model 200 converts the value network 210's score of the first edited video segment into a reward value and uses it as feedback data for the editing strategy network 220 in the video editing model 200. This reward value is, in effect, an evaluation feedback on the simulated editing result that the editing strategy network 220 of the video editing model 200 obtained for the first video segment of the sample data to be edited by performing the above steps 310t to 313t, and this evaluation feedback will be used to optimize the network parameters of the editing strategy used by the editing strategy network 220 for the simulated editing of the second video segment of the sample data. The specific process of optimizing the network parameters will be described in detail below and will not be repeated here.
• Step 315t: the clipping strategy network 220 in the video clipping model 200 updates its parameters to the network parameter θn+1 based on the feedback reward value obtained. Specifically, based on the simulated clipping result of the nth video segment, the clipping strategy network 220 can combine three values to calculate a loss function: (a) the maximum Qn value predicted by the clipping strategy network 220 under the network parameter θn in the above step 311t; (b) the reward value fed back by the video clipping model 200 in the above step 314t based on the value network 210's scoring of the nth edited video segment; and (c) the maximum Qn+1 value predicted by the clipping strategy network 220 under the network parameter θn. The clipping strategy network 220 then adjusts its parameters from the network parameter θn to the network parameter θn+1 based on the loss function determined from these three values.
• The calculation formula of the loss function is:
• Loss function = (maximum Qn+1 value + feedback reward value obtained for the nth edited video segment) − maximum Qn value.
• The sum of the above (b) and (c) (that is, the maximum Qn+1 value plus the feedback reward value obtained for the nth edited video segment) can be regarded as the target cumulative reward value of the editing strategy network 220 in the video editing model 200 for the nth video segment of the sample data to be edited (hereinafter referred to as the target Qn value). Therefore, in this step, the loss function calculated by the editing strategy network 220 represents the difference between the clipping strategy network 220's target Qn value for the nth video segment and the predicted maximum Qn value. The clipping strategy network 220 can then use the gradient descent method to update the parameters of the clipping strategy network according to the calculated loss function, which will not be repeated here.
• The network parameter θn+1 of the clipping strategy network 220 is used for the clipping strategy network's subsequent decision on the input (n+1)th video segment. It should be noted that, since the video editing model 200 needs the maximum Qn+1 value predicted by the editing strategy network 220 under the network parameter θn in order to calculate the loss function when performing this step 315t, the video editing model 200 can acquire the (n+1)th video segment of the sample data to be edited in advance to predict the maximum Qn+1 value; this is indicated by the dotted arrow between step 310t and step 315t shown in FIG. 3B-1. For the process of predicting the maximum Qn+1 value for the (n+1)th video segment of the sample data to be edited, refer to the process of predicting the maximum Qn value for the nth video segment in step 311t above, which will not be repeated here.
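• A minimal sketch of the parameter update in step 315t, reusing the illustrative ClipPolicyNet and select_action above: the loss written here squares the difference between the target Qn value and the predicted maximum Qn so that gradient descent is well defined, and the optimizer and learning rate are assumptions; the embodiment itself only specifies the difference defined by the formula above.

```python
import torch

def update_policy(policy, optimizer, seg_n, seg_next, reward_n: float) -> float:
    """Step 315t sketch: one gradient-descent update of the clipping strategy
    network parameters (theta_n -> theta_n+1) based on the feedback reward."""
    # (a) maximum Qn predicted for the nth segment under the current parameters
    max_q_n = policy(seg_n).max()
    # (c) maximum Qn+1 predicted for the (n+1)th segment under the same parameters;
    #     assumed to be 0 for the last segment, which has no following segment.
    with torch.no_grad():
        max_q_next = policy(seg_next).max() if seg_next is not None else torch.tensor(0.0)
    # (b) + (c): target Qn value = feedback reward + maximum Qn+1
    target_q_n = reward_n + max_q_next
    # Loss: gap between the target Qn value and the predicted maximum Qn value.
    loss = (target_q_n - max_q_n) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                      # parameters become theta_n+1
    return float(loss.item())

# The optimizer could be, for example:
# optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)
```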
• After the video editing model 200 completes the execution of step 316t, it can continue to execute the following step 317t or the steps after step 317t, for example, execute the following step 318t, or enter the simulated clipping process of the next ((n+1)th) video segment.
• Step 316t: the video editing model 200 judges whether the current pass of simulated editing training has been completed. If so, step 317t is executed; if not, the clipping strategy network 220 of the video clipping model 200 enters the simulated clipping process of the (n+1)th (for example, the second) video segment under the network parameter θn+1: the video clipping model 200 returns to execute step 310t, the clipping strategy network 220 executes steps 311t to 313t under the network parameter θn+1, and, after the value network 210 executes step 314t, executes step 315t to update to the network parameter θn+2, and so on.
• In some embodiments, the video editing model 200 can judge whether the current pass of simulated editing training on the sample data to be edited is complete based on timing information carried by the nth video segment (for example, the first video segment) acquired in the above step 310t. The timing information carried by the nth video segment may be position information of the nth (for example, the first) video segment within the sample data to be edited, or sequence label information generated as the video clipping model 200 acquires the sample data to be clipped segment by segment, or the like, which is not limited here.
• In other embodiments, the video editing model 200 may also judge whether the current pass of simulated clip training on the sample data to be clipped is complete based on the timing information carried in the nth edited video segment (for example, the first edited video segment) output by the editing strategy network 220. If the judgment result of this step is no, that is, the current pass of simulated editing training has not been completed, the video editing model 200 returns to step 310t to obtain the (n+1)th (for example, the second) video segment of the sample data to be edited, and the clipping strategy network 220 of the video clipping model 200 continues to execute steps 311t to 313t under the network parameter θn+1 (for example, the network parameter θ2), and executes step 315t after the value network 210 executes step 314t so as to update to the network parameter θn+2 (for example, the network parameter θ3), which will not be repeated here.
• Completing one pass of simulated editing training on the sample data to be edited means performing the process described in the above steps 310t to 316t segment by segment for all the video segments to be edited in the sample data. Taking the example in step 310t, where the sample data to be edited is divided into 100 video segments to be edited: when the video editing model 200 has performed the process described in steps 310t to 316t on the 100th video segment, the judgment result in this step is yes, that is, the current pass of simulated editing training has been completed, and step 317t can be executed. When the video editing model 200 performs the processes described in steps 310t to 316t on the 1st to 99th video segments, the judgment results in this step are all negative, that is, the current pass has not been completed; the video editing model 200 then needs to obtain the next video segment to be edited in the sample data and return to step 310t to execute the simulated editing process of that segment.
• Step 317t: the video editing model 200 judges whether the number of training passes reaches the preset number-of-times threshold. If not, it means that the video editing model 200 has not yet found an optimal editing strategy suitable for the sample data to be edited, and step 318t is executed; if so, it indicates that the video editing model 200 has found an optimal editing strategy suitable for the sample data to be edited, and step 319t is executed.
• A number-of-times threshold may be preset for the simulated clipping training that the video clipping model 200 performs on the sample data to be clipped. When the video editing model 200 judges that the number of completed passes of the simulated editing training described in the above steps 310t to 316t has not reached the preset threshold, the following step 318t is performed; when the video editing model 200 judges that the number of completed passes has reached the preset threshold, it indicates that the video editing model 200 has found an optimal editing strategy suitable for the sample data to be edited, and the following step 319t is performed.
• In other embodiments, a cumulative threshold may also be set for the reward values fed back from the value network 210's scoring of each simulated-clipped video segment in each simulated clipping pass, and used as the judgment condition of step 317t; for example, it is judged whether the cumulative value of the reward values fed back from the value network 210's scoring of the edited video segments in the above step 314t has reached the preset cumulative threshold. If the cumulative threshold has not been reached, it indicates that simulated editing training needs to continue in order to determine the optimal editing strategy, and the following step 318t needs to be performed; if the cumulative threshold has been reached, it indicates that the optimal editing strategy has been found, and the following step 319t can be performed. In still other embodiments, the preset simulated-training-times threshold and the above-mentioned cumulative reward-value threshold may also be used together as the judgment condition of step 317t, which is not limited here. A sketch of such a training loop is given below.
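• Putting steps 310t to 318t together, the training loop with the two termination conditions discussed above might look roughly as follows; this is a sketch only, and the helper names split_into_segments, apply_action and value_network.score_to_reward are assumptions standing in for the segment division, the clipping actions and the score-to-reward conversion described in the embodiment.

```python
def simulated_clip_training(policy, optimizer, sample, value_network,
                            max_passes: int = 50, reward_threshold: float = None):
    """Sketch of the simulated clipping training of steps 310t-318t: repeat
    segment-by-segment passes until the preset number of passes is reached or,
    optionally, until a pass's cumulative reward reaches a preset threshold."""
    segments = split_into_segments(sample)              # m segments to be clipped
    for _ in range(max_passes):                         # step 317t/318t: repeated passes
        cumulative_reward = 0.0
        for n, seg in enumerate(segments):              # one pass = steps 310t-316t per segment
            action = select_action(policy, seg)         # step 311t
            edited = apply_action(seg, action)          # steps 312t-313t
            reward = value_network.score_to_reward(edited)   # step 314t
            cumulative_reward += reward
            seg_next = segments[n + 1] if n + 1 < len(segments) else None
            update_policy(policy, optimizer, seg, seg_next, reward)  # step 315t
        if reward_threshold is not None and cumulative_reward >= reward_threshold:
            break                                       # alternative condition in step 317t
    return policy          # the updated parameters embody the editing strategy found
```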
• Step 318t: the video clipping model 200 enters the next pass of simulated clipping training. Specifically, when the video editing model 200 enters the next pass of simulated editing training, the editing strategy network 220 in the video editing model 200 starts to execute step 311t on the first video segment of the sample data to be edited under the network parameters updated at the end of the previous pass of simulated editing training. For example, if the sample data to be edited in the above step 311t includes m video segments to be edited, then after one pass of simulated editing training the parameters of the editing strategy network 220 may, through successive per-segment updates, have been updated to the network parameter θm+1.
• That is, if the video clipping model 200 judges in step 317t that the number of completed passes of the simulated clipping training described in the above steps 310t to 316t has not reached the preset threshold, the video clipping model 200 performs another pass of the simulated clipping training process of steps 310t to 316t on the sample data to be clipped, and after that pass is completed, the video clipping model 200 executes the judgment process described in step 317t again.
• Step 319t: the video editing model 200 uses the optimal editing strategy to perform actual editing processing on the sample data to be edited.
  • the process of actually editing the sample data to be edited using the optimal editing strategy can refer to the process shown in FIG. 3B-2 , which will be described in detail below and will not be repeated here.
• That is, after completing the preset number of passes of simulated editing training on the sample data to be edited, the video editing model 200 finds an optimal editing strategy suitable for the sample data to be edited, and then uses the obtained optimal clipping strategy to continue with steps 310i to 360i shown in FIG. 3B-2 below, performing actual clipping processing on the sample data to be clipped.
  • FIG. 3B-2 shows a schematic diagram of the process of clipping the sample data to be clipped using the optimal clipping strategy trained by the video clipping model 200 using the simulated clipping training process shown in FIG. 3B-1 according to an embodiment of the present application.
• In the clipping process shown in FIG. 3B-2, the clipping strategy network 220 in the video clipping model 200 clips the video segments to be clipped in the sample data according to the clipping processing logic corresponding to the optimal clipping strategy.
  • the process includes the following steps:
  • the video clip model 200 acquires the nth video clip.
• Specifically, the video editing model 200 divides the sample data to be edited into multiple video segments in accordance with the optimal editing strategy trained through the simulated editing training process shown in FIG. 3B-1, and then performs clipping processing on the sample data segment by segment. For example, the video editing model 200 may sequentially acquire the first video segment, the second video segment, and so on up to the Nth video segment of the sample data to be edited, and edit each segment according to the editing processing strategy corresponding to that segment in the optimal editing strategy.
  • Each video segment (for example, the first video segment) acquired by the video clipping model 200 is first input to the clipping strategy network 220 of the video clipping model 200 for clipping processing.
• It can be understood that the optimal editing strategy trained by the video editing model 200 includes strategies such as how many video segments to be edited the sample data to be edited is divided into, and what editing action is taken for each video segment to be edited.
  • the editing strategy network 220 in the video editing model 200 adopts the determined optimal editing strategy to determine the editing action for the nth video segment.
• For example, the clipping strategy network 220 in the video clipping model 200 determines the Q1 value corresponding to the first video segment based on the determined optimal clipping strategy. It can be understood that once the video editing model 200 determines to adopt the optimal editing strategy to edit the sample data to be edited, the parameters of the editing strategy network 220 in the video editing model 200 are no longer updated; accordingly, during the process in which the video editing model 200 edits the sample data to be edited using the optimal editing strategy, the value network 210 in the video editing model 200 does not score the edited video segments output by the editing strategy network 220.
  • the clipping policy network 220 in the video clipping model 200 takes the determined clipping action to perform actual clipping processing on the nth video segment.
  • the clipping strategy network 220 in the video clipping model 200 performs actual clipping on the first video segment in the sample data to be clipped based on the clipping action determined in step 330i and corresponding to the predicted Q1 value.
• For the process in which the editing strategy network 220 in the above video editing model 200 actually edits the first video segment using the determined editing action, reference may be made to the relevant description in the above step 312t; for the editing actions that may be taken, reference may be made to the relevant description of FIG. 2 above, which is not repeated here.
  • the clipping strategy network 220 in the video clipping model 200 outputs the nth video clip that has been clipped.
• For example, after the editing strategy network 220 in the video editing model 200 completes the editing process of the first video segment, the first edited video segment is obtained and output, and the video editing model 200 temporarily stores the first edited video segment output by the editing strategy network 220.
  • the video editing model 200 judges whether the editing is completed. If yes, execute step 360i; if not, the video editing model 200 enters the editing process of the next (for example, the second) video segment, returns to step 310i, and obtains the second video segment in the sample data to be edited.
• In some embodiments, the video editing model 200 can judge whether the actual editing of the sample data to be edited is complete based on the timing information carried by the first video segment acquired in the above step 310i. In other embodiments, the video editing model 200 may also make this judgment based on the timing information carried in the first edited video segment output by the editing strategy network 220 in the above step 350i, or based on whether the first edited video segment output by the editing strategy network 220 in the above step 340i includes the label or mark of the last video segment of the sample data to be edited; this is not limited here.
• If the video editing model 200 judges that the editing of all the video segments to be edited in the sample data has been completed, the video editing model 200 continues to perform the following step 360i; if it judges that the editing of all the video segments to be edited in the sample data has not been completed, the video editing model 200 enters the editing process of the next (for example, the second) video segment, returns to step 310i, and obtains the second video segment of the sample data to be edited.
  • the video editing model 200 outputs a short video that has been edited.
• When the video editing model 200 judges that the editing of all the video segments to be edited in the sample data has been completed, it means that all the edited video segments have been stored in the video editing model 200, including the first edited video segment output by the editing strategy network 220 in the above step 340i, as well as the second edited video segment through the nth edited video segment obtained by repeating the above steps 310i to 340i in sequence. The video editing model 200 can then generate a short video composed of the first edited video segment, the second edited video segment, ..., and the nth edited video segment, and output it from the video editing model 200. At this point, the video editing model 200 has completed the editing of one piece of sample data to be edited.
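• A minimal sketch of this actual clipping pass (steps 310i to 360i), reusing the illustrative helpers above: the network parameters stay frozen, the value network is not consulted, and the edited segments are concatenated into the output short video. concatenate_segments is an assumed helper, not something defined by the embodiment.

```python
import torch

def clip_with_optimal_strategy(policy, sample):
    """Sketch of the actual clipping pass: apply the trained strategy segment
    by segment with frozen parameters and no scoring feedback, then assemble
    the edited segments into the output short video."""
    policy.eval()                                        # parameters are no longer updated
    edited_segments = []
    for seg in split_into_segments(sample):              # acquire segments in order
        with torch.no_grad():
            action = int(torch.argmax(policy(seg)).item())   # action with the largest Q value
        edited_segments.append(apply_action(seg, action))    # clip and temporarily store
    return concatenate_segments(edited_segments)         # assemble the short video
```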
• After the above process has been executed on each piece of sample data to be edited, the training of the video clipping model 200 is completed. It can be understood that the simulated editing training process of FIG. 3B-1 and the editing process shown in FIG. 3B-2 performed by the video editing model 200 together constitute the adjustment and optimization process of the network parameters of the editing strategy network 220 in the video editing model 200.
  • the trained video editing model 200 can be transplanted to the mobile phone 100 to realize the video editing process of the material to be edited selected by the user.
• The process of porting the video clipping model 200 to the mobile phone 100 may be, for example, reading and parsing the video clipping model 200 through a model reading interface in the operating system configured on the mobile phone 100, and then compiling it into an application program file and installing it on the mobile phone 100. For example, the Android™ operating system configured on the mobile phone 100 can read and parse the video clipping model 200 through the model reading interface in an Android project, compile it into an APK (Android application package) file, and install it on the mobile phone 100, thereby completing the porting of the video clipping model 200.
• In other embodiments, the operating system configured on the mobile phone 100 can also be another system, such as HarmonyOS, and correspondingly, the video clipping model 200 can be compiled into an application program file of the corresponding operating system, which is not limited here.
• The pictures and/or video data in the mobile phone 100 can be input into the video editing model 200 to obtain feature data corresponding to those pictures and/or videos, such as the matrices describing image data or the matrix sets describing video segments described in step 302 above.
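• As a rough illustration of turning a video stored on the phone into the matrix-set representation mentioned here, the sketch below samples frames and stacks a simple per-frame feature vector into a matrix; the sampling rate, the thumbnail-based features and the output shape are assumptions, not details given by the embodiment.

```python
import numpy as np
import cv2  # OpenCV, assumed available for frame decoding

def video_to_feature_matrix(path: str, frames_per_second: int = 1) -> np.ndarray:
    """Sample frames from a video file and stack a simple per-frame feature
    (a flattened, downscaled grayscale thumbnail) into a matrix whose rows
    correspond to the sampled frames."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(fps // frames_per_second), 1)
    rows, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            thumb = cv2.resize(gray, (32, 32)).astype(np.float32) / 255.0
            rows.append(thumb.flatten())                 # one feature vector per sampled frame
        idx += 1
    cap.release()
    return np.stack(rows) if rows else np.empty((0, 32 * 32), dtype=np.float32)
```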
• The user can thus operate the mobile phone 100 to complete, through the video editing model 200 ported to the mobile phone 100, the editing of data such as pictures and/or videos stored in the mobile phone 100 and obtain short videos with high appreciation value. Moreover, since the value network 210 in the trained video editing model 200 includes scoring rules corresponding to multiple editing styles, the short video obtained by editing on the mobile phone 100 to which the trained video editing model 200 has been ported can also have the editing style selected by the user.
  • FIG. 4 shows a schematic diagram of an implementation flow of the mobile phone 100 executing the video clipping method of the present application in response to user operations.
• As described in detail below, during the process of clipping the material selected and input by the user, the mobile phone 100 to which the above video clipping model 200 has been ported can produce a short video that matches the clipping style selected by the user and has high appreciation value. The application compiled and installed on the mobile phone 100 in the process of porting the above video editing model 200 (that is, the video editing application) may be displayed on the mobile phone 100 as the video editing application 511 shown in FIG. 5A and described below.
  • the process includes the following steps:
  • the mobile phone 100 acquires the material to be edited selected by the user, and acquires the video editing style selected by the user.
  • the user operates the mobile phone 100 to run the video editing application, uploads and adds the material to be edited, and the mobile phone 100 can obtain the material to be edited.
  • the material to be edited may be a picture and/or video taken by the mobile phone 100, which is not limited here.
• FIGS. 5A to 5E show schematic diagrams of the operation interface displayed when the mobile phone 100 runs the video editing application to execute the video editing method of the present application.
  • the user can click the video clip application 511 to enter the operation interface 520 shown in FIG. 5B .
  • the operation interface 520 includes a style setting option 521 and an add material button 522.
  • the operation interface 520 may also include a setting button 523.
• It can be understood that the interface shown in FIG. 5B does not constitute a limitation on the interface function buttons and interface layout style of the video editing application 511 provided by the embodiments of the present application. The interface of a video editing application applicable to the present application can also take other forms and can have more or fewer function controls than the buttons shown in FIG. 5B, which is not limited here.
• The user can check the video style he or she wants to make under the style setting option 521, for example, select the "Hong Kong style" style option. As shown in FIG. 5B, keyword descriptions can be set for the various style options provided under the style setting option 521 to briefly introduce the characteristics of each style to the user.
• For example, the keyword description corresponding to the "Hong Kong style" style option is "Wong Kar Wai movie style"; the keyword descriptions corresponding to the "Childhood" style option are "old photos" and "memories"; the keyword description corresponding to the "manga" style option is "full of imagination"; and the keyword descriptions corresponding to the "suspense" style option are "plot" and "logic".
  • other styles and more style options can be set under the style setting option 521, and the keyword descriptions for various style options can also be other content, which is not limited here.
• The various video editing styles provided under the style setting option 521 shown in FIG. 5B may each correspond to an instruction generated by the corresponding selection operation when the video editing application 511 running on the mobile phone 100 responds to the user's operation of selecting a video editing style; the instruction may include, for example, a tag corresponding to the video editing style selected by the user, which is not limited here.
• For the interface for adding the material to be edited, reference may be made to the interface 530 shown in FIG. 5C. The material to be edited may be a single video material, multiple video materials, a combination of pictures and videos, etc., which is not limited here. The user can check the check box 532 on a material to be added in the material selection area 531 of the interface 530 shown in FIG. 5C, or uncheck the box to cancel the addition of the corresponding material. The user can also click the "×" button 534 in the upper right corner of an added material in the material management area 533 below the interface 530 to delete the checked material, which is not limited here.
  • the interface displayed by the mobile phone 100 for the user to select and add material to be edited may be an interface in other layout forms different from that shown in FIG. 5C , and no limitation is set here.
• The schematic diagrams of the interfaces shown in FIG. 5D and FIG. 5E will be described in detail in the corresponding steps below, and will not be repeated here.
  • the video editing application running on the mobile phone 100 performs editing processing on the material to be edited selected by the user based on the editing style selected by the user.
• Specifically, the video clipping application 511 running on the mobile phone 100 can input the obtained tag corresponding to the clipping style selected by the user and the obtained material to be clipped selected by the user into the video clipping model 200 for clipping processing.
  • the video editing model 200 then performs the above-mentioned editing process of Steps 310i to 360i on the material to be edited based on the optimal editing strategy found through multiple times of simulated editing training.
• For the simulated editing training process performed by the above video editing model 200 on the material to be edited, reference may be made to the relevant descriptions in the above steps 310t to 319t; for the editing process performed on the material to be edited using the optimal editing strategy, reference may be made to the relevant descriptions in the above steps 310i to 360i, and details are not repeated here.
  • the video editing application running on the mobile phone 100 finishes editing the material to be edited, and generates a short video that has been edited.
• For example, the video editing application 511 run by the mobile phone 100 inputs the tag corresponding to the editing style selected by the user and the material to be edited selected by the user into the video editing model 200 for editing. After the video editing model 200 completes the editing of the material to be edited, referring to the relevant description in the above step 360i, the video editing model 200 generates and outputs a short video with the editing style selected by the user. It can be understood that the short video output by the video editing model 200 can be presented on the interface of the video editing application run by the mobile phone 100.
• For example, the video clipping application 511 running on the mobile phone 100 uses the trained video clipping model 200 to edit the material to be edited selected by the user. It can be understood that during the clipping operation, the video segments of the material to be clipped and the tag corresponding to the video clipping style selected by the user serve as the input of the video clipping model 200, and the trained video clipping model 200 implements the clipping through the trained editing strategy network 220 and value network 210 to obtain a short video with the editing style selected by the user. For details, refer to the relevant description in step 402 above, which is not repeated here.
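• Conceptually, the call made by the video clipping application can be pictured as follows, reusing the feature-extraction sketch above; the wrapper methods (set_style, train_on, clip) and the example paths and style tag are purely illustrative assumptions about how the ported model might be wrapped, not an API defined by the embodiment.

```python
def make_short_video(model, material_paths, style_tag: str):
    """Sketch of steps 402-403: feed the user-selected materials and the tag
    of the chosen clipping style into the ported model and return the short video."""
    features = [video_to_feature_matrix(p) for p in material_paths]  # per-material feature matrices
    model.set_style(style_tag)     # scoring rules of the chosen style guide the simulated training
    model.train_on(features)       # simulated clipping training (steps 310t-318t)
    return model.clip(features)    # actual clipping with the optimal strategy (steps 310i-360i)

# Hypothetical usage (paths and tag are examples only):
# short_video = make_short_video(ported_model, ["a.mp4", "b.mp4"], "hong_kong_style")
```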
• After the video editing is completed, the mobile phone 100 can display the editing-completed interface 540 shown in FIG. 5D. The user can click the play button 541 on the completed video to preview its content; click the share button 542 below the interface 540 to share the edited video to other applications or choose to publish it on a short-video application platform; or click the save button 543 below the interface 540 to save the edited video to the local photo album of the mobile phone 100 for viewing or for adding it in other applications. If, after clicking the play button 541 to preview the edited video, the user feels that the video needs further adjustment, the user can click the delete button 545 to delete the video.
• The bottom of the interface 540 shown in FIG. 5D also includes a "more" button 546; the user can click the button 546 to perform other operations on the edited video, such as "rename" and "cast and play".
  • the user can also click the setting button 523 shown in FIG. 5B to set some default options and configuration parameters of the video clip application 511.
  • the user can update or set preferences for the style model on the setting interface 550.
• The user can click "check and update the style model" under the style model option 551 to check and update the types of video editing styles that can be selected under the style setting option 521 shown in FIG. 5B above, and to add newly trained video editing style models in time. The training of such models will be described in detail below and is not repeated here.
• The user can also click "style preference setting" under the style model option 551 to set his or her own preferred video editing styles (refer to operation 4 shown in FIG. 5E), so that the user's preferred styles are displayed under the style setting option 521 shown in FIG. 5B. For example, the preferred styles set by the user are "Hong Kong style", "childhood", "manga", and "suspense", and the style setting option 521 shown in FIG. 5B then displays these styles; the user can also slide left and right under the style setting option 521 shown in FIG. 5B, or click the left slide button 5212 or the right slide button 5213, to select other styles. The preferred styles set by the user can also be other styles, such as "Chinese style", "cute", "beautiful" and other editing styles, which are not limited here.
• The user can also choose to enable the "automatically add credits" function to automatically add preset credits to the clipped video, or add a default watermark or a custom watermark, etc., which will not be repeated here. In other embodiments, the interface function layout of the interface 540 for further processing the edited video and the operations corresponding to each function option can also be set in other combinations, and the "more" button 546 can also include other operations, which are not limited here.
• When the mobile phone 100 executes the video clipping method of the present application, the value network 210 in the trained video clipping model 200 guides the clipping strategy network 220 to perform multiple passes of simulated editing training on the material to be clipped so as to find the optimal editing strategy suitable for that material, and the editing strategy network 220 in the video editing model 200 then uses the optimal editing strategy obtained from training to complete the editing of the material to be edited. In this process, based on the editing style selected by the user, the value network 210 in the video editing model 200 can guide, from multiple professional appreciation perspectives, the editing strategy network 220's decision on the optimal editing action for each video segment in each pass of simulated editing training. Compared with short videos produced by other existing video clipping solutions, the short video produced by implementing the video clipping method of the present application can therefore have higher appreciation value and a clipping style that meets the user's preferences.
  • the operation interface of the video clipping application provided to the user based on the video clipping method of the present application is also very simple and easy to operate, and the operation threshold is low, which is conducive to improving user experience.
• It should be noted that when the mobile phone 100 executes the video editing method of the present application and uses the video editing model 200 to perform simulated editing training on the material to be edited selected by the user, executing the simulated training process shown in FIG. 3B-1 on that material is also a process of using the material selected by the user as sample data to train the editing strategy network 220 in the video editing model 200 and further optimize its network parameters. That is to say, the clipping strategy network 220 in the video clipping model 200 can continue to be trained while the user performs video clipping through the mobile phone 100.
• It can be understood that the video editing application installed on the mobile phone 100 to implement the video editing method of the present application is not limited to the interface and implementation process of the above video editing application 511, nor is it limited to the application icon and application name of the video editing application 511 shown in FIG. 5A. In some other embodiments, the video editing application can also be a third-party application program installed on the mobile phone 100 in another form, which is not limited here.
• In other embodiments, the video editing method of the present application and the above video editing model 200 can also be implemented by configuring a video editing function in the camera application of the mobile phone 100, or through a service card configured in the system of the mobile phone 100, such as a video editing service configured by the HarmonyOS system carried by the mobile phone 100. In other embodiments, the video editing method of the present application can also be directly preset in a camera, a handheld gimbal, or another device with a camera function, so that the device has the function of directly performing video editing on the photos or videos it takes, which is not limited here.
  • FIG. 6 shows a schematic structural diagram of a mobile phone 100 .
• The mobile phone 100 may include a processor 610, an external memory interface 620, an internal memory 621, a universal serial bus (universal serial bus, USB) interface 630, a charge management module 640, a power management module 641, a battery 642, an antenna 1, an antenna 2, a mobile communication module 650, a wireless communication module 660, an audio module 670, a speaker 670A, a receiver 670B, a microphone 670C, an earphone jack 670D, a sensor module 680, a button 690, a motor 691, an indicator 692, a camera 693, a display screen 694, a subscriber identification module (subscriber identification module, SIM) card interface 695, and the like.
  • the sensor module 680 may include a pressure sensor 680A, a gyroscope sensor 680B, an air pressure sensor 680C, a magnetic sensor 680D, an acceleration sensor 680E, a distance sensor 680F, a proximity light sensor 680G, a fingerprint sensor 680H, a temperature sensor 680J, a touch sensor 680K, an ambient light Sensor 680L, bone conduction sensor 680M, etc.
  • the structure shown in the embodiment of the present invention does not constitute a specific limitation on the mobile phone 100 .
  • the mobile phone 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 610 may include one or more processing units, for example: the processor 610 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • the processor 610 may control the execution of the video clipping method of the present application through the controller, including controlling the execution of the clipping process of the video clips in the input material to be clipped by the video clipping model 200 .
  • a memory may also be provided in the processor 610 for storing instructions and data.
  • the memory in processor 610 is a cache memory.
  • the memory may hold instructions or data that the processor 610 has just used or recycled. If the processor 610 needs to use the instruction or data again, it can be directly recalled from the memory. Repeated access is avoided, and the waiting time of the processor 610 is reduced, thereby improving the efficiency of the system.
  • processor 610 may include one or more interfaces.
  • the interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and /or universal serial bus (universal serial bus, USB) interface, etc.
• The I2C interface is a bidirectional synchronous serial bus, including a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL).
  • processor 610 may include multiple sets of I2C buses.
  • the processor 610 may be respectively coupled to the touch sensor 680K, the charger, the flashlight, the camera 693 and the like through different I2C bus interfaces.
  • the processor 610 may be coupled to the touch sensor 680K through the I2C interface, so that the processor 610 and the touch sensor 680K communicate through the I2C bus interface to realize the touch function of the mobile phone 100 .
  • the user can click the application icon of the video clipping application 511 through the touch function of the mobile phone 100, and perform corresponding operations on the operation interface of the video clipping application 511, and there is no limitation here.
  • the I2S interface can be used for audio communication.
  • processor 610 may include multiple sets of I2S buses.
  • the processor 610 may be coupled to the audio module 670 through an I2S bus to implement communication between the processor 610 and the audio module 670 .
  • the audio module 670 can transmit audio signals to the wireless communication module 660 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding the analog signal.
  • the audio module 670 and the wireless communication module 660 may be coupled through a PCM bus interface.
  • the audio module 670 can also transmit audio signals to the wireless communication module 660 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 610 and the wireless communication module 660 .
  • the processor 610 communicates with the Bluetooth module in the wireless communication module 660 through the UART interface to realize the Bluetooth function.
  • the audio module 670 can transmit audio signals to the wireless communication module 660 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 610 with peripheral devices such as a display screen 694 and a camera 693 .
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 610 communicates with the camera 693 through the CSI interface to realize the shooting function of the mobile phone 100 .
  • the processor 610 communicates with the display screen 694 through the DSI interface to realize the display function of the mobile phone 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 610 with the camera 693 , the display screen 694 , the wireless communication module 660 , the audio module 670 , the sensor module 680 and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 630 is an interface conforming to the USB standard specification, specifically, it may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 630 can be used to connect a charger to charge the mobile phone 100, and can also be used to transmit data between the mobile phone 100 and peripheral devices. It can also be used to connect headphones and play audio through them. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between modules shown in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the mobile phone 100 .
  • the mobile phone 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 640 is configured to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 640 can receive charging input from the wired charger through the USB interface 630 .
  • the charging management module 640 can receive wireless charging input through the wireless charging coil of the mobile phone 100 . While the charging management module 640 is charging the battery 642 , it can also supply power to the electronic device through the power management module 641 .
  • the power management module 641 is used for connecting the battery 642 , the charging management module 640 and the processor 610 .
  • the power management module 641 receives the input of the battery 642 and/or the charging management module 640, and supplies power for the processor 610, the internal memory 621, the display screen 694, the camera 693, and the wireless communication module 660, etc.
  • the power management module 641 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 641 may also be set in the processor 610 .
  • the power management module 641 and the charging management module 640 can also be set in the same device.
  • the wireless communication function of the mobile phone 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 650, the wireless communication module 660, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in handset 100 can be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 650 can provide wireless communication solutions including 2G/3G/4G/5G applied on the mobile phone 100 .
  • the mobile communication module 650 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 650 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 650 can also amplify the signal modulated by the modem processor, convert it into electromagnetic wave and radiate it through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 650 may be set in the processor 610 .
  • at least part of the functional modules of the mobile communication module 650 and at least part of the modules of the processor 610 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs sound signals through audio equipment (not limited to speaker 670A, receiver 670B, etc.), or displays images or videos through display screen 694 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 610, and be set in the same device as the mobile communication module 650 or other functional modules.
• The wireless communication module 660 can provide wireless communication solutions applied to the mobile phone 100, including wireless local area networks (wireless local area networks, WLAN) (such as a wireless fidelity (Wireless Fidelity, Wi-Fi) network), Bluetooth (bluetooth, BT), the global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR), and other wireless communication solutions.
  • the wireless communication module 660 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 660 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 610 .
  • the wireless communication module 660 can also receive the signal to be sent from the processor 610, frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 to radiate out.
  • the antenna 1 of the mobile phone 100 is coupled to the mobile communication module 650, and the antenna 2 is coupled to the wireless communication module 660, so that the mobile phone 100 can communicate with the network and other devices through wireless communication technology.
• The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, etc.
• The GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a Beidou navigation satellite system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS), and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
  • the mobile phone 100 realizes the display function through the GPU, the display screen 694, and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 694 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 610 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 694 is used to display images, videos and the like.
  • Display 694 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active matrix organic light emitting diode or an active matrix organic light emitting diode (active-matrix organic light emitting diode, AMOLED), flexible light-emitting diode (flex light-emitting diode, FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light emitting diodes (quantum dot light emitting diodes, QLED), etc.
  • the mobile phone 100 may include 1 or N display screens 694, where N is a positive integer greater than 1.
  • the mobile phone 100 can realize the shooting function through ISP, camera 693 , video codec, GPU, display screen 694 and application processor.
  • the ISP is used for processing the data fed back by the camera 693 .
  • light is transmitted to the photosensitive element of the camera through the lens and converted into an electrical signal; the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and the ISP converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 693 .
  • Camera 693 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the mobile phone 100 may include 1 or N cameras 693, where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the mobile phone 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the handset 100 may support one or more video codecs.
  • the mobile phone 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the mobile phone 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 620 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone 100.
  • the external memory card communicates with the processor 610 through the external memory interface 620 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the pictures and/or videos included in the materials to be edited selected by the user may be pictures and/or video materials stored in the external memory card of the mobile phone 100 .
  • the internal memory 621 may be used to store computer-executable program code, which includes instructions.
  • the internal memory 621 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the mobile phone 100 .
  • the internal memory 621 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the processor 610 executes various functional applications and data processing of the mobile phone 100 by executing instructions stored in the internal memory 621 and/or instructions stored in the memory provided in the processor.
  • the video clipping application 511 run by the mobile phone 100 can temporarily store each frame or each video segment of the clipped material in the internal memory 621 after clipping; the relevant instructions of the video clipping method of the present application, which the mobile phone 100 executes through the video clipping application 511, may also be stored in the internal memory 621, which is not limited here.
  • the mobile phone 100 can realize the audio function through the audio module 670, the speaker 670A, the receiver 670B, the microphone 670C, the earphone interface 670D, and the application processor. Such as music playback, recording, etc.
  • the audio module 670 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 670 may also be used to encode and decode audio signals.
  • the audio module 670 may be set in the processor 610 , or some functional modules of the audio module 670 may be set in the processor 610 .
  • the speaker 670A, also called a "horn", is used to convert audio electrical signals into sound signals.
  • Cell phone 100 can listen to music through speaker 670A, or listen to hands-free calls.
  • the receiver 670B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 670B can be placed close to the human ear to receive the voice.
  • the microphone 670C, also called a "mic", is used to convert sound signals into electrical signals.
  • the user can input a sound signal into the microphone 670C by moving the mouth close to the microphone 670C and speaking.
  • the mobile phone 100 can be provided with at least one microphone 670C.
  • the mobile phone 100 can be provided with two microphones 670C, which can also implement a noise reduction function in addition to collecting sound signals.
  • the mobile phone 100 can also be provided with three, four or more microphones 670C to realize sound signal collection, noise reduction, identify sound sources, and realize directional recording functions, etc.
  • the earphone interface 670D is used to connect wired earphones.
  • the earphone interface 670D may be a USB interface 630, or a 3.5mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 680A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 680A may be located on display screen 694 .
  • pressure sensors 680A such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors.
  • a capacitive pressure sensor may include at least two parallel plates made of conductive material.
  • the mobile phone 100 may also calculate the touched position according to the detection signal of the pressure sensor 680A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold acts on the short message application icon, an instruction to view short messages is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the icon of the short message application, the instruction of creating a new short message is executed.
  • the gyroscope sensor 680B can be used to determine the motion posture of the mobile phone 100 .
  • the angular velocity of the cell phone 100 about three axes may be determined by the gyro sensor 680B.
  • the gyroscope sensor 680B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyroscope sensor 680B detects the shaking angle of the mobile phone 100, calculates, according to the angle, the distance that the lens module needs to compensate for, and allows the lens to counteract the shaking of the mobile phone 100 through reverse motion, thereby achieving image stabilization.
  • the gyroscope sensor 680B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 680C is used to measure air pressure.
  • the mobile phone 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 680C to assist positioning and navigation.
  • the magnetic sensor 680D includes a Hall sensor.
  • the mobile phone 100 can use the magnetic sensor 680D to detect the opening and closing of the flip leather case.
  • in some embodiments, when the mobile phone 100 is a flip phone, the mobile phone 100 can detect the opening and closing of the flip according to the magnetic sensor 680D, and then set features such as automatic unlocking of the flip cover according to the detected opening and closing state of the leather case or the flip.
  • the acceleration sensor 680E can detect the acceleration of the mobile phone 100 in various directions (generally three axes). When the mobile phone 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • the mobile phone 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the mobile phone 100 can use the distance sensor 680F for distance measurement to achieve fast focusing.
  • Proximity light sensor 680G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the mobile phone 100 emits infrared light through the light emitting diode.
  • Cell phone 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the mobile phone 100 . When insufficient reflected light is detected, the cell phone 100 may determine that there is no object in the vicinity of the cell phone 100 .
  • the mobile phone 100 can use the proximity light sensor 680G to detect that the user holds the mobile phone 100 close to the ear to make a call, so as to automatically turn off the screen to save power.
  • Proximity light sensor 680G can also be used in leather case mode, automatic unlock and lock screen in pocket mode.
  • the ambient light sensor 680L is used for sensing ambient light brightness.
  • the mobile phone 100 can adaptively adjust the brightness of the display screen 694 according to the perceived ambient light brightness.
  • the ambient light sensor 680L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 680L can also cooperate with the proximity light sensor 680G to detect whether the mobile phone 100 is in the pocket, so as to prevent accidental touch.
  • the fingerprint sensor 680H is used to collect fingerprints.
  • the mobile phone 100 can use the collected fingerprint features to realize fingerprint unlocking, access to the application lock, take pictures with the fingerprint, answer calls with the fingerprint, and the like.
  • the temperature sensor 680J is used to detect temperature.
  • the mobile phone 100 uses the temperature detected by the temperature sensor 680J to implement a temperature processing strategy. For example, when the temperature reported by the temperature sensor 680J exceeds the threshold, the mobile phone 100 may reduce the performance of the processor located near the temperature sensor 680J, so as to reduce power consumption and implement thermal protection.
  • in some other embodiments, when the temperature is lower than another threshold, the mobile phone 100 heats the battery 642 to avoid an abnormal shutdown of the mobile phone 100 caused by the low temperature.
  • in some other embodiments, the mobile phone 100 boosts the output voltage of the battery 642 to avoid an abnormal shutdown caused by the low temperature.
  • the touch sensor 680K is also called a "touch device".
  • the touch sensor 680K can be arranged on the display screen 694, and the touch sensor 680K and the display screen 694 form a touch screen, also called a "touchscreen".
  • the touch sensor 680K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations can be provided through the display screen 694 .
  • the touch sensor 680K may also be disposed on the surface of the mobile phone 100 , which is different from the position of the display screen 694 .
  • the bone conduction sensor 680M can acquire vibration signals.
  • the bone conduction sensor 680M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 680M can also touch the human pulse and receive the blood pressure beating signal.
  • the bone conduction sensor 680M can also be disposed in the earphone, combined into a bone conduction earphone.
  • the audio module 670 can analyze the voice signal based on the vibration signal of the vibrating bone mass of the vocal part acquired by the bone conduction sensor 680M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 680M, so as to realize the heart rate detection function.
  • the keys 690 include a power key, a volume key and the like. Key 690 may be a mechanical key. It can also be a touch button.
  • the mobile phone 100 can receive key input and generate key signal input related to user settings and function control of the mobile phone 100 .
  • the motor 691 can generate a vibrating reminder.
  • the motor 691 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 691 can also correspond to different vibration feedback effects for touch operations acting on different areas of the display screen 694 .
  • different application scenarios (for example: time reminders, receiving messages, alarm clocks, games, etc.) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 692 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 695 is used for connecting a SIM card.
  • the SIM card can be connected and separated from the mobile phone 100 by inserting it into the SIM card interface 695 or pulling it out from the SIM card interface 695 .
  • the mobile phone 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 695 can support Nano SIM card, Micro SIM card, SIM card etc. Multiple cards can be inserted into the same SIM card interface 695 at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface 695 is also compatible with different types of SIM cards.
  • the SIM card interface 695 is also compatible with external memory cards.
  • the mobile phone 100 interacts with the network through the SIM card to implement functions such as calling and data communication.
  • the mobile phone 100 adopts eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the mobile phone 100 and cannot be separated from the mobile phone 100 .
  • FIG. 7 shows a software structural block diagram of a mobile phone 100 .
  • the software system of the mobile phone 100 can adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the software structure of the mobile phone 100 is illustrated by taking the Android system with a layered architecture as an example.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, the application program layer, the application program framework layer, Android runtime (Android runtime ) and system library, and the kernel layer.
  • the application layer can consist of a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the mobile phone 100 . For example, the management of call status (including connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • for example, prompting text information in the status bar, issuing a prompt sound, making the electronic device vibrate, flashing the indicator light, and the like.
  • the Android Runtime includes core library and virtual machine.
  • the Android runtime is responsible for the scheduling and management of the Android™ system.
  • the core library includes two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android™.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • a system library can include multiple function modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • taking a scenario in which the user selects the camera of the mobile phone 100 as the source of the pictures or videos in the material to be clipped and the mobile phone 100 captures a photo as an example, the workflow of the software and hardware of the mobile phone 100 is illustrated below.
  • when the touch sensor 680K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into original input events (including touch coordinates, time stamps of touch operations, and other information). Raw input events are stored at the kernel level.
  • the application framework layer obtains the original input event from the kernel layer, and identifies the control corresponding to the input event. Take the touch operation being a tap operation and the control corresponding to the tap operation being the icon control of the video clipping application as an example:
  • the video clipping application calls the interface of the application framework layer to start the video clipping application. If the camera application needs to be called to capture images or videos to be clipped, the video clipping application calls the camera application, the camera application in turn calls the kernel layer to start the camera driver, and the camera 693 captures still images or videos.
  • the present disclosure also relates to an apparatus for performing the operations herein.
  • this apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • such a computer program may be stored in a computer-readable medium such as, but not limited to, any type of disk including a floppy disk, an optical disk, a CD-ROM, a magneto-optical disk, a read-only memory (ROM), a random-access memory (RAM), an EPROM, an EEPROM, a magnetic or optical card, an application specific integrated circuit (ASIC), or any type of medium suitable for storing electronic instructions, and each may be coupled to a computer system bus.
  • the computers referred to in the specification may comprise a single processor or may employ architectures involving multiple processors for increased computing power.

Abstract

The present application relates to the technical field of intelligent terminals, and in particular to a video clipping method, an electronic device, and a storage medium. The method includes: performing first clipping on material to be clipped by using a first clipping model to obtain a first clipped video; evaluating the first clipped video to obtain a first feedback value; determining a second clipping model according to the first feedback value, and performing second clipping on the material to be clipped by using the second clipping model to obtain a second clipped video; evaluating the second clipped video to obtain a second feedback value higher than the first feedback value; and using the second clipped video as the output clipped video of the material to be clipped. In the present application, a video clipping model first performs simulated clipping training on the material to be clipped to find an optimal clipping strategy, and then clips the material to be clipped by using the found optimal clipping strategy, so that a video with a relatively high degree of professionalism and appreciation value is obtained.

Description

视频剪辑方法、电子设备及存储介质
本申请要求于2021年08月31日提交中国专利局、申请号为202111012801.6、申请名称为“视频剪辑方法、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及智能终端技术领域,具体涉及一种视频剪辑方法、电子设备及存储介质。
背景技术
随着流媒体技术的蓬勃发展,越来越多的人们热衷于在短视频平台上发布富有趣味和创意的短视频内容来分享观点、展示自己等,然而短视频剪辑制作的过程往往需要花费大量的精力,剪辑制作一个好的短视频,从视频素材的构思、拍摄,到后期剪辑视频、添加转场、添加特效以及背景音乐等各个环节都需要精心设计,这无疑给非专业的普通用户设置较高的技术门槛。
目前也有许多视频剪辑应用工具可以给用户提供一站式剪辑功能以及其他多种短视频制作功能,通过让用户选取一定数量的照片或短视频,再按照预置在剪辑应用内的参数,比如用于控制在某个固定的时刻采用某种转场处理、以及添加特效渲染等的参数,最终生成与模板同款的短视频内容。然而这种短视频剪辑创作方式虽然提高了视频剪辑效率,但套用模板生成的短视频往往很难体现用户的个性化特点,大量套用模板生成的短视频也很可能会使得用户剪辑制作的短视频千篇一律,内容较为单一。
发明内容
本申请实施例提供了一种视频剪辑方法、电子设备及存储介质,该方法通过一种可以根据用户选择的剪辑风格对用户输入素材进行剪辑的视频剪辑模型实现,剪辑时,该视频剪辑模型中的视频剪辑部分可以先对用户输入素材进行预设次数的模拟剪辑训练,在模拟剪辑训练的过程中该视频剪辑模型的评分部分能够根据用户选择的剪辑风格、对视频剪辑部分已经完成模拟剪辑的视频片段进行评分,并反馈给视频剪辑部分,以促进视频剪辑部分基于该反馈调整网络参数、优化剪辑策略进而确定对将要剪辑的下一段视频片段所执行的更优的模拟剪辑操作,最终使视频剪辑模型通过预设次数的模拟剪辑训练过程找到适合用户输入素材的最优剪辑策略,从而剪辑出匹配用户所选剪辑风格的、专业化程度比较高的短视频,通过视频剪辑模型自行模拟剪辑找到最优剪辑策略,也降低了用户操作视频剪辑过程的技术门槛,利于提高用户的使用体验。
第一方面,本申请实施例提供了一种视频剪辑方法,应用于电子设备,该方法包括:利用第一剪辑模型对待剪辑素材进行第一剪辑,得到第一剪辑视频;对第一剪辑视频进行评估得到第一反馈值;根据第一反馈值确定第二剪辑模型,并利用第二剪辑模型对待剪辑素材进行第二剪辑,得到第二剪辑视频;对第二剪辑视频进行评估得到第二反馈值,其中,第二反馈值高于第一反馈值;将第二剪辑视频作为待剪辑素材的输出剪辑视频。
即本申请所提供的视频剪辑方法,可以利用一个初始的第一剪辑模型,对待剪辑素材进行多次剪辑(例如多次模拟剪辑),并在多次模拟剪辑过程中找到最优的剪辑模型(即第二剪辑模型,例如下文所描述的最优剪辑策略对应的剪辑策略网络220)后,将利用第二剪辑模型对待剪辑素材进行剪辑处理后得到的剪辑视频(即第二剪辑视频),作为输出剪辑视频,完成对待剪辑素材的剪辑处理过程。
其中,第一剪辑模型例如可以是下文步骤311t中所描述的初始网络参数下的剪辑策略网络220,第一剪辑、第二剪辑例如是下文实施例中所描述的预设多次的模拟剪辑训练过程,即第一剪辑对应于第 一次模拟剪辑,第二剪辑对应于预设次数阈值对应的最后一次模拟剪辑,相应地,第二剪辑模型对应于经多次模拟剪辑训练后网络参数调整为对应于最优剪辑策略的网络参数的剪辑策略网络200。因此,可以理解,在第一剪辑与第二剪辑之间,对待剪辑素材重复进行了多次模拟剪辑,以最终确定第二剪辑模型为最优剪辑模型,将第二剪辑视频作为输出剪辑视频。
对于每次的模拟剪辑过程,例如上述第一剪辑所对应的第一次模拟剪辑过程中,对模拟剪辑的成果(例如第一剪辑视频所对应的第一次模拟剪辑过程中完成剪辑的各视频片段)进行评估得到用于调整第一剪辑模型的第一反馈值,例如下文实施例中步骤314t至315t中所描述的价值网络210对完成剪辑的视频片段进行评分对应反馈的奖励值,剪辑策略网络220则可以基于该奖励值计算损失函数并调整网络参数,每完成一次模拟剪辑训练,剪辑策略网络220相应完成一轮网络参数的调整。经过预设次数的模拟剪辑训练后,剪辑策略网络220找到最优剪辑策略,相应的剪辑策略网络220的网络参数也调整为最优剪辑策略对应的网络参数,此时,处于最优剪辑策略对应的网络参数下的剪辑策略网络220即为最优剪辑模型,即上述第二剪辑模型,利用最优剪辑模型对待剪辑素材进行实际的剪辑处理,得到的即为输出剪辑视频的第二剪辑视频。
另外,可以理解,下文实施例中所描述的视频片段对应于待剪辑素材划分得到的视频片段,因此对第一剪辑视频或第二剪辑视频评估所得到的第一反馈值或第二反馈值,可以对应理解为价值网络210对各段完成剪辑的视频片段评分对应反馈的奖励值的累计值,参考下文步骤303之后的相关描述,价值网络210所反馈的奖励值的累计值越高则表明剪辑得到的第一剪辑视频或第二剪辑视频的欣赏价值、专业化程度等越高。
在上述第一方面的一种可能的实现中,上述方法还包括:根据第一反馈值确定第二剪辑模型,具体为:根据第一反馈值,将第一剪辑模型的参数调整至第二剪辑模型的参数。
即根据第一反馈值确定第二剪辑模型是通过根据第一反馈值调整剪辑模型的网络参数来完成的,第一反馈值例如是下文所描述的第一次模拟剪辑训练过程中价值网络210对各段完成剪辑的视频片段评分所反馈的奖励值的累计值,第一剪辑模型的参数例如是下文所描述的剪辑策略网络220的初始网络参数δ 1,第二剪辑模型的参数例如是下文所描述的对应于最优剪辑策略的网络参数。因此根据第一反馈值将第一剪辑模型的参数调整至第二剪辑模型的参数的过程,包括根据第一反馈值先剪辑策略网络220的初始网络参数δ 1,例如第一次模拟剪辑训练后,剪辑策略网络220的网络参数调整为δ 100,接着再利用网络参数δ 100下的剪辑策略网络220对待剪辑素材进行第二次模拟剪辑训练,第二次模拟剪辑训练过程中价值网络210对各段完成剪辑的视频片段评分所反馈的奖励值继续用于调整剪辑策略网络220的网络参数。如此,经过预设多次模拟剪辑训练后,剪辑策略网络220的网络参数则最终调整为最优剪辑策略对应的网络参数。
在上述第一方面的一种可能的实现中,上述方法还包括:对第一剪辑视频进行评估得到第一反馈值,包括:通过评价模型对第一剪辑视频进行评估以确定第一反馈值。
上述评价模型,例如是下文实施例中所描述的价值网络210,因此可以理解,评价模型用于对第一剪辑模型或第二剪辑模型的剪辑能力进行评估,例如对利用第一剪辑模型剪辑得到的第一剪辑视频进行评估,得到第一反馈值。对应于下文实施例中所描述的,价值网络210对剪辑策略网络220模拟剪辑得到的完成剪辑的视频片段进行评分以反馈奖励值的过程,即为对剪辑策略网络220的剪辑能力的评估过程。
可以理解,在另一些实施例中,评价模型也可以是与下文所描述的价值网络210具有同等评分反馈功能的评价系统或评价算法,在此不做限制。
在上述第一方面的一种可能的实现中,上述方法还包括:评价模型包括对应于多种剪辑风格的评分规则。
在上述第一方面的一种可能的实现中,上述方法还包括:通过评价模型对第一剪辑视频进行评估以确定第一反馈值,包括:响应于用户所选择的剪辑风格,评价模型采用对应于用户所选剪辑风格的评分规则对第一剪辑视频进行评分以确定第一反馈值。
即评价模型中可以预置有对应于多种剪辑风格的评分规则来对第一剪辑视频进行评估,评价模型评估第一剪辑视频时采用对应于用户所选剪辑风格的评分规则,得到的第一反馈值也是对应于用户所选的剪辑风格,进而根据第一反馈值调整第一剪辑模型的参数、确定的第二剪辑模型才能够剪辑得到符合用户所选剪辑风格的第二剪辑视频。其中,评价模型例如是下文实施例中所描述的价值网络210,价值网络210基于用户所选剪辑风格对应的评分规则,对完成剪辑的视频片段进行评分以反馈奖励值用于调整剪辑策略网络200的网络参数,最终完成参数调整的剪辑策略网络200找到的最优剪辑策略即为对应于用户所选剪辑风格和待剪辑素材的最优剪辑策略。价值网络210中预设的对应于多种剪辑风格的评分规则可以参考下文步骤301中相关描述,在此不再赘述。
在上述第一方面的一种可能的实现中,上述方法还包括:对第二剪辑视频进行评估得到第二反馈值,包括:通过评价模型对第二剪辑视频进行评分以确定第二反馈值,其中评价模型采用对应于用户所选剪辑风格的评分规则对第二剪辑视频进行评分。
即评价模型对于第一剪辑或第二剪辑以及其他模拟剪辑过程得到的剪辑视频,均采用统一中剪辑风格(用户所选的剪辑风格)对应的评分规则进行评估并得到反馈值,评价模型采用对应于用户所选剪辑风格的评分规则对第二剪辑视频进行评分,进而基于评分确定第二反馈值,该过程可以参考下文步骤303后所描述的价值网络210采用对应于用户所选剪辑风格的评分规则对完成剪辑的视频片段进行评分,视频剪辑模型200基于该评分转换得到输入给剪辑策略网络220的奖励值的相关描述,在此不再赘述。
在上述第一方面的一种可能的实现中,上述方法还包括:第二反馈值高于第一反馈值,包括:评价模型对第二剪辑视频的评分高于评价模型对第一剪辑视频的评分。
即第一反馈值基于评价模型对第一剪辑视频的评分确定,第二反馈值基于评价模型对第二剪辑视频的评分确定,例如下文步骤303后所描述的,价值网络210对完成剪辑的视频片段的评分与反馈给剪辑策略网络220的奖励值可以是线性转换关系,例如评分87可以对应与奖励值8.7,评分越高,所反馈的奖励值越大。因此可以理解,评价模型的评分越高,相应的反馈值也越高。
在上述第一方面的一种可能的实现中,上述方法还包括:多种剪辑风格包括港风、童年、漫画、悬疑、中国风、可爱、唯美中的一种或多种。
在上述第一方面的一种可能的实现中,上述方法还包括:第一剪辑模型或第二剪辑模型,对待剪辑素材进行第一剪辑所采取的剪辑策略包括下列中的至少一项:对待剪辑视频执行的划分待剪辑视频片段的操作;对划分得到的各待剪辑视频片段确定剪辑动作的操作;以及对各待剪辑视频片段采用确定的剪辑动作进行剪辑操作。
第一剪辑模型例如是下文初始网络参数δ 1下的剪辑策略网络220,第一剪辑模型所对应的剪辑策略则是对应于网络参数δ 1的剪辑策略,第二剪辑模型所对应的剪辑策略为对应于下文所描述的最优剪辑策略。结合下文实施例内容可以理解,剪辑策略网络220对待剪辑素材采取的剪辑策略,包括对待剪辑素材进行划分得到内容衔接流畅的各段视频片段、以及确定对各视频片段所采取的剪辑动作的运算过程。
在上述第一方面的一种可能的实现中,上述方法还包括:剪辑动作包括下列中的任一项或多项组合:添加转场特效;添加动感特效;使用滤镜;标记为精彩片段;进行镜头拼接;变速处理;添加背景音乐; 添加特效音频;调节声音。
上述可采取的剪辑动作可以参考下文图2相关描述,在此不再赘述。
在上述第一方面的一种可能的实现中,上述方法还包括:待剪辑素材包括待剪辑的图片和/或视频。
即本申请所提供的视频剪辑方法,所适用的待剪辑素材,可以图片集合,也可以视频集合或者图片与视频的组合,在此不做限制。
第二方面,本申请实施例提供了一种电子设备,该电子设备包括一个或多个处理器;一个或多个存储器;一个或多个存储器存储有一个或多个程序,当一个或者多个程序被一个或多个处理器执行时,使得电子设备执行上述视频剪辑方法。
第三方面,本申请实施例提供了一种计算机可读存储介质,该存储介质上存储有指令,指令在计算机上执行时使计算机执行上述视频剪辑方法。
第四方面,本申请实施例提供了一种计算机程序产品,该产品包括计算机程序或指令;计算机程序或指令在计算机上被处理器执行时使计算机执行上述视频剪辑方法。
附图说明
图1A至1E所示为一种视频剪辑方案的操作界面示意图。
图2所示为本申请实施例所提供的视频剪辑模型的一种结构示意图。
图3A所示为本申请实施例所提供的一种价值网络210的训练过程示意图。
图3B-1所示为本申请实施例所提供的一种视频剪辑模型200对待剪辑样本数据进行模拟剪辑训练的过程示意图。
图3B-2所示为本申请实施例所提供的一种视频剪辑模型200采用上述图3B-1所示的模拟剪辑训练过程训练得到的最优剪辑策略对待剪辑样本数据进行剪辑处理的过程示意图。
图4所示为本申请实施例所提供的手机100执行本申请的视频剪辑方法的实施流程示意图。
图5A至5E所示为本申请实施例所提供的手机100执行本申请的视频剪辑方法的操作界面示意图。
图6所示为本申请实施例所提供的一种手机100的结构示意图。
图7所示为本申请实施例所提供的一种手机100的软件结构框图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面通过结合附图和实施方案,对本申请实施例的技术方案做进一步地详细描述。
本申请的说明性实施例包括但不限于一种视频剪辑方法、电子设备及存储介质等。
图1A至1E示出了一种视频剪辑方案的操作界面示意图。
如图1A所示,用户点击打开手机100'桌面上的剪映 TM110,进入图1B所示的剪辑功能界面101,可以理解,剪映 TM110为手机100'上安装的一种视频剪辑应用。
如图1B所示,用户点击剪辑功能界面101上的开始创作按钮011可以进入图1C所示的自定义剪辑界面102,剪辑视频时,用户需要在图1C所示的界面102上点击剪辑按钮021来操作裁剪视频素材的长度、以及分别点击音频按钮022、文字按钮023、画中画按钮024、特效按钮025以及滤镜按钮026等分别操作添加音频、添加文字、添加画中画播放的视频、添加转场及动感特效、以及添加滤镜效果等等,完成视频剪辑后,用户点击图1C所示的自定义剪辑界面102右上方的导出按钮027,可以导出完成剪辑的短视频。
通过以上关于对用户在图1C所示界面102上的操作完成视频剪辑的过程介绍可知,普通用户在图1C所示的自定义剪辑界面102完成一个视频内容的剪辑,需要花费大量的时间尝试各按钮所对应的素 材、特效等并不断预览视频剪辑效果,最终才能得到剪辑后的视频内容,用户体验较差。
因此,为了便于普通用户使用,上述图1B所示的剪辑功能界面101上还提供有一键成片按钮012,用户可以点击该一键成片按钮012进入图1D所示素材选择界面103,用户在该素材选择界面103上选择需要添加的视频或图片后,点击界面103右下角的下一步按钮031即可进入图1E所示的模板选择界面104。如图1E所示,模板选择界面104包括模板推荐区域041,用户可以在模板推荐区域041中选择感兴趣的模板,例如点击“模板1”一键剪辑得到与“模板1”同款的短视频,图1E所示的模板选择界面104还包括视频预览区域042,用户选择相应模板后可以在视频预览区域042内预览剪辑生成的短视频效果。完成视频剪辑后,用户可以点击模板选择界面104右上角的导出按钮043,可以导出完成剪辑的短视频。
相比于上述图1C所示的自定义剪辑界面102,图1D至1E所示的对应于一键成片按钮012所提供的操作界面更加便于用户快速剪辑,对于非专业的普通用户来说,“一键成片”按钮012的功能能够大大降低创作门槛。但是,可以想象,当大量用户使用“一键成片”功能简单套用模板,生成的视频内容相似度也会较高,可能导致上述背景技术中所描述的内容单一的问题。
因此,对于短视频的剪辑制作需要有一种方法,既能降低短视频剪辑制作门槛、提高短视频剪辑的效率,又能够使得剪辑得到的短视频具有专业剪辑师的专业水准。
为了解决上述问题,本申请提供了一种视频剪辑方法,具体地,该视频剪辑方法是基于一种视频剪辑模型实现的,该视频剪辑模型可以根据用户选择的剪辑风格对用户输入素材进行剪辑,其中,该视频剪辑模型包括了执行视频剪辑的部分(例如,下文称为剪辑策略网络220)和对视频剪辑部分进行评分以促进视频剪辑部分不断进行优化的评分部分(例如,下文称为价值网络210)。剪辑时,该视频剪辑模型中的视频剪辑部分可以先对用户输入素材进行预设次数的模拟剪辑训练,在模拟剪辑训练的过程中该视频剪辑模型的评分部分能够根据用户选择的剪辑风格、对视频剪辑部分已经完成模拟剪辑的视频片段(例如第1段完成剪辑的视频片段)进行评分,并反馈给视频剪辑部分,以促进视频剪辑部分基于该反馈调整网络参数、优化剪辑策略进而确定对将要剪辑的下一段(例如第2段)视频片段所执行的更优的模拟剪辑操作,从而使视频剪辑模型通过预设次数的模拟剪辑训练过程找到适合用户输入素材的最优剪辑策略,从而剪辑出匹配用户所选剪辑风格的、专业化程度比较高的短视频,例如该短视频所呈现的专业化程度可以达到专业剪辑师所制作的相同风格的短视频所对应的专业化程度。
可以理解,在本申请的各实施例中,视频剪辑模型的这两个部分(视频剪辑部分和评分部分)可以基于同一类型的神经网络模型来实现,也可以基于不同的神经网络模型来实现。例如,评分部分(即下述价值网络210)和视频剪辑部分(即下述剪辑策略网络220)均可以基于卷积神经网络(Convolutional Neural Network,CNN)来实现,或者评分部分基于循环神经网络(Recurrent Neural Network,RNN)来实现、视频剪辑部分基于CNN来实现,作为示例,在本申请的实施例中,视频剪辑模型200的视频剪辑部分(即下述剪辑策略网络220)可以采用深度强化学习网络(Deep Q-Network,DQN)进行训练,其中,DQN是一种基于CNN建立的将深度学习(deeplearning)与强化学习(reinforcementlearning)相结合以实现从感知到动作的端到端神经网络模型。在另一些实施例中,评分部分和视频剪辑部分也可以基于其他神经网络模型来实现,在此不做限制。
下面将结合附图对本申请的实施例作进一步地详细描述。
图2根据本申请的一些实施例,示出了一种视频剪辑模型剪辑视频的场景20。
如图2所示,该场景20包括手机100和服务器300。其中,手机100能够使用服务器300训练出来的视频剪辑模型200,实现视频剪辑功能,即输入待剪辑的素材、选择视频剪辑风格,视频剪辑模型 200先基于用户所选剪辑风格对待剪辑素材进行预设次数的模拟剪辑训练,最终找到适合待剪辑素材的最优剪辑策略对待剪辑素材进行剪辑处理,最终得到完成剪辑的短视频。
可以理解,在视频剪辑模型200对待剪辑素材进行模拟剪辑训练之前,可以基于预设的视频片段划分策略将待剪辑素材划分为预设多段连续的待剪辑的视频片段,本申请实施例上下文中所描述的视频片段均指对待剪辑素材完成划分后得到的视频片段。其中,视频剪辑模型200上预设的视频片段划分策略能够将待剪辑素材合理划分为合理数量的视频片段,并且所划分的视频片段内容上相互关联、连续的视频片段之间的内容衔接也具有较好的流畅性。可以理解,视频剪辑模型200中预设的视频片段划分策略也可以基于各种剪辑风格的视频剪辑片段通过神经网络模型训练得到,在此不再赘述。以下内容将重点描述本申请所提供的视频剪辑模型200中价值网络210以及剪辑策略网络220的训练过程、以及二者相互配合实现通过视频剪辑模型200执行本申请的视频剪辑方法的具体过程。
在上述图2所示的场景20中,视频剪辑模型200将待剪辑素材划分为预设的多段视频片段后,视频剪辑模型200对待剪辑素材进行每次模拟剪辑训练的过程,即是对划分得到的多段视频片段依次进行模拟剪辑处理的过程。也就是说:在每次的模拟剪辑训练过程中,视频剪辑模型200能够通过剪辑策略网络220对第1段输入视频剪辑模型200的视频片段进行模拟剪辑处理,再通过价值网络210对模拟剪辑处理得到的该段完成剪辑的视频片段进行评分反馈,以促进视频剪辑模型200的剪辑策略网络220调整网络参数,从而优化对第2段输入视频剪辑模型200的视频片段的模拟剪辑操作;接着,视频剪辑模型200再对输入的第2段视频片段进行模拟剪辑操作,再通过价值网络210对第2段完成剪辑的视频片段进行评分反馈,以促进视频剪辑模型200的剪辑策略网络220继续调整网络参数,从而优化对第3段输入视频剪辑模型200的视频片段的模拟剪辑操作;以此循环,直至视频剪辑模型200完成对划分得到的全部视频片段的模拟剪辑处理过程。相应地,剪辑策略网络220的网络参数在视频剪辑模型200对各段视频片段进行模拟剪辑的过程中可以连续的参数调整过程,也就是说,经过模拟剪辑训练之后的剪辑策略网络220的网络参数所对应的剪辑策略,可以优于模拟剪辑训练之前的剪辑策略网络220的网络参数所对应的剪辑策略。
因此可以理解,每完成一次模拟剪辑训练的过程,便是调整剪辑策略网络220的网络参数优化剪辑策略的过程,当视频剪辑模型200基于剪辑策略网络220完成预设次数的模拟剪辑训练之后,剪辑策略网络220的网络参数能够调整至最佳参数,所对应的剪辑策略即为最优剪辑策略,此时视频剪辑模型200通过该剪辑策略网络220对待剪辑素材进行剪辑,能够得到对应于最优剪辑策略的视频剪辑片段或短视频。
继续参考图2所示场景20,在视频剪辑模型200采用经多次模拟剪辑训练后得到的最优剪辑策略对待剪辑素材进行剪辑处理的过程中,视频剪辑模型200通过剪辑策略网络220对待剪辑素材采用最优剪辑策略对划分得到的各视频片段进行相应的剪辑处理,相应地得到第1段完成剪辑的视频片段、第2段完成剪辑的视频片段、第N-1段完成剪辑的视频片段、以及第N段完成剪辑的视频片段等,直到视频剪辑模型200完成对全部待剪辑素材的模拟剪辑处理过程之后,视频剪辑模型200输出完成剪辑的短视频,该短视频由上述第1段完成剪辑的视频片段、第2段完成剪辑的视频片段、第N-1段完成剪辑的视频片段、以及第N段完成剪辑的视频片段构成。
其中,基于完成剪辑的视频片段生成完成剪辑的短视频的过程,例如可以是对完成剪辑的视频片段按照剪辑处理顺序进行拼接得到完成剪辑的短视频,可以理解,由于视频剪辑模型200在对待剪辑素材进行模拟剪辑训练之前,已基于预设的视频片段划分策略将待剪辑素材合理划分为多段视频片段,因此,连续的完成剪辑的视频片段之间仍可以保留内容上的关联性和内容衔接的流畅性,因此,视频剪辑模型 200完成对待剪辑的视频片段的剪辑处理后可以将各段完成剪辑的视频片段依次拼接得到完成剪辑的短视频。在另一些实施例中,也可以通过其他处理方式处理完成剪辑的视频片段得到完成剪辑的短视频,在此不做限制。
具体地,在一些实施例中,服务器300能够通过样本数据来训练出能够对完成剪辑的视频片段进行评分的价值网络210(即上述视频剪辑模型的评分部分),然后将价值网络210嵌入视频剪辑模型200之后,例如该嵌入过程可以是将价值网络210的输入层数据接口与视频剪辑模型200中完成剪辑的视频片段数据输出的接口进行对接等数据对接设置过程。之后,服务器300再对嵌入了价值网络210的视频剪辑模型200输入待剪辑样本数据,进行视频剪辑模型200的剪辑策略网络220(即上述视频剪辑部分)的训练。当服务器300完成视频剪辑模型200的全部训练之后,可以将训练好的视频剪辑模型200移植到手机100上执行视频剪辑任务。可以理解,视频剪辑模型200在移植到手机100之后,执行视频剪辑任务的过程可以是对视频剪辑模型200的剪辑策略网络220的优化训练过程。
需要说明的是,视频剪辑模型200的剪辑策略网络220中可以预置对待剪辑样本数据的剪辑动作集,组成剪辑动作集的剪辑动作可以包括但不限于添加转场特效、添加动感特效、使用滤镜、标记为精彩片段、进行镜头拼接、变速处理、添加背景音乐、添加特效音频、调节声音等动作中的一种或多种,例如此时是否应添加转场以及采用何种转场特效、是否使用滤镜以及采用何种滤镜、是否标记为精彩片段、是否需要进行镜头拼接、以及是否需要进行快动作或慢动作变速处理等,该剪辑动作可以通过一个或多个视频剪辑参数进行描述,可以理解,该视频剪辑参数包括但不限于视频变速参数、转场特效参数、动感特效参数、背景特效参数、背景音乐参数、声音调节参数等。完成剪辑的视频片段,则相应地可以是经过与该视频内容高度契合的视频变速、添加背景音乐、声音调节、添加转场及特效、添加滤镜等剪辑动作处理得到的视频片段。
例如,某个剪辑动作所对应的视频变速参数为1.5,其他参数均为0,则该剪辑动作可能是对待剪辑样本数据的某一段视频片段进行的加速变速处理,可以理解这种由单个固定的视频变速参数确定的变速处理常规的倍速处理过程。在另一些实施例中,某个视频剪辑参数也可以是一段连续变化的函数,对于前述例子,例如对某一段视频片段进行的变速处理也可以是曲线变速,例如是快慢交替的变速处理等,在此不再赘述。
可以理解,在本申请的另一些实施例中,上述训练视频剪辑模型200的价值网络210的服务器300,还可以是膝上型计算机、台式计算机、平板计算机、手机、可穿戴设备、头戴式显示器、移动电子邮件设备、便携式游戏机、便携式音乐播放器、阅读器设备、其中嵌入或耦接有一个或多个处理器的电视机、或能够访问网络的其他电子设备。上述移植视频剪辑模型200进行视频剪辑处理的手机100,还可以是平板电脑、桌面型、膝上型、手持计算机、上网本,以及增强现实(augmentedreality,AR)/虚拟现实(virtual reality,VR)设备、智能电视、智能手表或能够访问网络的其他电子设备;或者是摄像机、以及手持云台设备等具有拍摄功能的其他电子设备,在此不做限制。
下文结合附图,继续以电子设备100为手机,电子设备200为服务器为例,说明本申请的技术方案。
如前所述,在本申请的一些实施例中,可以在服务器300上先训练出视频剪辑模型的价值网络210之后,将该价值网络210嵌入包括剪辑策略网络220的视频剪辑模型200,例如通过将价值网络210的输出层与剪辑策略网络220的输入层进行数据对接(例如将价值网络210输出的评分转换为奖励值输入至剪辑策略网络220)、以及剪辑策略网络220的输出层与价值网络210的输入层之间完成数据对接(例如价值网络210获取剪辑策略网络220输出的完成剪辑的视频片段)等编译过程来完成价值网络210的嵌入。之后,再训练视频剪辑模型200通过剪辑策略网络220对待剪辑素材进行剪辑处理的过程,以 使剪辑策略网络220具有一定的剪辑预测能力和决策能力,例如训练后的视频剪辑模型200中剪辑策略网络220能够通过多次模拟剪辑训练找到适合待剪辑素材的最优剪辑策略,对待剪辑素材进行剪辑处理。
可以理解,在完成视频剪辑模型200中的价值网络210和剪辑策略网络220的全部训练过程之后,可以将视频剪辑模型200从服务器300移植到手机100上,以使用户能够通过手机100实施本申请的视频剪辑方法,进行视频剪辑。
图3A根据本申请实施例示出了一种价值网络210的训练过程示意图。
示例性地,如图3A所示,该训练过程包括以下步骤:
301:服务器300获取用于训练价值网络210的样本数据库。其中,该样本数据库可以是对应于多种剪辑风格的数据库。
示例性地,在服务器300训练价值网络210之前,可以采集大量的剪辑处理后的视频剪辑片段,并对采集到的视频剪辑片段进行评分,从而得到包括视频剪辑片段以及对应各视频剪辑片段的评分的样本数据,大量的样本数据形成用于训练价值网络210的样本数据库。其中,每个样本数据包括一个视频剪辑片段,以及对应于该视频剪辑片段的评分,该评分在后续价值网络210的训练步骤中用于拟合评分规则,并且该评分在后续视频剪辑模型进行视频剪辑处理的过程中用于作为价值网络210的输出反馈给视频剪辑模型中的剪辑策略网络220,具体将在下文中相应步骤和视频剪辑处理过程中进行描述,在此不再赘述。可以理解,服务器300获取的样本数据库可以是对应于多种剪辑风格的数据库,以用于在后续步骤中拟合得到各剪辑风格对应的评分规则,不同剪辑风格所对应的评分规则不同,具体将在下文中详细介绍,在此不再赘述。
作为示例,对采集到的视频剪辑片段进行评分,例如可以通过请专业剪辑师对所采集的视频剪辑片段分别进行评分,其中,对各视频剪辑片段进行评分的规则,可以由专业剪辑师设定。其中,专业剪辑师对各视频剪辑片段进行评分时,可以从多个维度对待评价的视频剪辑片段分别进行评分,例如,专业剪辑师在观看完一个视频剪辑片段之后,可以从该视频剪辑片段的创造性、趣味性、艺术性、叙事能力等维度对该视频剪辑片段分别进行评分,因此,在最终形成的样本数据库中,一个样本数据中的视频剪辑片段可以对应于多个不同维度的评分。
在另一些实施例中,专业剪辑师对各视频剪辑片段进行评分时,也可以在观看完该视频剪辑片段之后对该视频剪辑片段确定一个综合评分。例如,专业剪辑师在观看完一个视频剪辑片段之后,可以对该视频剪辑片段进行综合评估确定一个综合评分,可以理解,专业剪辑师也可以从创造性、趣味性、艺术性、叙事能力等多个维度综合考量,来确定对该视频剪辑片段的综合评分,因此,在最终形成的样本数据库中,一个样本数据中的视频剪辑片段则对应于一个综合评分。可以理解,所采集的视频剪辑片段可以是由专业剪辑师完成剪辑的短视频,或者构成该短视频的视频片段等,在此不做限制。
302:服务器300将样本数据输入待训练的价值网络模型,拟合评分规则。
示例性地,在训练价值网络210时,服务器300首先要对训练所用到的上述样本数据做特征提取,然后输入待训练的价值网络模型进行训练,用于拟合评分规则。其中,服务器300对样本数据做特征提取是为了将样本数据中视频剪辑片段的图像数据、以及样本数据中对应于各视频剪辑片段的评分转换成价值网络模型能够读取的特征向量集或者矩阵集合。
具体地,在一些实施例中,上述样本数据中的视频剪辑片段以及对应于视频剪辑片段的评分一般为结构各异的非结构化数据,具有维度较高,表现形式迥异,含有大量冗余信息等特点。因此需要提取可以表征这些样本数据的特征向量,可以理解的是,这些初始特征向量可以是一维的,也可以是多维的。例如,上述步骤301中所采集样本数据中,一个视频剪辑片段对应于多个评分,即上述专业剪辑师从创 造性、趣味性、艺术性、叙事能力等多个维度对一个视频剪辑片段进行评估得到的评分,那么这个视频剪辑片段对应的评分通过创造性、趣味性、艺术性、叙事能力这四个维度的评分共同表示,也就是说这个视频剪辑片段对应的评分特征向量就具有四个维度,即创造性评分、趣味性评分、艺术性评分、叙事能力评分。
又例如,上述步骤301中所采集样本数据中,一个视频剪辑片段对应于一个综合评分,即上述专业剪辑师在观看完一个视频剪辑片段之后,对该视频剪辑片段进行综合评估确定的一个综合评分,那么这个视频剪辑片段对应的评分则可以通过这个综合评分来表示,也就是说这个视频剪辑片段对应的评分特征向量就具有一个维度,即综合评分。
可以理解,上述步骤301中所采集样本数据中,一个视频剪辑片段可以由若干帧图像数据形成,而图像数据可以包括多个颜色通道的矩阵数据,例如RGB格式的图像数据,因此,服务器300对样本数据中视频剪辑片段做特征提取时,可以通过n阶矩阵来表征组成视频剪辑片段的图像数据,一个RGB格式的图像数据可以表示为一个三阶矩阵,因此,一个由多个连续帧图像数据组成的视频剪辑片段可以表示为一个矩阵集合。可以理解,服务器300对样本数据所提取的对应于视频剪辑片段的矩阵集合、以及对应于该视频剪辑片段的评分特征向量之间具有一一对应关系。
服务器300然后将所提取的样本数据对应的特征向量或矩阵集合输入待训练的价值网络210,各样本数据中的各视频剪辑片段的矩阵数据作为价值网络210的输入,各视频剪辑片段的评分特征向量作为价值网络210的输出。例如,对价值网络210输入视频剪辑片段A的矩阵数据,专业剪辑师的评分为B,则在训练价值网络时,可以根据价值网络每次的输出评分是否为B或者输出评分与B之间的差值是否低于预设差值来调整价值网络的各参数。通过对大量样本数据执行上述训练方式,服务器300可以拟合出一种评分规则。可以理解,服务器300可以对样本数据库中的各样本数据逐个(one by one)进行提取特征处理并输入待训练的价值网络210进行训练,也可以对样本数据库中的各样本数据分批(batch by batch)进行提取特征处理并输入待训练的价值网络210进行训练,在此不做限制。
另外,如上所述,服务器300训练得到的价值网络210(即上述评分部分)能够根据用户选择的剪辑风格、对视频剪辑部分剪辑的第1段视频片段进行评分反馈给剪辑策略网络220(即视频剪辑部分)。因此可以理解,如果需要训练得到的价值网络210包括多种剪辑风格的评分规则,则可以对应于每种剪辑风格分别经过上述步骤301至302的训练过程,得到每种剪辑风格对应的评分规则,从而使完成训练的价值网络210能够对应于不同的剪辑风格采用对应于该剪辑风格的评分规则输出相应评分。
例如,如果需要价值网络210包括对剪辑风格为“港风”的视频剪辑片段的评分规则,并且如果训练的价值网络210中,对应于一个视频剪辑片段输出的是一个综合评分,则在上述步骤301采集样本数据的过程中,所采集的大量视频剪辑片段则可以全部是基于港片或王家卫电影剪辑得到的短视频或视频片段,专业剪辑师在对所采集的视频剪辑片段进行评分时,也可以基于剪辑风格为“港风”的视频剪辑片段所对应的四个维度(创造性、趣味性、艺术性、叙事能力)的评分权重进行评分,得到剪辑风格为“港风”的样本数据来训练价值网络210。进而,在上述步骤302中,服务器300可以训练价值网络210拟合出对应于剪辑风格为“港风”的评分规则。
可以理解,上述步骤301中,采集样本数据时,可以人为筛选采集各剪辑风格对应的视频剪辑样本进行评分得到样本数据,也可以通过计算机筛选采集各剪辑风格对应的视频剪辑样本进行评分得到样本数据,为了使训练得到的价值网络210中的评分规则更加符合专业剪辑师的专业欣赏角度,对应于每一种剪辑风格,在上述步骤301中都可以大量采集样本数据形成样本数据库,例如对于每种视频剪辑风格可以采集500个以上样本数据,在此不做限制。
303:服务器300得到训练好的价值网络210。
示例性地,通过重复上述步骤301至302的训练过程,服务器300可以训练得到具有对应于多种剪辑风格的评分规则的价值网络210。由于训练好的价值网络210具有对应于多种剪辑风格的评分规则,使得具有本步骤训练好的价值网络210的视频剪辑模型200能够根据用户选择的剪辑风格,对用户所选的待剪辑素材进行风格化的视频剪辑,输出与用户所选剪辑风格相匹配的视频剪辑片段。具体用户选择视频剪辑风格的过程以及示例性操作界面,将在下文中详细描述,在此不再赘述。
可以理解,如果上述步骤301至302所训练的价值网络210中对应于各剪辑风格的评分规则下,价值网络210对输入的视频剪辑片段所输出的评分如果是一个综合评分,并且该综合评分的值域范围与剪辑策略网络220设定接收的奖励值的值域范围不匹配时,则当该价值网络210嵌入视频剪辑模型200时,需要在价值网络210与剪辑策略网络220之间增加将价值网络210输出的综合评分转换为反馈给剪辑策略网络220的奖励值的换算模块,该换算模块可以基于综合分值与奖励值之间的线性对应关系,将该综合评分转换为反馈给剪辑策略网络220的奖励值。可以理解,如果上述步骤301至302所训练的价值网络210输出的评分所属的值域范围与奖励值的值域范围匹配,则价值网络210输出的评分也可以直接作为反馈给剪辑策略网络220的奖励值,此种情形则无需再对价值网络210输出的评分进行转换。
可以理解,在另一些实施例中,如果训练价值网络210所采用的样本数据中,一个视频剪辑片段对应于多个欣赏角度的评分,那么服务器300在上述步骤301至302中所训练得到的价值网络210输出的则是对应于待评价视频片段的多个评分值,则当该价值网络210嵌入视频剪辑模型200时,需要在价值网络210与剪辑策略网络220之间增加将价值网络210输出的多个评分转化为反馈给剪辑策略网络220的奖励值的计算模块,该计算模块可以先将价值网络210输出的多个评分进行权重计算得到一个综合评分,再基于综合评分与奖励值之间的线性对应关系,将该综合评分转换为反馈给剪辑策略网络220的奖励值。其中,计算模块基于价值网络210输出的多个评分权重计算得到综合评分的过程,所基于的权重计算公式例如可以参考下述公式:
Figure PCTCN2022114268-appb-000001
其中,E表示综合评分(Evaluation),M表示价值网络210输出的多个不同维度的评分的数量,g i表示第i个评分,i∈M,即M个不同维度的评分可记为e i(i=1,…,M);对应的M个维度评分值的权重为
Figure PCTCN2022114268-appb-000002
w i为第i维度的评分对应的权重系数,可以理解,各维度的评分对应的权重系数的加和为1。
例如,在上述服务器300训练价值网络210时,对专业剪辑师从创造性、趣味性、艺术性、叙事能力4个维度对一个视频剪辑片段进行评估得到的评分所提取的评分特征向量为(创造性评分、趣味性评分、艺术性评分、叙事能力评分),则服务器300训练得到的价值网络210对输入的该视频剪辑片段对应输出的评分包括4个评分,即M=4,例如e 1=68,w 1=0.3;e 2=75,w 2=0.5;e 3=81,w 3=0.1;e 4=90,w 4=0.1,则基于上述公式(1)可以得到对应于价值网络210输出的4个评分的综合评分E=75。
可以理解,上述公式(1)所涉及的评分维度以及各维度的评分值所对应的权重系数,可以由专业剪辑师根据不同的视频剪辑风格合理设定,例如,对于视频剪辑风格为港风的评分维度可以是创造性、趣味性、艺术性、叙事能力四个维度,相应地各维度的评分值所对应的权重系数例如是:创造性0.2(或20%)、趣味性0.1(或10%)、艺术性0.4(或40%)、叙事能力0.3(或30%),在此不做限制。
另外,关于上述综合评分与奖励值之间的对应关系,作为示例,该对应关系式例如可以参考线性关系式:y=0.1x,即可以预设综合评分与奖励值之间的对应关系是线性对应关系,其中x为综合评分,该综合评分例如可以是0~100之间的任意值,y为相应的奖励值,基于该对应关系式,则可以将价值网 络210对完成剪辑的视频片段进行综合评分得到的分值转换为奖励值,反馈给剪辑策略网络220。例如价值网络210得到的综合评分分值是87分,则基于上述转换关系式,转换得到的奖励值为8.7,因此反馈给剪辑策略网络220的奖励值为8.7。在另一些实施例中,上述综合评分与奖励值之间的对应关系还可以预设的综合评分与奖励值之间的对应关系表,在此不做限制。
如上所述,服务器300通过上述步骤301至303完成对价值网络210的训练后,可以将该训练好的价值网络210嵌入包括剪辑策略网络220的视频剪辑模型200,之后在服务器300上可以通过大量的待剪辑样本数据输入包括剪辑策略网络220和上述训练好的价值网络210的视频剪辑模型200,训练视频剪辑模型200中的剪辑策略网络220(即视频剪辑模型200的视频剪辑部分),使得视频剪辑模型200对输入的待剪辑数据具有一定的剪辑策略预测能力和决策能力。如此,当服务器300完成对视频剪辑模型200的训练之后,如果将训练好的视频剪辑模型200移植到手机100中执行对用户选择的待剪辑素材进行剪辑处理时,能够剪辑生成具有较高欣赏价值的短视频,提高用户的使用体验。
可以理解,训练视频剪辑模型200通过剪辑策略网络220对待剪辑素材进行剪辑处理的过程可以在服务器300训练对视频剪辑模型200对待剪辑素材进行剪辑处理的过程中完成,为了便于描述,下面结合图3B-1至图3B-2所示的视频剪辑模型200的剪辑训练流程示意图,具体介绍服务器300训练视频剪辑模型200通过剪辑策略网络220对待剪辑素材进行剪辑处理的过程。
示例性地,参考图3B-1和图3B-2所示,服务器300训练视频剪辑模型200对待剪辑素材进行剪辑处理的过程包括两个:其中一个过程如图3B-1所示,视频剪辑模型200对输入的待剪辑样本数据进行模拟剪辑训练(train),以找到适合于输入的待剪辑样本数据的最优剪辑策略,可以理解,图3B-1所示的过程也是视频剪辑模型200通过多次模拟剪辑训练调整剪辑策略网络220的网络参数至最佳参数的过程。另一个过程如图3B-2所示,视频剪辑模型200采用图3B-1所示的模拟剪辑训练过程找到的最优剪辑策略对待剪辑样本数据进行实际剪辑处理的过程(即推理过程,inference)。
也就是说,在通过待剪辑样本数据对视频剪辑模型200进行训练的过程中,需要先令视频剪辑模型200通过剪辑策略网络220对待剪辑样本数据进行预设次数的模拟剪辑训练,可以理解,视频剪辑模型200每完成一次模拟剪辑训练过程,得以在模拟剪辑训练过程中更新网络参数的剪辑策略网络220剪辑得到的多段视频片段能够获得来自价值网络210的评分反馈的累计奖励值也会相应提高一些,即价值网络210对剪辑策略网络220的剪辑能力评价有所提高;当视频剪辑模型200完成预设多次(例如超过预设次数阈值)模拟剪辑训练过程后,完成网络参数更新后的剪辑策略网络220剪辑得到的视频片段所能获得的评分所反馈的奖励值可以达到一个最高值,此时可以确定剪辑策略网络220已经找到最优剪辑策略。视频剪辑模型200进而采用模拟剪辑训练找到的最优剪辑策略对待剪辑样本数据进行实际的剪辑处理过程,从而得到具有较高欣赏价值的短视频。
其中,多次模拟剪辑训练过程之后最终得到的累计奖励值(Q值)是累计奖励值的期望值,Q值越大则表明采用本次模拟剪辑训练过程得到的剪辑策略剪辑出来的短视频在各欣赏角度上的价值评分越高,也就是说,该短视频的专业化程度或者欣赏价值越高,例如可以达到专业剪辑师剪辑得到的短视频所具有的专业化程度。可以理解,在对待剪辑样本数据进行实际剪辑处理(即图3B-2所示的剪辑过程)之前,视频剪辑模型200中可以进行多次模拟剪辑过程,来确定适合于用户输入素材的最优剪辑策略,即最终得到的能够产生最高Q值的剪辑策略。
下面结合图3B-1和图3B-2详细说明服务器300训练视频剪辑模型200对输入的待剪辑样本数据进行剪辑处理的过程。
图3B-1根据本申请实施例示出了视频剪辑模型200对待剪辑样本数据进行模拟剪辑训练的过程示 意图。可以理解,图3B-1所示为视频剪辑模型200对待剪辑样本数据进行一次模拟剪辑训练的过程,视频剪辑模型200再进行下一次(例如第2次)模拟剪辑训练时,视频剪辑模型200中剪辑策略网络220的初始网络参数对应于第1次模拟剪辑训练结束时剪辑策略网络220更新后的网络参数。视频剪辑模型200需要对待剪辑样本数据进行预设次数的图3B-1所示的模拟剪辑训练过程之后,可以确定视频剪辑模型200找到适合于待剪辑样本数据的最优剪辑策略。
具体地,如图3B-1所示,该过程包括以下步骤:
310t:视频剪辑模型200获取待剪辑样本数据中的第n段视频片段。可以理解,视频剪辑模型200对待剪辑样本数据进行剪辑时,可以将待剪辑样本数据划分为m段待剪辑的视频片段逐段进行剪辑,其中,m≥n。
示例性地,在服务器300训练视频剪辑模型200之前,可以采集大量待剪辑样本数据输入服务器300,用于训练视频剪辑模型200的剪辑策略网络220,其中待剪辑样本数据可以包括图像数据和/或视频片段等。可以理解,服务器300首先要对训练所用到的待剪辑样本数据做特征向量提取,例如通过多个矩阵集合来表示每个待剪辑样本数据,具体可以参考上述步骤302中相关描述,在此不再赘述。服务器300完成对待剪辑样本数据的特征提取后,可以将待剪辑样本数据输入视频剪辑模型200进行剪辑。可以理解,在视频剪辑模型200中,可以预设将一个待剪辑样本数据划分为多段视频片段、再逐段对待剪辑样本数据进行模拟剪辑处理,例如在视频剪辑模型200中预设将一个待剪辑样本数据划分为100段待剪辑的视频片段(即m=100),当视频剪辑模型200对待剪辑样本数据进行模拟剪辑训练时,视频剪辑模型200可以依次获取待剪辑样本数据的第1段视频片段、第2段视频片段、第n段视频片段、以及第100段视频片段,分别进行模拟剪辑处理。可以理解,视频剪辑模型200所获取的各视频片段(例如第1段视频片段)首先要输入视频剪辑模型200的剪辑策略网络220进行剪辑处理。
311t:视频剪辑模型200中的剪辑策略网络220在网络参数δ n下确定对第n段视频片段可采取的当前最优剪辑动作。
具体地,剪辑策略网络220可以基于输入的第n段视频片段,预测如果采取预设的各种剪辑动作对第n段视频片段进行剪辑处理所对应的各个累计奖励值(以下用Qn表示),例如剪辑策略网络220中预设有x种剪辑动作,那么剪辑策略网络220可以基于输入的第n段视频片段相应预测得到x个Qn,剪辑策略网络220进而将所预测的x个Qn中的最大值所对应的剪辑动作确定为对第n段视频片段可采取的当前最优剪辑动作。其中,累计奖励值Qn可以理解为如果剪辑策略网络220对输入的第n段视频片段采取预设的某种剪辑动作进行剪辑,则剪辑策略网络220可能在完成第n段之后的全部视频片段剪辑后获得的评分反馈奖励值的累计值。可以理解,剪辑策略网络220的网络参数δ n对应于剪辑策略网络220在对输入的第n段视频片段预测Qn时所对应的网络参数。例如,对于第一次模拟剪辑训练过程中第1段输入剪辑策略网络220的视频片段,剪辑策略网络220可以采用初始网络参数例如δ 1所对应的剪辑策略对第1段视频片段预测Q1。
可以理解,剪辑策略网络220中如果预设有x种剪辑动作,则对于输入剪辑策略网络220的每一段视频片段,剪辑策略网络220均会相应预测得到x种Qn、并从中选择最大值以确定对输入的第n段视频片段的最优剪辑动作。
示例性地,如果待剪辑样本数据包括100段待剪辑的视频片段(即上述步骤310t中的m=100),剪辑策略网络220中预设有1000种(即x=1000)剪辑动作。则剪辑策略网络220可以基于输入的第1段视频片段预测得到1000个Q1值,剪辑策略网络220进而将所预测的1000个Q1值中的最大值所对应的剪辑动作确定为对第1段视频片段可采取的当前最优剪辑动作。
类似地,剪辑策略网络220基于输入的第2段视频片段也可以预测得到1000个Q2值,剪辑策略网络220进而将所预测1000个Q2值中的最大值所对应的剪辑动作即可确定为对第2段视频片段可采取的当前最优剪辑动作。
可以理解,剪辑策略网络220在对第2段视频片段预测Q2时网络参数可能已经发生变化。其中,剪辑策略网络220的网络参数变化将在下述步骤315t中详细描述,在此不再赘述。
312t:视频剪辑模型200中的剪辑策略网络220采取所确定的当前最优剪辑动作对第n段视频片段进行模拟剪辑处理。
示例性地,为了使视频剪辑模型200能够找到适合于待剪辑样本数据的最优剪辑策略,在视频剪辑模型200对待剪辑样本数据进行实际剪辑处理之前,视频剪辑模型200需要通过剪辑策略网络220对待剪辑样本数据进行预设次数的模拟剪辑训练过程,在每次模拟训练过程中,剪辑策略网络220采取剪辑动作对待剪辑样本数据中各视频片段进行的剪辑处理均为模拟剪辑处理。视频剪辑模型200中的剪辑策略网络220基于上述步骤311t中所确定的对应于第n段视频片段的当前最优剪辑动作,对待剪辑样本数据中的第n段视频片段进行模拟剪辑处理。
313t:视频剪辑模型200中的剪辑策略网络220输出第n段完成剪辑的视频片段。
示例性地,视频剪辑模型200中的剪辑策略网络220采取所确定与第n段视频片段相应的当前最优剪辑动作对第n段视频片段进行模拟剪辑处理后,得到第n段完成剪辑的视频片段并输出剪辑策略网络220。
例如,视频剪辑模型200中的剪辑策略网络220在采取与第1段视频片段相应的当前最优剪辑动作,完成对第1段视频片段的模拟剪辑处理后得到第1段完成剪辑的视频片段并输出剪辑策略网络220,视频剪辑模型200则可以对剪辑策略网络220输出的第1段完成剪辑的视频片段进行暂存。
可以理解,在视频剪辑模型200完成本步骤313t的执行过程之后,视频剪辑模型200的价值网络210可以继续执行下述步骤314t,并且视频剪辑模型200可以继续执行下述步骤316t的过程,下述步骤314t和316t的执行顺序不限,可以同时进行或者先后执行。
314t:视频剪辑模型200中的价值网络210对剪辑策略网络220输出的第n段完成剪辑的视频片段进行评分,视频剪辑模型基于该评分向剪辑策略网络220反馈相应的奖励值。
示例性地,视频剪辑模型200中的价值网络210可以获取剪辑策略网络220输出的第n段完成剪辑的视频片段,对该段完成剪辑的视频片段进行评分并输出该评分,视频剪辑模型200则可以将价值网络210所输出的对应于第n段完成剪辑的视频片段的评分转换为相应的奖励值,反馈给剪辑策略网络220。其中,价值网络210对完成剪辑的视频片段进行评分所采用的评分规则,可以是对应于某个剪辑风格的评分规则,该剪辑风格例如是上述步骤301至302训练过程得到的价值网络210所包括的多种剪辑风格中的任一种。视频剪辑模型200将价值网络210对第n段完成剪辑的视频片段的评分转换为奖励值的过程可以参考上述步骤303之后的相关描述,在此不再赘述。
例如,视频剪辑模型200中的价值网络210获取上述步骤314t中剪辑策略网络220输出的第1段完成剪辑的视频片段进行评分,可以理解,经上述步骤301至302训练过程得到的价值网络210可以包括对多种剪辑风格的视频剪辑片段的评分规则,因此,在执行包括本步骤在内的模拟剪辑训练的过程开始前或者开始时(例如上述步骤310t开始前或者开始时),可以对视频剪辑模型200预设一种剪辑风格,如此,视频剪辑模型200中的价值网络210在对第1段完成剪辑的视频片段进行评分时,可以采用对应于预设的剪辑风格的评分规则对第1段完成剪辑的视频片段进行评分。
可以理解,视频剪辑模型200中的价值网络210采用对应于预设的剪辑风格的评分规则对第1段完 成剪辑的视频片段进行评分的具体过程可以参考上述步骤302中相关描述;视频剪辑模型200将价值网络210对第1段完成剪辑的视频片段的评分转换为奖励值的过程可以参考上述步骤303之后的相关描述,在此不再赘述。
可以理解,视频剪辑模型200将基于价值网络210对第1段完成剪辑的视频片段的评分转换得到的奖励值,作为视频剪辑模型200中的剪辑策略网络220的反馈数据,该奖励值实际上是对视频剪辑模型200的剪辑策略网络220执行上述步骤310t至313t完成对待剪辑样本数据的第1段视频片段的模拟剪辑结果的评价反馈,这一评价反馈将用于优化剪辑策略网络220对待剪辑样本数据的第2段视频片段进行模拟剪辑处理过程所对应的剪辑策略网络参数,具体优化网络参数的过程将在下文详细描述,在此不再赘述。
315t:视频剪辑模型200中的剪辑策略网络220基于所获得的反馈奖励值更新剪辑策略网络220参数为网络参数δ n+1
示例性地,视频剪辑模型200中的剪辑策略网络220可以综合三部分数值来计算基于第n段视频片段的模拟剪辑结果得到的损失函数,这三部分数值包括(a)上述步骤311t中剪辑策略网络220在网络参数δ n下预测的最大Qn值;(b)上述步骤314t中视频剪辑模型200基于价值网络210对第n段完成剪辑的视频片段的评分所反馈的奖励值;以及(c)剪辑策略网络220在网络参数δ n下预测的最大Qn+1值,剪辑策略网络220基于上述三部分确定的损失函数,调整网络参数从网络参数δ n至网络参数δ n+1。其中,损失函数的计算公式为:
损失函数=最大Qn+1值+第n段完成剪辑的视频片段所获得的反馈奖励值-最大Qn值;
其中,上述(b)与(c)之和(即Qn+1值+第n段完成剪辑的视频片段所获得的反馈奖励值),可以作为视频剪辑模型200中的剪辑策略网络220对待剪辑样本数据的第n段视频片段的目标累计奖励值(下称目标Qn值),因此,本步骤中剪辑策略网络220参数所计算的损失函数表征的是剪辑策略网络220对第n段视频片段的目标Qn值与预测的最大Qn值之间的差值。可以理解,剪辑策略网络220根据计算得到的损失函数,使用梯度下降法可以对剪辑策略网络参数进行更新,例如更新后的剪辑策略网络220的参数表示为上述网络参数δ n+1,此处不再赘述。另外,参考上述步骤311t中关于网络参数δ n的描述,可以理解,剪辑策略网络220的网络参数δ n+1则对应于剪辑策略网络在后续对输入的第n+1段视频片段确定最优剪辑动作时预测Q n+1′所采用的网络参数。
可以理解,由于视频剪辑模型200在执行本步骤315t计算损失函数时需要用到剪辑策略网络220在网络参数δ n下预测的最大Qn+1值,因此视频剪辑模型200在执行本步骤315t计算损失函数时可以预先获取待剪辑样本数据中的第n+1段视频片段来预测最大Qn+1值,该过程可以参考图3B-1所示的步骤310t与步骤315t之间的虚线箭头所示,对待剪辑样本数据中的第n+1段视频片段来预测最大Qn+1值的过程参考上述步骤311t中对第n段视频片段预测最大Qn值的过程,在此不再赘述。
可以理解,视频剪辑模型200完成步骤316t的执行过程之后,可以继续执行下述步骤317t或者继续执行下述步骤317t之后的步骤,例如执行下述步骤318t或者进入进入下一段(第n+1段)视频片段的模拟剪辑处理过程。
316t:视频剪辑模型200判断是否完成本次模拟剪辑训练。若是,则执行步骤317t;若否,则视频剪辑模型200的剪辑策略网络220在网络参数δ n+1下进入对第n+1段(例如第2段)视频片段的模拟剪辑处理过程,即视频剪辑模型200返回执行步骤310t,视频剪辑模型200的剪辑策略网络220在网络参数δ n+1下执行步骤311t至313t、以及在价值网络210执行步骤314t之后执行步骤315t更新网络参数δ n+2,依此循环。
示例性地,视频剪辑模型200可以基于上述步骤310t中获取的第n段视频片段(例如第1段视频片段)所携带的时序信息判断是否完成本次对待剪辑样本数据的模拟剪辑训练过程,第n段视频片段(例如第1段视频片段)所携带的时序信息可以是第n段视频片段(例如第1段)视频片段在待剪辑样本数据中的位置信息,或者是视频剪辑模型200逐段获取待剪辑样本数据的顺序标签信息等,在此不做限制。在另一些实施例中,视频剪辑模型200也可以基于上述步骤314t中剪辑策略网络220输出的第n段完成剪辑的视频片段(例如第1段完成剪辑的视频片段)所携带的时序信息来判断是否完成本次对待剪辑样本数据的模拟剪辑训练过程。如果视频剪辑模型200执行步骤的判断结果为否,即未完成本次模拟剪辑训练,则视频剪辑模型200返回执行步骤310t获取待剪辑样本数据中的第n+1段视频片段(例如第2段视频片段)进行模拟剪辑处理,接着,视频剪辑模型200的剪辑策略网络220在网络参数δ n+1(例如网络参数δ 2)下继续执行步骤311t至313t、以及在价值网络210执行步骤314t之后执行步骤315t更新至网络参数δ n+2(例如更新至网络参数δ 3),在此不再赘述。
可以理解,视频剪辑模型200完成一次对待剪辑样本数据的模拟剪辑训练过程,包括对待剪辑样本数据中全部待剪辑的视频片段逐段执行上述步骤310t至316t所描述的过程,例如步骤310t中所示例的,如果待剪辑样本数据划分为100段待剪辑的视频片段,则视频剪辑模型200在对第100段视频片段执行上述步骤310t至316t所描述的过程之后,在本步骤中视频剪辑模型200的判断结果为是,即已完成本次模拟剪辑训练,可以执行步骤317t。也就是说,视频剪辑模型200在对第1至99段视频片段分别执行上述步骤310t至316t所描述的过程之后,在本步骤中视频剪辑模型200所判断的结果均为否,即未完成本次模拟剪辑训练,需获取待剪辑样本数据中的下一段待剪辑的视频片段,返回步骤310t,执行对下一段视频片段的模拟剪辑处理过程。
317t:视频剪辑模型200判断训练次数是否达到预设次数阈值。若否,则表明视频剪辑模型200暂未找到适合于待剪辑样本数据的最优剪辑策略,则执行步骤318t;若是,则表明视频剪辑模型200已经找到适合于待剪辑样本数据的最优剪辑策略,则执行步骤319t。
示例性地,在执行包括本步骤在内的模拟剪辑训练的过程开始前或者开始时(例如上述步骤310t开始前或者开始时),可以预设视频剪辑模型200对待剪辑样本数据进行的模拟剪辑训练次数阈值,当视频剪辑模型200判断完成上述步骤310t至316t所描述的模拟剪辑训练次数未达到预设的次数阈值时,则执行下述步骤318t;当视频剪辑模型200判断完成上述步骤310t至316t所描述的模拟剪辑训练次数达到预设的次数阈值时,则表明视频剪辑模型200已经找到适合于待剪辑样本数据的最优剪辑策略,则执行下述步骤319t。
可以理解,在另一些实施例中,也可以设置价值网络210在每次模拟剪辑过程中对各段完成模拟剪辑的视频片段进行评分所反馈的奖励值的累计阈值作为步骤317t所执行的判断条件,例如判断上述步骤341t中价值网络210对各段完成模拟剪辑的视频片段进行评分所反馈的奖励值的累计值是否达到预设的累计阈值,如果未达到累计阈值,则表明还需要继续进行模拟剪辑训练以确定最优剪辑策略,需继续执行下述步骤318t;如果达到累计阈值,则表明已找到最优剪辑策略,可以继续执行下述步骤319t。或者,在一些实施例中,也可以将预设的模拟训练次数阈值和上述奖励值的累计阈值共同作为步骤317t的判断条件,在此不做限制。
318t:视频剪辑模型200进入下一次模拟剪辑训练。具体地,视频剪辑模型200进入下一次模拟剪辑训练时,视频剪辑模型200中的剪辑策略网络220在本次模拟剪辑训练完成后更新的网络参数下开始对待剪辑样本数据的第1段视频片段执行步骤311t。例如,上述步骤311t示例的待剪辑样本数据包括m段待剪辑的视频片段,视频剪辑模型200完成第1次模拟剪辑训练后,剪辑策略网络220的网络参数 可以经历m+1次更新,例如剪辑策略网络220参数可以更新至网络参数δ m+1
示例性地,在上述步骤317t中,如果视频剪辑模型200判断完成上述步骤310t至316t所描述的模拟剪辑训练次数未达到预设的次数阈值,则视频剪辑模型200须对待剪辑样本数据再进行一次上述步骤310t至316t的模拟剪辑训练过程,并在再次完成上述步骤310t至316t的模拟剪辑训练过程之后,视频剪辑模型200再次执行上述步骤317t所描述的判断过程。
319t:视频剪辑模型200采用最优剪辑策略对待剪辑样本数据进行实际剪辑处理。其中,采用最优剪辑策略对待剪辑样本数据进行实际剪辑处理的过程可以参考图3B-2所示的过程,将在下文详细描述,在此不再赘述。
示例性地,视频剪辑模型200在对待剪辑样本数据完成预设次数的模拟剪辑训练后找到适合于待剪辑样本数据的最优剪辑策略,进而视频剪辑模型200采用经过预设多次模拟剪辑训练得到的最优剪辑策略继续执行下述图3B-2所示的步骤310i至360i,对待剪辑样本数据进行实际剪辑处理。
图3B-2根据本申请实施例示出了视频剪辑模型200采用上述图3B-1所示的模拟剪辑训练过程训练得到的最优剪辑策略对待剪辑样本数据进行剪辑处理的过程示意图。可以理解,图3B-2所示的剪辑处理过程,即视频剪辑模型200中的剪辑策略网络220根据最优剪辑策略对应的相关剪辑处理逻辑,确定对待剪辑样本数据中各段待剪辑的视频片段进行实际剪辑处理所采用的剪辑动作、以及采用确定的剪辑动作对相应的某一段待剪辑的视频片段进行实际剪辑处理的过程。
如图3B-2所示,该过程包括以下步骤:
310i:视频剪辑模型200获取第n段视频片段。
示例性地,视频剪辑模型200基于上述图3B-1所示的模拟剪辑训练过程训练的最优剪辑策略,将待剪辑样本数据划分为与该剪辑策略对应数量的多段视频片段、再逐段对待剪辑样本数据进行剪辑处理。
示例性地,当视频剪辑模型200基于训练得到的最优剪辑策略对待剪辑样本数据进行剪辑处理时,视频剪辑模型200可以依次获取待剪辑样本数据的第1段视频片段、第2段视频片段、第N段视频片段,分别按照最优剪辑策略中对应于各段视频片段的剪辑处理策略进行剪辑处理。视频剪辑模型200所获取的各视频片段(例如第1段视频片段)首先要输入视频剪辑模型200的剪辑策略网络220进行剪辑处理。
上述视频剪辑模型200对待剪辑样本数据进行分段的具体处理过程可以参考上述步骤310t中相关描述,在此不再赘述。
可以理解,视频剪辑模型200训练得到的最优剪辑策略包括对待剪辑样本数据划分成多少段待剪辑的视频片段、以及对每段待剪辑的视频片段采取何种剪辑动作等策略。
320i:视频剪辑模型200中的剪辑策略网络220采用所确定的最优剪辑策略,确定对第n段视频片段的剪辑动作。
示例性地,视频剪辑模型200中的剪辑策略网络220基于所确定的最优剪辑策略确定对应于第1段视频片段的Q 1值。可以理解,当视频剪辑模型200确定采用最优剪辑策略对待剪辑样本数据进行剪辑处理时,视频剪辑模型200中剪辑策略网络220参数便不再更新,相应地,视频剪辑模型200确定采用最优剪辑策略对待剪辑样本数据进行剪辑处理的过程中,视频剪辑模型200中的价值网络210不会对剪辑策略网络220输出的各完成剪辑的视频片段进行评分。
上述视频剪辑模型200中的剪辑策略网络220预测第1段视频片段的Q 1值的具体过程可以参考上述步骤311t中相关描述,在此不再赘述。
330i:视频剪辑模型200中的剪辑策略网络220采取所确定的剪辑动作对第n段视频片段进行实际剪辑处理。
示例性地,视频剪辑模型200中的剪辑策略网络220基于上述步骤330i所确定的与预测的Q 1值相对应的剪辑动作,对待剪辑样本数据中的第1段视频片段进行实际剪辑处理。
上述视频剪辑模型200中的剪辑策略网络220采用所确定的剪辑动作对第1段视频片段进行实际剪辑处理的具体过程可以参考上述步骤312t中的相关描述,所采取的剪辑动作包括哪些可以参考上述图2中相关描述,在此不再赘述。
340i:视频剪辑模型200中的剪辑策略网络220输出第n段完成剪辑的视频片段。
示例性地,视频剪辑模型200中的剪辑策略网络220完成对第1段视频片段的剪辑处理后得到第1段完成剪辑的视频片段并输出,视频剪辑模型200则对剪辑策略网络220输出的第1段完成剪辑的视频片段进行暂存。
350i:视频剪辑模型200判断是否完成剪辑。若是,则执行步骤360i;若否,则视频剪辑模型200进入下一段(例如第2段)视频片段的剪辑处理过程,返回步骤310i,获取待剪辑样本数据中的第2段视频片段。
示例性地,视频剪辑模型200可以基于上述步骤310i中获取的第1段视频片段所携带的时序信息判断是否完成对待剪辑样本数据的实际剪辑处理过程,在另一些实施例中,视频剪辑模型200也可以基于上述步骤350i中剪辑策略网络220输出的第1段完成剪辑的视频片段所携带的时序信息来判断是否完成对待剪辑样本数据的实际剪辑处理过程,或者视频剪辑模型200也可以基于上述步骤340i中剪辑策略网络220输出的第1段完成剪辑的视频片段中是否包含待剪辑样本数据中最后一段视频片段的标签或标记,来判断是否完成对待剪辑样本数据的实际剪辑处理过程,在此不做限制。因此,如果视频剪辑模型200判断已完成对待剪辑样本数据中的全部待剪辑的视频片段的剪辑处理过程,则视频剪辑模型200继续执行下述步骤360i;如果视频剪辑模型200判断未完成对待剪辑样本数据中的全部待剪辑的视频片段的剪辑处理过程,则视频剪辑模型200进入下一段(例如第2段)视频片段的剪辑处理过程,返回步骤310i,获取待剪辑样本数据中的第2段视频片段。
上述视频剪辑模型200判断是否完成剪辑的具体执行过程可以参考上述步骤316t中的相关描述,在此不再赘述。
360i:视频剪辑模型200输出完成剪辑的短视频。
示例性地,上述步骤350i中,如果视频剪辑模型200判断已完成对待剪辑样本数据中的全部待剪辑的视频片段的剪辑处理过程,即说明视频剪辑模型200中已存储完成剪辑的全部视频片段,包括上述步骤340i中剪辑策略网络220输出的第1段完成剪辑的视频片段、以及重复上述步骤310i至340i依次完成剪辑处理的第2段完成剪辑的视频片段、第n段完成剪辑的视频片段。此时,视频剪辑模型200则可以生成由上述第1段完成剪辑的视频片段、第2段完成剪辑的视频片段、第n段完成剪辑的视频片段构成的完成剪辑的短视频,并输出视频剪辑模型200。至此,视频剪辑模型200完成对一个待剪辑样本数据的剪辑处理过程。
可以理解,在服务器300上,通过对视频剪辑模型200输入大量的待剪辑样本数据,例如300个待剪辑样本数据,使视频剪辑模型200在对每个待剪辑样本数据执行上述图3B-1的模拟剪辑训练过程和图3B-2所示的剪辑处理过程中,完成对视频剪辑模型200的训练。可以理解,视频剪辑模型200所执行的上述图3B-1的模拟剪辑训练过程和图3B-2所示的剪辑处理过程也是对视频剪辑模型200中的剪辑策略网络220的网络参数的调整优化过程。
可以理解,在服务器300上完成对视频剪辑模型200的训练之后,可以将训练好的视频剪辑模型200移植到手机100上,实现对用户所选择的待剪辑素材的视频剪辑处理过程。
可以理解,视频剪辑模型200移植到手机100的过程,例如可以是通过手机100所配置的操作系统中的模型读取接口读取并解析该视频剪辑模型200,然后编译成应用程序文件安装到手机100中,例如手机100所配置的操作系统安卓(Android) TM系统,则可以通过Android工程中的模型读取接口读取并解析该视频剪辑模型200,然后编译成APK(Android application package,Android应用程序包)文件,安装到手机100中,完成视频剪辑模型200的移植。
可以理解,在另一些实施例中,手机100所配置的操作系统也可以是其他系统,例如鸿蒙 TM系统(HarmonyOS),相应地,该视频剪辑模型200可以被编译成适用于手机100所配置的相应操作系统的应用程序文件,在此不做限制。
完成视频剪辑模型200移植到手机100的过程后,可以将手机100中的图片和/或视频数据等,输入视频剪辑模型200中,得到对应于手机100中的图片和/或视频等数据的特征数据,例如上述步骤302所描述的描述图像数据的矩阵或描述视频片段的矩阵集合。如此,用户便可以操作手机100通过移植到手机100上的视频剪辑模型200完成对手机100中存储的图片和/或视频等数据的剪辑处理过程,得到具有较高欣赏价值的短视频,而且,由于训练好的视频剪辑模型200中的价值网络210包括对应于多种剪辑风格的评分规则,因此,移植了上述训练好的视频剪辑模型200的手机100剪辑处理得到的短视频还可以具备用户所选择的剪辑风格。
图4示出了手机100响应于用户操作,执行本申请的视频剪辑方法的实施流程示意图。
可以理解,移植了上述视频剪辑模型200的手机100在对用户选择输入的待剪辑素材在进行视频剪辑的过程中,能够剪辑得到对应于用户所选的剪辑风格的、并且具有较高欣赏价值的短视频。为了便于描述,在本申请实施例中,作为示例,上述视频剪辑模型200移植到手机100的过程中对应编译安装到手机100上的应用程序(即视频剪辑应用),在手机100可以显示为下述图5A所示的剪视频应用511。
示例性地,如图4所示,该流程包括以下步骤:
401:手机100获取用户所选取的待剪辑素材,并获取用户所选择的视频剪辑风格。
示例性地,用户操作手机100运行该视频剪辑应用,完成待剪辑素材的上传添加,手机100即可获取待剪辑素材。可以理解,该待剪辑素材可以是手机100所拍摄的图片和/或视频,在此不做限制。
图5A至5E示出了手机100运行视频剪辑应用执行本申请的视频剪辑方法的操作界面示意图。
如图5A所示,在手机100的桌面510上,用户可以点击剪视频应用511,进入图5B所示的操作界面520。
如图5B所示,操作界面520包括风格设置选项521、添加素材按钮522,在一些实施例中,操作界面520还可以包括设置按钮523,可以理解,图5B所示界面并不构成对本申请实施例所提供的剪视频应用511的界面功能按钮及界面排布样式等方面的限制,在另一些实施例中,适用本申请的视频剪辑应用的界面也可以是其他形式,可以具有多于或少于图5B所示按钮数量的功能控件,在此不做限制。
参考图5B所示的操作①,用户可以在风格设置选项521下勾选自己想要制作的视频风格,例如选择“港风”风格选项,如图5B所示,风格设置选项521下提供的各种风格选项可以设置关键词描述,以向用户简单介绍各种风格的特点,例如“港风”风格选项对应的关键词描述为“王家卫电影风格”,“童年”风格选项对应的关键词描述为“老照片”和“回忆”,“漫画”风格选项对应的关键词描述为“充满想象”,“悬疑”风格选项对应的关键词描述为“情节”和“逻辑”,可以理解,在另一些实施例中,风格设置选项521下可以设置其他的风格以及更多风格选项,并且对于各种风格选项的关键词描述也可以为其他内容,在此不做限制。
如果用户制作视频时不确定自己想要剪辑什么风格的视频,也可以点击图5B所示的自动匹配选项 5211,在该选项5211被勾选的情况下,手机100可以在后续的视频剪辑制作中为用户所添加的素材自动匹配适合的风格,另外,用户也可以在图5B所示的风格设置选项521下方左右滑动或者点击向左滑动按钮5212或向右滑动按钮5213,选择其他风格,在此不做限制在此不做限制。
可以理解,上述图5B所示的风格设置选项521下提供的各种视频剪辑风格,在手机100运行的剪视频应用511,响应于用户选择视频剪辑风格的操作过程中,可以对应于相应选择操作所生成的指令,例如,该指令中可以包含用户所选择的视频剪辑风格对应的标签,在此不做限制。
参考图5B所示的操作②,用户完成风格设置后,可以点击添加素材按钮522添加待剪辑素材,添加待剪辑素材的界面可以参考图5C所示的界面530,如上所述,待剪辑素材可以是单个视频素材、多个视频素材以及图片和视频的组合素材等,在此不做限制。用户可以在图5C所示界面530上的素材选择区531勾选需要添加的待剪辑素材上的复选框532添加待剪辑素材,用户也可以在素材选择区531内点击已勾选的复选框取消对相应素材的添加。在另一些实施例中,用户也可以在界面530下方的素材管理区533内点击所添加素材右上角的“×”按钮534,删除已勾选的素材,在此不做限制。
参考上述图5C所示的操作③,用户完成添加待剪辑素材后,可以点击界面530右上角的“一键创作”按钮535进行视频剪辑。另外,用户也可以点击界面530左上角的“取消”按钮536取消本次视频剪辑操作。可以理解,另一些实施例中,手机100所显示的供用户选择添加待剪辑素材的界面,可以是不同于图5C所示的其他布局形式的界面,在此不做限制。
图5D和图5E所示的界面示意图将在下文相应步骤中详细描述,在此不再赘述。
402:手机100上运行的视频剪辑应用基于用户所选择的剪辑风格,对用户所选择的待剪辑素材进行剪辑处理。
示例性地,手机100所运行剪视频应用511可以将获取到的用户所选择的剪辑风格对应的标签、以及获取的用户所选择的待剪辑素材输入视频剪辑模型200中进行剪辑处理,视频剪辑模型200先基于用户所选择的剪辑风格对待剪辑素材多次执行上述步骤310t至319t的模拟剪辑训练,通过多次模拟剪辑训练找到适合于用户所选择的待剪辑素材的最优剪辑策略,视频剪辑模型200再基于多次模拟剪辑训练找到的最优剪辑策略对待剪辑素材执行上述步骤310i至360i的剪辑处理过程。
可以理解,上述视频剪辑模型200对待剪辑素材所执行的模拟剪辑训练过程,具体可以参考上述步骤310t至319t中相关描述,在此不再赘述;视频剪辑模型200基于模拟剪辑训练得到的最优剪辑策略对待剪辑素材所执行的剪辑处理过程,具体可以参考上述步骤310i至360i中的相关描述,在此不再赘述。
403:手机100上运行的视频剪辑应用完成对待剪辑素材的剪辑处理,生成完成剪辑的短视频。
示例性地,在上述步骤402中手机100所运行剪视频应用511将用户所选择的剪辑风格对应的标签、以及用户所选择的待剪辑素材输入视频剪辑模型200中进行剪辑处理,视频剪辑模型200完成对待剪辑素材的剪辑处理后,参考上述步骤360i中的相关描述,视频剪辑模型200生成具有用户所选的剪辑风格的短视频并输出。可以理解,视频剪辑模型200输出的短视频可以呈现是手机100所运行的视频剪辑应用的界面上。
参考上述图5C所示的操作③,用户点击一键创作按钮535之后,手机100上运行的剪视频应用511则通过训练好的视频剪辑模型200对用户所选择的待剪辑素材进行剪辑操作,可以理解,该剪辑操作过程中,视频剪辑模型200以待剪辑素材的视频片段和用户所选择的视频剪辑风格对应的标签等作为视频剪辑模型200的输入,由训练好视频剪辑模型200通过训练好的剪辑策略网络220和价值网络210,实现剪辑得到具有用户所选的剪辑风格的短视频,具体参考上述步骤402中相关描述,在此不再赘述。
可以理解,用户在图5C所示的界面530上点击一键创作按钮535之后,手机100可以显示图5D所示的完成视频剪辑的界面540,用户可以点击已完成剪辑视频上的播放按钮541预览已完成剪辑视频的内容;用户可以点击界面540下方的分享按钮542分享已完成剪辑的视频至其他应用或者选择在某短视频应用平台上进行发布;用户可以点击界面540下方的保存按钮543将已完成剪辑的视频保存至手机100的本地相册,以供欣赏或在其他应用上添加该已完成剪辑的视频使用;如果用户点击播放按钮541预览已完成剪辑视频后,感觉该视频需要进一步添加一些其他元素进行优化,则可以点击编辑按钮544对该视频进行编辑;另外,如果用户点击播放按钮541预览已完成剪辑视频后,感觉视频风格不太喜欢或者内容不太喜欢等,也可以点击界面540下方的删除按钮545删除该视频。图5D所示的界面540下方还包括更多按钮546,用户可以点击该按钮546对已完成剪辑的视频进行其他操作,例如“重命名”、“投屏播放”等操作。
另外,在上述图5B所示的操作界面520上,用户也可以点击图5B所示的设置按钮523对剪视频应用511的一些默认选项、配置参数等进行设置,作为示例,参考图5E所示的设置界面550,用户可以在设置界面550上对风格模型进行更新或偏好设置,例如用户可以点击风格模型选项551下的“检查并更新风格模型”对上述图5B所示的风格设置选项521下可选择的各视频剪辑风格的种类进行检查更新,及时添加更新的已训练完成的新的视频剪辑风格模型,该模型的训练将在下文详细描述,在此不再赘述。另外,用户可以点击风格模型选项551下的“风格偏爱设置”设置自己偏好的视频剪辑风格,参考图5E所示的操作④,以在图5B所示的风格设置选项521下方优先展示用户偏好的风格,例如用户设置的偏好风格为“港风”、“童年”、“漫画”、“悬疑”四种风格,则图5B所示的风格设置选项521下方相应的优先展示这四种风格,如上所述,用户也可以在图5B所示的风格设置选项521下方左右滑动或者点击向左滑动按钮5212或向右滑动按钮5213,选择其他风格,在另一些实施例中,用户设置的偏好风格还可以为其他风格,例如“中国风”、“可爱”、“唯美”等剪辑风格,在此不做限制。
另外,如图5E所示,用户还可以选择开启“自动添加片尾”功能为所剪辑视频自动添加预设的片尾,在另一些实施例中,用户也可以对所剪辑视频内容添加默认水印或自定义水印等,在此不再赘述。
可以理解,在另一些实施例中,对已完成剪辑视频进行进一步处理的界面540的界面功能布局以及各功能选项所对应的操作也可以设置为其他形式的组合,更多按钮546也可以包括其他更多操作,在此不做限制。
通过上述步骤401至403执行本申请的视频剪辑方法的过程,可以理解,手机100执行本申请的视频剪辑方法,通过采用训练好的视频剪辑模型200中的价值网络210来指导剪辑策略网络220对待剪辑素材进行多次模拟剪辑训练,从而找到适合于待剪辑素材的最优剪辑策略,视频剪辑模型200中的剪辑策略网络220进而采用训练得到的最优剪辑策略完成对待剪辑素材完成剪辑处理,其中,视频剪辑模型200中的价值网络210可以基于用户所选择的剪辑风格从多个专业欣赏角度来指导剪辑策略网络220在每次模拟剪辑训练过程中对每一段视频片段确定可采取的最优剪辑动作的决策过程。因此,通过实施本申请的视频剪辑方法剪辑得到的短视频,相较于现有的其他视频剪辑方案剪辑得到的短视频而言,能够具有更高的欣赏价值并且具有符合用户偏好的剪辑风格,而且基于本申请的视频剪辑方法提供给用户的剪视频应用的操作界面也非常简洁且易于操作,操作门槛低,利于提高用户体验。
另外可以理解,手机100执行本申请的视频剪辑方法,通过视频剪辑模型200对用户选择的待剪辑素材进行模拟剪辑训练的过程中,也是通过用户选择的待剪辑素材继续执行上述图3B-1所示的模拟训练过程,即:将用户所选的待剪辑素材作为样本数据训练视频剪辑模型200中的剪辑策略网络220进一步优化网络参数的过程。也就是说,视频剪辑模型200中的剪辑策略网络220可以在用户通过手机100 进行视频剪辑处理的过程中得以继续训练。
可以理解,手机100实施本申请的视频剪辑方法所安装的视频剪辑应用,不局限于上述剪视频应用511的界面及实现过程,亦不限于上述图5A所示的剪视频应用511的应用图标及应用名称,在另一些实施例中,该视频剪辑应用也可以是其他形式的安装于手机100上的第三方应用程序,在此不做限制。在另一些实施例中,本申请的视频剪辑方法以及上述视频剪辑模型200也可以通过在手机100的相机应用中配置为视频剪辑功能来实现,或者,本申请的视频剪辑方法以及上述视频剪辑模型200也可以通过手机100的系统配置的一项服务卡片实现,例如手机100所搭载的鸿蒙 TM系统(HarmonyOS)配置的视频剪辑服务,另外,在另一些实施例中,本申请的视频剪辑方法还可以直接预设在摄像机、手持云台等具有摄像功能的设备中,以使该设备具有直接对所拍摄照片或视频进行视频剪辑处理的功能,在此不做限制。
作为示例,图6示出了一种手机100的结构示意图。
手机100可以包括处理器610,外部存储器接口620,内部存储器621,通用串行总线(universal serial bus,USB)接口630,充电管理模块640,电源管理模块641,电池642,天线1,天线2,移动通信模块650,无线通信模块660,音频模块670,扬声器670A,受话器670B,麦克风670C,耳机接口670D,传感器模块680,按键690,马达691,指示器692,摄像头693,显示屏694,以及用户标识模块(subscriber identification module,SIM)卡接口695等。其中传感器模块680可以包括压力传感器680A,陀螺仪传感器680B,气压传感器680C,磁传感器680D,加速度传感器680E,距离传感器680F,接近光传感器680G,指纹传感器680H,温度传感器680J,触摸传感器680K,环境光传感器680L,骨传导传感器680M等。
可以理解的是,本发明实施例示意的结构并不构成对手机100的具体限定。在本申请另一些实施例中,手机100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器610可以包括一个或多个处理单元,例如:处理器610可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。在本申请实施例中,处理器610可以通过控制器控制执行本申请视频剪辑方法,包括控制执行视频剪辑模型200对输入的待剪辑素材中的视频片段的剪辑处理过程。
处理器610中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器610中的存储器为高速缓冲存储器。该存储器可以保存处理器610刚用过或循环使用的指令或数据。如果处理器610需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器610的等待时间,因而提高了系统的效率。
在一些实施例中,处理器610可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity  module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线，包括一根串行数据线(serial data line,SDA)和一根串行时钟线(serial clock line,SCL)。在一些实施例中，处理器610可以包含多组I2C总线。处理器610可以通过不同的I2C总线接口分别耦合触摸传感器680K，充电器，闪光灯，摄像头693等。例如：处理器610可以通过I2C接口耦合触摸传感器680K，使处理器610与触摸传感器680K通过I2C总线接口通信，实现手机100的触摸功能。在本申请实施例中，用户可以通过手机100的触摸功能点击剪视频应用511的应用图标，以及在剪视频应用511的操作界面上进行相应操作等，在此不做限制。
I2S接口可以用于音频通信。在一些实施例中,处理器610可以包含多组I2S总线。处理器610可以通过I2S总线与音频模块670耦合,实现处理器610与音频模块670之间的通信。在一些实施例中,音频模块670可以通过I2S接口向无线通信模块660传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块670与无线通信模块660可以通过PCM总线接口耦合。在一些实施例中,音频模块670也可以通过PCM接口向无线通信模块660传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器610与无线通信模块660。例如:处理器610通过UART接口与无线通信模块660中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块670可以通过UART接口向无线通信模块660传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器610与显示屏694,摄像头693等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器610和摄像头693通过CSI接口通信,实现手机100的拍摄功能。处理器610和显示屏694通过DSI接口通信,实现手机100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器610与摄像头693,显示屏694,无线通信模块660,音频模块670,传感器模块680等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口630是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口630可以用于连接充电器为手机100充电,也可以用于手机100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对手机100的结构限定。在本申请另一些实施例中,手机100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块640用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块640可以通过USB接口630接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块640可以通过手机100的无线充电线圈接收无线充电输入。充电管理模块640为电池642充电的同时,还可以通过电源管理模块641为电子设备供电。
电源管理模块641用于连接电池642,充电管理模块640与处理器610。电源管理模块641接收电池642和/或充电管理模块640的输入,为处理器610,内部存储器621,显示屏694,摄像头693,和 无线通信模块660等供电。电源管理模块641还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块641也可以设置于处理器610中。在另一些实施例中,电源管理模块641和充电管理模块640也可以设置于同一个器件中。
手机100的无线通信功能可以通过天线1,天线2,移动通信模块650,无线通信模块660,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。手机100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块650可以提供应用在手机100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块650可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块650可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块650还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块650的至少部分功能模块可以被设置于处理器610中。在一些实施例中,移动通信模块650的至少部分功能模块可以与处理器610的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器670A,受话器670B等)输出声音信号,或通过显示屏694显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器610,与移动通信模块650或其他功能模块设置在同一个器件中。
无线通信模块660可以提供应用在手机100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块660可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块660经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器610。无线通信模块660还可以从处理器610接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,手机100的天线1和移动通信模块650耦合,天线2和无线通信模块660耦合,使得手机100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
手机100通过GPU,显示屏694,以及应用处理器等实现显示功能。GPU为图像处理的微处理器, 连接显示屏694和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器610可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏694用于显示图像,视频等。显示屏694包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Mini-LED,Micro-LED,Micro-OLED,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,手机100可以包括1个或N个显示屏694,N为大于1的正整数。
手机100可以通过ISP,摄像头693,视频编解码器,GPU,显示屏694以及应用处理器等实现拍摄功能。
ISP用于处理摄像头693反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头693中。
摄像头693用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,手机100可以包括1个或N个摄像头693,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当手机100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。手机100可以支持一种或多种视频编解码器。这样,手机100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现手机100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口620可以用于连接外部存储卡,例如Micro SD卡,实现扩展手机100的存储能力。外部存储卡通过外部存储器接口620与处理器610通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。例如本申请实施例中,用户选择的待剪辑素材中所包括的图片和/或视频可以是存储在手机100的外部存储卡中的图片和/或视频素材。
内部存储器621可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器621可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储手机100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器621可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器610通过运行存储在内部存储器621的指令,和/或存储在设置于处理器中的存储器的指令,执行手机100的各种功能应用以及数据处理。例如本申请实施例中,手机100所运行的剪视频应用511 在进行视频剪辑过程中,对待剪辑素材的各帧或各段视频片段完成剪辑后的暂存,可以存储在内部存储器621内,手机100通过剪视频应用511执行本申请的视频剪辑方法的相关指令,也可以存储在内部存储器621内,在此不做限制。
手机100可以通过音频模块670,扬声器670A,受话器670B,麦克风670C,耳机接口670D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块670用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块670还可以用于对音频信号编码和解码。在一些实施例中,音频模块670可以设置于处理器610中,或将音频模块670的部分功能模块设置于处理器610中。
扬声器670A,也称“喇叭”,用于将音频电信号转换为声音信号。手机100可以通过扬声器670A收听音乐,或收听免提通话。
受话器670B,也称“听筒”,用于将音频电信号转换成声音信号。当手机100接听电话或语音信息时,可以通过将受话器670B靠近人耳接听语音。
麦克风670C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风670C发声,将声音信号输入到麦克风670C。手机100可以设置至少一个麦克风670C。在另一些实施例中,手机100可以设置两个麦克风670C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,手机100还可以设置三个,四个或更多麦克风670C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口670D用于连接有线耳机。耳机接口670D可以是USB接口630,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器680A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器680A可以设置于显示屏694。压力传感器680A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器680A,电极之间的电容改变。手机100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏694,手机100根据压力传感器680A检测所述触摸操作强度。手机100也可以根据压力传感器680A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
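作为帮助理解的示意，下面用一段简化代码说明"同一触摸位置、不同触摸操作强度对应不同操作指令"的判断逻辑；其中的阈值数值与指令名称均为假设性示例，实际阈值与指令映射由手机100的系统实现决定。

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # 假设性的第一压力阈值（归一化压力值，仅为示意）

def handle_icon_press(pressure: float) -> str:
    """示意：根据压力传感器680A检测到的触摸操作强度返回不同的操作指令。"""
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "查看短消息"  # 触摸操作强度小于第一压力阈值
    return "新建短消息"      # 触摸操作强度大于或等于第一压力阈值

print(handle_icon_press(0.3))  # 查看短消息
print(handle_icon_press(0.8))  # 新建短消息
```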
陀螺仪传感器680B可以用于确定手机100的运动姿态。在一些实施例中,可以通过陀螺仪传感器680B确定手机100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器680B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器680B检测手机100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消手机100的抖动,实现防抖。陀螺仪传感器680B还可以用于导航,体感游戏场景。
气压传感器680C用于测量气压。在一些实施例中,手机100通过气压传感器680C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器680D包括霍尔传感器。手机100可以利用磁传感器680D检测翻盖皮套的开合。在一些实施例中,当手机100是翻盖机时,手机100可以根据磁传感器680D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器680E可检测手机100在各个方向上(一般为三轴)加速度的大小。当手机100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器680F,用于测量距离。手机100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,手机100可以利用距离传感器680F测距以实现快速对焦。
接近光传感器680G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。手机100通过发光二极管向外发射红外光。手机100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定手机100附近有物体。当检测到不充分的反射光时,手机100可以确定手机100附近没有物体。手机100可以利用接近光传感器680G检测用户手持手机100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器680G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器680L用于感知环境光亮度。手机100可以根据感知的环境光亮度自适应调节显示屏694亮度。环境光传感器680L也可用于拍照时自动调节白平衡。环境光传感器680L还可以与接近光传感器680G配合,检测手机100是否在口袋里,以防误触。
指纹传感器680H用于采集指纹。手机100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器680J用于检测温度。在一些实施例中，手机100利用温度传感器680J检测的温度，执行温度处理策略。例如，当温度传感器680J上报的温度超过阈值，手机100执行降低位于温度传感器680J附近的处理器的性能，以便降低功耗实施热保护。在另一些实施例中，当温度低于另一阈值时，手机100对电池642加热，以避免低温导致手机100异常关机。在其他一些实施例中，当温度低于又一阈值时，手机100对电池642的输出电压执行升压，以避免低温导致的异常关机。
触摸传感器680K,也称“触控器件”。触摸传感器680K可以设置于显示屏694,由触摸传感器680K与显示屏694组成触摸屏,也称“触控屏”。触摸传感器680K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏694提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器680K也可以设置于手机100的表面,与显示屏694所处的位置不同。
骨传导传感器680M可以获取振动信号。在一些实施例中,骨传导传感器680M可以获取人体声部振动骨块的振动信号。骨传导传感器680M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器680M也可以设置于耳机中,结合成骨传导耳机。音频模块670可以基于所述骨传导传感器680M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器680M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键690包括开机键,音量键等。按键690可以是机械按键。也可以是触摸式按键。手机100可以接收按键输入,产生与手机100的用户设置以及功能控制有关的键信号输入。
马达691可以产生振动提示。马达691可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏694不同区域的触摸操作,马达691也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器692可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口695用于连接SIM卡。SIM卡可以通过插入SIM卡接口695,或从SIM卡接口695拔出, 实现和手机100的接触和分离。手机100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口695可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口695可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口695也可以兼容不同类型的SIM卡。SIM卡接口695也可以兼容外部存储卡。手机100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,手机100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在手机100中,不能和手机100分离。
作为示例,图7示出了一种手机100的软件结构框图。
手机100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本发明实施例以分层架构的Android系统为例,示例性说明手机100的软件结构。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓 TM运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图7所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图7所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供手机100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓 TM系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓 TM的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries), 三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理，并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面结合本申请实施例中,用户选择添加的待剪辑素材的图片或视频来源之一,手机100捕获拍照的场景,示例性说明手机100软件以及硬件的工作流程。
当触摸传感器680K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为剪视频应用图标的控件为例,剪视频应用调用应用框架层的接口,启动剪视频应用,如果用户选择通过剪视频应用调用相机应用拍摄待剪辑的图像或视频,则剪视频应用可以调用相机应用、进而通过相机应用调用内核层启动摄像头驱动,通过摄像头693捕获静态图像或视频。
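为便于理解上述从内核层到应用程序框架层的触摸事件分发流程，下面给出一个与具体操作系统实现无关的简化示意；其中的类与函数命名（如RawInputEvent、dispatch_input_event等）均为假设性示例，并非Android系统的真实接口。

```python
from dataclasses import dataclass

@dataclass
class RawInputEvent:
    """示意：内核层将触摸操作加工成的原始输入事件。"""
    x: int
    y: int
    timestamp_ms: int

def find_control(event: RawInputEvent) -> str:
    # 应用程序框架层根据触摸坐标识别该输入事件所对应的控件（此处仅为示意）
    return "剪视频应用图标"

def dispatch_input_event(event: RawInputEvent) -> None:
    control = find_control(event)
    if control == "剪视频应用图标":
        print("启动剪视频应用")
        # 若用户选择通过剪视频应用调用相机拍摄待剪辑素材，则进一步启动摄像头驱动采集图像或视频
        print("调用相机应用 -> 启动摄像头驱动 -> 通过摄像头693捕获静态图像或视频")

dispatch_input_event(RawInputEvent(x=120, y=560, timestamp_ms=1693468800000))
```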
在说明书对“一个实施例”或“实施例”的引用意指结合实施例所描述的具体特征、结构或特性被包括在根据本申请公开的至少一个范例实施方案或技术中。说明书中的各个地方的短语“在一个实施例中”的出现不一定全部指代同一个实施例。
本申请公开还涉及用于执行本文中的操作的装置。该装置可以出于所要求的目的而专门构造，或者其可以包括由存储在计算机中的计算机程序选择性地激活或者重新配置的通用计算机。这样的计算机程序可以被存储在计算机可读介质中，诸如但不限于任何类型的盘，包括软盘、光盘、CD-ROM、磁光盘、只读存储器(ROM)、随机存取存储器(RAM)、EPROM、EEPROM、磁卡或光卡、专用集成电路(ASIC)或者适于存储电子指令的任何类型的介质，并且每个介质可以被耦合到计算机系统总线。此外，说明书中所提到的计算机可以包括单个处理器，或者可以是为增加计算能力而采用多个处理器的架构。
本文所提出的过程和显示并非固有地与任何具体计算机或其他装置相关。各种通用系统也可以与根据本文中的教导的程序一起使用，或者事实可以证明，构造更专用的装置来执行一个或多个方法步骤是方便的。以下描述中讨论了用于各种这些系统的结构。另外，可以使用足以实现本申请公开的技术和实施方案的任何具体编程语言。如本文所讨论的，各种编程语言可以被用于实施本公开。
另外，本说明书中所使用的语言主要是出于可读性和指导性的目的而选择的，并非为了描绘或限制所公开的主题。因此，本申请公开旨在说明而非限制本文所讨论的概念的范围。

Claims (14)

  1. 一种视频剪辑方法,应用于电子设备,其特征在于,包括:
    利用第一剪辑模型对待剪辑素材进行第一剪辑,得到第一剪辑视频;
    对所述第一剪辑视频进行评估得到第一反馈值;
    根据所述第一反馈值确定第二剪辑模型,并利用第二剪辑模型对所述待剪辑素材进行第二剪辑,得到第二剪辑视频;
    对第二剪辑视频进行评估得到第二反馈值,其中,所述第二反馈值高于所述第一反馈值;
    将所述第二剪辑视频作为所述待剪辑素材的输出剪辑视频。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述第一反馈值确定第二剪辑模型,具体为:根据所述第一反馈值,将所述第一剪辑模型的参数调整至所述第二剪辑模型的参数。
  3. 根据权利要求2所述的方法,其特征在于,所述对所述第一剪辑视频进行评估得到第一反馈值,包括:
    通过评价模型对所述第一剪辑视频进行评估以确定所述第一反馈值。
  4. 根据权利要求3所述的方法,其特征在于,所述评价模型包括对应于多种剪辑风格的评分规则。
  5. 根据权利要求4所述的方法,其特征在于,所述通过评价模型对所述第一剪辑视频进行评估以确定所述第一反馈值,包括:
    响应于用户所选择的剪辑风格,所述评价模型采用对应于用户所选剪辑风格的评分规则对所述第一剪辑视频进行评分以确定所述第一反馈值。
  6. 根据权利要求5所述的方法,其特征在于,所述对第二剪辑视频进行评估得到第二反馈值,包括:
    通过所述评价模型对所述第二剪辑视频进行评分以确定所述第二反馈值,其中
    所述评价模型采用对应于用户所选剪辑风格的评分规则对所述第二剪辑视频进行评分。
  7. 根据权利要求6所述的方法,其特征在于,所述第二反馈值高于所述第一反馈值,包括:
    所述评价模型对所述第二剪辑视频的评分高于所述评价模型对所述第一剪辑视频的评分。
  8. 根据权利要求4至7中任一项所述的方法,其特征在于,所述多种剪辑风格包括港风、童年、漫画、悬疑、中国风、可爱、唯美中的一种或多种。
  9. 根据权利要求1至8中任一项所述的方法,其特征在于,所述第一剪辑模型或所述第二剪辑模型,对待剪辑素材进行第一剪辑所采取的剪辑策略包括下列中的至少一项:
    对所述待剪辑视频执行的划分待剪辑视频片段的操作;
    对划分得到的各所述待剪辑视频片段确定剪辑动作的操作;
    以及对各所述待剪辑视频片段采用确定的剪辑动作进行剪辑操作。
  10. 根据权利要求9所述的方法,其特征在于,所述剪辑动作包括下列中的任一项或多项组合:
    添加转场特效;添加动感特效;使用滤镜;标记为精彩片段;进行镜头拼接;变速处理;添加背景音乐;添加特效音频;调节声音。
  11. 根据权利要求1至10中任一项所述的方法,其特征在于,所述待剪辑素材包括待剪辑的图片和/或视频。
  12. 一种电子设备,其特征在于,包括:一个或多个处理器;一个或多个存储器;所述一个或多个存储器存储有一个或多个程序,当所述一个或者多个程序被所述一个或多个处理器执行时,使得所述电子设备执行权利要求1至11中任一项所述的视频剪辑方法。
  13. 一种计算机可读存储介质,其特征在于,所述存储介质上存储有指令,所述指令在计算机上执行时使所述计算机执行权利要求1至11中任一项所述的视频剪辑方法。
  14. 一种计算机程序产品,其特征在于,包括计算机程序或指令;所述计算机程序或指令在计算机上被处理器执行时使所述计算机执行权利要求1至11中任一项所述的视频剪辑方法。
PCT/CN2022/114268 2021-08-31 2022-08-23 视频剪辑方法、电子设备及存储介质 WO2023030098A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111012801.6 2021-08-31
CN202111012801.6A CN115734032A (zh) 2021-08-31 2021-08-31 视频剪辑方法、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023030098A1 true WO2023030098A1 (zh) 2023-03-09

Family

ID=85291478

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114268 WO2023030098A1 (zh) 2021-08-31 2022-08-23 视频剪辑方法、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN115734032A (zh)
WO (1) WO2023030098A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190182565A1 (en) * 2017-12-13 2019-06-13 Playable Pty Ltd System and Method for Algorithmic Editing of Video Content
CN109002857A (zh) * 2018-07-23 2018-12-14 厦门大学 一种基于深度学习的视频风格变换与自动生成方法及系统
CN109819338A (zh) * 2019-02-22 2019-05-28 深圳岚锋创视网络科技有限公司 一种视频自动剪辑方法、装置及便携式终端
CN112770061A (zh) * 2020-12-16 2021-05-07 影石创新科技股份有限公司 视频剪辑方法、系统、电子设备及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117278802A (zh) * 2023-11-23 2023-12-22 湖南快乐阳光互动娱乐传媒有限公司 一种视频剪辑痕迹的比对方法及装置
CN117278802B (zh) * 2023-11-23 2024-02-13 湖南快乐阳光互动娱乐传媒有限公司 一种视频剪辑痕迹的比对方法及装置

Also Published As

Publication number Publication date
CN115734032A (zh) 2023-03-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22863237

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE