CN115052201A - Video editing method and electronic equipment - Google Patents

Video editing method and electronic equipment

Info

Publication number
CN115052201A
CN115052201A (application CN202210535943.9A)
Authority
CN
China
Prior art keywords
video
text labels
shot
segments
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210535943.9A
Other languages
Chinese (zh)
Inventor
蔡嘉琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210535943.9A priority Critical patent/CN115052201A/en
Publication of CN115052201A publication Critical patent/CN115052201A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8543Content authoring using a description language, e.g. Multimedia and Hypermedia information coding Expert Group [MHEG], eXtensible Markup Language [XML]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Studio Devices (AREA)

Abstract

The embodiments of the present application disclose a video clipping method and an electronic device, wherein the method comprises the following steps: receiving target video material submitted for a current video clip project; splitting a plurality of video segments from the target video material by using a preset algorithm, and generating text labels for the plurality of video segments respectively, wherein the text labels are used for describing the plot content characteristics of the video segments; generating at least one shot script frame according to the text labels corresponding to the plurality of video segments, wherein the shot script frame is formed by combining a plurality of text labels in a target sequence; and after a request for generating a new video is received, splicing the video segments corresponding to the text labels in the target sequence according to the shot script frame, so as to generate a new video for the video clip project. The embodiments of the present application can help editors improve the efficiency of the video mixing and cutting (mix-cut) process.

Description

Video editing method and electronic equipment
Technical Field
The present application relates to the field of video editing technologies, and in particular, to a video editing method and an electronic device.
Background
Video mixing and cutting (mix-cut) refers to clipping segments from a plurality of video materials and mixing them together, so that various scene shots are combined to express a certain story or meaning. Video mixing and cutting is widely applied in various scenarios. For example, in a commodity information service system, an advertisement video for a certain commodity or shop may need to be delivered to an external content service system; in this case, a plurality of video materials need to be collected and mix-cut to generate a short video, which is then delivered. Alternatively, a merchant may need to deliver short videos to a commodity detail page, a store home page, a recommendation information stream, and so on; in this case, video materials also need to be collected and then mix-cut.
Some video mix-cut tools exist in the prior art. However, they generally require an editor to watch the video material, cut video segments out of it, and then splice the cut segments together to generate a short video. Moreover, only one video segment can be captured at a time; if a plurality of video segments need to be captured from the same video material, the operations of opening the video material, selecting a clip region, confirming the capture, and so on must be executed repeatedly. The implementation of video mixing and cutting is therefore relatively cumbersome and requires a very large amount of work from the editor.
Therefore, how to help editors improve the efficiency of the video mixing and cutting process has become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The present application provides a video clipping method and an electronic device, which can help an editor improve the efficiency of the video mixing and cutting process.
The present application provides the following:
a video clipping method comprising:
receiving target video material submitted for a current video clip project;
splitting a plurality of video segments from the target video material by using a preset algorithm, and respectively generating text labels for the plurality of video segments, wherein the text labels are used for describing the plot content characteristics of the video segments;
generating at least one shot script frame according to the text labels corresponding to the video segments, wherein the shot script frame is formed by combining the text labels according to a target sequence;
and after a request for generating a new video is received, splicing the video segments corresponding to the text labels in the target sequence according to the shot script frame, so as to generate a new video for the video clip project.
Optionally, the method further comprises:
providing a video clip interface, wherein the video clip interface comprises a video track element; the video track element is generated according to a sequence composed, in the time dimension, of a plurality of image frames included in the target video material;
after the plurality of video segments are split, displaying the positions of the video segments on the video track element.
Optionally, the method further comprises:
displaying the text labels corresponding to the plurality of video segments on the video track element.
Optionally, the method further comprises:
adding different visual features, on the video track element, to the regions where the video segments corresponding to different text labels are located.
Optionally, the method further comprises:
responding, through the video track element, to an operation performed by the user for adjusting the start and end times of a video segment.
Optionally, the method further comprises:
after a target video segment on the video track element is selected, providing an operation option for modifying the corresponding text label, so that a text label modification operation can be performed on the currently selected target video segment.
Optionally, the method further comprises:
responding, through the video track element, to a video segment capture operation performed manually by the user.
Wherein the video clip interface further comprises: an area for displaying the split video segments in a classified manner according to the text labels included in the shot script frame.
The area further comprises operation options for setting the number of video segments selected under the same text label.
Wherein the generating at least one shot script frame according to the text labels corresponding to the plurality of video segments comprises:
performing matching judgment against a plurality of pre-generated shot script frame templates according to the text labels corresponding to the plurality of video segments, and generating the at least one shot script frame according to a successfully matched shot script frame template.
Optionally, the method further comprises:
judging, according to the plurality of text labels required to be included in the successfully matched shot script frame template, whether a video segment corresponding to a certain text label is missing from the plurality of currently split video segments;
and if so, outputting prompt information, wherein the prompt information is used for prompting the user to add target video material to the current video clip project so as to complete the missing video segment.
Wherein the video clip project is used for generating a video related to a target commodity;
the method further comprises the following steps: determining a target industry to which the target commodity belongs;
the generating at least one shot script frame according to the text labels corresponding to the plurality of video clips comprises:
and generating the at least one shot script frame according to the text labels corresponding to the plurality of video segments and the shot script frame generation rule corresponding to the target industry.
Wherein, if a plurality of the split video segments correspond to the same text label, generating a new video for the video clip project comprises:
performing cross-product combination on the video segments under the text labels according to the plurality of text labels included in the shot script frame, so as to generate a plurality of new videos for delivery to a target traffic field.
A video clipping device comprising:
a video material receiving unit for receiving a target video material submitted for a current video clip item;
a video segment splitting unit, used for splitting a plurality of video segments from the target video material by using a preset algorithm, and generating text labels for the plurality of video segments respectively, wherein the text labels are used for describing the plot content characteristics of the video segments;
a script frame generating unit, used for generating at least one shot script frame according to the text labels corresponding to the plurality of video segments, wherein the shot script frame is formed by combining a plurality of text labels in a target sequence;
and a video generating unit, used for splicing, after a request for generating a new video is received, the video segments corresponding to the text labels in the target sequence according to the shot script frame, so as to generate a new video for the video clip project.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the preceding claims.
An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any of the preceding claims.
According to the specific embodiments provided herein, the present application discloses the following technical effects:
according to the embodiment of the application, after a target video material submitted aiming at a current video clip project is received, a plurality of video segments can be split from the target video material by using a preset algorithm, text labels are respectively generated for the video segments, and the plot content characteristics of the video segments are described through the text labels. And then, according to the shot script frame, splicing the video segments corresponding to the text labels according to the target sequence to generate a new video for the video clip item. Through this kind of mode, because can follow the video material automatically in the split a plurality of video segments of going out, and generate corresponding text label, consequently can avoid the personnel of editing to intercept through the mode of manual operation's one by one of segment, can support once only to edit a plurality of material segments from whole section video material, realize that high-efficient convenient video segment is in batches edited. And through automatically generating a suitable shot script frame, the split material segments can be added into the specific script frame according to the corresponding text labels, so that a new video is automatically generated, and therefore, the efficiency of editing personnel can be improved as a whole.
In addition, after multiple video materials are submitted for the same video editing project, multiple video segments can be edited under the same text label, and therefore batch generation of multiple new videos can be supported in the modes of cross-multiplication combination and the like of the multiple video segments under different text labels, and therefore the requirement for video diversification can be met under the scenes of video delivery to a target flow field and the like.
Moreover, through the display of the video track material on the video editing interface, the split video clips can be displayed in the time dimension, and the differences of the video clips corresponding to different text labels are displayed by using visualization means such as color matching, transparency and morphology, so that the method is easy to understand and recognize, and high in user friendliness. In addition, the video segments split from each video material can be displayed in a clustering manner in the text label dimension according to the text labels included in the specifically generated script frame, so that a user can visually see which video segments are split under various text labels respectively, the amount of new videos which can be generated in anticipation, the duration of a single video and the like can be displayed, the user can trigger the final video generation operation under the condition that whether the requirements are definitely met or not, if the requirements are not met, adjustment can be performed through the video clip interface, including adding new video materials to the project, or adjusting the amount of the video segments selected under each text label, and the like.
Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a schematic diagram of a process flow provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method provided by an embodiment of the present application;
FIGS. 3 to 9 are schematic diagrams of video clip related interfaces provided by embodiments of the present application;
FIG. 10 is a schematic view of an apparatus provided by an embodiment of the present application;
fig. 11 is a schematic diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived from the embodiments given herein by a person of ordinary skill in the art are intended to be within the scope of the present disclosure.
In the embodiments of the present application, a solution is provided to help editors improve the efficiency of video mixing and cutting. In this solution, an editor may be provided with a video mix-cut tool, through which video mix-cut projects can be created and specific video material designated. In an optional embodiment, for commodity-related video clip requirements, the industry to which the specific commodity belongs may also be specified, and so on. The video mix-cut tool can then automatically split a plurality of video segments from the video material by using a pre-trained algorithm (which may be trained separately for different industries), and can generate text labels for the video segments respectively to represent the plot content characteristics of the video segments. For example, in a commodity-related video mix-cut application, specific text labels may include selling point introduction, appearance design, brand display, product trial, purchase guidance, and so on. Of course, these text labels may also be related to a specific industry, being descriptors commonly used and understood by merchants, consumers, and the like in that industry. After the video segments are split and the corresponding text labels are determined, a shot script frame can be generated according to the text labels corresponding to the split video segments; the script frame may be determined from a preset script template, or generated dynamically according to script generation rules. After a specific script frame is generated, the video segments corresponding to the text labels involved in the script can be filled into the script frame, so that a corresponding short video is generated for the current video mix-cut project.
That is to say, the embodiments of the present application can predict which video segments may need to be captured, automatically determine a script frame, and generate a short video from the split video segments and their corresponding text labels. This process can be realized entirely in the background: after the user uploads the video material, the system returns the generated short video, and the user does not need to intervene in the intermediate steps. In practical applications, however, the result predicted by the algorithm may be somewhat inaccurate; for example, a segment may have the tail of a word cut off, the text label of a video segment may not be accurate enough, or the user may need to capture additional video segments, and so on. Therefore, in the embodiments of the present application, a visualized video mix-cut interface may also be provided for the user. In this interface, video track elements may be provided, on which the video segments identified by the algorithm are shown together with their positions on the time axis (including start and end times, etc.). Specific text labels may also be presented on the video track, and different visual features can be added to the regions where different video segments are located according to their text labels, for example different transparencies, colors, and the like. In addition, the start and end positions of each video segment can be adjusted: if the user finds that the start and end times of a video segment are not accurate enough, they can be corrected manually. A new video segment may also be captured manually on the video track, and so on. Moreover, the text labels in the script can be provided in the interface, together with an area for displaying the split video segments by category, so that the user can see more intuitively which video segments have been split under each text label. In this way, the video segments split by the algorithm, the added text labels, the generated shot scripts, and the like can serve as references in the mix-cut process, and the user can complete the video mix-cut processing more quickly and efficiently based on this reference information.
From the perspective of system architecture, the embodiments of the present application may provide the mix-cut function for users with video mix-cut requirements. Specifically, the function may exist in the form of an application (a stand-alone application, or a functional module, applet, or the like inside another application), or in the form of a Web page, H5 page, and so on, in which case the user may create a specific video clip project in a page accessed through a browser. Then, as shown in FIG. 1, the user may submit video material and may also select an industry. The system can then perform automatic video segment splitting and generate text labels (which may also be called event labels), obtaining a plurality of video segments; after the text labels are associated with the segments, a script can be generated and a final video produced. During this process, the user may also make manual adjustments and other related operations, and the finally generated short video can be downloaded locally or published directly to an associated information publishing system, and so on.
The following describes in detail specific implementations provided in embodiments of the present application.
First, an embodiment of the present application provides a video clipping method, and referring to fig. 2, the method may specifically include:
s201: target video material submitted for the current video clip item is received.
In a video mix-cut scenario, a plurality of video segments need to be captured from a plurality of video materials and then spliced into a short video. In a specific implementation, clipping can therefore be processed in units of video clip projects. That is, when a user needs to perform video mixing and cutting, a video clip project may be created through the interface provided in the embodiments of the present application; accordingly, the user may submit specific video material for the created project, and the clipping tool may store data such as related material and splitting results in units of projects.
The target video material may be submitted by the user, and the specific source of the material is not limited. For example, it may be content previously saved in a content library of the video clip system, content uploaded locally from the user's terminal device, or the like.
It should be noted here that the embodiments of the present application do not limit the specific application scenario; a commodity-related video clip scenario is used here as an example. In commodity-related video clip scenarios, in order to split video segments, generate text labels, and subsequently generate scripts more accurately, processing can be performed on a per-industry basis. For example, algorithm models may be trained separately by industry in advance, and script templates or script generation rules may also be configured by industry. In this case, after the user creates a specific video clip project, in addition to submitting specific video material, operation options for industry selection may be provided. For example, as shown in FIG. 3, operation options for uploading video material may be provided in the interface, together with operation options for selecting an industry, which may include, for example, make-up, personal care, mother and baby, and so on. Furthermore, the aspect ratio, the duration, and the like of the short video to be generated may also be set.
S202: and splitting a plurality of video segments from the target video material by using a preset algorithm, and respectively generating text labels for the plurality of video segments, wherein the text labels are used for describing the plot content characteristics of the video segments.
After the video material is uploaded, video segments can be split from it and text labels generated for them. The specific splitting and text label generation can be performed by a pre-trained algorithm model. In a specific implementation, since merchants or consumers in different industries may focus on different information, the algorithm model may be trained separately for each industry; that is, each industry may have its own algorithm model. For example, when training for a certain industry, a large number of video segments related to that industry can be collected and labeled with text labels. The algorithm model performs feature extraction on the training samples (including image features, voice features, and the like) and outputs the probabilities of belonging to each text label; the model parameters are then updated by comparison against the labeled text labels, and the next iteration is performed. After the algorithm converges, the corresponding parameters can be saved to obtain the specific algorithm model. The detailed training process is not described here.
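By way of illustration only (the application does not prescribe a concrete model architecture or training framework), the per-industry training loop described above might be sketched in Python with PyTorch as follows; the feature dimension, the label set, and the dummy training data are all assumptions:

```python
# A minimal sketch, not the application's actual model. Assumes pre-extracted
# per-segment features (e.g. pooled image + audio embeddings) of fixed size.
import torch
import torch.nn as nn

TEXT_LABELS = ["selling_point", "appearance_design", "brand_display",
               "product_trial", "purchase_guidance"]  # hypothetical label set

class SegmentLabelClassifier(nn.Module):
    def __init__(self, feature_dim: int = 512, num_labels: int = len(TEXT_LABELS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(),
            nn.Linear(256, num_labels),  # logits over the text labels
        )

    def forward(self, x):
        return self.net(x)

def train_industry_model(features, labels, epochs=10, lr=1e-3):
    """Train one model per industry on (feature, label-index) pairs."""
    model = SegmentLabelClassifier(feature_dim=features.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()  # update parameters by comparing against labeled text labels
        opt.step()
    return model

# Dummy data standing in for extracted segment features of one industry.
feats = torch.randn(64, 512)
labs = torch.randint(0, len(TEXT_LABELS), (64,))
model = train_industry_model(feats, labs)
```

In this sketch the cross-entropy loss plays the role of comparing the output probabilities with the labeled text labels, and one such model would be trained and saved per industry.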
After the specific algorithm model is obtained, it can be used to recognize a video material, split a plurality of video segments from it, and generate the corresponding text labels. A text label is text that describes the plot content characteristics of the specific video segment, and may include, for example, selling point introduction, appearance design, product trial, and so on.
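The application likewise does not fix a concrete splitting algorithm. As one hedged sketch, candidate segments could be produced by a naive frame-difference shot detector, after which each segment is labeled by a classifier such as the one above; the difference threshold and the VideoSegment structure are assumptions:

```python
# A minimal sketch of splitting video material into candidate segments with
# a naive frame-difference shot detector; the application's actual splitting
# algorithm is not specified, and the threshold here is an assumption.
from dataclasses import dataclass
from typing import Optional

import cv2  # pip install opencv-python

@dataclass
class VideoSegment:
    start_sec: float
    end_sec: float
    text_label: Optional[str] = None  # filled in later by the label classifier

def split_segments(path: str, diff_threshold: float = 30.0) -> list[VideoSegment]:
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    prev, boundaries, idx = None, [0], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None and cv2.absdiff(gray, prev).mean() > diff_threshold:
            boundaries.append(idx)  # large inter-frame change => shot boundary
        prev, idx = gray, idx + 1
    cap.release()
    boundaries.append(idx)
    return [VideoSegment(b / fps, e / fps)
            for b, e in zip(boundaries, boundaries[1:]) if e > b]
```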
As mentioned earlier, in a preferred embodiment, a video track element may be provided in the video clip interface. The video track is generated from the sequence that the image frames of the target video material compose in the time dimension, and the video segment splitting result of the algorithm and the corresponding text labels can be presented on it. For example, in FIG. 4, 41 denotes the video track element shown for the currently selected video material, and 42 shows a plurality of video segments split by the algorithm displayed on the video track. As can be seen, the start and end time information of each video segment can be read intuitively from the video track. In addition, as shown at 42 in FIG. 4, when each video segment is displayed on the video track element, the text label recognition result corresponding to it may also be displayed; as can be seen from the figure, these may include selling point introduction, purchase guidance, product trial, and so on.
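As a small illustrative sketch of how such a track element might be backed by data (the application only describes the track as a sequence of image frames in the time dimension), evenly spaced thumbnail frames can be sampled from the material; the tile size and the sampling step are assumptions:

```python
# Sketch: sample evenly spaced frames from the material to render the
# video track element's thumbnail strip. Sizes and step are assumptions.
import cv2

def track_thumbnails(path: str, every_sec: float = 1.0):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(fps * every_sec))
    thumbs, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            thumbs.append(cv2.resize(frame, (80, 45)))  # small tile for the track
        idx += 1
    cap.release()
    return thumbs  # rendered left-to-right as the track's time axis
```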
In addition, in order to help the user distinguish different types of video segments, different visual features may be added on the video track element to the regions where the video segments corresponding to different text labels are located. For example, a mask element with a certain transparency may be added over the region where each video segment is located, with a color set per text label, so that video segments corresponding to the same text label are displayed with the same visual features, video segments of different text labels are displayed with different visual features, and so on.
Some errors may exist in the recognition result of the algorithm; for example, the end position of a video segment may fall just before the tail tone of a spoken word has finished, and so on. Therefore, the video track element can also respond to operations performed by the user for adjusting the start and end times of a video segment. In a specific implementation, the user can select a specific video segment on the video track and play it, and if the start and end time points are found to be not accurate enough, they can be corrected manually. For example, as shown at 51 in FIG. 5, for the selected video segment, the left and right sides of the segment may be shown in a draggable state, and the start and end points of the video segment can be adjusted by dragging them left and right.
In addition, algorithm errors may also appear in the text label recognition of a video segment. Therefore, after a target video segment on the video track element is selected, an operation option for modifying the corresponding text label may also be provided, so that a text label modification operation can be performed on the currently selected target video segment.
Furthermore, the video track element can also respond to video segment capture operations performed manually by the user. That is, besides the automatic video segment splitting performed by the algorithm, manual splitting of video segments by the user can also be supported. For example, as shown at 43 in FIG. 4, an operation option for adding a video segment may be provided in a free area of the video track; after it is clicked, as shown in FIG. 6, operation controls that can be dragged left and right to determine the start and end positions of the video segment may be provided at the corresponding position. In addition, the user may add a text label to the manually captured video segment, and so on.
It should be noted that when a plurality of video materials are uploaded in the same video clip project, splitting of video segments, generation of text labels, and other processing can be performed on each video material. In addition, more video material can be added to the current video clip project through an "upload material" option in the interface, so that more video segments can be split.
S203: and generating at least one shot script frame according to the text labels corresponding to the video segments, wherein the shot script frame is formed by combining the text labels according to a target sequence.
After a plurality of video segments are split from the video material and their corresponding text labels determined, at least one shot script frame can be generated according to the specific text labels recognized. In the embodiments of the present application, since the video is generated by clipping, the script only needs to define which text labels are required and in what order they are arranged. For example, a shot script may consist of: selling point introduction + appearance design + product trial + texture/ingredients + purchase guidance. That is, the short video generated according to this script is formed by splicing video segments corresponding to these text labels in the corresponding order.
For example, in one mode, some shot script templates may be provided in advance, each template defining which text labels the script consists of. After a plurality of video segments are split from the video material associated with the current video clip project and the corresponding text labels are determined, matching judgment can be performed against the plurality of shot script frame templates, the at least one shot script frame is generated according to the successfully matched templates, and a corresponding script title is taken from the title of the matched template. For example, when judging whether a template matches, if all of the text labels included in the template exist in the splitting and recognition results of the current video clip project, the template may be considered successfully matched. Specifically, if the text labels in a template form a set A and the text labels recognized in the video material of the current video clip project form a set B, the template is hit if every element of A appears in B. Of course, the same video clip project may hit multiple templates, and thus multiple script frames may be generated. When a script frame is generated from a template, the name of the generated script can also be determined from the template name; specific script names may include "all-round grass planting", "selling point grass planting", "experience grass planting", and so on.
Alternatively, if no template is hit completely, but most of the text labels in a certain template appear in the current video clip project and only a few do not, the template may also be considered successfully matched. In this case, the user may be prompted that a video segment corresponding to a certain text label is missing, and that more video material can be uploaded to complete the corresponding video segment for a better result.
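A minimal sketch of this template matching, assuming hypothetical template contents and a near-miss tolerance of at most one missing label (neither is fixed by the application):

```python
# Sketch of the set-containment template matching described above; template
# contents and the near-miss tolerance are assumptions, not fixed values.
SCRIPT_TEMPLATES = {
    "all-round grass planting": ["selling_point", "appearance_design",
                                 "product_trial", "purchase_guidance"],
    "selling point grass planting": ["selling_point", "purchase_guidance"],
    "experience grass planting": ["product_trial", "purchase_guidance"],
}

def match_templates(recognized_labels: set[str], max_missing: int = 1):
    """Return (title, label order, missing labels) for each hit template."""
    hits = []
    for title, labels in SCRIPT_TEMPLATES.items():
        missing = [lab for lab in labels if lab not in recognized_labels]
        if len(missing) <= max_missing:  # full hit, or near miss to prompt about
            hits.append((title, labels, missing))
    return hits

for title, order, missing in match_templates({"selling_point", "product_trial",
                                              "purchase_guidance"}):
    if missing:
        print(f"{title}: please upload material covering {missing}")
    else:
        print(f"{title}: script frame = {' + '.join(order)}")
```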
It should be noted that the script templates may also be configured per industry; that is, each industry may be configured with its own set of script templates, so that a specific shot script frame can be generated for the current video clip project using the templates of the relevant industry.
Alternatively, in another implementation, corresponding script generation rules may be provided in advance per industry, so that the at least one shot script frame can be generated according to the text labels corresponding to the split video segments and the shot script frame generation rules of the target industry, with the script title generated dynamically. There may be various concrete forms of script generation rules. For example, in one mode, the short video to be generated may be divided into three parts: a head, a body, and a tail; the text labels suitable for each part can then be set according to the characteristics of each industry. For the cosmetics industry, for example, a segment of the "selling point introduction" category suits the head, segments of the "appearance design" and "product trial" categories suit the body, and a segment related to "purchase guidance" suits the tail, and so on. In this way, a script frame can be dynamically generated by selecting from the text labels recognized in the current video clip project according to the script generation rules, as sketched below.
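A hedged sketch of such rule-based generation, with a hypothetical rule table for the cosmetics industry:

```python
# Sketch of dynamic script-frame generation from per-industry rules; the
# rule table below is a hypothetical example, not the application's rules.
INDUSTRY_RULES = {
    "cosmetics": {
        "head": ["selling_point"],
        "body": ["appearance_design", "product_trial"],
        "tail": ["purchase_guidance"],
    },
}

def generate_script_frame(industry: str, recognized_labels: set[str]) -> list[str]:
    """Pick, for each part, the rule labels actually present in the project."""
    frame = []
    for part in ("head", "body", "tail"):
        frame += [lab for lab in INDUSTRY_RULES[industry][part]
                  if lab in recognized_labels]
    return frame  # ordered label list = the shot script frame

print(generate_script_frame("cosmetics",
                            {"selling_point", "product_trial", "purchase_guidance"}))
# -> ['selling_point', 'product_trial', 'purchase_guidance']
```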
After the script frames are generated, an operation option for displaying the split video segments by script can be provided in the video clip interface. After a request is initiated through this option, an area can be provided in the interface for displaying the split video segments classified by the text labels included in the shot script frame. For example, as shown at 71 in FIG. 7, assuming four script frames are generated with the titles "all-round grass planting", "selling point grass planting", "experience grass planting 1", and "experience grass planting 2", the four script frames can be displayed in this area with switching between them. When a script frame is selected, the text labels it includes can be displayed, together with the video segments determined under each text label. That is, on the video track element the video segments are arranged in time order, whereas in the area denoted 71 they are displayed by text label: video segments corresponding to the same text label are aggregated together, and the text labels are arranged in their order in the script.
In addition, as described above, since more than one video material may be uploaded in the same video clip project, more than one video segment may correspond to the same text label, while only some of them will be selected for a given text label in a single short video. In this case, configuration options may be provided while displaying the video segments under each text label, to configure the number of video segments selected under each label. For example, in the default state the head part may select one video segment by default, but if the user wants to strengthen the head content, several video segments may be selected instead through this option, and so on.
S204: after a request for generating a new video is received, splicing the video segments corresponding to the text labels in the target sequence according to the shot script frame, so as to generate a new video for the video clip project.
After the splitting of video segments and the generation of the script frame are completed, the specific video can be generated. For example, as shown in FIG. 7, a "generate video" operation option may be provided in the interface, through which the user can initiate a request to generate the video. Then, according to the shot script frame, the video segments corresponding to the text labels involved in the script are spliced in the target sequence, generating a new video for the video clip project.
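As a hedged illustration of the splicing step (the application does not name a media library), the following sketch uses the moviepy 1.x API; the segment_index mapping from text label to chosen (path, start, end) triples is an assumed input produced by the earlier steps:

```python
# A hedged sketch of splicing segments in script order; not the application's
# actual implementation. Uses the moviepy 1.x API (pip install moviepy).
from moviepy.editor import VideoFileClip, concatenate_videoclips

def splice_new_video(script_frame: list[str],
                     segment_index: dict[str, list[tuple[str, float, float]]],
                     out_path: str = "new_video.mp4"):
    clips = []
    for label in script_frame:                      # target sequence from the script
        path, start, end = segment_index[label][0]  # first chosen segment per label
        clips.append(VideoFileClip(path).subclip(start, end))
    concatenate_videoclips(clips).write_videofile(out_path)
```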
As described above, the same script frame may include a plurality of text labels, each text label may have a plurality of video segments, and only one or a few video segments are selected per label in a single video. In this case, a plurality of new videos can be generated by cross-product combination of the video segments under the text labels. Under the same script frame, many different new videos can thus be combined, and if multiple script frames are generated, an even larger number of new videos can be produced. In a specific implementation, as shown in FIG. 8, the number of new videos that can be combined can be indicated in the video clip interface, along with the duration of a single video, and so on. Through this information the user can judge whether the current splitting result meets the requirements. If not, adjustments can be made through the video clip interface, including adding new video material to the project or adjusting the number of video segments selected under each text label. If the requirements are met, generation of the new videos can be triggered through the "generate video" option.
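The cross-product combination itself is a direct Cartesian product over the per-label candidate lists, as the following sketch illustrates (file names are hypothetical):

```python
# Sketch of the cross-product combination: one candidate list per text label
# in the script frame yields len(a) * len(b) * ... distinct segment sequences.
from itertools import product

def cross_product_videos(script_frame: list[str],
                         segment_index: dict[str, list[str]]) -> list[list[str]]:
    candidate_lists = [segment_index[label] for label in script_frame]
    return [list(combo) for combo in product(*candidate_lists)]

combos = cross_product_videos(
    ["selling_point", "product_trial", "purchase_guidance"],
    {"selling_point": ["sp1.mp4", "sp2.mp4"],
     "product_trial": ["pt1.mp4"],
     "purchase_guidance": ["pg1.mp4", "pg2.mp4"]})
print(len(combos))  # 2 * 1 * 2 = 4 new videos can be combined
```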
For example, as shown in FIG. 9, the specific video generation results can be displayed: the new videos generated under each script frame can be shown separately, together with which video segment was selected under each text label for each new video. In addition, some videos can be shown preferentially among the new videos generated under each script frame (evaluated, for example, on aspects such as image quality and material integrity), and so on. In this way, the demand for video diversity when delivering to a target traffic field can be met. That is, the embodiments of the present application can help the editor produce large numbers of videos in batches with higher efficiency, which can then be delivered to a target traffic field.
It should be noted that, in a specific implementation, before the new video is generated, functions such as modifying the background color, removing the original audio of the video, adding background music, and adding subtitles and voice may also be provided in the video clip interface, as shown in FIG. 8. A preview function may also be provided: after the configuration is completed, the user can preview the effect, and after confirming that it is as expected, generate the specific new video through the "generate video" option.
In summary, according to the embodiments of the present application, after target video material submitted for a current video clip project is received, a plurality of video segments are split from the target video material by using a preset algorithm, text labels are generated for the plurality of video segments respectively, and the plot content characteristics of the video segments are described by the text labels. At least one shot script frame is then generated according to the text labels corresponding to the plurality of video segments, and after a request for generating a new video is received, the video segments corresponding to the text labels are spliced in the target sequence according to the shot script frame, generating a new video for the video clip project. In this way, because a plurality of video segments can be split from the video material automatically and corresponding text labels generated, the editor does not need to capture segments one by one through manual operation; a plurality of material segments can be clipped from a whole video material at once, realizing efficient and convenient batch clipping of video segments. And by automatically generating a suitable shot script frame, the split material segments can be added into the specific script frame according to their corresponding text labels, so that a new video is generated automatically. The efficiency of the editor can therefore be improved as a whole.
In addition, after a plurality of video materials are submitted for the same video clip project, a plurality of video segments may be clipped under the same text label, so batch generation of a plurality of new videos can be supported by means such as cross-product combination of the video segments under different text labels, which can meet the demand for video diversity in scenarios such as delivering videos to a target traffic field.
Moreover, by displaying video track elements on the video clip interface, the split video segments can be displayed in the time dimension, and the differences between video segments corresponding to different text labels can be shown with visual means such as color, transparency, and shape, which is easy to understand and recognize and highly user-friendly. In addition, the video segments split from each video material can be displayed in clusters in the text label dimension according to the text labels included in the specifically generated script frame, so that the user can see intuitively which video segments have been split under each text label. The expected number of new videos that can be generated, the duration of a single video, and the like can also be displayed, so that the user triggers the final video generation operation only after confirming whether the requirements are met; if they are not met, adjustments can be made through the video clip interface, including adding new video material to the project or adjusting the number of video segments selected under each text label.
It should be noted that the embodiments of the present application may involve the use of user data. In practical applications, user-specific personal data may be used in the solutions described herein within the scope permitted by applicable laws and regulations and subject to their requirements in the relevant country (for example, with the user's explicit consent, after informing the user, etc.).
Corresponding to the foregoing method embodiment, an embodiment of the present application further provides a video clipping apparatus. Referring to FIG. 10, the apparatus may include:
a video material receiving unit 1001 for receiving a target video material submitted for a current video clip item;
a video segment splitting unit 1002, configured to split a plurality of video segments from the target video material by using a preset algorithm, and generate text labels for the plurality of video segments, where the text labels are used to describe characteristics of the plot content of the video segments;
a script frame generating unit 1003, configured to generate at least one shot script frame according to the text labels corresponding to the multiple video segments, where the shot script frame is formed by combining multiple text labels according to a target sequence;
and a video generating unit 1004, configured to, after receiving a request for generating a new video, splice the video segments corresponding to the text labels in the target sequence according to the shot script frame, so as to generate a new video for the video clip project.
In a specific implementation, the apparatus may further include:
a video clip interface providing unit, used for providing a video clip interface, wherein the video clip interface comprises a video track element, and the video track element is generated according to a sequence composed, in the time dimension, of a plurality of image frames included in the target video material;
and a track display unit, used for displaying the positions of the video segments on the video track element after the plurality of video segments are split.
In addition, the apparatus may further include:
a text label display unit, configured to display the text labels corresponding to the multiple video clips on the video track element.
And the visual feature adding unit is used for adding different visual features to the areas where the video clips corresponding to the different text labels are located on the video track elements.
And the adjustment processing unit is used for responding to the operation of adjusting the start-stop time of the video clip executed by the user through the video track element.
And the label modifying unit is used for providing an operation option for modifying the corresponding text label after the target video clip on the video track element is selected so as to carry out text label modifying operation on the currently selected target video clip.
And the manual intercepting unit is used for responding to the video clip intercepting operation manually executed by the user through the video track element.
In addition, the video clip interface may further include: an area for displaying the split video segments in a classified manner according to the text labels included in the shot script frame.
Specifically, the area may further include operation options for setting the number of video segments selected under the same text label.
The script frame generating unit may be specifically configured to:
and matching and judging with a plurality of pre-generated sub-lens script frame templates according to the text labels corresponding to the plurality of video segments, and generating at least one sub-lens script frame according to the successfully matched sub-lens script frame template.
At this time, the apparatus may further include:
the judging unit is used for judging whether a video segment corresponding to a certain text label is missing in a plurality of currently split video segments according to a plurality of text labels required to be included in the shot script frame template which is successfully matched;
and if so, prompting information, wherein the prompting information is used for prompting a user to add a target video material to the current video clip item so as to complete the missing video clip.
Wherein the video clip project is used for generating a video related to a target commodity;
the apparatus may further include:
the industry determining unit is used for determining a target industry to which the target commodity belongs;
in this case, the script framework generating unit may be specifically configured to:
and generating the at least one shot script frame according to the text labels corresponding to the plurality of video segments and the shot script frame generation rule corresponding to the target industry.
In a specific implementation, the video generation unit may specifically be configured to:
and according to the plurality of text labels included in the shot script frame, performing cross-product combination on the video clips under the text labels to generate a plurality of new videos for delivering the new videos to a target flow field.
In addition, the present application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method described in any of the preceding method embodiments.
And an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any of the preceding method embodiments.
FIG. 11 illustrates an architecture of an electronic device, which may include, in particular, a processor 1110, a video display adapter 1111, a disk drive 1112, an input/output interface 1113, a network interface 1114, and a memory 1120. The processor 1110, the video display adapter 1111, the disk drive 1112, the input/output interface 1113, the network interface 1114, and the memory 1120 may be communicatively connected by a communication bus 1130.
The processor 1110 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solution provided by the present Application.
The memory 1120 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1120 may store an operating system 1121 for controlling the operation of the electronic device 1100 and a Basic Input Output System (BIOS) for controlling low-level operations of the electronic device 1100. In addition, a web browser 1123, a data storage management system 1124, a video clip processing system 1125, and the like may also be stored. The video clip processing system 1125 may be an application program that implements the operations of the foregoing steps in the embodiments of the present application. In general, when the technical solution provided by the present application is implemented by software or firmware, the relevant program code is stored in the memory 1120 and called for execution by the processor 1110.
The input/output interface 1113 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
Network interface 1114 is used to connect to a communications module (not shown) to enable the device to interact with other devices for communication. The communication module can realize communication in a wired mode (for example, USB, network cable, etc.), and can also realize communication in a wireless mode (for example, mobile network, WIFI, bluetooth, etc.).
Bus 1130 includes a path that transfers information between the various components of the device, such as processor 1110, video display adapter 1111, disk drive 1112, input/output interface 1113, network interface 1114, and memory 1120.
It should be noted that although the above-mentioned devices only show the processor 1110, the video display adapter 1111, the disk drive 1112, the input/output interface 1113, the network interface 1114, the memory 1120, the bus 1130 and so on, in the implementation process, the devices may also include other components necessary for normal operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The video editing method and electronic device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are intended only to help in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may, in light of the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the contents of this specification should not be construed as limiting the present application.

Claims (14)

1. A video clipping method, comprising:
receiving target video material submitted for a current video clip project;
splitting a plurality of video segments from the target video material by using a preset algorithm, and respectively generating text labels for the plurality of video segments, wherein the text labels are used for describing the plot content characteristics of the video segments;
generating at least one shot script frame according to the text labels corresponding to the video segments, wherein the shot script frame is formed by combining the text labels according to a target sequence;
and after a request for generating a new video is received, splicing, according to the shot script frame, the video segments corresponding to the text labels in the target sequence, and generating the new video for the video clip project.
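By way of illustration only, the flow of claim 1 can be read as the following minimal Python sketch. Here `detect_scene_boundaries` and `classify_segment` are hypothetical stand-ins for the unspecified "preset algorithm" and label generator, and the actual splicing of decoded frames is omitted:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    start: float       # start time within the target video material, seconds
    end: float         # end time, seconds
    label: str = ""    # text label describing the segment's plot content features


def detect_scene_boundaries(video_path: str) -> List[float]:
    # Stand-in for the "preset algorithm"; a real system might use
    # shot-boundary detection over frame differences.
    return [0.0, 4.2, 9.7, 15.0]


def classify_segment(video_path: str, start: float, end: float) -> str:
    # Stand-in label generator; a real system would run a video
    # classification model over the frames in [start, end).
    return "product-close-up"


def split_and_label(video_path: str) -> List[Segment]:
    # Split the material at detected boundaries, then label each segment.
    boundaries = detect_scene_boundaries(video_path)
    segments = [Segment(s, e) for s, e in zip(boundaries, boundaries[1:])]
    for seg in segments:
        seg.label = classify_segment(video_path, seg.start, seg.end)
    return segments


def splice_by_script_frame(segments: List[Segment],
                           script_frame: List[str]) -> List[Segment]:
    # The shot script frame is an ordered combination of text labels;
    # pick one segment per label and splice them in the target sequence.
    by_label: Dict[str, Segment] = {seg.label: seg for seg in segments}
    return [by_label[label] for label in script_frame if label in by_label]
```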
2. The method of claim 1, further comprising:
providing a video clip interface, wherein the video clip interface comprises a video track element, the video track element being generated from a sequence of the plurality of image frames included in the target video material arranged in the temporal dimension;
after the video segments are split, displaying positions of the video segments on the video track element.
3. The method of claim 2, further comprising:
displaying the text labels corresponding to the plurality of video segments on the video track element.
4. The method of claim 2, further comprising:
adding different visual features to the areas, on the video track element, in which the video segments corresponding to different text labels are located.
5. The method of claim 2, further comprising:
responding to an operation, performed by the user through the video track element, of adjusting the start and end times of a video segment.
6. The method of claim 2, further comprising:
after a target video segment on the video track element is selected, providing an operation option for modifying the corresponding text label, so that a text label modification operation can be performed on the currently selected target video segment.
7. The method of claim 2, further comprising:
responding to an operation, manually performed by the user through the video track element, of clipping out a video segment.
8. The method of claim 2,
the video clip interface further comprises: an area for displaying the split video segments in categories according to the text labels included in the shot script frame.
9. The method according to any one of claims 1 to 8,
generating at least one shot script frame according to the text labels corresponding to the plurality of video segments, comprising:
performing matching against a plurality of pre-generated shot script frame templates according to the text labels corresponding to the plurality of video segments, and generating the at least one shot script frame according to a successfully matched shot script frame template.
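One hedged reading of this matching step, assuming a template is simply an ordered list of required text labels (the claim does not fix the template format), is a coverage test such as:

```python
from typing import List, Optional, Set

# Hypothetical pre-generated shot script frame templates; each template is
# an ordered list of text labels (format assumed, not fixed by the claim).
TEMPLATES: List[List[str]] = [
    ["opening-hook", "product-close-up", "usage-demo", "call-to-action"],
    ["opening-hook", "usage-demo", "call-to-action"],
]


def match_template(available_labels: Set[str]) -> Optional[List[str]]:
    # Choose the template with the greatest overlap with the labels of the
    # split segments; a successful match becomes the shot script frame.
    best = max(TEMPLATES, key=lambda t: len(set(t) & available_labels))
    return best if set(best) & available_labels else None
```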
10. The method of claim 9, further comprising:
judging, according to a plurality of text labels required to be included in the successfully matched shot script frame template, whether a video segment corresponding to one of those text labels is missing from the plurality of currently split video segments;
and if so, outputting prompt information, wherein the prompt information is used for prompting the user to add target video material to the current video clip project so as to supply the missing video segment.
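Continuing the sketch above, the missing-segment check reduces to a set difference between the matched template's labels and the labels actually split out (again an illustrative reading, not the patent's implementation):

```python
from typing import List, Set


def find_missing_labels(template: List[str], available: Set[str]) -> Set[str]:
    # Labels the matched shot script frame template requires but for which
    # no video segment has been split out yet.
    return set(template) - available


missing = find_missing_labels(
    ["opening-hook", "usage-demo", "call-to-action"],
    {"opening-hook", "call-to-action"},
)
if missing:
    # The prompt information asks the user to add target video material
    # covering the missing segments.
    print("Please add material covering:", ", ".join(sorted(missing)))
```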
11. The method according to any one of claims 1 to 8,
the video clip project is used for generating a video related to a target commodity;
the method further comprises: determining a target industry to which the target commodity belongs;
generating at least one shot script frame according to the text labels corresponding to the plurality of video segments, comprising:
generating the at least one shot script frame according to the text labels corresponding to the plurality of video segments and a shot script frame generation rule corresponding to the target industry.
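As one possible embodiment of an industry-specific generation rule (the industry names and label orders below are assumptions for illustration only), the rule could prescribe the label order of the resulting shot script frame:

```python
from typing import Dict, List, Optional, Set

# Hypothetical per-industry generation rules: each industry prescribes a
# preferred label order for its shot script frames.
INDUSTRY_RULES: Dict[str, List[str]] = {
    "apparel":    ["model-display", "fabric-close-up", "call-to-action"],
    "appliances": ["pain-point", "usage-demo", "spec-highlight", "call-to-action"],
}


def frame_for_industry(industry: str,
                       available_labels: Set[str]) -> Optional[List[str]]:
    rule = INDUSTRY_RULES.get(industry)
    if rule is None:
        return None
    # Keep the rule's label order, restricted to labels actually available.
    return [label for label in rule if label in available_labels]
```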
12. The method according to any one of claims 1 to 8,
if, among the split video segments, a plurality of video segments correspond to the same text label, generating a new video for the video clip project comprises:
performing, according to the plurality of text labels included in the shot script frame, a cross-product combination of the video segments under each text label, so as to generate a plurality of new videos for delivery to a target traffic field.
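The cross-product combination of claim 12 maps naturally onto a Cartesian product over the per-label candidate lists, as in this sketch (the file names are placeholders):

```python
from itertools import product

# Several split segments may carry the same text label; one new video is
# produced per element of the Cartesian product across the labels of the
# shot script frame.
candidates = {
    "opening-hook":   ["hook_a.mp4", "hook_b.mp4"],
    "usage-demo":     ["demo_a.mp4"],
    "call-to-action": ["cta_a.mp4", "cta_b.mp4"],
}
script_frame = ["opening-hook", "usage-demo", "call-to-action"]

new_videos = [list(combo)
              for combo in product(*(candidates[label] for label in script_frame))]
# 2 * 1 * 2 = 4 ordered segment lists, each of which would be spliced into
# one new video for delivery.
```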
13. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method of any one of claims 1 to 12.
14. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any of claims 1 to 12.
CN202210535943.9A 2022-05-17 2022-05-17 Video editing method and electronic equipment Pending CN115052201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210535943.9A CN115052201A (en) 2022-05-17 2022-05-17 Video editing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210535943.9A CN115052201A (en) 2022-05-17 2022-05-17 Video editing method and electronic equipment

Publications (1)

Publication Number Publication Date
CN115052201A (en) 2022-09-13

Family

ID=83158579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210535943.9A Pending CN115052201A (en) 2022-05-17 2022-05-17 Video editing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN115052201A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866585A (en) * 2020-06-22 2020-10-30 北京美摄网络科技有限公司 Video processing method and device
CN112423023A (en) * 2020-12-09 2021-02-26 珠海九松科技有限公司 Intelligent automatic video mixed-cutting method
CN112632326A (en) * 2020-12-24 2021-04-09 北京风平科技有限公司 Video production method and device based on video script semantic recognition
CN112866796A (en) * 2020-12-31 2021-05-28 北京字跳网络技术有限公司 Video generation method and device, electronic equipment and storage medium
CN113836992A (en) * 2021-06-15 2021-12-24 腾讯科技(深圳)有限公司 Method for identifying label, method, device and equipment for training label identification model
CN113691854A (en) * 2021-07-20 2021-11-23 阿里巴巴达摩院(杭州)科技有限公司 Video creation method and device, electronic equipment and computer program product
CN113590247A (en) * 2021-07-21 2021-11-02 阿里巴巴达摩院(杭州)科技有限公司 Text creation method and computer program product
CN113852858A (en) * 2021-08-19 2021-12-28 阿里巴巴(中国)有限公司 Video processing method and electronic equipment
CN113891113A (en) * 2021-09-29 2022-01-04 阿里巴巴(中国)有限公司 Video clip synthesis method and electronic equipment
CN113641859A (en) * 2021-10-18 2021-11-12 阿里巴巴达摩院(杭州)科技有限公司 Script generation method, system, computer storage medium and computer program product
CN113660526A (en) * 2021-10-18 2021-11-16 阿里巴巴达摩院(杭州)科技有限公司 Script generation method, system, computer storage medium and computer program product
CN113691836A (en) * 2021-10-26 2021-11-23 阿里巴巴达摩院(杭州)科技有限公司 Video template generation method, video generation method and device and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024099370A1 (en) * 2022-11-08 2024-05-16 北京字跳网络技术有限公司 Video production method and apparatus, device and medium
CN117692676A (en) * 2023-12-08 2024-03-12 广东创意热店互联网科技有限公司 Video quick editing method based on artificial intelligence technology

Similar Documents

Publication Publication Date Title
US20200302179A1 (en) Method for labeling performance segment, video playing method, apparaus and system
CN115052201A (en) Video editing method and electronic equipment
CN107369462B (en) Electronic book voice playing method and device and terminal equipment
CN103513890B (en) A kind of exchange method based on picture, device and server
CN111787395B (en) Video generation method and device, electronic equipment and storage medium
CN111954020B (en) Live information processing method, device, equipment and computer readable storage medium
CN113641859B (en) Script generation method, system, computer storage medium and computer program product
CN111930994A (en) Video editing processing method and device, electronic equipment and storage medium
CN111385642A (en) Media information processing method, device, server, equipment and storage medium
KR101916874B1 (en) Apparatus, method for auto generating a title of video contents, and computer readable recording medium
CN112511854A (en) Live video highlight generation method, device, medium and equipment
CN113691854A (en) Video creation method and device, electronic equipment and computer program product
CN107181817B (en) A kind of wechat extension system Internet-based and method
CN113722535B (en) Method for generating book recommendation video, electronic device and computer storage medium
CN112104908A (en) Audio and video file playing method and device, computer equipment and readable storage medium
CN113824972A (en) Live video processing method, device and equipment and computer readable storage medium
WO2003075184A1 (en) Methods for constructing multimedia database and providing multimedia-search service and apparatus therefor
CN112004137A (en) Intelligent video creation method and device
US20140161423A1 (en) Message composition of media portions in association with image content
CN113660526B (en) Script generation method, system, computer storage medium and computer program product
CN111125384B (en) Multimedia answer generation method and device, terminal equipment and storage medium
CN113259708A (en) Method, computer device and medium for introducing commodities based on short video
CN113242464A (en) Video editing method and device
CN111063037A (en) Three-dimensional scene editing method and device
WO2022089427A1 (en) Video generation method and apparatus, and electronic device and computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination