CN113891113A

CN113891113A - Video clip synthesis method and electronic equipment

Info

Publication number: CN113891113A
Application number: CN202111152811.XA
Authority: CN
Inventors: 刘卓
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2022-01-04
Anticipated expiration: 2041-09-29
Also published as: CN113891113B

Abstract

The embodiment of the application discloses a video clip synthesis method and electronic equipment, wherein the method comprises the following steps: receiving material addition and editing operation of a user through a video editing and combining interface, and determining a video combining scheme, wherein the video combining scheme comprises content to be combined of a plurality of image frames in a video to be combined; determining the total duration of a video to be synthesized in the process of executing video synthesis according to the video synthesis request; determining a plurality of segment time lengths according to the total time length and the target segment quantity, and creating a plurality of segment synthesis tasks according to the plurality of segment time lengths; performing parallel processing on the plurality of segmented synthetic tasks through a multithreading technology; and performing splicing rendering on the segmented synthetic results respectively corresponding to the plurality of segmented synthetic tasks, and outputting a video synthetic result. By the method and the device, video synthesis processing efficiency can be improved.

Description

Video clip synthesis method and electronic equipment

Technical Field

The present application relates to the field of video composition technologies, and in particular, to a video clip composition method and an electronic device.

Background

The product publicity approaches of many merchants or enterprises are enriched due to the rise of industries such as network live broadcast and short video. For example, a video resource of several hours, which contains a lot of product explanation, factory introduction, etc., is live. The live broadcast has timeliness, so that the problem that the platform and the merchant need to solve is solved by how to utilize the existing live broadcast recorded broadcast video resources after the live broadcast is finished.

In the prior art, some systems can identify a starting point and an ending point of a live video for explaining a specific commodity, then can intercept a video clip according to the time point to serve as an explanation video of a corresponding commodity object, and put the explanation video into pages such as a detail page of the commodity for a consumer to check at any time.

Although this method can convert live video contents into commercial explanation videos, the quality of generated explanation videos may vary due to the presence of relatively low-quality or invalid contents in the live video contents. Therefore, some software developers provide a video editing and synthesizing tool for users, and through the tool, users can edit original videos, remove middle low-quality content, splice multiple sections of videos and pictures together, add materials such as subtitles and stickers, and the like, finally synthesize the videos into one section, and then the users can use the synthesized videos to launch the videos.

The video composition can improve the quality of the produced video through the clip splicing function, but because the composition of the picture content needs to be carried out in sequence according to frames in the whole video composition process, the composition time is not less than the time length of the actually produced video, and the video composition time is long. For example, if a video with a time length of 100 seconds needs to be synthesized, after the user finishes preparing, editing, and the like on the material, the time for subsequently generating the synthesized video will be greater than or equal to 100s, which causes a long wait for the user.

Therefore, how to improve the video synthesis processing efficiency becomes a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application provides a video editing and synthesizing method and electronic equipment, which can improve video synthesizing and processing efficiency.

The application provides the following scheme:

a video clip composition method, comprising:

receiving material addition and editing operation of a user through a video editing and combining interface, and determining a video combining scheme, wherein the video combining scheme comprises content to be combined of a plurality of image frames in a video to be combined;

determining the total duration of a video to be synthesized in the process of executing video synthesis according to the video synthesis request;

determining a plurality of segment time lengths according to the total time length and the target segment quantity, and creating a plurality of segment synthesis tasks according to the plurality of segment time lengths;

performing parallel processing on the plurality of segmented synthetic tasks through a multithreading technology;

and performing splicing rendering on the segmented synthetic results respectively corresponding to the plurality of segmented synthetic tasks, and outputting a video synthetic result.

Wherein the video clip composition interface is generated and presented based on browser technology, and is responsive to a user's material addition and clipping operation in the browser, and performs video composition in the browser.

The page code of the video clip composite page comprises an SDK (software development kit), wherein the SDK is used for providing a video clip function and a video composite function for the video clip composite page; the SDK is common to multiple developers.

Wherein, still include:

after receiving the synthesis request, creating an audio node based on a browser technology, wherein the audio node is used for periodically playing the target sound as a refresh mechanism depended on in the video synthesis process.

Wherein, the determining a plurality of segment durations according to the total duration and the number of segments comprises:

if the total duration is not divisible by the number of segments, processing segment boundaries by rounding the segment durations such that a sum of the segment durations is equal to the total duration.

Wherein the clipping operation comprises: and creating a plurality of material tracks, and editing the picture level and the start and end time of the added material through the material tracks so as to overlap and/or splice a plurality of materials in the time and/or space dimension.

Wherein, still include:

and in the process of responding to the material adding and clipping operations of the user, providing video picture preview content so as to perform video picture based on preview and visually clip the position of the material content in the picture.

A video clip composition method, comprising:

providing a Software Development Kit (SDK) based on a browser technology, an Application Programming Interface (API) of the SDK and a structure description protocol for a plurality of developers, wherein the SDK comprises the SDK for providing a video clip function and the SDK for a video synthesis function, so that the developers can develop a video clip synthesis page displayed based on the browser by using the API and the structure description protocol and write the SDK into a page code;

responding to the added materials and the clipping operation of the user in a browser through the SDK of the video clip function in the process of displaying the video clip composite page to the user;

and after receiving the video synthesis request, performing video synthesis processing in the browser through the SDK of the video synthesis function.

Wherein the SDK further comprises an SDK for providing a preview function;

the method further comprises the following steps:

and in the process of responding to the material adding and clipping operations of the user, providing video picture preview content through the SDK of the preview function so as to perform visual clipping based on the video picture.

Wherein, the video synthesis processing is performed in the browser through the SDK of the video synthesis function, and the processing includes:

determining a plurality of contents to be synthesized corresponding to the multi-frame images in the video to be synthesized respectively according to the material adding and editing operation of the user so as to record the video to be synthesized frame by frame;

when recording the current image frame, respectively converting a plurality of contents to be synthesized corresponding to the current image frame into visual image streams, and providing the visual image streams to the recorder unit for recording the current image frame.

determining the total duration of the video to be synthesized according to the added materials and the editing operation of the user;

determining a plurality of segment time lengths according to the total time length and the target segment number;

creating a plurality of segment synthesis tasks according to the segment durations;

performing parallel processing on the plurality of segmented synthetic tasks by a browser multithreading technology;

And the API corresponding to the SDK of the video synthesis function is associated with a segment quantity parameter so as to enable the developer to specify the target segment quantity.

A video clip composition apparatus comprising:

the video composition scheme determining unit is used for receiving material addition and editing operation of a user through a video editing and compositing interface and determining a video composition scheme, wherein the video composition scheme comprises contents to be composited of a plurality of image frames in a video to be composited;

the total duration determining unit is used for determining the total duration of the video to be synthesized in the process of executing video synthesis according to the video synthesis request;

the segmentation synthesis task creating unit is used for determining a plurality of segmentation time lengths according to the total time length and the target segmentation quantity and creating a plurality of segmentation synthesis tasks according to the plurality of segmentation time lengths;

the parallel processing unit is used for carrying out parallel processing on the plurality of segmented synthetic tasks through a multithreading technology;

and the splicing rendering unit is used for performing splicing rendering on the segmentation synthetic results respectively corresponding to the plurality of segmentation synthetic tasks and outputting the video synthetic result.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the preceding claims.

An electronic device, comprising:

one or more processors; and

a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any of the preceding claims.

According to the specific embodiments provided herein, the present application discloses the following technical effects:

according to the embodiment of the application, in the video synthesis task execution process driven by a video synthesis scheme (schema), a specific video synthesis task can be divided into a plurality of segmented synthesis tasks according to the total time length of videos to be synthesized and the number of target segments. In this way, the multiple segment synthesis tasks can be processed in parallel through the multithreading technology, and then the video synthesis results are output by performing splicing rendering on the segment synthesis results respectively corresponding to the multiple segment synthesis tasks. By the mode, multithreading sectional parallel synthesis can be carried out, so that the video synthesis efficiency can be improved, and the time required by video synthesis can be shortened.

The specific video composition scheme may be generated by a user through a video clip composition interface after material addition and clipping operations. The specific video clip composition interface can be a Web page generated and displayed based on browser technology, and can directly respond to material addition and clipping operations of a user in a browser, and perform video composition in the browser, so that the service cost of a developer is saved.

The embodiment of the application can also provide a clipping function SDK, a video synthesis function SDK and a structure description protocol based on browser technology for a specific developer. In this way, when the developer develops the video clip composition interface specifically, the developer can focus on designing the style, the front-end and back-end links, and the like of the specific video clip composition interface, hatching more product forms, and jointly building the web video clip ecology because the specific clipping function, the composition function, and the like can be realized by the uniform SDK.

Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;

FIG. 2 is a flow chart of a first method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a video clipping process provided by an embodiment of the present application;

fig. 4 is a schematic diagram of a video composition process provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a process of parallel composition and stitching rendering of segments according to an embodiment of the present application;

FIG. 6 is a flow chart of a second method provided by embodiments of the present application;

FIG. 7 is a schematic view of an apparatus provided by an embodiment of the present application;

fig. 8 is a schematic diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived from the embodiments given herein by a person of ordinary skill in the art are intended to be within the scope of the present disclosure.

In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, first, a brief description is given to a flow of video synthesis processing.

In the process of video composition processing, it is usually necessary to input specific materials (including videos, pictures, flower characters, and the like) by a user, and then, in a video composition tool, visual editing can be performed by performing a series of actions such as dragging, and a document can be added to a picture, or the position of the specific materials on a space and a time axis can be adjusted. Then, the video composition tool can know the content that the video that the user needs to compose specifically needs to be composed on each frame. Then, the user can click on operation options such as "synthesize" and enter a specific synthesizing flow. In the process of composition, the recording of the picture content needs to be performed frame by frame. Specifically, in the process of recording frame by frame, for a current image frame, a plurality of items of contents to be synthesized (which may include a certain frame of picture in an original video, and may also include a picture, a document, a picture-in-picture image, and the like displayed by being superimposed on a picture) that need to be displayed in the image frame may be determined according to a material addition and editing result of a user, then the plurality of items of contents to be synthesized may be converted into a visual picture stream (e.g., a Canvas stream), and a Canvas stream corresponding to each item of contents to be synthesized is delivered to a recorder for recording, so that a frame of image in a real video may be generated.

Certainly, in another mode, a specific video composition tool may also provide some templates to a user, where some set special effects, backgrounds, and the like may be included, and the user may complete editing of a video to be composited only by replacing the main video, the file, and other contents in the template. And then, the user can also trigger to enter a specific synthesis flow by clicking a synthesis option and the like, wherein the specific synthesis flow is basically the same as the process.

Just because each frame of image needs to be recorded serially frame by frame, in the prior art, the time required for a specific video synthesis process is longer, at least not shorter than the total time of the video to be synthesized. For example, if a video of 100 seconds needs to be synthesized, the time for specifically generating the synthesized video after the user completes the addition and clipping of the material is not less than 100 seconds. That is, the user experiences that after the material addition and editing operations are completed through a series of dragging and the like, a long time is required to wait for the final composite video to be generated.

In view of the above situation, the embodiment of the present application provides a multithread segmented parallel composition scheme, that is, after a user completes adding and editing a specific material, the total duration of a video to be synthesized may be determined, and then, a plurality of segmented composition tasks may be generated. For example, assuming that the total duration is 100 seconds, divided into four segments, each segment may perform a composition task of 25 seconds therein, where segment 1 corresponds to the 1 st to 25 th seconds, segment 2 corresponds to the 26 th to 50 th seconds, segment 3 corresponds to the 51 st to 75 th seconds, segment 4 corresponds to the 76 th to 100 th seconds, and so on. Each segmented synthesis task can be executed in parallel in a multi-thread mode, and then the respective synthesis results are spliced and rendered to obtain the final synthesized video. In this way, the time required for the composition process is the longest segment composition task plus the time required for the tiled rendering, which together are less than the time required for the serial recording.

The above multithreading segmented parallel composition scheme can be used in the existing video clip composition tool, or the inventor of the present application further finds in the implementation of the present application that: in the prior art, a specific video clip composition tool usually exists in the form of client software, and if a user needs to perform video composition, the client software needs to be downloaded locally. For larger merchandise information systems (e.g., e-commerce systems), the video clip composition tool may belong to a third party tool. Therefore, if a merchant user generates a requirement for synthesizing a video in the process of using the commodity object information system, the merchant user can only download and install the third-party tool, then use the tool to edit, synthesize and make the video, download the synthesized video to the local, and upload the video to the background of a specific developer of a specific commodity information system for release and the like. That is, the user needs to switch between the merchandise object information system and the third party video composition tool.

Therefore, if the commodity object information system can also provide the service related to video composition, convenience can be provided for the merchant. In this case, the multi-thread segmented parallel composition scheme may be applied to a video composition service provided in a specific commodity object information system.

Here, since the commodity object information system itself is mainly based on the commodity object service, the commodity object information system includes distribution of the commodity object, a transaction link, and the like. With respect to specific video composition related requirements, it is common to generate in the service link of a particular developer or developers associated with the merchandise object information system. For example, in a development and operation company of a commodity object information system, different product lines may be corresponded to, and services of different aspects and industries are provided for users; each product line may further include a plurality of different functional modules, and each functional module may correspond to a respective developer, for example, a developer providing a product distribution service, a developer providing an information recommendation service, and the like. Thus, in particular implementations, it may be determined by a particular developer whether to provide video composition services to users in their existing product links.

However, product links of different developers may be different, functions related to synthesizing video may appear in different nodes of different links, and developers may also need to open or merge specific video synthesizing functions with respective specific service links, or perform some personalized settings on specific interfaces to meet overall product tonality of specific developers, and so on. Therefore, it is not feasible if the same video composition application or service is multiplexed between different developers.

In order to achieve the above object, each developer may separately develop a video composition service and provide the video composition service to the user. However, although the design schemes of the developers at the application layer (interface design, front-end back-link design, etc.) are different, the developers involve core content related to video composition, and thus, redundant development between the developers may occur.

In order to reduce redundant Development, the embodiments of the present Application provide a set of general video clip protocols, and a clip and composition kernel with strong customizability, which may include an SDK (Software Development Kit) specifically configured to provide a video clip function and an SDK of a video composition function, and expose an Interface of the specific SDK, that is, an Application Programming Interface (API), to a developer. In this way, each developer can develop video composition services based on the above-described protocol and the API corresponding to the SDK. That is to say, a specific developer may only need to design a page style, a pre-configured postlink, and the like without paying attention to specific clipping and synthetic implementation logics, so that redundant development conditions among different developers are reduced, and data circulation and reusability are improved.

However, in practical applications, there may be the following problems: as described above, the specific merchandise object information system is not specific to the developer who provides the video composition service, and is specific to the product line or the function module in the system, and even if the video composition service is provided, the service may not be the main service content of the developer. Therefore, additional occupation of service resources of a specific developer may be involved, increasing the service cost of the developer. Even for some smaller departments, the available service resources may be more limited, or even no additional service resources are available for providing the video composition service, and so on.

Therefore, for the above situation, the SDK provided in the embodiment of the present application may be directly run in the terminal device of the user, so that the user may use the hardware device resource of the user to complete the specific editing and video synthesizing process. This can be done in the form of client software or also in browser technology. Since the client software involves the problem of downloading and installing, the latter is selected in the embodiment of the present application, that is, the corresponding SDK and the structure description protocol can be provided based on the browser technology. Therefore, each developer can access the set of SDK, package and customize the SDK at an upper application layer, and develop a specific video clip composite page and a related preposed postlink. Therefore, the specific edited and combined page can exist in the form of a Web page, and the developer can release the page link in various ways, for example, the page link can be released to a background with relatively large flow, so that users such as merchants and the like can directly access the link and use the set of functions in the page. In addition, the developer can also directly write the SDK into a specific page code, so that the video clip synthesis processing based on the page can be directly completed in a browser without occupying server resources, thereby avoiding occupying the service cost of the developer. In other words, a developer may provide video clip composition services to its users in a "0-cost" manner.

Of course, the SDK and the structure description protocol provided by the browser technology are not limited to be provided to developers in a product object information system, and may be opened to other developers.

In the above process of providing the SDK based on the browser technology and the related structure description protocol for the developer, specific SDKs may include an SDK for providing a clip function, an SDK for providing a video merge function, and the like. Since the browser also supports the multithreading technology, the above-mentioned multithreaded piecewise parallel composition scheme can be used to shorten the time required for the video composition process with respect to the SDK for providing the video composition function.

From the perspective of system architecture, referring to fig. 1, the embodiment of the present application may provide various SDKs, APIs, structure description protocols, and the like, and may provide the SDKs, APIs, structure description protocols, and the like to a plurality of developers, so that the developers may participate in the development process of a specific video clip development page to implement personalized design on aspects of a page style, a pre-configured back link, and the like. The implementation of the specific clipping functions and the implementation of the video composition can be realized by the SDK without repeated design by a specific developer. In addition, in the embodiment of the present application, a specific SDK may be developed based on a browser technology, so that a developer may develop a video editing and synthesizing tool in a web page form, and may launch a link of a specific web page through multiple channels, so that a user may enter a specific page to perform video editing and synthesizing processing by clicking the link. In the synthesizing process, a multithreading segmented parallel synthesizing scheme can be used to improve the specific video synthesizing processing efficiency.

The following describes in detail a specific technical solution provided in an embodiment of the present application.

Example one

First, the embodiment provides a video clip composition method from the perspective of multi-thread segmented parallel composition of video, and specifically, the execution subject of the method may be a separate video clip composition tool, or the development kit abstracted by the SDK manner described above. The method can be realized through client technology and can also be realized through browser technology.

Specifically, referring to fig. 2, the method may include:

s201: receiving material addition and editing operation of a user through a video editing and combining interface, and determining a video combining scheme, wherein the video combining scheme comprises content to be combined of a plurality of image frames in a video to be combined;

in a specific implementation, the video clip composition interface may specifically be an interface provided in a video composition tool existing in a client form, or, as described above, the video clip composition interface may also be a Web page generated based on a browser technology. In an optional manner, the specific Web page may also be developed and implemented based on the SDK and the structure description protocol provided in the embodiment of the present application.

The editing function SDK may be various, and for example, may include an SDK for adding material or material (video, picture, text, etc.), an SDK for adding special effects such as a flower, a cartoon, etc., an SDK for realizing immersive video (a fill mode in which an edge portion is cut off, etc.), an SDK for realizing video muting (e.g., removing original sound of a video), an SDK for adding music, an SDK for performing video editing (e.g., content of 3 rd to 5 th seconds in an original video material needs to be used, and then the SDK may be realized by capabilities provided by the SDK), and the like.

Each SDK may provide a specific API, so that the developer can implement a call to the corresponding SDK through the specific API in the developed page to implement the corresponding function.

Of course, the structure description protocol may specifically include a description protocol of a structure such as a basic material (a picture, a file, music, a video, a character, and the like), a video special effect (a special effect such as transition, filter, video filling, and the like), a video clip, a splice, and the like. In addition, the concept of material tracks can be introduced simultaneously, specific material tracks are used for receiving materials, different tracks are used for distinguishing picture levels, and time sequence is embodied on the same track. Therefore, the superposition and splicing rendering of various basic materials in time and space can be supported. In addition, the method can also support additional effects such as complex transition, filter, filling mode and the like, and realize functional video editing. Specifically, the method can realize the additional effect (multiple tracks and multiple materials), namely, the method supports multiple tracks, multiple materials in each track are edited, and effect linkage among single or multiple materials is supported.

When a specific developer develops a video clip composition page, various structures can be described by using the structure description protocol, and the SDK of the corresponding function can be called by using a specific API. In addition, it is also possible to directly write a specific clipping function SDK and video composition function SDK into a page code or the like to realize response to material addition and clipping operation by the user in the browser and to perform video composition in the browser.

In the development of the specific video clip composition page, the user can access the specific page and perform specific video clip composition operation. For example, as shown in fig. 3, when a user performs a clipping operation, the user may first upload a specific material such as a picture, a file, a video, and the like to a media asset library, or may use a media asset library, a special effects library, and the like provided by the system. And then, a specific material track can be created, a specific material is selected and dragged to the specific material track, correspondingly, a control bar corresponding to the material can be created by the specific SDK, and the appearance time point and the duration of the corresponding material on the corresponding picture level can be determined by the modes of dragging the position of the control bar on the material track, zooming the length of the control bar and the like. When the next material is added, if the next material needs to be displayed on the same picture level as the previous material, the next material can be directly dragged to the previously created material track, and the position, the length and the like of the next material on the track can be adjusted. Alternatively, a new material track may be created (different material tracks may correspond to the same timeline) so that the next item is presented at another screen level, may overlap with the previous item in time, and so on. Therefore, the design of a plurality of picture levels can be realized through the material track, the appearance time of different materials can be designed in the same picture level, and the like. Therefore, the overlapping design of a plurality of different materials in time and space dimensions is realized, and the display effect of the video is improved.

In addition, the transition special effects of a plurality of different video materials in the same picture level can be designed through the material track. For example, if two videos need to be played continuously in a certain picture level, there may be a partial overlap between the first video and the second video on the time axis, so that when the first video is about to end, the second video starts to be played to achieve the transition effect, and so on.

Furthermore, it should be noted that, in the specific implementation, an SDK for providing a preview function may be further provided, so that, specifically, in the process that a user creates a material track and adds a specific material to the material track, a visual preview video may also be provided (at this time, only the specific content to be synthesized may be converted into a visual picture stream, so that the visual picture stream can be continuously played, but no real video is generated yet), and the effect that each material presents in the current design state is exhibited. In the process of previewing, playing can be paused at any time, content in a specific picture can be dragged, the position of the content in the specific picture in the hierarchy of the specific picture can be changed, and the like. That is, only what contents are in the same screen level and the start time, end time, etc. of each appearance can be determined by the material track, but it cannot be determined where each part of the contents is shown in the screen in the same screen level. Therefore, the adjustment of the display position of the specific content in the screen can be realized by the way of visualizing the preview screen.

In summary, in the foregoing manner, since a plurality of material tracks can be created, the picture level and the start and end times of the added material can be edited through the material tracks, so that a plurality of materials can be overlapped and/or spliced in the time and/or space dimension. In this way, the specifically added material can have information of the picture level, the start time, the end time, the position in the picture, and the like. If the specific material itself also has continuity information, for example, a video-like material, the time information may also have two aspects of information, where on one hand, the start time and the playing time of the video on the specific track correspond to the time axis, and the time is used to determine in what time period the video is to be finally played; another aspect is which time period of content of a particular video is played. For example, a video has 10 seconds of content, and the 3 rd to 5 th seconds of content need to be played in the 10 th to 12 th seconds of the video to be synthesized. The above-mentioned "play is performed in the 10 th to 12 th second time period in the video to be synthesized", and "take the content of the current video in the 3 rd to 5 th second", etc. can be embodied in the description information of the video, and so on. When the specific synthesizing process is performed subsequently, the specific content to be synthesized in each frame can be determined according to the information.

The display style, the position, the style of the operation control and the like of the material track in the specific page can be customized and personalized by a specific developer according to respective requirements. The capability provided by the corresponding SDK according to the embodiment of the present application may be used only when specifically responding to the user's selection, dragging, clipping, and the like.

S202: determining the total duration of a video to be synthesized in the process of executing video synthesis according to the video synthesis request;

after the addition of various materials and the design of the time and the spatial position of the materials are completed, it is known how each frame of the video that the user wants to specifically compose is presented for the composition tool, that is, each frame of the video needs to be composed together with specific content, and so on. Then, the specific content to be synthesized can be converted into a visual picture stream frame by frame, and picture recording is performed through a recorder.

Of course, in the embodiment of the present application, in order to improve the efficiency of the synthesis processing, multi-thread segmented parallel synthesis may be performed. For this reason, in the implementation, after the user completes adding and designing the specific material, the total duration of the video to be synthesized may be determined first. In the specific implementation, in the process of performing the stacking design on the multiple materials in the space and time dimensions through the material tracks, the specific tracks also correspond to the time axis, so that the total time length can be determined according to the time length of the materials added on each track. Specifically, since the starting points of the time axes corresponding to the plurality of tracks are all the same, the time length corresponding to the track with the longest time can be determined as the total time length of the video to be synthesized.

S203: determining a plurality of segment time lengths according to the total time length and the target segment quantity, and creating a plurality of segment synthesis tasks according to the plurality of segment time lengths;

after the total time length of the video to be synthesized is determined, a plurality of segment time lengths can be determined according to the number of target segments, and then a plurality of segment synthesis tasks can be created according to the plurality of segment time lengths. Specifically, the number of target segments may be fixed, or may be set by a developer according to actual conditions. Of course, when the number of segments is set by the developer, the maximum number of segments to be supported may be set, and the developer may set the number of segments within an appropriate range. In specific implementation, the number of segments may be set in a specific page code, and the number of segments is used as a parameter to call a specific synthesized SDK, and so on.

Specifically, there may be multiple ways to determine the multiple segment durations according to the total duration and the number of target segments. For example, if the total duration can be divided exactly by the target number of segments, the division can be made directly so that each segment duration is equal. For example, if the total time length of a video to be synthesized is 100 seconds, and the target segment number is 4, the segment time lengths of the four segments are 25 seconds, respectively.

If the total duration is not exactly divisible by the target number of segments, frame losses at the boundaries between each segment may occur. To avoid this, the segment boundaries may be further processed by rounding the segment durations so that the sum of the segment durations is equal to the total duration. For example, assuming that the total duration of a certain video to be synthesized is 97 seconds, and the number of target segments is 3, the result obtained by directly dividing 97 by 3 is an infinite loop decimal: 32.333 … … are provided. For this case, the three segment durations may be set to 32, 32, 33, respectively, so that the addition result of the three segment durations is strictly equal to the total duration of the video to be synthesized, and so on.

After the plurality of segment durations are determined, a particular video composition task may be split into a plurality of segment composition tasks. For example, in the case where the aforementioned three segment durations are 32, 32, and 33, respectively, then three segment synthesis tasks may be generated, where task 1 is used to synthesize the segments from the 1 st second to the 32 nd second, task 2 is used to synthesize the segments from the 33 rd second to the 64 th second, task 3 is used to synthesize the segments from the 65 th second to the 97 th second, and so on.

S204: performing parallel processing on the plurality of segmented synthetic tasks through a multithreading technology;

after the plurality of segmented composition tasks are determined, the plurality of segmented composition tasks can be processed in parallel through a multithreading technology. That is, multiple threads may be created, each for processing one of the segmented composition tasks, and the multiple threads may be processed in parallel. For example, in a browser technology implementation-based approach, multiple threads may be created in a web-worker based approach for performing multiple segment composition tasks in parallel. The web-worker specifically can start a sub-thread for program processing on the basis of Javascript single-thread execution without influencing the execution of the main thread, and returns to the main thread after the sub-thread is executed, so that the execution process of the main thread is not influenced in the process.

When each thread processes a specific segmentation synthesis task, the content to be synthesized in each image frame can be respectively determined, then the content is converted into a visual image stream, and the visual image stream is sent to a recorder unit for recording so as to generate a specific video image frame.

In specific implementation, specific video synthesis task processing can be realized through the video synthesis function SDK provided in the embodiment of the present application. Specifically, in an implementation manner, the SDK may implement a set of schema (template, that is, a video design scheme generated after a user adds a material and performs a clipping operation) driven video player based on Canvas and the like, and the Canvas may be redrawn at each time. After the user completes the material adding and editing operation, each material can have time attributes such as start time, end time, start playing and the like, so that the player can determine whether the material is drawn or not according to the time attributes. As shown in fig. 4, for a material hitting the playing range at each moment, the image resource of the material may be obtained first, and then the image resource is converted into a visual picture stream, and the visual picture stream is drawn on a canvas. In the process of drawing, the drawing sequence can be determined according to the track hierarchical relation in the schema. For example, a low orbit is drawn first, then a high orbit is drawn, and the high orbit material naturally covers the low orbit material on the canvas.

In a specific implementation manner, based on the recording function of the MediaRecorder, the drawing of the Canvas animation is directly and dynamically output the picture stream according to frames, and the real video is recorded based on the picture stream, so that the obtained (real video) is obtained (preview screen). In addition, ffmpeg transcoding capability may be utilized to produce a standardized piece of video (e.g., in mp4 format, etc.).

The MediaRecorder is a set of API provided for developers to record audio or video, and at present, each end has its own implementation, a MediaRecorder instance can be directly initialized in a modern browser, and its core input is Stream, which, in combination with captureStream, can effectively record Canvas animation as a segment of real video, which is also a specific implementation scheme for converting a template into real video in a browser.

FFmpeg is a set of open source multimedia video processing tools that can be used to record, convert digital audio, video, and convert them into streams. FFmpeg has very powerful functions including video capture functions, video format conversion, video capture, watermarking of video, etc. In the embodiment of the application, the method is mainly used for audio and video decomposition, transcoding and video synthesis.

It should be noted that, in the process of specifically performing video composition, since the composition is performed frame by frame, in order to determine when to update the picture, a refresh mechanism is required. That is, the update of the next frame can be triggered by a periodically occurring event. That is, after the content to be synthesized in one image frame is converted into a visual image stream and delivered to the recorder to generate a specific video frame, the generation of the next video frame is performed when a trigger event of the next period needs to be waited for, so that the video image is updated.

In a video synthesis mode based on a browser technology, one implementation is that the updating of the picture can be performed directly depending on the refresh mechanism of the browser. That is, a browser refresh event may be monitored, and when a browser refresh event of the next period is monitored, the screen of the composite video may be updated. However, this approach may present the following problems: due to switching to other tab pages or minimizing the browser window, etc., the refresh mechanism of the browser is frozen, and at this time, the refresh mechanism cannot be provided for the video composition process. This results in that, during the process of executing specific video composition in the browser, the user is required to wait in the current page, and cannot switch to another tab page, or perform operations such as minimization on the browser.

To avoid this, the embodiments of the present application also provide an improved way. Specifically, in the embodiment of the present application, an audio node may be created based on a browser technology without depending on a refresh mechanism of a browser itself, and the audio node may play a target sound periodically, and in a preferred implementation, a volume gain of the target sound may also be set to 0, so as to avoid causing interference to a user. In this way, the audio node can be used as a refresh mechanism that is relied on in the video composition process. That is, an event of the periodic playing of the target sound may be monitored, a screen refresh operation may be performed when the target sound is played next time, and the like. Therefore, the audio node is not frozen due to page switching, browser minimization and the like, and the audio node can always play the target sound periodically as long as the audio node is not actively ended in a program or a user does not perform operations such as shutdown and the like, so that the user can freely perform operations such as page switching, browser minimization and the like in the process of synthesizing the video, and the user experience is improved.

In the concrete implementation, the web Audio Api is used for starting an Audio node, and a node with the volume of 0 is manufactured to simulate a hardware timer, so that the period of all pixel operations is accurately controlled, and the influence of the inactivated state of a browser is avoided. The web Audio Api has a very powerful function, and can be usually combined with hardware such as a microphone and the like to acquire real Audio in real time and process corresponding Audio nodes, including series of operations such as Audio special effects and clipping. The main reason for selecting this API in the embodiments of the present application is that, using this API, time can be controlled very precisely with little delay, so that developers can control time accurately.

In addition, for the extreme case that the video is played for about 10s in an inactivated state and the picture is possibly still (no hook can be transmitted to a developer), the video can be reloaded at regular time and synchronized to the latest playing progress recorded before, and the recorder is paused until the video is successfully loaded and the first frame of the video is acquired.

It should be noted that the above-mentioned multiple segment composition tasks executed in parallel are mainly composition of video picture contents, and in concrete implementation, since audio contents may also exist in the finally generated video and the audio contents are complete segments, the operations may be complete segments, and frame-by-frame rendering and composition are not required. Therefore, in specific implementation, the audio part can be extracted from the video synthesis scheme designed by the user for independent recording. Specifically, whether audio is added in the video description structure or not or whether the specific video material has sound or not can be detected, if yes, the audio can be separated, and overlapping recording is performed according to the hierarchy, the sequence and the like of the specific separated audio, so that an audio recording result is obtained. Specifically, the recording process may be performed in parallel with each segment synthesizing task.

S205: and performing splicing rendering on the segmented synthetic results respectively corresponding to the plurality of segmented synthetic tasks, and outputting a video synthetic result.

After the plurality of segment synthesis tasks are completed, as shown in fig. 5, segment synthesis results corresponding to the plurality of segment synthesis tasks may be subjected to stitching rendering, and if an audio recording result recorded separately exists, the audio recording result may be further synthesized to generate a final video synthesis result.

In summary, according to the embodiment of the present application, in a video synthesis task execution process driven by a video synthesis scheme (schema), a specific video synthesis task may be divided into a plurality of segment synthesis tasks according to a total duration of videos to be synthesized and a target segment number. In this way, the multiple segment synthesis tasks can be processed in parallel through the multithreading technology, and then the video synthesis results are output by performing splicing rendering on the segment synthesis results respectively corresponding to the multiple segment synthesis tasks. By the mode, multithreading sectional parallel synthesis can be carried out, so that the video synthesis efficiency can be improved, and the time required by video synthesis can be shortened.

Example two

The second embodiment provides a video clip composition method mainly from the perspective of capability (i.e. specific SDK, structure description protocol, etc.) provider, and referring to fig. 6, the method may include:

s601: providing a Software Development Kit (SDK) based on a browser technology, an Application Programming Interface (API) of the SDK and a structure description protocol for a plurality of developers, wherein the SDK comprises the SDK for providing a video clip function and the SDK for a video synthesis function, so that the developers can develop a video clip synthesis page displayed based on the browser by using the API and the structure description protocol and write the SDK into a page code;

s602: responding to the added materials and the clipping operation of the user in a browser through the SDK of the video clip function in the process of displaying the video clip composite page to the user;

s603: and after receiving the video synthesis request, performing video synthesis processing in the browser through the SDK of the video synthesis function.

Specifically, the SDK may further include an SDK for providing a preview function, and in this case, in the process of responding to the material addition and the clipping operation of the user, the video picture preview content may be provided through the SDK for providing the video picture based visual clipping.

Specifically, when video synthesis processing is performed in the browser, a plurality of contents to be synthesized corresponding to a plurality of frames of images in the video to be synthesized are determined according to the added materials and the clipping operation of the user, so that the video to be synthesized is recorded frame by frame; when recording the current image frame, respectively converting a plurality of contents to be synthesized corresponding to the current image frame into visual image streams, and providing the visual image streams to the recorder unit for recording the current image frame.

In addition, in the specific implementation, when the video synthesis capability provided by the specific SDK is provided, the multithreading segmented parallel synthesis scheme described in the first embodiment may be further adopted, so as to improve the synthesis efficiency. For example, specifically, the total duration of the video to be synthesized may be determined according to the material addition and the clipping operation of the user; then, determining a plurality of segment time lengths according to the total time length and the target segment quantity, and creating a plurality of segment synthesis tasks according to the plurality of segment time lengths; then, the multiple segmented synthetic tasks can be processed in parallel through a browser multithreading technology; and finally, performing splicing rendering on the segmented synthetic results respectively corresponding to the plurality of segmented synthetic tasks, and outputting a video synthetic result.

Specifically, the API corresponding to the SDK of the video synthesis function may be associated with a segment number parameter, so that the developer specifies the target segment number according to actual requirements.

For the parts not described in detail in the second embodiment, reference may be made to the description in the first embodiment, and details are not repeated here.

It should be noted that, in the embodiments of the present application, the user data may be used, and in practical applications, the user-specific personal data may be used in the scheme described herein within the scope permitted by the applicable law, under the condition of meeting the requirements of the applicable law and regulations in the country (for example, the user explicitly agrees, the user is informed, etc.).

Corresponding to the first embodiment, the embodiment of the present application further provides a video clip composition apparatus, referring to fig. 7, which may specifically include:

a video composition scheme determining unit 701, configured to receive material addition and a clipping operation of a user through a video clipping composition interface, and determine a video composition scheme, where the video composition scheme includes content to be composed of multiple image frames in a video to be composed;

a total duration determining unit 702, configured to determine a total duration of a video to be synthesized in a process of performing video synthesis according to the video synthesis request;

a segment synthesis task creating unit 703, configured to determine a plurality of segment durations according to the total duration and the target segment number, and create a plurality of segment synthesis tasks according to the plurality of segment durations;

a parallel processing unit 704, configured to perform parallel processing on the multiple segment synthesis tasks through a multithreading technique;

a splicing rendering unit 705, configured to perform splicing rendering on the segmentation synthesis results corresponding to the multiple segmentation synthesis tasks, respectively, and output a video synthesis result.

The page code of the video clip composite page comprises an SDK, and the SDK is used for providing a video clip function and a video composite function for the video clip composite page; the SDK is common to multiple developers.

Specifically, the apparatus may further include:

and the audio node creating unit is used for creating an audio node based on a browser technology after receiving the synthesis request, wherein the audio node is used for periodically playing the target sound to serve as a refreshing mechanism depended on in the video synthesis process.

The segmentation and synthesis task creation unit may specifically be configured to:

Specifically, the clipping operation includes: and creating a plurality of material tracks, and editing the picture level and the start and end time of the added material through the material tracks so as to overlap and/or splice a plurality of materials in the time and/or space dimension.

In addition, the apparatus may further include:

and the previewing unit is used for providing video picture preview content in the process of responding to the material adding and clipping operations of the user so as to perform video picture based on preview and perform visual clipping on the position of the material content in the picture.

In addition, the present application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method described in any of the preceding method embodiments.

And an electronic device comprising:

one or more processors; and

a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any of the preceding method embodiments.

Fig. 8 illustrates an architecture of an electronic device, which may include, in particular, a processor 810, a video display adapter 811, a disk drive 812, an input/output interface 813, a network interface 814, and a memory 820. The processor 810, the video display adapter 811, the disk drive 812, the input/output interface 813, the network interface 814, and the memory 820 may be communicatively connected by a communication bus 830.

The processor 810 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solution provided by the present Application.

The Memory 820 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 820 may store an operating system 821 for controlling operation of the electronic device 800, a Basic Input Output System (BIOS) for controlling low-level operation of the electronic device 800. In addition, a web browser 823, a data storage management system 824, and a video clip composition system 825, etc. may also be stored. The video clip composition system 825 may be an application program that implements the operations of the foregoing steps in this embodiment. In summary, when the technical solution provided in the present application is implemented by software or firmware, the relevant program codes are stored in the memory 820 and called for execution by the processor 810.

The input/output interface 813 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The network interface 814 is used for connecting a communication module (not shown in the figure) to realize communication interaction between the device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

Bus 830 includes a pathway for communicating information between various components of the device, such as processor 810, video display adapter 811, disk drive 812, input/output interface 813, network interface 814, and memory 820.

It should be noted that although the above-mentioned devices only show the processor 810, the video display adapter 811, the disk drive 812, the input/output interface 813, the network interface 814, the memory 820, the bus 830, etc., in a specific implementation, the devices may also include other components necessary for normal operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The video clip synthesis method and the electronic device provided by the present application are introduced in detail, and a specific example is applied in the text to explain the principle and the implementation of the present application, and the description of the above embodiment is only used to help understanding the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, the specific embodiments and the application range may be changed. In view of the above, the description should not be taken as limiting the application.

Claims

1. A method of video clip composition, comprising:

2. The method of claim 1,

the video clip composition interface is generated and presented based on browser technology, and is responsive to a user's material addition and clipping operations in a browser, and performs video composition in the browser.

3. The method of claim 2,

4. The method of claim 2, further comprising:

5. The method of claim 1,

determining a plurality of segment durations according to the total duration and the number of segments, including:

6. The method according to any one of claims 1 to 5,

the clipping operation includes: and creating a plurality of material tracks, and editing the picture level and the start and end time of the added material through the material tracks so as to overlap and/or splice a plurality of materials in the time and/or space dimension.

7. The method of any of claims 1 to 5, further comprising:

8. A method of video clip composition, comprising:

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

10. An electronic device, comprising:

one or more processors; and

a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any of claims 1 to 7.