CN112822544A - Video material file generation method, video synthesis method, device and medium - Google Patents


Info

Publication number
CN112822544A
CN112822544A (application CN202011633376.8A)
Authority
CN
China
Prior art keywords: image, sub-image, video, position information, playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011633376.8A
Other languages: Chinese (zh)
Other versions: CN112822544B (en)
Inventor
刘春宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN202011633376.8A
Publication of CN112822544A
Application granted
Publication of CN112822544B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202 End-user interface for requesting content on demand, e.g. video on demand
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application disclose a video material file generation method, a video synthesis method, a device, and a medium, belonging to the technical field of video processing. The method comprises: acquiring at least one video; for each image of the video, in response to a sub-image capture operation on the image, acquiring a sub-image of the image and position information of the sub-image in the corresponding image; and generating a video material file based on the sub-images and the position information, the video material file being parsed by a terminal and synthesized into a target video. In the process of adding material to the target video, the video does not need to be decoded to obtain the sub-images to be added, which saves the time consumed in obtaining the sub-images, improves the efficiency of adding sub-images to the target video, and saves decoding resources.

Description

Video material file generation method, video synthesis method, device and medium
Technical Field
The embodiments of the present application relate to the technical field of video processing, and in particular to a video material file generation method, a video synthesis method, a device, and a medium.
Background
With the rapid development of video processing technology, users can add different materials to a target video, making the resulting video more attractive.
A terminal can acquire at least one video, each of which comprises a plurality of sub-images that can serve as material to be added to the target video. The terminal decodes the videos simultaneously to obtain the sub-images included in each video, and synthesizes the obtained sub-images into the target video according to the playing order of the corresponding videos.
However, since at least one video needs to be decoded at the same time, synthesizing the sub-images into the target video takes a long time and is inefficient.
Disclosure of Invention
The embodiments of the present application provide a video material file generation method, a video synthesis method, a device, and a medium, which save the time consumed in obtaining sub-images, improve the efficiency of adding sub-images to a target video, and save decoding resources. The technical solution is as follows:
In one aspect, a method for generating a video material file is provided, the method comprising:
acquiring at least one video;
for each image of the video, acquiring, in response to a sub-image capture operation on the image, a sub-image of the image and position information of the sub-image in the corresponding image;
and generating a video material file based on the sub-image and the position information, the video material file being parsed by a terminal and synthesized into a target video.
Optionally, acquiring, for each image of the video in response to a sub-image capture operation on the image, a sub-image of the image and position information of the sub-image in the corresponding image comprises:
displaying an image of any one of the videos;
in response to a capture operation on the image of the video, displaying an area selection box on the image;
and in response to a drag operation on the area selection box, capturing the sub-image within the area selection box when the drag operation ends, to obtain the sub-image of the image and the position information of the sub-image in the corresponding image.
Optionally, displaying an image of any one of the videos comprises:
displaying, according to the playing order of the videos, images of the video whose playing order follows the previous video after capture from the previous video is finished.
Optionally, acquiring, for each image of the video in response to a sub-image capture operation on the image, a sub-image of the image and position information of the sub-image in the corresponding image comprises:
performing, in response to a trigger operation on an image capture option, image recognition on each image of the video, taking a recognized graphic as a sub-image of the image, and taking the position information of the recognized graphic in the corresponding image as the position information of the sub-image.
Optionally, acquiring, for each image of the video in response to a sub-image capture operation on the image, a sub-image of the image and position information of the sub-image in the corresponding image comprises:
merging the sub-images of at least two images and the corresponding position information if the sub-images of the at least two images have the same shape but different positions in their images.
Optionally, after acquiring, for each image of the video in response to a sub-image capture operation on the image, a sub-image of the image and position information of the sub-image in the corresponding image, the method further comprises:
determining the playing start point, in the video to which it belongs, of the image corresponding to the sub-image as the playing start point of the sub-image;
and determining the playing end point, in the video to which it belongs, of the image corresponding to the sub-image as the playing end point of the sub-image;
generating a video material file based on the sub-image and the position information then comprising:
generating the video material file based on the sub-image, the position information, and the playing start point and playing end point of the sub-image.
Optionally, the playing start point of each sub-image is the playing start time point, or the playing start timestamp, of the corresponding image in the video to which that image belongs;
and the playing end point of each sub-image is, likewise, the playing end time point or the playing end timestamp of the corresponding image in that video.
Optionally, generating a video material file based on the sub-image and the position information comprises:
storing the name of the sub-image and the position information correspondingly in a position subfile;
and compressing the position subfile and the sub-image to generate the video material file.
Optionally, after generating the video material file based on the sub-image and the position information, the method further comprises:
sending the video material file to a server, the server storing the video material file.
In another aspect, a video synthesis method is provided, the method comprising:
acquiring a video material file, the video material file comprising at least one sub-image and position information of each sub-image;
parsing the video material file to obtain the at least one sub-image and the position information of each sub-image;
and synthesizing the at least one sub-image into a target video based on the position information of each sub-image.
Optionally, acquiring the video material file comprises:
acquiring the video material file from a server.
Optionally, parsing the video material file to obtain the at least one sub-image and the position information of each sub-image comprises:
parsing the video material file to obtain the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image;
synthesizing the at least one sub-image into the target video based on the position information of each sub-image then comprising:
superimposing, based on the playing start point and playing end point of each sub-image, the at least one sub-image onto the video frames between the corresponding points in the target video according to the position information of each sub-image.
Optionally, the playing start point of each sub-image is the playing start time point, or the playing start timestamp, of the corresponding image in the video to which that image belongs;
and the playing end point of each sub-image is, likewise, the playing end time point or the playing end timestamp of the corresponding image in that video.
Optionally, after parsing the video material file to obtain the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image, the method further comprises:
displaying the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image;
and modifying at least one of the playing start point, the playing end point, and the position information of each sub-image in response to a modification operation;
the superimposing of the at least one sub-image onto the video frames between the corresponding points in the target video, based on the playing start point and playing end point of each sub-image and according to the position information of each sub-image, then comprising:
superimposing, based on the modified playing start point and playing end point of each sub-image and the position information of each sub-image, the at least one sub-image onto the video frames between the corresponding start and end points in the target video according to the modified position information.
In another aspect, a video material file generation apparatus is provided, the apparatus comprising:
a video acquisition module configured to acquire at least one video;
a capture module configured to acquire, for each image of the video in response to a sub-image capture operation on the image, a sub-image of the image and position information of the sub-image in the corresponding image;
and a file generation module configured to generate a video material file based on the sub-image and the position information, the video material file being parsed by a terminal and synthesized into a target video.
Optionally, the capture module comprises:
an image display unit configured to display an image of any one of the videos;
a selection box display unit configured to display an area selection box on the image of the video in response to a capture operation on that image;
and a capture unit configured to capture, in response to a drag operation on the area selection box, the sub-image within the area selection box when the drag operation ends, to obtain the sub-image of the image and the position information of the sub-image in the corresponding image.
Optionally, the image display unit is configured to display, according to the playing order of the videos, images of the video whose playing order follows the previous video after capture from the previous video is finished.
Optionally, the capture module is configured to perform, in response to a trigger operation on an image capture option, image recognition on each image of the video, take a recognized graphic as a sub-image of the image, and take the position information of the recognized graphic in the corresponding image as the position information of the sub-image.
Optionally, the capture module is configured to merge the sub-images of at least two images and the corresponding position information if the sub-images of the at least two images have the same shape but different positions in their images.
Optionally, the apparatus further comprises:
a time determination module configured to determine the playing start point, in the video to which it belongs, of the image corresponding to the sub-image as the playing start point of the sub-image;
the time determination module being further configured to determine the playing end point, in the video to which it belongs, of the image corresponding to the sub-image as the playing end point of the sub-image;
and the file generation module being configured to generate the video material file based on the sub-image, the position information, and the playing start point and playing end point of the sub-image.
Optionally, the playing start point of the image corresponding to the sub-image in the video to which it belongs is the playing start time point, or the playing start timestamp, of that image in that video;
and the playing end point of that image in that video is, likewise, its playing end time point or playing end timestamp.
Optionally, the file generation module is configured to:
store the name of the sub-image and the position information correspondingly in a position subfile;
and compress the position subfile and the sub-image to generate the video material file.
Optionally, the apparatus further comprises:
a file sending module configured to send the video material file to a server, the server storing the video material file.
In another aspect, a video synthesis apparatus is provided, the apparatus comprising:
a file acquisition module configured to acquire a video material file, the video material file comprising at least one sub-image and position information of each sub-image;
a parsing module configured to parse the video material file to obtain the at least one sub-image and the position information of each sub-image;
and a synthesis module configured to synthesize the at least one sub-image into a target video based on the position information of each sub-image.
Optionally, the file acquisition module is configured to acquire the video material file from a server.
Optionally, the parsing module is configured to parse the video material file to obtain the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image;
and the synthesis module is configured to superimpose, based on the playing start point and playing end point of each sub-image, the at least one sub-image onto the video frames between the corresponding start and end points in the target video according to the position information of each sub-image.
Optionally, the playing start point of each sub-image is the playing start time point, or the playing start timestamp, of the corresponding image in the video to which that image belongs;
and the playing end point of each sub-image is, likewise, the playing end time point or the playing end timestamp of the corresponding image in that video.
Optionally, the apparatus further comprises:
a display module configured to display the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image;
and a modification module configured to modify at least one of the playing start point, the playing end point, and the position information of each sub-image in response to a modification operation;
the synthesis module being configured to superimpose, based on the modified playing start point and playing end point of each sub-image and the position information of each sub-image, the at least one sub-image onto the video frames between the corresponding start and end points in the target video according to the modified position information.
In another aspect, a computer device is provided, comprising a processor and a memory storing at least one piece of program code, the program code being loaded and executed by the processor to implement the video material file generation method according to the above aspect or the video synthesis method according to the above aspect.
In another aspect, a computer-readable storage medium is provided, storing at least one piece of program code, the program code being loaded and executed by a processor to implement the video material file generation method according to the above aspect or the video synthesis method according to the above aspect.
In yet another aspect, a computer program product or computer program is provided, comprising computer program code stored in a computer-readable storage medium. A processor of a computer device reads the computer program code from the storage medium and executes it, causing the computer device to implement the video material file generation method according to the above aspect or the video synthesis method according to the above aspect.
According to the video material file generation method, the video synthesis method, the device, and the medium provided by the embodiments of the present application, a sub-image is captured from an image of any one of at least one video, and the terminal can combine the captured sub-image with the position information of the sub-image in the corresponding image to generate a video material file. Other terminals can parse the video material file and synthesize it into a target video, achieving the effect of adding material to the target video without decoding a video to obtain the sub-images to be added, which saves the time consumed in obtaining the sub-images, improves the efficiency of adding sub-images to the target video, and saves decoding resources.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an implementation environment provided in an embodiment of the present application.
Fig. 2 is a flowchart of a video material file generation method according to an embodiment of the present application.
Fig. 3 is a flowchart of a video synthesis method according to an embodiment of the present application.
Fig. 4 is a flowchart of a video material file generation method according to an embodiment of the present application.
Fig. 5 is a schematic diagram of displaying an area selection box according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a display interface provided in an embodiment of the present application.
Fig. 7 is a schematic interface diagram of a video material file according to an embodiment of the present application.
Fig. 8 is a flowchart of a video material file generation method according to an embodiment of the present application.
Fig. 9 is a flowchart of a video synthesis method according to an embodiment of the present application.
Fig. 10 is a schematic diagram of a video frame according to an embodiment of the present application.
Fig. 11 is a flowchart of a method for generating a video material file and synthesizing a video according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of a video material file generation apparatus according to an embodiment of the present application.
Fig. 13 is a schematic structural diagram of another video material file generation apparatus according to an embodiment of the present application.
Fig. 14 is a schematic structural diagram of a video compositing apparatus according to an embodiment of the present application.
Fig. 15 is a schematic structural diagram of another video compositing apparatus according to an embodiment of the present application.
Fig. 16 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Fig. 17 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described below in further detail with reference to the accompanying drawings.
It will be understood that the terms "first," "second," "third," "fourth," "fifth," "sixth," and the like used herein may describe various concepts, but the concepts are not limited by these terms unless otherwise specified; the terms are only used to distinguish one concept from another. For example, a first arrangement order may be referred to as a second arrangement order, and similarly a second arrangement order may be referred to as a first arrangement order, without departing from the scope of the present application.
As used herein, "at least one" includes one, two, or more; "a plurality" includes two or more; "each" refers to every one of a corresponding plurality; and "any" refers to any one of a plurality. For example, if a plurality of elements comprises three elements, "each" refers to every one of the three elements, and "any" refers to any one of the three, which may be the first, the second, or the third element.
Fig. 1 is a schematic structural diagram of an implementation environment provided in an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102, which are connected through a wireless or wired network.
The terminal 101 captures a sub-image from an image of any one of the videos, acquires the position information of the sub-image in the corresponding image, and can generate a video material file based on the acquired sub-image and position information. The terminal 101 can also transmit the generated video material file to the server 102, which is configured to store it; the server 102 can further transmit the video material file to other terminals, which parse it for synthesis into a target video.
The terminal in the embodiments of the present application is any of several types of terminal, such as a mobile phone, a tablet computer, or a computer; the server is a single server, a server cluster composed of multiple servers, or a cloud computing service center.
The method provided by the embodiments of the present application is applied to video editing scenarios. A user can watch any video through the terminal and edit it to add material to it: the user controls the terminal to obtain the video material file provided by the embodiments of the present application, and the terminal can parse the video material file and synthesize it into the target video, thereby adding the material to the target video.
Fig. 2 is a flowchart of a video material file generation method according to an embodiment of the present application. Referring to fig. 2, the method includes:
201. The terminal acquires at least one video.
Any one of the videos comprises a plurality of images ordered from front to back to form the video. The video is a material video: sub-images in it can be overlaid onto the video frames of other videos to beautify those frames.
Optionally, the videos in the embodiments of the present application are stored in a server, from which the terminal can obtain at least one video; or the videos are stored in the terminal, and the terminal can directly acquire the stored videos.
202. For each image of the video, the terminal acquires, in response to a sub-image capture operation on the image, a sub-image of the image and position information of the sub-image in the corresponding image.
In the embodiments of the present application, the terminal can display the images of each video; if a sub-image capture operation on any image is detected, the terminal determines the area corresponding to the operation and acquires the sub-image within that area together with the position information of the sub-image in the corresponding image.
For example, the sub-image capture operation in the embodiments of the present application is a combination of a long-press operation and a slide operation, an operation of displaying an area selection box and dragging it, or another type of operation. The capture operation is not detailed here; for a specific description, refer to the following embodiments.
The captured sub-image is a partial image or the whole of the corresponding image. The position information indicates the position of the sub-image in the corresponding image. For example, the position information is represented by the coordinates of the sub-image's top-left corner together with its width and height, by the coordinates of the top-right corner with width and height, by the center coordinates with width and height, or in other ways.
For example, if the position information is represented by the top-left coordinates, width, and height of the sub-image, the top-left coordinates may be (60, 100) with a width of 100 and a height of 100, or (50, 60) with a width of 60 and a height of 100.
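By way of illustration only (the patent does not fix a data structure), the position information described above can be recorded as a simple structure; the field names and the (x, y) ordering of the coordinate pair are assumptions:

```python
# Illustrative sketch: position information as top-left corner plus width and
# height. The (x, y) ordering of the coordinate pair is an assumption.
from dataclasses import dataclass

@dataclass
class SubImagePosition:
    left: int    # x coordinate of the sub-image's top-left corner in the source image
    top: int     # y coordinate of the top-left corner
    width: int   # sub-image width in pixels
    height: int  # sub-image height in pixels

# The two example values from the paragraph above:
pos_a = SubImagePosition(left=60, top=100, width=100, height=100)
pos_b = SubImagePosition(left=50, top=60, width=60, height=100)
```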
203. The terminal generates a video material file based on the sub-image and the position information.
The video material file is parsed by a terminal and synthesized into a target video.
In the embodiments of the present application, a terminal can parse the video material file to acquire the sub-images and position information to be synthesized into the target video; no video needs to be decoded, which saves the terminal's energy consumption.
Optionally, the video material file is a compressed file. After acquiring it, the terminal can parse the video material file to obtain the sub-images and position information it contains, and then synthesize the sub-images into the target video according to the position information.
According to the method provided by the embodiments of the present application, a sub-image is captured from an image of any one of at least one video, and the terminal can combine the captured sub-image with the position information of the sub-image in the corresponding image to generate a video material file. Other terminals can parse the video material file and synthesize it into a target video, achieving the effect of adding material to the target video without decoding a video to obtain the sub-images to be added, which saves the time consumed in obtaining the sub-images, improves the efficiency of adding sub-images to the target video, and saves decoding resources.
Fig. 3 is a flowchart of a video synthesis method according to an embodiment of the present application. Referring to fig. 3, the method includes:
301. The terminal acquires a video material file.
The video material file comprises at least one sub-image and the position information of each sub-image.
The video material file in the embodiments of the present application is the same as the one in step 203 and is not described again here.
302. The terminal parses the video material file to obtain the at least one sub-image and the position information of each sub-image.
In the embodiments of the present application, the video material is stored in file form, and the terminal can parse the video material file after acquiring it.
In addition, the sub-images and their position information in the embodiments of the present application are the same as those in step 202 and are not described again here.
303. The terminal synthesizes the at least one sub-image into the target video based on the position information of each sub-image.
The position information of a sub-image indicates its position in the corresponding image; the terminal can therefore add each sub-image to the position in the target video corresponding to its position information, achieving the effect of adding the sub-images to the target video.
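A minimal sketch of this parse-and-composite flow follows, under two assumptions the patent leaves open: the material file is a ZIP archive, and the position subfile (here named "template.txt", an assumed name) holds one "left,top,width,height,name" record per line.

```python
# Minimal sketch of steps 302-303 under the assumptions stated above.
# Alpha blending is omitted for brevity.
import zipfile

import cv2
import numpy as np

def parse_material_file(path):
    """Parse a video material file into (sub_image, position) pairs."""
    entries = []
    with zipfile.ZipFile(path) as zf:
        for line in zf.read("template.txt").decode("utf-8").splitlines():
            left, top, width, height, name = line.split(",")
            buf = np.frombuffer(zf.read(name), dtype=np.uint8)
            sub_image = cv2.imdecode(buf, cv2.IMREAD_UNCHANGED)
            entries.append((sub_image, (int(left), int(top), int(width), int(height))))
    return entries

def composite(frame, entries):
    """Overlay each sub-image onto one target-video frame at its recorded position."""
    for sub_image, (left, top, width, height) in entries:
        # Drop a possible alpha channel; a real renderer would blend instead.
        frame[top:top + height, left:left + width] = sub_image[:, :, :3]
    return frame
```

In the timed variant of the method (Fig. 8 and Fig. 9 below), the same overlay would additionally be gated by each sub-image's playing start and end points.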
According to the method provided by the embodiments of the present application, the video material file is parsed to obtain the sub-images and the corresponding position information, and the sub-images can be synthesized into the target video according to the position information. There is no need to decode a video to obtain the sub-images to be added in the process of adding material to the target video, which saves the time consumed in obtaining the sub-images, improves the efficiency of adding sub-images to the target video, and saves decoding resources.
Fig. 4 is a flowchart of a video material file generation method according to an embodiment of the present application. Referring to fig. 4, the method includes:
401. The terminal acquires at least one video.
To reduce the resources consumed in decoding videos, the embodiments of the present application capture the sub-images included in a video in advance, so that the captured sub-images and their positions in the corresponding images can be compressed to generate a video material file; this reduces the resource consumption of video decoding by the terminal. Before the video material file is generated, at least one video is acquired.
The at least one video acquired by the terminal is a material video whose images contain sub-images. By capturing the sub-images in these images, the captured sub-images can later be synthesized into the frames of other videos, enriching their content.
Optionally, if the videos are stored in a server, the terminal sends a video acquisition request to the server, the server sends videos to the terminal based on the request, and the terminal acquires the videos sent by the server.
For example, the terminal displays at least one candidate video and, in response to a selection operation on any candidate video, acquires the at least one video in the selected state. The selection operation is a single-click, double-click, long-press, or another type of operation.
Alternatively, if the videos are stored in the terminal, the terminal directly displays the stored videos. For example, the terminal displays the stored videos and, in response to a selection operation on any video, acquires the at least one video in the selected state.
In the embodiments of the present application, the terminal can acquire at least one video in any of the following ways:
(1) The videos are classified by type; for example, video types include cartoon, landscape, music material, artistic text, and the like. The terminal can obtain videos according to their type.
Optionally, the terminal displays at least one video in the interface corresponding to each type, displays the at least one video corresponding to a target type in response to a trigger operation on that type, and acquires the video in the selected state in response to a selection operation on any video. Acquiring videos by type allows a video material file containing sub-images of the same type to be generated later, ensuring the uniformity of the sub-images it includes.
(2) The videos are arranged in chronological order, so the terminal can acquire videos in order of their release time.
For example, the terminal displays the videos in order of release time from earliest to latest and acquires the video in the selected state in response to a selection operation on any video. Acquiring videos by release time allows a video material file containing newly released sub-images to be generated later, ensuring the timeliness of the included sub-images and thus improving the timeliness of the generated video material file.
(3) The videos are arranged by heat value, so the terminal can acquire videos in order of heat value from high to low. The heat value represents how popular a video is: the higher the value, the more popular the video, and the lower the value, the less popular it is.
For example, the terminal displays the videos in order of heat value from high to low and acquires the video in the selected state in response to a selection operation on any video. Acquiring videos by heat value allows a video material file containing popular sub-images to be generated later, which can improve the utilization of the video material file.
402. The terminal displays an image of any one of the videos.
In the embodiments of the present application, any one of the videos comprises a plurality of images, and the terminal can display the images of any video in sequence.
After acquiring at least one video, the terminal can decode each video to obtain the images it comprises, and then display the images of any one video.
In addition, since a video is composed of a plurality of images arranged from front to back, the images are displayed in their order of arrangement in the video. Optionally, according to the playing order of the videos, after capture from the previous video is finished, the images of the video whose playing order follows it are displayed.
The terminal in the embodiments of the present application acquires at least one video based on an application interface and displays the images included in each video. The application interface is provided by a target application having a video material generation function. Optionally, while displaying the images of the at least one video, the terminal displays the videos' images in the order in which the videos were selected, in an order designated by the user, or in another order. The terminal displays the images of a video, performs steps 403 and 404 to capture sub-images from the images, and then performs step 405 to generate the video material file.
It should be noted that steps 403 and 404 take the capture of one image in one of the videos as an example; other images of that video, and images of other videos, are captured in the same way, which is not limited in the embodiments of the present application.
403. The terminal displays an area selection box on the image of any one of the videos in response to a capture operation on that image.
In the embodiments of the present application, after the terminal displays an image of any video, if a capture operation on the image is detected, it is determined that the user needs to capture from the image, and an area selection box is displayed on the image. For example, as shown in fig. 5, in response to a capture operation on an image of any video, the terminal displays a rectangular area selection box on the image.
Optionally, the area selection box comes in various shapes, such as a square, a rectangle, a heart, or another shape, which is not limited in the embodiments of the present application.
In the embodiments of the present application, the terminal displays area selection boxes of at least one shape in response to the capture operation on the image; the user can select any one of the shapes, and the terminal, in response to the selection operation, displays the area selection box of that shape on the image.
When selecting the shape of the area selection box, the user can choose the shape closest to that of the sub-image to be captured, which improves the accuracy of capturing the sub-image with a selection box of that shape.
In some embodiments, the terminal also provides a function for customizing the area selection box: the user can perform a circle-selection operation through the terminal to outline a shape different from the predefined area selection boxes, and the terminal displays an area selection box of the corresponding user-defined shape in response to the circle-selection operation.
For example, while the image is displayed, the user can slide along the edge of the desired sub-image so that the position where the slide ends coincides with its starting position, completing the selection of the sub-image; the terminal then displays the area selection box formed by the slide.
Optionally, while displaying an image of any video, if a key-combination operation on the input device is detected, the terminal enters an image capture state; the user can circle-select an area in the interface displaying the image, and the terminal displays an area selection box in response to the circle selection. For example, the key combination is "ctrl + alt", "ctrl + alt + f", or another combination.
It should be noted that displaying the area selection box according to the capture operation is described here merely as an example; in another embodiment, after the terminal displays the area selection box, the size of the displayed box can be adjusted in response to an adjustment operation.
In the embodiments of the present application, the ways of adjusting the size of the area selection box include, but are not limited to, the following:
(1) After detecting a drag operation on an edge of the area selection box, the terminal adjusts the size of the box in response, moving that edge to the position where the drag operation stops.
For example, the terminal expands the area selection box to the right when detecting a rightward drag on its right edge, shrinks the box downward when detecting a downward drag on its upper edge, and expands the box upward and to the right when detecting an outward drag on its upper-right corner; the terminal can also expand or shrink the box according to other operations, which is not limited in the embodiments of the present application.
For example, as shown in fig. 6, when the mouse pointer points at the edge of the area selection box, the double-headed arrow shown in fig. 6 is displayed, and the user can trigger a drag operation to adjust the size of the box.
(2) After detecting an adjustment operation on the edge of the area selection box, the terminal displays a floating size-adjustment window and, in response to an input operation on the window, adjusts the size of the box according to the dimensions entered by the input operation.
Optionally, taking the center of the area selection box as the reference, the size-adjustment window displays the center coordinates, width, and height of the box, and the terminal can adjust the box's size based on the coordinates, width, or height entered in the window.
404. In response to a drag operation on the area selection box, the terminal captures the sub-image within the box when the drag operation ends, obtaining the sub-image of the image and the position information of the sub-image in the corresponding image.
In the embodiments of the present application, after the terminal displays the area selection box, the user can also adjust the box's position by triggering a drag operation on it, so that the adjusted box encloses exactly the sub-image to be captured.
If the user needs to adjust the position of the area selection box, a drag operation on the box is triggered; after detecting the drag operation, the terminal moves the box to the position where the drag ends and then captures the sub-image within the box, obtaining the sub-image and the position information of the sub-image in the corresponding image.
The terminal records the position information of the sub-image so that the sub-image can later be added to other videos according to it. In addition, if an image of any video includes multiple sub-images, the terminal can perform steps 403 and 404 multiple times to capture the sub-images and determine the position of each in the corresponding image.
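As an illustrative sketch of this capture step (not the patent's implementation), cropping the region inside the selection box yields both the sub-image and its position information:

```python
# Sketch of the capture itself: when the drag ends, crop the region inside the
# selection box. Assumes a NumPy/OpenCV image whose first axis is y (rows) and
# second axis is x (columns).
def capture_sub_image(image, left, top, width, height):
    sub_image = image[top:top + height, left:left + width].copy()
    position = {"left": left, "top": top, "width": width, "height": height}
    return sub_image, position
```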
It should be noted that the above takes the sub-image capture operation as a manual operation of controlling the area selection box. In another embodiment, the terminal can automatically capture the sub-images in an image in response to the sub-image capture operation. Optionally, in response to a trigger operation on an image capture option, the terminal performs image recognition on each image of the video, takes a recognized graphic as a sub-image of the image, and takes the position information of the recognized graphic in the corresponding image as the position information of the sub-image.
In the embodiments of the present application, the interface displaying the video's images includes an image capture option, which triggers the sub-image capture operation for automatic capture. If the terminal detects a trigger operation on the image capture option, it determines that graphics in the image need to be captured, performs image recognition on the image, and obtains the sub-images from the recognized graphics. The terminal can recognize graphics by straight-line extraction, by image segmentation, or in other ways.
The terminal in the embodiments of the present application can automatically perform image recognition based on the detected sub-image capture operation so as to capture sub-images automatically, which improves the accuracy of capture, simplifies the operation flow, and improves the efficiency of capturing sub-images.
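Since the patent leaves the recognition method open (straight-line extraction, image segmentation, or other ways), the following sketch stands in with simple OpenCV contour detection; it is one possible implementation, not the disclosed one:

```python
# Assumed stand-in for the recognition step: threshold the image and treat
# each external contour's bounding box as one recognized graphic, yielding a
# sub-image and its position information.
import cv2

def auto_capture(image, min_area=100):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h >= min_area:  # skip speckle
            sub_image = image[y:y + h, x:x + w].copy()
            results.append((sub_image, {"left": x, "top": y, "width": w, "height": h}))
    return results
```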
In some embodiments, after the sub-images are captured from the video's images through the above steps and their position information in the corresponding images is determined, the captured sub-images and the corresponding position information can be merged. Without affecting how the sub-images are used, this reduces their number, which reduces the amount of stored sub-image data and improves storage-space utilization.
Optionally, if the sub-images of at least two images have the same shape but different positions in their images, the sub-images of the at least two images and the corresponding position information are merged.
Merging the sub-images of the at least two images and the corresponding position information comprises:
merging the sub-images with the same shape into one sub-image, and storing the position information of each of the same-shaped sub-images in correspondence with the merged sub-image, so that one sub-image corresponds to multiple pieces of position information.
In one possible implementation, one sub-image is selected at random from the same-shaped sub-images as the merged sub-image, the other sub-images are deleted, their position information is retained, and the retained position information is stored in correspondence with the merged sub-image.
In another possible implementation, the sub-image with the highest image quality is selected from the same-shaped sub-images as the merged sub-image, the others are deleted, their position information is retained, and the retained position information is stored in correspondence with the merged sub-image.
In another possible implementation, the same-shaped sub-images are averaged to obtain the merged sub-image, and the position information of each of them is stored in correspondence with the merged sub-image.
For example, if three sub-images have the same shape but different positions in their images, the three are merged into one sub-image, the position information of all three is retained, and that position information is stored in correspondence with the merged sub-image.
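A minimal sketch of this merge step follows, with one simplifying assumption: "same shape" is approximated by identical pixel content, whereas the patent's notion of shape (e.g., two pentagrams with the same outline) may be looser:

```python
# Sketch of the merge: sub-images that compare equal collapse to a single
# stored image that carries every recorded position, so one sub-image maps to
# several pieces of position information.
def merge_sub_images(captures):
    """captures: list of (sub_image, position) -> list of (sub_image, [positions])."""
    merged = []
    for sub_image, position in captures:
        for kept_image, positions in merged:
            if kept_image.shape == sub_image.shape and (kept_image == sub_image).all():
                positions.append(position)  # keep one copy, all position records
                break
        else:
            merged.append((sub_image, [position]))  # first occurrence of this shape
    return merged
```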
405. The terminal generates a video material file based on the sub-images and the position information.
In the embodiments of the present application, the terminal captures the sub-images in the images and also acquires the position information of each sub-image in its corresponding image; each sub-image thus has corresponding position information, and the captured sub-images and their position information are stored in the video material file.
In some embodiments, fig. 7 is an interface diagram of a video material file provided by an embodiment of the present application. As shown in fig. 7, the video material file includes a square picture file, a circle picture file, a five-pointed-star picture file, and a position subfile. Correspondingly, the video material file is generated as follows: the names of the picture files and the position information are stored correspondingly in the position subfile, and the square, circle, and five-pointed-star picture files are compressed together with the position subfile to generate the video material file. For example, the position subfile is named template.
It should be noted that a sub-image in the embodiments of the present application may be any image; a circle, a five-pointed star, and a square are used here only as examples.
A sub-image is named after the shape it contains, by its capture order, or in another way. For example, the header of the position subfile lists the left coordinate, the top coordinate, the width, the height, and the name of the sub-image, and the position information of each sub-image is recorded below the header in that order. The left coordinate is denoted "left", the top coordinate "top", the width "width", and the height "height".
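Putting steps 404 and 405 together, a hedged sketch of file generation matching Fig. 7 might look as follows; the CSV-style record layout and all file names are assumptions, since the patent fixes only the recorded fields (left, top, width, height, name) and the compression of the picture files with the position subfile:

```python
# Sketch of step 405 under the assumptions stated above.
import zipfile

import cv2

def generate_material_file(path, merged):
    """merged: list of (sub_image, [positions]) pairs, e.g. from the merge step."""
    records = []
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        for index, (sub_image, positions) in enumerate(merged):
            name = f"sub_{index}.png"  # e.g. a square, circle, or five-pointed star
            ok, buf = cv2.imencode(".png", sub_image)
            zf.writestr(name, buf.tobytes())
            for p in positions:  # one merged sub-image may carry several positions
                records.append(f"{p['left']},{p['top']},{p['width']},{p['height']},{name}")
        zf.writestr("template.txt", "\n".join(records))
    return path
```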
In the embodiment of the application, after the terminal generates the video material file, the video material file is sent to the server, the server stores the video material file, and other subsequent terminals can acquire the video material file from the server and further analyze the video material file to be synthesized into the target video.
It should be noted that, in the embodiment of the present application, the terminal executes the steps 401 and 405 only as an example, and the terminal can execute the steps 401 and 405 through the template material editor to generate the video material file. In addition, the terminal can also adopt the template material player to analyze the video material file after the video material file is generated by the template material editor, and play at least one sub-image in the video material file.
According to the method provided by the embodiment of the application, the sub-image is intercepted from the image of any one of the at least one video, the terminal can combine the intercepted sub-image and the position information of the sub-image in the corresponding image to generate the video material file, other terminals can analyze the video material file to synthesize the video material file into the target video, the effect of adding the material into the target video is achieved, the video is decoded to obtain the added sub-image in the process of adding the material into the target video is not needed, the time consumed for obtaining the sub-image is saved, the efficiency of adding the sub-image into the target video is improved, and decoding resources are saved.
In addition, the embodiment of the present application can also determine sub-images that have the same shape but different position information, and merge the determined sub-images with their corresponding position information. This reduces the number of stored sub-images without affecting the stored result, which in turn reduces the data volume of the stored sub-images and improves the utilization of the storage space.
In addition, the terminal in the embodiment of the present application can automatically perform image recognition on an image based on a detected sub-image interception operation so as to automatically intercept the sub-image, which improves the accuracy of intercepting sub-images, simplifies the interception workflow, and improves interception efficiency.
In the above embodiment, the terminal generates the video material file from the sub-images and the position information. In the following embodiment, the terminal can further obtain the playing start point and the playing end point, in the video to which it belongs, of the image corresponding to each intercepted sub-image, and then generate the video material file based on the sub-images, the position information, and the obtained playing start points and playing end points. For details, refer to the embodiment of fig. 8:
fig. 8 is a flowchart of a video material file generation method according to an embodiment of the present application. Referring to fig. 8, the method includes:
801. The terminal acquires at least one video.
802. For each image of the video, the terminal acquires, in response to a sub-image interception operation on the image, the sub-image of the image and the position information of the sub-image in the corresponding image.
The process of steps 801-802 is similar to that of steps 401-404, and will not be described herein again.
803. The terminal determines the playing start point, in the video to which it belongs, of the image corresponding to the sub-image as the playing start point of the corresponding sub-image.
In this embodiment of the present application, the terminal intercepts a sub-image by performing a sub-image interception operation, and the image corresponding to the sub-image has a playing time point in the video to which it belongs. The terminal can therefore determine the playing start point of that image in its video and determine it as the playing start point of the corresponding sub-image.
In some embodiments, the playing start point is a playing start time point of a corresponding image of the sub-image in the video to which the corresponding image belongs, or the playing start point is a playing start time stamp of the corresponding image of the sub-image in the video to which the corresponding image belongs.
While the terminal intercepts sub-images, the video is in a playing state, and the current playing time point of the image corresponding to the first intercepted sub-image is determined as the playing start point of the sub-image.
Alternatively, each video comprises a plurality of video frames, each video frame corresponds to a time stamp, and the time stamp represents the original playing time point in the video. In the process of intercepting sub-images, the terminal therefore acquires the time stamp, in the video to which it belongs, of the image corresponding to the first intercepted sub-image, and determines that time stamp as the playing start point of the sub-image.
For example, for any video, if the playing start point in the video of the image from which a sub-image is intercepted is 0 seconds, the playing start point of that sub-image is also 0 seconds; if the playing start point of the image is 3 seconds, the playing start point of the sub-image intercepted from it is also 3 seconds.
804. The terminal determines the playing end point, in the video to which it belongs, of the image corresponding to the sub-image as the playing end point of the corresponding sub-image.
In this embodiment of the application, the terminal can determine not only the playing start point of the sub-image through step 803 but also its playing end point: the terminal can determine the playing end point of the image in the video to which it belongs, and then determine that point as the playing end point of the corresponding sub-image.
In some embodiments, the playing end point is a playing end time point of the corresponding image of the sub-image in the video to which the corresponding image belongs, or the playing end point is a playing end time stamp of the corresponding image of the sub-image in the video to which the corresponding image belongs.
While the terminal intercepts sub-images, the video is in a playing state, and the current playing time point of the image corresponding to the last intercepted sub-image is determined as the playing end point of the sub-image.
Alternatively, each video comprises a plurality of video frames, each video frame corresponds to a time stamp, and the time stamp represents the original playing time point in the video. In the process of intercepting sub-images, the terminal therefore acquires the time stamp, in the video to which it belongs, of the image corresponding to the last intercepted sub-image, and determines that time stamp as the playing end point of the sub-image.
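As a minimal sketch of steps 803 and 804, assuming each intercepted image carries a frame timestamp in seconds (the representation of timestamps is not fixed by the embodiment), the start and end points of a sub-image could be derived as follows:

```python
# Sketch only: the playing start point is the timestamp of the image from
# which the sub-image was first intercepted, and the playing end point is
# the timestamp of the image from which it was last intercepted.
def playing_range(capture_timestamps):
    """capture_timestamps: timestamps (seconds) of the images from which
    this sub-image was intercepted, in interception order."""
    return capture_timestamps[0], capture_timestamps[-1]

start, end = playing_range([0.0, 0.04, 0.08, 2.0])
print(start, end)  # 0.0 2.0
```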
It should be noted that, in the present application, the execution order of steps 803 and 804 is not limited: step 803 may be executed before step 804, simultaneously with step 804, or after step 804.
805. The terminal generates a video material file based on the sub-images, the position information, and the playing start points and playing end points of the sub-images.
In this embodiment, the process of generating the video material file by the terminal based on the sub-image, the position information, and the playing start point and the playing end point of the sub-image is similar to the process of generating the video material file in step 405, and is not described herein again.
It should be noted that the difference between step 805 and step 405 is that, in step 805, the names of the sub-images, the position information, and the playing start points and playing end points are correspondingly stored in the position subfile, and the position subfile and the sub-images are compressed to obtain the video material file.
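The container format of the video material file is not specified; assuming a ZIP archive and the same hypothetical template.txt layout as above, extended with start and end fields, step 805 might be sketched as:

```python
# Sketch only: bundles the sub-image picture files with a position subfile
# that also records each sub-image's playing start and end points.
import zipfile

def pack_material(material_path, sub_image_paths, records):
    # records: (left, top, width, height, start, end, name) per sub-image
    lines = ["left,top,width,height,start,end,name"]
    lines += [",".join(map(str, r)) for r in records]
    with zipfile.ZipFile(material_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("template.txt", "\n".join(lines))
        for p in sub_image_paths:
            zf.write(p)  # each intercepted sub-image picture file

pack_material("material.zip", ["circle.png"],
              [(120, 80, 64, 64, 0.0, 2.0, "circle.png")])
```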
According to the method provided by the embodiment of the present application, sub-images can be intercepted from the images of each video, and the position information of the sub-images in the corresponding images, as well as the playing start point and playing end point of each sub-image, can be determined. Based on a video material file that comprises the sub-images, the position information, and the playing start points and playing end points of the sub-images, other terminals can add each sub-image to the target video according to its playing start point and playing end point, which improves the accuracy of adding sub-images. In addition, there is no need to decode a video to obtain the added sub-images in the process of adding material to the target video, which saves the time consumed in obtaining the sub-images, improves the efficiency of adding sub-images to the target video, and saves decoding resources.
The terminal can further synthesize a video based on an already produced video material file to obtain a video with an added display effect. The synthesis process is described below with the embodiment shown in fig. 9. Fig. 9 is a flowchart of a video synthesis method according to an embodiment of the present application. Referring to fig. 9, the method includes:
901. The terminal acquires a video material file.
The video material file comprises at least one sub-image, the position information of each sub-image, and the playing start point and the playing end point of each sub-image.
Optionally, at least one video material file is stored in the server, and the terminal acquires the video material file from the server after sending a file acquisition instruction to the server.
In the embodiment of the application, the terminal can acquire the video material file by adopting any one of the following modes:
(1) When storing video material files, the server can store them according to their types, and the terminal can then acquire a video material file based on its type.
The types of video material files include cartoon types, landscape painting types, music material types, art text types, and the like.
(2) The server arranges the video material files in order of their generation time, and the terminal acquires the video material files in that order.
For example, the terminal acquires a preset number of video material files in order of generation time from earliest to latest. The preset number is set by the terminal, by the server, by an operator, or in other manners.
(3) The server stores a heat value for each video material file, and in the process of acquiring at least one video material file, the terminal can acquire video material files in descending order of heat value. The heat value represents how popular a video material file is: the higher the heat value, the more popular the video material file, and the lower the heat value, the less popular it is.
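As an illustrative sketch of mode (3) only (the record structure is an assumption), the server-side selection could look like:

```python
# Sketch only: return video material files in descending order of heat value.
def select_by_heat(material_files, count):
    ranked = sorted(material_files, key=lambda m: m["heat"], reverse=True)
    return ranked[:count]  # the hottest material files come first

print(select_by_heat([{"name": "a.zip", "heat": 10},
                      {"name": "b.zip", "heat": 99}], 1))
# [{'name': 'b.zip', 'heat': 99}]
```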
902. The terminal parses the video material file to obtain at least one sub-image, the position information of each sub-image, and the playing start point and the playing end point of each sub-image.
In this embodiment of the present application, the video material file includes at least one sub-image, the position information of each sub-image, and the playing start point and the playing end point of each sub-image, and the terminal parses the acquired video material file to obtain these contents.
Optionally, the video material file includes at least one sub-image and a position subfile. When parsing the video material file, the terminal first acquires the at least one sub-image and the position subfile, and then parses the position subfile to obtain the position information of each sub-image and the playing start point and playing end point of each sub-image contained in it.
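Mirroring the packing sketch above (the ZIP container and template.txt layout remain assumptions), the parsing in step 902 could be sketched as:

```python
# Sketch only: unpack the archive, then parse the position subfile into
# per-sub-image records with position, start point, and end point.
import zipfile

def parse_material(material_path):
    sub_images, records = {}, []
    with zipfile.ZipFile(material_path) as zf:
        for name in zf.namelist():
            if name == "template.txt":
                for line in zf.read(name).decode("utf-8").splitlines()[1:]:
                    left, top, w, h, start, end, img = line.split(",")
                    records.append({"left": int(left), "top": int(top),
                                    "width": int(w), "height": int(h),
                                    "start": float(start), "end": float(end),
                                    "name": img})
            else:
                sub_images[name] = zf.read(name)  # raw sub-image bytes
    return sub_images, records
```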
In some embodiments, the playing start point of each sub-image is a playing start time point of the corresponding image of each sub-image in the video to which the sub-image belongs, or the playing start point of each sub-image is a playing start time stamp of the corresponding image of each sub-image in the video to which the sub-image belongs.
The playing start point of each sub-image is the same as the playing start point of step 803 in the above embodiment, and is not described herein again.
903. Based on the playing start point and the playing end point of each sub-image, the terminal superimposes the at least one sub-image, according to the position information of each sub-image, on the video pictures between the corresponding start point and end point in the target video.
In the embodiment of the present application, the terminal has determined the position of each sub-image in its image as well as the playing start point and playing end point of each sub-image, so the terminal can superimpose each sub-image on the video pictures in the target video according to the position information, the playing start point, and the playing end point, achieving the effect of adding sub-images to the target video.
For example, if the playing start point of a sub-image is 0 seconds and its playing end point is 2 seconds, the sub-image is superimposed on the video pictures within the period from 0 seconds to 2 seconds of the target video.
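A minimal per-frame sketch of step 903 follows; decoding and re-encoding of the target video (e.g. via FFmpeg) is omitted, and NumPy arrays stand in for decoded frames, both being assumptions:

```python
# Sketch only: overlay a sub-image on a frame at time t if t lies within
# the sub-image's [start, end] range, at the stored (left, top) position.
import numpy as np

def composite_frame(frame, t, sub_image, rec):
    if rec["start"] <= t <= rec["end"]:
        x, y = rec["left"], rec["top"]
        h, w = sub_image.shape[:2]
        frame[y:y + h, x:x + w] = sub_image  # boundary clipping omitted
    return frame
```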
For example, if the video material file includes a square, a circle, and a five-pointed star, the video picture after the sub-images are superimposed in the above manner is displayed as the interface shown in fig. 10.
It should be noted that the embodiments of the present application are described only by taking as an example the at least one sub-image in the video material file, the position information of each sub-image, and the playing start point and playing end point of each sub-image. In another embodiment, the terminal can further modify at least one of the playing start point, the playing end point, and the position information of each sub-image, and synthesize the sub-images into the target video based on the modified information.
Optionally, the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image are displayed; at least one of the playing start point, the playing end point, and the position information of each sub-image is modified in response to a modification operation; and based on the modified playing start point and playing end point of each sub-image and the position information of each sub-image, the at least one sub-image is superimposed, according to the modified position information, on the video pictures between the corresponding start point and end point in the target video.
In some embodiments, the playing end point of each sub-image is a playing end time point of the corresponding image of each sub-image in the video to which the corresponding image belongs, or the playing end point of each sub-image is a playing end time stamp of the corresponding image of each sub-image in the video to which the corresponding image belongs.
The playing end point of each sub-image is the same as the playing end point in step 804 in the above embodiment, and is not described herein again.
In the embodiment of the present application, after the terminal acquires the video material file, it parses the file to obtain the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image. The terminal can also display the parsed information, and the user can modify it as needed. If the terminal detects a modification operation performed by the user, then based on the modified position information of each sub-image and the modified playing start point and playing end point of each sub-image, the terminal superimposes the at least one sub-image on the video pictures between the corresponding start point and end point in the target video.
For example, if the user needs to modify the position information of a sub-image, the user performs a modification operation on the position information and inputs the modified position information into the terminal, and the terminal can then place the sub-image based on the modified position information. For another example, if the user needs to modify the playing start point of a sub-image, the user performs a modification operation on the playing start point and inputs the modified playing start point into the terminal, and the terminal can then superimpose the sub-image based on the modified playing start point.
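As a sketch of this modification step (field names follow the hypothetical record structure above), the user-edited values simply replace the parsed ones before the overlay is applied:

```python
# Sketch only: apply user edits to a parsed record; the edited record then
# drives the same overlay as in step 903.
def apply_modification(record, changes):
    allowed = {"left", "top", "width", "height", "start", "end"}
    record.update({k: v for k, v in changes.items() if k in allowed})
    return record

rec = {"left": 120, "top": 80, "width": 64, "height": 64,
       "start": 0.0, "end": 2.0}
rec = apply_modification(rec, {"start": 1.0, "left": 200})  # user edits
```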
According to the method provided by the embodiment of the present application, the acquired video material file comprises at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image, so the sub-images can be superimposed, according to their position information, on the target video between the corresponding start point and end point based on the playing start point and playing end point of each sub-image. This improves the accuracy of superimposing sub-images in the target video, avoids decoding a video to obtain the added sub-images in the process of adding material to the target video, saves the time consumed in obtaining the sub-images, improves the efficiency of adding sub-images to the target video, and saves decoding resources.
In addition, according to the method provided by the embodiment of the present application, the terminal can display the position information of each sub-image and the playing start point and playing end point of each sub-image, and the user can modify at least one of these items based on the displayed information. This extends the editing functions for video material files, allows the user to freely control the playing duration of a video material file, and improves the flexibility of adjusting video material files.
The embodiments of fig. 4 and fig. 9 above respectively describe the process in which a terminal produces a video material file and the process in which a terminal synthesizes a video based on a produced video material file to obtain a video with an added display effect. The following embodiment of fig. 11 describes the combined process of producing a video material file and synthesizing a video based on it. Fig. 11 is a flowchart of a method for generating a video material file and synthesizing a video according to an embodiment of the present application. Referring to fig. 11, the method includes:
1101. The first terminal acquires at least one video.
1102. For each image of the video, the first terminal acquires, in response to a sub-image interception operation on the image, the sub-image of the image and the position information of the sub-image in the corresponding image.
1103. The first terminal generates a video material file based on the sub-images and the position information.
1104. The first terminal sends the video material file to the server.
Steps 1101 to 1104 in the embodiment of the present application are similar to steps 401 to 405 described above, and are not described herein again.
1105. The server stores the video material file.
1106. The second terminal acquires the video material file from the server.
1107. The second terminal parses the video material file to obtain at least one sub-image and the position information of each sub-image.
1108. The second terminal synthesizes the at least one sub-image into the target video based on the position information of each sub-image.
The steps 1106-1108 in the embodiment of the present application are similar to the steps 901-903, and are not described herein again.
It should be noted that the first terminal in the embodiment of the present application is a terminal used by a developer, and the developer can perform the above-mentioned operations to generate a video material file. The second terminal is a terminal used by any user; by controlling the second terminal to acquire the video material file, the second terminal can parse the video material file and synthesize the at least one sub-image included in it into a target video.
The embodiment shown in fig. 11 takes as an example the first terminal generating a video material file from at least one video and the second terminal using the generated video material file. The source of the at least one video acquired by the first terminal is described below.
In the related art, if the second terminal needs to add video material to the target video, at least one video is selected and the selected at least one video is synthesized into the target video, thereby achieving the effect of adding material to the target video.
Fig. 12 is a schematic structural diagram of a video material file generation apparatus according to an embodiment of the present application. Referring to fig. 12, the apparatus includes:
a video obtaining module 1201, configured to obtain at least one video;
the capture module 1202 is configured to, for each image of the video, obtain a sub-image of the image and position information of the sub-image in a corresponding image in response to a sub-image capture operation on the image;
and a file generating module 1203, configured to generate a video material file based on the sub-images and the position information, where the video material file is used for being parsed by a terminal and synthesized into a target video.
Optionally, referring to fig. 13, the capture module 1202 comprises:
an image display unit 12021 for displaying an image of any one of the videos;
a selection frame display unit 12022 for displaying an area selection frame on the image of any one of the videos in response to a clipping operation on the image of any one of the videos;
the intercepting unit 12023 is configured to, in response to the dragging operation on the area selection box, intercept the sub-image in the area selection box when the dragging operation is ended, and obtain the sub-image of the image of any one of the videos and the position information of the sub-image in the corresponding image.
Optionally, the image display unit 12021 is configured to display, according to the playing order of each video, after the interception of the previous video is completed, an image of a video whose playing order follows the previous video.
Optionally, the capture module 1202 is configured to perform, in response to a triggering operation on an image capturing option, image recognition on an image of each video, acquire the recognized image as a sub-image of the image, and acquire position information of the recognized image in the corresponding image as position information of the sub-image in the corresponding image.
Optionally, the capture module 1202 is configured to merge the sub-images of the at least two images and the corresponding position information if the sub-images of the at least two images have the same shape but different positions in the images.
Optionally, referring to fig. 13, the apparatus further comprises:
a time determining module 1204, configured to determine a playing start point of a corresponding image of the sub-image in the video to which the corresponding image belongs as a playing start point of the corresponding sub-image;
a time determining module 1204, configured to determine a playing end point of a corresponding image of the sub-image in the video to which the corresponding image belongs as a playing end point of the corresponding sub-image;
the file generating module 1203 is configured to generate a video material file based on the sub-image, the position information, and the playing start point and the playing end point of the sub-image.
Optionally, the playing start point of the corresponding image of the sub-image in the video to which the corresponding image of the sub-image belongs is the playing start time point of the corresponding image of the sub-image in the video to which the corresponding image of the sub-image belongs, or the playing start point is the playing start time stamp of the corresponding image of the sub-image in the video to which the corresponding image of the sub-image belongs;
the playing end point of the corresponding image of the sub-image in the video to which the corresponding image belongs is the playing end time point of the corresponding image of the sub-image in the video to which the corresponding image belongs, or the playing end point is the playing end time stamp of the corresponding image of the sub-image in the video to which the corresponding image belongs.
Optionally, the file generating module 1203 is configured to:
correspondingly storing the name and the position information of the sub-image in a position sub-file;
and compressing the position subfile and the sub-images to generate a video material file.
Optionally, referring to fig. 13, the apparatus further comprises:
the file sending module 1205 is configured to send the video material file to the server, and the server stores the video material file.
It should be noted that: the video material file generating apparatus provided in the above embodiment is only illustrated by dividing the above functional modules when generating a video material file, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the terminal is divided into different functional modules to complete all or part of the above described functions. In addition, the embodiment of the video material file generation apparatus provided in the above embodiment and the embodiment of the video material file generation method belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiment and are not described herein again.
Fig. 14 is a schematic structural diagram of a video compositing apparatus according to an embodiment of the present application. Referring to fig. 14, the apparatus includes:
a file obtaining module 1401, configured to obtain a video material file, where the video material file includes at least one sub-image and position information of each sub-image;
a parsing module 1402, configured to parse the video material file to obtain at least one sub-image and the position information of each sub-image;
and a synthesizing module 1403, configured to synthesize the at least one sub-image into the target video based on the position information of each sub-image.
Optionally, the file obtaining module 1401 is configured to obtain the video material file from a server.
Optionally, the parsing module 1402 is configured to parse the video material file to obtain at least one sub-image, position information of each sub-image, and a playing start point and a playing end point of each sub-image;
the synthesizing module 1403 is configured to superimpose at least one sub-image on the video frame corresponding to the start point and the end point in the target video according to the position information of each sub-image based on the play start point and the play end point of each sub-image.
Optionally, the playing start point of each sub-image is a playing start time point of the corresponding image of each sub-image in the video to which the corresponding image belongs, or the playing start point of each sub-image is a playing start time stamp of the corresponding image of each sub-image in the video to which the corresponding image belongs;
the playing end point of each sub-image is the playing end time point of the corresponding image of each sub-image in the video to which the corresponding image belongs, or the playing end point of each sub-image is the playing end time stamp of the corresponding image of each sub-image in the video to which the corresponding image belongs.
Optionally, referring to fig. 15, the apparatus further comprises:
a display module 1404, configured to display the at least one sub-image, the position information of each sub-image, and the playing start point and playing end point of each sub-image;
a modification module 1405, configured to modify at least one of the playing start point, the playing end point, and the position information of each sub-image in response to a modification operation;
a synthesizing module 1403, configured to superimpose at least one sub-image on the video frame corresponding to the start point and the end point in the target video according to the modified position information based on the modified play start point and play end point of each sub-image and the position information of each sub-image.
It should be noted that: in the video compositing apparatus provided in the above embodiment, when compositing a target video, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the terminal is divided into different functional modules to complete all or part of the above described functions. In addition, the embodiment of the video synthesis apparatus provided in the above embodiment and the embodiment of the video synthesis method belong to the same concept, and specific implementation processes thereof are described in the method embodiment and are not described herein again.
The disclosed embodiments also provide a computer device comprising a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to implement the video material file generation method as in the above embodiments, or to implement the video composition method as in the above embodiments.
Optionally, the computer device is provided as a terminal. Fig. 16 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 1600 may be a portable mobile terminal such as: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
The terminal 1600 includes: a processor 1601, and a memory 1602.
Processor 1601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 1601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 1601 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, the processor 1601 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1602 may include one or more computer-readable storage media, which may be non-transitory. The memory 1602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1602 is used to store at least one program code for execution by the processor 1601 to implement a video material file generation method, or a video composition method, provided by method embodiments herein.
In some embodiments, the terminal 1600 may also optionally include: peripheral interface 1603 and at least one peripheral. Processor 1601, memory 1602 and peripheral interface 1603 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1603 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of a radio frequency circuit 1604, a display 1605, a camera assembly 1606, audio circuitry 1607, a positioning assembly 1608, and a power supply 1609.
Peripheral interface 1603 can be used to connect at least one I/O (Input/Output) related peripheral to processor 1601 and memory 1602. In some embodiments, processor 1601, memory 1602, and peripheral interface 1603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1601, the memory 1602 and the peripheral device interface 1603 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The radio frequency circuit 1604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1604 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1604 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1604 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display 1605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1605 is a touch display screen, the display screen 1605 also has the ability to capture touch signals on or over the surface of the display screen 1605. The touch signal may be input to the processor 1601 as a control signal for processing. At this point, the display 1605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 1605 can be one, disposed on the front panel of the terminal 1600; in other embodiments, the display screens 1605 can be at least two, respectively disposed on different surfaces of the terminal 1600 or in a folded design; in other embodiments, display 1605 can be a flexible display disposed on a curved surface or a folded surface of terminal 1600. Even further, the display 1605 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 1605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1606 is used to capture images or video. Optionally, camera assembly 1606 includes a front camera and a rear camera. The front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1606 can also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1601 for processing or inputting the electric signals to the radio frequency circuit 1604 to achieve voice communication. For stereo sound acquisition or noise reduction purposes, the microphones may be multiple and disposed at different locations of terminal 1600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1601 or the radio frequency circuit 1604 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuit 1607 may also include a headphone jack.
The positioning component 1608 is configured to locate the current geographic location of the terminal 1600 to implement navigation or LBS (Location Based Service). The positioning component 1608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 1609 is used to provide power to the various components of terminal 1600. Power supply 1609 may be alternating current, direct current, disposable or rechargeable. When power supply 1609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1600 also includes one or more sensors 1610. The one or more sensors 1610 include, but are not limited to: acceleration sensor 1611, gyro sensor 1612, pressure sensor 1613, fingerprint sensor 1614, optical sensor 1615, and proximity sensor 1616.
The acceleration sensor 1611 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 1600. For example, the acceleration sensor 1611 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1601 can control the display screen 1605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1611. The acceleration sensor 1611 can also be used to collect motion data of a game or a user.
The gyro sensor 1612 can detect the body direction and rotation angle of the terminal 1600, and can cooperate with the acceleration sensor 1611 to collect the user's 3D actions on the terminal 1600. Based on the data collected by the gyro sensor 1612, the processor 1601 can implement the following functions: motion sensing (such as changing the UI according to the user's tilting operation), image stabilization during shooting, game control, and inertial navigation.
Pressure sensors 1613 may be disposed on the side frames of terminal 1600 and/or underlying display 1605. When the pressure sensor 1613 is disposed on the side frame of the terminal 1600, a user's holding signal of the terminal 1600 can be detected, and the processor 1601 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1613. When the pressure sensor 1613 is disposed at the lower layer of the display 1605, the processor 1601 controls the operability control on the UI interface according to the pressure operation of the user on the display 1605. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1614 is configured to collect a user's fingerprint, and the processor 1601 identifies the user based on the fingerprint collected by the fingerprint sensor 1614, or the fingerprint sensor 1614 identifies the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1614 may be disposed on the front, back, or side of the terminal 1600. When a physical key or a vendor Logo is provided on the terminal 1600, the fingerprint sensor 1614 may be integrated with the physical key or the vendor Logo.
The optical sensor 1615 is used to collect ambient light intensity. In one embodiment, the processor 1601 may control the display brightness of the display screen 1605 based on the ambient light intensity collected by the optical sensor 1615. Specifically, when the ambient light intensity is high, the display luminance of the display screen 1605 is increased; when the ambient light intensity is low, the display brightness of the display screen 1605 is adjusted down. In another embodiment, the processor 1601 may also dynamically adjust the shooting parameters of the camera assembly 1606 based on the ambient light intensity collected by the optical sensor 1615.
The proximity sensor 1616, also referred to as a distance sensor, is disposed on the front panel of the terminal 1600. The proximity sensor 1616 is configured to collect the distance between the user and the front surface of the terminal 1600. In one embodiment, when the proximity sensor 1616 detects that the distance between the user and the front surface of the terminal 1600 gradually decreases, the processor 1601 controls the display 1605 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1616 detects that the distance between the user and the front surface of the terminal 1600 gradually increases, the processor 1601 controls the display 1605 to switch from the off-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 16 is not intended to be limiting of terminal 1600, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Optionally, the computer device is provided as a server. Fig. 17 is a schematic structural diagram of a server according to an exemplary embodiment. The server 1700 may vary greatly depending on configuration or performance, and may include one or more processors (CPUs) 1701 and one or more memories 1702, where the memory 1702 stores at least one program code, and the at least one program code is loaded and executed by the processors 1701 to implement the methods provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server may also include other components for implementing device functions, which are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, where at least one program code is stored in the computer-readable storage medium, and the at least one program code is loaded and executed by a processor to implement the video material file generation method of the foregoing embodiment or to implement the video composition method of the foregoing embodiment.
The embodiments of the present application also provide a computer program product or a computer program, which includes a computer program code stored in a computer-readable storage medium, a processor of a computer device reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer device implements the video material file generation method as described in the above embodiments, or so that the computer device implements the video composition method as described in the above embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application and is not intended to limit the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (17)

1. A method for generating a video material file, the method comprising:
acquiring at least one video;
for each image of the video, responding to a sub-image interception operation of the image, and acquiring a sub-image of the image and position information of the sub-image in the corresponding image;
and generating a video material file based on the sub-image and the position information, wherein the video material file is used for being analyzed by the terminal to be synthesized into a target video.
2. The method according to claim 1, wherein for each image of the video, acquiring a sub-image of the image and position information of the sub-image in the corresponding image in response to a sub-image interception operation on the image comprises:
displaying an image of any one of the videos;
in response to an interception operation on the image of any one of the videos, displaying an area selection frame on the image of any one of the videos;
and in response to a dragging operation on the area selection frame, intercepting the sub-image in the area selection frame when the dragging operation ends, to obtain the sub-image of the image of any one of the videos and the position information of the sub-image in the corresponding image.
3. The method of claim 2, wherein said displaying an image of any of said videos comprises:
and according to the playing sequence of each video, after the interception of the previous video is completed, displaying an image of a video whose playing sequence is after the previous video.
4. The method according to claim 1, wherein for each image of the video, acquiring a sub-image of the image and position information of the sub-image in the corresponding image in response to a sub-image interception operation on the image comprises:
and responding to the triggering operation of an image capturing option, carrying out image recognition on each image of the video, acquiring the recognized image as a sub-image of the image, and acquiring the position information of the recognized image in the corresponding image as the position information of the sub-image in the corresponding image.
5. The method according to claim 1, wherein for each image of the video, acquiring a sub-image of the image and position information of the sub-image in the corresponding image in response to a sub-image interception operation on the image comprises:
and if the shape of the sub-images of the at least two images is the same and the positions of the sub-images in the images are different, merging the sub-images of the at least two images and the corresponding position information.
6. The method according to claim 1, wherein for each image of the video, after acquiring a sub-image of the image and position information of the sub-image in the corresponding image in response to a sub-image interception operation on the image, the method further comprises:
determining the playing starting point of the corresponding image of the sub-image in the video to which the corresponding image belongs as the playing starting point of the corresponding sub-image;
determining the playing end point of the corresponding image of the sub-image in the video to which the corresponding image belongs as the playing end point of the corresponding sub-image;
generating a video material file based on the sub-image and the position information, comprising:
and generating the video material file based on the sub-image, the position information and the playing starting point and the playing ending point of the sub-image.
7. The method according to claim 6, wherein the playing start point of the corresponding image of the sub-image in the video to which the corresponding image belongs is a playing start time point of the corresponding image of the sub-image in the video to which the corresponding image belongs, or the playing start point is a playing start time stamp of the corresponding image of the sub-image in the video to which the corresponding image belongs;
the playing end point of the corresponding image of the sub-image in the video to which the corresponding image belongs is the playing end time point of the corresponding image of the sub-image in the video to which the corresponding image belongs, or the playing end point is the playing end time stamp of the corresponding image of the sub-image in the video to which the corresponding image belongs.
8. The method of claim 1, wherein generating a video material file based on the sub-image and the location information comprises:
correspondingly storing the name and the position information of the sub-image in a position sub-file;
and compressing the position sub-file and the sub-image to generate the video material file.
9. The method of claim 1, wherein after generating a video material file based on the sub-image and the location information, the method further comprises:
and sending the video material file to a server, and storing the video material file by the server.
10. A method for video compositing, the method comprising:
acquiring a video material file, wherein the video material file comprises at least one sub-image and position information of each sub-image;
parsing the video material file to obtain the at least one sub-image and the position information of each sub-image;
and synthesizing the at least one sub-image into a target video based on the position information of each sub-image.
11. The method of claim 10, wherein parsing the video material file to obtain the at least one sub-image and the position information of each sub-image comprises:
parsing the video material file to obtain the at least one sub-image, the position information of each sub-image, and the playing start point and the playing end point of each sub-image;
the synthesizing the at least one sub-image to the target video based on the position information of each sub-image comprises:
and based on the playing start point and the playing end point of each sub-image, superimposing the at least one sub-image on the video picture of the corresponding start point and end point in the target video according to the position information of each sub-image.
12. The method according to claim 11, wherein the playing start point of each sub-image is a playing start time point of the corresponding image of each sub-image in the video to which the corresponding image belongs, or the playing start point of each sub-image is a playing start time stamp of the corresponding image of each sub-image in the video to which the corresponding image belongs;
the playing end point of each sub-image is the playing end time point of the corresponding image of each sub-image in the video to which the corresponding image belongs, or the playing end point of each sub-image is the playing end time stamp of the corresponding image of each sub-image in the video to which the corresponding image belongs.
13. The method of claim 11, wherein after parsing the video material file to obtain the at least one sub-image, the position information of each sub-image, and the playing start point and the playing end point of each sub-image, the method further comprises:
displaying the at least one sub-image, the position information of each sub-image, and the playing start point and the playing end point of each sub-image;
modifying at least one of the playing start point, the playing end point, and the position information of each sub-image in response to a modification operation;
the superimposing, based on the play start point and the play end point of each sub-image, the at least one sub-image on the video frame of the corresponding point in the target video according to the position information of each sub-image, including:
and based on the modified playing start point and playing end point of each sub-image and the position information of each sub-image, superimposing the at least one sub-image on the video picture of the corresponding start point and end point in the target video according to the modified position information.
14. An apparatus for generating a video material file, the apparatus comprising:
the video acquisition module is used for acquiring at least one video;
the intercepting module is used for responding to the intercepting operation of the sub-image of the image for each image of the video and acquiring the sub-image of the image and the position information of the sub-image in the corresponding image;
and the file generation module is used for generating a video material file based on the subimages and the position information, and the video material file is used for being analyzed by the terminal to be synthesized into a target video.
15. A video compositing apparatus, characterized in that the apparatus comprises:
the file acquisition module is used for acquiring a video material file, and the video material file comprises at least one sub-image and the position information of each sub-image;
the parsing module is used for parsing the video material file to obtain the at least one sub-image and the position information of each sub-image;
and the synthesizing module is used for synthesizing the at least one sub-image into the target video based on the position information of each sub-image.
16. A computer device comprising a processor and a memory, the memory having stored therein at least one program code, the at least one program code being loaded into and executed by the processor to implement the video material file generation method of any one of claims 1 to 9 or to implement the video composition method of any one of claims 10 to 13.
17. A computer-readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the video material file generating method according to any one of claims 1 to 9 or to implement the video composing method according to any one of claims 10 to 13.
CN202011633376.8A 2020-12-31 2020-12-31 Video material file generation method, video synthesis method, device and medium Active CN112822544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011633376.8A CN112822544B (en) 2020-12-31 2020-12-31 Video material file generation method, video synthesis method, device and medium

Publications (2)

Publication Number Publication Date
CN112822544A true CN112822544A (en) 2021-05-18
CN112822544B CN112822544B (en) 2023-10-20

Family

ID=75856713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011633376.8A Active CN112822544B (en) 2020-12-31 2020-12-31 Video material file generation method, video synthesis method, device and medium

Country Status (1)

Country Link
CN (1) CN112822544B (en)

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10191248A (en) * 1996-10-22 1998-07-21 Hitachi Denshi Ltd Video editing method and recording medium recording procedure for the same
US20070139516A1 (en) * 2005-09-30 2007-06-21 Lg Electronics Inc. Mobile communication terminal and method of processing image in video communications using the same
US20130235223A1 (en) * 2012-03-09 2013-09-12 Minwoo Park Composite video sequence with inserted facial region
CN104639994A (en) * 2013-11-08 2015-05-20 杭州海康威视数字技术股份有限公司 Video abstraction generating method, system and network storage equipment based on moving objects
JP2015171082A (en) * 2014-03-10 2015-09-28 ブラザー工業株式会社 Image information controller, program and image information control system
CN104023272A (en) * 2014-06-25 2014-09-03 北京奇艺世纪科技有限公司 Video screen editing method and device
CN104080005A (en) * 2014-07-10 2014-10-01 福州瑞芯微电子有限公司 Device and method for clipping dynamic pictures
CN105307051A (en) * 2015-05-04 2016-02-03 维沃移动通信有限公司 Video processing method and device
CN106412702A (en) * 2015-07-27 2017-02-15 腾讯科技(深圳)有限公司 Video clip interception method and device
CN106412691A (en) * 2015-07-27 2017-02-15 腾讯科技(深圳)有限公司 Interception method and device of video images
CN105844256A (en) * 2016-04-07 2016-08-10 广州盈可视电子科技有限公司 Panorama video frame image processing method and device
CN106534618A (en) * 2016-11-24 2017-03-22 广州爱九游信息技术有限公司 Method, device and system for realizing pseudo field interpretation
CN108346171A (en) * 2017-01-25 2018-07-31 阿里巴巴集团控股有限公司 A kind of image processing method, device, equipment and computer storage media
CN109963166A (en) * 2017-12-22 2019-07-02 上海全土豆文化传播有限公司 Online Video edit methods and device
US20200272309A1 (en) * 2018-01-18 2020-08-27 Tencent Technology (Shenzhen) Company Limited Additional object display method and apparatus, computer device, and storage medium
CN109525884A (en) * 2018-11-08 2019-03-26 北京微播视界科技有限公司 Video paster adding method, device, equipment and storage medium based on split screen
US20210029305A1 (en) * 2018-11-29 2021-01-28 Beijing Bytedance Network Technology Co., Ltd. Method and apparatus for adding a video special effect, terminal device and storage medium
WO2020244474A1 (en) * 2019-06-03 2020-12-10 中兴通讯股份有限公司 Method, device and apparatus for adding and extracting video watermark
CN110796712A (en) * 2019-10-24 2020-02-14 北京达佳互联信息技术有限公司 Material processing method, device, electronic equipment and storage medium
CN111145308A (en) * 2019-12-06 2020-05-12 北京达佳互联信息技术有限公司 Paster obtaining method and device
CN110971840A (en) * 2019-12-06 2020-04-07 广州酷狗计算机科技有限公司 Video mapping method and device, computer equipment and storage medium
CN111565338A (en) * 2020-05-29 2020-08-21 广州酷狗计算机科技有限公司 Method, device, system, equipment and storage medium for playing video
CN111726676A (en) * 2020-07-03 2020-09-29 腾讯科技(深圳)有限公司 Image generation method, display method, device and equipment based on video

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113518256A (en) * 2021-07-23 2021-10-19 腾讯科技(深圳)有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN113518256B (en) * 2021-07-23 2023-08-08 腾讯科技(深圳)有限公司 Video processing method, video processing device, electronic equipment and computer readable storage medium
CN114040250A (en) * 2022-01-10 2022-02-11 深圳市麦谷科技有限公司 Video frame capturing method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN112822544B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
US11557322B2 (en) Method and device for generating multimedia resource
CN110572722B (en) Video clipping method, device, equipment and readable storage medium
WO2020253096A1 (en) Method and apparatus for video synthesis, terminal and storage medium
CN108401124B (en) Video recording method and device
CN110545476B (en) Video synthesis method and device, computer equipment and storage medium
CN111246300B (en) Method, device and equipment for generating clip template and storage medium
CN111065001B (en) Video production method, device, equipment and storage medium
CN108965922B (en) Video cover generation method and device and storage medium
CN110868636B (en) Video material intercepting method and device, storage medium and terminal
CN112565911B (en) Bullet screen display method, bullet screen generation device, bullet screen equipment and storage medium
CN111880888B (en) Preview cover generation method and device, electronic equipment and storage medium
CN109982129B (en) Short video playing control method and device and storage medium
CN114546227B (en) Virtual lens control method, device, computer equipment and medium
CN110662105A (en) Animation file generation method and device and storage medium
CN110839174A (en) Image processing method and device, computer equipment and storage medium
CN111083526B (en) Video transition method and device, computer equipment and storage medium
CN112257006A (en) Page information configuration method, device, equipment and computer readable storage medium
CN112770173A (en) Live broadcast picture processing method and device, computer equipment and storage medium
CN112822544B (en) Video material file generation method, video synthesis method, device and medium
CN109819314B (en) Audio and video processing method and device, terminal and storage medium
CN111711838A (en) Video switching method, device, terminal, server and storage medium
CN112866584B (en) Video synthesis method, device, terminal and storage medium
CN113032590A (en) Special effect display method and device, computer equipment and computer readable storage medium
CN112616082A (en) Video preview method, device, terminal and storage medium
CN114554112B (en) Video recording method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant