WO2022061806A1 - Film production method, terminal device, photographing device, and film production system - Google Patents

Film production method, terminal device, photographing device, and film production system

Info

Publication number
WO2022061806A1
WO2022061806A1 (PCT/CN2020/118084, CN2020118084W)
Authority
WO
WIPO (PCT)
Prior art keywords
video
target
information
semantic information
terminal device
Prior art date
Application number
PCT/CN2020/118084
Other languages
French (fr)
Chinese (zh)
Inventor
朱梦龙 (Zhu Menglong)
刘志鹏 (Liu Zhipeng)
朱高 (Zhu Gao)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN202080035038.6A priority Critical patent/CN113841417B/en
Priority to PCT/CN2020/118084 priority patent/WO2022061806A1/en
Publication of WO2022061806A1 publication Critical patent/WO2022061806A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/4223 Cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Definitions

  • the present application relates to the technical field of audio and video processing, and in particular, to a film generation method, a terminal device, a shooting device, a film generation system, and a computer-readable storage medium.
  • Automatic editing provides great convenience for users who need to edit videos. Automatic editing means that the machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film. This process requires no user operation or only simple operation by the user. However, the existing automatic editing is slow and requires users to wait for a long time.
  • embodiments of the present application provide a movie generation method, a terminal device, a shooting device, a movie generation system, and a computer-readable storage medium, so as to solve the existing technical problem of requiring users to wait for a long time for automatic editing.
  • a first aspect of the embodiments of the present application provides a method for generating a movie, applied to a terminal device, including: acquiring semantic information of a target material video, wherein the semantic information at least includes the semantic information of an external material video obtained from a shooting device; determining, according to the semantic information, the video clip information required for generating the movie; acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes a video clip of the external material video obtained from the shooting device; and generating a movie using the target video clip.
  • a second aspect of the embodiments of the present application provides a method for generating a movie, applied to a shooting device, including: acquiring semantic information of a target material video and sending the semantic information to a terminal device; acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and transmitting the target video clip to the terminal device, so that the terminal device generates a movie using the target video clip.
  • a third aspect of the embodiments of the present application provides a terminal device, including: a communication interface for communicating with the photographing device; and a processor configured to acquire semantic information of a target material video, wherein the semantic information at least includes the semantic information of the external material video obtained from the shooting device; determine, according to the semantic information, the video clip information required for generating a movie; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes a video clip of the external material video obtained from the shooting device; and generate a movie using the target video clip.
  • a fourth aspect of the embodiments of the present application provides a photographing device, including: a communication interface for communicating with the terminal device; and a processor configured to acquire semantic information of the target material video, send the semantic information to the terminal device, acquire the video clip information sent by the terminal device, edit the target material video according to the video clip information to obtain a target video clip, and transmit the target video clip to the terminal device, so that the terminal device generates a movie using the target video clip.
  • a fifth aspect of the embodiments of the present application provides a movie generation system, including:
  • the terminal device is used to obtain semantic information of the target material video, wherein the semantic information at least includes the semantic information of the external material video obtained from the shooting device; determine, according to the semantic information, the video segment information required for generating the movie; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes a video clip of the external material video obtained from the shooting device; and generate a movie using the target video clip;
  • the shooting device is configured to acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip; and transmit the target video clip to the terminal device.
  • a sixth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the movie generation method provided in the first aspect.
  • a seventh aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the movie generation method provided in the second aspect.
  • with the movie generation method provided by the embodiments of the present application, the shooting device does not need to first transmit to the terminal device all the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses it to determine the required video clip information; only the target video clips corresponding to that video clip information then need to be obtained from the shooting device, rather than all the target material videos. This greatly reduces the user's waiting time and speeds up automatic editing.
  • FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for generating a movie provided by an embodiment of the present application.
  • FIG. 3 is an interaction diagram of the method for generating a movie provided by an embodiment of the present application.
  • FIG. 4 is another flowchart of the method for generating a movie provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a photographing device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application.
  • Automatic editing provides great convenience for users who need to edit videos. Automatic editing means that the machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film. This process requires no user operation or only simple operation by the user.
  • the automatic clipping function can be implemented in an application program (APP), which can be installed on a terminal device and run on hardware such as a processor and a memory of the terminal device.
  • the material video required for automatic editing is not on the terminal device where automatic editing is performed.
  • the material video may reside on the shooting device that shot it, where the shooting device is a device independent of the terminal device, such as a camera, an action camera, a handheld gimbal camera, or a drone equipped with a camera. Because shooting devices typically have small screens and limited network connectivity, the automatic editing processing is usually carried out on the terminal device.
  • the terminal device can be a mobile phone, a tablet or a personal computer.
  • since the automatic editing process is performed on the terminal device while the material video used for forming the film is stored on another shooting device, the terminal device needs to obtain the required material video from the shooting device during automatic editing.
  • in the existing approach, the shooting device first transmits all the material videos that may be used for film generation to the terminal device, and transmitting all of them takes a long time.
  • FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application.
  • the terminal device can be a mobile phone, a PC, or a tablet computer
  • the shooting device can be an action camera, a gimbal camera, or a drone equipped with a camera.
  • in the existing approach, the action camera transmits all videos (video 1, video 2, video 3, and so on in the figure) shot within a certain period, such as the past day, two days, or three days (this is only an example), to the mobile phone. Although any of the videos shot that day may end up in the finished film, the full set of videos has a large amount of data and takes a long time to transmit, causing inconvenience to users.
  • FIG. 2 is a flowchart of a method for generating a movie provided by an embodiment of the present application. The method includes:
  • the semantic information at least includes: the semantic information of the external material video obtained from the shooting device; the semantic information may include information such as the scene, video theme, video style, camera movement, and whether the footage is blurred.
  • the semantic information of the external material video may be sent by the shooting device to the terminal device.
  • the terminal device sends the video clip information to the shooting device, and then the shooting device acquires a corresponding target video clip based on the video clip information, and transmits the target video clip to the terminal device.
  • the target video clip may be a video clip of a material video captured by the shooting device
  • the video clip information may include time node information of the shooting, or a video number together with a start time and an end time.
  • the target video segment at least includes: the video segment of the external material video acquired from the shooting device.
  • the target material video may be a material video that may be used to generate a movie, for example, the target material video may be all videos shot on the same day, and for another example, the target material video may be all videos shot at the same place.
  • the target material video may at least include an external material video shot by a shooting device. As mentioned above, the shooting device is different from the terminal device. Therefore, the material video shot by the shooting device belongs to the external material video for the terminal device.
  • Semantic information can be obtained by semantic analysis of video content.
  • the semantic analysis of the video content may be implemented by using a machine learning algorithm such as a neural network.
  • the semantic information of the video can include content recognition results for at least one segment or at least one frame of the video. The content recognition results can be of various kinds, such as scene recognition results (e.g., sky, grass, street), character action detection results (e.g., running, walking, standing, jumping), facial expression detection results (e.g., smiling faces, crying faces), target detection results (e.g., animals, cars), composition evaluation results, and aesthetic evaluation results.
  • the semantic information may be a semantic tag, that is, the semantic information may be assigned to the video by tagging the video.
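As an illustrative sketch (the field names and tag vocabulary here are hypothetical, not taken from the application), the per-segment semantic tags of one material video might be represented as tagged time ranges:

```python
# Hypothetical representation of per-segment semantic tags for one
# material video; field names and tag strings are illustrative only.
semantic_info = {
    "video_id": "DJI_0042",
    "segments": [
        {"start": 0.0, "end": 8.5,
         "tags": ["scene:sky", "motion:pan", "aesthetic:0.82"]},
        {"start": 8.5, "end": 20.0,
         "tags": ["scene:street", "action:running", "face:smile"]},
    ],
}

def tags_for(info, t):
    """Return the semantic tags covering timestamp t (in seconds)."""
    for seg in info["segments"]:
        if seg["start"] <= t < seg["end"]:
            return seg["tags"]
    return []
```

Such a structure is small compared with the video itself, which is what makes it cheap to send to the terminal device ahead of any footage.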
  • the semantic information of the external material video can be obtained directly from the shooting device.
  • the semantic information of the external material video need not be produced by analysis on the terminal device itself; it may instead be obtained by the shooting device performing semantic analysis on the external material video.
  • the shooting device can send it to the terminal device, so that the terminal device obtains the semantic information of the external material video.
  • the shooting device can analyze the semantic information of the material video before starting automatic editing.
  • if the computing power of the shooting device is sufficient, the semantic analysis of the material video can be performed simultaneously while the material video is being shot.
  • if the computing power of the shooting device is not sufficient to support semantic analysis while shooting, the shooting device can perform the semantic analysis on the captured material video after shooting is complete, for example while the device is charging.
  • after the semantic information is obtained, the video segment information required to generate the movie can be determined. Specifically, this determination can be made according to a preset film-forming rule in combination with the semantic information of the target material video.
  • the preset film-forming rule can be implemented as an algorithm module, which may be called a film-forming module; given the semantic information, the film-forming module outputs the video segment information required for the film.
  • the film-forming module may be constructed from manually set film-forming rules. For example, drawing on the prior knowledge of professional video editors, a method for selecting suitable video segments when generating a film can be summarized, and a corresponding computer program can be written according to that method to produce the film-forming module.
  • the film-forming module can also be trained using machine learning. For example, multiple groups of sample material videos can be obtained, and professionals can screen each group to select the video clips to be used for filming; the selected video clips and their corresponding sample material video groups then serve as training samples for training a neural network model, yielding a film-forming module based on the neural network model.
  • the film-forming rule includes single-dimensional or multi-dimensional matching based on preset scene combinations, camera movement combinations, themes, and the like.
  • the video clip information may be related information used to generate a target video clip of a movie. In one embodiment, it may be used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip. For example, it may indicate that the target video segment belongs to the 10th-20th second video segment of the target material video X.
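A minimal sketch of such video clip information, using hypothetical field names (the application only requires that a clip's source video and time period be identifiable):

```python
from dataclasses import dataclass

@dataclass
class ClipInfo:
    """Identifies one target video clip: the material video it belongs
    to and the time window to cut, in seconds. Field names are
    illustrative, not taken from the application."""
    video_id: str   # e.g. the video number on the shooting device
    start: float
    end: float

    def duration(self) -> float:
        return self.end - self.start

# e.g. "the 10th-20th second video segment of target material video X"
clip = ClipInfo(video_id="X", start=10.0, end=20.0)
```

Only these few bytes cross the link to the shooting device; the device cuts the clip locally and returns just the cut footage.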
  • a target video clip corresponding to the video clip information can be acquired, so that a movie can be generated by using the acquired target video clip.
  • the target video segment belonging to the external material video can be obtained from the shooting device.
  • with the movie generation method provided by the embodiments of the present application, the shooting device does not need to first transmit to the terminal device all the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses it to determine the required video clip information; only the target video clips corresponding to that video clip information then need to be obtained from the shooting device, rather than all the target material videos. This greatly reduces the user's waiting time and speeds up automatic editing.
  • the determined video clip information may be sent to the shooting device, so that the shooting device can use the received video clip information to edit the target material video indicated by that information; after the target video clip corresponding to the indicated time period has been cut out, it can be transmitted to the terminal device.
  • the material that the user wants to edit into the movie does not necessarily all come from the shooting device. For example, when the user travels somewhere, part of the video may be shot with an action camera or a gimbal camera while another part is shot with a mobile phone, and the user may wish to automatically generate a movie from both.
  • in this case, the target material video within the selection range includes not only the external material video shot by an action camera or other shooting device, but also the local material video shot by the mobile phone or other terminal device. Therefore, in one embodiment, automatic editing can also support mixed cutting, that is, the target material video can also include local material video.
  • the semantic information of the local material video may be obtained by the terminal device performing semantic analysis on the local material video. In another implementation, it may be semantic information carried by the local material video itself: because the local material videos on a terminal device come from rich and diverse sources, for example the Internet, a video may already carry the corresponding semantic information when it is obtained, so the mobile phone does not need to perform semantic analysis on it again.
  • the video segment information determined according to the semantic information of the target material video does not necessarily include the video segment information corresponding to the local material video.
  • for example, the terminal device may determine, according to the semantic information of the local material video, that the shooting quality of the local material video is poor and does not meet the film-forming requirements, so that the determined video segment information only includes video clip information corresponding to the external material video.
  • when the target video clip corresponding to the video clip information includes a video clip of the local material video, that clip can be obtained by editing the local material video according to the video clip information.
  • the automatic editing can support the mixed-cutting function, that is, the movie generated by the automatic editing can also include the local material video of the terminal device, thereby improving the richness of the content of the finished movie.
  • video clip information can also be determined for the local material video according to its semantic information, so that video clips suitable for the film are selected from the local material video; compared with randomly selecting clips from the local material video and inserting them into the movie, this yields a higher-quality finished film.
  • the video segment information can be obtained by inputting the semantic information into the film-forming module.
  • the output of the film-forming module may include the target film template and the video segment information corresponding to each video slot in the target film template.
  • the film template may be a preset template including a plurality of video slots, each of which can be used for importing or inserting a video clip.
  • each film template can have its own characteristics. For example, the video slots can be matched with different textures, text, video special effects, and other elements, where the special effects may include acceleration, deceleration, filters, camera-movement effects, and so on. Different transition effects can be used between adjacent video slots. In addition, different film templates can be matched with different music, and the transition times corresponding to the transition effects can be aligned with the rhythm points of the template's music.
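A film template of the kind described above might be sketched as follows; the structure, field names, and file names are illustrative assumptions, not taken from the application:

```python
# Hypothetical film template: a list of video slots, each with its own
# duration, effect, and wanted semantic tags, plus music and one
# transition between each adjacent pair of slots.
template = {
    "name": "city-trip",
    "music": "upbeat_120bpm.mp3",
    "slots": [
        {"duration": 3.0, "effect": "speed_up", "wants": ["scene:street"]},
        {"duration": 5.0, "effect": None,       "wants": ["scene:sky"]},
    ],
    "transitions": ["crossfade"],  # between slot 0 and slot 1
}

def total_duration(tpl):
    """Total footage the template needs, summed over its slots."""
    return sum(slot["duration"] for slot in tpl["slots"])
```

Keeping the template declarative like this lets a film-forming module score many templates against the same semantic information without touching any footage.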
  • the target film template may be determined from candidate film templates, and the candidate film templates may be determined from a film template library.
  • the film template library can include multiple preset film templates. Since the library may contain a great many templates, when determining the target film template, candidate film templates can first be screened out of the library, and the target film template then determined from among the candidates, reducing the screening workload.
  • the style type of the movie to be generated may be determined according to the semantic information of the target material video.
  • the theme shared by most of the target material videos, such as parent-child, nature, city, or gourmet, can be determined according to their semantic information, and the film templates in the template library can then be screened by that theme to obtain candidate film templates matching it.
  • since different candidate film templates have different features, such as different music, different video slot elements, and different transition effects, priorities can be preset for the different features, and each feature of the candidate film templates can then be matched against the semantic information of the target material video in order of priority, from high to low, to determine the target film template.
  • the semantic information may include the semantic information of different segments in the video
  • various combinations of importing the video segments into the video slots of the candidate film templates can be simulated using the semantic information of the different segments. The score of each combination can be calculated from the degree of matching between each video clip and its video slot and from the smoothness of the transitions between adjacent video slots; the candidate film template of the highest-scoring combination is determined as the target film template, and the video segment information corresponding to each video slot in that template is determined accordingly.
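The combination scoring described above could be sketched as a brute-force search over clip-to-slot assignments. The match metric here is a deliberately crude assumption (fraction of wanted tags present in a clip), not the application's actual scoring, and transition smoothness is omitted for brevity:

```python
from itertools import permutations

def match_score(clip_tags, slot_wants):
    """Crude match metric: fraction of wanted tags present in the clip."""
    if not slot_wants:
        return 1.0
    return sum(t in clip_tags for t in slot_wants) / len(slot_wants)

def best_assignment(clips, slots):
    """Try every ordering of clips over the slots and keep the
    highest-scoring one. clips: list of tag lists; slots: list of
    wanted-tag lists. Brute force is only feasible for small counts."""
    best, best_score = None, -1.0
    for perm in permutations(clips, len(slots)):
        score = sum(match_score(c, s) for c, s in zip(perm, slots))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

clips = [["scene:sky"], ["scene:street"]]
slots = [["scene:street"], ["scene:sky"]]
best, score = best_assignment(clips, slots)
```

A realistic film-forming module would replace the exhaustive search with a greedy or learned assignment, since the number of permutations grows factorially with the clip count.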
  • after the target film template is determined, the target video clip corresponding to the video clip information can be obtained and imported into the corresponding video slot of the target film template, thereby generating the film.
  • the target material video is a material video that may be used to generate the movie; it is not necessarily all of the currently stored material videos, but can be filtered from the stored material videos according to set conditions.
  • the set conditions may be one or more of time, place, character information, scene information, etc. After the target material video is screened out through the set conditions, the semantic information of the target material video can be obtained.
  • the time condition can be the current day, the last two days, the last week, from date A to date B, and so on; the location condition can be a scenic spot, city, country, home, company, and so on.
  • the character condition can be a specific person such as Xiao Ming, or an abstract category such as male, female, old, or young; the scene condition can be an environment such as daytime, night, or rain, a venue such as a street or scenic area, or an object such as a bus or the sky.
  • the set condition is the current day, the target material video may include all videos shot on that day. If the set condition is location A, the target material video may include all videos shot at location A. If the set condition is to include Xiao Ming, the target material video can be all videos that include Xiao Ming, and if the set condition is a street, the target material video can be all videos that include streets.
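A sketch of screening stored material videos by set conditions; the record layout and condition names below are hypothetical:

```python
from datetime import date

# Hypothetical stored-material records; the filter conditions (time,
# place, person) mirror the examples given in the text.
materials = [
    {"id": "v1", "date": date(2020, 9, 21), "place": "A", "people": ["Xiao Ming"]},
    {"id": "v2", "date": date(2020, 9, 20), "place": "B", "people": []},
]

def select_target_materials(videos, on_date=None, place=None, person=None):
    """Keep only videos matching every condition that is set;
    unset conditions do not filter anything."""
    out = []
    for v in videos:
        if on_date and v["date"] != on_date:
            continue
        if place and v["place"] != place:
            continue
        if person and person not in v["people"]:
            continue
        out.append(v)
    return out
```

The same function could run independently on the shooting device (over external material) and on the terminal device (over local material), as the text describes.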
  • the target material video may include external material video on the shooting device, and may also include local material video on the terminal device
  • the target material video can be screened independently on the shooting device and the terminal device.
  • the conditions for filtering the target material video can be set by the user, for example, the filtering conditions set by the user can be obtained by interacting with the user before automatic editing.
  • the terminal device and the shooting device may also have their own default filter conditions, so that automatic editing can be started directly, and a movie can be automatically generated without the user's perception, giving the user a certain sense of surprise.
  • with the movie generation method provided by the embodiments of the present application, the shooting device does not need to first transmit to the terminal device all the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses it to determine the required video clip information; only the target video clips corresponding to that video clip information then need to be obtained from the shooting device, rather than all the target material videos. This greatly reduces the user's waiting time and speeds up automatic editing.
  • FIG. 3 is an interaction diagram of the method for generating a movie provided by an embodiment of the present application.
  • the shooting device may complete the semantic analysis of its local material video A in advance (S300); "local" here means local to the photographing device.
  • the shooting device and the terminal device can respectively determine the target material video according to their respective set conditions (S310a and S310b).
  • the target material video determined by the shooting device may be referred to as target material video a, and the target material video determined by the terminal device as target material video b.
  • the terminal device may perform semantic analysis on the target material video b to obtain semantic information of the target material video b (S320).
  • the shooting device may send the semantic information of the target material video a to the terminal device (S330).
  • according to the semantic information, the video segment information a belonging to target material video a and the video segment information b belonging to target material video b can be determined (S340).
  • the video clip information a is sent to the shooting device (S350), so that the shooting device can edit the corresponding target material video a according to it to obtain target video clip a (S360a); the video clip information b is used by the terminal device to edit target material video b (S360b) to obtain target video clip b.
  • the shooting device may transmit the target video clip a to the terminal device (S370), and the terminal device imports the target video clip a and the target video clip b into the target film template, thereby generating a final movie (S380).
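The S330-S380 exchange can be sketched end to end; note that only semantic information, clip information, and the cut clips cross the link, never full material videos. All class and method names here are illustrative, and the clip-selection policy is a trivial stand-in:

```python
# Sketch of the shooting-device / terminal-device exchange.
class ShootingDevice:
    def __init__(self, videos):
        self.videos = videos            # {video_id: semantic_info}

    def send_semantics(self):           # S330: semantics only, no footage
        return self.videos

    def cut(self, clip_infos):          # S360a: edit locally, return clips
        return [f"clip({vid},{s}-{e})" for vid, s, e in clip_infos]

class TerminalDevice:
    def decide_clips(self, semantics):  # S340: stand-in policy that just
        # requests the first 5 s of every video
        return [(vid, 0.0, 5.0) for vid in sorted(semantics)]

    def assemble(self, clips):          # S380: import clips into the film
        return " + ".join(clips)

cam = ShootingDevice({"a1": {}, "a2": {}})
phone = TerminalDevice()
infos = phone.decide_clips(cam.send_semantics())  # S330 + S340
movie = phone.assemble(cam.cut(infos))            # S350 + S360a + S370 + S380
```

A real implementation would plug the film-forming module into `decide_clips` and run the two sides over the device link rather than in one process.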
  • the semantic information of the captured material can be transmitted back to the remote control terminal (including the remote controller and a mobile phone) in real time, while the captured material itself is stored locally.
  • the UAV is triggered to shoot automatically, and its flight trajectory and attitude are adjusted based on preset rules to obtain the target shooting material.
  • an initial preview video can be generated for the user to preview.
  • the target shooting material is acquired in response to the film synthesis operation, and a final film is synthesized based on the target material and the local material.
  • FIG. 4 is another flowchart of the method for generating a movie provided by an embodiment of the present application.
  • the method can be applied to a photographing device, and the method includes:
  • the semantic information is used for the terminal device to determine the video segment information required for generating the movie
  • S430 Acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip.
  • the target video segment is used by the terminal device to generate a movie.
  • the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
  • the semantic analysis is performed during the shooting process of the target material video.
  • the semantic analysis is performed during the charging process.
  • the method before the acquiring the semantic information of the target material video, the method further includes:
  • the target material video is filtered from the stored material video.
  • the set condition is a preset default condition.
  • the set condition is set by a user.
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
  • the movie generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
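The metadata-first, clips-on-demand flow summarized above can be sketched as follows. The `ShootingDevice` class and its methods are stand-ins for whatever link (Wi-Fi, USB, a radio link) actually connects the two devices; none of these names come from the application.

```python
# Sketch of the terminal-side flow: pull semantic info first,
# decide which clips are needed, then fetch only those clips.

class ShootingDevice:
    def __init__(self, videos):
        self.videos = videos  # video_id -> {"tags": [...]}

    def get_semantic_info(self):
        # Cheap: only metadata crosses the link, never full video.
        return {vid: v["tags"] for vid, v in self.videos.items()}

    def cut_clip(self, video_id, start_s, end_s):
        # Expensive step, but only for the requested span.
        return f"{video_id}[{start_s}-{end_s}s]"

def generate_movie(device, wanted_tag):
    semantics = device.get_semantic_info()
    # Determine clip info from semantics alone (no video transferred yet).
    clip_infos = [(vid, 0.0, 5.0) for vid, tags in semantics.items()
                  if wanted_tag in tags]
    clips = [device.cut_clip(*ci) for ci in clip_infos]
    return " + ".join(clips)

device = ShootingDevice({
    "v1": {"tags": ["beach", "person"]},
    "v2": {"tags": ["indoor"]},
})
print(generate_movie(device, "beach"))  # v1[0.0-5.0s]
```

The key property is that `cut_clip` is only ever called for spans the terminal has already decided to use.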
  • FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device includes:
  • a communication interface 510 for communicating with the photographing device
  • the semantic information at least includes: the semantic information of the external material video obtained from the shooting device;
  • the target video clip at least includes: a video clip of the external material video obtained from the shooting device;
  • a movie is generated using the target video segment.
  • when acquiring the target video clip corresponding to the video clip information, the processor is configured to send the video clip information to the shooting device and then receive, from the shooting device, the target video clip obtained by clipping the external material video according to the video clip information.
  • the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the target material video further includes: local material video.
  • the semantic information of the local material video is obtained in the following manner:
  • Semantic analysis is performed on the local material video to obtain semantic information of the local material video.
  • the target video segment further includes: a video segment of the local material video;
  • when acquiring the target video segment corresponding to the video segment information, the processor is configured to edit the local material video according to the video segment information to obtain the video segment of the local material video.
  • the semantic information of the external material video is obtained by the shooting device performing semantic analysis on the external material video.
  • when determining, according to the semantic information, the video clip information required for generating a movie, the processor is configured to determine, according to the semantic information, a target movie template and the video clip information corresponding to each video slot in the target movie template.
  • the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  • the candidate film templates are determined in the following manner:
  • the candidate film templates are screened from the film template library.
  • when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the processor is configured to use the semantic information to calculate the matching degree between the video clips in the target material video and each video slot in the candidate film templates, and to calculate the smoothness of the video transitions between adjacent video slots; and to determine, according to the matching degree and the smoothness, the target film template and the target video clips corresponding to each video slot in the target film template.
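One plausible reading of the matching-degree-plus-smoothness selection above is sketched below: score each candidate template by how well available clips fit its slots, reward smooth transitions between the clips chosen for adjacent slots, and keep the best template. The scoring functions here are invented placeholders, not the application's actual metrics.

```python
# Toy template selection: match clips to slots, reward smooth transitions.

def match_degree(clip_tags, slot_tags):
    # Fraction of the slot's requirements satisfied by the clip.
    return len(set(clip_tags) & set(slot_tags)) / max(len(slot_tags), 1)

def smoothness(clip_a, clip_b):
    # Placeholder: transitions between clips sharing a tag look smoother.
    return 1.0 if set(clip_a) & set(clip_b) else 0.0

def score_template(template_slots, clips):
    # Greedily assign the best-matching clip to each slot.
    chosen = [max(clips, key=lambda c: match_degree(c, slot))
              for slot in template_slots]
    match = sum(match_degree(c, s) for c, s in zip(chosen, template_slots))
    smooth = sum(smoothness(a, b) for a, b in zip(chosen, chosen[1:]))
    return match + smooth, chosen

clips = [["beach", "person"], ["sunset", "beach"], ["indoor"]]
candidates = {
    "travel": [["beach"], ["sunset"]],
    "home":   [["indoor"], ["person"]],
}
best = max(candidates, key=lambda t: score_template(candidates[t], clips)[0])
print(best)  # travel
```

Because the clips chosen for "travel" share the "beach" tag, that template gets the extra smoothness reward and wins the tie on match degree.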
  • the target film template includes one or more of the following contents: music, transition effects, textures, and video special effects.
  • when generating a movie by using the target video clip, the processor is configured to import the target video clip into the video slot corresponding to the target movie template to generate the movie.
  • the target material video is automatically selected from the stored material video according to a preset condition.
  • the target material video is selected from the stored material video according to a condition set by the user.
  • the conditions include one or more of the following: time, location, character information, and scene information.
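A minimal sketch of such condition-based screening follows: keep only the stored material videos whose metadata matches whichever time/location/person conditions were actually set. The metadata field names are assumptions made for illustration.

```python
from datetime import date

# Each stored material video carries simple metadata; the filter keeps
# those matching every condition that was set (None means "not set").
videos = [
    {"id": "v1", "day": date(2020, 9, 20), "place": "beach", "person": "Ann"},
    {"id": "v2", "day": date(2020, 9, 21), "place": "beach", "person": "Bob"},
    {"id": "v3", "day": date(2020, 9, 21), "place": "city",  "person": "Ann"},
]

def select_targets(videos, day=None, place=None, person=None):
    def ok(v):
        return ((day is None or v["day"] == day) and
                (place is None or v["place"] == place) and
                (person is None or v["person"] == person))
    return [v["id"] for v in videos if ok(v)]

# e.g. "everything shot on 2020-09-21 at the beach"
print(select_targets(videos, day=date(2020, 9, 21), place="beach"))  # ['v2']
```

The same shape works whether the conditions are preset defaults or set by the user, as the surrounding claims allow.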
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
  • the terminal device provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
  • FIG. 6 is a schematic structural diagram of a photographing device provided by an embodiment of the present application.
  • the shooting device includes:
  • the camera 610 is used for shooting material video
  • the target video clip is transmitted to the terminal device, so that the terminal device generates a movie using the target video clip.
  • the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
  • the semantic analysis is performed during the shooting process of the target material video.
  • the semantic analysis is performed during the charging process.
  • the processor is further configured to, before acquiring the semantic information of the target material video, filter out the target material video from the stored material video according to a set condition.
  • the set condition is a preset default condition.
  • the set condition is set by a user.
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
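To make the semantic information listed above concrete, a per-video record might look like the following. Every field name here is illustrative only, not a format defined by the application.

```python
# Hypothetical semantic-information record for one material video,
# covering the result types named in the claims.
semantic_info = {
    "video_id": "DJI_0042.MP4",
    "scene": "beach",                  # scene recognition result
    "actions": ["running", "waving"],  # character action detection
    "expressions": ["smiling"],        # character expression detection
    "objects": ["person", "dog"],      # target (object) detection
    "composition_score": 0.82,         # composition evaluation
    "aesthetic_score": 0.76,           # aesthetic evaluation
}

# Such a record is tiny compared with the video itself, which is why
# transmitting it first is cheap.
print(semantic_info["scene"])  # beach
```

A record like this can double as a semantic tag set: flat labels ("beach", "smiling") that the terminal can match against template slots.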
  • the photographing device includes a movable platform, a camera, or a gimbal camera.
  • the shooting device provided by the embodiments of the present application does not need to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, it first sends the semantic information of the target material videos to the terminal device, so that the terminal device can use the semantic information to determine the required video clip information and send that video clip information back to the shooting device. The shooting device then transmits only the target video clips corresponding to the video clip information to the terminal device, rather than all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
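On the shooting-device side, "edit the target material video according to the video clip information" can be as simple as cutting the requested span out of the stored video before transmission. A toy sketch, in which lists of integers stand in for decoded frames and the frame rate is an assumed constant:

```python
# Device-side sketch: the shooting device holds full material videos
# and returns only the frames covered by the requested clip info.

FPS = 30  # assumed frame rate for illustration

material = {
    # video_id -> list of frames (integers stand in for real frames)
    "v1": list(range(10 * FPS)),  # a 10-second video
}

def cut_clip(video_id, start_s, end_s):
    # Convert the requested time period into a frame range and slice it.
    frames = material[video_id]
    return frames[int(start_s * FPS):int(end_s * FPS)]

clip = cut_clip("v1", 2.0, 3.0)
print(len(clip))  # 30 frames = one second at 30 fps
```

A real device would cut on keyframe boundaries in the encoded stream rather than decoded frames, but the interface — video id plus time period in, clip out — is the same.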
  • FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application.
  • the system includes:
  • the terminal device 710 is configured to acquire semantic information of the target material video, where the semantic information at least includes: the semantic information of the external material video acquired from the shooting device; determine, according to the semantic information, the video clip information required for generating the film; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and generate a movie by using the target video clip;
  • the shooting device 720 is configured to acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip; and transmit the target video clip to the terminal device.
  • the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the target material video further includes: a local material video.
  • the terminal device is further configured to locally acquire semantic information of the local material video.
  • the terminal device is further configured to perform semantic analysis on the local material video to obtain semantic information of the local material video.
  • the target video segment further includes: a video segment of the local material video;
  • the terminal device is further configured to edit the local material video according to the video segment information to obtain a video segment of the local material video.
  • the semantic information of the external material video is obtained by semantic analysis of the external material video by the shooting device.
  • the semantic analysis is performed during the shooting process of the target material video.
  • the semantic analysis is performed during the charging process.
  • when determining, according to the semantic information, the video clip information required for generating a movie, the terminal device is configured to determine, according to the semantic information, a target movie template and the video clip information corresponding to each video slot in the target movie template.
  • the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  • the candidate film templates are determined in the following manner:
  • the candidate film templates are screened from the film template library.
  • when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the terminal device is configured to use the semantic information to calculate the matching degree between the video clips in the target material video and each video slot in the candidate film templates, and to calculate the smoothness of the video transitions between adjacent video slots; and to determine, according to the matching degree and the smoothness, the target film template and the target video clips corresponding to each video slot in the target film template.
  • the target film template includes one or more of the following contents: music, transition effects, textures, and video special effects.
  • when generating a movie by using the target video clip, the terminal device is configured to import the target video clip into the video slot corresponding to the target movie template to generate the movie.
  • the target material video is automatically selected from the stored material video according to a preset condition.
  • the target material video is selected from the stored material video according to a condition set by the user.
  • the conditions include one or more of the following: time, location, character information, and scene information.
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
  • the movie generation system does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a terminal device provided by the embodiments of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a shooting device provided by the embodiments of the present application.
  • Embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein.
  • Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be accomplished by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.


Abstract

Embodiments of the present application disclose a film production method, comprising: obtaining semantic information of a target material video, the semantic information at least comprising semantic information of an external material video obtained from a photographing device; determining, according to the semantic information, video clip information required by film production; obtaining target video clips corresponding to the video clip information, wherein the target video clips at least comprise video clips of the external material video obtained from the photographing device; and producing a film by using the target video clips. The method disclosed in the embodiments of the present application can solve the technical problem that existing automatic editing requires long waiting time of users.

Description

Movie generation method, terminal device, shooting device and movie generation system

Technical Field

The present application relates to the technical field of audio and video processing, and in particular, to a movie generation method, a terminal device, a shooting device, a movie generation system, and a computer-readable storage medium.

Background

Automatic editing provides great convenience for users who need to edit movies. Automatic editing means that a machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film; this process requires no user operation, or only simple operations by the user. However, existing automatic editing is slow to produce a film and requires users to wait for a long time.
Summary of the Invention

In view of this, embodiments of the present application provide a movie generation method, a terminal device, a shooting device, a movie generation system, and a computer-readable storage medium, so as to solve the existing technical problem that automatic editing requires users to wait for a long time.

A first aspect of the embodiments of the present application provides a movie generation method, including:

acquiring semantic information of a target material video, the semantic information at least including: semantic information of an external material video acquired from a shooting device;

determining, according to the semantic information, video clip information required for generating a movie;

acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and

generating a movie by using the target video clip.
A second aspect of the embodiments of the present application provides a movie generation method, including:

acquiring semantic information of a target material video;

sending the semantic information to a terminal device, wherein the semantic information is used by the terminal device to determine video clip information required for generating a movie;

acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and

transmitting the target video clip to the terminal device, so that the terminal device generates a movie by using the target video clip.
A third aspect of the embodiments of the present application provides a terminal device, including:

a communication interface for communicating with a shooting device; and

a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:

acquiring semantic information of a target material video, the semantic information at least including: semantic information of an external material video acquired from the shooting device;

determining, according to the semantic information, video clip information required for generating a movie;

acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and

generating a movie by using the target video clip.
A fourth aspect of the embodiments of the present application provides a shooting device, including:

a camera for shooting material videos;

a communication interface for communicating with a terminal device; and

a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:

acquiring semantic information of a target material video;

sending the semantic information to the terminal device, wherein the semantic information is used by the terminal device to determine video clip information required for generating a movie;

acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and

transmitting the target video clip to the terminal device, so that the terminal device generates a movie by using the target video clip.
A fifth aspect of the embodiments of the present application provides a movie generation system, including:

a terminal device, configured to acquire semantic information of a target material video, the semantic information at least including: semantic information of an external material video acquired from a shooting device; determine, according to the semantic information, video clip information required for generating a movie; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and generate a movie by using the target video clip; and

a shooting device, configured to acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip; and transmit the target video clip to the terminal device.
A sixth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method provided in the first aspect.

A seventh aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method provided in the second aspect.

The movie generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
Brief Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application.

FIG. 2 is a flowchart of a movie generation method provided by an embodiment of the present application.

FIG. 3 is an interaction diagram of a movie generation method provided by an embodiment of the present application.

FIG. 4 is another flowchart of a movie generation method provided by an embodiment of the present application.

FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

FIG. 6 is a schematic structural diagram of a shooting device provided by an embodiment of the present application.

FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application.
Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.

With the development of Internet technology, people are more and more keen to share and record their own lives. People shoot with mobile phones, cameras and other devices, and edit the captured material into movies to share on social platforms. However, editing a watchable movie takes users a lot of time: suitable clips must be selected from the videos, music matching the video content must be chosen, transition timing needs to match the rhythm of the music, and so on.

Automatic editing provides great convenience for users who need to edit movies. Automatic editing means that a machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film; this process requires no user operation, or only simple operations by the user. The automatic editing function can be implemented in an application (APP), which can be installed on a terminal device and run on hardware such as the processor and memory of the terminal device.

In some cases, the material videos required for automatic editing are not on the terminal device performing the automatic editing. For example, a material video may be on the shooting device that shot it, the shooting device being another device independent of the terminal device, such as a camera, an action camera, a handheld gimbal camera, or a drone equipped with a camera. Since shooting devices usually have small screens and inconvenient network access, automatic editing is often performed on the terminal device. The terminal device may be a mobile phone, a tablet, or a personal computer.

Since the automatic editing is performed at the terminal device while the material videos used for the film are stored on another shooting device, the terminal device needs to obtain the required material videos from the shooting device during automatic editing. In the related art, the shooting device first transmits all material videos that may be used for the film to the terminal device, which takes a lot of time.

For ease of understanding, reference may be made to FIG. 1, which is a schematic diagram of a scenario provided by an embodiment of the present application. In the example of FIG. 1, the terminal device may be a mobile phone, a PC, or a tablet computer, and the shooting device may be an action camera, a gimbal camera, or a drone equipped with a camera. During automatic editing, the action camera transmits all videos shot on that day (or over the past two days, three days, or some other period; this is only an example), such as video 1, video 2, video 3, etc. in the figure, to the mobile phone. Although all videos shot that day may serve as film material, they amount to a large volume of data, which takes a lot of time to transmit and causes inconvenience to users.
为解决上述问题,本申请实施例提供了一种影片生成方法,该方法可以应用于终端设备。可以参考图2,图2是本申请实施例提供的影片生成方法的流程图。该方法包括:To solve the above problem, an embodiment of the present application provides a method for generating a movie, and the method can be applied to a terminal device. Referring to FIG. 2 , FIG. 2 is a flowchart of a method for generating a movie provided by an embodiment of the present application. The method includes:
S210、获取目标素材视频的语义信息。S210. Acquire semantic information of the target material video.
所述语义信息至少包括:从拍摄设备获取的外部素材视频的语义信息;所述语义信息可以包括景别、视频主题、视频风格、运镜、是否模糊等信息。所述外部素材视频的语义信息可以由拍摄设备向所述终端设备发送。The semantic information at least includes: semantic information of external material videos obtained from the shooting device. The semantic information may include information such as shot scale, video theme, video style, camera movement, and whether the footage is blurred. The semantic information of the external material videos may be sent by the shooting device to the terminal device.
S220、根据所述语义信息,确定生成影片所需的视频片段信息。S220. Determine video segment information required for generating a movie according to the semantic information.
S230、获取与所述视频片段信息对应的目标视频片段。S230. Acquire a target video segment corresponding to the video segment information.
所述终端设备向所述拍摄设备发送所述视频片段信息,然后所述拍摄设备基于所述视频片段信息获取对应的目标视频片段,并将所述目标视频片段传输给所述终端设备。其中,所述目标视频片段可以为所述拍摄设备拍摄到的素材视频的视频片段,所述视频片段信息可以包括拍摄的时间节点信息或视频编号、起始时间和终止时间。The terminal device sends the video clip information to the shooting device; the shooting device then acquires the corresponding target video clip based on the video clip information and transmits the target video clip to the terminal device. The target video clip may be a clip of a material video captured by the shooting device, and the video clip information may include time node information of the shooting, or a video number together with a start time and an end time.
其中,所述目标视频片段至少包括:从所述拍摄设备获取的所述外部素材视频的视频片段。Wherein, the target video segment at least includes: the video segment of the external material video acquired from the shooting device.
S240、利用所述目标视频片段生成影片。S240. Generate a movie by using the target video segment.
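Steps S210 to S240 on the terminal side can be sketched as follows. This is a hypothetical illustration only: the device interface, the function names, and the tag-based selection rule are all assumptions made for this sketch, not part of the disclosed embodiment.

```python
# Hypothetical sketch of the terminal-side flow S210-S240; the StubDevice
# interface and the tag-based selection rule are illustrative assumptions.

class StubDevice:
    """Stands in for the shooting device reached over a communication link."""
    def __init__(self, semantic_info):
        self.semantic_info = semantic_info
    def get_semantic_info(self):
        return self.semantic_info            # S210: only metadata crosses the link
    def cut(self, video, start, end):
        return (video, start, end)           # S230: only the needed clip crosses

def select_clip_info(semantic_info, wanted_tags):
    # S220: keep segments whose tag matches the preset film-forming rules
    return [s for s in semantic_info if s["tag"] in wanted_tags]

def make_film(device, wanted_tags):
    info = device.get_semantic_info()                        # S210
    clip_info = select_clip_info(info, wanted_tags)          # S220
    clips = [device.cut(c["video"], c["start"], c["end"])    # S230
             for c in clip_info]
    return clips                                             # S240 (placeholder)
```

The point of the sketch is the bandwidth asymmetry: `get_semantic_info` transfers only lightweight metadata, and full video data crosses the link only in `cut`, and only for the selected segments.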
目标素材视频可以是可能用于生成影片的素材视频,比如,目标素材视频可以是同一天拍摄的所有视频,又比如,目标素材视频可以是同一地点拍摄的所有视频。其中,目标素材视频至少可以包括拍摄设备拍摄的外部素材视频。如前文所述,拍摄设备是区别于终端设备的其它设备,因此,拍摄设备所拍摄的素材视频对于终端设备而言属于外部素材视频。The target material video may be a material video that may be used to generate a movie, for example, the target material video may be all videos shot on the same day, and for another example, the target material video may be all videos shot at the same place. Wherein, the target material video may at least include an external material video shot by a shooting device. As mentioned above, the shooting device is different from the terminal device. Therefore, the material video shot by the shooting device belongs to the external material video for the terminal device.
语义信息,可以通过对视频内容进行语义分析得到。在一种实施方式中,对视频内容进行语义分析可以利用神经网络等机器学习算法实现。视频的语义信息可以包括该视频至少一个片段或至少一帧的内容识别结果,内容识别结果可以有多种,比如可以是场景识别结果(如天空、草地、街道等)、人物动作检测结果(如跑步、行走、站立、跳跃等)、人物表情检测结果(如笑脸、哭脸等)、目标检测结果(如动物、汽车等)、构图评价结果、美学评价结果等。换言之,通过视频的语义信息,即可确定该视频所包含的内容。在一种实施方式中,语义信息可以是语义标签,即可以通过对视频打标签的做法将语义信息赋予该视频。Semantic information can be obtained by performing semantic analysis on the video content. In one embodiment, the semantic analysis of the video content may be implemented using machine learning algorithms such as neural networks. The semantic information of a video may include content recognition results for at least one segment or at least one frame of the video, and there can be many kinds of content recognition results, such as scene recognition results (e.g., sky, grass, street), character action detection results (e.g., running, walking, standing, jumping), facial expression detection results (e.g., smiling face, crying face), object detection results (e.g., animals, cars), composition evaluation results, and aesthetic evaluation results. In other words, the content of a video can be determined from its semantic information. In one embodiment, the semantic information may take the form of semantic tags; that is, semantic information may be attached to a video by tagging it.
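As a concrete illustration, per-segment semantic information of the kind described above might be represented as follows; the field names, tag values, and scores are hypothetical, invented for this sketch rather than mandated by the embodiments.

```python
# Hypothetical per-segment semantic information; all field names and
# values are illustrative only.
semantic_info = {
    "video_id": "V1",
    "segments": [
        {"start": 0.0, "end": 5.0, "scene": "sky",
         "action": None, "aesthetic_score": 0.81},
        {"start": 5.0, "end": 12.0, "scene": "street",
         "action": "running", "aesthetic_score": 0.62},
    ],
}

def tags_of(info):
    """Collect the recognition results of each segment as a set of tags."""
    return [{v for v in (s["scene"], s["action"]) if v}
            for s in info["segments"]]
```

Represented this way, a whole video is summarized by a few hundred bytes of metadata, which is what makes sending semantic information ahead of the video itself worthwhile.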
需要注意的是,在获取目标素材视频的语义信息时,对于外部素材视频的语义信息,可以从拍摄设备直接获取。换言之,外部素材视频的语义信息可以不是本端的终端设备分析出来的,而是拍摄设备对外部素材视频进行语义分析得到的。拍摄设备在分析出外部素材视频的语义信息后可以发送给终端设备,从而终端设备获取到外部素材视频的语义信息。It should be noted that, when acquiring the semantic information of the target material video, the semantic information of the external material video can be obtained directly from the shooting device. In other words, the semantic information of the external material video may not be analyzed by the terminal device at the local end, but obtained by the shooting device through semantic analysis of the external material video. After analyzing the semantic information of the external material video, the shooting device can send it to the terminal device, so that the terminal device obtains the semantic information of the external material video.
考虑到对素材视频进行语义分析也需要占用一定的时间,因此,可以使拍摄设备在开始自动剪辑之前就分析出素材视频的语义信息。在一种实施方式中,若拍摄设备的算力充足,则可以在素材视频拍摄的过程中同时进行素材视频的语义分析。在一种实施方式中,若拍摄设备的算力不足以支持一边拍摄一边进行语义分析,则可以使拍摄设备在素材视频的拍摄结束后再对所拍摄的素材视频进行语义分析,比如可以在充电过程中进行语义分析。Considering that semantic analysis of material videos also takes a certain amount of time, the shooting device can be made to analyze the semantic information of the material videos before automatic editing starts. In one embodiment, if the shooting device has sufficient computing power, the semantic analysis can be performed while the material video is being shot. In one embodiment, if the computing power of the shooting device is insufficient to support semantic analysis during shooting, the shooting device can perform the semantic analysis after the shooting of the material video has finished, for example while the device is charging.
利用目标素材视频的语义信息,可以确定生成影片所需的视频片段信息。具体的,在确定生成影片所需的视频片段信息时,可以根据预设的成片规则,结合目标素材视频的语义信息进行确定。预设的成片规则在实施时可以是一个算法模块,该算法模块可以称为成片模块,通过将各个目标素材视频的语义信息输入到成片模块中,成片模块可以输出生成影片所需的视频片段对应的视频片段信息。Using the semantic information of the target material videos, the video clip information required for generating the film can be determined. Specifically, this determination can be made according to preset film-forming rules combined with the semantic information of the target material videos. In implementation, the preset film-forming rules may take the form of an algorithm module, which may be called the film-forming module; by inputting the semantic information of each target material video into the film-forming module, the module can output the video clip information corresponding to the video clips required for generating the film.
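A manually set film-forming rule of the simplest kind might, for instance, ask for one segment per shot scale in a preset order. The sketch below assumes that idea; its data layout and function names are invented for illustration and real film-forming modules may be far more elaborate, or learned.

```python
# A minimal hand-written film-forming rule as a sketch: pick the first
# unused segment matching each required shot scale, in order. Names and
# data layout are illustrative assumptions.

def film_module(segments, required_scales=("wide", "medium", "close")):
    """segments: list of dicts with 'video', 'start', 'end', 'scale'.
    Returns clip info for one segment per required shot scale."""
    picked = []
    used = set()
    for scale in required_scales:
        for i, seg in enumerate(segments):
            if i not in used and seg["scale"] == scale:
                picked.append({"video": seg["video"],
                               "start": seg["start"], "end": seg["end"]})
                used.add(i)
                break
    return picked
```

Note that the module consumes only semantic information and emits only clip descriptors; at no point does it need the pixel data of the material videos.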
关于成片模块,在具体实现时有多种实施方式。在一种实施方式中,成片模块可以是基于人工设定的成片规则搭建的。比如,可以依靠专业人员的影片剪辑先验,总结出关于生成影片时如何挑选合适的视频片段的方法,从而可以根据总结出的该方法编写相应的计算机程序,以生成成片模块。在另一种实施方式中,可以通过机器学习技术训练出成片模块。比如,可以获取多组样本素材视频,通过专业人员对各组样本素材视频进行筛选,选取出各组样本素材视频中将会用于成片的视频片段,从而可以以选取出的视频片段与该视频片段对应的样本素材视频组为训练样本对神经网络模型进行训练,得到基于神经网络模型的成片模块。所述成片规则包括基于预设的景别组合、运镜组合、主题等进行单维度或多维度的匹配。The film-forming module can be implemented in multiple ways. In one embodiment, it may be built from manually set film-forming rules. For example, drawing on professional editors' prior experience in film editing, a method for selecting suitable video clips when generating a film can be summarized, and a corresponding computer program can be written according to that method to produce the film-forming module. In another embodiment, the film-forming module can be trained using machine learning techniques. For example, multiple groups of sample material videos can be obtained, and professionals can screen each group to select the video clips that would be used in a final film; the selected video clips, together with the sample material video groups they belong to, then serve as training samples for training a neural network model, yielding a film-forming module based on the neural network model. The film-forming rules include single-dimensional or multi-dimensional matching based on preset shot-scale combinations, camera-movement combinations, themes, and the like.
视频片段信息可以是用于生成影片的目标视频片段的相关信息,在一种实施方式中,其可以用于指示出目标视频片段所属的目标素材视频、及目标视频片段对应的时间段。比如,其可以指示出目标视频片段属于目标素材视频X的第10-20秒的视频片段。The video clip information may be information about the target video clips used for generating the film. In one embodiment, it may indicate the target material video to which a target video clip belongs and the time period corresponding to the target video clip. For example, it may indicate that the target video clip is the segment from the 10th to the 20th second of target material video X.
在确定视频片段信息之后,可以获取视频片段信息对应的目标视频片段,从而可以利用获取的目标视频片段生成影片。其中,对于属于外部素材视频的目标视频片段,可以从拍摄设备处获取。After the video clip information is determined, a target video clip corresponding to the video clip information can be acquired, so that a movie can be generated by using the acquired target video clip. Wherein, the target video segment belonging to the external material video can be obtained from the shooting device.
本申请实施例提供的影片生成方法,并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备,而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息,利用语义信息确定所需的视频片段信息,从而,只需从拍摄设备处获取视频片段信息对应的目标视频片段即可,无需传输所有的目标素材视频,大大减少了用户的等待时间,提高了自动剪辑的速度。The film generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used for generating the film. Instead, the terminal device can first obtain the semantic information of the target material videos from the shooting device and use that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to the video clip information, without transmitting all of the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
在从拍摄设备获取属于外部素材视频的目标视频片段时,在一种实施方式中,可以将确定的视频片段信息发送给拍摄设备,从而,拍摄设备可以利用接收到的视频片段信息,对视频片段信息所指示的目标素材视频进行剪辑,剪辑出目标素材视频对应时间段的目标视频片段后,可以将该目标视频片段传输给终端设备。When acquiring target video clips belonging to external material videos from the shooting device, in one implementation the determined video clip information may be sent to the shooting device; the shooting device can then use the received video clip information to edit the target material video indicated by that information, and after cutting out the target video clip for the corresponding time period, transmit the target video clip to the terminal device.
考虑到用户希望剪辑到影片中的素材并不一定全部来自拍摄设备,比如当用户去某个地点游玩时,其所拍摄的视频中可能有一部分是运动相机或云台相机拍摄的,另一部分可能是手机拍摄的,此时,用户可能希望自动生成影片时,纳入选择范围的目标素材视频不止包括相机等拍摄设备拍摄的外部素材视频,也可以包括手机等终端设备拍摄的本地素材视频。因此,在一种实施方式中,自动剪辑还可以支持混剪,即目标素材视频还可以包括本地素材视频,在获取目标素材视频的语义信息时,除了可以从拍摄设备获取外部素材视频的语义信息,还包括从本地获取本地素材视频的语义信息。The material that the user wants edited into the film does not necessarily all come from the shooting device. For example, when the user visits some place, part of the videos shot there may come from an action camera or a gimbal camera, while another part may be shot with a mobile phone. In this case, when a film is generated automatically, the user may wish the target material videos under consideration to include not only external material videos shot by cameras and other shooting devices, but also local material videos shot by terminal devices such as mobile phones. Therefore, in one embodiment, automatic editing can also support mixed cutting; that is, the target material videos can also include local material videos, and when acquiring the semantic information of the target material videos, in addition to acquiring the semantic information of the external material videos from the shooting device, the semantic information of the local material videos is acquired locally.
本地素材视频的语义信息,在一种实施方式中,可以是终端设备对本地素材视频进行语义分析得到的。在另一种实施方式中,也可以是本地素材视频自身携带的语义信息。由于终端设备的本地素材视频来源丰富多样,比如,可以来源于互联网,而手机可能在获取到该素材视频时,该素材视频已经携带有对应的语义信息,从而手机无需重复对该素材视频进行语义分析。In one implementation, the semantic information of a local material video may be obtained by the terminal device performing semantic analysis on it. In another implementation, it may be semantic information carried by the local material video itself. Since the terminal device's local material videos come from rich and diverse sources, for example the Internet, a material video may already carry its corresponding semantic information when the mobile phone obtains it, so the mobile phone does not need to perform semantic analysis on that material video again.
可以理解的,即便目标素材视频包括本地素材视频,但根据目标素材视频的语义信息所确定出的视频片段信息中,也并不一定包括对应本地素材视频的视频片段信息。例如,在一种情况中,终端设备可能根据本地素材视频的语义信息,判断出本地素材视频的拍摄质量较差,不符合成片的要求,从而所确定的视频片段信息均是对应外部素材视频的视频片段信息。It can be understood that even if the target material videos include local material videos, the video clip information determined from the semantic information of the target material videos does not necessarily include video clip information corresponding to the local material videos. For example, in one case the terminal device may determine, from the semantic information of a local material video, that its shooting quality is poor and does not meet the requirements for the final film, so that all of the determined video clip information corresponds to the external material videos.
而在一种情况中,若视频片段信息对应的目标视频片段包括本地素材视频的视频片段,则在获取目标视频片段时,对于本地素材视频的视频片段,可以根据视频片段信息对本地素材视频进行剪辑获得。In one case, if the target video clip corresponding to the video clip information includes the video clip of the local material video, when acquiring the target video clip, for the video clip of the local material video, the local material video can be processed according to the video clip information. Clip obtained.
在上述实施方式中,自动剪辑可以支持混剪功能,即自动剪辑生成的影片中还可以包括终端设备本地的素材视频,从而提高了成片内容的丰富度。并且,在混剪时,对于本地素材视频也可以根据语义信息进行视频片段信息的确定,从而能够选取出本地素材视频中适合用于成片的视频片段,相比随机选取本地素材视频中的视频片段插入影片,有更高的成片质量。In the above embodiments, automatic editing can support a mixed-cutting function; that is, the film generated by automatic editing can also include local material videos of the terminal device, enriching the content of the final film. Moreover, during mixed cutting, video clip information for the local material videos can also be determined from semantic information, so that the clips of the local material videos best suited for the final film can be selected; compared with randomly selecting clips from the local material videos and inserting them into the film, this yields a higher-quality final film.
如前文所述,视频片段信息可以通过将语义信息输入成片模块后得到。在一种实施方式中,在将语义信息输入成片模块后,成片模块的输出可以包括目标成片模板和目标成片模板中各视频空位对应的视频片段信息。As described above, the video clip information can be obtained by inputting the semantic information into the film-forming module. In one embodiment, after the semantic information is input into the film-forming module, the module's output may include a target film template and the video clip information corresponding to each video slot in the target film template.
成片模板可以是预先设定的影片模板,其可以包括多个视频空位,每个视频空位可以用于导入或插入视频片段。每个成片模板可以有各自的特征,比如,视频空位上可以配套有不同的贴图、文字、视频特效等元素,其中,视频特效可以是加速、减速、滤镜、运镜等各种特效。在视频空位和视频空位之间还可以有不同的转场效果。并且,不同的成片模板也可以搭配有不同的音乐,而转场效果对应的转场时间还可以与成片模板的音乐节奏点相匹配。A film template may be a preset movie template that includes multiple video slots, each of which can be used for importing or inserting a video clip. Each film template can have its own features; for example, the video slots can be paired with elements such as stickers, text, and video effects, where the video effects can include acceleration, deceleration, filters, camera-movement effects, and so on. Different transition effects can also be placed between adjacent video slots. Moreover, different film templates can be paired with different music, and the transition times of the transition effects can be matched to the beat points of the template's music.
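A film template as described might be represented roughly as below; the fields, beat times, and effect names are all invented for illustration. The helper shows one way slot boundaries can be made to coincide with music beat points: by choosing slot durations from consecutive beat intervals.

```python
# Hypothetical structure of a film template; every field is illustrative.
template = {
    "music": "theme.mp3",
    "beat_points": [0.0, 2.0, 4.5, 7.0],   # seconds into the music
    "slots": [
        {"duration": 2.0, "effect": "speed_up"},
        {"duration": 2.5, "effect": "filter:warm"},
        {"duration": 2.5, "effect": None},
    ],
    "transitions": ["fade", "wipe"],        # between consecutive slots
}

def slot_start_times(tpl):
    """Accumulate slot durations; with durations taken from consecutive
    beat intervals, each slot boundary lands on a music beat point."""
    times, t = [], 0.0
    for slot in tpl["slots"]:
        times.append(t)
        t += slot["duration"]
    return times
```

In this layout the durations 2.0 and 2.5 are exactly the gaps between consecutive beat points, so every transition fires on a beat.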
在一种实施方式中,目标成片模板可以是从候选成片模板中确定的,候选成片模板可以是从成片模板库中确定的。成片模板库中可以包括多个预设的成片模板,考虑到成片模板库中的成片模板过多,在确定目标成片模板时,可以先从成片模板库中筛选出候选成片模板,再从候选成片模板中确定目标成片模板,减少筛选的工作量。In one embodiment, the target film template may be determined from candidate film templates, and the candidate film templates may be determined from a film template library. The library can include multiple preset film templates; considering that the library may contain a large number of templates, when determining the target film template, candidate film templates can first be screened out from the library, and the target film template can then be determined from the candidates, reducing the screening workload.
在筛选候选成片模板时可以有多种实施方式。在一种实施方式中,可以根据目标素材视频的语义信息,确定待生成影片的风格类型。比如,可以根据目标素材视频的语义信息,确定(多数)目标素材视频对应的主题,如亲子、自然、城市、美食等,从而,可以根据该确定出的主题对成片模板库中的成片模板进行筛选,筛选出与该主题匹配的候选成片模板。There can be multiple implementations for screening candidate film templates. In one embodiment, the style type of the film to be generated can be determined from the semantic information of the target material videos. For example, the theme corresponding to (most of) the target material videos, such as parent-child, nature, city, or gourmet food, can be determined from their semantic information, and the film templates in the template library can then be screened according to the determined theme to obtain candidate film templates matching that theme.
而从候选成片模板中确定目标成片模板时,也可以有多种方式。在一种实施方式中,由于不同的候选成片模板有不同的特征,比如有不同的音乐、不同的视频空位元素、不同的转场效果等,因此,可以预先设定不同特征对应的优先级,再按照从高到低的优先级,将候选成片模板的每种特征分别与目标素材视频的语义信息进行匹配,每一次匹配后可以根据匹配结果进行一次筛选,从而最终筛选出最合适的目标成片模板。在一种实施方式中,由于语义信息可以包括视频中不同片段的语义信息,因此,可以利用不同片段的语义信息,模拟出将视频片段导入候选成片模板的视频空位的各种组合,从而,可以根据视频片段与视频空位的匹配度,相邻视频空位之间过渡的平滑度,计算出各种组合的得分,将得分最高的组合的候选成片模板确定为目标成片模板,且该目标成片模板中各视频空位对应的视频片段信息也随之确定。There may also be multiple ways of determining the target film template from the candidate film templates. In one embodiment, since different candidate film templates have different features, such as different music, different video-slot elements, and different transition effects, priorities can be preset for the different features; each feature of the candidate film templates is then matched against the semantic information of the target material videos in order of priority from high to low, and after each match a round of screening can be performed according to the matching result, so that the most suitable target film template is finally selected. In another embodiment, since the semantic information may include the semantic information of different segments within a video, the various combinations of importing video segments into the video slots of a candidate film template can be simulated using the semantic information of the different segments; the score of each combination can then be computed from the degree of matching between video segments and video slots and the smoothness of the transitions between adjacent video slots, the candidate film template of the highest-scoring combination is determined as the target film template, and the video clip information corresponding to each video slot in that template is determined accordingly.
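The highest-scoring-combination idea in the second implementation can be sketched as a search over assignments of segments to slots, where each assignment is scored by per-slot fit plus pairwise transition smoothness. Both scoring functions here are placeholders the caller supplies, and a real system would prune the search rather than enumerate every permutation as this small sketch does.

```python
from itertools import permutations

def score(assignment, match, smooth):
    """assignment: tuple of segments, one per slot.
    match(slot_idx, seg) -> how well seg fits that slot;
    smooth(a, b) -> transition quality between adjacent segments."""
    s = sum(match(i, seg) for i, seg in enumerate(assignment))
    s += sum(smooth(a, b) for a, b in zip(assignment, assignment[1:]))
    return s

def best_assignment(segments, n_slots, match, smooth):
    """Exhaustively try every ordered choice of n_slots segments and
    keep the highest-scoring one (feasible only for small inputs)."""
    best = max(permutations(segments, n_slots),
               key=lambda a: score(a, match, smooth))
    return list(best)
```

Because the score is computed from semantic information alone, this whole selection can run on the terminal device before any video data is transferred.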
在目标成片模板与目标成片模板中各视频空位对应的视频片段信息确定后,可以获取视频片段信息对应的目标视频片段,并将目标视频片段导入目标成片模板对应的视频空位,从而生成影片。After the target film template and the video clip information corresponding to each of its video slots are determined, the target video clips corresponding to the video clip information can be acquired and imported into the corresponding video slots of the target film template, thereby generating the film.
由前文可知,目标素材视频是可能用于生成影片的素材视频,而可能用于生成影片的素材视频不一定是当前存储的所有素材视频,在一种实施方式中,目标素材视频可以通过设定的条件从存储的素材视频中筛选得到。其中,设定的条件可以是时间、地点、人物信息、场景信息等一种或多种,通过设定的条件筛选出目标素材视频后,可以获取目标素材视频的语义信息。As can be seen from the foregoing, the target material videos are material videos that may be used for generating the film, and these are not necessarily all of the currently stored material videos. In one embodiment, the target material videos can be obtained by filtering the stored material videos according to set conditions. The set conditions may be one or more of time, place, person information, scene information, and the like; after the target material videos are filtered out according to the set conditions, their semantic information can be acquired.
需要注意的是,上述的每一种设定的条件可以有多种实施方式,比如,时间条件可以是当天、近两天、近一周、从日期A到日期B等,地点条件可以是景点、城市、国家、家、公司等,人物条件可以是具体的人如小明,也可以是男、女、老、少等抽象的类别,场景条件可以是白天、黑夜、雨天等环境,也可以是街道、田园等场地,也可以是公交车、天空等物体。在具体的例子中,若设定的条件是当天,则目标素材视频可以包括当天拍摄的所有视频,若设定的条件是A地点,则目标素材视频可以包括A地点拍摄的所有视频,若设定的条件是包括小明,则目标素材视频可以是包含小明的所有视频,若设定的条件是街道,则目标素材视频可以是包含街道的所有视频。It should be noted that each of the above set conditions can be implemented in multiple ways. For example, the time condition can be the current day, the past two days, the past week, or from date A to date B; the location condition can be a scenic spot, a city, a country, home, the office, and so on; the person condition can be a specific person such as Xiaoming, or an abstract category such as male, female, old, or young; the scene condition can be an environment such as daytime, night, or rain, a setting such as a street or countryside, or an object such as a bus or the sky. As specific examples, if the set condition is the current day, the target material videos can include all videos shot that day; if the set condition is location A, the target material videos can include all videos shot at location A; if the set condition is the inclusion of Xiaoming, the target material videos can be all videos containing Xiaoming; and if the set condition is streets, the target material videos can be all videos containing streets.
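The set conditions above amount to a conjunctive filter over per-video metadata; the sketch below assumes hypothetical metadata fields (`date`, `place`, `people`) purely for illustration.

```python
from datetime import date

# Hypothetical per-video metadata; field names are illustrative assumptions.
videos = [
    {"id": "V1", "date": date(2020, 9, 1), "place": "A", "people": {"Xiaoming"}},
    {"id": "V2", "date": date(2020, 9, 1), "place": "B", "people": set()},
    {"id": "V3", "date": date(2020, 9, 2), "place": "A", "people": {"Xiaoming"}},
]

def filter_videos(videos, day=None, place=None, person=None):
    """Keep a video only if it satisfies every condition that was given;
    conditions left as None are ignored."""
    out = []
    for v in videos:
        if day is not None and v["date"] != day:
            continue
        if place is not None and v["place"] != place:
            continue
        if person is not None and person not in v["people"]:
            continue
        out.append(v)
    return out
```

With all arguments left at None the filter keeps everything, which also models the default-condition case where a device screens with its own preset values.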
并且,由于目标素材视频可以包括拍摄设备上的外部素材视频,也可以包括终端设备上的本地素材视频,因此,目标素材视频的筛选可以在拍摄设备与终端设备上分别独立的进行。在一种实施方式中,用于筛选目标素材视频的条件可以由用户自行设定,比如,可以在自动剪辑之前与用户进行交互,获取用户设定的筛选条件。在一种实施方式中,终端设备与拍摄设备也可以有各自默认的筛选条件,从而,自动剪辑可以直接开始,在用户无感知下自动生成影片,给用户一定的惊喜感。Moreover, since the target material videos can include external material videos on the shooting device as well as local material videos on the terminal device, the screening of target material videos can be performed independently on the shooting device and on the terminal device. In one embodiment, the conditions used for screening the target material videos can be set by the user; for example, the terminal can interact with the user before automatic editing to obtain the user's screening conditions. In one embodiment, the terminal device and the shooting device can also each have default screening conditions, so that automatic editing can start directly and a film is generated automatically without the user's awareness, giving the user a sense of surprise.
本申请实施例提供的影片生成方法,并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备,而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息,利用语义信息确定所需的视频片段信息,从而,只需从拍摄设备处获取视频片段信息对应的目标视频片段即可,无需传输所有的目标素材视频,大大减少了用户的等待时间,提高了自动剪辑的速度。The film generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used for generating the film. Instead, the terminal device can first obtain the semantic information of the target material videos from the shooting device and use that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to the video clip information, without transmitting all of the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
下面提供一个相对详尽的实施例,该实施例中,用户使用了混剪功能,即目标素材视频还包括本地素材视频。可以参见图3,图3是本申请实施例提供的影片生成方法的交互图。A relatively detailed embodiment is provided below. In this embodiment, the user uses the mixed-cut function, that is, the target material video also includes the local material video. Referring to FIG. 3 , FIG. 3 is an interaction diagram of the method for generating a movie provided by an embodiment of the present application.
在自动剪辑开始之前,拍摄设备可以预先完成对本地的素材视频A的语义分析(S300),比如前文所述的在素材视频拍摄的同时或在充电等的空闲时间进行所述语义分析。可以理解,此处所述本地是相对于拍摄设备的本地。Before automatic editing starts, the shooting device can complete the semantic analysis of its local material video A in advance (S300), for example, as described above, while the material video is being shot or during idle time such as charging. It can be understood that "local" here is relative to the shooting device.
在自动剪辑开始后,拍摄设备和终端设备可以分别根据各自设定的条件确定目标素材视频(S310a和S310b),拍摄设备确定出的目标素材视频可以用目标素材视频a指代,终端设备确定的目标素材视频可以用目标素材视频b指代。After automatic editing starts, the shooting device and the terminal device can each determine target material videos according to their respectively set conditions (S310a and S310b). The target material videos determined by the shooting device are referred to as target material video a, and those determined by the terminal device as target material video b.
在目标素材视频b确定后,终端设备可以对目标素材视频b进行语义分析以获取该目标素材视频b的语义信息(S320)。而拍摄设备在目标素材视频a确定后,可以将目标素材视频a的语义信息发送给终端设备(S330)。After the target material video b is determined, the terminal device may perform semantic analysis on the target material video b to obtain semantic information of the target material video b (S320). After the target material video a is determined, the shooting device may send the semantic information of the target material video a to the terminal device (S330).
利用目标素材视频a和目标素材视频b的语义信息,可以确定属于目标素材视频a的视频片段信息a和属于目标素材视频b的视频片段信息b(S340)。其中,视频片段信息a可以用于发送给拍摄设备(S350),以供拍摄设备根据该视频片段信息a对相应的目标素材视频a进行剪辑,得到目标视频片段a(S360a);而视频片段信息b可以用于终端设备根据该视频片段信息b对目标素材视频b进行剪辑(S360b),得到目标视频片段b。Using the semantic information of target material video a and target material video b, video clip information a belonging to target material video a and video clip information b belonging to target material video b can be determined (S340). The video clip information a is sent to the shooting device (S350), so that the shooting device edits the corresponding target material video a according to it and obtains target video clip a (S360a); the video clip information b is used by the terminal device to edit target material video b (S360b), obtaining target video clip b.
拍摄设备可以将目标视频片段a传输给终端设备(S370),终端设备将目标视频片段a和目标视频片段b导入目标成片模板,从而生成最终的影片(S380)。The shooting device may transmit the target video clip a to the terminal device (S370), and the terminal device imports the target video clip a and the target video clip b into the target film template, thereby generating a final movie (S380).
在一实施例中,在无人机飞行的过程中,可以实时将拍摄到的素材的语义信息传回遥控终端(包括遥控器和手机),当拍摄到的素材的语义信息和遥控终端本地存储的素材的语义信息符合预设的规则时,触发所述无人机自动进行拍摄,并控制无人机基于所述预设的规则调整飞行轨迹和姿态,以获取目标拍摄素材。基于目标拍摄素材的实时图传的压缩素材与本地素材进行初步的处理后,可以生成初始预览影片供用户预览。当用户对该初始预览影片进行原片合成操作时,根据所述原片合成操作获取所述目标拍摄素材,并基于所述目标素材和本地素材合成最终影片。In one embodiment, during the flight of a drone, the semantic information of the captured material can be transmitted back to the remote control terminal (including a remote controller and a mobile phone) in real time. When the semantic information of the captured material and the semantic information of the material stored locally on the remote control terminal conform to preset rules, the drone is triggered to shoot automatically, and the drone is controlled to adjust its flight trajectory and attitude based on the preset rules so as to obtain the target shooting material. After preliminary processing of the compressed material from the real-time image transmission of the target shooting material together with the local material, an initial preview film can be generated for the user to preview. When the user performs an original-footage composition operation on the initial preview film, the target shooting material is acquired according to that operation, and the final film is composed based on the target material and the local material.
In this way, when the semantic information of the captured material and of the locally stored material conforms to the preset rules, the flight and shooting of the drone can be controlled based on those rules, without requiring the user to have professional shooting skills, piloting skills, or a keen instinct for shooting opportunities. This prevents the user from missing the chance to capture material that matches the local material, and also avoids occupying the drone's shooting memory and image-transmission bandwidth at the initial stage, improving the user experience while saving memory and transmission bandwidth.
对于上述实施例中所涉及的一些步骤的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of some steps involved in the foregoing embodiments has been described in the foregoing, and will not be repeated here.
下面可以参见图4,图4是本申请实施例提供的影片生成方法的另一流程图。该方法可以应用于拍摄设备,该方法包括:Referring to FIG. 4 below, FIG. 4 is another flowchart of the method for generating a movie provided by an embodiment of the present application. The method can be applied to a photographing device, and the method includes:
S410、获取目标素材视频的语义信息。S410. Acquire semantic information of the target material video.
S420、将所述语义信息发送给终端设备。S420. Send the semantic information to the terminal device.
其中,所述语义信息用于所述终端设备确定生成影片所需的视频片段信息;Wherein, the semantic information is used for the terminal device to determine the video segment information required for generating the movie;
S430、获取所述终端设备发送的所述视频片段信息,并根据所述视频片段信息对所述目标素材视频进行剪辑,得到目标视频片段。S430: Acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip.
S440、将所述目标视频片段传输给所述终端设备。S440. Transmit the target video segment to the terminal device.
所述目标视频片段用于所述终端设备生成影片。The target video segment is used by the terminal device to generate a movie.
可选的,所述视频片段信息用于指示出所述目标视频片段所属的目标素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
可选的,所述目标素材视频的语义信息是通过对目标素材视频进行语义分析得到的。Optionally, the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
可选的,所述语义分析是在所述目标素材视频的拍摄过程中进行的。Optionally, the semantic analysis is performed during the shooting process of the target material video.
可选的,所述语义分析是在充电过程中进行的。Optionally, the semantic analysis is performed during the charging process.
可选的,在所述获取目标素材视频的语义信息之前,所述方法还包括:Optionally, before the acquiring the semantic information of the target material video, the method further includes:
根据设定的条件,从存储的素材视频中筛选出目标素材视频。According to the set conditions, the target material video is filtered from the stored material video.
可选的,所述设定的条件是预先设定的默认条件。Optionally, the set condition is a preset default condition.
可选的,所述设定的条件是由用户设定的。Optionally, the set condition is set by a user.
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的影片生成方法,并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备,而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息,利用语义信息确定所需的视频片段信息,从而,只需从拍摄设备处获取视频片段信息对应的目标视频片段即可,无需传输所有的目标素材视频,大大减少了用户的等待时间,提高了自动剪辑的速度。The film generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used for generating the film. Instead, the terminal device can first obtain the semantic information of the target material videos from the shooting device and use that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to the video clip information, without transmitting all of the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
下面请参见图5,图5是本申请实施例提供的一种终端设备的结构示意图。该终端设备包括:Referring to FIG. 5 below, FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. The terminal equipment includes:
通信接口510,用于与拍摄设备通信;a communication interface 510 for communicating with the photographing device;
处理器520和存储有计算机程序的存储器530,所述计算机程序被所述处理器执行时实现以下步骤:A processor 520 and a memory 530 storing a computer program that, when executed by the processor, implements the following steps:
获取目标素材视频的语义信息,所述语义信息至少包括:从拍摄设备获取的外部素材视频的语义信息;Acquiring semantic information of the target material video, the semantic information at least includes: the semantic information of the external material video obtained from the shooting device;
根据所述语义信息,确定生成影片所需的视频片段信息;According to the semantic information, determine the video segment information required to generate the movie;
获取与所述视频片段信息对应的目标视频片段,其中,所述目标视频片段至少包括:从所述拍摄设备获取的所述外部素材视频的视频片段;Acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video obtained from the shooting device;
利用所述目标视频片段生成影片。A movie is generated using the target video segment.
可选的，所述处理器在获取与所述视频片段信息对应的目标视频片段时用于，将所述视频片段信息发送给所述拍摄设备后，接收所述拍摄设备根据所述视频片段信息对所述外部素材视频剪辑得到的目标视频片段。Optionally, when acquiring the target video clip corresponding to the video clip information, the processor is configured to, after sending the video clip information to the shooting device, receive the target video clip obtained by the shooting device by editing the external material video according to the video clip information.
可选的,所述视频片段信息用于指示出所述目标视频片段所属的外部素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
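A piece of "video clip information" of the kind just described — the source material video plus the corresponding time period — can be as small as the following sketch. The dataclass and its field names are hypothetical, not from the application:

```python
from dataclasses import dataclass

# Hypothetical shape of one piece of video clip information: it names the
# external material video the clip belongs to and the time period of the
# wanted segment, so only that segment has to be transmitted.

@dataclass
class ClipInfo:
    source_video: str   # which external material video the clip belongs to
    start_s: float      # start of the time period, in seconds
    end_s: float        # end of the time period, in seconds

    def duration(self):
        """Length of the requested segment in seconds."""
        return self.end_s - self.start_s

ci = ClipInfo("DJI_0042.mp4", 12.0, 17.5)
```

Serializing a handful of such records costs far less bandwidth than transferring even one material video.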
可选的,所述目标素材视频还包括:本地素材视频。Optionally, the target material video further includes: local material video.
可选的,所述本地素材视频的语义信息是通过以下方式得到的:Optionally, the semantic information of the local material video is obtained in the following manner:
对所述本地素材视频进行语义分析,得到所述本地素材视频的语义信息。Semantic analysis is performed on the local material video to obtain semantic information of the local material video.
可选的,所述目标视频片段还包括:所述本地素材视频的视频片段;Optionally, the target video segment further includes: a video segment of the local material video;
所述处理器在获取与所述视频片段信息对应的目标视频片段时用于,根据所述视频片段信息对所述本地素材视频进行剪辑,得到所述本地素材视频的视频片段。When acquiring the target video segment corresponding to the video segment information, the processor is configured to edit the local material video according to the video segment information to obtain the video segment of the local material video.
可选的,所述外部素材视频的语义信息是所述拍摄设备对所述外部素材视频进行 语义分析得到的。Optionally, the semantic information of the external material video is obtained by the shooting device performing semantic analysis on the external material video.
可选的，所述处理器在根据所述语义信息，确定生成影片所需的视频片段信息时用于，根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息。Optionally, when determining, according to the semantic information, the video clip information required for generating a movie, the processor is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
可选的，所述目标成片模板是从候选成片模板中确定的，所述候选成片模板是从成片模板库中确定的。Optionally, the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
可选的，所述候选成片模板是通过以下方式确定的：Optionally, the candidate film templates are determined in the following manner:
根据所述语义信息,确定待生成影片的风格类型;According to the semantic information, determine the style type of the movie to be generated;
根据所述风格类型,从成片模板库中筛选出所述候选成片模板。According to the style type, the candidate film templates are screened from the film template library.
可选的，所述处理器在根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息时用于，利用所述语义信息，计算所述目标素材视频中的视频片段与所述候选成片模板中各视频空位的匹配度，并计算相邻视频空位之间视频过渡的平滑度；根据所述匹配度与所述平滑度，确定目标成片模板及所述目标成片模板中各视频空位对应的目标视频片段。Optionally, when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the processor is configured to: use the semantic information to calculate the degree of matching between video clips in the target material video and each video slot in the candidate film templates, and calculate the smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
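A minimal sketch of this selection step: a candidate template's score combines the per-slot matching degrees with the smoothness of transitions between adjacent slots, and the best-scoring assignment wins. The two scoring functions below are toy placeholders, not the application's actual metrics:

```python
from itertools import permutations

# Toy matching degree: 1.0 if the clip's semantic tag equals the slot's
# wanted tag, else 0.0. A real system would use richer semantic information.
def match(clip, slot_tag):
    return 1.0 if clip["tag"] == slot_tag else 0.0

# Toy smoothness: transitions between clips of similar brightness are
# considered smoother. Brightness stands in for any visual continuity cue.
def smoothness(a, b):
    return 1.0 - abs(a["brightness"] - b["brightness"])

def best_assignment(clips, template_slots):
    """Try every ordering of clips over the template's video slots and keep
    the one maximizing total match plus adjacent-transition smoothness."""
    best, best_score = None, float("-inf")
    for perm in permutations(clips, len(template_slots)):
        score = sum(match(c, s) for c, s in zip(perm, template_slots))
        score += sum(smoothness(perm[i], perm[i + 1])
                     for i in range(len(perm) - 1))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

clips = [{"tag": "beach", "brightness": 0.9},
         {"tag": "city", "brightness": 0.4},
         {"tag": "sunset", "brightness": 0.8}]
assignment, score = best_assignment(clips, ["beach", "sunset"])
```

The exhaustive search is only workable for small slot counts; it is meant to show the objective (match plus smoothness), not an efficient algorithm.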
可选的，所述目标成片模板包括以下一种或多种内容：音乐、转场效果、贴图、视频特效。Optionally, the target film template includes one or more of the following: music, transition effects, stickers, and video special effects.
可选的,所述处理器在利用所述目标视频片段生成影片时用于,将所述目标视频片段导入所述目标成片模板对应的视频空位,生成影片。Optionally, when generating a movie by using the target video clip, the processor is configured to import the target video clip into a video slot corresponding to the target movie template to generate a movie.
可选的,所述目标素材视频是根据预设的条件自动从存储的素材视频中筛选得到的。Optionally, the target material video is automatically selected from the stored material video according to a preset condition.
可选的,所述目标素材视频是根据用户设定的条件从存储的素材视频中筛选得到的。Optionally, the target material video is selected from the stored material video according to a condition set by the user.
可选的,所述条件包括以下一种或多种:时间、地点、人物信息、场景信息。Optionally, the conditions include one or more of the following: time, location, character information, and scene information.
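Screening stored material videos by such conditions could look like the following sketch; the condition keys and all sample metadata are invented for illustration:

```python
# Filter stored material videos by preset or user-set conditions
# (e.g. time, location, character information, scene information).
# Metadata keys and values below are hypothetical.

def filter_videos(videos, conditions):
    """Keep only videos whose metadata satisfies every given condition."""
    def ok(meta):
        return all(meta.get(key) == want for key, want in conditions.items())
    return [vid for vid, meta in videos if ok(meta)]

stored = [
    ("v1.mp4", {"date": "2020-09-01", "location": "Shenzhen", "scene": "beach"}),
    ("v2.mp4", {"date": "2020-09-01", "location": "Beijing", "scene": "city"}),
    ("v3.mp4", {"date": "2020-09-02", "location": "Shenzhen", "scene": "beach"}),
]
targets = filter_videos(stored, {"location": "Shenzhen", "scene": "beach"})
```

With exact-match conditions on location and scene, only `v1.mp4` and `v3.mp4` become target material videos; a real implementation would likely support ranges (e.g. date intervals) rather than equality alone.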
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的终端设备，并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备，而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息，利用语义信息确定所需的视频片段信息，从而，只需从拍摄设备处获取视频片段信息对应的目标视频片段即可，无需传输所有的目标素材视频，大大减少了用户的等待时间，提高了自动剪辑的速度。With the terminal device provided by the embodiments of this application, the shooting device does not need to first transmit to the terminal device all the target material videos that might be used to generate a movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to that video clip information. Since not all target material videos have to be transmitted, the user's waiting time is greatly reduced and the speed of automatic editing is improved.
下面请参见图6,图6是本申请实施例提供的一种拍摄设备的结构示意图。该拍摄设备包括:Please refer to FIG. 6 below. FIG. 6 is a schematic structural diagram of a photographing device provided by an embodiment of the present application. The shooting equipment includes:
摄像头610,用于拍摄素材视频;The camera 610 is used for shooting material video;
通信接口620,用于与终端设备通信;a communication interface 620 for communicating with terminal equipment;
处理器630和存储有计算机程序的存储器640,所述计算机程序被所述处理器执行时实现以下步骤:A processor 630 and a memory 640 storing a computer program that, when executed by the processor, implements the following steps:
获取目标素材视频的语义信息;Obtain the semantic information of the target material video;
将所述语义信息发送给终端设备,其中,所述语义信息用于所述终端设备确定生成影片所需的视频片段信息;sending the semantic information to a terminal device, wherein the semantic information is used by the terminal device to determine video segment information required for generating a movie;
获取所述终端设备发送的所述视频片段信息,并根据所述视频片段信息对所述目标素材视频进行剪辑,得到目标视频片段;acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip;
将所述目标视频片段传输给所述终端设备,以便所述终端设备利用所述目标视频片段生成影片。The target video clip is transmitted to the terminal device, so that the terminal device generates a movie using the target video clip.
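On the shooting-device side, handling the received video clip information reduces to cutting out each named segment and returning it. In this minimal sketch a video is modeled as a list of frames at an assumed fixed frame rate; both are stand-ins for real footage:

```python
# Shooting-device side: cut the requested segments out of stored material
# videos. A video is modeled as a list of frames; FPS is an assumed rate.

FPS = 30  # assumed frame rate for this sketch

def cut_segments(storage, clip_infos):
    """For each (video_id, start_s, end_s) request, return only the frames
    inside that time period - never the whole material video."""
    segments = []
    for video_id, start_s, end_s in clip_infos:
        frames = storage[video_id]
        segments.append(frames[int(start_s * FPS):int(end_s * FPS)])
    return segments

storage = {"a.mp4": list(range(300))}  # a 10-second video: 300 frames
segments = cut_segments(storage, [("a.mp4", 1.0, 2.0)])
```

The one-second request yields 30 frames out of 300, so only a tenth of the material video crosses the link to the terminal device.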
可选的,所述视频片段信息用于指示出所述目标视频片段所属的目标素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
可选的,所述目标素材视频的语义信息是通过对目标素材视频进行语义分析得到的。Optionally, the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
可选的,所述语义分析是在所述目标素材视频的拍摄过程中进行的。Optionally, the semantic analysis is performed during the shooting process of the target material video.
可选的,所述语义分析是在充电过程中进行的。Optionally, the semantic analysis is performed during the charging process.
可选的,所述处理器还用于,在所述获取目标素材视频的语义信息之前,根据设定的条件,从存储的素材视频中筛选出目标素材视频。Optionally, the processor is further configured to, before acquiring the semantic information of the target material video, filter out the target material video from the stored material video according to a set condition.
可选的,所述设定的条件是预先设定的默认条件。Optionally, the set condition is a preset default condition.
可选的,所述设定的条件是由用户设定的。Optionally, the set condition is set by a user.
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
可选的,所述拍摄设备包括可移动平台或相机或云台相机。Optionally, the photographing device includes a movable platform or a camera or a pan-tilt camera.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的拍摄设备，并不需要将可能用于生成影片的目标素材视频先传输给终端设备，而是可以先将目标素材视频的语义信息发送给终端设备，以便终端设备利用语义信息确定所需的视频片段信息，并将该视频片段信息发送给拍摄设备。从而，拍摄设备只将该视频片段信息对应的目标视频片段传输给终端设备，无需传输所有的目标素材视频，大大减少了用户的等待时间，提高了自动剪辑的速度。The shooting device provided by the embodiments of this application does not need to first transmit to the terminal device the target material videos that might be used to generate a movie. Instead, it can first send the semantic information of the target material videos to the terminal device, so that the terminal device can use the semantic information to determine the required video clip information and send that video clip information back to the shooting device. The shooting device then transmits only the target video clips corresponding to that video clip information, without transmitting all the target material videos, which greatly reduces the user's waiting time and increases the speed of automatic editing.
下面请参见图7,图7是本申请实施例提供的一种影片生成系统的结构示意图。该系统包括:Please refer to FIG. 7 below. FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application. The system includes:
终端设备710，用于获取目标素材视频的语义信息，所述语义信息至少包括：从拍摄设备获取的外部素材视频的语义信息；根据所述语义信息，确定生成影片所需的视频片段信息；获取与所述视频片段信息对应的目标视频片段，其中，所述目标视频片段至少包括：从所述拍摄设备获取的所述外部素材视频的视频片段；利用所述目标视频片段生成影片；The terminal device 710 is configured to: acquire semantic information of target material videos, the semantic information at least including semantic information of external material videos acquired from a shooting device; determine, according to the semantic information, the video clip information required for generating a movie; acquire target video clips corresponding to the video clip information, the target video clips at least including video clips of the external material videos acquired from the shooting device; and generate a movie using the target video clips.
拍摄设备720，用于获取所述外部素材视频的语义信息；将所述语义信息发送给所述终端设备；获取所述终端设备发送的所述视频片段信息，并根据所述视频片段信息对所述目标素材视频进行剪辑，得到目标视频片段；将所述目标视频片段传输给所述终端设备。The shooting device 720 is configured to: acquire the semantic information of the external material videos; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material videos according to the video clip information to obtain target video clips; and transmit the target video clips to the terminal device.
可选的,所述视频片段信息用于指示出所述目标视频片段所属的外部素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
可选的,所述目标素材视频还包括:本地素材视频,所述终端设备还用于,在本地获取所述本地素材视频的语义信息。Optionally, the target material video further includes: a local material video, and the terminal device is further configured to locally acquire semantic information of the local material video.
可选的,所述终端设备还用于,对所述本地素材视频进行语义分析,得到所述本地素材视频的语义信息。Optionally, the terminal device is further configured to perform semantic analysis on the local material video to obtain semantic information of the local material video.
可选的,所述目标视频片段还包括:所述本地素材视频的视频片段;Optionally, the target video segment further includes: a video segment of the local material video;
所述终端设备还用于,根据所述视频片段信息对所述本地素材视频进行剪辑,得到所述本地素材视频的视频片段。The terminal device is further configured to edit the local material video according to the video segment information to obtain a video segment of the local material video.
可选的,所述外部素材视频的语义信息是所述拍摄设备对所述外部素材视频进行语义分析得到的。Optionally, the semantic information of the external material video is obtained by semantic analysis of the external material video by the shooting device.
可选的,所述语义分析是在所述目标素材视频的拍摄过程中进行的。Optionally, the semantic analysis is performed during the shooting process of the target material video.
可选的,所述语义分析是在充电过程中进行的。Optionally, the semantic analysis is performed during the charging process.
可选的，所述终端设备在根据所述语义信息，确定生成影片所需的视频片段信息时用于，根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息。Optionally, when determining, according to the semantic information, the video clip information required for generating a movie, the terminal device is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
可选的，所述目标成片模板是从候选成片模板中确定的，所述候选成片模板是从成片模板库中确定的。Optionally, the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
可选的，所述候选成片模板是通过以下方式确定的：Optionally, the candidate film templates are determined in the following manner:
根据所述语义信息,确定待生成影片的风格类型;According to the semantic information, determine the style type of the movie to be generated;
根据所述风格类型,从成片模板库中筛选出所述候选成片模板。According to the style type, the candidate film templates are screened from the film template library.
可选的，所述终端设备在根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息时用于，利用所述语义信息，计算所述目标素材视频中的视频片段与所述候选成片模板中各视频空位的匹配度，并计算相邻视频空位之间视频过渡的平滑度；根据所述匹配度与所述平滑度，确定目标成片模板及所述目标成片模板中各视频空位对应的目标视频片段。Optionally, when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the terminal device is configured to: use the semantic information to calculate the degree of matching between video clips in the target material videos and each video slot in the candidate film templates, and calculate the smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
可选的，所述目标成片模板包括以下一种或多种内容：音乐、转场效果、贴图、视频特效。Optionally, the target film template includes one or more of the following: music, transition effects, stickers, and video special effects.
可选的,所述终端设备在利用所述目标视频片段生成影片时用于,将所述目标视频片段导入所述目标成片模板对应的视频空位,生成影片。Optionally, when generating a movie by using the target video clip, the terminal device is configured to import the target video clip into a video slot corresponding to the target movie template to generate a movie.
可选的,所述目标素材视频是根据预设的条件自动从存储的素材视频中筛选得到的。Optionally, the target material video is automatically selected from the stored material video according to a preset condition.
可选的,所述目标素材视频是根据用户设定的条件从存储的素材视频中筛选得到的。Optionally, the target material video is selected from the stored material video according to a condition set by the user.
可选的,所述条件包括以下一种或多种:时间、地点、人物信息、场景信息。Optionally, the conditions include one or more of the following: time, location, character information, and scene information.
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的影片生成系统，并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备，而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息，利用语义信息确定所需的视频片段信息，从而，只需从拍摄设备处获取视频片段信息对应的目标视频片段即可，无需传输所有的目标素材视频，大大减少了用户的等待时间，提高了自动剪辑的速度。With the movie generation system provided by the embodiments of this application, the shooting device does not need to first transmit to the terminal device all the target material videos that might be used to generate a movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to that video clip information. Since not all target material videos have to be transmitted, the user's waiting time is greatly reduced and the speed of automatic editing is improved.
本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时实现本申请实施例提供的应用于终端设备的影片生成方法。Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a terminal device provided by the embodiments of the present application.
本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时实现本申请实施例提供的应用于拍摄设备的影片生成方法。Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a shooting device provided by the embodiments of the present application.
以上实施例中提供的技术特征，只要不存在冲突或矛盾，本领域技术人员可以根据实际情况对各个技术特征进行组合，从而构成各种不同的实施例。而本申请文件限于篇幅，未对各种不同的实施例展开说明，但可以理解的是，各种不同的实施例也属于本申请实施例公开的范围。As long as no conflict or contradiction arises, those skilled in the art may combine the technical features provided in the above embodiments according to actual conditions to form various different embodiments. Due to space limitations, this application does not describe all of these various embodiments, but it should be understood that they also fall within the scope disclosed by the embodiments of this application.
本申请实施例可采用在一个或多个其中包含有程序代码的存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。计算机可用存储介质包括永久性和非永久性、可移动和非可移动媒体，可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括但不限于：相变内存（PRAM）、静态随机存取存储器（SRAM）、动态随机存取存储器（DRAM）、其他类型的随机存取存储器（RAM）、只读存储器（ROM）、电可擦除可编程只读存储器（EEPROM）、快闪记忆体或其他内存技术、只读光盘只读存储器（CD-ROM）、数字多功能光盘（DVD）或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。Embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply that any such actual relationship or order exists between these entities or operations. The terms "comprising", "including", or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.
以上对本申请实施例所提供的方法、设备及系统进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法及其核心思想；同时，对于本领域的一般技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。The methods, devices, and systems provided by the embodiments of this application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of this application. The descriptions of the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of this application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims (78)

  1. 一种影片生成方法,其特征在于,包括:A method for generating a film, comprising:
    获取目标素材视频的语义信息,所述语义信息至少包括:从拍摄设备获取的外部素材视频的语义信息;Acquiring semantic information of the target material video, the semantic information at least includes: the semantic information of the external material video obtained from the shooting device;
    根据所述语义信息,确定生成影片所需的视频片段信息;According to the semantic information, determine the video segment information required to generate the movie;
    获取与所述视频片段信息对应的目标视频片段,其中,所述目标视频片段至少包括:从所述拍摄设备获取的所述外部素材视频的视频片段;Acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video obtained from the shooting device;
    利用所述目标视频片段生成影片。A movie is generated using the target video segment.
  2. 根据权利要求1所述的方法,其特征在于,所述获取与所述视频片段信息对应的目标视频片段,包括:The method according to claim 1, wherein the acquiring the target video clip corresponding to the video clip information comprises:
    将所述视频片段信息发送给所述拍摄设备后,接收所述拍摄设备根据所述视频片段信息对所述外部素材视频剪辑得到的目标视频片段。After the video clip information is sent to the shooting device, a target video clip obtained by the shooting device from clipping the external material video according to the video clip information is received.
  3. 根据权利要求2所述的方法,其特征在于,所述视频片段信息用于指示出所述目标视频片段所属的外部素材视频及所述目标视频片段对应的时间段。The method according to claim 2, wherein the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
  4. 根据权利要求1所述的方法,其特征在于,所述目标素材视频还包括:本地素材视频。The method according to claim 1, wherein the target material video further comprises: local material video.
  5. 根据权利要求4所述的方法,其特征在于,所述本地素材视频的语义信息是通过以下方式得到的:The method according to claim 4, wherein the semantic information of the local material video is obtained in the following manner:
    对所述本地素材视频进行语义分析,得到所述本地素材视频的语义信息。Semantic analysis is performed on the local material video to obtain semantic information of the local material video.
  6. 根据权利要求4所述的方法,其特征在于,所述目标视频片段还包括:所述本地素材视频的视频片段;The method according to claim 4, wherein the target video segment further comprises: a video segment of the local material video;
    所述获取与所述视频片段信息对应的目标视频片段,包括:The acquiring the target video clip corresponding to the video clip information includes:
    根据所述视频片段信息对所述本地素材视频进行剪辑,得到所述本地素材视频的视频片段。The local material video is edited according to the video segment information to obtain a video segment of the local material video.
  7. 根据权利要求1所述的方法,其特征在于,所述外部素材视频的语义信息是所述拍摄设备对所述外部素材视频进行语义分析得到的。The method according to claim 1, wherein the semantic information of the external material video is obtained by semantic analysis of the external material video by the shooting device.
  8. 根据权利要求1所述的方法,其特征在于,所述根据所述语义信息,确定生成影片所需的视频片段信息,包括:The method according to claim 1, wherein the determining, according to the semantic information, video segment information required for generating a movie comprises:
    根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息。According to the semantic information, determine a target film template and the video clip information corresponding to each video slot in the target film template.
  9. 根据权利要求8所述的方法，其特征在于，所述目标成片模板是从候选成片模板中确定的，所述候选成片模板是从成片模板库中确定的。The method according to claim 8, wherein the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  10. 根据权利要求9所述的方法，其特征在于，所述候选成片模板是通过以下方式确定的：The method according to claim 9, wherein the candidate film templates are determined in the following manner:
    根据所述语义信息,确定待生成影片的风格类型;According to the semantic information, determine the style type of the movie to be generated;
    根据所述风格类型,从成片模板库中筛选出所述候选成片模板。According to the style type, the candidate film templates are screened from the film template library.
  11. 根据权利要求9所述的方法,其特征在于,所述根据所述语义信息,确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息,包括:The method according to claim 9, wherein, according to the semantic information, determining the target filming template and the video clip information corresponding to each video slot in the target filming template, comprising:
    利用所述语义信息,计算所述目标素材视频中的视频片段与所述候选成片模板中各视频空位的匹配度,并计算相邻视频空位之间视频过渡的平滑度;Using the semantic information, calculate the degree of matching between the video segment in the target material video and each video slot in the candidate film template, and calculate the smoothness of video transition between adjacent video slots;
    根据所述匹配度与所述平滑度,确定目标成片模板及所述目标成片模板中各视频空位对应的目标视频片段。According to the matching degree and the smoothness, the target filming template and the target video segment corresponding to each video slot in the target filming template are determined.
  12. 根据权利要求8所述的方法，其特征在于，所述目标成片模板包括以下一种或多种内容：音乐、转场效果、贴图、视频特效。The method according to claim 8, wherein the target film template includes one or more of the following: music, transition effects, stickers, and video special effects.
  13. 根据权利要求8所述的方法,其特征在于,所述利用所述目标视频片段生成影片,包括:The method according to claim 8, wherein the generating a movie by using the target video segment comprises:
    将所述目标视频片段导入所述目标成片模板对应的视频空位,生成影片。The target video clip is imported into the video slot corresponding to the target film template to generate a movie.
  14. 根据权利要求1所述的方法,其特征在于,所述目标素材视频是根据预设的条件自动从存储的素材视频中筛选得到的。The method according to claim 1, wherein the target material video is automatically obtained from the stored material video according to preset conditions.
  15. 根据权利要求1所述的方法,其特征在于,所述目标素材视频是根据用户设定的条件从存储的素材视频中筛选得到的。The method according to claim 1, wherein the target material video is obtained by screening from the stored material video according to a condition set by a user.
  16. 根据权利要求14或15所述的方法,其特征在于,所述条件包括以下一种或多种:时间、地点、人物信息、场景信息。The method according to claim 14 or 15, wherein the conditions include one or more of the following: time, location, character information, and scene information.
  17. 根据权利要求1所述的方法,其特征在于,所述语义信息包括语义标签。The method of claim 1, wherein the semantic information includes semantic tags.
  18. 根据权利要求1所述的方法,其特征在于,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。The method according to claim 1, wherein the semantic information includes one or more of the following: scene recognition results, human action detection results, human expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results .
  19. 一种影片生成方法,其特征在于,包括:A method for generating a film, comprising:
    获取目标素材视频的语义信息;Obtain the semantic information of the target material video;
    将所述语义信息发送给终端设备，其中，所述语义信息用于所述终端设备确定生成影片所需的视频片段信息；Sending the semantic information to a terminal device, wherein the semantic information is used by the terminal device to determine the video clip information required for generating a movie;
    获取所述终端设备发送的所述视频片段信息,并根据所述视频片段信息对所述目标素材视频进行剪辑,得到目标视频片段;acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip;
    将所述目标视频片段传输给所述终端设备,以便所述终端设备利用所述目标视频片段生成影片。The target video clip is transmitted to the terminal device, so that the terminal device generates a movie using the target video clip.
  20. 根据权利要求19所述的方法,其特征在于,所述视频片段信息用于指示出所述目标视频片段所属的目标素材视频及所述目标视频片段对应的时间段。The method according to claim 19, wherein the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
  21. 根据权利要求19所述的方法,其特征在于,所述目标素材视频的语义信息是通过对目标素材视频进行语义分析得到的。The method according to claim 19, wherein the semantic information of the target material video is obtained by semantic analysis of the target material video.
  22. 根据权利要求21所述的方法,其特征在于,所述语义分析是在所述目标素材视频的拍摄过程中进行的。The method according to claim 21, wherein the semantic analysis is performed during the shooting process of the target material video.
  23. 根据权利要求21所述的方法,其特征在于,所述语义分析是在充电过程中进行的。The method of claim 21, wherein the semantic analysis is performed during charging.
  24. 根据权利要求19所述的方法,其特征在于,在所述获取目标素材视频的语义信息之前,所述方法还包括:The method according to claim 19, wherein before the acquiring the semantic information of the target material video, the method further comprises:
    根据设定的条件,从存储的素材视频中筛选出目标素材视频。According to the set conditions, the target material video is filtered from the stored material video.
  25. The method according to claim 24, wherein the set condition is a preset default condition.
  26. The method according to claim 24, wherein the set condition is set by a user.
  27. The method according to claim 19, wherein the semantic information comprises semantic tags.
  28. The method according to claim 19, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
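Claims 27-28 enumerate what the semantic information may contain without fixing a data layout. One hypothetical per-segment record combining those result types (keys and score scales are illustrative assumptions only):

```python
# Hypothetical shape of per-segment semantic information. The keys mirror
# the result types listed in the claims: semantic tags, scene recognition,
# human action detection, facial expression detection, object detection,
# and composition / aesthetic evaluation scores.
semantic_info = {
    "video_id": "DJI_0042.MP4",
    "time_range_s": (12.0, 17.5),
    "tags": ["beach", "running"],      # semantic tags (claim 27)
    "scene": "beach",                  # scene recognition result
    "actions": ["running"],            # human action detection result
    "expressions": ["smiling"],        # facial expression detection result
    "objects": ["person", "dog"],      # object detection result
    "composition_score": 0.81,         # composition evaluation result
    "aesthetic_score": 0.74,           # aesthetic evaluation result
}

def has_tag(info: dict, tag: str) -> bool:
    """Check whether a segment carries a given semantic tag."""
    return tag in info.get("tags", [])

print(has_tag(semantic_info, "beach"))  # True
```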
  29. A terminal device, comprising:
    a communication interface configured to communicate with a photographing device; and
    a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:
    acquiring semantic information of target material videos, the semantic information comprising at least semantic information of an external material video obtained from the photographing device;
    determining, according to the semantic information, video clip information required for generating a film;
    acquiring a target video clip corresponding to the video clip information, wherein the target video clip comprises at least a video clip of the external material video obtained from the photographing device; and
    generating a film using the target video clip.
  30. The terminal device according to claim 29, wherein, when acquiring the target video clip corresponding to the video clip information, the processor is configured to send the video clip information to the photographing device and then receive the target video clip obtained by the photographing device by editing the external material video according to the video clip information.
  31. The terminal device according to claim 30, wherein the video clip information indicates the external material video to which the target video clip belongs and the time period corresponding to the target video clip.
  32. The terminal device according to claim 29, wherein the target material videos further comprise a local material video.
  33. The terminal device according to claim 32, wherein the semantic information of the local material video is obtained by:
    performing semantic analysis on the local material video to obtain the semantic information of the local material video.
  34. The terminal device according to claim 32, wherein the target video clip further comprises a video clip of the local material video; and
    when acquiring the target video clip corresponding to the video clip information, the processor is configured to edit the local material video according to the video clip information to obtain the video clip of the local material video.
  35. The terminal device according to claim 29, wherein the semantic information of the external material video is obtained by the photographing device performing semantic analysis on the external material video.
  36. The terminal device according to claim 29, wherein, when determining the video clip information required for generating a film according to the semantic information, the processor is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
  37. The terminal device according to claim 36, wherein the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  38. The terminal device according to claim 37, wherein the candidate film templates are determined by:
    determining, according to the semantic information, a style type of the film to be generated; and
    selecting the candidate film templates from the film template library according to the style type.
  39. The terminal device according to claim 37, wherein, when determining the target film template and the video clip information corresponding to each video slot in the target film template according to the semantic information, the processor is configured to: use the semantic information to calculate a matching degree between video clips in the target material videos and each video slot in the candidate film templates, and calculate a smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
  40. The terminal device according to claim 36, wherein the target film template comprises one or more of the following: music, transition effects, stickers, and video special effects.
  41. The terminal device according to claim 36, wherein, when generating a film using the target video clip, the processor is configured to import the target video clip into the corresponding video slot of the target film template to generate the film.
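Claims 37-39 describe scoring candidate film templates by how well clips match each slot and how smoothly adjacent slots transition, without specifying the scoring functions or search strategy. A brute-force sketch under assumed toy scoring functions (the real matching degree and smoothness measures are not disclosed here):

```python
from itertools import product

def match_degree(clip: dict, slot: dict) -> float:
    """Toy matching degree: fraction of slot-required tags the clip carries."""
    wanted = set(slot["tags"])
    return len(wanted & set(clip["tags"])) / len(wanted) if wanted else 1.0

def smoothness(a: dict, b: dict) -> float:
    """Toy transition smoothness: 1.0 if adjacent clips share any tag."""
    return 1.0 if set(a["tags"]) & set(b["tags"]) else 0.0

def score_template(template: list[dict], clips: list[dict]) -> tuple[float, list[dict]]:
    """Try every clip-to-slot assignment; return (best score, best assignment).
    Score = total matching degree + total transition smoothness (claim 39)."""
    best_score, best = float("-inf"), None
    for assign in product(clips, repeat=len(template)):
        m = sum(match_degree(c, s) for c, s in zip(assign, template))
        t = sum(smoothness(a, b) for a, b in zip(assign, assign[1:]))
        if m + t > best_score:
            best_score, best = m + t, list(assign)
    return best_score, best

clips = [{"id": 1, "tags": ["beach", "running"]},
         {"id": 2, "tags": ["sunset", "beach"]}]
template = [{"tags": ["beach"]}, {"tags": ["sunset"]}]
score, assignment = score_template(template, clips)
print([c["id"] for c in assignment])  # [1, 2]
```

In practice the search over clips and templates would need pruning rather than exhaustive enumeration; the sketch only shows how the two scores of claim 39 combine into one selection criterion.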
  42. The terminal device according to claim 29, wherein the target material video is automatically selected from stored material videos according to a preset condition.
  43. The terminal device according to claim 29, wherein the target material video is selected from stored material videos according to a condition set by the user.
  44. The terminal device according to claim 42 or 43, wherein the condition comprises one or more of the following: time, location, person information, and scene information.
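Claims 42-44 describe filtering stored material videos by a preset or user-set condition over time, location, person, and scene. An illustrative filter (metadata field names are assumptions; the patent does not fix a schema):

```python
from datetime import datetime

def select_target_videos(videos: list[dict], condition: dict) -> list[dict]:
    """Keep only the videos that satisfy every field present in `condition`.
    Supported condition keys, per claim 44: after (time), location,
    person, scene. Absent keys are treated as "don't care"."""
    def ok(v: dict) -> bool:
        if "after" in condition and v["shot_at"] < condition["after"]:
            return False
        if "location" in condition and v["location"] != condition["location"]:
            return False
        if "person" in condition and condition["person"] not in v["persons"]:
            return False
        if "scene" in condition and v["scene"] != condition["scene"]:
            return False
        return True
    return [v for v in videos if ok(v)]

videos = [
    {"id": "a", "shot_at": datetime(2020, 9, 1), "location": "Shenzhen",
     "persons": ["Alice"], "scene": "beach"},
    {"id": "b", "shot_at": datetime(2020, 8, 1), "location": "Beijing",
     "persons": ["Bob"], "scene": "city"},
]
picked = select_target_videos(videos, {"scene": "beach"})
print([v["id"] for v in picked])  # ['a']
```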
  45. The terminal device according to claim 29, wherein the semantic information comprises semantic tags.
  46. The terminal device according to claim 29, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
  47. A photographing device, comprising:
    a camera configured to shoot material videos;
    a communication interface configured to communicate with a terminal device; and
    a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:
    acquiring semantic information of a target material video;
    sending the semantic information to the terminal device, wherein the semantic information is used by the terminal device to determine video clip information required for generating a film;
    acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and
    transmitting the target video clip to the terminal device, so that the terminal device generates a film using the target video clip.
  48. The photographing device according to claim 47, wherein the video clip information indicates the target material video to which the target video clip belongs and the time period corresponding to the target video clip.
  49. The photographing device according to claim 47, wherein the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
  50. The photographing device according to claim 49, wherein the semantic analysis is performed while the target material video is being shot.
  51. The photographing device according to claim 49, wherein the semantic analysis is performed during charging.
  52. The photographing device according to claim 47, wherein the processor is further configured to, before acquiring the semantic information of the target material video, select the target material video from stored material videos according to a set condition.
  53. The photographing device according to claim 52, wherein the set condition is a preset default condition.
  54. The photographing device according to claim 52, wherein the set condition is set by a user.
  55. The photographing device according to claim 47, wherein the semantic information comprises semantic tags.
  56. The photographing device according to claim 47, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
  57. The photographing device according to claim 47, wherein the photographing device comprises a movable platform, a camera, or a gimbal camera.
  58. A film generation system, comprising:
    a terminal device configured to: acquire semantic information of target material videos, the semantic information comprising at least semantic information of an external material video obtained from a photographing device; determine, according to the semantic information, video clip information required for generating a film; acquire a target video clip corresponding to the video clip information, wherein the target video clip comprises at least a video clip of the external material video obtained from the photographing device; and generate a film using the target video clip; and
    the photographing device, configured to: acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain the target video clip; and transmit the target video clip to the terminal device.
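The division of labor in claim 58 — the photographing device analyzes and cuts, the terminal device selects and assembles, and only semantic information and the cut clips cross the link — can be sketched as a message sequence. Class and method names below are illustrative only, not the patent's implementation:

```python
class PhotographingDevice:
    """Holds the raw material videos; analyzes and cuts them on request."""
    def __init__(self, videos: dict):
        self.videos = videos  # video_id -> list of frames (toy model)

    def semantic_info(self) -> dict:
        # Stand-in for on-device semantic analysis of each material video.
        return {vid: {"tags": ["clip"]} for vid in self.videos}

    def cut(self, clip_info: tuple) -> list:
        # Only the cut clip (not the full video) crosses the link.
        vid, start, end = clip_info
        return self.videos[vid][start:end]

class TerminalDevice:
    """Picks clips from semantic info and assembles the final film."""
    def choose_clips(self, info: dict) -> list[tuple]:
        # Stand-in for template matching: take the first 2 frames per video.
        return [(vid, 0, 2) for vid in sorted(info)]

    def make_film(self, device: PhotographingDevice) -> list:
        info = device.semantic_info()            # 1: receive semantics only
        wanted = self.choose_clips(info)         # 2: decide clip information
        clips = [device.cut(w) for w in wanted]  # 3: fetch the cut clips
        return [f for clip in clips for f in clip]  # 4: assemble the film

device = PhotographingDevice({"v1": ["f0", "f1", "f2"], "v2": ["g0", "g1"]})
film = TerminalDevice().make_film(device)
print(film)  # ['f0', 'f1', 'g0', 'g1']
```

The design choice worth noting is bandwidth: full material videos never leave the photographing device; only compact semantic information travels up and only the selected clips travel back.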
  59. The system according to claim 58, wherein the video clip information indicates the external material video to which the target video clip belongs and the time period corresponding to the target video clip.
  60. The system according to claim 58, wherein the target material videos further comprise a local material video, and the terminal device is further configured to locally acquire semantic information of the local material video.
  61. The system according to claim 60, wherein the terminal device is further configured to perform semantic analysis on the local material video to obtain the semantic information of the local material video.
  62. The system according to claim 60, wherein the target video clip further comprises a video clip of the local material video; and
    the terminal device is further configured to edit the local material video according to the video clip information to obtain the video clip of the local material video.
  63. The system according to claim 58, wherein the semantic information of the external material video is obtained by the photographing device performing semantic analysis on the external material video.
  64. The system according to claim 63, wherein the semantic analysis is performed while the external material video is being shot.
  65. The system according to claim 63, wherein the semantic analysis is performed during charging.
  66. The system according to claim 58, wherein, when determining the video clip information required for generating a film according to the semantic information, the terminal device is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
  67. The system according to claim 66, wherein the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  68. The system according to claim 67, wherein the candidate film templates are determined by:
    determining, according to the semantic information, a style type of the film to be generated; and
    selecting the candidate film templates from the film template library according to the style type.
  69. The system according to claim 67, wherein, when determining the target film template and the video clip information corresponding to each video slot in the target film template according to the semantic information, the terminal device is configured to: use the semantic information to calculate a matching degree between video clips in the target material videos and each video slot in the candidate film templates, and calculate a smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
  70. The system according to claim 66, wherein the target film template comprises one or more of the following: music, transition effects, stickers, and video special effects.
  71. The system according to claim 66, wherein, when generating a film using the target video clip, the terminal device is configured to import the target video clip into the corresponding video slot of the target film template to generate the film.
  72. The system according to claim 58, wherein the target material video is automatically selected from stored material videos according to a preset condition.
  73. The system according to claim 58, wherein the target material video is selected from stored material videos according to a condition set by the user.
  74. The system according to claim 72 or 73, wherein the condition comprises one or more of the following: time, location, person information, and scene information.
  75. The system according to claim 58, wherein the semantic information comprises semantic tags.
  76. The system according to claim 58, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
  77. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the film generation method according to any one of claims 1-18.
  78. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the film generation method according to any one of claims 19-28.
PCT/CN2020/118084 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system WO2022061806A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080035038.6A CN113841417B (en) 2020-09-27 2020-09-27 Film generation method, terminal device, shooting device and film generation system
PCT/CN2020/118084 WO2022061806A1 (en) 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/118084 WO2022061806A1 (en) 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system

Publications (1)

Publication Number Publication Date
WO2022061806A1 true WO2022061806A1 (en) 2022-03-31

Family

ID=78963293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118084 WO2022061806A1 (en) 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system

Country Status (2)

Country Link
CN (1) CN113841417B (en)
WO (1) WO2022061806A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786069A (en) * 2022-04-22 2022-07-22 北京有竹居网络技术有限公司 Video generation method, device, medium and electronic equipment
CN115119050A (en) * 2022-06-30 2022-09-27 北京奇艺世纪科技有限公司 Video clipping method and device, electronic equipment and storage medium
CN115134646A (en) * 2022-08-25 2022-09-30 荣耀终端有限公司 Video editing method and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114501076A (en) * 2022-02-07 2022-05-13 浙江核新同花顺网络信息股份有限公司 Video generation method, apparatus, and medium
CN115460459B (en) * 2022-09-02 2024-02-27 百度时代网络技术(北京)有限公司 Video generation method and device based on AI and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105578269A (en) * 2016-01-20 2016-05-11 努比亚技术有限公司 Mobile terminal and video processing method thereof
US20180014052A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for real time, dynamic, adaptive and non-sequential stitching of clips of videos
CN109076263A (en) * 2017-12-29 2018-12-21 深圳市大疆创新科技有限公司 Video data handling procedure, equipment, system and storage medium
CN110198432A (en) * 2018-10-30 2019-09-03 腾讯科技(深圳)有限公司 Processing method, device, computer-readable medium and the electronic equipment of video data
US20190392214A1 (en) * 2018-06-25 2019-12-26 Panasonic Intellectual Property Management Co., Ltd. Information processing apparatus and method for generating video picture data
CN111357277A (en) * 2018-11-28 2020-06-30 深圳市大疆创新科技有限公司 Video clip control method, terminal device and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110121116A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video generation method and device
CN110582025B (en) * 2018-06-08 2022-04-01 北京百度网讯科技有限公司 Method and apparatus for processing video
CN110855904B (en) * 2019-11-26 2021-10-01 Oppo广东移动通信有限公司 Video processing method, electronic device and storage medium



Also Published As

Publication number Publication date
CN113841417A (en) 2021-12-24
CN113841417B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
WO2022061806A1 (en) Film production method, terminal device, photographing device, and film production system
CN111866585B (en) Video processing method and device
Fossati From Grain to Pixel: The Archival Life of Film in Transition THIRD REVISED EDITION
Davenport et al. Cinematic primitives for multimedia
US9443337B2 (en) Run-time techniques for playing large-scale cloud-based animations
CN101300567B (en) Method for media sharing and authoring on the web
Buckingham et al. Video cultures: Media technology and everyday creativity
US20140328570A1 (en) Identifying, describing, and sharing salient events in images and videos
WO2022141533A1 (en) Video processing method, video processing apparatus, terminal device, and storage medium
CN110121116A (en) Video generation method and device
WO2018050021A1 (en) Virtual reality scene adjustment method and apparatus, and storage medium
JP2012070283A (en) Video processing apparatus, method, and video processing system
CN108600632A (en) It takes pictures reminding method, intelligent glasses and computer readable storage medium
CN111667557B (en) Animation production method and device, storage medium and terminal
WO2017157135A1 (en) Media information processing method, media information processing device and storage medium
CN114638232A (en) Method and device for converting text into video, electronic equipment and storage medium
CN113992973B (en) Video abstract generation method, device, electronic equipment and storage medium
Lehmuskallio The camera as a sensor: The visualization of everyday digital photography as simulative, heuristic and layered pictures
CN117252966B (en) Dynamic cartoon generation method and device, storage medium and electronic equipment
US20200152237A1 (en) System and Method of AI Powered Combined Video Production
US7610554B2 (en) Template-based multimedia capturing
KR101898765B1 (en) Auto Content Creation Methods and System based on Content Recognition Technology
WO2013187796A1 (en) Method for automatically editing digital video files
KR20210078206A (en) Editing multimedia contents based on voice recognition
US20220217430A1 (en) Systems and methods for generating new content segments based on object name identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20954657

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20954657

Country of ref document: EP

Kind code of ref document: A1