WO2022061806A1 - Film production method, terminal device, photographing device, and film production system - Google Patents

Film production method, terminal device, photographing device, and film production system

Info

Publication number
WO2022061806A1
WO2022061806A1 (PCT/CN2020/118084, CN2020118084W)
Authority
WO
WIPO (PCT)
Prior art keywords
video
target
information
semantic information
terminal device
Prior art date
Application number
PCT/CN2020/118084
Other languages
French (fr)
Chinese (zh)
Inventor
朱梦龙 (Zhu Menglong)
刘志鹏 (Liu Zhipeng)
朱高 (Zhu Gao)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN202080035038.6A priority Critical patent/CN113841417B/en
Priority to PCT/CN2020/118084 priority patent/WO2022061806A1/en
Publication of WO2022061806A1 publication Critical patent/WO2022061806A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/4223 Cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Definitions

  • the present application relates to the technical field of audio and video processing, and in particular, to a film generation method, a terminal device, a shooting device, a film generation system, and a computer-readable storage medium.
  • Automatic editing provides great convenience for users who need to edit videos. Automatic editing means that the machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film. This process requires no user operation or only simple operation by the user. However, the existing automatic editing is slow and requires users to wait for a long time.
  • embodiments of the present application provide a movie generation method, a terminal device, a shooting device, a movie generation system, and a computer-readable storage medium, so as to solve the existing technical problem of requiring users to wait for a long time for automatic editing.
  • a first aspect of the embodiments of the present application provides a method for generating a movie, applied to a terminal device, including: acquiring semantic information of a target material video, wherein the semantic information at least includes the semantic information of an external material video obtained from a shooting device; determining, according to the semantic information, the video clip information required for generating the movie; acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes a video clip of the external material video obtained from the shooting device; and generating a movie using the target video clip.
  • a second aspect of the embodiments of the present application provides a method for generating a movie, applied to a shooting device, including: acquiring semantic information of a target material video and sending the semantic information to a terminal device; acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and transmitting the target video clip to the terminal device, so that the terminal device generates a movie using the target video clip.
  • a third aspect of the embodiments of the present application provides a terminal device, including: a communication interface for communicating with the photographing device; and a processor configured to acquire semantic information of a target material video, wherein the semantic information at least includes the semantic information of the external material video obtained from the shooting device; determine, according to the semantic information, the video clip information required for generating a movie; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes a video clip of the external material video obtained from the shooting device; and generate a movie using the target video clip.
  • a fourth aspect of the embodiments of the present application provides a photographing device, including: a communication interface for communicating with the terminal device; and a processor configured to acquire semantic information of the target material video, send the semantic information to the terminal device, acquire the video clip information sent by the terminal device, edit the target material video according to the video clip information to obtain a target video clip, and transmit the target video clip to the terminal device, so that the terminal device generates a movie using the target video clip.
  • a fifth aspect of the embodiments of the present application provides a movie generation system, including:
  • the terminal device is used to obtain semantic information of the target material video, wherein the semantic information at least includes the semantic information of the external material video obtained from the shooting device; determine, according to the semantic information, the video segment information required for generating the movie; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes a video clip of the external material video obtained from the shooting device; and generate a movie using the target video clip;
  • the shooting device is configured to acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip; and transmit the target video clip to the terminal device.
  • a sixth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the movie generation method provided in the first aspect.
  • a seventh aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the movie generation method provided in the second aspect.
  • with the movie generation method provided by the embodiments of the present application, the shooting device does not need to first transmit to the terminal device all the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses it to determine the required video clip information; only the target video clips corresponding to that video clip information then need to be obtained from the shooting device, rather than all the target material videos. This greatly reduces the user's waiting time and speeds up automatic editing.
  • FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for generating a movie provided by an embodiment of the present application.
  • FIG. 3 is an interaction diagram of the method for generating a movie provided by an embodiment of the present application.
  • FIG. 4 is another flowchart of the method for generating a movie provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a photographing device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application.
  • Automatic editing provides great convenience for users who need to edit videos. Automatic editing means that the machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film. This process requires no user operation or only simple operation by the user.
  • the automatic clipping function can be implemented in an application program (APP), which can be installed on a terminal device and run on hardware such as a processor and a memory of the terminal device.
  • the material video required for automatic editing is not on the terminal device where automatic editing is performed.
  • the material video may reside on the shooting device that shot it, where the shooting device is a device independent of the terminal device, such as a camera, an action camera, a handheld gimbal camera, or a drone equipped with a camera. Because shooting devices typically have small screens and limited network connectivity, the automatic editing processing is usually carried out on the terminal device.
  • the terminal device can be a mobile phone, a tablet or a personal computer.
  • since the automatic editing process is performed on the terminal device while the material video used for forming the film is stored on another shooting device, the terminal device needs to obtain the required material video from the shooting device during automatic editing.
  • in the existing approach, the shooting device first transmits all the material videos that may be used for film generation to the terminal device, and transmitting all of them takes a long time.
  • FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application.
  • the terminal device can be a mobile phone, a PC, or a tablet computer
  • the shooting device can be an action camera, a gimbal camera, or a drone equipped with a camera.
  • in the existing approach, the action camera transmits all videos (video 1, video 2, video 3, and so on in the figure) shot within a certain period, such as the past day, two days, or three days (this is only an example), to the mobile phone. Although any of the videos shot that day may end up in the finished film, the full set of videos has a large amount of data and takes a long time to transmit, causing inconvenience to users.
  • FIG. 2 is a flowchart of a method for generating a movie provided by an embodiment of the present application. The method includes:
  • the semantic information at least includes: the semantic information of the external material video obtained from the shooting device; the semantic information may include information such as the scene, video theme, video style, camera movement, and whether the footage is blurred.
  • the semantic information of the external material video may be sent by the shooting device to the terminal device.
  • the terminal device sends the video clip information to the shooting device, and then the shooting device acquires a corresponding target video clip based on the video clip information, and transmits the target video clip to the terminal device.
  • the target video clip may be a video clip of a material video captured by the shooting device
  • the video clip information may include time node information of the shooting, or a video number together with a start time and an end time.
  • the target video segment at least includes: the video segment of the external material video acquired from the shooting device.
  • the target material video may be a material video that may be used to generate a movie, for example, the target material video may be all videos shot on the same day, and for another example, the target material video may be all videos shot at the same place.
  • the target material video may at least include an external material video shot by a shooting device. As mentioned above, the shooting device is different from the terminal device. Therefore, the material video shot by the shooting device belongs to the external material video for the terminal device.
  • Semantic information can be obtained by semantic analysis of video content.
  • the semantic analysis of the video content may be implemented by using a machine learning algorithm such as a neural network.
  • the semantic information of the video can include content recognition results for at least one segment or at least one frame of the video. The content recognition results can be of various kinds, such as scene recognition results (e.g., sky, grass, street), character action detection results (e.g., running, walking, standing, jumping), facial expression detection results (e.g., smiling faces, crying faces), target detection results (e.g., animals, cars), composition evaluation results, and aesthetic evaluation results.
  • the semantic information may be a semantic tag, that is, the semantic information may be assigned to the video by tagging the video.
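As an illustrative sketch (the field names and tag vocabulary here are hypothetical, not taken from the application), the per-segment semantic tags of one material video might be represented as tagged time ranges:

```python
# Hypothetical representation of per-segment semantic tags for one
# material video; field names and tag strings are illustrative only.
semantic_info = {
    "video_id": "DJI_0042",
    "segments": [
        {"start": 0.0, "end": 8.5,
         "tags": ["scene:sky", "motion:pan", "aesthetic:0.82"]},
        {"start": 8.5, "end": 20.0,
         "tags": ["scene:street", "action:running", "face:smile"]},
    ],
}

def tags_for(info, t):
    """Return the semantic tags covering timestamp t (in seconds)."""
    for seg in info["segments"]:
        if seg["start"] <= t < seg["end"]:
            return seg["tags"]
    return []
```

Such a structure is small compared with the video itself, which is what makes it cheap to send to the terminal device ahead of any footage.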
  • the semantic information of the external material video can be obtained directly from the shooting device.
  • the semantic information of the external material video need not be produced by analysis on the terminal device itself; it may instead be obtained by the shooting device performing semantic analysis on the external material video.
  • the shooting device can send it to the terminal device, so that the terminal device obtains the semantic information of the external material video.
  • the shooting device can analyze the semantic information of the material video before starting automatic editing.
  • if the computing power of the shooting device is sufficient, the semantic analysis of the material video can be performed simultaneously while the material video is being shot.
  • if the computing power of the shooting device is not sufficient to support semantic analysis while shooting, the shooting device can perform the semantic analysis on the captured material video after shooting is complete, for example while the device is charging.
  • after the semantic information is obtained, the video segment information required to generate the movie can be determined. Specifically, this determination can be made according to a preset film-forming rule in combination with the semantic information of the target material video.
  • the preset film-forming rule can be implemented as an algorithm module, which may be called a film-forming module; given the semantic information, the film-forming module outputs the video segment information required for the film.
  • the film-forming module may be constructed from manually set film-forming rules. For example, drawing on the prior knowledge of professional video editors, a method for selecting suitable video segments when generating a film can be summarized, and a corresponding computer program can be written according to that method to produce the film-forming module.
  • the film-forming module can also be trained using machine learning. For example, multiple groups of sample material videos can be obtained, and professionals can screen each group to select the video clips to be used for filming; the selected video clips and their corresponding sample material video groups then serve as training samples for training a neural network model, yielding a film-forming module based on the neural network model.
  • the film-forming rule includes single-dimensional or multi-dimensional matching based on preset scene combinations, camera movement combinations, themes, and the like.
  • the video clip information may be related information used to generate a target video clip of a movie. In one embodiment, it may be used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip. For example, it may indicate that the target video segment belongs to the 10th-20th second video segment of the target material video X.
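A minimal sketch of such video clip information, using hypothetical field names (the application only requires that a clip's source video and time period be identifiable):

```python
from dataclasses import dataclass

@dataclass
class ClipInfo:
    """Identifies one target video clip: the material video it belongs
    to and the time window to cut, in seconds. Field names are
    illustrative, not taken from the application."""
    video_id: str   # e.g. the video number on the shooting device
    start: float
    end: float

    def duration(self) -> float:
        return self.end - self.start

# e.g. "the 10th-20th second video segment of target material video X"
clip = ClipInfo(video_id="X", start=10.0, end=20.0)
```

Only these few bytes cross the link to the shooting device; the device cuts the clip locally and returns just the cut footage.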
  • a target video clip corresponding to the video clip information can be acquired, so that a movie can be generated by using the acquired target video clip.
  • the target video segment belonging to the external material video can be obtained from the shooting device.
  • with the movie generation method provided by the embodiments of the present application, the shooting device does not need to first transmit to the terminal device all the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses it to determine the required video clip information; only the target video clips corresponding to that video clip information then need to be obtained from the shooting device, rather than all the target material videos. This greatly reduces the user's waiting time and speeds up automatic editing.
  • the determined video clip information may be sent to the shooting device, so that the shooting device can use the received video clip information to edit the target material video indicated by that information; after the target video clip corresponding to the indicated time period has been cut out, it can be transmitted to the terminal device.
  • the material that the user wants to edit into the movie does not necessarily all come from the shooting device. For example, when the user travels somewhere, part of the video may be shot with an action camera or a gimbal camera while another part is shot with a mobile phone, and the user may wish to automatically generate a movie from both.
  • in this case, the target material video within the selection range includes not only the external material video shot by an action camera or other shooting device, but also the local material video shot by the mobile phone or other terminal device. Therefore, in one embodiment, automatic editing can also support mixed cutting, that is, the target material video can also include local material video.
  • the semantic information of the local material video may be obtained by the terminal device performing semantic analysis on the local material video. In another implementation, it may be semantic information carried by the local material video itself: because the local material videos on a terminal device come from rich and diverse sources, for example the Internet, a video may already carry the corresponding semantic information when it is obtained, so the mobile phone does not need to perform semantic analysis on it again.
  • the video segment information determined according to the semantic information of the target material video does not necessarily include the video segment information corresponding to the local material video.
  • for example, the terminal device may determine, according to the semantic information of the local material video, that the shooting quality of the local material video is poor and does not meet the film-forming requirements, so that the determined video segment information only includes video clip information corresponding to the external material video.
  • when the target video clip corresponding to the video clip information includes a video clip of the local material video, that clip can be obtained by editing the local material video according to the video clip information.
  • the automatic editing can support the mixed-cutting function, that is, the movie generated by the automatic editing can also include the local material video of the terminal device, thereby improving the richness of the content of the finished movie.
  • video clip information can also be determined for the local material video according to its semantic information, so that video clips suitable for the film are selected from the local material video; compared with randomly selecting clips from the local material video and inserting them into the movie, this yields a higher-quality finished film.
  • the video segment information can be obtained by inputting the semantic information into the film-forming module.
  • the output of the film-forming module may include the target film template and the video segment information corresponding to each video slot in the target film template.
  • the film template may be a preset template including a plurality of video slots, each of which can be used for importing or inserting a video clip.
  • each film template can have its own characteristics. For example, the video slots can be matched with different textures, text, video special effects, and other elements, where the special effects may include acceleration, deceleration, filters, camera-movement effects, and so on. Different transition effects can be used between adjacent video slots. In addition, different film templates can be matched with different music, and the transition times corresponding to the transition effects can be aligned with the rhythm points of the template's music.
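A film template of the kind described above might be sketched as follows; the structure, field names, and file names are illustrative assumptions, not taken from the application:

```python
# Hypothetical film template: a list of video slots, each with its own
# duration, effect, and wanted semantic tags, plus music and one
# transition between each adjacent pair of slots.
template = {
    "name": "city-trip",
    "music": "upbeat_120bpm.mp3",
    "slots": [
        {"duration": 3.0, "effect": "speed_up", "wants": ["scene:street"]},
        {"duration": 5.0, "effect": None,       "wants": ["scene:sky"]},
    ],
    "transitions": ["crossfade"],  # between slot 0 and slot 1
}

def total_duration(tpl):
    """Total footage the template needs, summed over its slots."""
    return sum(slot["duration"] for slot in tpl["slots"])
```

Keeping the template declarative like this lets a film-forming module score many templates against the same semantic information without touching any footage.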
  • the target film template may be determined from candidate film templates, and the candidate film templates may be determined from a film template library.
  • the film template library can include multiple preset film templates. Since the library may contain a great many templates, when determining the target film template, candidate film templates can first be screened out of the library, and the target film template then determined from among the candidates, reducing the screening workload.
  • the style type of the movie to be generated may be determined according to the semantic information of the target material video.
  • the theme shared by most of the target material videos, such as parent-child, nature, city, or gourmet, can be determined according to their semantic information, and the film templates in the template library can then be screened by that theme to obtain candidate film templates matching it.
  • since different candidate film templates have different features, such as different music, different video slot elements, and different transition effects, priorities can be preset for the different features, and each feature of the candidate film templates can then be matched against the semantic information of the target material video in order of priority, from high to low, to determine the target film template.
  • the semantic information may include the semantic information of different segments in the video
  • various combinations of importing the video segments into the video slots of the candidate film templates can be simulated using the semantic information of the different segments. The score of each combination can be calculated from the degree of matching between each video clip and its video slot and from the smoothness of the transitions between adjacent video slots; the candidate film template of the highest-scoring combination is determined as the target film template, and the video segment information corresponding to each video slot in that template is determined accordingly.
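The combination scoring described above could be sketched as a brute-force search over clip-to-slot assignments. The match metric here is a deliberately crude assumption (fraction of wanted tags present in a clip), not the application's actual scoring, and transition smoothness is omitted for brevity:

```python
from itertools import permutations

def match_score(clip_tags, slot_wants):
    """Crude match metric: fraction of wanted tags present in the clip."""
    if not slot_wants:
        return 1.0
    return sum(t in clip_tags for t in slot_wants) / len(slot_wants)

def best_assignment(clips, slots):
    """Try every ordering of clips over the slots and keep the
    highest-scoring one. clips: list of tag lists; slots: list of
    wanted-tag lists. Brute force is only feasible for small counts."""
    best, best_score = None, -1.0
    for perm in permutations(clips, len(slots)):
        score = sum(match_score(c, s) for c, s in zip(perm, slots))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

clips = [["scene:sky"], ["scene:street"]]
slots = [["scene:street"], ["scene:sky"]]
best, score = best_assignment(clips, slots)
```

A realistic film-forming module would replace the exhaustive search with a greedy or learned assignment, since the number of permutations grows factorially with the clip count.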
  • after the target film template is determined, the target video clip corresponding to the video clip information can be obtained and imported into the corresponding video slot of the target film template, thereby generating the film.
  • the target material video is a material video that may be used to generate the movie; it is not necessarily all of the currently stored material videos, but can be filtered from the stored material videos according to set conditions.
  • the set conditions may be one or more of time, place, character information, scene information, etc. After the target material video is screened out through the set conditions, the semantic information of the target material video can be obtained.
  • the time condition can be the current day, the last two days, the last week, from date A to date B, and so on; the location condition can be a scenic spot, city, country, home, company, and so on.
  • the character condition can be a specific person such as Xiao Ming, or an abstract category such as male, female, old, or young; the scene condition can be an environment such as daytime, night, or rain, a venue such as a street or scenic area, or an object such as a bus or the sky.
  • the set condition is the current day, the target material video may include all videos shot on that day. If the set condition is location A, the target material video may include all videos shot at location A. If the set condition is to include Xiao Ming, the target material video can be all videos that include Xiao Ming, and if the set condition is a street, the target material video can be all videos that include streets.
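A sketch of screening stored material videos by set conditions; the record layout and condition names below are hypothetical:

```python
from datetime import date

# Hypothetical stored-material records; the filter conditions (time,
# place, person) mirror the examples given in the text.
materials = [
    {"id": "v1", "date": date(2020, 9, 21), "place": "A", "people": ["Xiao Ming"]},
    {"id": "v2", "date": date(2020, 9, 20), "place": "B", "people": []},
]

def select_target_materials(videos, on_date=None, place=None, person=None):
    """Keep only videos matching every condition that is set;
    unset conditions do not filter anything."""
    out = []
    for v in videos:
        if on_date and v["date"] != on_date:
            continue
        if place and v["place"] != place:
            continue
        if person and person not in v["people"]:
            continue
        out.append(v)
    return out
```

The same function could run independently on the shooting device (over external material) and on the terminal device (over local material), as the text describes.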
  • the target material video may include external material video on the shooting device, and may also include local material video on the terminal device
  • the target material video can be screened independently on the shooting device and the terminal device.
  • the conditions for filtering the target material video can be set by the user, for example, the filtering conditions set by the user can be obtained by interacting with the user before automatic editing.
  • the terminal device and the shooting device may also have their own default filter conditions, so that automatic editing can be started directly, and a movie can be automatically generated without the user's perception, giving the user a certain sense of surprise.
  • with the movie generation method provided by the embodiments of the present application, the shooting device does not need to first transmit to the terminal device all the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses it to determine the required video clip information; only the target video clips corresponding to that video clip information then need to be obtained from the shooting device, rather than all the target material videos. This greatly reduces the user's waiting time and speeds up automatic editing.
  • FIG. 3 is an interaction diagram of the method for generating a movie provided by an embodiment of the present application.
  • the shooting device may complete the semantic analysis of its local material video A in advance (S300); "local" here means local to the photographing device.
  • the shooting device and the terminal device can respectively determine the target material video according to their respective set conditions (S310a and S310b).
  • the target material video determined by the shooting device may be referred to as target material video a, and the target material video determined by the terminal device as target material video b.
  • the terminal device may perform semantic analysis on the target material video b to obtain semantic information of the target material video b (S320).
  • the shooting device may send the semantic information of the target material video a to the terminal device (S330).
  • according to the semantic information, the video segment information a belonging to target material video a and the video segment information b belonging to target material video b can be determined (S340).
  • the video clip information a is sent to the shooting device (S350), so that the shooting device can edit the corresponding target material video a according to it to obtain target video clip a (S360a); the video clip information b is used by the terminal device to edit target material video b (S360b) to obtain target video clip b.
  • the shooting device may transmit the target video clip a to the terminal device (S370), and the terminal device imports the target video clip a and the target video clip b into the target film template, thereby generating a final movie (S380).
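The S330-S380 exchange can be sketched end to end; note that only semantic information, clip information, and the cut clips cross the link, never full material videos. All class and method names here are illustrative, and the clip-selection policy is a trivial stand-in:

```python
# Sketch of the shooting-device / terminal-device exchange.
class ShootingDevice:
    def __init__(self, videos):
        self.videos = videos            # {video_id: semantic_info}

    def send_semantics(self):           # S330: semantics only, no footage
        return self.videos

    def cut(self, clip_infos):          # S360a: edit locally, return clips
        return [f"clip({vid},{s}-{e})" for vid, s, e in clip_infos]

class TerminalDevice:
    def decide_clips(self, semantics):  # S340: stand-in policy that just
        # requests the first 5 s of every video
        return [(vid, 0.0, 5.0) for vid in sorted(semantics)]

    def assemble(self, clips):          # S380: import clips into the film
        return " + ".join(clips)

cam = ShootingDevice({"a1": {}, "a2": {}})
phone = TerminalDevice()
infos = phone.decide_clips(cam.send_semantics())  # S330 + S340
movie = phone.assemble(cam.cut(infos))            # S350 + S360a + S370 + S380
```

A real implementation would plug the film-forming module into `decide_clips` and run the two sides over the device link rather than in one process.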
  • the semantic information of the captured material can be transmitted back to the remote control terminal (including the remote controller and a mobile phone) in real time, while the captured material itself is stored locally.
  • the UAV is triggered to shoot automatically, and its flight trajectory and attitude are adjusted based on preset rules to obtain the target shooting material.
  • an initial preview video can be generated for the user to preview.
  • the target shooting material is acquired in response to the film synthesis operation, and a final film is synthesized based on the target material and the local material.
  • FIG. 4 is another flowchart of the method for generating a movie provided by an embodiment of the present application.
  • the method can be applied to a photographing device, and the method includes:
  • the semantic information is used for the terminal device to determine the video segment information required for generating the movie
  • S430 Acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip.
  • the target video segment is used by the terminal device to generate a movie.
  • the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
  • the semantic analysis is performed during the shooting process of the target material video.
  • the semantic analysis is performed during the charging process.
  • the method before the acquiring the semantic information of the target material video, the method further includes:
  • the target material video is filtered from the stored material video.
  • the set condition is a preset default condition.
  • the set condition is set by a user.
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
  • the movie generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
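The metadata-first, clips-on-demand flow summarized above can be sketched as follows. The `ShootingDevice` class and its methods are stand-ins for whatever link (Wi-Fi, USB, a radio link) actually connects the two devices; none of these names come from the application.

```python
# Sketch of the terminal-side flow: pull semantic info first,
# decide which clips are needed, then fetch only those clips.

class ShootingDevice:
    def __init__(self, videos):
        self.videos = videos  # video_id -> {"tags": [...]}

    def get_semantic_info(self):
        # Cheap: only metadata crosses the link, never full video.
        return {vid: v["tags"] for vid, v in self.videos.items()}

    def cut_clip(self, video_id, start_s, end_s):
        # Expensive step, but only for the requested span.
        return f"{video_id}[{start_s}-{end_s}s]"

def generate_movie(device, wanted_tag):
    semantics = device.get_semantic_info()
    # Determine clip info from semantics alone (no video transferred yet).
    clip_infos = [(vid, 0.0, 5.0) for vid, tags in semantics.items()
                  if wanted_tag in tags]
    clips = [device.cut_clip(*ci) for ci in clip_infos]
    return " + ".join(clips)

device = ShootingDevice({
    "v1": {"tags": ["beach", "person"]},
    "v2": {"tags": ["indoor"]},
})
print(generate_movie(device, "beach"))  # v1[0.0-5.0s]
```

The key property is that `cut_clip` is only ever called for spans the terminal has already decided to use.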
  • FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device includes:
  • a communication interface 510 for communicating with the photographing device
  • the semantic information at least includes: the semantic information of the external material video obtained from the shooting device;
  • the target video clip at least includes: a video clip of the external material video obtained from the shooting device;
  • a movie is generated using the target video segment.
  • when acquiring the target video clip corresponding to the video clip information, the processor is configured to send the video clip information to the shooting device and then receive, from the shooting device, the target video clip obtained by clipping the external material video according to the video clip information.
  • the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the target material video further includes: local material video.
  • the semantic information of the local material video is obtained in the following manner:
  • Semantic analysis is performed on the local material video to obtain semantic information of the local material video.
  • the target video segment further includes: a video segment of the local material video;
  • when acquiring the target video segment corresponding to the video segment information, the processor is configured to edit the local material video according to the video segment information to obtain the video segment of the local material video.
  • the semantic information of the external material video is obtained by the shooting device performing semantic analysis on the external material video.
  • when determining, according to the semantic information, the video clip information required for generating a movie, the processor is configured to determine, according to the semantic information, a target movie template and the video clip information corresponding to each video slot in the target movie template.
  • the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  • the candidate film templates are determined in the following manner:
  • the candidate film templates are screened from the film template library.
  • when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the processor is configured to use the semantic information to calculate the matching degree between the video clips in the target material video and each video slot in the candidate film templates, and to calculate the smoothness of the video transitions between adjacent video slots; and to determine, according to the matching degree and the smoothness, the target film template and the target video clips corresponding to each video slot in the target film template.
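One plausible reading of the matching-degree-plus-smoothness selection above is sketched below: score each candidate template by how well available clips fit its slots, reward smooth transitions between the clips chosen for adjacent slots, and keep the best template. The scoring functions here are invented placeholders, not the application's actual metrics.

```python
# Toy template selection: match clips to slots, reward smooth transitions.

def match_degree(clip_tags, slot_tags):
    # Fraction of the slot's requirements satisfied by the clip.
    return len(set(clip_tags) & set(slot_tags)) / max(len(slot_tags), 1)

def smoothness(clip_a, clip_b):
    # Placeholder: transitions between clips sharing a tag look smoother.
    return 1.0 if set(clip_a) & set(clip_b) else 0.0

def score_template(template_slots, clips):
    # Greedily assign the best-matching clip to each slot.
    chosen = [max(clips, key=lambda c: match_degree(c, slot))
              for slot in template_slots]
    match = sum(match_degree(c, s) for c, s in zip(chosen, template_slots))
    smooth = sum(smoothness(a, b) for a, b in zip(chosen, chosen[1:]))
    return match + smooth, chosen

clips = [["beach", "person"], ["sunset", "beach"], ["indoor"]]
candidates = {
    "travel": [["beach"], ["sunset"]],
    "home":   [["indoor"], ["person"]],
}
best = max(candidates, key=lambda t: score_template(candidates[t], clips)[0])
print(best)  # travel
```

Because the clips chosen for "travel" share the "beach" tag, that template gets the extra smoothness reward and wins the tie on match degree.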
  • the target film template includes one or more of the following contents: music, transition effects, textures, and video special effects.
  • when generating a movie by using the target video clip, the processor is configured to import the target video clip into the video slot corresponding to the target movie template to generate the movie.
  • the target material video is automatically selected from the stored material video according to a preset condition.
  • the target material video is selected from the stored material video according to a condition set by the user.
  • the conditions include one or more of the following: time, location, character information, and scene information.
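A minimal sketch of such condition-based screening follows: keep only the stored material videos whose metadata matches whichever time/location/person conditions were actually set. The metadata field names are assumptions made for illustration.

```python
from datetime import date

# Each stored material video carries simple metadata; the filter keeps
# those matching every condition that was set (None means "not set").
videos = [
    {"id": "v1", "day": date(2020, 9, 20), "place": "beach", "person": "Ann"},
    {"id": "v2", "day": date(2020, 9, 21), "place": "beach", "person": "Bob"},
    {"id": "v3", "day": date(2020, 9, 21), "place": "city",  "person": "Ann"},
]

def select_targets(videos, day=None, place=None, person=None):
    def ok(v):
        return ((day is None or v["day"] == day) and
                (place is None or v["place"] == place) and
                (person is None or v["person"] == person))
    return [v["id"] for v in videos if ok(v)]

# e.g. "everything shot on 2020-09-21 at the beach"
print(select_targets(videos, day=date(2020, 9, 21), place="beach"))  # ['v2']
```

The same shape works whether the conditions are preset defaults or set by the user, as the surrounding claims allow.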
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
  • the terminal device provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
  • FIG. 6 is a schematic structural diagram of a photographing device provided by an embodiment of the present application.
  • the shooting device includes:
  • the camera 610 is used for shooting material video
  • the target video clip is transmitted to the terminal device, so that the terminal device generates a movie using the target video clip.
  • the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
  • the semantic analysis is performed during the shooting process of the target material video.
  • the semantic analysis is performed during the charging process.
  • the processor is further configured to, before acquiring the semantic information of the target material video, filter out the target material video from the stored material video according to a set condition.
  • the set condition is a preset default condition.
  • the set condition is set by a user.
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
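To make the semantic information listed above concrete, a per-video record might look like the following. Every field name here is illustrative only, not a format defined by the application.

```python
# Hypothetical semantic-information record for one material video,
# covering the result types named in the claims.
semantic_info = {
    "video_id": "DJI_0042.MP4",
    "scene": "beach",                  # scene recognition result
    "actions": ["running", "waving"],  # character action detection
    "expressions": ["smiling"],        # character expression detection
    "objects": ["person", "dog"],      # target (object) detection
    "composition_score": 0.82,         # composition evaluation
    "aesthetic_score": 0.76,           # aesthetic evaluation
}

# Such a record is tiny compared with the video itself, which is why
# transmitting it first is cheap.
print(semantic_info["scene"])  # beach
```

A record like this can double as a semantic tag set: flat labels ("beach", "smiling") that the terminal can match against template slots.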
  • the photographing device includes a movable platform, a camera, or a gimbal camera.
  • the shooting device provided by the embodiments of the present application does not need to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, it first sends the semantic information of the target material videos to the terminal device, so that the terminal device can use the semantic information to determine the required video clip information and send that video clip information back to the shooting device. The shooting device then transmits only the target video clips corresponding to the video clip information to the terminal device, rather than all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
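On the shooting-device side, "edit the target material video according to the video clip information" can be as simple as cutting the requested span out of the stored video before transmission. A toy sketch, in which lists of integers stand in for decoded frames and the frame rate is an assumed constant:

```python
# Device-side sketch: the shooting device holds full material videos
# and returns only the frames covered by the requested clip info.

FPS = 30  # assumed frame rate for illustration

material = {
    # video_id -> list of frames (integers stand in for real frames)
    "v1": list(range(10 * FPS)),  # a 10-second video
}

def cut_clip(video_id, start_s, end_s):
    # Convert the requested time period into a frame range and slice it.
    frames = material[video_id]
    return frames[int(start_s * FPS):int(end_s * FPS)]

clip = cut_clip("v1", 2.0, 3.0)
print(len(clip))  # 30 frames = one second at 30 fps
```

A real device would cut on keyframe boundaries in the encoded stream rather than decoded frames, but the interface — video id plus time period in, clip out — is the same.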
  • FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application.
  • the system includes:
  • the terminal device 710 is configured to acquire semantic information of the target material video, where the semantic information at least includes: the semantic information of the external material video acquired from the shooting device; determine, according to the semantic information, the video clip information required for generating the film; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and generate a movie by using the target video clip;
  • the shooting device 720 is configured to acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip; and transmit the target video clip to the terminal device.
  • the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
  • the target material video further includes: a local material video.
  • the terminal device is further configured to locally acquire semantic information of the local material video.
  • the terminal device is further configured to perform semantic analysis on the local material video to obtain semantic information of the local material video.
  • the target video segment further includes: a video segment of the local material video;
  • the terminal device is further configured to edit the local material video according to the video segment information to obtain a video segment of the local material video.
  • the semantic information of the external material video is obtained by semantic analysis of the external material video by the shooting device.
  • the semantic analysis is performed during the shooting process of the target material video.
  • the semantic analysis is performed during the charging process.
  • when determining, according to the semantic information, the video clip information required for generating a movie, the terminal device is configured to determine, according to the semantic information, a target movie template and the video clip information corresponding to each video slot in the target movie template.
  • the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  • the candidate film templates are determined in the following manner:
  • the candidate film templates are screened from the film template library.
  • when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the terminal device is configured to use the semantic information to calculate the matching degree between the video clips in the target material video and each video slot in the candidate film templates, and to calculate the smoothness of the video transitions between adjacent video slots; and to determine, according to the matching degree and the smoothness, the target film template and the target video clips corresponding to each video slot in the target film template.
  • the target film template includes one or more of the following contents: music, transition effects, textures, and video special effects.
  • when generating a movie by using the target video clip, the terminal device is configured to import the target video clip into the video slot corresponding to the target movie template to generate the movie.
  • the target material video is automatically selected from the stored material video according to a preset condition.
  • the target material video is selected from the stored material video according to a condition set by the user.
  • the conditions include one or more of the following: time, location, character information, and scene information.
  • the semantic information includes semantic tags.
  • the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
  • the movie generation system does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a terminal device provided by the embodiments of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a shooting device provided by the embodiments of the present application.
  • Embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein.
  • Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be accomplished by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.


Abstract

Embodiments of the present application disclose a film production method, comprising: obtaining semantic information of a target material video, the semantic information at least comprising semantic information of an external material video obtained from a photographing device; determining, according to the semantic information, video clip information required by film production; obtaining target video clips corresponding to the video clip information, wherein the target video clips at least comprise video clips of the external material video obtained from the photographing device; and producing a film by using the target video clips. The method disclosed in the embodiments of the present application can solve the technical problem that existing automatic editing requires long waiting time of users.

Description

Movie generation method, terminal device, shooting device and movie generation system

Technical Field

The present application relates to the technical field of audio and video processing, and in particular, to a movie generation method, a terminal device, a shooting device, a movie generation system, and a computer-readable storage medium.

Background

Automatic editing provides great convenience for users who need to edit movies. Automatic editing means that a machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film; this process requires no user operation, or only simple operations by the user. However, existing automatic editing is slow to produce a film and requires users to wait for a long time.
Summary of the Invention

In view of this, embodiments of the present application provide a movie generation method, a terminal device, a shooting device, a movie generation system, and a computer-readable storage medium, so as to solve the existing technical problem that automatic editing requires users to wait for a long time.

A first aspect of the embodiments of the present application provides a movie generation method, including:

acquiring semantic information of a target material video, the semantic information at least including: semantic information of an external material video acquired from a shooting device;

determining, according to the semantic information, video clip information required for generating a movie;

acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and

generating a movie by using the target video clip.
A second aspect of the embodiments of the present application provides a movie generation method, including:

acquiring semantic information of a target material video;

sending the semantic information to a terminal device, wherein the semantic information is used by the terminal device to determine video clip information required for generating a movie;

acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and

transmitting the target video clip to the terminal device, so that the terminal device generates a movie by using the target video clip.
A third aspect of the embodiments of the present application provides a terminal device, including:

a communication interface for communicating with a shooting device; and

a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:

acquiring semantic information of a target material video, the semantic information at least including: semantic information of an external material video acquired from the shooting device;

determining, according to the semantic information, video clip information required for generating a movie;

acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and

generating a movie by using the target video clip.
A fourth aspect of the embodiments of the present application provides a shooting device, including:

a camera for shooting material videos;

a communication interface for communicating with a terminal device; and

a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:

acquiring semantic information of a target material video;

sending the semantic information to the terminal device, wherein the semantic information is used by the terminal device to determine video clip information required for generating a movie;

acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and

transmitting the target video clip to the terminal device, so that the terminal device generates a movie by using the target video clip.
A fifth aspect of the embodiments of the present application provides a movie generation system, including:

a terminal device, configured to acquire semantic information of a target material video, the semantic information at least including: semantic information of an external material video acquired from a shooting device; determine, according to the semantic information, video clip information required for generating a movie; acquire a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video acquired from the shooting device; and generate a movie by using the target video clip; and

a shooting device, configured to acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip; and transmit the target video clip to the terminal device.
A sixth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method provided in the first aspect.

A seventh aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method provided in the second aspect.

The movie generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used to generate the movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses the semantic information to determine the required video clip information. It then only needs to obtain the target video clips corresponding to the video clip information from the shooting device, rather than transferring all the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
Brief Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application.

FIG. 2 is a flowchart of a movie generation method provided by an embodiment of the present application.

FIG. 3 is an interaction diagram of a movie generation method provided by an embodiment of the present application.

FIG. 4 is another flowchart of a movie generation method provided by an embodiment of the present application.

FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

FIG. 6 is a schematic structural diagram of a shooting device provided by an embodiment of the present application.

FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application.
Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.

With the development of Internet technology, people are more and more keen to share and record their own lives. People shoot with mobile phones, cameras and other devices, and edit the captured material into movies to share on social platforms. However, editing a watchable movie takes users a lot of time: suitable clips must be selected from the videos, music matching the video content must be chosen, transition timing needs to match the rhythm of the music, and so on.

Automatic editing provides great convenience for users who need to edit movies. Automatic editing means that a machine can automatically select suitable video clips, background music, transition effects, video effects, etc. and edit them into a film; this process requires no user operation, or only simple operations by the user. The automatic editing function can be implemented in an application (APP), which can be installed on a terminal device and run on hardware such as the processor and memory of the terminal device.

In some cases, the material videos required for automatic editing are not on the terminal device performing the automatic editing. For example, a material video may be on the shooting device that shot it, the shooting device being another device independent of the terminal device, such as a camera, an action camera, a handheld gimbal camera, or a drone equipped with a camera. Since shooting devices usually have small screens and inconvenient network access, automatic editing is often performed on the terminal device. The terminal device may be a mobile phone, a tablet, or a personal computer.

Since the automatic editing is performed at the terminal device while the material videos used for the film are stored on another shooting device, the terminal device needs to obtain the required material videos from the shooting device during automatic editing. In the related art, the shooting device first transmits all material videos that may be used for the film to the terminal device, which takes a lot of time.

For ease of understanding, reference may be made to FIG. 1, which is a schematic diagram of a scenario provided by an embodiment of the present application. In the example of FIG. 1, the terminal device may be a mobile phone, a PC, or a tablet computer, and the shooting device may be an action camera, a gimbal camera, or a drone equipped with a camera. During automatic editing, the action camera transmits all videos shot on that day (or over the past two days, three days, or some other period; this is only an example), such as video 1, video 2, video 3, etc. in the figure, to the mobile phone. Although all videos shot that day may serve as film material, they amount to a large volume of data, which takes a lot of time to transmit and causes inconvenience to users.
为解决上述问题,本申请实施例提供了一种影片生成方法,该方法可以应用于终端设备。可以参考图2,图2是本申请实施例提供的影片生成方法的流程图。该方法包括:To solve the above problem, an embodiment of the present application provides a method for generating a movie, and the method can be applied to a terminal device. Referring to FIG. 2 , FIG. 2 is a flowchart of a method for generating a movie provided by an embodiment of the present application. The method includes:
S210、获取目标素材视频的语义信息。S210. Acquire semantic information of the target material video.
所述语义信息至少包括:从拍摄设备获取的外部素材视频的语义信息;所述语义信息可以包括景别、视频主题、视频风格、运镜、是否模糊等信息。所述外部素材视频的语义信息可以由拍摄设备向所述终端设备发送。The semantic information at least includes: semantic information of external material videos obtained from the shooting device. The semantic information may include information such as shot scale, video theme, video style, camera movement, and whether the footage is blurred. The semantic information of the external material videos may be sent by the shooting device to the terminal device.
S220、根据所述语义信息,确定生成影片所需的视频片段信息。S220. Determine video segment information required for generating a movie according to the semantic information.
S230、获取与所述视频片段信息对应的目标视频片段。S230. Acquire a target video segment corresponding to the video segment information.
所述终端设备向所述拍摄设备发送所述视频片段信息,然后所述拍摄设备基于所述视频片段信息获取对应的目标视频片段,并将所述目标视频片段传输给所述终端设备。其中,所述目标视频片段可以为所述拍摄设备拍摄到的素材视频的视频片段,所述视频片段信息可以包括拍摄的时间节点信息或视频编号、起始时间和终止时间。The terminal device sends the video clip information to the shooting device; the shooting device then acquires the corresponding target video clip based on the video clip information and transmits the target video clip to the terminal device. The target video clip may be a clip of a material video captured by the shooting device, and the video clip information may include time node information of the shooting, or a video number together with a start time and an end time.
其中,所述目标视频片段至少包括:从所述拍摄设备获取的所述外部素材视频的视频片段。Wherein, the target video segment at least includes: the video segment of the external material video acquired from the shooting device.
S240、利用所述目标视频片段生成影片。S240. Generate a movie by using the target video segment.
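Steps S210 to S240 on the terminal side can be sketched as follows. This is a hypothetical illustration only: the device interface, the function names, and the tag-based selection rule are all assumptions made for this sketch, not part of the disclosed embodiment.

```python
# Hypothetical sketch of the terminal-side flow S210-S240; the StubDevice
# interface and the tag-based selection rule are illustrative assumptions.

class StubDevice:
    """Stands in for the shooting device reached over a communication link."""
    def __init__(self, semantic_info):
        self.semantic_info = semantic_info
    def get_semantic_info(self):
        return self.semantic_info            # S210: only metadata crosses the link
    def cut(self, video, start, end):
        return (video, start, end)           # S230: only the needed clip crosses

def select_clip_info(semantic_info, wanted_tags):
    # S220: keep segments whose tag matches the preset film-forming rules
    return [s for s in semantic_info if s["tag"] in wanted_tags]

def make_film(device, wanted_tags):
    info = device.get_semantic_info()                        # S210
    clip_info = select_clip_info(info, wanted_tags)          # S220
    clips = [device.cut(c["video"], c["start"], c["end"])    # S230
             for c in clip_info]
    return clips                                             # S240 (placeholder)
```

The point of the sketch is the bandwidth asymmetry: `get_semantic_info` transfers only lightweight metadata, and full video data crosses the link only in `cut`, and only for the selected segments.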
目标素材视频可以是可能用于生成影片的素材视频,比如,目标素材视频可以是同一天拍摄的所有视频,又比如,目标素材视频可以是同一地点拍摄的所有视频。其中,目标素材视频至少可以包括拍摄设备拍摄的外部素材视频。如前文所述,拍摄设备是区别于终端设备的其它设备,因此,拍摄设备所拍摄的素材视频对于终端设备而言属于外部素材视频。The target material video may be a material video that may be used to generate a movie, for example, the target material video may be all videos shot on the same day, and for another example, the target material video may be all videos shot at the same place. Wherein, the target material video may at least include an external material video shot by a shooting device. As mentioned above, the shooting device is different from the terminal device. Therefore, the material video shot by the shooting device belongs to the external material video for the terminal device.
语义信息,可以通过对视频内容进行语义分析得到。在一种实施方式中,对视频内容进行语义分析可以利用神经网络等机器学习算法实现。视频的语义信息可以包括该视频至少一个片段或至少一帧的内容识别结果,内容识别结果可以有多种,比如可以是场景识别结果(如天空、草地、街道等)、人物动作检测结果(如跑步、行走、站立、跳跃等)、人物表情检测结果(如笑脸、哭脸等)、目标检测结果(如动物、汽车等)、构图评价结果、美学评价结果等。换言之,通过视频的语义信息,即可确定该视频所包含的内容。在一种实施方式中,语义信息可以是语义标签,即可以通过对视频打标签的做法将语义信息赋予该视频。Semantic information can be obtained by performing semantic analysis on the video content. In one embodiment, the semantic analysis of the video content may be implemented using machine learning algorithms such as neural networks. The semantic information of a video may include content recognition results for at least one segment or at least one frame of the video, and there can be many kinds of content recognition results, such as scene recognition results (e.g., sky, grass, street), character action detection results (e.g., running, walking, standing, jumping), facial expression detection results (e.g., smiling face, crying face), object detection results (e.g., animals, cars), composition evaluation results, and aesthetic evaluation results. In other words, the content of a video can be determined from its semantic information. In one embodiment, the semantic information may take the form of semantic tags; that is, semantic information may be attached to a video by tagging it.
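As a concrete illustration, per-segment semantic information of the kind described above might be represented as follows; the field names, tag values, and scores are hypothetical, invented for this sketch rather than mandated by the embodiments.

```python
# Hypothetical per-segment semantic information; all field names and
# values are illustrative only.
semantic_info = {
    "video_id": "V1",
    "segments": [
        {"start": 0.0, "end": 5.0, "scene": "sky",
         "action": None, "aesthetic_score": 0.81},
        {"start": 5.0, "end": 12.0, "scene": "street",
         "action": "running", "aesthetic_score": 0.62},
    ],
}

def tags_of(info):
    """Collect the recognition results of each segment as a set of tags."""
    return [{v for v in (s["scene"], s["action"]) if v}
            for s in info["segments"]]
```

Represented this way, a whole video is summarized by a few hundred bytes of metadata, which is what makes sending semantic information ahead of the video itself worthwhile.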
需要注意的是,在获取目标素材视频的语义信息时,对于外部素材视频的语义信息,可以从拍摄设备直接获取。换言之,外部素材视频的语义信息可以不是本端的终端设备分析出来的,而是拍摄设备对外部素材视频进行语义分析得到的。拍摄设备在分析出外部素材视频的语义信息后可以发送给终端设备,从而终端设备获取到外部素材视频的语义信息。It should be noted that, when acquiring the semantic information of the target material video, the semantic information of the external material video can be obtained directly from the shooting device. In other words, the semantic information of the external material video may not be analyzed by the terminal device at the local end, but obtained by the shooting device through semantic analysis of the external material video. After analyzing the semantic information of the external material video, the shooting device can send it to the terminal device, so that the terminal device obtains the semantic information of the external material video.
考虑到对素材视频进行语义分析也需要占用一定的时间,因此,可以使拍摄设备在开始自动剪辑之前就分析出素材视频的语义信息。在一种实施方式中,若拍摄设备的算力充足,则可以在素材视频拍摄的过程中同时进行素材视频的语义分析。在一种实施方式中,若拍摄设备的算力不足以支持一边拍摄一边进行语义分析,则可以使拍摄设备在素材视频的拍摄结束后再对所拍摄的素材视频进行语义分析,比如可以在充电过程中进行语义分析。Considering that semantic analysis of material videos also takes a certain amount of time, the shooting device can be made to analyze the semantic information of the material videos before automatic editing starts. In one embodiment, if the shooting device has sufficient computing power, the semantic analysis can be performed while the material video is being shot. In one embodiment, if the computing power of the shooting device is insufficient to support semantic analysis during shooting, the shooting device can perform the semantic analysis after the shooting of the material video has finished, for example while the device is charging.
利用目标素材视频的语义信息,可以确定生成影片所需的视频片段信息。具体的,在确定生成影片所需的视频片段信息时,可以根据预设的成片规则,结合目标素材视频的语义信息进行确定。预设的成片规则在实施时可以是一个算法模块,该算法模块可以称为成片模块,通过将各个目标素材视频的语义信息输入到成片模块中,成片模块可以输出生成影片所需的视频片段对应的视频片段信息。Using the semantic information of the target material videos, the video clip information required for generating the film can be determined. Specifically, this determination can be made according to preset film-forming rules combined with the semantic information of the target material videos. In implementation, the preset film-forming rules may take the form of an algorithm module, which may be called the film-forming module; by inputting the semantic information of each target material video into the film-forming module, the module can output the video clip information corresponding to the video clips required for generating the film.
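A manually set film-forming rule of the simplest kind might, for instance, ask for one segment per shot scale in a preset order. The sketch below assumes that idea; its data layout and function names are invented for illustration and real film-forming modules may be far more elaborate, or learned.

```python
# A minimal hand-written film-forming rule as a sketch: pick the first
# unused segment matching each required shot scale, in order. Names and
# data layout are illustrative assumptions.

def film_module(segments, required_scales=("wide", "medium", "close")):
    """segments: list of dicts with 'video', 'start', 'end', 'scale'.
    Returns clip info for one segment per required shot scale."""
    picked = []
    used = set()
    for scale in required_scales:
        for i, seg in enumerate(segments):
            if i not in used and seg["scale"] == scale:
                picked.append({"video": seg["video"],
                               "start": seg["start"], "end": seg["end"]})
                used.add(i)
                break
    return picked
```

Note that the module consumes only semantic information and emits only clip descriptors; at no point does it need the pixel data of the material videos.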
关于成片模块,在具体实现时有多种实施方式。在一种实施方式中,成片模块可以是基于人工设定的成片规则搭建的。比如,可以依靠专业人员的影片剪辑先验,总结出关于生成影片时如何挑选合适的视频片段的方法,从而可以根据总结出的该方法编写相应的计算机程序,以生成成片模块。在另一种实施方式中,可以通过机器学习技术训练出成片模块。比如,可以获取多组样本素材视频,通过专业人员对各组样本素材视频进行筛选,选取出各组样本素材视频中将会用于成片的视频片段,从而可以以选取出的视频片段与该视频片段对应的样本素材视频组为训练样本对神经网络模型进行训练,得到基于神经网络模型的成片模块。所述成片规则包括基于预设的景别组合、运镜组合、主题等进行单维度或多维度的匹配。The film-forming module can be implemented in multiple ways. In one embodiment, it may be built from manually set film-forming rules. For example, drawing on professional editors' prior experience in film editing, a method for selecting suitable video clips when generating a film can be summarized, and a corresponding computer program can be written according to that method to produce the film-forming module. In another embodiment, the film-forming module can be trained using machine learning techniques. For example, multiple groups of sample material videos can be obtained, and professionals can screen each group to select the video clips that would be used in a final film; the selected video clips, together with the sample material video groups they belong to, then serve as training samples for training a neural network model, yielding a film-forming module based on the neural network model. The film-forming rules include single-dimensional or multi-dimensional matching based on preset shot-scale combinations, camera-movement combinations, themes, and the like.
视频片段信息可以是用于生成影片的目标视频片段的相关信息,在一种实施方式中,其可以用于指示出目标视频片段所属的目标素材视频、及目标视频片段对应的时间段。比如,其可以指示出目标视频片段属于目标素材视频X的第10-20秒的视频片段。The video clip information may be information about the target video clips used for generating the film. In one embodiment, it may indicate the target material video to which a target video clip belongs and the time period corresponding to the target video clip. For example, it may indicate that the target video clip is the segment from the 10th to the 20th second of target material video X.
在确定视频片段信息之后,可以获取视频片段信息对应的目标视频片段,从而可以利用获取的目标视频片段生成影片。其中,对于属于外部素材视频的目标视频片段,可以从拍摄设备处获取。After the video clip information is determined, a target video clip corresponding to the video clip information can be acquired, so that a movie can be generated by using the acquired target video clip. Wherein, the target video segment belonging to the external material video can be obtained from the shooting device.
本申请实施例提供的影片生成方法,并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备,而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息,利用语义信息确定所需的视频片段信息,从而,只需从拍摄设备处获取视频片段信息对应的目标视频片段即可,无需传输所有的目标素材视频,大大减少了用户的等待时间,提高了自动剪辑的速度。The film generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used for generating the film. Instead, the terminal device can first obtain the semantic information of the target material videos from the shooting device and use that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to the video clip information, without transmitting all of the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
在从拍摄设备获取属于外部素材视频的目标视频片段时,在一种实施方式中,可以将确定的视频片段信息发送给拍摄设备,从而,拍摄设备可以利用接收到的视频片段信息,对视频片段信息所指示的目标素材视频进行剪辑,剪辑出目标素材视频对应时间段的目标视频片段后,可以将该目标视频片段传输给终端设备。When acquiring target video clips belonging to external material videos from the shooting device, in one implementation the determined video clip information may be sent to the shooting device; the shooting device can then use the received video clip information to edit the target material video indicated by that information, and after cutting out the target video clip for the corresponding time period, transmit the target video clip to the terminal device.
考虑到用户希望剪辑到影片中的素材并不一定全部来自拍摄设备,比如当用户去某个地点游玩时,其所拍摄的视频中可能有一部分是运动相机或云台相机拍摄的,另一部分可能是手机拍摄的,此时,用户可能希望自动生成影片时,纳入选择范围的目标素材视频不止包括相机等拍摄设备拍摄的外部素材视频,也可以包括手机等终端设备拍摄的本地素材视频。因此,在一种实施方式中,自动剪辑还可以支持混剪,即目标素材视频还可以包括本地素材视频,在获取目标素材视频的语义信息时,除了可以从拍摄设备获取外部素材视频的语义信息,还包括从本地获取本地素材视频的语义信息。The material that the user wants edited into the film does not necessarily all come from the shooting device. For example, when the user visits some place, part of the videos shot there may come from an action camera or a gimbal camera, while another part may be shot with a mobile phone. In this case, when a film is generated automatically, the user may wish the target material videos under consideration to include not only external material videos shot by cameras and other shooting devices, but also local material videos shot by terminal devices such as mobile phones. Therefore, in one embodiment, automatic editing can also support mixed cutting; that is, the target material videos can also include local material videos, and when acquiring the semantic information of the target material videos, in addition to acquiring the semantic information of the external material videos from the shooting device, the semantic information of the local material videos is acquired locally.
本地素材视频的语义信息,在一种实施方式中,可以是终端设备对本地素材视频进行语义分析得到的。在另一种实施方式中,也可以是本地素材视频自身携带的语义信息。由于终端设备的本地素材视频来源丰富多样,比如,可以来源于互联网,而手机可能在获取到该素材视频时,该素材视频已经携带有对应的语义信息,从而手机无需重复对该素材视频进行语义分析。In one implementation, the semantic information of a local material video may be obtained by the terminal device performing semantic analysis on it. In another implementation, it may be semantic information carried by the local material video itself. Since the terminal device's local material videos come from rich and diverse sources, for example the Internet, a material video may already carry its corresponding semantic information when the mobile phone obtains it, so the mobile phone does not need to perform semantic analysis on that material video again.
可以理解的,即便目标素材视频包括本地素材视频,但根据目标素材视频的语义信息所确定出的视频片段信息中,也并不一定包括对应本地素材视频的视频片段信息。例如,在一种情况中,终端设备可能根据本地素材视频的语义信息,判断出本地素材视频的拍摄质量较差,不符合成片的要求,从而所确定的视频片段信息均是对应外部素材视频的视频片段信息。It can be understood that even if the target material videos include local material videos, the video clip information determined from the semantic information of the target material videos does not necessarily include video clip information corresponding to the local material videos. For example, in one case the terminal device may determine, from the semantic information of a local material video, that its shooting quality is poor and does not meet the requirements for the final film, so that all of the determined video clip information corresponds to the external material videos.
而在一种情况中,若视频片段信息对应的目标视频片段包括本地素材视频的视频片段,则在获取目标视频片段时,对于本地素材视频的视频片段,可以根据视频片段信息对本地素材视频进行剪辑获得。In one case, if the target video clip corresponding to the video clip information includes the video clip of the local material video, when acquiring the target video clip, for the video clip of the local material video, the local material video can be processed according to the video clip information. Clip obtained.
在上述实施方式中,自动剪辑可以支持混剪功能,即自动剪辑生成的影片中还可以包括终端设备本地的素材视频,从而提高了成片内容的丰富度。并且,在混剪时,对于本地素材视频也可以根据语义信息进行视频片段信息的确定,从而能够选取出本地素材视频中适合用于成片的视频片段,相比随机选取本地素材视频中的视频片段插入影片,有更高的成片质量。In the above embodiments, automatic editing can support a mixed-cutting function; that is, the film generated by automatic editing can also include local material videos of the terminal device, enriching the content of the final film. Moreover, during mixed cutting, video clip information for the local material videos can also be determined from semantic information, so that the clips of the local material videos best suited for the final film can be selected; compared with randomly selecting clips from the local material videos and inserting them into the film, this yields a higher-quality final film.
如前文所述,视频片段信息可以通过将语义信息输入成片模块后得到。在一种实施方式中,在将语义信息输入成片模块后,成片模块的输出可以包括目标成片模板和目标成片模板中各视频空位对应的视频片段信息。As described above, the video clip information can be obtained by inputting the semantic information into the film-forming module. In one embodiment, after the semantic information is input into the film-forming module, the module's output may include a target film template and the video clip information corresponding to each video slot in the target film template.
成片模板可以是预先设定的影片模板,其可以包括多个视频空位,每个视频空位可以用于导入或插入视频片段。每个成片模板可以有各自的特征,比如,视频空位上可以配套有不同的贴图、文字、视频特效等元素,其中,视频特效可以是加速、减速、滤镜、运镜等各种特效。在视频空位和视频空位之间还可以有不同的转场效果。并且,不同的成片模板也可以搭配有不同的音乐,而转场效果对应的转场时间还可以与成片模板的音乐节奏点相匹配。A film template may be a preset movie template that includes multiple video slots, each of which can be used for importing or inserting a video clip. Each film template can have its own features; for example, the video slots can be paired with elements such as stickers, text, and video effects, where the video effects can include acceleration, deceleration, filters, camera-movement effects, and so on. Different transition effects can also be placed between adjacent video slots. Moreover, different film templates can be paired with different music, and the transition times of the transition effects can be matched to the beat points of the template's music.
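A film template as described might be represented roughly as below; the fields, beat times, and effect names are all invented for illustration. The helper shows one way slot boundaries can be made to coincide with music beat points: by choosing slot durations from consecutive beat intervals.

```python
# Hypothetical structure of a film template; every field is illustrative.
template = {
    "music": "theme.mp3",
    "beat_points": [0.0, 2.0, 4.5, 7.0],   # seconds into the music
    "slots": [
        {"duration": 2.0, "effect": "speed_up"},
        {"duration": 2.5, "effect": "filter:warm"},
        {"duration": 2.5, "effect": None},
    ],
    "transitions": ["fade", "wipe"],        # between consecutive slots
}

def slot_start_times(tpl):
    """Accumulate slot durations; with durations taken from consecutive
    beat intervals, each slot boundary lands on a music beat point."""
    times, t = [], 0.0
    for slot in tpl["slots"]:
        times.append(t)
        t += slot["duration"]
    return times
```

In this layout the durations 2.0 and 2.5 are exactly the gaps between consecutive beat points, so every transition fires on a beat.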
在一种实施方式中,目标成片模板可以是从候选成片模板中确定的,候选成片模板可以是从成片模板库中确定的。成片模板库中可以包括多个预设的成片模板,考虑到成片模板库中的成片模板过多,在确定目标成片模板时,可以先从成片模板库中筛选出候选成片模板,再从候选成片模板中确定目标成片模板,减少筛选的工作量。In one embodiment, the target film template may be determined from candidate film templates, and the candidate film templates may be determined from a film template library. The library can include multiple preset film templates; considering that the library may contain a large number of templates, when determining the target film template, candidate film templates can first be screened out from the library, and the target film template can then be determined from the candidates, reducing the screening workload.
在筛选候选成片模板时可以有多种实施方式。在一种实施方式中,可以根据目标素材视频的语义信息,确定待生成影片的风格类型。比如,可以根据目标素材视频的语义信息,确定(多数)目标素材视频对应的主题,如亲子、自然、城市、美食等,从而,可以根据该确定出的主题对成片模板库中的成片模板进行筛选,筛选出与该主题匹配的候选成片模板。There can be multiple implementations for screening candidate film templates. In one embodiment, the style type of the film to be generated can be determined from the semantic information of the target material videos. For example, the theme corresponding to (most of) the target material videos, such as parent-child, nature, city, or gourmet food, can be determined from their semantic information, and the film templates in the template library can then be screened according to the determined theme to obtain candidate film templates matching that theme.
而从候选成片模板中确定目标成片模板时,也可以有多种方式。在一种实施方式中,由于不同的候选成片模板有不同的特征,比如有不同的音乐、不同的视频空位元素、不同的转场效果等,因此,可以预先设定不同特征对应的优先级,再按照从高到低的优先级,将候选成片模板的每种特征分别与目标素材视频的语义信息进行匹配,每一次匹配后可以根据匹配结果进行一次筛选,从而最终筛选出最合适的目标成片模板。在一种实施方式中,由于语义信息可以包括视频中不同片段的语义信息,因此,可以利用不同片段的语义信息,模拟出将视频片段导入候选成片模板的视频空位的各种组合,从而,可以根据视频片段与视频空位的匹配度,相邻视频空位之间过渡的平滑度,计算出各种组合的得分,将得分最高的组合的候选成片模板确定为目标成片模板,且该目标成片模板中各视频空位对应的视频片段信息也随之确定。There may also be multiple ways of determining the target film template from the candidate film templates. In one embodiment, since different candidate film templates have different features, such as different music, different video-slot elements, and different transition effects, priorities can be preset for the different features; each feature of the candidate film templates is then matched against the semantic information of the target material videos in order of priority from high to low, and after each match a round of screening can be performed according to the matching result, so that the most suitable target film template is finally selected. In another embodiment, since the semantic information may include the semantic information of different segments within a video, the various combinations of importing video segments into the video slots of a candidate film template can be simulated using the semantic information of the different segments; the score of each combination can then be computed from the degree of matching between video segments and video slots and the smoothness of the transitions between adjacent video slots, the candidate film template of the highest-scoring combination is determined as the target film template, and the video clip information corresponding to each video slot in that template is determined accordingly.
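The highest-scoring-combination idea in the second implementation can be sketched as a search over assignments of segments to slots, where each assignment is scored by per-slot fit plus pairwise transition smoothness. Both scoring functions here are placeholders the caller supplies, and a real system would prune the search rather than enumerate every permutation as this small sketch does.

```python
from itertools import permutations

def score(assignment, match, smooth):
    """assignment: tuple of segments, one per slot.
    match(slot_idx, seg) -> how well seg fits that slot;
    smooth(a, b) -> transition quality between adjacent segments."""
    s = sum(match(i, seg) for i, seg in enumerate(assignment))
    s += sum(smooth(a, b) for a, b in zip(assignment, assignment[1:]))
    return s

def best_assignment(segments, n_slots, match, smooth):
    """Exhaustively try every ordered choice of n_slots segments and
    keep the highest-scoring one (feasible only for small inputs)."""
    best = max(permutations(segments, n_slots),
               key=lambda a: score(a, match, smooth))
    return list(best)
```

Because the score is computed from semantic information alone, this whole selection can run on the terminal device before any video data is transferred.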
在目标成片模板与目标成片模板中各视频空位对应的视频片段信息确定后,可以获取视频片段信息对应的目标视频片段,并将目标视频片段导入目标成片模板对应的视频空位,从而生成影片。After the target film template and the video clip information corresponding to each of its video slots are determined, the target video clips corresponding to the video clip information can be acquired and imported into the corresponding video slots of the target film template, thereby generating the film.
由前文可知,目标素材视频是可能用于生成影片的素材视频,而可能用于生成影片的素材视频不一定是当前存储的所有素材视频,在一种实施方式中,目标素材视频可以通过设定的条件从存储的素材视频中筛选得到。其中,设定的条件可以是时间、地点、人物信息、场景信息等一种或多种,通过设定的条件筛选出目标素材视频后,可以获取目标素材视频的语义信息。As can be seen from the foregoing, the target material videos are material videos that may be used for generating the film, and these are not necessarily all of the currently stored material videos. In one embodiment, the target material videos can be obtained by filtering the stored material videos according to set conditions. The set conditions may be one or more of time, place, person information, scene information, and the like; after the target material videos are filtered out according to the set conditions, their semantic information can be acquired.
需要注意的是,上述的每一种设定的条件可以有多种实施方式,比如,时间条件可以是当天、近两天、近一周、从日期A到日期B等,地点条件可以是景点、城市、国家、家、公司等,人物条件可以是具体的人如小明,也可以是男、女、老、少等抽象的类别,场景条件可以是白天、黑夜、雨天等环境,也可以是街道、田园等场地,也可以是公交车、天空等物体。在具体的例子中,若设定的条件是当天,则目标素材视频可以包括当天拍摄的所有视频,若设定的条件是A地点,则目标素材视频可以包括A地点拍摄的所有视频,若设定的条件是包括小明,则目标素材视频可以是包含小明的所有视频,若设定的条件是街道,则目标素材视频可以是包含街道的所有视频。It should be noted that each of the above set conditions can be implemented in multiple ways. For example, the time condition can be the current day, the past two days, the past week, or from date A to date B; the location condition can be a scenic spot, a city, a country, home, the office, and so on; the person condition can be a specific person such as Xiaoming, or an abstract category such as male, female, old, or young; the scene condition can be an environment such as daytime, night, or rain, a setting such as a street or countryside, or an object such as a bus or the sky. As specific examples, if the set condition is the current day, the target material videos can include all videos shot that day; if the set condition is location A, the target material videos can include all videos shot at location A; if the set condition is the inclusion of Xiaoming, the target material videos can be all videos containing Xiaoming; and if the set condition is streets, the target material videos can be all videos containing streets.
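The set conditions above amount to a conjunctive filter over per-video metadata; the sketch below assumes hypothetical metadata fields (`date`, `place`, `people`) purely for illustration.

```python
from datetime import date

# Hypothetical per-video metadata; field names are illustrative assumptions.
videos = [
    {"id": "V1", "date": date(2020, 9, 1), "place": "A", "people": {"Xiaoming"}},
    {"id": "V2", "date": date(2020, 9, 1), "place": "B", "people": set()},
    {"id": "V3", "date": date(2020, 9, 2), "place": "A", "people": {"Xiaoming"}},
]

def filter_videos(videos, day=None, place=None, person=None):
    """Keep a video only if it satisfies every condition that was given;
    conditions left as None are ignored."""
    out = []
    for v in videos:
        if day is not None and v["date"] != day:
            continue
        if place is not None and v["place"] != place:
            continue
        if person is not None and person not in v["people"]:
            continue
        out.append(v)
    return out
```

With all arguments left at None the filter keeps everything, which also models the default-condition case where a device screens with its own preset values.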
并且,由于目标素材视频可以包括拍摄设备上的外部素材视频,也可以包括终端设备上的本地素材视频,因此,目标素材视频的筛选可以在拍摄设备与终端设备上分别独立的进行。在一种实施方式中,用于筛选目标素材视频的条件可以由用户自行设定,比如,可以在自动剪辑之前与用户进行交互,获取用户设定的筛选条件。在一种实施方式中,终端设备与拍摄设备也可以有各自默认的筛选条件,从而,自动剪辑可以直接开始,在用户无感知下自动生成影片,给用户一定的惊喜感。Moreover, since the target material videos can include external material videos on the shooting device as well as local material videos on the terminal device, the screening of target material videos can be performed independently on the shooting device and on the terminal device. In one embodiment, the conditions used for screening the target material videos can be set by the user; for example, the terminal can interact with the user before automatic editing to obtain the user's screening conditions. In one embodiment, the terminal device and the shooting device can also each have default screening conditions, so that automatic editing can start directly and a film is generated automatically without the user's awareness, giving the user a sense of surprise.
本申请实施例提供的影片生成方法,并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备,而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息,利用语义信息确定所需的视频片段信息,从而,只需从拍摄设备处获取视频片段信息对应的目标视频片段即可,无需传输所有的目标素材视频,大大减少了用户的等待时间,提高了自动剪辑的速度。The film generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used for generating the film. Instead, the terminal device can first obtain the semantic information of the target material videos from the shooting device and use that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to the video clip information, without transmitting all of the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
下面提供一个相对详尽的实施例,该实施例中,用户使用了混剪功能,即目标素材视频还包括本地素材视频。可以参见图3,图3是本申请实施例提供的影片生成方法的交互图。A relatively detailed embodiment is provided below. In this embodiment, the user uses the mixed-cut function, that is, the target material video also includes the local material video. Referring to FIG. 3 , FIG. 3 is an interaction diagram of the method for generating a movie provided by an embodiment of the present application.
在自动剪辑开始之前,拍摄设备可以预先完成对本地的素材视频A的语义分析(S300),比如前文所述的在素材视频拍摄的同时或在充电等的空闲时间进行所述语义分析。可以理解,此处所述本地是相对于拍摄设备的本地。Before automatic editing starts, the shooting device can complete the semantic analysis of its local material video A in advance (S300), for example, as described above, while the material video is being shot or during idle time such as charging. It can be understood that "local" here is relative to the shooting device.
在自动剪辑开始后,拍摄设备和终端设备可以分别根据各自设定的条件确定目标素材视频(S310a和S310b),拍摄设备确定出的目标素材视频可以用目标素材视频a指代,终端设备确定的目标素材视频可以用目标素材视频b指代。After automatic editing starts, the shooting device and the terminal device can each determine target material videos according to their respectively set conditions (S310a and S310b). The target material videos determined by the shooting device are referred to as target material video a, and those determined by the terminal device as target material video b.
在目标素材视频b确定后,终端设备可以对目标素材视频b进行语义分析以获取该目标素材视频b的语义信息(S320)。而拍摄设备在目标素材视频a确定后,可以将目标素材视频a的语义信息发送给终端设备(S330)。After the target material video b is determined, the terminal device may perform semantic analysis on the target material video b to obtain semantic information of the target material video b (S320). After the target material video a is determined, the shooting device may send the semantic information of the target material video a to the terminal device (S330).
利用目标素材视频a和目标素材视频b的语义信息,可以确定属于目标素材视频a的视频片段信息a和属于目标素材视频b的视频片段信息b(S340)。其中,视频片段信息a可以用于发送给拍摄设备(S350),以供拍摄设备根据该视频片段信息a对相应的目标素材视频a进行剪辑,得到目标视频片段a(S360a);而视频片段信息b可以用于终端设备根据该视频片段信息b对目标素材视频b进行剪辑(S360b),得到目标视频片段b。Using the semantic information of target material video a and target material video b, video clip information a belonging to target material video a and video clip information b belonging to target material video b can be determined (S340). The video clip information a is sent to the shooting device (S350), so that the shooting device edits the corresponding target material video a according to it and obtains target video clip a (S360a); the video clip information b is used by the terminal device to edit target material video b (S360b), obtaining target video clip b.
拍摄设备可以将目标视频片段a传输给终端设备(S370),终端设备将目标视频片段a和目标视频片段b导入目标成片模板,从而生成最终的影片(S380)。The shooting device may transmit the target video clip a to the terminal device (S370), and the terminal device imports the target video clip a and the target video clip b into the target film template, thereby generating a final movie (S380).
在一实施例中,在无人机飞行的过程中,可以实时将拍摄到的素材的语义信息传回遥控终端(包括遥控器和手机),当拍摄到的素材的语义信息和遥控终端本地存储的素材的语义信息符合预设的规则时,触发所述无人机自动进行拍摄,并控制无人机基于所述预设的规则调整飞行轨迹和姿态,以获取目标拍摄素材。基于目标拍摄素材的实时图传的压缩素材与本地素材进行初步的处理后,可以生成初始预览影片供用户预览。当用户对该初始预览影片进行原片合成操作时,根据所述原片合成操作获取所述目标拍摄素材,并基于所述目标素材和本地素材合成最终影片。In one embodiment, during the flight of a drone, the semantic information of the captured material can be transmitted back to the remote control terminal (including a remote controller and a mobile phone) in real time. When the semantic information of the captured material and the semantic information of the material stored locally on the remote control terminal conform to preset rules, the drone is triggered to shoot automatically, and the drone is controlled to adjust its flight trajectory and attitude based on the preset rules so as to obtain the target shooting material. After preliminary processing of the compressed material from the real-time image transmission of the target shooting material together with the local material, an initial preview film can be generated for the user to preview. When the user performs an original-footage composition operation on the initial preview film, the target shooting material is acquired according to that operation, and the final film is composed based on the target material and the local material.
In this way, when the semantic information of the captured material and of the locally stored material conforms to the preset rules, the flight and shooting of the drone can be controlled based on those rules, without requiring the user to have professional shooting skills, piloting skills, or a keen instinct for shooting opportunities. This prevents the user from missing the chance to capture material that matches the local material, and also avoids occupying the drone's shooting memory and image-transmission bandwidth at the initial stage, improving the user experience while saving memory and transmission bandwidth.
对于上述实施例中所涉及的一些步骤的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of some steps involved in the foregoing embodiments has been described in the foregoing, and will not be repeated here.
下面可以参见图4,图4是本申请实施例提供的影片生成方法的另一流程图。该方法可以应用于拍摄设备,该方法包括:Referring to FIG. 4 below, FIG. 4 is another flowchart of the method for generating a movie provided by an embodiment of the present application. The method can be applied to a photographing device, and the method includes:
S410、获取目标素材视频的语义信息。S410. Acquire semantic information of the target material video.
S420、将所述语义信息发送给终端设备。S420. Send the semantic information to the terminal device.
其中,所述语义信息用于所述终端设备确定生成影片所需的视频片段信息;Wherein, the semantic information is used for the terminal device to determine the video segment information required for generating the movie;
S430、获取所述终端设备发送的所述视频片段信息,并根据所述视频片段信息对所述目标素材视频进行剪辑,得到目标视频片段。S430: Acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain a target video clip.
S440、将所述目标视频片段传输给所述终端设备。S440. Transmit the target video segment to the terminal device.
所述目标视频片段用于所述终端设备生成影片。The target video segment is used by the terminal device to generate a movie.
可选的,所述视频片段信息用于指示出所述目标视频片段所属的目标素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
可选的,所述目标素材视频的语义信息是通过对目标素材视频进行语义分析得到的。Optionally, the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
可选的,所述语义分析是在所述目标素材视频的拍摄过程中进行的。Optionally, the semantic analysis is performed during the shooting process of the target material video.
可选的,所述语义分析是在充电过程中进行的。Optionally, the semantic analysis is performed during the charging process.
可选的,在所述获取目标素材视频的语义信息之前,所述方法还包括:Optionally, before the acquiring the semantic information of the target material video, the method further includes:
根据设定的条件,从存储的素材视频中筛选出目标素材视频。According to the set conditions, the target material video is filtered from the stored material video.
可选的,所述设定的条件是预先设定的默认条件。Optionally, the set condition is a preset default condition.
可选的,所述设定的条件是由用户设定的。Optionally, the set condition is set by a user.
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的影片生成方法,并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备,而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息,利用语义信息确定所需的视频片段信息,从而,只需从拍摄设备处获取视频片段信息对应的目标视频片段即可,无需传输所有的目标素材视频,大大减少了用户的等待时间,提高了自动剪辑的速度。The film generation method provided by the embodiments of the present application does not require the shooting device to first transmit to the terminal device the target material videos that may be used for generating the film. Instead, the terminal device can first obtain the semantic information of the target material videos from the shooting device and use that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to the video clip information, without transmitting all of the target material videos, which greatly reduces the user's waiting time and improves the speed of automatic editing.
下面请参见图5,图5是本申请实施例提供的一种终端设备的结构示意图。该终端设备包括:Referring to FIG. 5 below, FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. The terminal equipment includes:
通信接口510,用于与拍摄设备通信;a communication interface 510 for communicating with the photographing device;
处理器520和存储有计算机程序的存储器530,所述计算机程序被所述处理器执行时实现以下步骤:A processor 520 and a memory 530 storing a computer program that, when executed by the processor, implements the following steps:
获取目标素材视频的语义信息,所述语义信息至少包括:从拍摄设备获取的外部素材视频的语义信息;Acquiring semantic information of the target material video, the semantic information at least includes: the semantic information of the external material video obtained from the shooting device;
根据所述语义信息,确定生成影片所需的视频片段信息;According to the semantic information, determine the video segment information required to generate the movie;
获取与所述视频片段信息对应的目标视频片段,其中,所述目标视频片段至少包括:从所述拍摄设备获取的所述外部素材视频的视频片段;Acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video obtained from the shooting device;
利用所述目标视频片段生成影片。A movie is generated using the target video segment.
可选的，所述处理器在获取与所述视频片段信息对应的目标视频片段时用于，将所述视频片段信息发送给所述拍摄设备后，接收所述拍摄设备根据所述视频片段信息对所述外部素材视频剪辑得到的目标视频片段。Optionally, when acquiring the target video clip corresponding to the video clip information, the processor is configured to, after sending the video clip information to the shooting device, receive the target video clip obtained by the shooting device by editing the external material video according to the video clip information.
可选的,所述视频片段信息用于指示出所述目标视频片段所属的外部素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
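A piece of "video clip information" of the kind just described — the source material video plus the corresponding time period — can be as small as the following sketch. The dataclass and its field names are hypothetical, not from the application:

```python
from dataclasses import dataclass

# Hypothetical shape of one piece of video clip information: it names the
# external material video the clip belongs to and the time period of the
# wanted segment, so only that segment has to be transmitted.

@dataclass
class ClipInfo:
    source_video: str   # which external material video the clip belongs to
    start_s: float      # start of the time period, in seconds
    end_s: float        # end of the time period, in seconds

    def duration(self):
        """Length of the requested segment in seconds."""
        return self.end_s - self.start_s

ci = ClipInfo("DJI_0042.mp4", 12.0, 17.5)
```

Serializing a handful of such records costs far less bandwidth than transferring even one material video.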
可选的,所述目标素材视频还包括:本地素材视频。Optionally, the target material video further includes: local material video.
可选的,所述本地素材视频的语义信息是通过以下方式得到的:Optionally, the semantic information of the local material video is obtained in the following manner:
对所述本地素材视频进行语义分析,得到所述本地素材视频的语义信息。Semantic analysis is performed on the local material video to obtain semantic information of the local material video.
可选的,所述目标视频片段还包括:所述本地素材视频的视频片段;Optionally, the target video segment further includes: a video segment of the local material video;
所述处理器在获取与所述视频片段信息对应的目标视频片段时用于,根据所述视频片段信息对所述本地素材视频进行剪辑,得到所述本地素材视频的视频片段。When acquiring the target video segment corresponding to the video segment information, the processor is configured to edit the local material video according to the video segment information to obtain the video segment of the local material video.
可选的,所述外部素材视频的语义信息是所述拍摄设备对所述外部素材视频进行 语义分析得到的。Optionally, the semantic information of the external material video is obtained by the shooting device performing semantic analysis on the external material video.
可选的，所述处理器在根据所述语义信息，确定生成影片所需的视频片段信息时用于，根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息。Optionally, when determining, according to the semantic information, the video clip information required for generating a movie, the processor is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
可选的，所述目标成片模板是从候选成片模板中确定的，所述候选成片模板是从成片模板库中确定的。Optionally, the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
可选的，所述候选成片模板是通过以下方式确定的：Optionally, the candidate film templates are determined in the following manner:
根据所述语义信息,确定待生成影片的风格类型;According to the semantic information, determine the style type of the movie to be generated;
根据所述风格类型,从成片模板库中筛选出所述候选成片模板。According to the style type, the candidate film templates are screened from the film template library.
可选的，所述处理器在根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息时用于，利用所述语义信息，计算所述目标素材视频中的视频片段与所述候选成片模板中各视频空位的匹配度，并计算相邻视频空位之间视频过渡的平滑度；根据所述匹配度与所述平滑度，确定目标成片模板及所述目标成片模板中各视频空位对应的目标视频片段。Optionally, when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the processor is configured to: use the semantic information to calculate the degree of matching between video clips in the target material video and each video slot in the candidate film templates, and calculate the smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
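A minimal sketch of this selection step: a candidate template's score combines the per-slot matching degrees with the smoothness of transitions between adjacent slots, and the best-scoring assignment wins. The two scoring functions below are toy placeholders, not the application's actual metrics:

```python
from itertools import permutations

# Toy matching degree: 1.0 if the clip's semantic tag equals the slot's
# wanted tag, else 0.0. A real system would use richer semantic information.
def match(clip, slot_tag):
    return 1.0 if clip["tag"] == slot_tag else 0.0

# Toy smoothness: transitions between clips of similar brightness are
# considered smoother. Brightness stands in for any visual continuity cue.
def smoothness(a, b):
    return 1.0 - abs(a["brightness"] - b["brightness"])

def best_assignment(clips, template_slots):
    """Try every ordering of clips over the template's video slots and keep
    the one maximizing total match plus adjacent-transition smoothness."""
    best, best_score = None, float("-inf")
    for perm in permutations(clips, len(template_slots)):
        score = sum(match(c, s) for c, s in zip(perm, template_slots))
        score += sum(smoothness(perm[i], perm[i + 1])
                     for i in range(len(perm) - 1))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

clips = [{"tag": "beach", "brightness": 0.9},
         {"tag": "city", "brightness": 0.4},
         {"tag": "sunset", "brightness": 0.8}]
assignment, score = best_assignment(clips, ["beach", "sunset"])
```

The exhaustive search is only workable for small slot counts; it is meant to show the objective (match plus smoothness), not an efficient algorithm.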
可选的，所述目标成片模板包括以下一种或多种内容：音乐、转场效果、贴图、视频特效。Optionally, the target film template includes one or more of the following: music, transition effects, stickers, and video special effects.
可选的,所述处理器在利用所述目标视频片段生成影片时用于,将所述目标视频片段导入所述目标成片模板对应的视频空位,生成影片。Optionally, when generating a movie by using the target video clip, the processor is configured to import the target video clip into a video slot corresponding to the target movie template to generate a movie.
可选的,所述目标素材视频是根据预设的条件自动从存储的素材视频中筛选得到的。Optionally, the target material video is automatically selected from the stored material video according to a preset condition.
可选的,所述目标素材视频是根据用户设定的条件从存储的素材视频中筛选得到的。Optionally, the target material video is selected from the stored material video according to a condition set by the user.
可选的,所述条件包括以下一种或多种:时间、地点、人物信息、场景信息。Optionally, the conditions include one or more of the following: time, location, character information, and scene information.
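Screening stored material videos by such conditions could look like the following sketch; the condition keys and all sample metadata are invented for illustration:

```python
# Filter stored material videos by preset or user-set conditions
# (e.g. time, location, character information, scene information).
# Metadata keys and values below are hypothetical.

def filter_videos(videos, conditions):
    """Keep only videos whose metadata satisfies every given condition."""
    def ok(meta):
        return all(meta.get(key) == want for key, want in conditions.items())
    return [vid for vid, meta in videos if ok(meta)]

stored = [
    ("v1.mp4", {"date": "2020-09-01", "location": "Shenzhen", "scene": "beach"}),
    ("v2.mp4", {"date": "2020-09-01", "location": "Beijing", "scene": "city"}),
    ("v3.mp4", {"date": "2020-09-02", "location": "Shenzhen", "scene": "beach"}),
]
targets = filter_videos(stored, {"location": "Shenzhen", "scene": "beach"})
```

With exact-match conditions on location and scene, only `v1.mp4` and `v3.mp4` become target material videos; a real implementation would likely support ranges (e.g. date intervals) rather than equality alone.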
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的终端设备，并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备，而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息，利用语义信息确定所需的视频片段信息，从而，只需从拍摄设备处获取视频片段信息对应的目标视频片段即可，无需传输所有的目标素材视频，大大减少了用户的等待时间，提高了自动剪辑的速度。With the terminal device provided by the embodiments of this application, the shooting device does not need to first transmit to the terminal device all the target material videos that might be used to generate a movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to that video clip information. Since not all target material videos have to be transmitted, the user's waiting time is greatly reduced and the speed of automatic editing is improved.
下面请参见图6,图6是本申请实施例提供的一种拍摄设备的结构示意图。该拍摄设备包括:Please refer to FIG. 6 below. FIG. 6 is a schematic structural diagram of a photographing device provided by an embodiment of the present application. The shooting equipment includes:
摄像头610,用于拍摄素材视频;The camera 610 is used for shooting material video;
通信接口620,用于与终端设备通信;a communication interface 620 for communicating with terminal equipment;
处理器630和存储有计算机程序的存储器640,所述计算机程序被所述处理器执行时实现以下步骤:A processor 630 and a memory 640 storing a computer program that, when executed by the processor, implements the following steps:
获取目标素材视频的语义信息;Obtain the semantic information of the target material video;
将所述语义信息发送给终端设备,其中,所述语义信息用于所述终端设备确定生成影片所需的视频片段信息;sending the semantic information to a terminal device, wherein the semantic information is used by the terminal device to determine video segment information required for generating a movie;
获取所述终端设备发送的所述视频片段信息,并根据所述视频片段信息对所述目标素材视频进行剪辑,得到目标视频片段;acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip;
将所述目标视频片段传输给所述终端设备,以便所述终端设备利用所述目标视频片段生成影片。The target video clip is transmitted to the terminal device, so that the terminal device generates a movie using the target video clip.
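On the shooting-device side, handling the received video clip information reduces to cutting out each named segment and returning it. In this minimal sketch a video is modeled as a list of frames at an assumed fixed frame rate; both are stand-ins for real footage:

```python
# Shooting-device side: cut the requested segments out of stored material
# videos. A video is modeled as a list of frames; FPS is an assumed rate.

FPS = 30  # assumed frame rate for this sketch

def cut_segments(storage, clip_infos):
    """For each (video_id, start_s, end_s) request, return only the frames
    inside that time period - never the whole material video."""
    segments = []
    for video_id, start_s, end_s in clip_infos:
        frames = storage[video_id]
        segments.append(frames[int(start_s * FPS):int(end_s * FPS)])
    return segments

storage = {"a.mp4": list(range(300))}  # a 10-second video: 300 frames
segments = cut_segments(storage, [("a.mp4", 1.0, 2.0)])
```

The one-second request yields 30 frames out of 300, so only a tenth of the material video crosses the link to the terminal device.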
可选的,所述视频片段信息用于指示出所述目标视频片段所属的目标素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
可选的,所述目标素材视频的语义信息是通过对目标素材视频进行语义分析得到的。Optionally, the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
可选的,所述语义分析是在所述目标素材视频的拍摄过程中进行的。Optionally, the semantic analysis is performed during the shooting process of the target material video.
可选的,所述语义分析是在充电过程中进行的。Optionally, the semantic analysis is performed during the charging process.
可选的,所述处理器还用于,在所述获取目标素材视频的语义信息之前,根据设定的条件,从存储的素材视频中筛选出目标素材视频。Optionally, the processor is further configured to, before acquiring the semantic information of the target material video, filter out the target material video from the stored material video according to a set condition.
可选的,所述设定的条件是预先设定的默认条件。Optionally, the set condition is a preset default condition.
可选的,所述设定的条件是由用户设定的。Optionally, the set condition is set by a user.
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
可选的,所述拍摄设备包括可移动平台或相机或云台相机。Optionally, the photographing device includes a movable platform or a camera or a pan-tilt camera.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的拍摄设备，并不需要将可能用于生成影片的目标素材视频先传输给终端设备，而是可以先将目标素材视频的语义信息发送给终端设备，以便终端设备利用语义信息确定所需的视频片段信息，并将该视频片段信息发送给拍摄设备。从而，拍摄设备只将该视频片段信息对应的目标视频片段传输给终端设备，无需传输所有的目标素材视频，大大减少了用户的等待时间，提高了自动剪辑的速度。The shooting device provided by the embodiments of this application does not need to first transmit to the terminal device the target material videos that might be used to generate a movie. Instead, it can first send the semantic information of the target material videos to the terminal device, so that the terminal device can use the semantic information to determine the required video clip information and send that video clip information back to the shooting device. The shooting device then transmits only the target video clips corresponding to that video clip information, without transmitting all the target material videos, which greatly reduces the user's waiting time and increases the speed of automatic editing.
下面请参见图7,图7是本申请实施例提供的一种影片生成系统的结构示意图。该系统包括:Please refer to FIG. 7 below. FIG. 7 is a schematic structural diagram of a movie generation system provided by an embodiment of the present application. The system includes:
终端设备710，用于获取目标素材视频的语义信息，所述语义信息至少包括：从拍摄设备获取的外部素材视频的语义信息；根据所述语义信息，确定生成影片所需的视频片段信息；获取与所述视频片段信息对应的目标视频片段，其中，所述目标视频片段至少包括：从所述拍摄设备获取的所述外部素材视频的视频片段；利用所述目标视频片段生成影片；The terminal device 710 is configured to: acquire semantic information of target material videos, the semantic information at least including semantic information of external material videos acquired from a shooting device; determine, according to the semantic information, the video clip information required for generating a movie; acquire target video clips corresponding to the video clip information, the target video clips at least including video clips of the external material videos acquired from the shooting device; and generate a movie using the target video clips.
拍摄设备720，用于获取所述外部素材视频的语义信息；将所述语义信息发送给所述终端设备；获取所述终端设备发送的所述视频片段信息，并根据所述视频片段信息对所述目标素材视频进行剪辑，得到目标视频片段；将所述目标视频片段传输给所述终端设备。The shooting device 720 is configured to: acquire the semantic information of the external material videos; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material videos according to the video clip information to obtain target video clips; and transmit the target video clips to the terminal device.
可选的,所述视频片段信息用于指示出所述目标视频片段所属的外部素材视频及所述目标视频片段对应的时间段。Optionally, the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
可选的,所述目标素材视频还包括:本地素材视频,所述终端设备还用于,在本地获取所述本地素材视频的语义信息。Optionally, the target material video further includes: a local material video, and the terminal device is further configured to locally acquire semantic information of the local material video.
可选的,所述终端设备还用于,对所述本地素材视频进行语义分析,得到所述本地素材视频的语义信息。Optionally, the terminal device is further configured to perform semantic analysis on the local material video to obtain semantic information of the local material video.
可选的,所述目标视频片段还包括:所述本地素材视频的视频片段;Optionally, the target video segment further includes: a video segment of the local material video;
所述终端设备还用于,根据所述视频片段信息对所述本地素材视频进行剪辑,得到所述本地素材视频的视频片段。The terminal device is further configured to edit the local material video according to the video segment information to obtain a video segment of the local material video.
可选的,所述外部素材视频的语义信息是所述拍摄设备对所述外部素材视频进行语义分析得到的。Optionally, the semantic information of the external material video is obtained by semantic analysis of the external material video by the shooting device.
可选的,所述语义分析是在所述目标素材视频的拍摄过程中进行的。Optionally, the semantic analysis is performed during the shooting process of the target material video.
可选的,所述语义分析是在充电过程中进行的。Optionally, the semantic analysis is performed during the charging process.
可选的，所述终端设备在根据所述语义信息，确定生成影片所需的视频片段信息时用于，根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息。Optionally, when determining, according to the semantic information, the video clip information required for generating a movie, the terminal device is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
可选的，所述目标成片模板是从候选成片模板中确定的，所述候选成片模板是从成片模板库中确定的。Optionally, the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
可选的，所述候选成片模板是通过以下方式确定的：Optionally, the candidate film templates are determined in the following manner:
根据所述语义信息,确定待生成影片的风格类型;According to the semantic information, determine the style type of the movie to be generated;
根据所述风格类型,从成片模板库中筛选出所述候选成片模板。According to the style type, the candidate film templates are screened from the film template library.
可选的，所述终端设备在根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息时用于，利用所述语义信息，计算所述目标素材视频中的视频片段与所述候选成片模板中各视频空位的匹配度，并计算相邻视频空位之间视频过渡的平滑度；根据所述匹配度与所述平滑度，确定目标成片模板及所述目标成片模板中各视频空位对应的目标视频片段。Optionally, when determining, according to the semantic information, the target film template and the video clip information corresponding to each video slot in the target film template, the terminal device is configured to: use the semantic information to calculate the degree of matching between video clips in the target material videos and each video slot in the candidate film templates, and calculate the smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
可选的，所述目标成片模板包括以下一种或多种内容：音乐、转场效果、贴图、视频特效。Optionally, the target film template includes one or more of the following: music, transition effects, stickers, and video special effects.
可选的,所述终端设备在利用所述目标视频片段生成影片时用于,将所述目标视频片段导入所述目标成片模板对应的视频空位,生成影片。Optionally, when generating a movie by using the target video clip, the terminal device is configured to import the target video clip into a video slot corresponding to the target movie template to generate a movie.
可选的,所述目标素材视频是根据预设的条件自动从存储的素材视频中筛选得到的。Optionally, the target material video is automatically selected from the stored material video according to a preset condition.
可选的,所述目标素材视频是根据用户设定的条件从存储的素材视频中筛选得到的。Optionally, the target material video is selected from the stored material video according to a condition set by the user.
可选的,所述条件包括以下一种或多种:时间、地点、人物信息、场景信息。Optionally, the conditions include one or more of the following: time, location, character information, and scene information.
可选的,所述语义信息包括语义标签。Optionally, the semantic information includes semantic tags.
可选的,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。Optionally, the semantic information includes one or more of the following: scene recognition results, character action detection results, character expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results.
以上各实施方式的具体实现,在前文中已有相关说明,在此不再赘述。The specific implementation of the above embodiments has been described in the foregoing, and will not be repeated here.
本申请实施例提供的影片生成系统，并不需要拍摄设备将可能用于生成影片的目标素材视频先传输给终端设备，而是可以由终端设备先从拍摄设备处获取目标素材视频的语义信息，利用语义信息确定所需的视频片段信息，从而，只需从拍摄设备处获取视频片段信息对应的目标视频片段即可，无需传输所有的目标素材视频，大大减少了用户的等待时间，提高了自动剪辑的速度。With the movie generation system provided by the embodiments of this application, the shooting device does not need to first transmit to the terminal device all the target material videos that might be used to generate a movie. Instead, the terminal device first obtains the semantic information of the target material videos from the shooting device and uses that semantic information to determine the required video clip information; it then only needs to obtain from the shooting device the target video clips corresponding to that video clip information. Since not all target material videos have to be transmitted, the user's waiting time is greatly reduced and the speed of automatic editing is improved.
本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时实现本申请实施例提供的应用于终端设备的影片生成方法。Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a terminal device provided by the embodiments of the present application.
本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时实现本申请实施例提供的应用于拍摄设备的影片生成方法。Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the movie generation method applied to a shooting device provided by the embodiments of the present application.
以上实施例中提供的技术特征，只要不存在冲突或矛盾，本领域技术人员可以根据实际情况对各个技术特征进行组合，从而构成各种不同的实施例。而本申请文件限于篇幅，未对各种不同的实施例展开说明，但可以理解的是，各种不同的实施例也属于本申请实施例公开的范围。As long as no conflict or contradiction arises, those skilled in the art may combine the technical features provided in the above embodiments according to actual conditions to form various different embodiments. Due to space limitations, this application does not describe all of these various embodiments, but it should be understood that they also fall within the scope disclosed by the embodiments of this application.
本申请实施例可采用在一个或多个其中包含有程序代码的存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。计算机可用存储介质包括永久性和非永久性、可移动和非可移动媒体，可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括但不限于：相变内存（PRAM）、静态随机存取存储器（SRAM）、动态随机存取存储器（DRAM）、其他类型的随机存取存储器（RAM）、只读存储器（ROM）、电可擦除可编程只读存储器（EEPROM）、快闪记忆体或其他内存技术、只读光盘只读存储器（CD-ROM）、数字多功能光盘（DVD）或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。Embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply that any such actual relationship or order exists between these entities or operations. The terms "comprising", "including", or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.
以上对本申请实施例所提供的方法、设备及系统进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法及其核心思想；同时，对于本领域的一般技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。The methods, devices, and systems provided by the embodiments of this application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of this application. The descriptions of the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of this application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims (78)

  1. 一种影片生成方法,其特征在于,包括:A method for generating a film, comprising:
    获取目标素材视频的语义信息,所述语义信息至少包括:从拍摄设备获取的外部素材视频的语义信息;Acquiring semantic information of the target material video, the semantic information at least includes: the semantic information of the external material video obtained from the shooting device;
    根据所述语义信息,确定生成影片所需的视频片段信息;According to the semantic information, determine the video segment information required to generate the movie;
    获取与所述视频片段信息对应的目标视频片段,其中,所述目标视频片段至少包括:从所述拍摄设备获取的所述外部素材视频的视频片段;Acquiring a target video clip corresponding to the video clip information, wherein the target video clip at least includes: a video clip of the external material video obtained from the shooting device;
    利用所述目标视频片段生成影片。A movie is generated using the target video segment.
  2. 根据权利要求1所述的方法,其特征在于,所述获取与所述视频片段信息对应的目标视频片段,包括:The method according to claim 1, wherein the acquiring the target video clip corresponding to the video clip information comprises:
    将所述视频片段信息发送给所述拍摄设备后,接收所述拍摄设备根据所述视频片段信息对所述外部素材视频剪辑得到的目标视频片段。After the video clip information is sent to the shooting device, a target video clip obtained by the shooting device from clipping the external material video according to the video clip information is received.
  3. 根据权利要求2所述的方法,其特征在于,所述视频片段信息用于指示出所述目标视频片段所属的外部素材视频及所述目标视频片段对应的时间段。The method according to claim 2, wherein the video clip information is used to indicate an external material video to which the target video clip belongs and a time period corresponding to the target video clip.
  4. 根据权利要求1所述的方法,其特征在于,所述目标素材视频还包括:本地素材视频。The method according to claim 1, wherein the target material video further comprises: local material video.
  5. 根据权利要求4所述的方法,其特征在于,所述本地素材视频的语义信息是通过以下方式得到的:The method according to claim 4, wherein the semantic information of the local material video is obtained in the following manner:
    对所述本地素材视频进行语义分析,得到所述本地素材视频的语义信息。Semantic analysis is performed on the local material video to obtain semantic information of the local material video.
  6. 根据权利要求4所述的方法,其特征在于,所述目标视频片段还包括:所述本地素材视频的视频片段;The method according to claim 4, wherein the target video segment further comprises: a video segment of the local material video;
    所述获取与所述视频片段信息对应的目标视频片段,包括:The acquiring the target video clip corresponding to the video clip information includes:
    根据所述视频片段信息对所述本地素材视频进行剪辑,得到所述本地素材视频的视频片段。The local material video is edited according to the video segment information to obtain a video segment of the local material video.
  7. 根据权利要求1所述的方法,其特征在于,所述外部素材视频的语义信息是所述拍摄设备对所述外部素材视频进行语义分析得到的。The method according to claim 1, wherein the semantic information of the external material video is obtained by semantic analysis of the external material video by the shooting device.
  8. 根据权利要求1所述的方法,其特征在于,所述根据所述语义信息,确定生成影片所需的视频片段信息,包括:The method according to claim 1, wherein the determining, according to the semantic information, video segment information required for generating a movie comprises:
    根据所述语义信息，确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息。According to the semantic information, determine a target film template and the video clip information corresponding to each video slot in the target film template.
  9. 根据权利要求8所述的方法，其特征在于，所述目标成片模板是从候选成片模板中确定的，所述候选成片模板是从成片模板库中确定的。The method according to claim 8, wherein the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  10. 根据权利要求9所述的方法，其特征在于，所述候选成片模板是通过以下方式确定的：The method according to claim 9, wherein the candidate film templates are determined in the following manner:
    根据所述语义信息,确定待生成影片的风格类型;According to the semantic information, determine the style type of the movie to be generated;
    根据所述风格类型,从成片模板库中筛选出所述候选成片模板。According to the style type, the candidate film templates are screened from the film template library.
  11. 根据权利要求9所述的方法,其特征在于,所述根据所述语义信息,确定目标成片模板及所述目标成片模板中各视频空位对应的视频片段信息,包括:The method according to claim 9, wherein, according to the semantic information, determining the target filming template and the video clip information corresponding to each video slot in the target filming template, comprising:
    利用所述语义信息,计算所述目标素材视频中的视频片段与所述候选成片模板中各视频空位的匹配度,并计算相邻视频空位之间视频过渡的平滑度;Using the semantic information, calculate the degree of matching between the video segment in the target material video and each video slot in the candidate film template, and calculate the smoothness of video transition between adjacent video slots;
    根据所述匹配度与所述平滑度,确定目标成片模板及所述目标成片模板中各视频空位对应的目标视频片段。According to the matching degree and the smoothness, the target filming template and the target video segment corresponding to each video slot in the target filming template are determined.
  12. 根据权利要求8所述的方法，其特征在于，所述目标成片模板包括以下一种或多种内容：音乐、转场效果、贴图、视频特效。The method according to claim 8, wherein the target film template includes one or more of the following: music, transition effects, stickers, and video special effects.
  13. 根据权利要求8所述的方法,其特征在于,所述利用所述目标视频片段生成影片,包括:The method according to claim 8, wherein the generating a movie by using the target video segment comprises:
    将所述目标视频片段导入所述目标成片模板对应的视频空位,生成影片。The target video clip is imported into the video slot corresponding to the target film template to generate a movie.
  14. 根据权利要求1所述的方法,其特征在于,所述目标素材视频是根据预设的条件自动从存储的素材视频中筛选得到的。The method according to claim 1, wherein the target material video is automatically obtained from the stored material video according to preset conditions.
  15. 根据权利要求1所述的方法,其特征在于,所述目标素材视频是根据用户设定的条件从存储的素材视频中筛选得到的。The method according to claim 1, wherein the target material video is obtained by screening from the stored material video according to a condition set by a user.
  16. 根据权利要求14或15所述的方法,其特征在于,所述条件包括以下一种或多种:时间、地点、人物信息、场景信息。The method according to claim 14 or 15, wherein the conditions include one or more of the following: time, location, character information, and scene information.
  17. 根据权利要求1所述的方法,其特征在于,所述语义信息包括语义标签。The method of claim 1, wherein the semantic information includes semantic tags.
  18. 根据权利要求1所述的方法,其特征在于,所述语义信息包括以下一种或多种:场景识别结果、人物动作检测结果、人物表情检测结果、目标检测结果、构图评价结果、美学评价结果。The method according to claim 1, wherein the semantic information includes one or more of the following: scene recognition results, human action detection results, human expression detection results, target detection results, composition evaluation results, and aesthetic evaluation results .
  19. 一种影片生成方法,其特征在于,包括:A method for generating a film, comprising:
    获取目标素材视频的语义信息;Obtain the semantic information of the target material video;
    将所述语义信息发送给终端设备，其中，所述语义信息用于所述终端设备确定生成影片所需的视频片段信息；Sending the semantic information to a terminal device, wherein the semantic information is used by the terminal device to determine the video clip information required for generating a movie;
    获取所述终端设备发送的所述视频片段信息,并根据所述视频片段信息对所述目标素材视频进行剪辑,得到目标视频片段;acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip;
    将所述目标视频片段传输给所述终端设备,以便所述终端设备利用所述目标视频片段生成影片。The target video clip is transmitted to the terminal device, so that the terminal device generates a movie using the target video clip.
  20. 根据权利要求19所述的方法,其特征在于,所述视频片段信息用于指示出所述目标视频片段所属的目标素材视频及所述目标视频片段对应的时间段。The method according to claim 19, wherein the video clip information is used to indicate a target material video to which the target video clip belongs and a time period corresponding to the target video clip.
  21. 根据权利要求19所述的方法,其特征在于,所述目标素材视频的语义信息是通过对目标素材视频进行语义分析得到的。The method according to claim 19, wherein the semantic information of the target material video is obtained by semantic analysis of the target material video.
  22. 根据权利要求21所述的方法,其特征在于,所述语义分析是在所述目标素材视频的拍摄过程中进行的。The method according to claim 21, wherein the semantic analysis is performed during the shooting process of the target material video.
  23. 根据权利要求21所述的方法,其特征在于,所述语义分析是在充电过程中进行的。The method of claim 21, wherein the semantic analysis is performed during charging.
  24. 根据权利要求19所述的方法,其特征在于,在所述获取目标素材视频的语义信息之前,所述方法还包括:The method according to claim 19, wherein before the acquiring the semantic information of the target material video, the method further comprises:
    根据设定的条件,从存储的素材视频中筛选出目标素材视频。According to the set conditions, the target material video is filtered from the stored material video.
  25. The method according to claim 24, wherein the set condition is a preset default condition.
  26. The method according to claim 24, wherein the set condition is set by a user.
  27. The method according to claim 19, wherein the semantic information comprises semantic tags.
  28. The method according to claim 19, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
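Claims 27-28 enumerate what the semantic information may contain without fixing a data layout. One hypothetical per-segment record combining those result types (keys and score scales are illustrative assumptions only):

```python
# Hypothetical shape of per-segment semantic information. The keys mirror
# the result types listed in the claims: semantic tags, scene recognition,
# human action detection, facial expression detection, object detection,
# and composition / aesthetic evaluation scores.
semantic_info = {
    "video_id": "DJI_0042.MP4",
    "time_range_s": (12.0, 17.5),
    "tags": ["beach", "running"],      # semantic tags (claim 27)
    "scene": "beach",                  # scene recognition result
    "actions": ["running"],            # human action detection result
    "expressions": ["smiling"],        # facial expression detection result
    "objects": ["person", "dog"],      # object detection result
    "composition_score": 0.81,         # composition evaluation result
    "aesthetic_score": 0.74,           # aesthetic evaluation result
}

def has_tag(info: dict, tag: str) -> bool:
    """Check whether a segment carries a given semantic tag."""
    return tag in info.get("tags", [])

print(has_tag(semantic_info, "beach"))  # True
```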
  29. A terminal device, comprising:
    a communication interface configured to communicate with a photographing device; and
    a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:
    acquiring semantic information of target material videos, the semantic information comprising at least semantic information of an external material video obtained from the photographing device;
    determining, according to the semantic information, video clip information required for generating a film;
    acquiring a target video clip corresponding to the video clip information, wherein the target video clip comprises at least a video clip of the external material video obtained from the photographing device; and
    generating a film using the target video clip.
  30. The terminal device according to claim 29, wherein, when acquiring the target video clip corresponding to the video clip information, the processor is configured to send the video clip information to the photographing device and then receive the target video clip obtained by the photographing device by editing the external material video according to the video clip information.
  31. The terminal device according to claim 30, wherein the video clip information indicates the external material video to which the target video clip belongs and the time period corresponding to the target video clip.
  32. The terminal device according to claim 29, wherein the target material videos further comprise a local material video.
  33. The terminal device according to claim 32, wherein the semantic information of the local material video is obtained by:
    performing semantic analysis on the local material video to obtain the semantic information of the local material video.
  34. The terminal device according to claim 32, wherein the target video clip further comprises a video clip of the local material video; and
    when acquiring the target video clip corresponding to the video clip information, the processor is configured to edit the local material video according to the video clip information to obtain the video clip of the local material video.
  35. The terminal device according to claim 29, wherein the semantic information of the external material video is obtained by the photographing device performing semantic analysis on the external material video.
  36. The terminal device according to claim 29, wherein, when determining the video clip information required for generating a film according to the semantic information, the processor is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
  37. The terminal device according to claim 36, wherein the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  38. The terminal device according to claim 37, wherein the candidate film templates are determined by:
    determining, according to the semantic information, a style type of the film to be generated; and
    selecting the candidate film templates from the film template library according to the style type.
  39. The terminal device according to claim 37, wherein, when determining the target film template and the video clip information corresponding to each video slot in the target film template according to the semantic information, the processor is configured to: use the semantic information to calculate a matching degree between video clips in the target material videos and each video slot in the candidate film templates, and calculate a smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
  40. The terminal device according to claim 36, wherein the target film template comprises one or more of the following: music, transition effects, stickers, and video special effects.
  41. The terminal device according to claim 36, wherein, when generating a film using the target video clip, the processor is configured to import the target video clip into the corresponding video slot of the target film template to generate the film.
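Claims 37-39 describe scoring candidate film templates by how well clips match each slot and how smoothly adjacent slots transition, without specifying the scoring functions or search strategy. A brute-force sketch under assumed toy scoring functions (the real matching degree and smoothness measures are not disclosed here):

```python
from itertools import product

def match_degree(clip: dict, slot: dict) -> float:
    """Toy matching degree: fraction of slot-required tags the clip carries."""
    wanted = set(slot["tags"])
    return len(wanted & set(clip["tags"])) / len(wanted) if wanted else 1.0

def smoothness(a: dict, b: dict) -> float:
    """Toy transition smoothness: 1.0 if adjacent clips share any tag."""
    return 1.0 if set(a["tags"]) & set(b["tags"]) else 0.0

def score_template(template: list[dict], clips: list[dict]) -> tuple[float, list[dict]]:
    """Try every clip-to-slot assignment; return (best score, best assignment).
    Score = total matching degree + total transition smoothness (claim 39)."""
    best_score, best = float("-inf"), None
    for assign in product(clips, repeat=len(template)):
        m = sum(match_degree(c, s) for c, s in zip(assign, template))
        t = sum(smoothness(a, b) for a, b in zip(assign, assign[1:]))
        if m + t > best_score:
            best_score, best = m + t, list(assign)
    return best_score, best

clips = [{"id": 1, "tags": ["beach", "running"]},
         {"id": 2, "tags": ["sunset", "beach"]}]
template = [{"tags": ["beach"]}, {"tags": ["sunset"]}]
score, assignment = score_template(template, clips)
print([c["id"] for c in assignment])  # [1, 2]
```

In practice the search over clips and templates would need pruning rather than exhaustive enumeration; the sketch only shows how the two scores of claim 39 combine into one selection criterion.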
  42. The terminal device according to claim 29, wherein the target material video is automatically selected from stored material videos according to a preset condition.
  43. The terminal device according to claim 29, wherein the target material video is selected from stored material videos according to a condition set by the user.
  44. The terminal device according to claim 42 or 43, wherein the condition comprises one or more of the following: time, location, person information, and scene information.
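Claims 42-44 describe filtering stored material videos by a preset or user-set condition over time, location, person, and scene. An illustrative filter (metadata field names are assumptions; the patent does not fix a schema):

```python
from datetime import datetime

def select_target_videos(videos: list[dict], condition: dict) -> list[dict]:
    """Keep only the videos that satisfy every field present in `condition`.
    Supported condition keys, per claim 44: after (time), location,
    person, scene. Absent keys are treated as "don't care"."""
    def ok(v: dict) -> bool:
        if "after" in condition and v["shot_at"] < condition["after"]:
            return False
        if "location" in condition and v["location"] != condition["location"]:
            return False
        if "person" in condition and condition["person"] not in v["persons"]:
            return False
        if "scene" in condition and v["scene"] != condition["scene"]:
            return False
        return True
    return [v for v in videos if ok(v)]

videos = [
    {"id": "a", "shot_at": datetime(2020, 9, 1), "location": "Shenzhen",
     "persons": ["Alice"], "scene": "beach"},
    {"id": "b", "shot_at": datetime(2020, 8, 1), "location": "Beijing",
     "persons": ["Bob"], "scene": "city"},
]
picked = select_target_videos(videos, {"scene": "beach"})
print([v["id"] for v in picked])  # ['a']
```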
  45. The terminal device according to claim 29, wherein the semantic information comprises semantic tags.
  46. The terminal device according to claim 29, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
  47. A photographing device, comprising:
    a camera configured to shoot material videos;
    a communication interface configured to communicate with a terminal device; and
    a processor and a memory storing a computer program which, when executed by the processor, implements the following steps:
    acquiring semantic information of a target material video;
    sending the semantic information to the terminal device, wherein the semantic information is used by the terminal device to determine video clip information required for generating a film;
    acquiring the video clip information sent by the terminal device, and editing the target material video according to the video clip information to obtain a target video clip; and
    transmitting the target video clip to the terminal device, so that the terminal device generates a film using the target video clip.
  48. The photographing device according to claim 47, wherein the video clip information indicates the target material video to which the target video clip belongs and the time period corresponding to the target video clip.
  49. The photographing device according to claim 47, wherein the semantic information of the target material video is obtained by performing semantic analysis on the target material video.
  50. The photographing device according to claim 49, wherein the semantic analysis is performed while the target material video is being shot.
  51. The photographing device according to claim 49, wherein the semantic analysis is performed during charging.
  52. The photographing device according to claim 47, wherein the processor is further configured to, before acquiring the semantic information of the target material video, select the target material video from stored material videos according to a set condition.
  53. The photographing device according to claim 52, wherein the set condition is a preset default condition.
  54. The photographing device according to claim 52, wherein the set condition is set by a user.
  55. The photographing device according to claim 47, wherein the semantic information comprises semantic tags.
  56. The photographing device according to claim 47, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
  57. The photographing device according to claim 47, wherein the photographing device comprises a movable platform, a camera, or a gimbal camera.
  58. A film generation system, comprising:
    a terminal device configured to: acquire semantic information of target material videos, the semantic information comprising at least semantic information of an external material video obtained from a photographing device; determine, according to the semantic information, video clip information required for generating a film; acquire a target video clip corresponding to the video clip information, wherein the target video clip comprises at least a video clip of the external material video obtained from the photographing device; and generate a film using the target video clip; and
    the photographing device, configured to: acquire the semantic information of the external material video; send the semantic information to the terminal device; acquire the video clip information sent by the terminal device, and edit the target material video according to the video clip information to obtain the target video clip; and transmit the target video clip to the terminal device.
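The division of labor in claim 58 — the photographing device analyzes and cuts, the terminal device selects and assembles, and only semantic information and the cut clips cross the link — can be sketched as a message sequence. Class and method names below are illustrative only, not the patent's implementation:

```python
class PhotographingDevice:
    """Holds the raw material videos; analyzes and cuts them on request."""
    def __init__(self, videos: dict):
        self.videos = videos  # video_id -> list of frames (toy model)

    def semantic_info(self) -> dict:
        # Stand-in for on-device semantic analysis of each material video.
        return {vid: {"tags": ["clip"]} for vid in self.videos}

    def cut(self, clip_info: tuple) -> list:
        # Only the cut clip (not the full video) crosses the link.
        vid, start, end = clip_info
        return self.videos[vid][start:end]

class TerminalDevice:
    """Picks clips from semantic info and assembles the final film."""
    def choose_clips(self, info: dict) -> list[tuple]:
        # Stand-in for template matching: take the first 2 frames per video.
        return [(vid, 0, 2) for vid in sorted(info)]

    def make_film(self, device: PhotographingDevice) -> list:
        info = device.semantic_info()            # 1: receive semantics only
        wanted = self.choose_clips(info)         # 2: decide clip information
        clips = [device.cut(w) for w in wanted]  # 3: fetch the cut clips
        return [f for clip in clips for f in clip]  # 4: assemble the film

device = PhotographingDevice({"v1": ["f0", "f1", "f2"], "v2": ["g0", "g1"]})
film = TerminalDevice().make_film(device)
print(film)  # ['f0', 'f1', 'g0', 'g1']
```

The design choice worth noting is bandwidth: full material videos never leave the photographing device; only compact semantic information travels up and only the selected clips travel back.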
  59. The system according to claim 58, wherein the video clip information indicates the external material video to which the target video clip belongs and the time period corresponding to the target video clip.
  60. The system according to claim 58, wherein the target material videos further comprise a local material video, and the terminal device is further configured to locally acquire semantic information of the local material video.
  61. The system according to claim 60, wherein the terminal device is further configured to perform semantic analysis on the local material video to obtain the semantic information of the local material video.
  62. The system according to claim 60, wherein the target video clip further comprises a video clip of the local material video; and
    the terminal device is further configured to edit the local material video according to the video clip information to obtain the video clip of the local material video.
  63. The system according to claim 58, wherein the semantic information of the external material video is obtained by the photographing device performing semantic analysis on the external material video.
  64. The system according to claim 63, wherein the semantic analysis is performed while the external material video is being shot.
  65. The system according to claim 63, wherein the semantic analysis is performed during charging.
  66. The system according to claim 58, wherein, when determining the video clip information required for generating a film according to the semantic information, the terminal device is configured to determine, according to the semantic information, a target film template and the video clip information corresponding to each video slot in the target film template.
  67. The system according to claim 66, wherein the target film template is determined from candidate film templates, and the candidate film templates are determined from a film template library.
  68. The system according to claim 67, wherein the candidate film templates are determined by:
    determining, according to the semantic information, a style type of the film to be generated; and
    selecting the candidate film templates from the film template library according to the style type.
  69. The system according to claim 67, wherein, when determining the target film template and the video clip information corresponding to each video slot in the target film template according to the semantic information, the terminal device is configured to: use the semantic information to calculate a matching degree between video clips in the target material videos and each video slot in the candidate film templates, and calculate a smoothness of video transitions between adjacent video slots; and determine, according to the matching degree and the smoothness, the target film template and the target video clip corresponding to each video slot in the target film template.
  70. The system according to claim 66, wherein the target film template comprises one or more of the following: music, transition effects, stickers, and video special effects.
  71. The system according to claim 66, wherein, when generating a film using the target video clip, the terminal device is configured to import the target video clip into the corresponding video slot of the target film template to generate the film.
  72. The system according to claim 58, wherein the target material video is automatically selected from stored material videos according to a preset condition.
  73. The system according to claim 58, wherein the target material video is selected from stored material videos according to a condition set by the user.
  74. The system according to claim 72 or 73, wherein the condition comprises one or more of the following: time, location, person information, and scene information.
  75. The system according to claim 58, wherein the semantic information comprises semantic tags.
  76. The system according to claim 58, wherein the semantic information comprises one or more of the following: a scene recognition result, a human action detection result, a facial expression detection result, an object detection result, a composition evaluation result, and an aesthetic evaluation result.
  77. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the film generation method according to any one of claims 1-18.
  78. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the film generation method according to any one of claims 19-28.
PCT/CN2020/118084 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system WO2022061806A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080035038.6A CN113841417B (en) 2020-09-27 2020-09-27 Film generation method, terminal device, shooting device and film generation system
PCT/CN2020/118084 WO2022061806A1 (en) 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/118084 WO2022061806A1 (en) 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system

Publications (1)

Publication Number Publication Date
WO2022061806A1 true WO2022061806A1 (en) 2022-03-31

Family

ID=78963293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118084 WO2022061806A1 (en) 2020-09-27 2020-09-27 Film production method, terminal device, photographing device, and film production system

Country Status (2)

Country Link
CN (1) CN113841417B (en)
WO (1) WO2022061806A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786069A (en) * 2022-04-22 2022-07-22 北京有竹居网络技术有限公司 Video generation method, device, medium and electronic equipment
CN115119050A (en) * 2022-06-30 2022-09-27 北京奇艺世纪科技有限公司 Video clipping method and device, electronic equipment and storage medium
CN115134646A (en) * 2022-08-25 2022-09-30 荣耀终端有限公司 Video editing method and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114501076A (en) * 2022-02-07 2022-05-13 浙江核新同花顺网络信息股份有限公司 Video generation method, apparatus, and medium
CN115460459B (en) * 2022-09-02 2024-02-27 百度时代网络技术(北京)有限公司 Video generation method and device based on AI and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105578269A (en) * 2016-01-20 2016-05-11 努比亚技术有限公司 Mobile terminal and video processing method thereof
US20180014052A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for real time, dynamic, adaptive and non-sequential stitching of clips of videos
CN109076263A (en) * 2017-12-29 2018-12-21 深圳市大疆创新科技有限公司 Video data handling procedure, equipment, system and storage medium
CN110198432A (en) * 2018-10-30 2019-09-03 腾讯科技(深圳)有限公司 Processing method, device, computer-readable medium and the electronic equipment of video data
US20190392214A1 (en) * 2018-06-25 2019-12-26 Panasonic Intellectual Property Management Co., Ltd. Information processing apparatus and method for generating video picture data
CN111357277A (en) * 2018-11-28 2020-06-30 深圳市大疆创新科技有限公司 Video clip control method, terminal device and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110121116A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video generation method and device
CN110582025B (en) * 2018-06-08 2022-04-01 北京百度网讯科技有限公司 Method and apparatus for processing video
CN110855904B (en) * 2019-11-26 2021-10-01 Oppo广东移动通信有限公司 Video processing method, electronic device and storage medium



Also Published As

Publication number Publication date
CN113841417A (en) 2021-12-24
CN113841417B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
WO2022061806A1 (en) Film production method, terminal device, photographing device, and film production system
CN111866585B (en) Video processing method and device
Fossati From Grain to Pixel: The Archival Life of Film in Transition THIRD REVISED EDITION
Davenport et al. Cinematic primitives for multimedia
US9443337B2 (en) Run-time techniques for playing large-scale cloud-based animations
CN101300567B (en) Method for media sharing and authoring on the web
Buckingham et al. Video cultures: Media technology and everyday creativity
US20140328570A1 (en) Identifying, describing, and sharing salient events in images and videos
WO2022141533A1 (en) Video processing method, video processing apparatus, terminal device, and storage medium
CN110121116A (en) Video generation method and device
WO2018050021A1 (en) Virtual reality scene adjustment method and apparatus, and storage medium
JP2012070283A (en) Video processing apparatus, method, and video processing system
CN108600632A (en) It takes pictures reminding method, intelligent glasses and computer readable storage medium
CN111667557B (en) Animation production method and device, storage medium and terminal
WO2017157135A1 (en) Media information processing method, media information processing device and storage medium
CN114638232A (en) Method and device for converting text into video, electronic equipment and storage medium
CN113992973B (en) Video abstract generation method, device, electronic equipment and storage medium
Lehmuskallio The camera as a sensor: The visualization of everyday digital photography as simulative, heuristic and layered pictures
CN117252966B (en) Dynamic cartoon generation method and device, storage medium and electronic equipment
US20200152237A1 (en) System and Method of AI Powered Combined Video Production
US7610554B2 (en) Template-based multimedia capturing
KR101898765B1 (en) Auto Content Creation Methods and System based on Content Recognition Technology
WO2013187796A1 (en) Method for automatically editing digital video files
KR20210078206A (en) Editing multimedia contents based on voice recognition
US20220217430A1 (en) Systems and methods for generating new content segments based on object name identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20954657

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20954657

Country of ref document: EP

Kind code of ref document: A1