WO2023217155A1 - Video generation method, apparatus, device, storage medium and program product - Google Patents


Info

Publication number: WO2023217155A1
Authority: WIPO (PCT)
Prior art keywords: target, multimedia data, template, video, segment
Application number: PCT/CN2023/093089
Other languages: English (en), French (fr)
Inventors: 李欣玮, 曹嘉晋
Original Assignee: 北京字跳网络技术有限公司
Application filed by 北京字跳网络技术有限公司
Priority to EP23802924.3A (published as EP4344230A4)
Priority to JP2023578709A (published as JP2024528440A)
Priority to US18/573,097 (published as US20240296871A1)
Publication of WO2023217155A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4312 Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/439 Processing of audio elementary streams
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/47205 End-user interface for requesting content, additional data or services, for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N21/4884 Data services, e.g. news ticker, for displaying subtitles
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/036 Insert-editing
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text

Definitions

  • the present disclosure relates to the field of video processing technology, and in particular, to a video generation method, device, equipment, storage medium and program product.
  • embodiments of the present disclosure provide a video generation method, device, equipment, storage medium and program product.
  • the editing operation in the obtained editing template is directly applied to the multimedia data to generate the video, without requiring users to manually edit the video; this not only reduces the time cost of making videos but also improves the quality of the produced videos.
  • an embodiment of the present disclosure provides a video generation method, including:
  • Initial multimedia data is generated based on the received text data, wherein the initial multimedia data includes a reading voice of the text data and video images that match the text data. The initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided. A target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment; the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes a video image that matches the target text segment, and the target voice segment includes a reading voice that matches the target text segment;
  • an embodiment of the present disclosure provides a video generation device, including:
  • An initial multimedia data generation module is used to generate initial multimedia data based on the received text data, wherein the initial multimedia data includes a reading voice of the text data and video images that match the text data. The initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes a video image that matches the target text segment, and the target voice segment includes a reading voice that matches the target text segment;
  • the target editing template acquisition module is used to obtain the target editing template in response to the editing template acquisition request;
  • the target multimedia data generation module is used to apply the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data;
  • the target video generation module is used to generate the target video based on the target multimedia data.
  • an embodiment of the present disclosure provides an electronic device.
  • the electronic device includes:
  • one or more processors;
  • a storage device for storing one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the video generation method of any one of the above-mentioned first aspects.
  • embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, the video generation method as described in any one of the above-mentioned first aspects is implemented.
  • embodiments of the present disclosure provide a computer program product.
  • the computer program product includes a computer program or instructions; when the computer program or instructions are executed by a processor, the video generation method as described in any one of the above-mentioned first aspects is implemented.
  • Embodiments of the present disclosure provide a video generation method, device, equipment, storage medium and program product.
  • the method includes: generating initial multimedia data based on received text data; obtaining a target editing template in response to an editing template acquisition request; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data; and generating the target video based on the target multimedia data.
  • Embodiments of the present disclosure generate videos by directly applying the editing operations in the obtained editing templates to multimedia data, eliminating the need for users to manually edit videos. This not only reduces the time cost of making videos, but also improves the quality of the produced videos.
  • Figure 1 shows an architecture diagram of a video production scenario provided by an embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a video generation method in an embodiment of the present disclosure
  • Figure 3 is a schematic diagram of triggering the template theme control in an embodiment of the present disclosure
  • Figure 4 is a schematic diagram of triggering a template control in an embodiment of the present disclosure
  • Figure 5 is a schematic diagram of a template application prompt in an embodiment of the present disclosure
  • Figure 6 is a schematic structural diagram of a video generation device in an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
  • the term “include” and its variations are open-ended, ie, “including but not limited to.”
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • in the related art, keywords need to be extracted from the text data; for each keyword, video images matching the keyword are searched in a preset image library; and the text information and video images are synthesized according to typesetting rules to obtain the target video.
  • in the related art, the found video images and text data are merely simply synthesized, so the quality of the produced video is not high, and the user needs to edit it manually afterwards. If the user lacks editing experience, the quality of the video is affected.
  • FIG. 1 shows an architectural diagram of a video production scenario provided by an embodiment of the present disclosure.
  • the architecture diagram may include at least one electronic device 101 on the client side and at least one server 102 on the server side.
  • the electronic device 101 can establish a connection with the server 102 and exchange information through a network protocol such as Hypertext Transfer Protocol Secure (HTTPS).
  • the electronic device 101 may include mobile phones, tablet computers, desktop computers, notebook computers, vehicle-mounted terminals, wearable devices, all-in-one machines, smart home devices and other devices with communication functions, and may also include devices simulated by virtual machines or simulators.
  • the server 102 may include a cloud server or a server cluster and other devices with storage and computing functions.
  • the user can create videos in a designated platform on the electronic device 101, and the designated platform can be a designated application or a designated website.
  • the user can send the video to the server 102 of the designated platform.
  • the server 102 can receive the video sent by the electronic device 101 and store the received video to send the video to the electronic device that needs to play the video.
  • the electronic device 101 can receive the user's editing template acquisition request for the initial multimedia data.
  • after the editing template acquisition request is received, the target editing template can be obtained, the editing operation indicated by the target editing template is applied to the initial multimedia data to obtain the target multimedia data, and the target video is generated based on the target multimedia data. It can be seen that during generation of the target video, the editing operations in the obtained target editing template are directly applied to the initial multimedia data without the user manually editing the video, which not only reduces the time cost of making videos, but also improves the quality of the produced videos.
  • the electronic device 101 can also obtain the target editing template after receiving the editing template acquisition request, apply the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data, and generate the target video based on the target multimedia data. In this way, the editing operation indicated by the target editing template is applied to the initial multimedia data locally on the electronic device 101 to generate the target video, further reducing the time cost of video production.
  • the electronic device 101 may also send a clipping template acquisition request carrying a template identifier to the server 102 after receiving the clipping template acquisition request.
  • the server 102 may respond to the editing template acquisition request, obtain the target editing template, apply the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data, generate the target video based on the target multimedia data, and send the generated target video to the electronic device 101. In this way, the electronic device 101 can request the target editing template from the server 102 based on the editing template acquisition request, and the editing operation indicated by the obtained target editing template is applied to the initial multimedia data to generate the target video, further improving the quality of the produced video and reducing the data processing load of the electronic device 101.
  • the electronic device may be a mobile terminal, a fixed terminal or a portable terminal, such as a mobile phone, a station, a unit, a device, a multimedia computer, a multimedia tablet, an Internet node, a communicator, a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a personal communications system (PCS) device, a personal navigation device, a personal digital assistant (PDA), an audio/video player, a digital camera/camcorder, a positioning device, a television receiver, a radio broadcast receiver, an e-book device, a gaming device, or any combination thereof, including the accessories and peripherals of such devices or any combination thereof.
  • the server can be a physical server or a cloud server.
  • the server can be a server or a server cluster.
  • FIG 2 is a flow chart of a video generation method in an embodiment of the present disclosure. This embodiment can be applied to the situation of generating a video based on text information.
  • the method can be executed by a video generation device, which can be implemented in software and/or hardware, and the video generation method can be implemented in the electronic device described in Figure 1.
  • the video generation method provided by the embodiment of the present disclosure mainly includes steps S101-S104.
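The four steps can be sketched as a minimal pipeline. This is an illustrative sketch, not the patented implementation: every function body is a hypothetical stand-in, with string identifiers in place of real frames, audio, and templates.

```python
def generate_initial_multimedia(text_data: str) -> list[str]:
    # S101: divide the received text data into text segments; each segment
    # would pair a matched video image with a synthesized reading voice.
    return [s.strip() for s in text_data.split(".") if s.strip()]

def get_target_editing_template(request_id: str) -> dict:
    # S102: obtain the target editing template in response to the
    # editing template acquisition request (stubbed lookup).
    return {"id": request_id, "operations": ["add_head", "add_tail"]}

def apply_editing_operations(template: dict, initial: list[str]) -> list[str]:
    # S103: apply each editing operation indicated by the template.
    data = list(initial)
    if "add_head" in template["operations"]:
        data.insert(0, "opening-clip")
    if "add_tail" in template["operations"]:
        data.append("ending-clip")
    return data

def generate_target_video(target: list[str]) -> str:
    # S104: render the target multimedia data into the target video
    # (here, just a joined identifier string).
    return "|".join(target)
```

Because the template carries the editing operations, the user never edits the intermediate data by hand; S103 is fully mechanical.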
  • the text data may be data input to the electronic device by the user through an input device, or data sent to the electronic device by other devices.
  • in some embodiments, before generating the initial multimedia data based on the received text data, the method further includes: receiving the text data in response to a user's data input operation.
  • the user's data input operation may include an operation of adding text data or an operation of entering text data, which is not specifically limited in this embodiment.
  • the initial multimedia data includes a reading voice of the text data and video images that match the text data
  • the initial multimedia data includes at least one multimedia segment
  • the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided.
  • the target multimedia segment in at least one multimedia segment corresponds to the target text segment in at least one text segment
  • the target multimedia segment includes a target video segment and a target voice segment
  • the target video segment includes a video image that matches the target text segment
  • the target voice segment includes a reading voice that matches the target text segment.
  • in some embodiments, generating initial multimedia data based on the received text data includes: dividing the received text data into at least one text segment, where the at least one text segment includes a plurality of target text segments. For each target text segment, a video image corresponding to the target text segment is searched in a preset gallery based on the target text segment, and the video image is processed according to a preset animation effect to obtain the target video segment corresponding to the target text segment; a reading voice matching the target text segment is obtained to generate the target voice segment; and the target video segment and the target voice segment are synthesized to obtain the target multimedia segment. The target multimedia segments obtained for the target text segments are then synthesized in the order of the target text segments to obtain the initial multimedia data.
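The per-segment procedure above can be sketched as follows. The gallery lookup, the animation processing, and the text-to-speech call are all hypothetical stand-ins (a real system would use keyword extraction or embedding similarity for the gallery search and an actual TTS engine for the reading voice):

```python
from dataclasses import dataclass

@dataclass
class MultimediaSegment:
    text: str         # the target text segment
    video_image: str  # identifier of the image found in the preset gallery
    speech: str       # stand-in for the synthesized reading voice

def build_initial_segments(text_data: str,
                           gallery: dict[str, str]) -> list[MultimediaSegment]:
    segments = []
    for piece in [s.strip() for s in text_data.split(".") if s.strip()]:
        # Search the preset gallery for a video image matching the segment.
        image = next((img for kw, img in gallery.items() if kw in piece),
                     "default.png")
        # Placeholder for text-to-speech synthesis of the reading voice.
        speech = f"tts({piece})"
        segments.append(MultimediaSegment(piece, image, speech))
    # Target multimedia segments are kept in the order of the text segments.
    return segments
```

Keeping the segments in text order means the later synthesis step only has to concatenate them to obtain the initial multimedia data.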
  • subtitle text matching the target text segment is included on the video image.
  • subtitle text matching the target text segment is added to the video image, so that the user can intuitively see the subtitles corresponding to the read speech while watching the video, thereby improving the user's viewing experience.
  • responding to the editing template acquisition request may occur after receiving a user's operation on the electronic device, or after detecting that the initial multimedia data has been generated.
  • the target editing template can be an editing template selected based on the user's operation of the electronic device, or an editing template automatically matched based on keywords in the text data.
  • in some embodiments, obtaining the target editing template includes: the electronic device directly obtains the target editing template from a locally pre-stored template database.
  • in other embodiments, obtaining the target editing template includes: the electronic device obtains a template identifier corresponding to the target editing template and sends an editing template acquisition request carrying the template identifier to the server; the server responds to the editing template acquisition request carrying the template identifier, obtains the target editing template based on the template identifier, and returns the obtained target editing template to the electronic device.
  • if the target editing template fails to be obtained, a prompt pop-up box is displayed in the display interface of the electronic device to indicate that obtaining the target editing template failed.
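The two acquisition paths (local template database first, then a server request carrying the template identifier) and the failure case can be sketched as below; `fetch_remote` is a hypothetical stand-in for the network call to the server:

```python
from typing import Callable, Optional

def get_target_template(template_id: str,
                        local_store: dict[str, dict],
                        fetch_remote: Callable[[str], Optional[dict]]
                        ) -> Optional[dict]:
    # Path 1: read the template directly from the locally
    # pre-stored template database on the electronic device.
    if template_id in local_store:
        return local_store[template_id]
    # Path 2: send an acquisition request carrying the template
    # identifier to the server, which resolves and returns the template.
    template = fetch_remote(template_id)
    # On None, the caller would display the prompt pop-up box
    # announcing that the target editing template could not be obtained.
    return template
```

The local-first order keeps the common case fast while the server path covers templates the device has never cached.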
  • in some embodiments, obtaining the target editing template in response to an editing template acquisition request includes: in response to a trigger operation on a template theme control, determining the editing template corresponding to the trigger operation as the target editing template, and obtaining the target editing template.
  • At least one template theme control is displayed on the interactive interface of the electronic device, and in response to the user's triggering operation on the template theme control, the clipping template corresponding to the triggering operation is determined as the target clipping template.
  • the clipping template corresponding to the template theme 1 control is determined as the target clipping template.
  • the target editing template is selected through the user's triggering operation, which facilitates the user to select a satisfactory editing template and improves the user experience.
  • in some embodiments, before responding to the triggering operation on the template theme control, the method further includes: displaying a video editing area, wherein the video editing area includes a template control; displaying a mask layer area in response to a triggering operation on the template control; and displaying at least one template theme control on the mask layer area.
  • the video preview area 10 and the video editing area 20 are displayed in the display interface of the electronic device, and the video editing area 20 includes a plurality of editing controls, for example: a template control, a screen control, a text control, a reading-voice control and a music control.
  • the template control is used to indicate that the user can use the existing template to edit the initial multimedia data.
  • the screen control is used to instruct the user to edit the video image in the initial multimedia data.
  • the text control is used to instruct the user to edit the subtitle text in the initial multimedia data.
  • the reading voice control is used to instruct the user to edit the reading voice in the initial multimedia data.
  • the music control is used to instruct the user to edit the background music in the initial multimedia data.
  • a masked layer area is displayed, and multiple clip template theme controls are displayed in the masked layer area.
  • multiple clip template theme controls are displayed with a left and right sliding effect.
  • multiple template theme controls are displayed in response to the user's triggering operation on the template control, making the operation simple and easy to understand and convenient for the user to operate.
  • the target editing template includes at least one editing operation, and the editing operation is applied to the initial multimedia data to edit the initial multimedia data.
  • since applying the editing operation indicated by the target editing template to the initial multimedia data takes a certain amount of time, the electronic device displays an application prompt box in its display interface during this process; the application prompt box is used to inform the user that the initial multimedia data is being edited using the editing operations indicated in the editing template.
  • if the editing operation indicated by the target editing template is successfully applied to the initial multimedia data, a prompt message indicating that the editing template was applied successfully is displayed; if the application is unsuccessful, a prompt message indicating that the editing template application failed is displayed, and the user is prompted to reselect an editing template.
  • in some embodiments, the editing operation indicated by the target editing template includes a video synthesis operation, and applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: synthesizing, based on the video synthesis operation, the video clips included in the target editing template with the multimedia segments included in the initial multimedia data to obtain the target multimedia data.
  • the target clip template includes one or more video clips.
  • if the editing operation indicated by the target editing template includes a video synthesis operation, the one or more video clips included in the target editing template are synthesized with the multimedia segments included in the initial multimedia data to obtain the target multimedia data.
  • the video clip included in the target clip template is added between any two video frames of the multimedia clip.
  • the above video clip synthesis operation can be any existing video synthesis method, and is not specifically limited in this embodiment.
  • the synthesis of multiple videos is realized through the video synthesis operation in the editing template, which avoids the user's manual synthesis of videos, reduces the time cost of making videos, and improves the quality of the produced videos.
  • in some embodiments, synthesizing the video clips included in the target editing template and the multimedia segments included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data includes: loading, based on the video synthesis operation, the video clips included in the target editing template to set positions of the multimedia segments included in the initial multimedia data to obtain the target multimedia data, where the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
  • the target editing template includes multiple video clips and the corresponding adding positions of each video clip.
  • the video clip added before the first frame of media data of the initial multimedia data serves as the opening (title) of the target video.
  • the video clip added after the last frame of media data of the initial multimedia data serves as the ending of the target video.
  • the text theme is added at the text theme position in the video clip corresponding to the opening, and the text theme is edited and rendered on screen according to the text theme display effect included in the target clip template. Further, if the text data includes a text author, the text author is added at the author position in the video clip corresponding to the opening, and the text author information is edited and rendered on screen according to the text author display effect included in the target editing template.
  • the video producer's information is obtained and added at the producer position in the video clip corresponding to the ending, and the video producer's information is edited and rendered on screen according to the video producer display effect included in the target editing template.
  • the operations of adding the opening and/or the ending are thus realized, which avoids the user manually adding an opening or ending, reduces the time cost of making the video, and improves the quality of the produced video.
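For illustration only, the opening/ending synthesis described above can be sketched as follows. This is a minimal sketch, not the patent's actual implementation: segments are modeled as plain strings, and the field names `opening_clip` and `ending_clip` are hypothetical stand-ins for the video clips saved in a target clip template.

```python
def apply_video_synthesis(initial_segments, template):
    """Load the template's clips at the set positions: before the first
    frame of media data and/or after the last frame of media data."""
    result = list(initial_segments)
    opening = template.get("opening_clip")  # hypothetical field names
    ending = template.get("ending_clip")
    if opening is not None:
        result.insert(0, opening)  # becomes the opening of the target video
    if ending is not None:
        result.append(ending)      # becomes the ending of the target video
    return result

clips = apply_video_synthesis(["clip_1", "clip_2"],
                              {"opening_clip": "title_card",
                               "ending_clip": "credits"})
# clips == ["title_card", "clip_1", "clip_2", "credits"]
```

Because the template may supply only an opening, only an ending, or both, each clip is loaded independently when present.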
  • the editing operation indicated by the target editing template includes: a transition setting operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: adding transition effects to the multimedia clips included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
  • the initial multimedia data includes multiple video images that match the text data.
  • the process of switching multiple video images inevitably involves image transition settings.
  • without an editing template, users need to manually set the transition effect between every two adjacent video images, which increases the time cost of video production.
  • the transition effect includes one or more of the following: blinds animation effect, cut-in animation effect, flash animation effect, gradient animation effect, cross-dissolve animation effect, zoom animation effect, etc.
  • when the editing operation indicated by the target editing template includes a transition setting operation and the transition setting operation includes multiple transition effect types, the multiple transition effect types included in the transition setting operation are applied to the multimedia clips, so that each multimedia clip has its own corresponding transition effect.
  • when the transition setting operation includes a single transition effect type, that type is applied to all multimedia clips, so that the multimedia clips share the same transition effect.
  • transition effects are added to multimedia clips through the transition setting operation in the editing template, which avoids users manually setting transition effects, reduces the time cost of making videos, and improves the quality of the produced videos.
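The per-boundary assignment of transition effect types described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; clips are plain strings and the plan is just a list of (left clip, right clip, effect) tuples.

```python
import itertools

def plan_transitions(segments, transition_types):
    """Assign a transition effect type to each boundary between adjacent
    clips. Multiple types are cycled so each boundary gets its own effect;
    a single type gives every boundary the same effect."""
    effects = itertools.cycle(transition_types)
    return [(segments[i], segments[i + 1], next(effects))
            for i in range(len(segments) - 1)]

plan = plan_transitions(["a", "b", "c"], ["fade", "zoom"])
# plan == [("a", "b", "fade"), ("b", "c", "zoom")]
```

With a one-element `transition_types` list, every boundary receives the same effect, matching the single-type case above.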
  • the clipping operation indicated by the target clipping template includes: a virtual object adding operation; applying the clipping operation indicated by the target clipping template to the initial multimedia data to obtain the target multimedia data includes: adding, based on the virtual object adding operation, the virtual object included in the target clipping template to a preset position of the initial multimedia data to obtain the target multimedia data.
  • virtual objects include: target video clips, virtual stickers, virtual props, virtual cards, and other objects.
  • optionally, the virtual objects can include: facial decoration features, headwear features, clothing features, clothing accessory features, etc.
  • the virtual object saved in the target clipping template may be directly added to a preset position of the initial multimedia data.
  • specific parameters of the preset positions can be saved in the target clip template; optionally, a glitter effect sticker is added to the third video image according to the settings saved in the target clip template.
  • the position where the virtual object is added can also be determined based on keywords extracted from the text information.
  • virtual objects are added to multimedia clips through the virtual object addition operation in the editing template, which avoids users from manually adding virtual objects, reduces the time cost of making videos, and improves the quality of the produced videos.
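The preset-position placement above, including the glitter-sticker-on-the-third-image example, can be sketched as follows. This is an illustrative sketch under the assumption that a preset position is simply a zero-based image index saved in the template; the real template format is not specified in this document.

```python
def add_virtual_objects(video_images, placements):
    """Attach each virtual object (e.g., a sticker) to the image index
    saved as its preset position in the clip template."""
    result = [{"image": img, "overlays": []} for img in video_images]
    for index, obj in placements:  # placements: (preset position, object)
        result[index]["overlays"].append(obj)
    return result

frames = add_virtual_objects(["img_1", "img_2", "img_3"],
                             [(2, "glitter_sticker")])
# the glitter sticker lands on the third video image (index 2)
```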
  • the editing operation indicated by the target editing template includes: a background audio adding operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: mixing, based on the background audio adding operation, the background audio included in the target clip template with the spoken voice included in the initial multimedia data to obtain the target multimedia data.
  • a background audio track is included in the target clip template. Based on the background audio adding operation, the background audio and the reading voice are mixed according to the timestamp corresponding to the background audio and the timestamp corresponding to the reading voice, to obtain the target multimedia data.
  • the playback parameters of the background audio are adjusted based on the playback parameters of the reading voice, so that the two blend together more naturally.
  • background music is added to multimedia clips through the adding operation of background audio in the editing template, which avoids users from manually adding background music, reduces the time cost of making videos, and improves the quality of produced videos.
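A minimal sketch of the mixing step above, under two stated simplifications: audio is modeled as lists of float samples aligned by index (standing in for the timestamps mentioned above), and the "playback parameter" adjustment is reduced to a background gain that is lowered so the voice stays dominant.

```python
def mix_background_audio(speech, background, bg_gain=0.3):
    """Mix the template's background audio under the reading voice.
    Samples are aligned by index; the shorter track is zero-padded, and
    the background is attenuated so the two tracks blend."""
    length = max(len(speech), len(background))
    mixed = []
    for i in range(length):
        s = speech[i] if i < len(speech) else 0.0
        b = background[i] if i < len(background) else 0.0
        mixed.append(s + bg_gain * b)
    return mixed

out = mix_background_audio([1.0, 0.5], [0.2, 0.2, 0.2])
# three output samples; the tail carries only attenuated background
```

A real implementation would resample and align the tracks by timestamp before summing; this sketch only shows the mixing arithmetic.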
  • the clipping operation indicated by the target clipping template includes: a keyword extraction operation; applying the clipping operation indicated by the target clipping template to the initial multimedia data includes: for at least one target text segment, extracting keywords from the target text segment; and adding the keywords to the target multimedia segment corresponding to the target text segment.
  • the keyword may be a date, a number, a person's name, a proper name, a place name, a plant, an animal, etc.
  • the target text segment is "Zhang San paid Li Si 200,000 yuan in cash"
  • the keyword extracted from the target text fragment is "200,000 yuan”
  • the keyword "200,000 yuan” is added to the target multimedia fragment corresponding to the target text fragment.
  • the target editing template also includes: keyword parameters, where the keyword parameters include: keyword color, font, added effects, etc. The keyword display information in the target multimedia clip is set according to the keyword parameters.
  • keywords are added to the multimedia clips through keyword extraction operations in the editing template, so that users can more clearly understand the key information of the text clips.
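The keyword extraction and attachment steps above can be sketched as follows. This is illustrative only: a simple regular expression stands in for the extractor (a production system would use named-entity recognition for dates, names, place names, and so on), and it is tuned just to mirror the "200,000 yuan" example.

```python
import re

def extract_keywords(text_fragment):
    """Pull amount-style keywords out of a text fragment with a simple
    regular expression (stand-in for a real entity extractor)."""
    return re.findall(r"\d[\d,]*\s*yuan", text_fragment)

def add_keywords(segment, keywords, params=None):
    """Attach the keywords, with optional display parameters such as
    color and font, to the target multimedia segment."""
    out = dict(segment)
    out["keywords"] = [{"text": kw, **(params or {})} for kw in keywords]
    return out

kws = extract_keywords("Zhang San paid Li Si 200,000 yuan in cash")
seg = add_keywords({"clip": "clip_3"}, kws, {"color": "red"})
# kws == ["200,000 yuan"]
```

The `params` argument models the keyword parameters (color, font, effects) mentioned above.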
  • adding keywords to the target multimedia segment corresponding to the target text segment includes: obtaining key text information that matches the keyword; and adding the keyword and the key text information to the target multimedia segment corresponding to the target text segment.
  • key information matching the keywords is obtained based on the above keywords.
  • the keyword is “Wang Wu”
  • the key information matching the keyword is: Wang Wu is an actor, and his representative works are “TV Series A” and “Movie B”.
  • "Wang Wu" is used as the keyword
  • "actor" and the representative works "TV Series A" and "Movie B" are used as key text information and added to the target multimedia clip.
  • the keyword is "the crime of occupational embezzlement"
  • the matching key text information is: "the crime of occupational embezzlement refers to personnel of a company, enterprise, or other unit taking advantage of their position to illegally take the unit's property as their own."
  • different display parameters can be set for keywords and key text information.
  • the key text information matching a keyword may be text extracted from the text data, or text obtained from the Internet or from a preset knowledge base; the method of obtaining the key text information is not specifically limited in this embodiment.
  • key text information is extracted through keywords, and the keywords and key text information are added to the video, so that users can quickly learn keyword-related knowledge, which helps them understand the content of the text data.
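The key-text-information lookup above can be sketched as follows. This is a sketch under the assumption that the "preset knowledge base" is a simple in-memory mapping (the hypothetical `KNOWLEDGE_BASE` below); an Internet lookup would slot into the same interface. Different display styles for the keyword and the key text information model the differing display parameters mentioned above.

```python
# hypothetical preset knowledge base; real deployments might instead
# query the Internet, as the description notes
KNOWLEDGE_BASE = {
    "Wang Wu": "actor; representative works: 'TV Series A', 'Movie B'",
}

def add_key_text_info(segment, keyword, knowledge_base=KNOWLEDGE_BASE):
    """Look up key text information matching the keyword and attach both
    the keyword and the information to the target multimedia segment,
    each with its own display parameters."""
    out = dict(segment)
    out["keyword"] = {"text": keyword, "style": "highlight"}
    info = knowledge_base.get(keyword)
    if info is not None:  # no match: only the keyword itself is added
        out["key_text_info"] = {"text": info, "style": "caption"}
    return out

seg = add_key_text_info({"clip": "clip_5"}, "Wang Wu")
```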
  • Embodiments of the present disclosure provide a video generation method, device, equipment, storage medium and program product.
  • the method includes: generating initial multimedia data based on received text data; wherein the initial multimedia data includes the reading voice of the text data and video images matching the text data; the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; the target multimedia segment in the at least one multimedia segment corresponds to the target text segment in the at least one text segment.
  • the target multimedia segment includes a target video segment and a target voice segment
  • the target video segment includes a video image that matches the target text segment
  • the target voice segment includes a reading voice that matches the target text segment
  • the target clip template is obtained in response to a clip template acquisition request
  • Apply the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data
  • Embodiments of the present disclosure generate videos by directly applying the editing operations in the obtained editing templates to multimedia data, eliminating the need for users to manually edit videos. This not only reduces the time cost of making videos, but also improves the quality of the produced videos.
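The overall flow summarized above (text in, edited multimedia out) can be sketched end to end. This is an illustrative sketch only: text is split on sentence boundaries, the reading voice and matched video image are stubbed as tagged strings, and only the opening/ending template operations are modeled.

```python
def generate_video(text_data, template):
    """End-to-end sketch of the claimed flow: divide the text data into
    text segments, build one multimedia segment per text segment (speech
    synthesis and image matching are stubbed), then apply the template's
    editing operations (only opening/ending are modeled here)."""
    text_segments = [s.strip() for s in text_data.split(".") if s.strip()]
    segments = [{"text": t, "speech": f"tts:{t}", "image": f"match:{t}"}
                for t in text_segments]
    if template.get("opening_clip"):
        segments.insert(0, {"clip": template["opening_clip"]})
    if template.get("ending_clip"):
        segments.append({"clip": template["ending_clip"]})
    return segments

video = generate_video("First sentence. Second sentence.",
                       {"opening_clip": "title_card"})
# 2 text segments plus the opening clip -> 3 segments
```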
  • Figure 6 is a flow chart of a video generation method in an embodiment of the present disclosure. This embodiment can be applied to the situation of generating a video based on text information.
  • the method can be executed by a video generation device, and the video generation device can be implemented in software and/or hardware and configured in an electronic device.
  • the video generation device 60 provided by the embodiment of the present disclosure mainly includes: an initial multimedia data generation module 61 , a target clipping template acquisition module 62 , a target multimedia data generation module 63 and a target video generation module 64 .
  • the initial multimedia data generation module 61 is used to generate initial multimedia data based on the received text data; wherein the initial multimedia data includes the reading voice of the text data and video images matching the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment, and the target multimedia segment includes a target video segment and a target voice segment,
  • the target video clip includes a video image that matches the target text clip, and the target speech clip includes a reading voice that matches the target text clip;
  • the target clip template acquisition module 62 is used to obtain the target clip template in response to a clip template acquisition request;
  • the target multimedia data generation module 63 is used to apply the clip operation indicated by the target clip template to the initial multimedia data to obtain the target multimedia data; the target video generation module 64, used to generate target videos based on target multimedia data.
  • subtitle text matching the target text segment is included on the video image.
  • the target clipping template acquisition module 62, configured to obtain the target clipping template in response to a clipping template acquisition request, includes: a target clipping template determination unit, used to determine the clipping template corresponding to a triggering operation on the template theme control as the target clipping template in response to that triggering operation; and a target clipping template acquisition unit, used to obtain the target clipping template.
  • the target clipping template acquisition module 62 also includes: a video editing area display unit, configured to display the video editing area before responding to a triggering operation on the clipping template control, wherein the video editing area includes a template control; and a masked layer area display unit, used for displaying the masked layer area in response to a triggering operation on the template control, and displaying at least one template theme control on the masked layer area.
  • the editing operation indicated by the target editing template includes: a video synthesis operation; the target multimedia data generation module 63 is specifically configured to synthesize the video clips included in the target editing template with the multimedia clips included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data.
  • the target multimedia data generation module 63 is specifically configured to load the video clips included in the target clipping template to the set positions of the multimedia clips included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data, wherein the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
  • the editing operation indicated by the target editing template includes: a transition setting operation; the target multimedia data generation module 63 is specifically configured to add transition effects to the multimedia clips included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
  • the editing operation indicated by the target editing template includes: a virtual object adding operation; the target multimedia data generation module 63 is specifically configured to add the virtual object included in the target clip template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
  • the editing operation indicated by the target editing template includes: a background audio adding operation; the target multimedia data generation module 63 is specifically configured to mix the background audio included in the target editing template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
  • the editing operation indicated by the target editing template includes: a keyword extraction operation; the target multimedia data generation module 63 is specifically configured to extract, for at least one target text segment, keywords from the target text segment, and to add the keywords to the target multimedia segment corresponding to the target text segment.
  • the target multimedia data generation module 63 is specifically configured to obtain key text information matching keywords; add keywords and key text information to the target multimedia segment corresponding to the target text segment.
  • the video generation device provided by the embodiments of the present disclosure can execute the steps performed in the video generation method provided by the method embodiments of the present disclosure. The execution steps and beneficial effects will not be described again here.
  • FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
  • the electronic device 700 in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablets), PMPs (Portable Multimedia Players), vehicle-mounted terminals (e.g., car navigation terminals), and wearable terminal devices, as well as fixed terminals such as digital TVs, desktop computers, and smart home devices.
  • the electronic device shown in FIG. 7 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 700 may include a processing device (e.g., central processing unit, graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703, so as to implement the video generation method according to the embodiments of the present disclosure.
  • the RAM 703 also stores various programs and data required for the operation of the terminal device 700.
  • the processing device 701, the ROM 702 and the RAM 703 are connected to each other via a bus 704.
  • An input/output (I/O) interface 705 is also connected to bus 704.
  • the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 707 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 708 including, for example, a magnetic tape, hard disk, etc.; and a communication device 709.
  • the communication device 709 may allow the terminal device 700 to communicate wirelessly or wiredly with other devices to exchange data.
  • although FIG. 7 shows the terminal device 700 having various means, it should be understood that implementing or possessing all of the illustrated means is not required; more or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program including program code for executing the method shown in the flowchart, thereby implementing the video generation method described above.
  • the computer program may be downloaded and installed from the network via communication device 709, or from storage device 708, or from ROM 702.
  • when the computer program is executed by the processing device 701, the above-mentioned functions defined in the method of the embodiment of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can interconnect with digital data communication in any form or medium (e.g., a communications network).
  • examples of communications networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the terminal device, the terminal device: generates initial multimedia data based on the received text data; wherein the initial multimedia data includes the reading voice of the text data and video images matching the text data.
  • the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; the target multimedia segment in the at least one multimedia segment corresponds to the target text segment in the at least one text segment.
  • the target multimedia segment includes a target video segment and a target voice segment
  • the target video segment includes a video image that matches the target text segment
  • the target voice segment includes a reading voice that matches the target text segment
  • the terminal device may also perform other steps described in the above embodiments.
  • computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware. Among them, the name of a unit does not constitute a limitation on the unit itself under certain circumstances.
  • exemplary types of hardware logic components that may be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), and Complex Programmable Logic Devices (CPLDs).
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the present disclosure provides a video generation method, including: generating initial multimedia data based on received text data; wherein the initial multimedia data includes the reading voice of the text data and video images matching the text data; the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; the target multimedia segment in the at least one multimedia segment corresponds to the target text segment in the at least one text segment.
  • the target multimedia clip includes a target video clip and a target voice clip
  • the target video clip includes a video image that matches the target text clip
  • the target voice clip includes a reading voice that matches the target text clip
  • the present disclosure provides a video generation method, wherein the video image includes subtitle text matching the target text segment.
  • the present disclosure provides a video generation method, wherein, in response to a clip template acquisition request, obtaining a target clip template includes: in response to a trigger operation on a template theme control, determining the editing template corresponding to the trigger operation as the target editing template; and obtaining the target editing template.
  • the present disclosure provides a video generation method, wherein before responding to the triggering operation on the clip template control, the method further includes: displaying a video editing area, wherein the video editing area includes a template control; in response to a triggering operation on the template control, displaying the masked layer area; and displaying at least one template theme control on the masked layer area.
  • the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a video synthesis operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data includes: synthesizing video clips included in the target editing template with multimedia clips included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data.
  • the present disclosure provides a video generation method, wherein synthesizing the video segments included in the target clip template with the multimedia segments included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data includes: loading, based on the video synthesis operation, the video clips included in the target editing template to the set positions of the multimedia clips included in the initial multimedia data to obtain the target multimedia data, where the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
  • the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a transition setting operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data includes: adding transition effects to multimedia clips included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
  • the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a virtual object adding operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data includes: adding virtual objects included in the target clipping template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
  • the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a background audio adding operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: mixing the background audio included in the target clip template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
  • the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a keyword extraction operation; applying the editing operation indicated by the target editing template to the initial multimedia data includes: extracting keywords from the target text segment for at least one target text segment; and adding the keywords to the target multimedia segment corresponding to the target text segment.
  • the present disclosure provides a video generation method, wherein keywords are added to the target multimedia segment corresponding to the target text segment, It includes: obtaining key text information that matches the keyword; adding the keyword and key text information to the target multimedia fragment corresponding to the target text fragment.
  • The present disclosure provides a video generation apparatus. The apparatus includes: an initial multimedia data generation module configured to generate initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data; the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment; the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment.
  • The apparatus further includes: a target clip template acquisition module configured to obtain a target clip template in response to a clip template acquisition request; a target multimedia data generation module configured to apply the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and a target video generation module configured to generate a target video based on the target multimedia data.
  • The present disclosure provides a video generation apparatus, wherein the video images include subtitle text matching the target text segment.
  • The present disclosure provides a video generation apparatus, wherein the target clip template acquisition module, configured to obtain the target clip template in response to the clip template acquisition request, includes: a target clip template determination unit configured to determine, in response to a triggering operation on a template theme control, the clip template corresponding to the triggering operation as the target clip template; and a target clip template acquisition unit configured to obtain the target clip template.
  • The present disclosure provides a video generation apparatus, wherein the target clip template acquisition module further includes: a video editing area display unit configured to display a video editing area before the triggering operation on the clip template control, wherein the video editing area includes a template control; and a mask layer area display unit configured to display a mask layer area in response to a triggering operation on the template control and to display at least one template theme control on the mask layer area.
  • The present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a video synthesis operation; and the target multimedia data generation module is specifically configured to synthesize, based on the video synthesis operation, the video segments included in the target clip template with the multimedia segments included in the initial multimedia data, to obtain the target multimedia data.
  • The present disclosure provides a video generation apparatus, wherein the target multimedia data generation module is specifically configured to load, based on the video synthesis operation, the video segments included in the target clip template to a set position of the multimedia segments included in the initial multimedia data, to obtain the target multimedia data, where the set position includes: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
  • The present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a transition setting operation; and the target multimedia data generation module is specifically configured to add transition effects to the multimedia segments included in the initial multimedia data based on the transition setting operation, to obtain the target multimedia data.
  • The present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a virtual object adding operation; and the target multimedia data generation module is specifically configured to add the virtual object included in the target clip template to a preset position of the initial multimedia data based on the virtual object adding operation, to obtain the target multimedia data.
  • The present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a background audio adding operation; and the target multimedia data generation module is specifically configured to mix the background audio included in the target clip template with the reading voice included in the initial multimedia data based on the background audio adding operation, to obtain the target multimedia data.
  • The present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a keyword extraction operation; and the target multimedia data generation module is specifically configured to extract, for at least one target text segment, keywords from the target text segment, and to add the keywords to the target multimedia segment corresponding to the target text segment.
  • The present disclosure provides a video generation apparatus, wherein the target multimedia data generation module is specifically configured to obtain key text information matching the keywords, and to add the keywords and the key text information to the target multimedia segment corresponding to the target text segment.
  • The present disclosure provides an electronic device, including:
  • one or more processors; and
  • a memory configured to store one or more programs,
  • wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the video generation methods provided by the present disclosure.
  • The present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the video generation method according to any one provided by the present disclosure.
  • Embodiments of the present disclosure also provide a computer program product including a computer program or instructions which, when executed by a processor, implement the video generation method described above.


Abstract

The present disclosure relates to a video generation method, apparatus, device, storage medium and program product. The method includes: generating initial multimedia data based on received text data; obtaining a target clip template in response to a clip template acquisition request; applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and generating a target video based on the target multimedia data. By applying the editing operations of the obtained clip template directly to the multimedia data to generate a video, the embodiments of the present disclosure eliminate manual video editing by the user, which not only reduces the time cost of producing a video but also improves the quality of the produced video.

Description

Video generation method, apparatus, device, storage medium and program product
This application claims priority to Chinese invention patent application No. 202210508063.2, filed on May 10, 2022 and entitled "Video generation method, apparatus, device, storage medium and program product".
TECHNICAL FIELD
The present disclosure relates to the technical field of video processing, and in particular to a video generation method, apparatus, device, storage medium and program product.
BACKGROUND
With the rapid development of computer technology and mobile communication technology, video platforms based on electronic devices have become widely used and have greatly enriched people's daily lives. More and more users enjoy sharing their video works on video platforms for other users to watch.
In the related art, when producing a video, a user first needs to find the various materials needed for the video and then perform a series of complex video editing operations on those materials to finally generate a video work.
If the user lacks editing experience, the time cost of producing the video increases and the quality of the produced video is low.
SUMMARY
To solve the above technical problem, embodiments of the present disclosure provide a video generation method, apparatus, device, storage medium and program product that apply the editing operations of an obtained clip template directly to multimedia data to generate a video, without requiring the user to edit the video manually, which not only reduces the time cost of producing a video but also improves the quality of the produced video.
In a first aspect, an embodiment of the present disclosure provides a video generation method, including:
generating initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data; the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment;
obtaining a target clip template in response to a clip template acquisition request;
applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and
generating a target video based on the target multimedia data.
In a second aspect, an embodiment of the present disclosure provides a video generation apparatus, including:
an initial multimedia data generation module configured to generate initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data; the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment;
a target clip template acquisition module configured to obtain a target clip template in response to a clip template acquisition request;
a target multimedia data generation module configured to apply the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and
a target video generation module configured to generate a target video based on the target multimedia data.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
one or more processors; and
a storage apparatus configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video generation method of any one of the first aspect above.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the video generation method of any one of the first aspect above.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product including a computer program or instructions which, when executed by a processor, implement the video generation method of any one of the first aspect above.
Embodiments of the present disclosure provide a video generation method, apparatus, device, storage medium and program product. The method includes: generating initial multimedia data based on received text data; obtaining a target clip template in response to a clip template acquisition request; applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and generating a target video based on the target multimedia data. By applying the editing operations of the obtained clip template directly to the multimedia data to generate a video, the embodiments of the present disclosure eliminate manual video editing by the user, which not only reduces the time cost of producing a video but also improves the quality of the produced video.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is an architecture diagram of a video production scenario provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a video generation method in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of triggering a template theme control in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of triggering a template control in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a template application prompt in an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a video generation apparatus in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, i.e. "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules or units, and are not intended to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules or units.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of those messages or information.
Before the embodiments of the present application are explained in detail, an application scenario of the embodiments is described.
When users handle documents, the content is mostly presented as text, and reading text is laborious. Text information can therefore be converted into video, so that the user can listen to the audio and watch the video frames to grasp the information an article conveys without laboriously interpreting the text, which lowers the difficulty of acquiring information. Alternatively, because articles may be long and reading them is time-consuming, a user may not have the energy to read every one; converting articles into videos allows the user to quickly grasp the information they convey and then select the articles of interest for careful reading. In addition, video is a more diverse form of presentation that attracts the user's attention more easily than dry text, and users are more willing to read articles this way.
In the related art, keywords need to be extracted from the text data; for each keyword, a video image matching the keyword is looked up in a preset image library; and the text information and the video images are synthesized according to typesetting rules to obtain a target video. However, in the related art the retrieved video images are merely combined with the text data in a simple way, so the quality of the produced video is low and the user still needs to edit it manually afterwards; if the user lacks editing experience, the quality of the video suffers.
In the embodiments of the present application, after the initial multimedia data is generated from the text data, a target clip template is obtained and the editing operations indicated by the target clip template are applied to the initial multimedia data, thereby editing the initial multimedia data without manual editing by the user, which not only reduces the time cost of producing a video but also improves the quality of the produced video. FIG. 1 shows an architecture diagram of a video production scenario provided by an embodiment of the present disclosure.
As shown in FIG. 1, the architecture may include at least one electronic device 101 on the client side and at least one server 102 on the server side. The electronic device 101 may establish a connection and exchange information with the server 102 via a network protocol such as Hyper Text Transfer Protocol over Secure Socket Layer (HTTPS). The electronic device 101 may include a device with a communication function, such as a mobile phone, a tablet computer, a desktop computer, a laptop computer, an in-vehicle terminal, a wearable device, an all-in-one machine or a smart home device, and may also include a device simulated by a virtual machine or an emulator. The server 102 may include a device with storage and computing functions, such as a cloud server or a server cluster.
Based on the above architecture, a user can produce a video within a designated platform on the electronic device 101; the designated platform may be a designated application or a designated website. After producing a video, the user can send it to the server 102 of the designated platform, and the server 102 can receive and store the video sent by the electronic device 101 so as to deliver it to any electronic device that needs to play it.
In the embodiments of the present disclosure, in order to reduce the time cost of producing a video and improve the quality of the produced video, the electronic device 101 can receive a clip template acquisition request from the user for the initial multimedia data. After receiving the request, it can obtain a target clip template, apply the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data, and generate a target video based on the target multimedia data. It can be seen that during the generation of the target video, the editing operations of the obtained target clip template are applied directly to the initial multimedia data without manual editing by the user, which not only reduces the time cost of producing the video but also improves its quality.
Optionally, based on the above architecture, the electronic device 101 may, upon receiving the clip template acquisition request, obtain the target clip template, apply the editing operations indicated by the target clip template to the initial multimedia data to obtain the target multimedia data, and generate the target video based on the target multimedia data, so that the editing operations are applied and the target video is generated locally on the electronic device 101, further reducing the time cost of producing the video.
Optionally, based on the above architecture, the electronic device 101 may also, after receiving the clip template acquisition request, send a clip template acquisition request carrying a template identifier to the server 102. After receiving the request carrying the template identifier, the server 102 may, in response, obtain the target clip template based on the template identifier, apply the editing operations indicated by the target clip template to the initial multimedia data to obtain the target multimedia data, generate the target video based on the target multimedia data, and send the generated target video to the electronic device 101. The electronic device 101 can thus request the server 102 to obtain the target clip template based on the clip template acquisition request and to apply the editing operations indicated by the template to the initial multimedia data to generate the target video, further improving the quality of the produced video and reducing the data processing load on the electronic device 101.
For example, the electronic device may be a mobile terminal, a fixed terminal or a portable terminal, such as a mobile phone, a station, a unit, a device, a multimedia computer, a multimedia tablet, an Internet node, a communicator, a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a personal communication system (PCS) device, a personal navigation device, a personal digital assistant (PDA), an audio/video player, a digital camera/camcorder, a positioning device, a television receiver, a radio broadcast receiver, an e-book device, a gaming device, or any combination thereof, including accessories and peripherals of these devices or any combination thereof.
The server may be a physical server or a cloud server, and may be a single server or a server cluster.
The video generation method proposed by the embodiments of the present application is described in detail below with reference to the accompanying drawings.
FIG. 2 is a flowchart of a video generation method in an embodiment of the present disclosure. This embodiment is applicable to the case of generating a video from text information. The method may be performed by a video generation apparatus, which may be implemented in software and/or hardware, and the method may be performed by the electronic device described in FIG. 1.
As shown in FIG. 2, the video generation method provided by the embodiment of the present disclosure mainly includes steps S101-S104.
S101: Generate initial multimedia data based on received text data.
In an embodiment of the present disclosure, the text data may be data entered into the electronic device by the user through an input apparatus, or data sent to the electronic device by another device.
In an embodiment of the present disclosure, before generating the initial multimedia data based on the received text data, the method further includes: receiving the text data in response to a user data input operation. The user data input operation may include an operation of adding text data or an operation of typing in text data, which is not specifically limited in this embodiment.
In an embodiment of the present disclosure, the initial multimedia data includes a reading voice of the text data and video images matching the text data; the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment.
In an implementation of the present disclosure, generating the initial multimedia data based on the received text data includes: dividing the received text data into at least one text segment, the text segments including a plurality of target text segments; for each target text segment, looking up a video image corresponding to the target text segment in a preset gallery based on the target text segment, and processing the video image with a preset animation effect to obtain a target video segment corresponding to the target text segment; obtaining a reading voice matching the target text segment to generate a target voice segment; and synthesizing the target video segment and the target voice segment to obtain a target multimedia segment. A plurality of target multimedia segments are thus obtained, one for each target text segment, and the plurality of target multimedia segments are synthesized in the order of the target text segments to obtain the initial multimedia data.
In an implementation of the present disclosure, the video images include subtitle text matching the target text segment.
In the embodiments of the present disclosure, subtitle text matching the target text segment is added on the video images, so that the user can intuitively see the subtitles corresponding to the reading voice while watching the video, improving the viewing experience.
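As an illustration only (not the disclosed implementation), the S101 pipeline described above, which divides the text into segments, matches each segment to a gallery image and pairs it with a synthesized reading voice, can be sketched as follows. The `gallery` lookup and the `tts:` prefix are hypothetical stand-ins for a real image-matching service and a real TTS engine:

```python
import re
from dataclasses import dataclass

@dataclass
class MediaSegment:
    """One multimedia segment: a text fragment paired with a clip and a voice."""
    text: str
    image: str   # identifier of the matched gallery image (hypothetical)
    speech: str  # identifier of the synthesized reading voice (hypothetical)

def split_text(text: str) -> list:
    """Divide the received text into fragments at sentence-ending punctuation."""
    parts = re.split(r"[。.!?！？]", text)
    return [p.strip() for p in parts if p.strip()]

def build_initial_media(text: str, gallery: dict) -> list:
    """Pair each text fragment with a gallery image and a stand-in TTS clip."""
    segments = []
    for frag in split_text(text):
        image = gallery.get(frag, "default.png")  # fall back when no image matches
        speech = "tts:" + frag                    # placeholder for TTS synthesis
        segments.append(MediaSegment(frag, image, speech))
    return segments
```

A real system would replace the dictionary lookup with semantic image retrieval and the placeholder string with actual speech synthesis; the ordering of the returned list mirrors the order of the target text segments.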
S102: Obtain a target clip template in response to a clip template acquisition request.
In an implementation of the present disclosure, responding to the clip template acquisition request may mean responding to the request after receiving a user operation on the electronic device, or responding to the request upon detecting that the initial multimedia data has been generated.
The target clip template may be a clip template selected based on a user operation on the electronic device, or a clip template automatically matched based on keywords in the text data.
In an implementation of the present disclosure, obtaining the target clip template includes: the electronic device obtaining the target clip template directly from a locally pre-stored template database.
In an implementation of the present disclosure, obtaining the target clip template includes: the electronic device obtaining a template identifier corresponding to the target clip template and sending a clip template acquisition request carrying the template identifier to the server; the server responds to the request carrying the template identifier, obtains the target clip template based on the template identifier, and returns the obtained target clip template to the electronic device.
In an implementation of the present disclosure, if the target clip template cannot be obtained, a prompt pop-up box is displayed in the display interface of the electronic device to notify the user that acquisition of the target clip template failed.
In an implementation of the present disclosure, obtaining the target clip template in response to the clip template acquisition request includes: in response to a triggering operation on a template theme control, determining the clip template corresponding to the triggering operation as the target clip template; and obtaining the target clip template.
In an implementation of the present disclosure, at least one template theme control is displayed on the interactive interface of the electronic device, and in response to a user triggering operation on a template theme control, the clip template corresponding to the triggering operation is determined as the target clip template.
As shown in FIG. 3, in response to a user triggering operation on the "template theme 1" control, the clip template corresponding to that control is determined as the target clip template.
In the embodiments of the present disclosure, the target clip template is selected through the user's triggering operation, making it easy for the user to choose a clip template they are satisfied with and improving the user experience.
In an implementation of the present disclosure, before the triggering operation on the clip template control, the method further includes: displaying a video editing area, wherein the video editing area includes a template control; displaying a mask layer area in response to a triggering operation on the template control; and displaying at least one template theme control on the mask layer area.
In the embodiments of the present disclosure, as shown in FIG. 4, after the initial multimedia data is generated, a video preview area 10 and a video editing area 20 are displayed in the display interface of the electronic device. The video editing area 20 includes a plurality of editing controls, for example a template control, a picture control, a text control, a reading timbre control and a music control. The template control indicates that the user can edit the initial multimedia data using an existing template; the picture control is for editing the video images in the initial multimedia data; the text control is for editing the subtitle text in the initial multimedia data; the reading timbre control is for editing the reading voice in the initial multimedia data; and the music control is for editing the background music in the initial multimedia data.
In an implementation of the present disclosure, as shown in FIG. 4, in response to a user triggering operation on the template control, a mask layer area is displayed, and a plurality of clip template theme controls are displayed in the mask layer area. In response to a left-right sliding operation on the mask layer area, the plurality of clip template theme controls are presented with a left-right sliding effect.
In the embodiments of the present disclosure, a plurality of template theme controls are displayed after the user triggers the template control, making the operation simple, easy to understand and convenient for the user.
S103: Apply the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data.
In an implementation of the present disclosure, the target clip template includes at least one editing operation, and applying the editing operation to the initial multimedia data edits the initial multimedia data.
In an implementation of the present disclosure, as shown in FIG. 5, while the editing operations indicated by the target clip template are being applied to the initial multimedia data, since editing the initial multimedia data takes a certain amount of time, an application prompt box is displayed in the display interface of the electronic device to inform the user that the initial multimedia video is being edited with the editing operations indicated by the clip template.
In an implementation of the present disclosure, if the editing operations indicated by the target clip template are applied to the initial multimedia data successfully, a prompt message indicating that the clip template was applied successfully is displayed; if they are not applied successfully, a prompt message indicating that application of the clip template failed is displayed, and the user is prompted to select a clip template again.
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a video synthesis operation, and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain the target multimedia data includes: synthesizing, based on the video synthesis operation, the video segments included in the target clip template with the multimedia segments included in the initial multimedia data to obtain the target multimedia data.
In an implementation of the present disclosure, the target clip template includes one or more video segments. Where the editing operations indicated by the target clip template include a video synthesis operation, the one or more video segments included in the target clip template are synthesized with the multimedia segments included in the initial multimedia data to obtain the target multimedia data.
In an implementation of the present disclosure, a video segment included in the target clip template is added between any two video frames of a multimedia segment. The video segment synthesis operation may be any existing video synthesis method, which is not further limited in this embodiment.
In the embodiments of the present disclosure, multiple video segments are synthesized through the video synthesis operation in the clip template, avoiding manual synthesis by the user, reducing the time cost of producing a video and improving the quality of the produced video.
In an implementation of the present disclosure, synthesizing, based on the video synthesis operation, the video segments included in the target clip template with the multimedia segments included in the initial multimedia data to obtain the target multimedia data includes: loading, based on the video synthesis operation, the video segments included in the target clip template to set positions of the multimedia segments included in the initial multimedia data to obtain the target multimedia data, wherein the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
In the embodiments of the present disclosure, the target clip template includes a plurality of video segments and the adding position corresponding to each video segment.
In an implementation of the present disclosure, if the adding position corresponding to a video segment included in the target clip template is the opening position, the video segment is added before the first frame of media data of the initial multimedia data as the opening of the target video.
In an implementation of the present disclosure, if the adding position corresponding to a video segment included in the target clip template is the ending position, the video segment is added after the last frame of media data of the initial multimedia data as the ending of the target video.
In an implementation of the present disclosure, if the text data includes a text title, the text title is added at the position reserved for the text title in the video segment corresponding to the opening, and is edited and rendered on screen according to the text title display effect included in the target clip template. Further, if the text data includes a text author, the text author is added at the position reserved for the text author in the video segment corresponding to the opening, and the author information is edited and rendered on screen according to the text author display effect included in the target clip template.
In an implementation of the present disclosure, if the information of the video producer is obtained, the producer information is added at the position reserved for the producer in the video segment corresponding to the ending, and is edited and rendered on screen according to the video producer display effect included in the target clip template.
In the embodiments of the present disclosure, the operation of adding an opening and/or an ending is implemented through the video synthesis operation in the clip template, avoiding manual addition by the user, reducing the time cost of producing a video and improving the quality of the produced video.
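As a minimal sketch (not the disclosed implementation), the set-position loading described above, where template clips are placed before the first frame and/or after the last frame of the initial multimedia data, can be modeled over lists; the `"head"`/`"tail"` position labels are assumed names:

```python
def apply_video_synthesis(initial, template_clips):
    """Load each template clip at its set position: 'head' clips go before the
    first frame of the initial media data, 'tail' clips after the last frame."""
    heads = [c["clip"] for c in template_clips if c["position"] == "head"]
    tails = [c["clip"] for c in template_clips if c["position"] == "tail"]
    return heads + list(initial) + tails
```

Title and author text would then be rendered onto the head clip according to the display effects carried by the template.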
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a transition setting operation, and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain the target multimedia data includes: adding transition effects to the multimedia segments included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
In an implementation of the present disclosure, the initial multimedia data includes a plurality of video images matching the text data, and switching between the plurality of video images necessarily involves image transition settings. In the related art, the user needs to manually set the transition effect between two adjacent video images, which increases the time cost of video production.
In an implementation of the present disclosure, the transition effects include one or more of the following: a blinds animation effect, a cut-in animation effect, a flicker animation effect, a fade animation effect, a cross-dissolve animation effect, a zoom animation effect, and so on.
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a transition setting operation, and the transition setting operation includes a plurality of transition effect types. The plurality of transition effect types included in the transition setting operation are applied to the multimedia segments so that each multimedia segment has its own corresponding transition effect.
In an implementation of the present disclosure, if the transition setting operation includes one transition effect type, that transition effect type is applied to the multimedia segments so that the multimedia segments share the same transition effect.
In the embodiments of the present disclosure, transition effects are added to the multimedia segments through the transition setting operation in the clip template, avoiding manual setting by the user, reducing the time cost of producing a video and improving the quality of the produced video.
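An illustrative sketch of the transition assignment described above: with several transition types in the template they are cycled across segment boundaries, and with a single type every boundary receives the same effect. This is an assumed assignment policy for illustration, not the disclosed one:

```python
from itertools import cycle

def assign_transitions(segments, transition_types):
    """Attach one transition effect to each boundary between adjacent segments,
    cycling through the template's transition types."""
    effects = cycle(transition_types)
    return [(a, next(effects), b) for a, b in zip(segments, segments[1:])]
```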
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a virtual object adding operation, and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain the target multimedia data includes: adding the virtual object included in the target clip template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
In an implementation of the present disclosure, the virtual objects include various objects such as target video segments, virtual stickers, virtual items and virtual cards. Optionally, they may include facial decoration features, headwear features, clothing features, clothing accessory features, and so on.
In an implementation of the present disclosure, the virtual object saved in the target clip template may be added directly to a preset position of the initial multimedia data. The specific parameters of the preset position may be saved in the target clip template; optionally, the target clip template may specify, for example, that a sparkle-effect sticker be added on the third video image.
In an implementation of the present disclosure, the adding position of the virtual object may be determined according to keywords extracted from the text information. Optionally, the virtual object is added to the video image corresponding to a keyword.
In the embodiments of the present disclosure, virtual objects are added to the multimedia segments through the virtual object adding operation in the clip template, avoiding manual addition by the user, reducing the time cost of producing a video and improving the quality of the produced video.
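The preset-position placement described above can be sketched as follows; the `segment_index` field (e.g. index 2 for "the third video image") and the `objects` attachment list are illustrative assumptions, not the template's actual parameter names:

```python
def add_virtual_objects(segments, overlays):
    """Place each template overlay (sticker, card, ...) on the segment at its
    preset index, leaving the input segment list unmodified."""
    result = [dict(s, objects=list(s.get("objects", []))) for s in segments]
    for ov in overlays:
        idx = ov["segment_index"]
        if 0 <= idx < len(result):  # ignore positions outside the clip
            result[idx]["objects"].append(ov["name"])
    return result
```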
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a background audio adding operation, and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain the target multimedia data includes: mixing the background audio included in the target clip template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
In an implementation of the present disclosure, the target clip template includes one piece of background audio. Based on the background audio adding operation, the background audio and the reading voice are mixed according to the timestamp corresponding to the background audio and the timestamp corresponding to the reading voice, obtaining the target multimedia data.
In an implementation of the present disclosure, the playback parameters of the background audio are adjusted based on the playback parameters of the reading voice so that the two blend better.
In the embodiments of the present disclosure, background music is added to the multimedia segments through the background audio adding operation in the clip template, avoiding manual addition by the user, reducing the time cost of producing a video and improving the quality of the produced video.
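A minimal sketch of the mixing step above, treating both tracks as already time-aligned lists of float samples in [-1, 1]. The `bg_gain` ducking factor is an assumption standing in for the playback-parameter adjustment mentioned in the text:

```python
def mix_audio(speech, background, bg_gain=0.3):
    """Mix the reading voice with the template's background audio sample by
    sample, attenuating the background so the speech stays intelligible."""
    length = max(len(speech), len(background))
    mixed = []
    for i in range(length):
        s = speech[i] if i < len(speech) else 0.0
        b = background[i] if i < len(background) else 0.0
        mixed.append(max(-1.0, min(1.0, s + bg_gain * b)))  # clamp to [-1, 1]
    return mixed
```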
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a keyword extraction operation, and applying the editing operations indicated by the target clip template to the initial multimedia data includes: for at least one target text segment, extracting keywords from the target text segment; and adding the keywords to the target multimedia segment corresponding to the target text segment.
In an implementation of the present disclosure, a keyword may be a date, a number, a person's name, a proper noun, a place name, a plant, an animal, and so on.
In an implementation of the present disclosure, take the target text segment "张三于当日向李四支付现金20万元" (Zhang San paid Li Si 200,000 yuan in cash on that day) as an example: the keyword extracted from this target text segment is "20万元" (200,000 yuan), and this keyword is added to the target multimedia segment corresponding to the target text segment.
In an implementation of the present disclosure, the target clip template further includes keyword parameters, wherein the keyword parameters include the color, font, adding effect and the like of the keyword. The display information of the keyword in the target multimedia segment is set according to the keyword parameters.
In the embodiments of the present disclosure, keywords are added to the multimedia segments through the keyword extraction operation in the clip template, enabling the user to understand the key information of the text segment more clearly.
In an implementation of the present disclosure, adding the keyword to the target multimedia segment corresponding to the target text segment includes: obtaining key text information matching the keyword; and adding the keyword and the key text information to the target multimedia segment corresponding to the target text segment.
In the embodiments of the present disclosure, after the keyword is extracted from the target text segment, key information matching the keyword is obtained based on the keyword. For example, if the keyword is "王五" (Wang Wu), the matching key information is that Wang Wu is an actor whose representative works are "TV series A" and "Film B"; in this case "王五" is added to the target multimedia segment as the keyword, and "actor" and "representative works: TV series A, Film B" are added as the key text information. As another example, if the keyword is "职务侵占罪" (the crime of duty-related embezzlement), its matching key text information is the legal definition of that crime, namely that personnel of a company, enterprise or other unit take advantage of their position to illegally appropriate a relatively large amount of the unit's property; the keyword together with this definition is added to the target multimedia segment.
In an implementation of the present disclosure, different display parameters may be set for the keyword and the key text information.
In an implementation of the present disclosure, the key text information matching the keyword may be text information extracted from the text data, or text information obtained from the Internet or a preset knowledge base. The manner of obtaining the key text information is not further limited in this embodiment.
In the embodiments of the present disclosure, key text information is retrieved via the keyword, and the keyword and key text information are added to the video, so that the user can quickly learn knowledge related to the keyword, helping the user understand the content of the text data.
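For illustration only, the keyword extraction and key-text lookup described above can be sketched with regular-expression patterns and a dictionary standing in for the knowledge base; the pattern list and the `knowledge` mapping are hypothetical, not part of the disclosed template format:

```python
import re

def extract_keywords(fragment, patterns):
    """Pull keywords (dates, amounts, names, ...) from a text fragment using
    the patterns assumed to be carried by the clip template."""
    found = []
    for pat in patterns:
        found.extend(re.findall(pat, fragment))
    return found

def annotate_segment(segment, fragment, patterns, knowledge):
    """Attach each keyword, plus any matching key text from the knowledge
    base, to the multimedia segment corresponding to the fragment."""
    annotated = dict(segment)
    annotated["keywords"] = [
        {"keyword": kw, "info": knowledge.get(kw, "")}
        for kw in extract_keywords(fragment, patterns)
    ]
    return annotated
```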
S104: Generate a target video based on the target multimedia data.
Embodiments of the present disclosure provide a video generation method, apparatus, device, storage medium and program product. The method includes: generating initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment; obtaining a target clip template in response to a clip template acquisition request; applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and generating a target video based on the target multimedia data. By applying the editing operations of the obtained clip template directly to the multimedia data to generate a video, the embodiments of the present disclosure eliminate manual video editing by the user, which not only reduces the time cost of producing a video but also improves the quality of the produced video.
FIG. 6 is a schematic structural diagram of a video generation apparatus in an embodiment of the present disclosure. This embodiment is applicable to the case of generating a video from text information. The video generation apparatus may be implemented in software and/or hardware and may be configured in an electronic device.
As shown in FIG. 6, the video generation apparatus 60 provided by the embodiment of the present disclosure mainly includes: an initial multimedia data generation module 61, a target clip template acquisition module 62, a target multimedia data generation module 63 and a target video generation module 64.
The initial multimedia data generation module 61 is configured to generate initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data; the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment. The target clip template acquisition module 62 is configured to obtain a target clip template in response to a clip template acquisition request. The target multimedia data generation module 63 is configured to apply the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data. The target video generation module 64 is configured to generate a target video based on the target multimedia data.
In an implementation of the present disclosure, the video images include subtitle text matching the target text segment.
In an implementation of the present disclosure, the target clip template acquisition module 62, when obtaining the target clip template in response to the clip template acquisition request, includes: a target clip template determination unit configured to determine, in response to a triggering operation on a template theme control, the clip template corresponding to the triggering operation as the target clip template; and a target clip template acquisition unit configured to obtain the target clip template.
In an implementation of the present disclosure, the target clip template acquisition module 62 further includes: a video editing area display unit configured to display a video editing area before the triggering operation on the clip template control, wherein the video editing area includes a template control; and a mask layer area display unit configured to display a mask layer area in response to a triggering operation on the template control and to display at least one template theme control on the mask layer area.
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a video synthesis operation; the target multimedia data generation module 63 is specifically configured to synthesize, based on the video synthesis operation, the video segments included in the target clip template with the multimedia segments included in the initial multimedia data to obtain the target multimedia data.
In an implementation of the present disclosure, the target multimedia data generation module 63 is specifically configured to load, based on the video synthesis operation, the video segments included in the target clip template to set positions of the multimedia segments included in the initial multimedia data to obtain the target multimedia data, wherein the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a transition setting operation; the target multimedia data generation module 63 is specifically configured to add transition effects to the multimedia segments included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a virtual object adding operation; the target multimedia data generation module 63 is specifically configured to add the virtual object included in the target clip template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a background audio adding operation; the target multimedia data generation module 63 is specifically configured to mix the background audio included in the target clip template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
In an implementation of the present disclosure, the editing operations indicated by the target clip template include a keyword extraction operation; the target multimedia data generation module 63 is specifically configured to extract, for at least one target text segment, keywords from the target text segment, and to add the keywords to the target multimedia segment corresponding to the target text segment.
In an implementation of the present disclosure, the target multimedia data generation module 63 is specifically configured to obtain key text information matching the keyword, and to add the keyword and the key text information to the target multimedia segment corresponding to the target text segment.
The video generation apparatus provided by the embodiments of the present disclosure can perform the steps performed in the video generation method provided by the method embodiments of the present disclosure, with the same execution steps and beneficial effects, which are not repeated here.
FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure. Referring now specifically to FIG. 7, it shows a schematic structural diagram of an electronic device 700 suitable for implementing embodiments of the present disclosure. The electronic device 700 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g. in-vehicle navigation terminals) and wearable terminal devices, and fixed terminals such as digital TVs, desktop computers and smart home devices. The electronic device shown in FIG. 7 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device 700 may include a processing apparatus (e.g. a central processing unit, a graphics processing unit, etc.) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage apparatus 708 into a random access memory (RAM) 703, to implement the methods of the embodiments described in the present disclosure. Various programs and data required for the operation of the terminal device 700 are also stored in the RAM 703. The processing apparatus 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following apparatuses may be connected to the I/O interface 705: an input apparatus 706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output apparatus 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage apparatus 708 including, for example, a magnetic tape, a hard disk, etc.; and a communication apparatus 709. The communication apparatus 709 may allow the terminal device 700 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 7 shows the terminal device 700 with various apparatuses, it should be understood that it is not required to implement or have all the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart, thereby implementing the video generation method described above. In such embodiments, the computer program may be downloaded and installed from a network via the communication apparatus 709, or installed from the storage apparatus 708, or installed from the ROM 702. When the computer program is executed by the processing apparatus 701, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g. a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g. the Internet) and a peer-to-peer network (e.g. an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to: generate initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment; obtain a target clip template in response to a clip template acquisition request; apply the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and generate a target video based on the target multimedia data.
Optionally, when the above one or more programs are executed by the terminal device, the terminal device may also perform the other steps described in the above embodiments.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, including: generating initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment; obtaining a target clip template in response to a clip template acquisition request; applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and generating a target video based on the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein the video images include subtitle text matching the target text segment.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein obtaining the target clip template in response to the clip template acquisition request includes: in response to a triggering operation on a template theme control, determining the clip template corresponding to the triggering operation as the target clip template; and obtaining the target clip template.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, further including, before the triggering operation on the clip template control: displaying a video editing area, wherein the video editing area includes a template control; displaying a mask layer area in response to a triggering operation on the template control; and displaying at least one template theme control on the mask layer area.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein the editing operations indicated by the target clip template include a video synthesis operation; and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data includes: synthesizing, based on the video synthesis operation, the video segments included in the target clip template with the multimedia segments included in the initial multimedia data to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein synthesizing, based on the video synthesis operation, the video segments included in the target clip template with the multimedia segments included in the initial multimedia data to obtain the target multimedia data includes: loading, based on the video synthesis operation, the video segments included in the target clip template to set positions of the multimedia segments included in the initial multimedia data to obtain the target multimedia data, wherein the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein the editing operations indicated by the target clip template include a transition setting operation; and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data includes: adding transition effects to the multimedia segments included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein the editing operations indicated by the target clip template include a virtual object adding operation; and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data includes: adding the virtual object included in the target clip template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein the editing operations indicated by the target clip template include a background audio adding operation; and applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data includes: mixing the background audio included in the target clip template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein the editing operations indicated by the target clip template include a keyword extraction operation; and applying the editing operations indicated by the target clip template to the initial multimedia data includes: for at least one target text segment, extracting keywords from the target text segment; and adding the keywords to the target multimedia segment corresponding to the target text segment.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation method, wherein adding the keywords to the target multimedia segment corresponding to the target text segment includes: obtaining key text information matching the keywords; and adding the keywords and the key text information to the target multimedia segment corresponding to the target text segment.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, including: an initial multimedia data generation module configured to generate initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and video images matching the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes video images matching the target text segment, and the target voice segment includes a reading voice matching the target text segment; a target clip template acquisition module configured to obtain a target clip template in response to a clip template acquisition request;
a target multimedia data generation module configured to apply the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and a target video generation module configured to generate a target video based on the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the video images include subtitle text matching the target text segment.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the target clip template acquisition module, when obtaining the target clip template in response to the clip template acquisition request, includes: a target clip template determination unit configured to determine, in response to a triggering operation on a template theme control, the clip template corresponding to the triggering operation as the target clip template; and a target clip template acquisition unit configured to obtain the target clip template.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the target clip template acquisition module further includes: a video editing area display unit configured to display a video editing area before the triggering operation on the clip template control, wherein the video editing area includes a template control; and a mask layer area display unit configured to display a mask layer area in response to a triggering operation on the template control and to display at least one template theme control on the mask layer area.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a video synthesis operation; and the target multimedia data generation module is specifically configured to synthesize, based on the video synthesis operation, the video segments included in the target clip template with the multimedia segments included in the initial multimedia data to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the target multimedia data generation module is specifically configured to load, based on the video synthesis operation, the video segments included in the target clip template to set positions of the multimedia segments included in the initial multimedia data to obtain the target multimedia data, wherein the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a transition setting operation; and the target multimedia data generation module is specifically configured to add transition effects to the multimedia segments included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a virtual object adding operation; and the target multimedia data generation module is specifically configured to add the virtual object included in the target clip template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a background audio adding operation; and the target multimedia data generation module is specifically configured to mix the background audio included in the target clip template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the editing operations indicated by the target clip template include a keyword extraction operation; and the target multimedia data generation module is specifically configured to extract, for at least one target text segment, keywords from the target text segment, and to add the keywords to the target multimedia segment corresponding to the target text segment.
According to one or more embodiments of the present disclosure, the present disclosure provides a video generation apparatus, wherein the target multimedia data generation module is specifically configured to obtain key text information matching the keywords, and to add the keywords and the key text information to the target multimedia segment corresponding to the target text segment.
According to one or more embodiments of the present disclosure, the present disclosure provides an electronic device, including:
one or more processors; and
a memory configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the video generation methods provided by the present disclosure.
According to one or more embodiments of the present disclosure, the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any of the video generation methods provided by the present disclosure.
Embodiments of the present disclosure also provide a computer program product including a computer program or instructions which, when executed by a processor, implement the video generation method described above.
The above description is only a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (15)

  1. A video generation method, comprising:
    generating initial multimedia data based on received text data, wherein the initial multimedia data comprises a reading voice of the text data and video images matching the text data, the initial multimedia data comprises at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment comprises a target video segment and a target voice segment, the target video segment comprises video images matching the target text segment, and the target voice segment comprises a reading voice matching the target text segment;
    obtaining a target clip template in response to a clip template acquisition request;
    applying editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and
    generating a target video based on the target multimedia data.
  2. The method according to claim 1, wherein the video images comprise subtitle text matching the target text segment.
  3. The method according to claim 1, wherein obtaining a target clip template in response to a clip template acquisition request comprises:
    in response to a triggering operation on a template theme control, determining the clip template corresponding to the triggering operation as the target clip template; and
    obtaining the target clip template.
  4. The method according to claim 3, further comprising, before the triggering operation on the clip template control:
    displaying a video editing area, wherein the video editing area comprises a template control;
    displaying a mask layer area in response to a triggering operation on the template control; and
    displaying at least one template theme control on the mask layer area.
  5. The method according to claim 1, wherein the editing operations indicated by the target clip template comprise a video synthesis operation; and
    applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data comprises:
    synthesizing, based on the video synthesis operation, video segments comprised in the target clip template with multimedia segments comprised in the initial multimedia data to obtain the target multimedia data.
  6. The method according to claim 5, wherein synthesizing, based on the video synthesis operation, the video segments comprised in the target clip template with the multimedia segments comprised in the initial multimedia data to obtain the target multimedia data comprises:
    loading, based on the video synthesis operation, the video segments comprised in the target clip template to a set position of the multimedia segments comprised in the initial multimedia data to obtain the target multimedia data, wherein the set position comprises: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
  7. The method according to claim 1, wherein the editing operations indicated by the target clip template comprise a transition setting operation; and
    applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data comprises:
    adding transition effects to the multimedia segments comprised in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
  8. The method according to claim 1, wherein the editing operations indicated by the target clip template comprise a virtual object adding operation; and
    applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data comprises:
    adding the virtual object comprised in the target clip template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
  9. The method according to claim 1, wherein the editing operations indicated by the target clip template comprise a background audio adding operation; and
    applying the editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data comprises:
    mixing the background audio comprised in the target clip template with the reading voice comprised in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
  10. The method according to claim 1, wherein the editing operations indicated by the target clip template comprise a keyword extraction operation; and
    applying the editing operations indicated by the target clip template to the initial multimedia data comprises:
    for at least one target text segment, extracting keywords from the target text segment; and
    adding the keywords to the target multimedia segment corresponding to the target text segment.
  11. The method according to claim 10, wherein adding the keywords to the target multimedia segment corresponding to the target text segment comprises:
    obtaining key text information matching the keywords; and
    adding the keywords and the key text information to the target multimedia segment corresponding to the target text segment.
  12. A video generation apparatus, comprising:
    an initial multimedia data generation module configured to generate initial multimedia data based on received text data, wherein the initial multimedia data comprises a reading voice of the text data and video images matching the text data, the initial multimedia data comprises at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment among the at least one multimedia segment corresponds to a target text segment among the at least one text segment, the target multimedia segment comprises a target video segment and a target voice segment, the target video segment comprises video images matching the target text segment, and the target voice segment comprises a reading voice matching the target text segment;
    a target clip template acquisition module configured to obtain a target clip template in response to a clip template acquisition request;
    a target multimedia data generation module configured to apply editing operations indicated by the target clip template to the initial multimedia data to obtain target multimedia data; and
    a target video generation module configured to generate a target video based on the target multimedia data.
  13. An electronic device, comprising:
    one or more processors; and
    a storage apparatus configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-11.
  14. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-11.
  15. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the method according to any one of claims 1-11.
PCT/CN2023/093089 2022-05-10 2023-05-09 视频生成方法、装置、设备、存储介质和程序产品 WO2023217155A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP23802924.3A EP4344230A4 (en) 2022-05-10 2023-05-09 VIDEO GENERATING METHOD, APPARATUS AND DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT
JP2023578709A JP2024528440A (ja) 2022-05-10 2023-05-09 ビデオ生成方法、装置、デバイス、記憶媒体およびプログラム製品
US18/573,097 US20240296871A1 (en) 2022-05-10 2023-05-09 Method, apparatus, device, storage medium and program product for video generation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210508063.2 2022-05-10
CN202210508063.2A CN117082292A (zh) 2022-05-10 2022-05-10 视频生成方法、装置、设备、存储介质和程序产品

Publications (1)

Publication Number Publication Date
WO2023217155A1 true WO2023217155A1 (zh) 2023-11-16

Family

ID=88701054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/093089 WO2023217155A1 (zh) 2022-05-10 2023-05-09 视频生成方法、装置、设备、存储介质和程序产品

Country Status (5)

Country Link
US (1) US20240296871A1 (zh)
EP (1) EP4344230A4 (zh)
JP (1) JP2024528440A (zh)
CN (1) CN117082292A (zh)
WO (1) WO2023217155A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025113021A1 (zh) * 2023-11-30 2025-06-05 北京字跳网络技术有限公司 一种文本处理方法、系统、装置、设备及存储介质

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN118828105A (zh) * 2023-04-19 2024-10-22 北京字跳网络技术有限公司 视频生成方法、装置、设备、存储介质和程序产品
JP2025518428A (ja) 2023-04-19 2025-06-17 北京字跳▲網▼絡技▲術▼有限公司 動画生成方法、装置、機器、記憶媒体及びプログラム製品

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109756751A (zh) * 2017-11-07 2019-05-14 腾讯科技(深圳)有限公司 Multimedia data processing method and apparatus, electronic device, and storage medium
CN111243632A (zh) * 2020-01-02 2020-06-05 北京达佳互联信息技术有限公司 Multimedia resource generation method, apparatus, device, and storage medium
CN111460183A (zh) * 2020-03-30 2020-07-28 北京金堤科技有限公司 Multimedia file generation method and apparatus, storage medium, and electronic device
CN112738623A (zh) * 2019-10-14 2021-04-30 北京字节跳动网络技术有限公司 Video file generation method, apparatus, terminal, and storage medium
CN113452941A (zh) * 2021-05-14 2021-09-28 北京达佳互联信息技术有限公司 Video generation method, apparatus, electronic device, and storage medium
CN113473182A (zh) * 2021-09-06 2021-10-01 腾讯科技(深圳)有限公司 Video generation method and apparatus, computer device, and storage medium
CN114339399A (zh) * 2021-12-27 2022-04-12 咪咕文化科技有限公司 Multimedia file editing method, apparatus, and computing device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180143741A1 (en) * 2016-11-23 2018-05-24 FlyrTV, Inc. Intelligent graphical feature generation for user content
JP6887132B2 (ja) * 2018-04-12 2021-06-16 パナソニックIpマネジメント株式会社 Video processing apparatus, video processing system, and video processing method
CN112449231B (zh) * 2019-08-30 2023-02-03 腾讯科技(深圳)有限公司 Multimedia file material processing method and apparatus, electronic device, and storage medium
CN111246300B (zh) * 2020-01-02 2022-04-22 北京达佳互联信息技术有限公司 Editing template generation method, apparatus, device, and storage medium
CN111935491B (zh) * 2020-06-28 2023-04-07 百度在线网络技术(北京)有限公司 Live-streaming special effect processing method, apparatus, and server
US11626139B2 (en) * 2020-10-28 2023-04-11 Meta Platforms Technologies, Llc Text-driven editor for audio and video editing
CN112579826A (zh) * 2020-12-07 2021-03-30 北京字节跳动网络技术有限公司 Video display and processing method, apparatus, system, device, and medium
US12154598B1 (en) * 2022-03-29 2024-11-26 United Services Automobile Association (Usaa) System and method for generating synthetic video segments during video editing


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4344230A4


Also Published As

Publication number Publication date
US20240296871A1 (en) 2024-09-05
CN117082292A (zh) 2023-11-17
EP4344230A1 (en) 2024-03-27
EP4344230A4 (en) 2024-10-30
JP2024528440A (ja) 2024-07-30

Similar Documents

Publication Publication Date Title
WO2021196903A1 (zh) Video processing method and apparatus, readable medium, and electronic device
US20240107127A1 (en) Video display method and apparatus, video processing method, apparatus, and system, device, and medium
KR102792043B1 (ko) Video generation apparatus and method, electronic device, and computer-readable medium
WO2023217155A1 (zh) Video generation method, apparatus, device, storage medium and program product
JP6971292B2 (ja) Method, apparatus, server, computer-readable storage medium, and computer program for aligning paragraphs with video
CN113365134A (zh) Audio sharing method, apparatus, device, and medium
WO2021057740A1 (zh) Video generation method and apparatus, electronic device, and computer-readable medium
US20240168605A1 (en) Text input method and apparatus, and electronic device and storage medium
CN112287168A (zh) Method and apparatus for generating video
WO2024037491A1 (zh) Media content processing method, apparatus, device, and storage medium
WO2023165515A1 (zh) Photographing method and apparatus, electronic device, and storage medium
WO2023169356A1 (zh) Image processing method, apparatus, device, and storage medium
CN117793478A (zh) Explanation information generation method, apparatus, device, medium, and program product
WO2024008184A1 (zh) Information display method and apparatus, electronic device, and computer-readable medium
US20250203153A1 (en) Video generation method and apparatus, and device, storage medium and program product
CN117596452A (zh) Video generation method, apparatus, medium, and electronic device
CN113891168B (zh) Subtitle processing method and apparatus, electronic device, and storage medium
JP2024521940A (ja) Multimedia processing method, apparatus, device, and medium
JP7684446B2 (ja) Video generation method, apparatus, device, storage medium, and program product
CN114520928B (zh) Display information generation method, information display method, apparatus, and electronic device
CN114697756A (zh) Display method and apparatus, terminal device, and medium
CN111385638B (zh) Video processing method and apparatus
CN112287173A (zh) Method and apparatus for generating information
US12148451B2 (en) Method, apparatus, device, storage medium and program product for video generating
JP7708980B2 (ja) Template selection method, apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2023578709

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2023802924

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 18573097

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802924

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023802924

Country of ref document: EP

Effective date: 20231220

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112024021953

Country of ref document: BR

WWE Wipo information: entry into national phase

Ref document number: 11202407421W

Country of ref document: SG

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 112024021953

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20241023