WO2023217155A1 - Video generation method, apparatus, device, storage medium and program product
- Publication number: WO2023217155A1 (application PCT/CN2023/093089)
- Authority: WIPO (PCT)
- Prior art keywords: target, multimedia data, template, video, segment
Classifications
- H04N21/845 — Structuring of content, e.g. decomposing content into time segments
- H04N21/44 — Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- G11B27/031 — Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036 — Insert-editing
- G06F40/186 — Templates (handling natural language data; text editing)
- G06T11/60 — Editing figures and text; Combining figures or text
- G11B27/10 — Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34 — Indicating arrangements
- H04N21/4312 — Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/435 — Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H04N21/439 — Processing of audio elementary streams
- H04N21/44016 — Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
- H04N21/4402 — Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/47205 — End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
- H04N21/4884 — Data services, e.g. news ticker, for displaying subtitles
Definitions
- the present disclosure relates to the field of video processing technology, and in particular, to a video generation method, device, equipment, storage medium and program product.
- embodiments of the present disclosure provide a video generation method, device, equipment, storage medium and program product.
- the editing operation in the obtained editing template is directly applied to the multimedia data to generate the video, without requiring the user to manually edit the video; this not only reduces the time cost of making videos but also improves the quality of the produced videos.
- an embodiment of the present disclosure provides a video generation method, including:
- Initial multimedia data is generated based on the received text data; wherein the initial multimedia data includes a read-aloud voice of the text data and a video image that matches the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes a video image that matches the target text segment, and the target voice segment includes a read-aloud voice that matches the target text segment;
- an embodiment of the present disclosure provides a video generation device, including:
- An initial multimedia data generation module, used to generate initial multimedia data based on the received text data; wherein the initial multimedia data includes a read-aloud voice of the text data and a video image that matches the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes a video image that matches the target text segment, and the target voice segment includes a read-aloud voice that matches the target text segment;
- the target editing template acquisition module is used to obtain the target editing template in response to the editing template acquisition request;
- the target multimedia data generation module is used to apply the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data;
- the target video generation module is used to generate the target video based on the target multimedia data.
- an embodiment of the present disclosure provides an electronic device.
- the electronic device includes:
- one or more processors;
- a storage device for storing one or more programs
- when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the video generation method in any one of the above-mentioned first aspects.
- embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored.
- the program is executed by a processor, the video generation method as described in any one of the above-mentioned first aspects is implemented.
- embodiments of the present disclosure provide a computer program product.
- the computer program product includes a computer program or instructions; when the computer program or instructions are executed by a processor, the video generation method as described in any one of the above first aspects is implemented.
- Embodiments of the present disclosure provide a video generation method, device, equipment, storage medium and program product.
- the method includes: generating initial multimedia data based on received text data; acquiring a target editing template in response to an editing template acquisition request; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data; and generating the target video based on the target multimedia data.
- Embodiments of the present disclosure generate videos by directly applying the editing operations in the obtained editing templates to multimedia data, eliminating the need for users to manually edit videos. This not only reduces the time cost of making videos, but also improves the quality of the produced videos.
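The claimed four-step method can be sketched as a small pipeline. Everything below is an illustrative mock, not the patented implementation: the function names, the `MultimediaData` model, and the template operation names are assumptions made purely to show the flow.

```python
from dataclasses import dataclass

@dataclass
class MultimediaData:
    segments: list  # one entry per divided text segment

def generate_initial_multimedia(text: str) -> MultimediaData:
    # Step 1: divide the text and pair each piece with matching
    # video images and read-aloud speech (details omitted here).
    pieces = [p.strip() for p in text.split(".") if p.strip()]
    return MultimediaData(segments=pieces)

def acquire_target_template(request: dict) -> dict:
    # Step 2: resolve the editing-template acquisition request.
    return {"id": request.get("template_id"), "ops": ["add_intro", "add_outro"]}

def apply_template(template: dict, data: MultimediaData) -> MultimediaData:
    # Step 3: apply every editing operation the template indicates.
    segments = list(data.segments)
    if "add_intro" in template["ops"]:
        segments.insert(0, "<intro clip>")
    if "add_outro" in template["ops"]:
        segments.append("<outro clip>")
    return MultimediaData(segments=segments)

def render_video(data: MultimediaData) -> str:
    # Step 4: synthesize the target video from the edited segments.
    return " | ".join(data.segments)

initial = generate_initial_multimedia("Scene one. Scene two.")
edited = apply_template(acquire_target_template({"template_id": "t1"}), initial)
video = render_video(edited)
```

Because the template's operations are applied programmatically, no manual editing step appears anywhere in the pipeline, which is the time-cost claim above.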
- Figure 1 shows an architecture diagram of a video production scenario provided by an embodiment of the present disclosure
- Figure 2 is a schematic flowchart of a video generation method in an embodiment of the present disclosure
- Figure 3 is a schematic diagram of triggering the template theme control in an embodiment of the present disclosure
- Figure 4 is a schematic diagram of triggering a template control in an embodiment of the present disclosure
- Figure 5 is a schematic diagram of a template application prompt in an embodiment of the present disclosure
- Figure 6 is a schematic structural diagram of a video generation device in an embodiment of the present disclosure.
- FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
- the term “include” and its variations are open-ended, i.e., “including but not limited to.”
- the term “based on” means “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- keywords need to be extracted from the text data; for each keyword, video images matching the keyword are searched in a preset image library; the text information and the video images are synthesized according to typesetting rules to obtain the target video.
- in the related technology, the found video pictures and the text data are only simply synthesized, so the quality of the produced video is not high and the user needs to manually edit it later; if the user lacks editing experience, this affects the quality of the video.
- FIG. 1 shows an architectural diagram of a video production scenario provided by an embodiment of the present disclosure.
- the architecture diagram may include at least one electronic device 101 on the client side and at least one server 102 on the server side.
- the electronic device 101 can establish a connection with the server 102 and exchange information through a network protocol such as Hyper Text Transfer Protocol over Secure Socket Layer (HTTPS).
- the electronic device 101 may include mobile phones, tablet computers, desktop computers, notebook computers, vehicle-mounted terminals, wearable devices, all-in-one machines, smart home devices and other devices with communication functions, and may also include devices simulated by virtual machines or simulators.
- the server 102 may include a cloud server or a server cluster and other devices with storage and computing functions.
- the user can create videos in a designated platform on the electronic device 101, and the designated platform can be a designated application or a designated website.
- the user can send the video to the server 102 of the designated platform.
- the server 102 can receive the video sent by the electronic device 101 and store the received video to send the video to the electronic device that needs to play the video.
- the electronic device 101 can receive the user's editing template acquisition request for the initial multimedia data.
- the target editing template can be obtained, the editing operation indicated by the target editing template is applied to the initial multimedia data to obtain the target multimedia data, and the target video is generated based on the target multimedia data. It can be seen that during the generation of the target video, the editing operations in the obtained target editing template are directly applied to the initial multimedia data without the user manually editing the video, which not only reduces the time cost of making videos but also improves the quality of the produced videos.
- the electronic device 101 can also obtain the target clipping template after receiving the clipping template acquisition request, apply the clipping operation indicated by the target clipping template to the initial multimedia data to obtain the target multimedia data, and generate the target video based on the target multimedia data, so that the clipping operation indicated by the target clipping template is applied to the initial multimedia data locally on the electronic device 101 to generate the target video, further reducing the time cost of video production.
- the electronic device 101 may also send a clipping template acquisition request carrying a template identifier to the server 102 after receiving the clipping template acquisition request.
- the server 102 may respond to the clipping template acquisition request, obtain the target clipping template, apply the clipping operation indicated by the target clipping template to the initial multimedia data to obtain the target multimedia data, generate the target video based on the target multimedia data, and send the generated target video to the electronic device 101, so that the electronic device 101 can request the server 102 to obtain the target clipping template based on the clipping template acquisition request and have the clipping operation indicated by the obtained target clipping template applied to the initial multimedia data to generate the target video, further improving the quality of the produced video and reducing the data processing load of the electronic device 101.
- the electronic device may be a mobile terminal, a fixed terminal or a portable terminal, such as a mobile phone, a station, a unit, a device, a multimedia computer, a multimedia tablet, an Internet node, a communicator, a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a personal communications system (PCS) device, a personal navigation device, a personal digital assistant (PDA), an audio/video player, a digital camera/camcorder, a positioning device, a television receiver, a radio broadcast receiver, an e-book device, a gaming device, or any combination thereof, including accessories and peripherals for such devices or any combination thereof.
- the server can be a physical server or a cloud server.
- the server can be a server or a server cluster.
- FIG. 2 is a flow chart of a video generation method in an embodiment of the present disclosure. This embodiment can be applied to the situation of generating a video based on text information.
- the method can be executed by a video generation device, which can be implemented in software and/or hardware, and the video generation method can be implemented in the electronic device described in Figure 1.
- the video generation method provided by the embodiment of the present disclosure mainly includes steps S101-S104.
- the text data may be data input to the electronic device by the user through an input device, or may be data sent to the electronic device by other devices.
- before generating the initial multimedia data based on the received text data, the method further includes: receiving the text data in response to a user's input data operation.
- the user's input data operation may include an adding operation to text data, or may include an input operation to text data, which is not specifically limited in this embodiment.
- the initial multimedia data includes a read-aloud voice of the text data and a video image that matches the text data
- the initial multimedia data includes at least one multimedia segment
- the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided.
- the target multimedia segment in at least one multimedia segment corresponds to the target text segment in at least one text segment
- the target multimedia segment includes a target video segment and a target voice segment
- the target video segment includes a video image that matches the target text segment
- the target voice segment includes a read-aloud voice that matches the target text segment.
- generating initial multimedia data based on the received text data includes: dividing the received text data into at least one text segment, where the at least one text segment includes a plurality of target text segments. For each target text segment, a video image corresponding to the target text segment is searched in the preset gallery based on the target text segment, and the video image is processed according to a preset animation effect to obtain the target video segment corresponding to the target text segment. The read-aloud speech matching the target text segment is obtained to generate the target voice segment. The target video segment and the target voice segment are synthesized to obtain the target multimedia segment. A plurality of target multimedia segments are thus obtained for the target text segments, and the plurality of target multimedia segments are synthesized in the order of the target text segments to obtain the initial multimedia data.
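The per-segment pipeline above can be sketched as follows; `lookup_image` and `synthesize_speech` are hypothetical stand-ins for the preset-gallery search and the read-aloud speech generation, which the patent does not specify:

```python
from dataclasses import dataclass

@dataclass
class MultimediaSegment:
    text: str   # the target text segment
    video: str  # target video segment (image matched from the gallery)
    voice: str  # target voice segment (read-aloud speech)

def lookup_image(segment: str) -> str:
    # Hypothetical preset-gallery search returning an image id.
    return f"img:{segment}"

def synthesize_speech(segment: str) -> str:
    # Hypothetical text-to-speech step for the read-aloud voice.
    return f"tts:{segment}"

def build_initial_data(text: str) -> list:
    # Divide the text, build one multimedia segment per target text
    # segment, and keep the segments in the order of the text.
    pieces = [p.strip() for p in text.split(".") if p.strip()]
    return [
        MultimediaSegment(text=p, video=lookup_image(p), voice=synthesize_speech(p))
        for p in pieces
    ]

segments = build_initial_data("A cat sleeps. A dog runs.")
```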
- subtitle text matching the target text segment is included on the video image.
- subtitle text matching the target text segment is added to the video image, so that the user can intuitively see the subtitles corresponding to the read speech while watching the video, thereby improving the user's viewing experience.
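A minimal sketch of attaching the matching subtitle to a video frame; the dict frame model is an assumption made purely for illustration:

```python
def add_subtitle(frame: dict, subtitle: str) -> dict:
    # Overlay the subtitle text that matches the target text segment
    # onto the video image, without mutating the original frame.
    out = dict(frame)
    out["subtitle"] = subtitle
    return out

frame = add_subtitle({"image": "img:cat"}, "A cat sleeps")
```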
- responding to the clipping template acquisition request may mean responding after receiving a user's operation on the electronic device, or responding after detecting that the initial multimedia data has been generated.
- the target clipping template can be a template selected based on the user's operation of the electronic device, or a clipping template automatically matched based on keywords in the text data.
- obtaining the target clipping template includes: the electronic device directly obtains the target clipping template from a locally pre-stored template database.
- obtaining the target clipping template includes: the electronic device obtains a template identifier corresponding to the target clipping template and sends a clipping template acquisition request carrying the template identifier to the server; the server responds to the clipping template acquisition request carrying the template identifier, obtains the target clipping template based on the template identifier, and returns the obtained target clipping template to the electronic device.
- if the acquisition fails, a prompt popup box is displayed in the display interface of the electronic device, and the prompt popup box is used to indicate that the target clipping template failed to be obtained.
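The two acquisition paths plus the failure prompt can be sketched together; the dicts below merely stand in for the locally pre-stored template database and the server, and the function names are assumptions:

```python
def get_target_template(template_id: str, local_db: dict, fetch_from_server):
    # Prefer the locally pre-stored template database; otherwise send
    # the identifier-carrying request to the server. Returning None
    # means the caller should display the failure prompt popup.
    if template_id in local_db:
        return local_db[template_id]
    return fetch_from_server(template_id)

local = {"t1": {"ops": ["add_intro"]}}
remote = {"t2": {"ops": ["add_outro"]}}
fetch = lambda tid: remote.get(tid)  # stands in for the HTTPS request

found_local = get_target_template("t1", local, fetch)
found_remote = get_target_template("t2", local, fetch)
missing = get_target_template("t3", local, fetch)  # None -> show prompt
```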
- obtaining the target clipping template in response to a clipping template acquisition request includes: in response to a trigger operation on the template theme control, determining the clipping template corresponding to the trigger operation as the target clipping template, and obtaining the target clipping template.
- At least one template theme control is displayed on the interactive interface of the electronic device, and in response to the user's triggering operation on the template theme control, the clipping template corresponding to the triggering operation is determined as the target clipping template.
- the clipping template corresponding to the template theme 1 control is determined as the target clipping template.
- the target editing template is selected through the user's triggering operation, which facilitates the user to select a satisfactory editing template and improves the user experience.
- before responding to the triggering operation on the clipping template control, the method further includes: displaying a video editing area, wherein the video editing area includes the template control; in response to the triggering operation on the template control, displaying a mask layer area; and displaying at least one template theme control on the mask layer area.
- the video preview area 10 and the video editing area 20 are displayed in the display interface of the electronic device, and the video editing area 20 includes a plurality of editing controls, for example: a template control, a screen control, a text control, a reading-voice control and a music control.
- the template control is used to indicate that the user can use the existing template to edit the initial multimedia data.
- the screen control is used to instruct the user to edit the video image in the initial multimedia data.
- the text control is used to instruct the user to edit the subtitle text in the initial multimedia data.
- the reading voice control is used to instruct the user to edit the reading voice in the initial multimedia data.
- the music control is used to instruct the user to edit the background music in the initial multimedia data.
- a masked layer area is displayed, and multiple clip template theme controls are displayed in the masked layer area.
- multiple clip template theme controls are displayed with a left and right sliding effect.
- multiple template theme controls are displayed in response to the user's triggering operation on the template control, making the operation simple and easy to understand and convenient for the user to operate.
- the target clipping template includes at least one clipping operation, and applying the clipping operation to the initial multimedia data performs that clipping operation on the initial multimedia data.
- in the process of applying the clipping operation indicated by the target clipping template to the initial multimedia data, since clipping the initial multimedia data requires a certain amount of time, the electronic device displays an application prompt box in its display interface, and the application prompt box is used to prompt the user that the initial multimedia video is being edited using the editing operation indicated in the editing template.
- if the clipping operation indicated by the target clipping template is successfully applied to the initial multimedia data, a prompt message indicating that the clipping template was applied successfully is displayed; if the application is unsuccessful, a prompt message indicating that the clipping template application failed is displayed, and the user is prompted to reselect the clipping template.
- the editing operation indicated by the target editing template includes a video synthesis operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: based on the video synthesis operation, synthesizing the video clips included in the target editing template with the multimedia clips included in the initial multimedia data to obtain the target multimedia data.
- the target clip template includes one or more video clips.
- when the editing operation indicated by the target editing template includes a video synthesis operation, one or more video clips included in the target editing template are synthesized with the multimedia clips included in the initial multimedia data to obtain the target multimedia data.
- the video clip included in the target clip template is added between any two video frames of the multimedia clip.
- the above video synthesis operation can be any existing video synthesis method, and is not specifically limited in this embodiment.
- the synthesis of multiple videos is realized through the video synthesis operation in the editing template, which avoids the user's manual synthesis of videos, reduces the time cost of making videos, and improves the quality of the produced videos.
- synthesizing the video clips included in the target clipping template and the multimedia clips included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data includes: based on the video synthesis operation, loading the video clips included in the target clipping template
- to the set positions of the multimedia clips included in the initial multimedia data to obtain the target multimedia data, where the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
- the target editing template includes multiple video clips and the corresponding adding positions of each video clip.
- the video clip is added before the first frame of media data of the initial multimedia data as the target video title.
- the video clip is added after the last frame of media data of the initial multimedia data as the ending of the target video.
- the text theme is added at the text theme position in the video clip corresponding to the title, and the text theme is edited and rendered on screen according to the text theme display effect included in the target clip template. Further, if the text data includes a text author, the text author is added at the text author position in the video clip corresponding to the title, and the text author information is edited and rendered on screen according to the text author display effect included in the target editing template.
- the video producer's information is obtained, added at the producer position in the video clip corresponding to the ending, and edited and rendered on screen according to the video producer display effect included in the target editing template.
- the operation of adding the title and/or the end is realized, which avoids the user from manually adding the beginning or the end, reduces the time cost of making the video, and improves the quality of the produced video.
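The title/ending placement rule described above can be sketched as follows. This is an illustrative model, not the patent's implementation: clips are represented as lists of frame labels, and the function names are assumptions chosen for clarity.

```python
# Minimal sketch of the "set position" loading rule: template clips are
# placed before the first frame (title) and/or after the last frame
# (ending) of the initial multimedia data.

def apply_synthesis(initial_clip, title_clip=None, ending_clip=None):
    """Concatenate optional title/ending clips around the initial clip.

    Clips are modeled as lists of frame labels; the position semantics
    follow the "before the first frame / after the last frame" rule.
    """
    result = []
    if title_clip:                # loaded before the first frame -> title
        result.extend(title_clip)
    result.extend(initial_clip)
    if ending_clip:               # loaded after the last frame -> ending
        result.extend(ending_clip)
    return result

# Usage: a two-frame body with a one-frame title and ending.
target = apply_synthesis(["f1", "f2"], title_clip=["title"], ending_clip=["end"])
# -> ["title", "f1", "f2", "end"]
```

In a real pipeline the frame labels would be replaced by decoded frames or clip descriptors, but the ordering logic is the same.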
- the editing operation indicated by the target editing template includes: a transition setting operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: based on the transition setting operation The operation is to add a transition effect to the multimedia clips included in the initial multimedia data to obtain target multimedia data.
- the initial multimedia data includes multiple video images that match the text data.
- the process of switching multiple video images inevitably involves image transition settings.
- users need to manually set the transition effect between two adjacent video images, which increases the time cost of video production.
- the transition effect includes one or more of the following: blind animation effect, cut-in animation effect, flash animation effect, gradient animation effect, cross dissolve animation effect, zoom animation effect, etc.
- the editing operation indicated by the target editing template includes: a transition setting operation, and the transition setting operation includes multiple transition effect types. Multiple transition effect types included in the transition setting operation are applied to the multimedia clips, so that each multimedia clip has its own corresponding transition effect.
- the transition effect type is applied to the multimedia clips, so that the multimedia clips have the same transition effect.
- transition effects are added to multimedia clips through the transition setting operation in the editing template, which avoids users manually setting transition effects, reduces the time cost of making videos, and improves the quality of the produced videos.
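The transition-assignment behavior above (each clip boundary gets its own effect when the template carries multiple types, and a uniform effect when it carries one) can be sketched as follows. This is an illustrative model; the function and effect names are assumptions.

```python
from itertools import cycle

# Assign a transition effect type from the template to each boundary
# between adjacent multimedia clips, cycling through the template's
# types when there are more boundaries than types. A single-type
# template therefore yields the same transition everywhere.

def assign_transitions(clips, effect_types):
    """Return (clip_a, clip_b, effect) triples for each adjacent pair."""
    effects = cycle(effect_types)
    return [(a, b, next(effects)) for a, b in zip(clips, clips[1:])]

# Two boundaries, two types: each boundary gets its own effect.
plan = assign_transitions(["c1", "c2", "c3"], ["fade", "zoom"])
# -> [("c1", "c2", "fade"), ("c2", "c3", "zoom")]
```

An actual renderer would then blend the tail of `clip_a` with the head of `clip_b` according to the named effect (blind, cut-in, flash, gradient, cross dissolve, zoom, etc.).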
- the clipping operation indicated by the target clipping template includes: a virtual object adding operation; applying the clipping operation indicated by the target clipping template to the initial multimedia data to obtain the target multimedia data includes: based on the virtual object adding operation, adding the virtual object included in the target clipping template to a preset position of the initial multimedia data to obtain the target multimedia data.
- virtual objects include: target video clips, virtual stickers, virtual objects, virtual cards and other objects.
- Optionally, the virtual stickers can include: facial decoration features, headwear features, clothing features, clothing accessory features, etc.
- the virtual object saved in the target clipping template may be directly added to a preset position of the initial multimedia data.
- Specific parameters of the preset positions can be saved in the target clip template; optionally, a glitter effect sticker is added to the third video image according to the settings saved in the target clip template.
- the location where the virtual object is added can be determined based on keywords extracted from the text information.
- virtual objects are added to multimedia clips through the virtual object addition operation in the editing template, which avoids users from manually adding virtual objects, reduces the time cost of making videos, and improves the quality of the produced videos.
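A hypothetical sketch of the virtual object adding operation: the template stores a virtual object together with its preset position (here modeled as a target frame index), and the operation attaches the object to that frame's overlay list. The data model and names are assumptions, not the patent's API.

```python
# Attach a virtual object (e.g. a sticker) to the frame at the preset
# position saved in the clip template, without mutating the input frames.

def add_virtual_object(frames, obj, frame_index):
    """Return a copy of frames with obj appended to the overlay list
    of the frame at frame_index."""
    out = [dict(f) for f in frames]  # shallow copy of frame metadata
    out[frame_index].setdefault("overlays", []).append(obj)
    return out

frames = [{"id": 1}, {"id": 2}, {"id": 3}]
# e.g. a glitter sticker on the third video image, per the saved settings
result = add_virtual_object(frames, "glitter_sticker", 2)
```

The same pattern extends to keyword-driven placement: the frame index would be computed from where the keyword's text segment falls in the timeline instead of being read from the template.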
- the editing operation indicated by the target editing template includes: a background audio adding operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: adding based on the background audio The operation mixes the background audio included in the target clip template and the spoken voice included in the initial multimedia data to obtain target multimedia data.
- a background audio is included in the target clip template. Based on the background audio adding operation, the background audio and the reading voice are mixed according to the timestamp corresponding to the background audio and the timestamp corresponding to the reading voice, to obtain the target multimedia data.
- the playback parameters of the background audio are adjusted based on the playback parameters of the reading voice, so that the two blend more naturally.
- background music is added to multimedia clips through the adding operation of background audio in the editing template, which avoids users from manually adding background music, reduces the time cost of making videos, and improves the quality of produced videos.
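The timestamp-based mixing can be sketched as below. This is a hedged toy model: both tracks are sample lists, the background's offset stands in for its timestamp, and the gain parameter stands in for the "adjusted playback parameters" that keep the background under the reading voice. None of these names come from the patent.

```python
# Mix background audio under the reading voice: align by a sample offset
# (standing in for the timestamp), scale the background down, and sum
# overlapping samples.

def mix_audio(voice, background, bg_gain=0.3, bg_offset=0):
    """Return voice mixed with background (scaled by bg_gain, shifted
    by bg_offset samples); output covers both tracks."""
    length = max(len(voice), bg_offset + len(background))
    mixed = [0.0] * length
    for i, s in enumerate(voice):
        mixed[i] += s
    for i, s in enumerate(background):
        mixed[bg_offset + i] += s * bg_gain
    return mixed

# Background starts one sample late and is attenuated to half volume.
mixed = mix_audio([1.0, 1.0], [1.0, 1.0, 1.0], bg_gain=0.5, bg_offset=1)
```

Real mixers additionally resample, clip, and duck the background dynamically, but the align-scale-sum structure is the core of the operation.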
- the clipping operation indicated by the target clipping template includes: a keyword extraction operation; applying the clipping operation indicated by the target clipping template to the initial multimedia data includes: for at least one target text segment, extracting the target Keywords in text fragments; add keywords to the target multimedia fragment corresponding to the target text fragment.
- the keyword may be a date, a number, a person's name, a proper name, a place name, a plant, an animal, etc.
- the target text segment is "Zhang San reported that Li Si paid 200,000 yuan in cash"
- the keyword extracted from the target text fragment is "200,000 yuan"
- the keyword "200,000 yuan" is added to the target multimedia fragment corresponding to the target text fragment.
- the target editing template also includes: keyword parameters, where the keyword parameters include: keyword color, font, added effects, etc. The display of keywords in the target multimedia clip is set according to the keyword parameters.
- keywords are added to the multimedia clips through keyword extraction operations in the editing template, so that users can more clearly understand the key information of the text clips.
- adding keywords to the target multimedia segment corresponding to the target text segment includes: obtaining key text information that matches the keyword; adding the keyword and key text information to the target text segment corresponding to in the target multimedia clip.
- key information matching the keywords is obtained based on the above keywords.
- the keyword is “Wang Wu”
- the key information matching the keyword is: Wang Wu is an actor, and his representative works are “TV Series A” and “Movie B”.
- "Wang Wu" is used as the keyword
- "actor" and "representative works: 'TV Series A' and 'Movie B'" are used as key text information and added to the target multimedia clip.
- the keyword is "the crime of occupational embezzlement"
- the matching key text information is: "the crime of occupational embezzlement" refers to personnel of a company, enterprise or other unit taking advantage of their position to illegally take possession of the unit's property.
- different display parameters can be set for keywords and key text information.
- the above-mentioned key text information matching keywords may be text information extracted from text data, or may be text information obtained from the Internet or a preset knowledge base.
- the method of obtaining key text information is not specifically limited in this embodiment.
- key text information is extracted through keywords, and the keywords and key text information are added to the video, so that users can quickly understand keyword-related knowledge, which helps users understand the content of the text data.
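The keyword extraction and key-text-information lookup described above can be sketched as follows. The regex, the knowledge base, and all names here are illustrative assumptions; the patent leaves the extraction and lookup methods open (text data, the Internet, or a preset knowledge base).

```python
import re

# Toy knowledge base standing in for "a preset knowledge base or the
# Internet"; entries and format are invented for illustration.
KNOWLEDGE_BASE = {"Wang Wu": "actor; representative works: TV Series A, Movie B"}

def annotate_segment(text_segment, media_segment):
    """Extract keywords from a text segment and attach them, plus any
    matching key text information, to the corresponding media segment."""
    # Amount-like keywords (e.g. "200,000 yuan") via a simple pattern.
    keywords = re.findall(r"[\d,]+ yuan", text_segment)
    # Known names found verbatim in the segment.
    keywords += [k for k in KNOWLEDGE_BASE if k in text_segment]
    media_segment["keywords"] = keywords
    media_segment["key_info"] = {k: KNOWLEDGE_BASE[k]
                                 for k in keywords if k in KNOWLEDGE_BASE}
    return media_segment

seg = annotate_segment("Zhang San paid Wang Wu 200,000 yuan in cash", {"id": 7})
```

A production system would use a named-entity recognizer rather than regexes, and would render `keywords`/`key_info` with the template's keyword parameters (color, font, effects).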
- Embodiments of the present disclosure provide a video generation method, device, equipment, storage medium and program product.
- the method includes: generating initial multimedia data based on received text data; wherein the initial multimedia data includes the reading voice of the text data and video images matching the text data.
- the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; the target multimedia segment in the at least one multimedia segment corresponds to the target text segment in the at least one text segment.
- the target multimedia segment includes a target video segment and a target voice segment
- the target video segment includes a video image that matches the target text segment
- the target voice segment includes a reading voice that matches the target text segment
- the target clip template is obtained in response to a clip template acquisition request
- Apply the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data
- Embodiments of the present disclosure generate videos by directly applying the editing operations in the obtained editing templates to multimedia data, eliminating the need for users to manually edit videos. This not only reduces the time cost of making videos, but also improves the quality of the produced videos.
- Figure 6 is a flow chart of a video generation method in an embodiment of the present disclosure. This embodiment can be applied to the situation of generating a video based on text information.
- the method can be executed by a video generation device, which can be implemented in software and/or hardware; the video generation device can be configured in an electronic device.
- the video generation device 60 provided by the embodiment of the present disclosure mainly includes: an initial multimedia data generation module 61 , a target clipping template acquisition module 62 , a target multimedia data generation module 63 and a target video generation module 64 .
- the initial multimedia data generation module 61 is used to generate initial multimedia data based on the received text data; wherein the initial multimedia data includes the reading voice of the text data and a video image matching the text data, the initial multimedia data includes at least one multimedia segment, the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment, and the target multimedia segment includes a target video segment and a target voice segment,
- the target video clip includes a video image that matches the target text clip, and the target speech clip includes a reading voice that matches the target text clip;
- the target clip template acquisition module 62 is used to obtain the target clip template in response to a clip template acquisition request;
- the target multimedia data generation module 63 is used to apply the clip operation indicated by the target clip template to the initial multimedia data to obtain the target multimedia data; the target video generation module 64, used to generate target videos based on target multimedia data.
- subtitle text matching the target text segment is included on the video image.
- the target clipping template acquisition module 62 is used to obtain the target clipping template in response to a clipping template acquisition request, and includes: a target clipping template determination unit, used to determine, in response to a triggering operation on the template theme control, the clipping template corresponding to the triggering operation as the target clipping template; and a target clipping template acquisition unit, used to obtain the target clipping template.
- the target clipping template acquisition module 62 also includes: a video editing area display unit, configured to display the video editing area before responding to a triggering operation on the clipping template control, wherein the video editing area includes a template control; and a masked layer area display unit, used for displaying the masked layer area in response to a triggering operation on the template control, and displaying at least one template theme control on the masked layer area.
- the editing operation indicated by the target editing template includes: a video synthesis operation; a target multimedia data generation module 63, specifically configured to combine the video clips included in the target editing template with the initial multimedia data based on the video synthesis operation.
- the multimedia fragments included in it are synthesized to obtain the target multimedia data.
- the target multimedia data generation module 63 is specifically configured to load the video clips included in the target clipping template to the set positions of the multimedia clips included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data, wherein the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
- the editing operation indicated by the target editing template includes: a transition setting operation; the target multimedia data generation module 63 is specifically configured to add transition effects to the multimedia clips included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
- the editing operation indicated by the target editing template includes: a virtual object adding operation; the target multimedia data generation module 63 is specifically configured to add, based on the virtual object adding operation, the virtual object included in the target clip template to a preset position of the initial multimedia data to obtain the target multimedia data.
- the editing operation indicated by the target editing template includes: a background audio adding operation; the target multimedia data generation module 63 is specifically configured to mix, based on the background audio adding operation, the background audio included in the target editing template with the reading voice included in the initial multimedia data to obtain target multimedia data.
- the editing operation indicated by the target editing template includes: a keyword extraction operation; the target multimedia data generation module 63 is specifically configured to extract keywords in the target text segment for at least one target text segment; Add keywords to the target multimedia segment corresponding to the target text segment.
- the target multimedia data generation module 63 is specifically configured to obtain key text information matching keywords; add keywords and key text information to the target multimedia segment corresponding to the target text segment.
- the video generation device provided by the embodiments of the present disclosure can execute the steps performed in the video generation method provided by the method embodiments of the present disclosure. The execution steps and beneficial effects will not be described again here.
- FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
- the electronic device 700 in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablets), PMPs (Portable Multimedia Players), vehicle-mounted terminals (such as car navigation terminals), and wearable terminal devices, as well as fixed terminals such as digital TVs, desktop computers, smart home devices, etc.
- the electronic device shown in FIG. 7 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
- the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703, to implement the video generation method according to the embodiments of the present disclosure.
- the RAM 703 also stores various programs and data required for the operation of the terminal device 700.
- the processing device 701, the ROM 702 and the RAM 703 are connected to each other via a bus 704.
- An input/output (I/O) interface 705 is also connected to bus 704.
- the following devices may be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; a storage device 708 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 709.
- the communication device 709 may allow the terminal device 700 to communicate wirelessly or wiredly with other devices to exchange data.
- although FIG. 7 shows the terminal device 700 having various means, it should be understood that implementing or possessing all of the illustrated means is not required; more or fewer means may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program including program code for executing the method shown in the flowchart, thereby implementing the video generation method described above.
- the computer program may be downloaded and installed from the network via communication device 709, or from storage device 708, or from ROM 702.
- when the computer program is executed by the processing device 701, the above-mentioned functions defined in the method of the embodiments of the present disclosure are performed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
- the client and server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected by digital data communication (e.g., a communications network) in any form or medium.
- examples of communications networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
- the computer-readable medium carries one or more programs.
- when the one or more programs are executed by the terminal device, the terminal device: generates initial multimedia data based on the received text data; wherein the initial multimedia data includes the reading voice of the text data and a video image matching the text data.
- the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; the target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment.
- the target multimedia segment includes a target video segment and a target voice segment
- the target video segment includes a video image that matches the target text segment
- the target voice segment includes a reading voice that matches the target text segment
- the terminal device may also perform other steps described in the above embodiments.
- Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through an Internet connection using an Internet service provider).
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function.
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
- the units involved in the embodiments of the present disclosure can be implemented in software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
- exemplary types of hardware logic components that may be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), etc.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard drives, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- the present disclosure provides a video generation method, including: generating initial multimedia data based on received text data; wherein the initial multimedia data includes the reading voice of the text data and a video image matching the text data.
- the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment divided from the text data; the target multimedia segment in the at least one multimedia segment corresponds to the target text segment in the at least one text segment.
- the target multimedia clip includes a target video clip and a target voice clip
- the target video clip includes a video image that matches the target text clip
- the target voice clip includes a reading voice that matches the target text clip
- the present disclosure provides a video generation method, wherein the video image includes subtitle text matching the target text segment.
- the present disclosure provides a video generation method, wherein in response to a clip template acquisition request, obtaining a target clip template includes: in response to a trigger operation on a template theme control, determining the editing template corresponding to the trigger operation as the target editing template; and obtaining the target editing template.
- the present disclosure provides a video generation method, wherein before responding to the triggering operation on the clip template control, the method further includes: displaying a video editing area, wherein the video editing area includes a template control; in response to a triggering operation on the template control, displaying the masked layer area; and displaying at least one template theme control on the masked layer area.
- the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a video synthesis operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data includes: synthesizing video clips included in the target editing template and multimedia clips included in the initial multimedia data based on the video synthesis operation to obtain target multimedia data.
- the present disclosure provides a video generation method, wherein video segments included in the target clip template and multimedia segments included in the initial multimedia data are synthesized based on a video synthesis operation to obtain the target
- the multimedia data includes: based on the video synthesis operation, loading the video clips included in the target editing template to the set positions of the multimedia clips included in the initial multimedia data to obtain the target multimedia data, where the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
- the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a transition setting operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data includes: adding transition effects to multimedia clips included in the initial multimedia data based on the transition setting operation to obtain target multimedia data.
- the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes: a virtual object adding operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data includes: adding the virtual object included in the target clipping template to a preset position of the initial multimedia data based on the virtual object adding operation to obtain target multimedia data.
- the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes a background audio adding operation; applying the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data includes: mixing the background audio included in the target editing template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
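The mixing step can be sketched as a per-sample weighted sum of the reading voice and the template's background audio. Plain float lists stand in for real audio buffers, and the gain value is an arbitrary assumption.

```python
# Sketch: mix background audio into the reading voice at a reduced gain
# so the voice stays intelligible; the shorter track is zero-padded.
def mix_audio(voice, background, bg_gain=0.3):
    n = max(len(voice), len(background))
    voice = list(voice) + [0.0] * (n - len(voice))
    background = list(background) + [0.0] * (n - len(background))
    return [v + bg_gain * b for v, b in zip(voice, background)]

mixed = mix_audio([1.0, 0.5], [0.5, 0.5, 0.5], bg_gain=0.2)
```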
- the present disclosure provides a video generation method, wherein the editing operation indicated by the target editing template includes a keyword extraction operation; applying the editing operation indicated by the target editing template to the initial multimedia data includes: for at least one target text segment, extracting keywords from the target text segment, and adding the keywords to the target multimedia segment corresponding to the target text segment.
- the present disclosure provides a video generation method, wherein adding the keywords to the target multimedia segment corresponding to the target text segment includes: obtaining key text information matching the keywords, and adding the keywords and the key text information to the target multimedia segment corresponding to the target text segment.
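The keyword extraction and key text information steps can be sketched as follows. A simple vocabulary containment check stands in for the unspecified extraction algorithm, and `key_info` is a hypothetical lookup table, not a disclosed component.

```python
# Sketch: pick keywords from a target text segment, then attach the
# keywords and their matching key text information to the corresponding
# multimedia segment.
def extract_keywords(text_segment, vocabulary):
    return [w for w in vocabulary if w in text_segment]

def add_keywords_to_segment(segment, keywords, key_info):
    segment = dict(segment)                    # do not mutate the caller's dict
    segment["keywords"] = keywords
    segment["key_text_info"] = [key_info[k] for k in keywords if k in key_info]
    return segment

kws = extract_keywords("sunset over the sea", ["sunset", "city", "sea"])
seg = add_keywords_to_segment({"id": 1}, kws, {"sunset": "golden hour"})
```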
- the present disclosure provides a video generation device.
- the device includes: an initial multimedia data generation module, configured to generate initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and a video image matching the text data.
- the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment.
- the target multimedia segment includes a target video segment and a target voice segment
- the target video segment includes a video image that matches the target text segment
- the target voice segment includes a reading voice that matches the target text segment
- a target editing template acquisition module, configured to obtain the target editing template in response to an editing template acquisition request;
- a target multimedia data generation module, configured to apply the editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data; and a target video generation module, configured to generate the target video based on the target multimedia data.
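The flow through the four modules can be sketched end to end. Everything here is an assumption for illustration: the sentence-based segmentation, the `tts(...)`/`img(...)` placeholder strings, and the representation of template operations as callables are not the disclosed implementation.

```python
# Sketch of the pipeline: text -> initial multimedia data (voice + image
# per text segment) -> apply template-indicated editing operations ->
# target video.
def generate_video(text_data, template):
    # 1. Split the text into segments; each gets a (voice, image) pair.
    segments = [
        {"text": t.strip(), "voice": f"tts({t.strip()})", "image": f"img({t.strip()})"}
        for t in text_data.split(".") if t.strip()
    ]
    # 2-3. Apply each editing operation indicated by the target template.
    for op in template.get("operations", []):
        segments = op(segments)
    # 4. "Render" the target video as the ordered segment list.
    return {"target_video": segments}

result = generate_video("Hello. World", {"operations": []})
```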
- the present disclosure provides a video generation device, wherein the video image includes subtitle text matching the target text segment.
- the present disclosure provides a video generation device, wherein the target editing template acquisition module, configured to obtain the target editing template in response to the editing template acquisition request, includes: a target editing template determination unit, configured to determine, in response to a triggering operation on a template theme control, the editing template corresponding to the triggering operation as the target editing template; and a target editing template acquisition unit, configured to obtain the target editing template.
- the present disclosure provides a video generation device, wherein the target editing template acquisition module further includes: a video editing area display unit, configured to display a video editing area before the triggering operation on the template theme control, wherein the video editing area includes a template control; and a mask layer area display unit, configured to display a mask layer area in response to a triggering operation on the template control, and to display at least one template theme control on the mask layer area.
- the present disclosure provides a video generation device, wherein the editing operation indicated by the target editing template includes a video synthesis operation; the target multimedia data generation module is specifically configured to synthesize, based on the video synthesis operation, the video clips included in the target editing template with the multimedia clips included in the initial multimedia data to obtain the target multimedia data.
- the present disclosure provides a video generation device, wherein the target multimedia data generation module is specifically configured to load, based on the video synthesis operation, the video clips included in the target editing template at set positions of the multimedia clips included in the initial multimedia data to obtain the target multimedia data, where the set positions include: before the first frame of media data of the initial multimedia data, and/or after the last frame of media data of the initial multimedia data.
- the present disclosure provides a video generation device, wherein the editing operation indicated by the target editing template includes a transition setting operation; the target multimedia data generation module is specifically configured to add transition effects to the multimedia clips included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
- the present disclosure provides a video generation device, wherein the editing operation indicated by the target editing template includes a virtual object adding operation; the target multimedia data generation module is specifically configured to add the virtual object included in the target editing template at a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
- the present disclosure provides a video generation device, wherein the editing operation indicated by the target editing template includes a background audio adding operation; the target multimedia data generation module is specifically configured to mix the background audio included in the target editing template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
- the present disclosure provides a video generation device, wherein the editing operation indicated by the target editing template includes a keyword extraction operation; the target multimedia data generation module is specifically configured to, for at least one target text segment, extract keywords from the target text segment and add the keywords to the target multimedia segment corresponding to the target text segment.
- the present disclosure provides a video generation device, wherein the target multimedia data generation module is specifically configured to obtain key text information matching the keywords, and to add the keywords and the key text information to the target multimedia segment corresponding to the target text segment.
- the present disclosure provides an electronic device, including:
- one or more processors;
- a memory, configured to store one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any of the video generation methods provided by the present disclosure.
- the present disclosure provides a computer-readable storage medium having a computer program stored thereon; when the program is executed by a processor, the video generation method described in any one of the embodiments provided by the present disclosure is implemented.
- Embodiments of the present disclosure also provide a computer program product.
- the computer program product includes a computer program or instructions. When the computer program or instructions are executed by a processor, the video generation method as described above is implemented.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Television Signal Processing For Recording (AREA)
- Studio Circuits (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
Claims (15)
- A video generation method, comprising: generating initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and a video image matching the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes a video image matching the target text segment, and the target voice segment includes a reading voice matching the target text segment; in response to an editing template acquisition request, acquiring a target editing template; applying an editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data; and generating a target video based on the target multimedia data.
- The method according to claim 1, wherein the video image includes subtitle text matching the target text segment.
- The method according to claim 1, wherein acquiring the target editing template in response to the editing template acquisition request includes: in response to a triggering operation on a template theme control, determining the editing template corresponding to the triggering operation as the target editing template; and acquiring the target editing template.
- The method according to claim 3, wherein, before responding to the triggering operation on the editing template control, the method further comprises: displaying a video editing area, wherein the video editing area includes a template control; in response to a triggering operation on the template control, displaying a mask layer area; and displaying at least one template theme control on the mask layer area.
- The method according to claim 1, wherein the editing operation indicated by the target editing template includes a video synthesis operation; and applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: synthesizing a video segment included in the target editing template with a multimedia segment included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data.
- The method according to claim 5, wherein synthesizing the video segment included in the target editing template with the multimedia segment included in the initial multimedia data based on the video synthesis operation to obtain the target multimedia data includes: based on the video synthesis operation, loading the video segment included in the target editing template at a set position of the multimedia segment included in the initial multimedia data to obtain the target multimedia data, wherein the set position includes: before a first frame of media data of the initial multimedia data, and/or after a last frame of media data of the initial multimedia data.
- The method according to claim 1, wherein the editing operation indicated by the target editing template includes a transition setting operation; and applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: adding a transition effect to the multimedia segment included in the initial multimedia data based on the transition setting operation to obtain the target multimedia data.
- The method according to claim 1, wherein the editing operation indicated by the target editing template includes a virtual object adding operation; and applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: adding a virtual object included in the target editing template at a preset position of the initial multimedia data based on the virtual object adding operation to obtain the target multimedia data.
- The method according to claim 1, wherein the editing operation indicated by the target editing template includes a background audio adding operation; and applying the editing operation indicated by the target editing template to the initial multimedia data to obtain the target multimedia data includes: mixing background audio included in the target editing template with the reading voice included in the initial multimedia data based on the background audio adding operation to obtain the target multimedia data.
- The method according to claim 1, wherein the editing operation indicated by the target editing template includes a keyword extraction operation; and applying the editing operation indicated by the target editing template to the initial multimedia data includes: for at least one target text segment, extracting a keyword from the target text segment; and adding the keyword to the target multimedia segment corresponding to the target text segment.
- The method according to claim 10, wherein adding the keyword to the target multimedia segment corresponding to the target text segment includes: obtaining key text information matching the keyword; and adding the keyword and the key text information to the target multimedia segment corresponding to the target text segment.
- A video generation apparatus, comprising: an initial multimedia data generation module, configured to generate initial multimedia data based on received text data, wherein the initial multimedia data includes a reading voice of the text data and a video image matching the text data, the initial multimedia data includes at least one multimedia segment, and the at least one multimedia segment respectively corresponds to at least one text segment into which the text data is divided; a target multimedia segment in the at least one multimedia segment corresponds to a target text segment in the at least one text segment, the target multimedia segment includes a target video segment and a target voice segment, the target video segment includes a video image matching the target text segment, and the target voice segment includes a reading voice matching the target text segment; a target editing template acquisition module, configured to acquire a target editing template in response to an editing template acquisition request; a target multimedia data generation module, configured to apply an editing operation indicated by the target editing template to the initial multimedia data to obtain target multimedia data; and a target video generation module, configured to generate a target video based on the target multimedia data.
- An electronic device, comprising: one or more processors; and a storage apparatus, configured to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-11.
- A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-11.
- A computer program product, comprising a computer program or instructions, wherein the computer program or instructions, when executed by a processor, implement the method according to any one of claims 1-11.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP23802924.3A EP4344230A4 (en) | 2022-05-10 | 2023-05-09 | VIDEO GENERATING METHOD, APPARATUS AND DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT |
JP2023578709A JP2024528440A (ja) | 2022-05-10 | 2023-05-09 | ビデオ生成方法、装置、デバイス、記憶媒体およびプログラム製品 |
US18/573,097 US20240296871A1 (en) | 2022-05-10 | 2023-05-09 | Method, apparatus, device, storage medium and program product for video generation |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210508063.2 | 2022-05-10 | ||
- CN202210508063.2A CN117082292A (zh) | 2022-05-10 | 2022-05-10 | Video generation method, apparatus, device, storage medium and program product |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023217155A1 true WO2023217155A1 (zh) | 2023-11-16 |
Family
ID=88701054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
- PCT/CN2023/093089 WO2023217155A1 (zh) | Video generation method, apparatus, device, storage medium and program product | 2022-05-10 | 2023-05-09 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20240296871A1 (zh) |
EP (1) | EP4344230A4 (zh) |
JP (1) | JP2024528440A (zh) |
CN (1) | CN117082292A (zh) |
WO (1) | WO2023217155A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- WO2025113021A1 (zh) * | 2023-11-30 | 2025-06-05 | Beijing Zitiao Network Technology Co., Ltd. | Text processing method, system, apparatus, device and storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN118828105A (zh) * | 2023-04-19 | 2024-10-22 | Beijing Zitiao Network Technology Co., Ltd. | Video generation method, apparatus, device, storage medium and program product |
- JP2025518428A (ja) | 2023-04-19 | 2025-06-17 | Beijing Zitiao Network Technology Co., Ltd. | Video generation method, apparatus, device, storage medium and program product |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN109756751A (zh) * | 2017-11-07 | 2019-05-14 | Tencent Technology (Shenzhen) Co., Ltd. | Multimedia data processing method and apparatus, electronic device, and storage medium |
- CN111243632A (zh) * | 2020-01-02 | 2020-06-05 | Beijing Dajia Internet Information Technology Co., Ltd. | Multimedia resource generation method, apparatus, device and storage medium |
- CN111460183A (zh) * | 2020-03-30 | 2020-07-28 | Beijing Jindi Technology Co., Ltd. | Multimedia file generation method and apparatus, storage medium, and electronic device |
- CN112738623A (zh) * | 2019-10-14 | 2021-04-30 | Beijing ByteDance Network Technology Co., Ltd. | Video file generation method, apparatus, terminal and storage medium |
- CN113452941A (zh) * | 2021-05-14 | 2021-09-28 | Beijing Dajia Internet Information Technology Co., Ltd. | Video generation method, apparatus, electronic device and storage medium |
- CN113473182A (zh) * | 2021-09-06 | 2021-10-01 | Tencent Technology (Shenzhen) Co., Ltd. | Video generation method and apparatus, computer device and storage medium |
- CN114339399A (zh) * | 2021-12-27 | 2022-04-12 | MIGU Culture Technology Co., Ltd. | Multimedia file editing method, apparatus and computing device |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180143741A1 (en) * | 2016-11-23 | 2018-05-24 | FlyrTV, Inc. | Intelligent graphical feature generation for user content |
- JP6887132B2 (ja) * | 2018-04-12 | 2021-06-16 | Panasonic Intellectual Property Management Co., Ltd. | Video processing device, video processing system and video processing method |
- CN112449231B (zh) * | 2019-08-30 | 2023-02-03 | Tencent Technology (Shenzhen) Co., Ltd. | Multimedia file material processing method, apparatus, electronic device and storage medium |
- CN111246300B (zh) * | 2020-01-02 | 2022-04-22 | Beijing Dajia Internet Information Technology Co., Ltd. | Editing template generation method, apparatus, device and storage medium |
- CN111935491B (zh) * | 2020-06-28 | 2023-04-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Live-streaming special effect processing method, apparatus and server |
US11626139B2 (en) * | 2020-10-28 | 2023-04-11 | Meta Platforms Technologies, Llc | Text-driven editor for audio and video editing |
- CN112579826A (zh) * | 2020-12-07 | 2021-03-30 | Beijing ByteDance Network Technology Co., Ltd. | Video display and processing method, apparatus, system, device and medium |
US12154598B1 (en) * | 2022-03-29 | 2024-11-26 | United Services Automobile Association (Usaa) | System and method for generating synthetic video segments during video editing |
- 2022
- 2022-05-10 CN CN202210508063.2A patent/CN117082292A/zh active Pending
- 2023
- 2023-05-09 EP EP23802924.3A patent/EP4344230A4/en active Pending
- 2023-05-09 JP JP2023578709A patent/JP2024528440A/ja active Pending
- 2023-05-09 US US18/573,097 patent/US20240296871A1/en active Pending
- 2023-05-09 WO PCT/CN2023/093089 patent/WO2023217155A1/zh active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN109756751A (zh) * | 2017-11-07 | 2019-05-14 | Tencent Technology (Shenzhen) Co., Ltd. | Multimedia data processing method and apparatus, electronic device, and storage medium |
- CN112738623A (zh) * | 2019-10-14 | 2021-04-30 | Beijing ByteDance Network Technology Co., Ltd. | Video file generation method, apparatus, terminal and storage medium |
- CN111243632A (zh) * | 2020-01-02 | 2020-06-05 | Beijing Dajia Internet Information Technology Co., Ltd. | Multimedia resource generation method, apparatus, device and storage medium |
- CN111460183A (zh) * | 2020-03-30 | 2020-07-28 | Beijing Jindi Technology Co., Ltd. | Multimedia file generation method and apparatus, storage medium, and electronic device |
- CN113452941A (zh) * | 2021-05-14 | 2021-09-28 | Beijing Dajia Internet Information Technology Co., Ltd. | Video generation method, apparatus, electronic device and storage medium |
- CN113473182A (zh) * | 2021-09-06 | 2021-10-01 | Tencent Technology (Shenzhen) Co., Ltd. | Video generation method and apparatus, computer device and storage medium |
- CN114339399A (zh) * | 2021-12-27 | 2022-04-12 | MIGU Culture Technology Co., Ltd. | Multimedia file editing method, apparatus and computing device |
Non-Patent Citations (1)
Title |
---|
See also references of EP4344230A4 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- WO2025113021A1 (zh) * | 2023-11-30 | 2025-06-05 | Beijing Zitiao Network Technology Co., Ltd. | Text processing method, system, apparatus, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US20240296871A1 (en) | 2024-09-05 |
CN117082292A (zh) | 2023-11-17 |
EP4344230A1 (en) | 2024-03-27 |
EP4344230A4 (en) | 2024-10-30 |
JP2024528440A (ja) | 2024-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2021196903A1 (zh) | Video processing method and apparatus, readable medium and electronic device | |
- US20240107127A1 (en) | Video display method and apparatus, video processing method, apparatus, and system, device, and medium | |
- KR102792043B1 (ko) | Video generation apparatus and method, electronic apparatus, and computer-readable medium | |
- WO2023217155A1 (zh) | Video generation method, apparatus, device, storage medium and program product | |
- JP6971292B2 (ja) | Method, apparatus, server, computer-readable storage medium and computer program for aligning paragraphs with video | |
- CN113365134A (zh) | Audio sharing method, apparatus, device and medium | |
- WO2021057740A1 (zh) | Video generation method and apparatus, electronic device and computer-readable medium | |
- US20240168605A1 (en) | Text input method and apparatus, and electronic device and storage medium | |
- CN112287168A (zh) | Method and apparatus for generating video | |
- WO2024037491A1 (zh) | Media content processing method, apparatus, device and storage medium | |
- WO2023165515A1 (zh) | Shooting method and apparatus, electronic device and storage medium | |
- WO2023169356A1 (zh) | Image processing method, apparatus, device and storage medium | |
- CN117793478A (zh) | Explanation information generation method, apparatus, device, medium and program product | |
- WO2024008184A1 (zh) | Information display method and apparatus, electronic device and computer-readable medium | |
- US20250203153A1 (en) | Video generation method and apparatus, and device, storage medium and program product | |
- CN117596452A (zh) | Video generation method, apparatus, medium and electronic device | |
- CN113891168B (zh) | Subtitle processing method, apparatus, electronic device and storage medium | |
- JP2024521940A (ja) | Multimedia processing method, apparatus, device and medium | |
- JP7684446B2 (ja) | Video generation method, apparatus, device, storage medium and program product | |
- CN114520928B (zh) | Display information generation method, information display method, apparatus and electronic device | |
- CN114697756A (zh) | Display method, apparatus, terminal device and medium | |
- CN111385638B (zh) | Video processing method and apparatus | |
- CN112287173A (zh) | Method and apparatus for generating information | |
- US12148451B2 (en) | Method, apparatus, device, storage medium and program product for video generating | |
- JP7708980B2 (ja) | Template selection method, apparatus, electronic device and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2023578709 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023802924 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18573097 Country of ref document: US |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23802924 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2023802924 Country of ref document: EP Effective date: 20231220 |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112024021953 Country of ref document: BR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11202407421W Country of ref document: SG |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 112024021953 Country of ref document: BR Kind code of ref document: A2 Effective date: 20241023 |