WO2024046484A1 - Video generation method, apparatus, device, storage medium and program product - Google Patents

Video generation method, apparatus, device, storage medium and program product

Info

Publication number
WO2024046484A1
WO2024046484A1 · PCT/CN2023/116765 · CN2023116765W
Authority
WO
WIPO (PCT)
Prior art keywords
video
segment
target
text
editing
Prior art date
Application number
PCT/CN2023/116765
Other languages
English (en)
French (fr)
Inventor
李欣玮
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 filed Critical 北京字跳网络技术有限公司
Priority to EP23821469.6A priority Critical patent/EP4354885A1/en
Priority to US18/391,576 priority patent/US20240127859A1/en
Publication of WO2024046484A1 publication Critical patent/WO2024046484A1/zh

Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 - Querying
    • G06F16/732 - Query formulation
    • G06F16/7343 - Query language or query format
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]

Definitions

  • the present disclosure relates to a video generation method, device, equipment, storage medium and program product.
  • the process of making a video is: obtaining the input text entered by the user; matching the input text with its corresponding video image through an intelligent matching algorithm, and synthesizing the corresponding target video based on the input text and video image.
  • video images are obtained through intelligent matching algorithms, which may not meet the user's personalized video production needs.
  • embodiments of the present disclosure provide a video generation method, device, equipment, storage medium and program product, which generates first video editing data based on input text.
  • with the first video editing data, users can freely select image materials for video editing as they like, meeting their personalized video production needs.
  • Embodiments of the present disclosure provide a video generation method, including:
  • first video editing data is generated based on the input text in response to a first instruction triggered for the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a first target video segment in the at least one first video segment and a target audio segment in the at least one audio segment correspond to a target text segment in the at least one text segment, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is an empty segment;
  • the first target video segment has the same track timeline interval as the target audio segment;
  • in response to a second instruction triggered on the video editor for the first target video segment, a first target video is filled into the first target video segment based on the first video editing data to obtain second video editing data;
  • the first target video is a video obtained based on the first target image material indicated by the second instruction;
  • based on the second video editing data, the first target video is generated.
  • Another embodiment of the present disclosure provides a video generation device, including:
  • a first video editing data determination module configured to generate first video editing data based on the input text in response to a first instruction triggered for the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a first target video segment in the at least one first video segment and a target audio segment in the at least one audio segment correspond to a target text segment in the at least one text segment, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is an empty segment;
  • a first video editing data import module configured to import the first video editing data into a video editor, so that the at least one first video clip and the at least one audio clip are displayed on a video editing track of the video editor;
  • the first target video clip and the target audio clip have the same track timeline interval;
  • a second video editing data determination module configured to, in response to a second instruction triggered on the video editor for the first target video segment, fill the first target video into the first target video segment based on the first video editing data to obtain the second video editing data;
  • the first target video is a video obtained based on the first target image material indicated by the second instruction;
  • a first target video generation module configured to generate the first target video based on the second video editing data.
  • Another embodiment of the present disclosure provides an electronic device, the electronic device includes:
  • one or more processors;
  • a storage device for storing one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the video generation method as described in any one of the above first aspects.
  • Yet another embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored.
  • the program is executed by a processor, the video generation method as described in any one of the above first aspects is implemented.
  • the computer program product includes a computer program or instructions.
  • the video generation method as described in any one of the above first aspects is implemented.
  • Embodiments of the present disclosure provide a video generation method, device, equipment, storage medium, and program product.
  • the method includes: in response to a first instruction triggered for input text, generating first video editing data based on the input text;
  • the first video editing data includes at least one first video segment and at least one audio segment, and the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided by the input text, so A first target video segment in the at least one first video segment, a target audio segment in the at least one audio segment, and a target text segment in the at least one text segment correspond to each other, and the target audio segment is used to fill the
  • the target text segment matches the reading speech, and the first target video segment is an empty segment; the first video editing data is imported into the video editor, so that the first video editing data is displayed on the video editing track of the video editor.
  • At least one first video segment and the at least one audio segment, the first target video segment and the target audio segment have the same track timeline interval; in response to triggering on the video editor for the first target
  • the second instruction of the video clip is based on the first video editing data, filling the first target video in the first target video clip to obtain the second video editing data; the first target video is based on the second A video obtained from the first target image material indicated by the instruction; and based on the second edited video data, the first target video is generated.
  • Embodiments of the present disclosure generate first video editing data based on input text. In the first video editing data, users can freely select image materials for video editing according to their own preferences, thereby meeting their personalized video production needs.
  • Figure 1 shows an architecture diagram of a video production scenario provided by an embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a video generation method in an embodiment of the present disclosure
  • Figure 3 is a schematic diagram of a video production page in an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of a video editing page in an embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of an image selection page in an embodiment of the present disclosure.
  • Figure 6 is a schematic diagram of a text editing page in an embodiment of the present disclosure.
  • Figure 7 is a schematic diagram of a network address identification page in an embodiment of the present disclosure.
  • Figure 8 is a schematic diagram of another network address identification page in an embodiment of the present disclosure.
  • Figure 9 is a schematic structural diagram of a video generation device in an embodiment of the present disclosure.
  • Figure 10 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
  • the term “include” and its variations are open-ended, ie, “including but not limited to.”
  • the term “based on” means “based at least in part on.”
  • the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
  • users can create videos on mobile terminals such as mobile phones, tablets, laptops, or other electronic devices.
  • the currently common video production method is as follows: the user has pre-written the input text but has no suitable image or video material. In this case, the user inputs the text; the video production client uses an intelligent matching algorithm to match the input text with corresponding image material, and synthesizes the corresponding target video based on the input text and image material.
  • video images are obtained through intelligent matching algorithms. Users cannot intervene in the image materials matched by the intelligent matching algorithm, so that the matched image materials may not meet the user's personalized video production needs.
  • the user wants to make a video of the cooking process.
  • the user has pre-written the recipe, that is, what materials are needed and what operations are performed at each step.
  • the user has taken corresponding pictures or short videos.
  • in this case, image materials would need to be intelligently matched according to the recipe; however, cooking steps, such as what to add first and what to add next, have a certain coherence and sequence.
  • the current intelligent matching algorithm may not be able to accurately match images that meet user needs.
  • embodiments of the present disclosure provide a video generation method, including: in response to a first instruction triggered for input text, generating first video editing data based on the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a first target video segment in the at least one first video segment and a target audio segment in the at least one audio segment correspond to a target text segment in the at least one text segment, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is an empty segment; importing the first video editing data into the video editor, so that the at least one first video segment and the at least one audio segment are displayed on the video editing track of the video editor.
  • the track timeline interval of the first target video clip and the target audio clip is the same; in response to a second instruction triggered on the video editor for the first target video clip, the first target video is filled into the first target video clip based on the first video editing data to obtain the second video editing data; the first target video is a video obtained based on the first target image material indicated by the second instruction; and based on the second video editing data, the first target video is generated.
  • Embodiments of the present disclosure generate first video editing data based on input text.
  • users can freely select image materials for video editing according to their own preferences, thereby meeting personalized video production needs.
  • Figure 1 shows a system that can be used to implement the video generation method provided by embodiments of the present disclosure.
  • the system 100 may include a plurality of user terminals 110 , a network 120 , a server 130 and a database 140 .
  • the system 100 can be used to implement the video generation method described in any embodiment of the present disclosure.
  • the user terminal 110 may be any type of electronic device capable of performing data processing, which may include but is not limited to: mobile phones, sites, units, devices, multimedia computers, multimedia tablets, Internet nodes, communicators, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio/video players, digital cameras/camcorders, positioning devices, television receivers, radio receivers, e-book devices, gaming devices, or any combination thereof, including accessories and peripherals for these devices or any combination thereof.
  • the user can operate through the application program installed on the user terminal 110.
  • the application program transmits user behavior data to the server 130 through the network 120.
  • the user terminal 110 can also receive data transmitted by the server 130 through the network 120.
  • the embodiments of the present disclosure place no restrictions on the hardware system and software system of the user terminal 110.
  • for example, the user terminal 110 can be based on ARM, X86 or other processors, can have input/output devices such as cameras, touch screens, microphones, etc., and can run operating systems such as Windows, iOS, Linux, Android, Hongmeng OS, etc.
  • the application program on the user terminal 110 may be a video production application program, such as a video production application program based on multimedia resources such as videos, pictures, texts, etc.
  • the user can use the video production application on the user terminal 110 to shoot videos, create scripts, make videos, edit videos, etc., and at the same time watch or browse videos posted by other users and perform operations such as liking, commenting, and forwarding.
  • the user terminal 110 can implement the video generation method provided by the embodiments of the present disclosure by running a process or a thread. In some examples, the user terminal 110 may utilize its built-in application program to perform the video generation method. In other examples, the user terminal 110 may execute the video generation method by calling an application stored externally to the user terminal 110 .
  • Network 120 may be a single network, or a combination of at least two different networks.
  • the network 120 may include, but is not limited to, one or a combination of a local area network, a wide area network, a public network, a private network, etc.
  • the network 120 may be a computer network such as the Internet and/or various telecommunications networks (such as 3G/4G/5G mobile communication network, WIFI, Bluetooth, ZigBee, etc.), and embodiments of the present disclosure are not limited to this.
  • the server 130 may be a single server, a server group, or a cloud server. Each server in the server group is connected through a wired or wireless network.
  • a server group can be centralized, such as a data center, or distributed.
  • Server 130 may be local or remote.
  • the server 130 may communicate with the user terminal 110 through a wired or wireless network.
  • the embodiments of the present disclosure do not limit the hardware system and software system of the server 130.
  • the database 140 may generally refer to a device with a storage function.
  • the database 140 is mainly used to store various data utilized, generated and output by the user terminal 110 and the server 130 during their work.
  • when the application on the user terminal 110 is the above-mentioned video production application based on multimedia resources such as videos, pictures, and texts, the data stored in the database 140 may include resource data such as videos and texts uploaded by the user through the user terminal 110, as well as interaction data such as likes and comments from other users.
  • Database 140 may be local or remote.
  • the database 140 may include various memories, such as random access memory (Random Access Memory, RAM), read only memory (Read Only Memory, ROM), etc.
  • the storage devices mentioned above are just some examples, and the storage devices that can be used by the system 100 are not limited thereto.
  • the embodiments of the present disclosure do not limit the hardware system and software system of the database 140.
  • it may be a relational database or a non-relational database.
  • the database 140 may be connected to or communicate with the server 130 or a portion thereof via the network 120, or directly with the server 130, or a combination of the above two methods.
  • database 140 may be a stand-alone device. In other examples, the database 140 may also be integrated in at least one of the user terminal 110 and the server 130 . For example, the database 140 can be set on the user terminal 110 or on the server 130 . For another example, the database 140 may also be distributed, with part of it set on the user terminal 110 and another part set on the server 130 .
  • FIG 2 is a flow chart of a video generation method in an embodiment of the present disclosure. This embodiment can be applied to the situation of generating a video based on input text.
  • the method can be executed by a video generation device, the video generation device can be implemented in software and/or hardware, and the video generation method can be implemented in the electronic device described in Figure 1.
  • the video generation method provided by the embodiment of the present disclosure mainly includes steps S101-S104.
  • S101: in response to a first instruction triggered for input text, generate first video editing data based on the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a first target video segment in the at least one first video segment and a target audio segment in the at least one audio segment correspond to a target text segment in the at least one text segment, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is a vacant segment.
  • "in response to" is used to represent the condition or state on which a performed operation depends.
  • when the dependent condition or state is met, the one or more performed operations may be in real time or may have a set delay; unless otherwise specified, there is no restriction on the order of execution of multiple performed operations.
  • the video editing data can be understood as a video editing draft or editing project file, which is used to record and reproduce the user's video editing process, specifically including the audio and video materials targeted for editing and instructions for the editing operations performed on those materials.
  • the first instruction can be understood as an instruction for instructing the client to generate the first video editing data based only on the input text, and does not require intelligent matching of image materials based on the input text.
  • the first instruction triggered for the input text may be in response to the user's trigger operation on the first control in the video production page.
  • when the user wants to create a video based on input text, a video production application can be started in advance.
  • one of the subroutines included in the video production application may provide the function of producing a video based on input text; alternatively, a dedicated video production application that creates videos from input text may be launched.
  • the method further includes: in response to a triggering operation on the video production control, displaying a video production page, wherein the video production page includes a first control, a second control and a text editing area.
  • the first control is used to trigger the first instruction in response to the user's trigger operation
  • the second control is used to trigger the third instruction in response to the user's trigger operation
  • the text editing area is used to obtain the input text in response to the user's editing operation.
  • a video production application is started, the application interface is displayed, the application interface includes a video production control, and in response to the user's triggering operation on the video production control, the video production page is displayed.
  • the above-mentioned trigger operation may be one or more combination operations of click, long press, hover, touch, etc., which are not limited in the embodiments of the present disclosure.
  • the video production page 30 includes a first control 301, a second control 302 and a text editing area 303.
  • the first control 301 is used to trigger the first instruction in response to the user's trigger operation
  • the second control 302 is used to trigger the third instruction in response to the user's trigger operation
  • the text editing area 303 is used to obtain the input text in response to the user's editing operation.
  • a first instruction triggered for input text is triggered in response to a user's triggering operation on the first control, and in response to the first instruction, first video editing data is generated based on the input text.
  • the input text refers to the text saved and displayed in the text editing area in response to the first instruction.
  • the first control is selected, and then, in response to a trigger operation on the video production control 304 included in the video production page 30, the first instruction for the input text is triggered; in response to the first instruction, first video editing data is generated based on the input text.
  • the input text is divided into at least one text segment; for each target text segment, its corresponding reading voice is obtained through intelligent matching, the target text segment and its corresponding reading voice are aligned on the track timeline to obtain the corresponding target audio segment, and an empty segment is obtained as the first target video segment corresponding to the text segment.
  • the first target video segment and the target audio segment are synthesized to obtain a first video segment.
  • the above operations are performed for each target text segment to obtain multiple first video segments, and the multiple first video segments are synthesized according to the sequence of the target text segments in the input text to obtain first video editing data.
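The flow described above (splitting the input text, matching each text segment to synthesized reading speech, and pairing it with a vacant video segment on an aligned timeline) can be sketched as follows. This is an illustrative sketch, not the patented implementation; `split_text` and `synthesize_speech` are hypothetical stand-ins for the client's text-segmentation and text-to-speech components.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class Clip:
    """A clip occupying [start, end) seconds on a track timeline."""
    start: float
    end: float
    content: Optional[Any] = None  # None marks a vacant (empty) video segment

def build_first_editing_data(input_text: str,
                             split_text: Callable[[str], List[str]],
                             synthesize_speech: Callable[[str], Any]) -> dict:
    """Build first video editing data: one audio clip of reading speech per
    text segment, each paired with an empty video clip that shares the same
    track timeline interval."""
    video_track, audio_track = [], []
    cursor = 0.0
    for text_segment in split_text(input_text):
        speech = synthesize_speech(text_segment)      # TTS result with .duration
        start, end = cursor, cursor + speech.duration
        audio_track.append(Clip(start, end, speech))  # target audio segment
        video_track.append(Clip(start, end, None))    # first target video segment, left vacant
        cursor = end
    return {"video": video_track, "audio": audio_track}
```

The key property the method relies on is visible in the loop: every first video segment inherits exactly the timeline interval of its audio segment, so filling the vacant slot later cannot break audio/text alignment.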
  • the correspondence between the first target video clip, the target audio clip, and the target text clip can be understood to mean that the timelines of the above three clips are aligned, and the expressed content echoes.
  • for example, the timelines of the three clips are all the interval between 1 minute 55 seconds and 1 minute 57 seconds;
  • the target text segment is "Big Fire Stir-fried";
  • the target audio clip is the reading voice of the four characters "Big Fire Stir-fried".
  • the first target video segment can be understood as any video segment in the at least one first video segment, and the first target video segment being vacant can be understood as setting the first target video segment to be vacant in the first video editing data.
  • the vacancy can be to leave the video track vacant without setting any image material, or it can be to set the video track with preset image material.
  • the preset image material is a system setting that the user cannot change at will; that is, regardless of the content of the input text, its corresponding first target video clip always uses the set image material.
  • the image material can be a black image. In other words, no matter what the input text is, the corresponding first target video clip is a black picture.
  • the first video editing data includes at least one subtitle segment, a target subtitle segment in the at least one subtitle segment corresponds to a target text segment in the at least one text segment, and the The target subtitle segment is used to fill in text subtitles matching the target text segment.
  • the video editing page 40 of the video editor mainly includes a video preview area 401 and a video editing area 402.
  • the video editing area 402 displays a video editing track
  • the video editing track includes a video track 403, an audio track 404, and a subtitle track 405.
  • the first video clip is imported into the video track 403
  • the audio clip is imported into the audio track 404
  • the text subtitle is imported into the subtitle track 405 .
  • the imported content on the tracks can be edited in response to operations on the video editing track described above. For example, in the audio track 404, an audio style can be selected, such as a sweet style or a serious style, as well as the timbre, contrast, etc.
  • all audio-related parameters can be edited on the audio track 404.
  • all parameters related to text subtitles can also be edited on the subtitle track 405, such as: subtitle color, subtitle appearance mode, subtitle font, etc.
  • the first target video clip and the target audio clip have the same track timeline interval.
  • their timelines are both clips between 1 minute and 55 seconds and 1 minute and 57 seconds.
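The same-interval correspondence among the video, audio, and subtitle clips can be expressed as a simple invariant check. In this hypothetical sketch each clip is a plain dict with `start`/`end` keys in seconds; the 1:55 to 1:57 example from above becomes the interval 115.0 to 117.0.

```python
def tracks_aligned(*clips: dict) -> bool:
    """Return True when all corresponding clips (e.g. the first target video
    clip, the target audio clip, and the target subtitle clip) occupy the
    same track timeline interval."""
    intervals = {(c["start"], c["end"]) for c in clips}
    return len(intervals) == 1

# e.g. three clips all covering 1:55 to 1:57 (115 s to 117 s)
video_clip = {"start": 115.0, "end": 117.0, "material": None}
audio_clip = {"start": 115.0, "end": 117.0, "speech": "..."}
subtitle_clip = {"start": 115.0, "end": 117.0, "text": "..."}
```

An editor could run such a check after every fill or trim operation to guarantee the tracks never drift apart.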
  • the first target video segment may be any one of the at least one first video segment. Further, the first target video segment may be the video clip that the current user needs to edit.
  • in response to a trigger operation on the first target video clip 406, the page jumps to the image selection page. As shown in Figure 5, the image selection page 50 includes an image preview area 501 and an image selection area 502.
  • the image selection area 502 also includes a local image control 503, a network image control 504, an emoticon package control 505 and an image browsing area 506.
  • the local image control 503 is used to obtain images or videos in the local album and display them in the image browsing area in response to the user's trigger operation.
  • the network image control 504 is used to pull images or videos from the network or from the database corresponding to the client and display them in the image browsing area in response to the user's trigger operation.
  • the emoticon package control 505 is used to obtain commonly used emoticon packages or popular emoticon packages for display in the image browsing area in response to the user's triggering operation.
  • the image browsing area 506 is used to display multiple images or short videos in a vertically scrollable manner in response to the user's up-and-down sliding operation. Furthermore, the image browsing area 506 is also used to, in response to the user's trigger operation on an image, determine the corresponding image as the first target image material and display it in the image preview area 501 for the user to preview.
  • the image selection page 50 also includes a shooting control, which is used to call the camera in the terminal to shoot in response to the user's operation to obtain the first target image material.
  • the first target image material may be a picture or a video. If the first target image material is a picture, the technology of generating a video from a picture can be used to process the picture to obtain the first target video. For example: picture moving effect or freeze-frame video, etc.
  • if the first target image material is a video: when the duration of the video is inconsistent with the duration of the target video segment, the video can be cropped into a video with the same duration as the target video segment; when the durations are consistent, the video can be directly filled into the first target video clip.
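The two filling cases just described (converting a picture into a video, and cropping a video to the segment's duration) might be dispatched as in the following sketch. `animate_still` and `trim` are hypothetical helpers standing in for the picture-to-video effect (e.g. a motion effect or freeze-frame video) and for video cropping; materials are plain dicts for illustration.

```python
def material_to_target_video(material: dict, segment_duration: float,
                             animate_still, trim) -> dict:
    """Convert the user-selected first target image material into a video
    whose duration matches the first target video segment."""
    if material["kind"] == "picture":
        # picture-to-video: e.g. a motion effect or a freeze-frame video
        return animate_still(material, segment_duration)
    if material["duration"] != segment_duration:
        # crop the video to the same duration as the target video segment
        return trim(material, 0.0, segment_duration)
    return material  # durations already consistent: fill directly
```

Whatever the branch, the result always has exactly the segment's duration, which preserves the track-timeline alignment with the target audio segment.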
  • in response to a triggering operation of video generation, the first target video is generated based on the second video editing data.
  • the triggering operation of video generation may refer to the triggering operation of the export control 407 in the video editing page 40 .
  • the export method can be to save the first target video locally, or to share it to other video sharing platforms or websites, which is not specifically limited in the embodiments of the present disclosure.
  • the method further includes: in response to a third instruction triggered for the input text, generating third video editing data based on the input text; wherein the third video editing data includes at least one second video segment and the at least one audio segment, the at least one second video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, and a second target video segment in the at least one second video segment
  • and the target audio segment in the at least one audio segment correspond to the target text segment in the at least one text segment, the second target video segment is a video obtained based on second target image material, and the second target image material matches the target text segment; importing the third video editing data into a video editor, so that the at least one second video segment and the at least one audio segment are displayed on a video editing track of the video editor, the second target video segment having the same track timeline interval as the target audio segment; and based on the third video editing data, generating a second target video.
• the third instruction refers to an instruction for intelligently matching image materials to the input text and then generating the third video editing data.
  • the second target image material is matched based on the target text fragment, and the target text fragment is matched to the image material through an intelligent matching algorithm.
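The "intelligent matching algorithm" is not specified by the disclosure; as a hedged illustration only, the sketch below scores each candidate image material by keyword overlap with the target text fragment (a production system would more likely use learned embeddings). The material library, its tags, and the function name are hypothetical:

```python
def match_material(text_segment: str, materials: dict) -> str:
    """Pick the material whose tag set overlaps the text segment most.
    `materials` maps a material id to a set of descriptive tags."""
    words = set(text_segment.lower().split())
    best_id, best_score = None, -1
    for mat_id, tags in materials.items():
        score = len(words & tags)   # simple keyword-overlap score
        if score > best_score:
            best_id, best_score = mat_id, score
    return best_id

library = {
    "beach.mp4": {"beach", "sea", "sand"},
    "city.mp4": {"city", "street", "night"},
}
print(match_material("a walk on the beach at sunset", library))
```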
  • a target video is generated based on the edited third video editing data.
  • the third video editing data can also be imported into a video editor for editing.
  • the specific editing method can refer to the description in the above embodiment.
  • the target video is generated based on the edited third video editing data in response to the triggering operation of video generation.
• Embodiments of the present disclosure provide a video generation method, including: in response to a first instruction triggered for input text, generating first video editing data based on the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, the first target video segment in the at least one first video segment and the target audio segment in the at least one audio segment correspond to the target text segment in the at least one text segment, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is an empty segment; importing the first video editing data into the video editor so that the at least one first video clip and the at least one audio clip are displayed on the video editing track of the video editor, where the first target video clip and the target audio clip have the same track timeline interval; in response to a second instruction triggered for the first target video clip on the video editor, filling the first target video clip with the first target video based on the first video editing data to obtain second video editing data, the first target video being a video obtained based on the first target image material indicated by the second instruction; and generating the first target video based on the second video editing data.
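The structure of the "first video editing data" can be sketched as follows: one empty video segment and one speech segment per text segment, sharing the same track timeline interval. The duration heuristic (milliseconds per character) and all names are illustrative assumptions; a real implementation would derive durations from the synthesized reading speech:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    kind: str                      # "video" or "audio"
    text: str                      # the text segment this entry maps to
    start_ms: int
    end_ms: int
    material: Optional[str] = None  # None marks an empty video segment

def build_editing_data(input_text: str, ms_per_char: int = 120):
    """Split the input text into sentences and emit aligned video and
    audio tracks: each sentence gets an empty video segment and an
    audio segment over the same timeline interval."""
    sentences = [s.strip() for s in input_text.split(".") if s.strip()]
    cursor, video_track, audio_track = 0, [], []
    for s in sentences:
        end = cursor + len(s) * ms_per_char
        video_track.append(Segment("video", s, cursor, end))
        audio_track.append(Segment("audio", s, cursor, end))
        cursor = end
    return video_track, audio_track
```

Filling a target segment (the "second instruction") then amounts to setting `material` on the chosen video `Segment` without changing its timeline interval.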
  • the embodiments of the present disclosure provide several ways for the video production page 40 to obtain input text in response to the user's editing operation, specifically as follows:
  • obtaining input text in response to a user's editing operation includes: in response to a triggering operation on the text editing area, displaying a text input page; in response to an input operation on the text input page, obtaining The input text corresponding to the input operation.
• in response to the triggering operation on the text editing area, the text input page 60 is displayed.
• the text input page 60 includes a title editing area 601, a content editing area 602, and an edit completion control 603.
  • the title editing area 601 and the content editing area 602 can both respond to the user's operations on the virtual keyboard, physical keyboard, etc. on the terminal to obtain the text input by the user.
• the above input methods are not limited; recognizing text in a picture through OCR, or recognizing the voice entered by the user through speech recognition to obtain the input text, etc., all fall within the protection scope of the present disclosure.
  • the input text in the title editing area 601 and the content editing area 602 is obtained as the input text in the text editing area, and jumps to the video production page 30.
  • the text editing area includes a network address copy control
  • obtaining the input text in response to a user's editing operation includes: in response to a triggering operation of the network address copy control, displaying the network address input area ; In response to the operation on the network address input area, receive the network address corresponding to the input operation; obtain the input text corresponding to the network address.
• the text editing area 303 includes a network address copy control 304.
  • a masked area 701 as shown in FIG. 7 is displayed, and the masked area 701 includes a network address input area 702.
  • the user can input the address in the network address input area 702 by keyboard input, or by copying and pasting the address in the network address input area 702.
  • the network address in the network address input area 702 is received and the web page content corresponding to the URL is used as the input text in the text editing area 303.
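Pulling back web page content as input text can be sketched with the Python standard library. The tag-stripping strategy (skip `script`/`style`, join the remaining text nodes) is a simplifying assumption; the disclosure does not specify how page content is extracted:

```python
from urllib.request import urlopen
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def input_text_from_url(url: str) -> str:
    """Fetch the page at `url` and return its visible text."""
    html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```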
  • obtaining the input text corresponding to the network address includes: determining whether there is original input text in the text editing area; if there is original input text in the text editing area, then The original input text is deleted, and the input text corresponding to the network address is obtained.
  • the original input text refers to the input text entered in the text editing area before pulling back the web page content corresponding to the URL.
• Determine whether there is original input text in the text editing area; if there is original input text in the text editing area, a prompt floating box is displayed.
  • the prompt floating box is used to prompt the user whether to delete the original input text existing in the text editing area.
• in response to a triggering operation on the delete-input-text control, the original input text is deleted, and the input text corresponding to the network address is obtained.
  • no processing will be performed on the original input text, that is, the original input text will still remain in the text editing area.
  • the original input text refers to the input text entered in the text editing area before pulling back the web page content corresponding to the URL.
• Determine whether there is original input text in the text editing area; if there is original input text in the text editing area, prompt the user to select the insertion position of the input text corresponding to the network address, and based on the user's selection operation, insert the input text corresponding to the network address at the corresponding position in the original input text.
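The two branches above (replace the original text, or insert the fetched text at a user-chosen position) can be expressed as one hypothetical helper; the `position=None` convention for "user confirmed deletion" is an assumption made here for illustration:

```python
from typing import Optional

def merge_input_text(original: str, fetched: str,
                     position: Optional[int] = None) -> str:
    """Combine existing editing-area text with text fetched from a URL.
    With no original text, or when the user chose to delete it
    (position is None), the fetched text replaces everything;
    otherwise it is inserted at the selected character position."""
    if not original or position is None:
        return fetched
    return original[:position] + fetched + original[position:]
```

For example, inserting at position 2 of `"abcd"` yields `"ab" + fetched + "cd"`.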
  • the method after responding to the triggering operation of the video production control, the method further includes: obtaining the network address carried on the clipboard; obtaining the input text corresponding to the network address; displaying the video production page,
  • the video production page includes a text editing area, and the text editing area is used to display the input text corresponding to the network address.
• in response to a triggering operation on the video production control, the network address carried on the clipboard is detected and acquired. As shown in Figure 8, if a network address is carried, a prompt box is displayed; this prompt box is used to display the carried network address and ask the user whether to identify the web page content corresponding to the network address.
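Detecting whether the clipboard carries a network address can be done with a simple pattern match. The regular expression below is a minimal sketch covering http(s) addresses only, which is an assumption; the disclosure does not define what counts as a network address:

```python
import re

URL_RE = re.compile(r"https?://[^\s]+")

def find_network_address(clipboard_text: str):
    """Return the first http(s) address found on the clipboard, or None.
    When one is found, the app can show the prompt box asking whether
    to fetch the corresponding web page as input text."""
    m = URL_RE.search(clipboard_text or "")
    return m.group(0) if m else None
```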
  • the network address is received and the webpage content corresponding to the URL is used as the input text in the text editing area 103 .
• the network address is directly ignored and the app jumps to the text acquisition interface 30; at this time, there is no input text in the text editing area 303 of the text acquisition interface 30.
  • multiple input text input methods are provided according to different situations to facilitate user selection.
  • Figure 9 is a schematic structural diagram of a video generation device in an embodiment of the present disclosure. This embodiment is applicable to the situation of generating a video based on input text.
  • the video generation device can be implemented in software and/or hardware.
  • the video generation device 90 mainly includes: a first video editing data determination module 91, a first video editing data import module 92, a second video editing data determination module 93 and a first target video Generate module 94.
• the first video editing data determination module 91 is configured to generate first video editing data based on the input text in response to the first instruction triggered for the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, the first target video segment in the at least one first video segment and the target audio segment in the at least one audio segment correspond to the target text segment in the at least one text segment, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is an empty segment.
• the first video editing data import module 92 is used to import the first video editing data into the video editor, so that the at least one first video clip and the at least one audio clip are displayed on the video editing track of the video editor, where the first target video clip and the target audio clip have the same track timeline interval; the second video editing data determination module 93 is used to, in response to a second instruction triggered for the first target video clip on the video editor, fill the first target video clip with the first target video based on the first video editing data to obtain the second video editing data, the first target video being a video obtained based on the first target image material indicated by the second instruction; and the first target video generation module 94 is used to generate the first target video based on the second video editing data.
  • the first video editing data includes at least one subtitle segment, a target subtitle segment in the at least one subtitle segment corresponds to a target text segment in the at least one text segment, and the The target subtitle segment is used to fill in text subtitles matching the target text segment.
• the device further includes: a third video editing data generation module, configured to generate third video editing data based on the input text in response to a third instruction triggered for the input text; wherein the third video editing data includes at least one second video segment and the at least one audio segment, the at least one second video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a second target video segment in the at least one second video segment, a target audio segment in the at least one audio segment, and a target text segment in the at least one text segment correspond to each other, the second target video segment is a video obtained based on the second target image material, and the second target image material matches the target text segment; a third video editing data import module, used to import the third video editing data into the video editor, so that the at least one second video segment and the at least one audio segment are displayed on the video editing track of the video editor, where the track timeline intervals of the second target video segment and the target audio segment are the same; and a second target video generation module, used to generate the second target video based on the third video editing data.
  • the device further includes: a video production page display module, configured to display a video production page in response to a triggering operation on a video production control, wherein the video production page includes a first control , a second control and a text editing area, the first control is used to trigger the first instruction in response to the user's trigger operation, the second control is used to trigger the third instruction in response to the user's trigger operation, the text The editing area is used to obtain input text in response to the user's editing operation.
  • obtaining input text in response to a user's editing operation includes: in response to a triggering operation on the text editing area, displaying a text input page; in response to an input operation on the text input page, obtaining The input text corresponding to the input operation.
  • the text editing area includes a network address copy control
• obtaining input text in response to a user's editing operation includes: in response to a triggering operation of the network address copy control, displaying the network address input area; in response to an operation on the network address input area, receiving the network address corresponding to the input operation; and obtaining the input text corresponding to the network address.
  • obtaining the input text corresponding to the network address includes: determining whether there is original input text in the text editing area; if there is original input text in the text editing area, then The original input text is deleted, and the input text corresponding to the network address is obtained.
  • the method after responding to the triggering operation of the video production control, the method further includes: obtaining the network address carried on the clipboard; obtaining the input text corresponding to the network address; displaying the video production page,
  • the video production page includes a text editing area, and the text editing area is used to display the input text corresponding to the network address.
  • the video generation device provided by the embodiments of the present disclosure can execute the steps performed in the video generation method provided by the method embodiments of the present disclosure. The execution steps and beneficial effects will not be described again here.
  • FIG. 10 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
• the electronic device 1000 in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (such as vehicle navigation terminals), and wearable terminal devices, as well as fixed terminals such as digital TVs, desktop computers, and smart home devices.
  • the electronic device shown in FIG. 10 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
• the electronic device 1000 may include a processing device (e.g., central processing unit, graphics processor, etc.) 1001, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003, so as to implement the video generation method according to the embodiments of the present disclosure.
• In the RAM 1003, various programs and data required for the operation of the terminal device 1000 are also stored.
  • the processing device 1001, ROM 1002 and RAM 1003 are connected to each other via a bus 1004.
  • An input/output (I/O) interface 1005 is also connected to bus 1004.
  • the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including An output device 1007 such as a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 1008 including a magnetic tape, a hard disk, etc.; and a communication device 1009.
  • the communication device 1009 may allow the terminal device 1000 to communicate wirelessly or wiredly with other devices to exchange data.
• Although FIG. 10 shows the terminal device 1000 having various means, it should be understood that it is not required to implement or possess all of the illustrated means; more or fewer means may alternatively be implemented or provided.
• embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program including program code for executing the method shown in the flowchart, thereby implementing the video generation method described above.
  • the computer program may be downloaded and installed from the network via the communication device 1009, or from the storage device 1008, or from the ROM 1002.
• When the computer program is executed by the processing device 1001, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
• the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
• a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
• the client and server may communicate using any currently known or future developed network protocol, such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network).
• Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs.
• the terminal device, in response to a first instruction triggered for the input text, generates first video editing data based on the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, the first target video segment in the at least one first video segment and the target audio segment in the at least one audio segment correspond to the target text segment in the at least one text segment, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is a vacant segment; imports the first video editing data into the video editor, so that the at least one first video clip and the at least one audio clip are displayed on the video editing track of the video editor, and the track timeline intervals of the first target video clip and the target audio clip are the same; in response to a second instruction triggered for the first target video clip on the video editor, fills the first target video clip with the first target video based on the first video editing data to obtain second video editing data, the first target video being a video obtained based on the first target image material indicated by the second instruction; and generates the first target video based on the second video editing data.
  • the terminal device may also perform other steps described in the above embodiments.
• Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
• the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
• each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
• each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or operations, or can be implemented using a combination of special purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware. Among them, the name of a unit does not constitute a limitation on the unit itself under certain circumstances.
• Exemplary types of hardware logic components that may be used include, without limitation, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), and Complex Programmable Logic Devices (CPLDs).
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
• Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
• machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
• the present disclosure provides a video generation method, including: in response to a first instruction triggered for input text, generating first video editing data based on the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a first target video segment in the at least one first video segment, a target audio segment in the at least one audio segment, and a target text segment in the at least one text segment correspond to each other, the target audio segment is used to fill in the reading speech matching the target text segment, and the first target video segment is an empty segment; importing the first video editing data into the video editor, so that the at least one first video segment and the at least one audio segment are displayed on the video editing track of the video editor, where the first target video segment and the target audio segment have the same track timeline interval; in response to a second instruction triggered for the first target video segment on the video editor, filling the first target video segment with the first target video based on the first video editing data to obtain second video editing data, the first target video being a video obtained based on the first target image material indicated by the second instruction; and generating the first target video based on the second video editing data.
• the present disclosure provides a video generation method, wherein the first video editing data includes at least one subtitle segment, a target subtitle segment in the at least one subtitle segment corresponds to the target text segment in the at least one text segment, and the target subtitle segment is used to fill in text subtitles matching the target text segment.
• the present disclosure provides a video generation method, wherein the method further includes: in response to a third instruction triggered for input text, generating third video editing data based on the input text; wherein the third video editing data includes at least one second video segment and the at least one audio segment, the at least one second video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a second target video segment in the at least one second video segment, a target audio segment in the at least one audio segment, and a target text segment in the at least one text segment correspond to each other, the second target video segment is a video obtained based on the second target image material, and the second target image material matches the target text segment; importing the third video editing data into the video editor, so that the at least one second video clip and the at least one audio clip are displayed on the video editing track of the video editor, where the track timeline intervals of the second target video clip and the target audio clip are the same; and generating the second target video based on the third video editing data.
• the present disclosure provides a video generation method, wherein the method further includes: in response to a triggering operation on the video production control, displaying a video production page, wherein the video production page includes a first control, a second control, and a text editing area; the first control is used to trigger the first instruction in response to the user's trigger operation, the second control is used to trigger the third instruction in response to the user's trigger operation, and the text editing area is used to obtain input text in response to the user's editing operation.
• the present disclosure provides a video generation method, wherein obtaining input text in response to a user's editing operation includes: in response to a triggering operation on the text editing area, displaying a text input page; and in response to an input operation on the text input page, obtaining the input text corresponding to the input operation.
• the present disclosure provides a video generation method, wherein the text editing area includes a network address copy control, and obtaining the input text in response to the user's editing operation includes: in response to a triggering operation of the network address copy control, displaying the network address input area; in response to an operation on the network address input area, receiving the network address corresponding to the input operation; and obtaining the input text corresponding to the network address.
  • the present disclosure provides a video generation method, wherein the obtaining the input text corresponding to the network address includes: determining whether the original input text exists in the text editing area; If the original input text exists in the text editing area, the original input text is deleted, and the input text corresponding to the network address is obtained.
• the present disclosure provides a video generation method, wherein, after responding to the triggering operation of the video production control, the method further includes: obtaining the network address carried on the clipboard; obtaining the input text corresponding to the network address; and displaying the video production page, wherein the video production page includes a text editing area, and the text editing area is used to display the input text corresponding to the network address.
• the present disclosure provides a video generation device, the device including: a first video editing data determination module, configured to, in response to a first instruction triggered for input text, generate first video editing data based on the input text; wherein the first video editing data includes at least one first video segment and at least one audio segment, the at least one first video segment and the at least one audio segment respectively correspond to at least one text segment divided from the input text, a first target video segment in the at least one first video segment, a target audio segment in the at least one audio segment, and a target text segment in the at least one text segment correspond to each other, the target audio segment is used to fill in the reading voice matching the target text segment, and the first target video segment is a vacant segment; a first video editing data import module, used to import the first video editing data into the video editor
  • a first video editing data determination module configured to respond to a first instruction triggered for input text, based on the Inputting text generates first video editing data
  • the first video editing data includes at least one first video segment and at least one audio segment
  • a video editing data such that the at least one first video segment and the at least one audio segment are displayed on a video editing track of the video editor, the tracks of the first target video segment and the target audio segment
  • the timeline intervals are the same; a second video editing data determination module, configured to respond to triggering a second instruction for the first target video segment on the video editor, based on the first video editing data, in the The first target video clip is filled with the first target video to obtain the second video editing data; the first target video is a video obtained based on the first target image material indicated by the second instruction; the first target video is generated A module configured to generate a first target video based on the second edited video data.
  • the present disclosure provides a video generation device, wherein the first video editing data includes at least one subtitle segment, a target subtitle segment in the at least one subtitle segment corresponds to the target text segment in the at least one text segment, and the target subtitle segment is to be filled with a text subtitle matching the target text segment.
  • the present disclosure provides a video generation device, wherein the device further includes: a third video editing data generation module configured to generate third video editing data based on the input text in response to a third instruction triggered for the input text, where the third video editing data includes at least one second video segment and the at least one audio segment, the at least one second video segment and the at least one audio segment respectively correspond to at least one text segment into which the input text is divided, a second target video segment in the at least one second video segment and a target audio segment in the at least one audio segment correspond to a target text segment in the at least one text segment, the second target video segment is a video obtained based on second target image material, and the second target image material matches the target text segment; a third video editing data import module configured to import the third video editing data into the video editor, so that the at least one second video segment and the at least one audio segment are displayed on the video editing track of the video editor, with the second target video segment and the target audio segment occupying the same track timeline interval; and a second target video generation module configured to generate a second target video based on the third video editing data.
  • the present disclosure provides a video generation device, wherein the device further includes: a video production page display module configured to display a video production page in response to a triggering operation on a video production control.
  • the video production page includes a first control, a second control and a text editing area, where the first control is used to trigger the first instruction in response to the user's triggering operation, the second control is used to trigger the third instruction in response to the user's triggering operation, and the text editing area is used to obtain the input text in response to the user's editing operation.
  • the present disclosure provides a video generation device, wherein obtaining the input text in response to the user's editing operation includes: displaying a text input page in response to a triggering operation on the text editing area; and in response to an input operation on the text input page, obtaining the input text corresponding to the input operation.
  • the present disclosure provides a video generation device, wherein the text editing area includes a network address copy control, and obtaining the input text in response to the user's editing operation includes: in response to a triggering operation on the network address copy control, displaying a network address input area; in response to an operation on the network address input area, receiving the network address corresponding to the input operation; and obtaining the input text corresponding to the network address.
  • the present disclosure provides a video generation device, wherein obtaining the input text corresponding to the network address includes: determining whether original input text exists in the text editing area; and if original input text exists in the text editing area, deleting the original input text and obtaining the input text corresponding to the network address.
  • the present disclosure provides a video generation device, wherein, after responding to the triggering operation on the video production control, the method further includes: obtaining a network address carried on the clipboard; obtaining the input text corresponding to the network address; and displaying a video production page, where the video production page includes a text editing area used to display the input text corresponding to the network address.
  • the present disclosure provides an electronic device, including:
  • one or more processors;
  • a memory configured to store one or more programs;
  • where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any of the video generation methods provided by the present disclosure.
  • the present disclosure provides a computer-readable storage medium having a computer program stored thereon.
  • when the program is executed by a processor, the video generation method according to any one provided by the present disclosure is implemented.
  • Embodiments of the present disclosure also provide a computer program product.
  • the computer program product includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the video generation method described above is implemented.

Abstract

一种视频生成方法、装置、设备、存储介质和程序产品，该方法包括：响应于针对输入文本触发的第一指令，基于输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括第一视频片段和音频片段，第一视频片段中的第一目标视频片段为空置片段；在视频编辑器的视频编辑轨道上显示第一视频片段和音频片段；响应于在视频编辑器上触发针对目标视频片段的第二指令，在第一目标视频片段中填充目标视频，得到第二视频编辑数据；基于第二视频编辑数据，生成第一目标视频。本公开实施例通过基于输入文本生成第一视频编辑数据，在第一视频编辑数据中用户可以根据自己的喜好自由选择图像素材进行视频编辑，满足用户个性化的视频制作需求。

Description

视频生成方法、装置、设备、存储介质和程序产品
本申请要求于2022年09月02日递交的中国专利申请第202211075071.9号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本公开涉及一种视频生成方法、装置、设备、存储介质和程序产品。
背景技术
随着计算机技术和移动通信技术的迅速发展,基于电子设备的各种视频平台得到了普遍应用,极大地丰富了人们的日常生活。越来越多的用户乐于在视频平台上分享自己的视频作品,以供其他用户观看。
制作视频的过程是:获取用户输入的输入文本;通过智能匹配算法为输入文本匹配其对应的视频图像,基于输入文本和视频图像合成对应的目标视频。
上述视频制作流程中,视频图像是通过智能匹配算法得到,可能不能满足用户个性化的视频制作需求。
发明内容
为了解决上述技术问题，本公开实施例提供了一种视频生成方法、装置、设备、存储介质和程序产品，基于输入文本生成第一视频编辑数据，在第一视频编辑数据中用户可以根据自己的喜好自由选择图像素材进行视频编辑，满足用户个性化的视频制作需求。
本公开实施例提供一种视频生成方法,包括:
响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；
在视频编辑器中导入所述第一视频编辑数据,以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段,所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同;
响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令,基于所述第一视频编辑数据,在所述第一目标视频片段中填充第一目标视频,得到第二视频编辑数据;所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频;
基于第二视频编辑数据，生成第一目标视频。
本公开另一实施例提供一种视频生成装置,包括:
第一视频编辑数据确定模块，用于响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；
第一视频编辑数据导入模块,用于在视频编辑器中导入所述第一视频编辑数据,以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段,所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同;
第二视频编辑数据确定模块,用于响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令,基于所述第一视频编辑数据,在所述第一目标视频片段中填充第一目标视频,得到第二视频编辑数据;所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频;
第一目标视频生成模块，用于基于第二视频编辑数据，生成第一目标视频。
本公开另一实施例提供一种电子设备,所述电子设备包括:
一个或多个处理器;
存储装置,用于存储一个或多个程序;
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如上述第一方面中任一项所述的视频生成方法。
本公开再一实施例提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如上述第一方面中任一项所述的视频生成方法。
本公开再一实施例提供一种计算机程序产品,该计算机程序产品包括计算机程序或指令,该计算机程序或指令被处理器执行时实现如上述第一方面中任一项所述的视频生成方法。
本公开实施例提供了一种视频生成方法、装置、设备、存储介质和程序产品，所述方法包括：响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；在视频编辑器中导入所述第一视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段，所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同；响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令，基于所述第一视频编辑数据，在所述第一目标视频片段中填充第一目标视频，得到第二视频编辑数据；所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频；基于第二视频编辑数据，生成第一目标视频。本公开实施例通过基于输入文本生成第一视频编辑数据，在第一视频编辑数据中用户可以根据自己的喜好自由选择图像素材进行视频编辑，满足用户个性化的视频制作需求。
附图说明
结合附图并参考以下具体实施方式，本公开各实施例的上述和其他特征、优点及方面将变得更加明显。贯穿附图中，相同或相似的附图标记表示相同或相似的元素。应当理解附图是示意性的，元件和元素不一定按照比例绘制。
图1示出了本公开实施例提供的一种视频制作场景的架构图;
图2为本公开实施例中的一种视频生成方法的流程示意图;
图3为本公开实施例中的一种视频制作页面示意图;
图4为本公开实施例中的一种视频编辑页面示意图;
图5为本公开实施例中的一种图像选择页面示意图;
图6为本公开实施例中的一种文本编辑页面示意图;
图7为本公开实施例中的一种网络地址识别页面示意图;
图8为本公开实施例中的另一种网络地址识别页面示意图;
图9为本公开实施例中的一种视频生成装置的结构示意图;
图10为本公开实施例中的一种电子设备的结构示意图。
具体实施方式
下面将参照附图更详细地描述本公开的实施例。虽然附图中显示了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。
应当理解,本公开的方法实施方式中记载的各个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括，即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”；术语“另一实施例”表示“至少一个另外的实施例”；术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
对本公开实施例进行进一步详细说明之前,对本公开实施例中涉及的名词和术语进行说明,本公开实施例中涉及的名词和术语适用于如下的解释。
在相关技术中,用户可以在例如手机、平板电脑、笔记本电脑等移动终端或其他电子设备上制作视频。目前常用的视频制作方式是:用户预先编写好了输入文本,但是没有合适的图像或者视频素材,这种情况下,一般是用户输入文本;视频制作客户端通过智能匹配算法为输入文本匹配其对应的图像素材,基于输入文本和图像素材合成对应的目标视频。
上述视频制作流程中,视频图像是通过智能匹配算法得到,用户无法干预智能匹配算法匹配到的图像素材,使得匹配到的图像素材可能不能满足用户个性化的视频制作需求。例如:用户想要制作一个做饭流程的视频,用户预先编写了菜谱,即需要什么材料,每个步骤执行什么样的操作,并且针对每种材料和每个步骤用户均拍摄了相应的图片或者小视频。按照现有的视频制作方式,需要根据菜谱智能匹配图像素材,但是例如炒菜步骤,先放什么,再放什么,具备一定的连贯性和先后关系。目前的智能匹配算法可能无法精准的匹配到满足用户需求的图像。
为解决上述技术问题，本公开实施例提供了一种视频生成方法，包括：响应于针对输入文本触发的第一指令，基于输入文本生成第一视频编辑数据；其中，所述第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；在视频编辑器中导入所述第一视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段，所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同；响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令，基于所述第一视频编辑数据，在所述第一目标视频片段中填充第一目标视频，得到第二视频编辑数据；所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频；基于第二视频编辑数据，生成第一目标视频。
本公开实施例通过基于输入文本生成第一视频编辑数据,在第一视频编辑数据中用户可以根据自己的喜好自由选择图像素材进行视频编辑,满足用于个性化的视频制作需求。
下面,将参考附图详细地说明本公开的实施例。应当注意的是,不同的附图中相同的附图标记将用于指代已描述的相同的元件。
图1为一种可用于实施本公开实施例提供的视频生成方法的系统。如图1所示，该系统100可以包括多个用户终端110、网络120、服务器130以及数据库140。例如，该系统100可以用于实施本公开任一实施例所述的视频生成方法。
可以理解的是,用户终端110可以是能够执行数据处理的任何其他类型的电子设备,其可以包括但不限于:移动手机、站点、单元、设备、多媒体计算机、多媒体平板、互联网节点、通信器、台式计算机、膝上型计算机、笔记本计算机、上网本计算机、平板计算机、个人通信系统(PCS)设备、个人导航设备、个人数字助理(PDA)、音频/视频播放器、数码相机/摄像机、定位设备、电视接收器、无线电广播接收器、电子书设备、游戏设备或者其任意组合,包括这些设备的配件和外设或者其任意组合。
用户可以通过安装在用户终端110上的应用程序进行操作，应用程序通过网络120将用户行为数据传输给服务器130，用户终端110还可以通过网络120接收服务器130传输的数据。本公开的实施例对于用户终端110的硬件系统以及软件系统没有限制，例如，用户终端110可以是基于ARM、X86等处理器，可以具备例如摄像头、触摸屏、麦克风等输入/输出设备，可以运行Windows、iOS、Linux、Android、鸿蒙OS等操作系统。
例如，用户终端110上的应用程序可以是视频制作应用程序，例如基于视频、图片、文本等多媒体资源的视频制作应用程序。以基于视频、图片、文本等多媒体资源的视频制作应用程序为例，用户可以在用户终端110上通过该视频制作应用程序进行视频拍摄、创作脚本、制作视频、视频剪辑等，同时也可以观看或浏览其他用户发布的视频等，并可以进行例如点赞、评论、转发等操作。
用户终端110可以通过运行进程或线程的方式实施本公开实施例提供的视频生成方法。在一些示例中,用户终端110可以利用其内置的应用程序执行视频生成方法。在另一些示例中,用户终端110可以通过调用用户终端110外部存储的应用程序执行视频生成方法。
网络120可以是单个网络，或至少两个不同网络的组合。例如，网络120可以包括但不限于局域网、广域网、公用网络、专用网络等中的一种或几种的组合。网络120可以是诸如因特网的计算机网络和/或各种电信网络（例如3G/4G/5G移动通信网、WiFi、蓝牙、ZigBee等），本公开的实施例对此不作限制。
服务器130可以是一个单独的服务器,或一个服务器群组,或云服务器,服务器群组内的各个服务器通过有线的或无线的网络进行连接。一个服务器群组可以是集中式的,例如数据中心,也可以是分布式的。服务器130可以是本地的或远程的。服务器130可以通过有线的或无线的网络与用户终端110进行通信。本公开的实施例对于服务器130的硬件系统以及软件系统不作限制。
数据库140可以泛指具有存储功能的设备。数据库140主要用于存储用户终端110和服务器130在工作中所利用、产生和输出的各种数据。例如，以用户终端110上的应用程序为上述基于视频、图片、文本等多媒体资源的视频制作应用程序为例，数据库140所存储的数据可以包括用户通过用户终端110上传的例如视频、文本等资源数据，以及例如点赞、评论等互动操作数据等。
数据库140可以是本地的或远程的。数据库140可以包括各种存储器、例如随机存取存储器(Random Access Memory,RAM)、只读存储器(Read Only Memory,ROM)等。以上提及的存储设备只是列举了一些例子,该系统100可以使用的存储设备并不局限于此。本公开的实施例对于数据库140的硬件系统以及软件系统不作限制,例如,可以是关系型数据库或非关系型数据库。
数据库140可以经由网络120与服务器130或其一部分相互连接或通信,或直接与服务器130相互连接或通信,或是上述两种方式的结合。
在一些示例中,数据库140可以是独立的设备。在另一些示例中,数据库140也可以集成在用户终端110和服务器130中的至少一个中。例如,数据库140可以设置在用户终端110上,也可以设置在服务器130上。又例如,数据库140也可以是分布式的,其一部分设置在用户终端110上,另一部分设置在服务器130上。
图2为本公开实施例中的一种视频生成方法的流程图，本实施例可适用于根据输入文本生成视频的情况，该方法可以由视频生成装置执行，该视频生成装置可以采用软件和/或硬件的方式实现，该视频生成方法可由图1中所述的电子设备执行。
如图2所示,本公开实施例提供的视频生成方法主要包括步骤S101-S104。
S101、响应于针对输入文本触发的第一指令,基于所述输入文本生成第一视频编辑数据;其中,第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段,所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段,所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应,所述目标音频片段用于填充所述目标文本片段匹配的朗读语音,所述第一目标视频片段为空置片段。
其中，“响应于”用于表示所执行的操作所依赖的条件或者状态，当满足所依赖的条件或状态时，所执行的一个或多个操作可以是实时的，也可以具有设定的延迟；在没有特别说明的情况下，所执行的多个操作不存在执行先后顺序的限制。
其中,所述视频编辑数据可以理解为视频编辑草稿或者编辑工程文件,用于记录和复现用户的视频编辑过程,具体包括编辑所针对的音视频素材以及针对音视频素材执行过的编辑操作的指示信息。
在本公开的一个实施方式中，第一指令可以理解为用于指示客户端仅仅基于输入文本生成第一视频编辑数据，并不需要根据输入文本智能匹配图像素材的指令。针对输入文本触发的第一指令可以是响应于用户对视频制作页面中第一控件的触发操作而触发的。
在本公开的一个实施方式中,在用户想要基于输入文本制作视频时,可以预先启动一个视频制作应用程序,该视频制作应用程序包括的其中一个子程序具备基于输入文本制作视频的功能。也可以是启动一个基于输入文本制作视频的视频制作应用程序。
在本公开的一个实施方式中,所述方法还包括:响应于对视频制作控件的触发操作,显示视频制作页面,其中,所述视频制作页面中包括第一控件、第二控件和文本编辑区域,所述第一控件用于响应用户的触发操作触发所述第一指令,所述第二控件用于响应用户的触发操作触发所述第三指令,所述文本编辑区域用于响应用户的编辑操作获取输入文本。
在本公开实施例中,启动视频制作应用程序,显示该应用程序界面,在上述应用程序界面中包括视频制作控件,响应于用户对视频制作控件的触发操作,显示视频制作页面。其中,上述触发操作可以是点击、长按、悬停、触摸等中的一种或者多种组合操作,本公开实施例中不进行限定。
在本公开的实施例中,如图3所示,所述视频制作页面30中包括第一控件301、第二控件302和文本编辑区域303,所述第一控件301用于响应用户的触发操作触发所述第一指令,所述第二控件302用于响应用户的触发操作触发所述第三指令,所述文本编辑区域303用于响应用户的编辑操作获取输入文本。
在本公开实施例中,响应于用户对第一控件的触发操作触发针对输入文本触发的第一指令,响应该第一指令,基于输入文本生成第一视频编辑数据。输入文本是指在响应第一指令时,所述文本编辑区域中保存并显示的文本。
进一步的，响应于用户对第一控件的触发操作，将所述第一控件选中，然后响应于对所述视频制作页面30中包括的视频制作控件304的触发操作，触发针对输入文本触发的第一指令，响应该第一指令，基于输入文本生成第一视频编辑数据。输入文本是指在响应第一指令时，所述文本编辑区域中保存并显示的文本。
在本公开的一个实施例中,将输入文本划分为至少一个文本片段,针对每个目标文本片段通过智能匹配得到其对应的朗读语音,将目标文本片段和其对应的朗读语音进行轨道时间线对齐,得到其对应的目标音频片段,获取一个空置片段作为该文本片段对应的第一目标视频片段。将第一目标视频片段和所述目标音频片段进行合成,得到第一视频片段。针对每一个目标文本片段均执行上述操作,得到多个第一视频片段,将多个第一视频片段按照目标文本片段在所述输入文本中的前后顺序进行合成,得到第一视频编辑数据。
第一目标视频片段、目标音频片段与目标文本片段相对应可以理解为上述三个片段的时间线是对齐的,且所表达的内容是相呼应的。例如:其时间线均是1分55秒到1分57秒之间的片段。例如:目标文本片段是“大火爆炒”,目标音频片段是“大火爆炒”四个字的朗读语音。
所述第一目标视频片段可以理解为至少一个第一视频片段中的任意一个视频片段，第一目标视频片段为空置可以理解为在第一视频编辑数据中将第一目标视频片段设置为空置。该空置可以是将视频轨道空置、不设置任何图像素材，还可以是在视频轨道上设置预先设定的图像素材，其中，预先设定的图像素材是系统设定，用户不可以随意更改，即不论输入文本是何内容，其对应的第一目标视频片段总是设定的图像素材。例如：该图像素材可以是一张黑色图像。换句话说，无论输入文本是何内容，其对应的第一目标视频片段均是一张黑色图片。
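上述“划分文本片段、匹配朗读语音、生成同区间空置视频片段”的第一视频编辑数据生成过程，可以用如下示意代码理解。这只是一个示意性草图：其中的文本切分规则、Clip 结构和 tts_duration 接口均为本文假设的名称，并非本公开限定的实现。

```python
import re
from dataclasses import dataclass

@dataclass
class Clip:
    track: str       # 轨道类型："video"（视频）、"audio"（音频）、"subtitle"（字幕）
    start: float     # 轨道时间线起点（秒）
    end: float       # 轨道时间线终点（秒）
    content: object  # 片段内容；空置片段为 None

def build_first_edit_data(text, tts_duration):
    """按标点把输入文本划分为文本片段，为每个文本片段生成
    朗读语音音频片段、同时间线区间的空置视频片段和字幕片段。"""
    segments = [s for s in re.split(r"(?<=[。！？.!?])", text) if s.strip()]
    clips, t = [], 0.0
    for seg in segments:
        d = tts_duration(seg)                         # 估计朗读语音时长（假设的接口）
        clips.append(Clip("audio", t, t + d, seg))    # 目标音频片段：填充朗读语音
        clips.append(Clip("video", t, t + d, None))   # 第一目标视频片段：空置
        clips.append(Clip("subtitle", t, t + d, seg)) # 目标字幕片段：填充文本字幕
        t += d
    return clips
```

可以看到，同一文本片段对应的视频、音频、字幕片段共享同一轨道时间线区间，这正是后文导入视频编辑器时各轨道对齐显示的前提。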
在本公开的一个实施方式中,所述第一视频编辑数据包括至少一个字幕片段,所述至少一个字幕片段中的目标字幕片段与所述至少一个文本片段中的目标文本片段相对应,所述目标字幕片段用于填充所述目标文本片段匹配的文本字幕。
在第一目标视频片段上添加与目标文本片段匹配的文本字幕，以方便用户在观看视频的过程中，能够直观地看到与朗读语音对应的字幕，提高用户的观看体验。
S102、在视频编辑器中导入所述第一视频编辑数据,以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段,所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同。
在本公开实施例中,如图4所示,在视频编辑器的视频编辑页面40中主要包括视频预览区域401和视频编辑区域402。其中,视频编辑区域402中显示视频编辑轨道,视频编辑轨道包括视频轨道403、音频轨道404和字幕轨道405。进一步的,在视频轨道403导入第一视频片段,在音频轨道404导入音频片段,在字幕轨道405导入文本字幕。响应于对上述视频编辑轨道的操作可以对该轨道上导入的内容进行编辑。例如:在音频轨道404中可以选择音频风格,例如:甜美风格、严肃风格等等。在音频轨道404中也可以对音色、对比度等进行编辑。即所有与音频相关的参数均可以在音频轨道404上进行编辑。同理,在字幕轨道405上也可以编辑所有与文本字幕相关的参数,例如:字幕颜色、字幕出现方式、字幕字体等等。
需要说明的,在对上述轨道进行编辑的过程中,可以是针对所有音频片段的编辑,也可以是针对其中一个目标音频片段的编辑,本公开实施例中,不再具体限定。在对上述轨道进行编辑的过程中,可以是针对所有字幕片段的编辑,也可以是针对其中一个目标字幕片段的编辑,本公开实施例中,不再具体限定。
在本公开的一个实施方式中,所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同,例如:其时间线均是1分55秒到1分57秒之间的片段。
S103、响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令,基于所述第一视频编辑数据,在所述第一目标视频片段中填充第一目标视频,得到第二视频编辑数据;所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频。
其中，所述第一目标视频片段可以是至少一个第一视频片段中的任意一个视频片段。进一步的，所述第一目标视频片段可以是当前用户需要进行编辑的视频片段。
在本公开实施例中,如图4所示,响应于针对第一目标视频片段406的触发操作,跳转至图像选择页面,如图5所示,图像选择页面50包括图像预览区域501和图像选择区域502。其中,所述图像选择区域502还包括本地图像控件503、网络图像控件504、表情包控件505和图像浏览区域506。
其中，本地图像控件503用于响应用户的触发操作后，获取本地相册中的图像或者视频在图像浏览区域中进行展示。网络图像控件504用于响应用户的触发操作后，从网络上或者客户端对应的数据库中拉取图像或者视频在图像浏览区域中进行展示。表情包控件505用于响应用户的触发操作后，获取常用表情包或者流行表情包在图像浏览区域中进行展示。图像浏览区域506用于响应用户的上下滑动操作，以上下移动的方式展示多个图像或者小视频，进一步的，图像浏览区域506还用于响应用户对图像的触发操作，将该触发操作对应的图像确定为第一目标图像素材，将第一目标图像素材在所述图像预览区域501中进行显示，以供用户进行预览。
进一步的,在所述图像选择页面50中还包括拍摄控件,所述拍摄控件用于响应于用户的操作,调用终端中的摄像头进行拍摄,得到第一目标图像素材。
进一步的，所述第一目标图像素材可以是一张图片，也可以是一段视频。如果所述第一目标图像素材是一张图片，可以采用图片生成视频的技术，对图片进行处理，得到第一目标视频。例如：图片运镜效果或者定格视频等。所述第一目标图像素材是一段视频时，如果该视频时长与目标视频片段的时长不一致，可以将该视频裁剪成与目标视频片段的时长相同的视频；如果该视频时长与目标视频片段的时长一致，可以直接将该视频填充在第一目标视频片段中。
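上述“图片生成定格视频、长视频裁剪、等长直接填充”的素材适配逻辑，可以用如下示意函数概括。函数名与参数均为本文假设；素材短于片段时“保留原长”的处理也是此处的假设，并非本公开限定的行为。

```python
def fit_material_to_slot(material_type, material_duration, slot_duration):
    """返回素材填充到第一目标视频片段时采用的 (起点, 终点) 区间（秒）。"""
    if material_type == "image":
        # 图片：按片段时长生成定格视频
        return (0.0, slot_duration)
    if material_duration >= slot_duration:
        # 视频长于（或等于）片段：裁剪成与片段等长
        return (0.0, slot_duration)
    # 视频短于片段：保留原长（假设的处理方式）
    return (0.0, material_duration)
```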
S104、基于第二视频编辑数据，生成第一目标视频。
在本公开实施例中，响应于视频生成的触发操作，基于第二视频编辑数据，生成第一目标视频。视频生成的触发操作可以是指对视频编辑页面40中导出控件407的触发操作。其中，导出方式可以是第一目标视频在本地保存，也可以是分享至其他视频共享平台或者网站。本公开实施例中不再进行具体限定。
在上述实施例的基础上，所述方法还包括：响应于针对输入文本触发的第三指令，基于所述输入文本生成第三视频编辑数据；其中，第三视频编辑数据包括至少一个第二视频片段和所述至少一个音频片段，所述至少一个第二视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第二视频片段中的第二目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述第二目标视频片段为基于第二目标图像素材而得到的视频，所述第二目标图像素材与所述目标文本片段相匹配；在视频编辑器中导入所述第三视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第二视频片段和所述至少一个音频片段，所述第二目标视频片段与所述目标音频片段的轨道时间线区间相同；基于第三视频编辑数据，生成第二目标视频。
其中，所述第三指令是指用于为输入文本智能匹配图像素材，进而生成第三视频编辑数据的指令。第二目标图像素材与所述目标文本片段相匹配，是通过智能匹配算法为目标文本片段匹配到的图像素材。
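这里的智能匹配可以用一个朴素的“标签重合度打分”来示意。素材库的结构与打分方式均为本文假设，仅用于说明“为目标文本片段挑选相匹配的图像素材”这一步骤，并非本公开限定的智能匹配算法。

```python
def match_material(text_segment, material_library):
    """为目标文本片段挑选标签与文本重合度最高的图像素材（朴素示意）。

    material_library 假设为形如 {"id": ..., "tags": [...]} 的字典列表。"""
    def score(material):
        # 统计素材标签中出现在文本片段里的标签个数
        return sum(1 for tag in material["tags"] if tag in text_segment)
    return max(material_library, key=score)
```

实际产品中的匹配通常基于语义相似度等更复杂的模型，这也是正文指出“智能匹配可能无法精准满足用户需求”的原因之一。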
在本公开的一个实施方式中,得到第三视频编辑数据后,响应于视频生成的触发操作,基于编辑后的第三视频编辑数据,生成目标视频。
在本公开的一个实施方式中,得到第三视频编辑数据后,还可以将第三视频编辑数据导入视频编辑器中进行编辑,具体编辑方式可以参照上述实施中的描述,本公开实施例中不再赘述。对第三视频编辑数据进行编辑之后,再响应于视频生成的触发操作,基于编辑后的第三视频编辑数据,生成目标视频。
本公开实施例提供了一种视频生成方法，包括：响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；在视频编辑器中导入所述第一视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段，所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同；响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令，基于所述第一视频编辑数据，在所述第一目标视频片段中填充第一目标视频，得到第二视频编辑数据；所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频；基于第二视频编辑数据，生成第一目标视频。本公开实施例通过基于输入文本生成第一视频编辑数据，在第一视频编辑数据中用户可以根据自己的喜好自由选择图像素材进行视频编辑，满足用户个性化的视频制作需求。
在上述实施例的基础上,本公开实施例中提供了几种视频制作页面40响应用户的编辑操作获取输入文本的方式,具体如下:
在本公开的一个实施方式中,响应用户的编辑操作获取输入文本,包括:响应于对所述文本编辑区域的触发操作,显示文本输入页面;响应于对所述文本输入页面的输入操作,获取所述输入操作对应的输入文本。
在本公开实施例中,如图3所示,响应于对所述文本编辑区域的触发操作,显示文本输入页面,如图6所示,文本输入页面60包括标题编辑区域601、内容编辑区域602以及编辑完成控件603。
其中，标题编辑区域601、内容编辑区域602均可以响应用户对终端上虚拟键盘、物理键盘等的操作，获取用户输入的文本。在本公开实施例中不限定上述输入方式，通过OCR识别图片中的文本、通过语音识别对用户录入的声音进行识别得到输入文本等输入文本获取方式，均在本公开的保护范围内。
进一步的,响应于对编辑完成控件603的触发操作,获取标题编辑区域601、内容编辑区域602中的输入文本作为文本编辑区域的输入文本,并跳转至视频制作页面30。
在本公开的一个实施方式中,所述文本编辑区域中包括网络地址复制控件,响应用户的编辑操作获取输入文本,包括:响应于对所述网络地址复制控件的触发操作,显示网络地址输入区域;响应于对所述网络地址输入区域的操作,接收所述输入操作对应的网络地址;获取所述网络地址对应的输入文本。
在本公开实施例中，如图3所示，文本编辑区域303中包括网络地址复制控件304。响应于对所述网络地址复制控件的触发操作，显示如图7所示的蒙层区域701，该蒙层区域701内包括网络地址输入区域702。在该蒙层区域701内用户可以通过键盘录入的方式在网络地址输入区域702内录入地址，也可以通过复制粘贴的方式在网络地址输入区域702内录入地址。
进一步的，响应于对蒙层区域701中包括的文本获取控件703的触发操作，接收网络地址输入区域702内的网络地址并将该网址对应的网页内容作为文本编辑区域303中的输入文本。
在本公开的一个实施方式中,所述获取所述网络地址对应的输入文本,包括:判断所述文本编辑区域内是否存在原始输入文本;如果所述文本编辑区域内存在原始输入文本,则将原始输入文本删除,并获取所述网络地址对应的输入文本。
在本公开实施例中，原始输入文本是指在将该网址对应的网页内容拉取回来之前，在文本编辑区域内输入的输入文本。判断所述文本编辑区域内是否存在原始输入文本；如果所述文本编辑区域内存在原始输入文本，则显示提示悬浮框，该提示悬浮框用于提示用户是否删除文本编辑区域内存在的原始输入文本，响应于对删除输入文本控件的触发操作，将原始输入文本删除，并获取所述网络地址对应的输入文本。响应于对取消控件的触发操作，将不对原始输入文本进行任何处理，即原始输入文本依旧保留在文本编辑区域内。
在本公开实施例中,原始输入文本是指在将该网址对应的网页内容拉取回来之前,在文本编辑区域内输入的输入文本。判断所述文本编辑区域内是否存在原始输入文本;如果所述文本编辑区域内存在原始输入文本,则提示用户选择网络地址对应的输入文本的插入位置,基于用户的选择操作,将网络地址对应的输入文本插入到原始输入文本的相应位置。
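对原始输入文本的两种处理方式（删除后替换、按用户选择的位置插入）可以用如下示意函数表达。函数名、参数与默认行为均为本文假设，仅用于说明上述分支逻辑。

```python
def merge_url_text(original, url_text, mode="replace", insert_at=None):
    """文本编辑区域已有原始输入文本时，合并网络地址对应输入文本的示意逻辑。

    mode="replace"：删除原始输入文本，用网页文本替换；
    mode="insert"：在 insert_at 指定的位置插入网页文本（缺省插到末尾）。"""
    if mode == "replace" or not original:
        return url_text
    pos = len(original) if insert_at is None else insert_at
    return original[:pos] + url_text + original[pos:]
```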
在本公开的一个实施方式中,所述响应于对视频制作控件的触发操作之后,还包括:获取剪切板上携带的网络地址;获取所述网络地址对应的输入文本;显示视频制作页面,所述视频制作页面中包括文本编辑区域,所述文本编辑区域用于显示所述网络地址对应的输入文本。
在本公开的一个实施方式中，响应于对视频制作控件的触发操作，检测并获取剪切板上携带的网络地址。如图8所示，如果携带网络地址，则显示提示框，该提示框用于显示携带的网络地址，并提示用户是否识别该网络地址对应的网页内容。响应于对该提示框中确认控件801的触发操作，接收该网络地址并将该网址对应的网页内容作为文本编辑区域303中的输入文本。响应于对该提示框中取消控件802的触发操作，则直接忽略该网络地址，跳转至文本获取界面30中，此时，文本获取界面30中的文本编辑区域303中没有任何输入文本。
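“检测剪切板上是否携带网络地址”这一步，可以用一个简单的正则匹配示意。正则表达式与函数名均为本文假设，实际客户端的地址识别规则并非本公开所限定。

```python
import re

# 匹配 http/https 地址，遇到空白或中文标点即结束（示意用的简化规则）
URL_RE = re.compile(r"https?://[^\s，。]+")

def extract_clipboard_url(clipboard_text):
    """检测剪切板文本中是否携带网络地址，返回第一个匹配的地址，否则返回 None。"""
    m = URL_RE.search(clipboard_text or "")
    return m.group(0) if m else None
```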
在本公开实施例中,根据不同的情况,设置了多种输入文本输入方式,方便用户进行选择。
图9为本公开实施例中的一种视频生成装置的结构示意图,本实施例可适用于根据输入文本生成视频的情况,该视频生成装置可以采用软件和/或硬件的方式实现。
如图9所示,本公开实施例提供的视频生成装置90主要包括:第一视频编辑数据确定模块91、第一视频编辑数据导入模块92、第二视频编辑数据确定模块93和第一目标视频生成模块94。
其中，第一视频编辑数据确定模块91，用于响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；第一视频编辑数据导入模块92，用于在视频编辑器中导入所述第一视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段，所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同；第二视频编辑数据确定模块93，用于响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令，基于所述第一视频编辑数据，在所述第一目标视频片段中填充第一目标视频，得到第二视频编辑数据；所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频；第一目标视频生成模块94，用于基于第二视频编辑数据，生成第一目标视频。
在本公开的一个实施方式中,所述第一视频编辑数据包括至少一个字幕片段,所述至少一个字幕片段中的目标字幕片段与所述至少一个文本片段中的目标文本片段相对应,所述目标字幕片段用于填充所述目标文本片段匹配的文本字幕。
在本公开的一个实施方式中，所述装置还包括：第三视频编辑数据生成模块，用于响应于针对输入文本触发的第三指令，基于所述输入文本生成第三视频编辑数据；其中，第三视频编辑数据包括至少一个第二视频片段和所述至少一个音频片段，所述至少一个第二视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第二视频片段中的第二目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述第二目标视频片段为基于第二目标图像素材而得到的视频，所述第二目标图像素材与所述目标文本片段相匹配；第三视频编辑数据导入模块，用于在视频编辑器中导入所述第三视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第二视频片段和所述至少一个音频片段，所述第二目标视频片段与所述目标音频片段的轨道时间线区间相同；第二目标视频生成模块，用于基于第三视频编辑数据，生成第二目标视频。
在本公开的一个实施方式中,所述装置还包括:视频制作页面显示模块,用于响应于对视频制作控件的触发操作,显示视频制作页面,其中,所述视频制作页面中包括第一控件、第二控件和文本编辑区域,所述第一控件用于响应用户的触发操作触发所述第一指令,所述第二控件用于响应用户的触发操作触发所述第三指令,所述文本编辑区域用于响应用户的编辑操作获取输入文本。
在本公开的一个实施方式中,响应用户的编辑操作获取输入文本,包括:响应于对所述文本编辑区域的触发操作,显示文本输入页面;响应于对所述文本输入页面的输入操作,获取所述输入操作对应的输入文本。
在本公开的一个实施方式中，所述文本编辑区域中包括网络地址复制控件，响应用户的编辑操作获取输入文本，包括：响应于对所述网络地址复制控件的触发操作，显示网络地址输入区域；响应于对所述网络地址输入区域的操作，接收所述输入操作对应的网络地址；获取所述网络地址对应的输入文本。
在本公开的一个实施方式中,所述获取所述网络地址对应的输入文本,包括:判断所述文本编辑区域内是否存在原始输入文本;如果所述文本编辑区域内存在原始输入文本,则将原始输入文本删除,并获取所述网络地址对应的输入文本。
在本公开的一个实施方式中,所述响应于对视频制作控件的触发操作之后,还包括:获取剪切板上携带的网络地址;获取所述网络地址对应的输入文本;显示视频制作页面,所述视频制作页面中包括文本编辑区域,所述文本编辑区域用于显示所述网络地址对应的输入文本。
本公开实施例提供的视频生成装置，可执行本公开方法实施例所提供的视频生成方法中所执行的步骤，具备相应的执行步骤和有益效果，此处不再赘述。
图10为本公开实施例中的一种电子设备的结构示意图。下面具体参考图10,其示出了适于用来实现本公开实施例中的电子设备1000的结构示意图。本公开实施例中的电子设备1000可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)、可穿戴终端设备等等的移动终端以及诸如数字TV、台式计算机、智能家居设备等等的固定终端。图10示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图10所示，电子设备1000可以包括处理装置（例如中央处理器、图形处理器等）1001，其可以根据存储在只读存储器（ROM）1002中的程序或者从存储装置1008加载到随机访问存储器（RAM）1003中的程序而执行各种适当的动作和处理，以实现如本公开所述的实施例的视频生成方法。在RAM 1003中，还存储有终端设备1000操作所需的各种程序和数据。处理装置1001、ROM 1002以及RAM 1003通过总线1004彼此相连。输入/输出（I/O）接口1005也连接至总线1004。
通常，以下装置可以连接至I/O接口1005：包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置1006；包括例如液晶显示器（LCD）、扬声器、振动器等的输出装置1007；包括例如磁带、硬盘等的存储装置1008；以及通信装置1009。通信装置1009可以允许终端设备1000与其他设备进行无线或有线通信以交换数据。虽然图10示出了具有各种装置的终端设备1000，但是应理解的是，并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码,从而实现如上所述的视频生成方法。在这样的实施例中,该计算机程序可以通过通信装置1009从网络上被下载和安装,或者从存储装置1008被安装,或者从ROM 1002被安装。在该计算机程序被处理装置1001执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如HTTP(Hyper Text Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序，当上述一个或者多个程序被该终端设备执行时，使得该终端设备：响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；在视频编辑器中导入所述第一视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段，所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同；响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令，基于所述第一视频编辑数据，在所述第一目标视频片段中填充第一目标视频，得到第二视频编辑数据；所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频；基于第二视频编辑数据，生成第一目标视频。
可选的,当上述一个或者多个程序被该终端设备执行时,该终端设备还可以执行上述实施例所述的其他步骤。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码，上述程序设计语言包括但不限于面向对象的程序设计语言——诸如Java、Smalltalk、C++，还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中，远程计算机可以通过任意种类的网络——包括局域网（LAN）或广域网（WAN）——连接到用户计算机，或者，可以连接到外部计算机（例如利用因特网服务提供商来通过因特网连接）。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中，机器可读介质可以是有形的介质，其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备，或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器（RAM）、只读存储器（ROM）、可擦除可编程只读存储器（EPROM或快闪存储器）、光纤、便捷式紧凑盘只读存储器（CD-ROM）、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例，本公开提供了一种视频生成方法，包括：响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；在视频编辑器中导入所述第一视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段，所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同；响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令，基于所述第一视频编辑数据，在所述第一目标视频片段中填充第一目标视频，得到第二视频编辑数据；所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频；基于第二视频编辑数据，生成第一目标视频。
根据本公开的一个或多个实施例,本公开提供了一种视频生成方法,其中,所述第一视频编辑数据包括至少一个字幕片段,所述至少一个字幕片段中的目标字幕片段与所述至少一个文本片段中的目标文本片段相对应,所述目标字幕片段用于填充所述目标文本片段匹配的文本字幕。
根据本公开的一个或多个实施例，本公开提供了一种视频生成方法，其中，所述方法还包括：响应于针对输入文本触发的第三指令，基于所述输入文本生成第三视频编辑数据；其中，第三视频编辑数据包括至少一个第二视频片段和所述至少一个音频片段，所述至少一个第二视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第二视频片段中的第二目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述第二目标视频片段为基于第二目标图像素材而得到的视频，所述第二目标图像素材与所述目标文本片段相匹配；在视频编辑器中导入所述第三视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第二视频片段和所述至少一个音频片段，所述第二目标视频片段与所述目标音频片段的轨道时间线区间相同；基于第三视频编辑数据，生成第二目标视频。
根据本公开的一个或多个实施例,本公开提供了一种视频生成方法,其中,所述方法还包括:响应于对视频制作控件的触发操作,显示视频制作页面,其中,所述视频制作页面中包括第一控件、第二控件和文本编辑区域,所述第一控件用于响应用户的触发操作触发所述第一指令,所述第二控件用于响应用户的触发操作触发所述第三指令,所述文本编辑区域用于响应用户的编辑操作获取输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成方法,其中,响应用户的编辑操作获取输入文本,包括:响应于对所述文本编辑区域的触发操作,显示文本输入页面;响应于对所述文本输入页面的输入操作,获取所述输入操作对应的输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成方法,其中,所述文本编辑区域中包括网络地址复制控件,响应用户的编辑操作获取输入文本,包括:响应于对所述网络地址复制控件的触发操作,显示网络地址输入区域;响应于对所述网络地址输入区域的操作,接收所述输入操作对应的网络地址;获取所述网络地址对应的输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成方法,其中,所述获取所述网络地址对应的输入文本,包括:判断所述文本编辑区域内是否存在原始输入文本;如果所述文本编辑区域内存在原始输入文本,则将原始输入文本删除,并获取所述网络地址对应的输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成方法,其中,所述响应于对视频制作控件的触发操作之后,还包括:获取剪切板上携带的网络地址;获取所述网络地址对应的输入文本;显示视频制作页面,所述视频制作页面中包括文本编辑区域,所述文本编辑区域用于显示所述网络地址对应的输入文本。
根据本公开的一个或多个实施例，本公开提供了一种视频生成装置，所述装置包括：第一视频编辑数据确定模块，用于响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；第一视频编辑数据导入模块，用于在视频编辑器中导入所述第一视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段，所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同；第二视频编辑数据确定模块，用于响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令，基于所述第一视频编辑数据，在所述第一目标视频片段中填充第一目标视频，得到第二视频编辑数据；所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频；第一目标视频生成模块，用于基于第二视频编辑数据，生成第一目标视频。
根据本公开的一个或多个实施例,本公开提供了一种视频生成装置,其中,所述第一视频编辑数据包括至少一个字幕片段,所述至少一个字幕片段中的目标字幕片段与所述至少一个文本片段中的目标文本片段相对应,所述目标字幕片段用于填充所述目标文本片段匹配的文本字幕。
根据本公开的一个或多个实施例，本公开提供了一种视频生成装置，其中，所述装置还包括：第三视频编辑数据生成模块，用于响应于针对输入文本触发的第三指令，基于所述输入文本生成第三视频编辑数据；其中，第三视频编辑数据包括至少一个第二视频片段和所述至少一个音频片段，所述至少一个第二视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第二视频片段中的第二目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述第二目标视频片段为基于第二目标图像素材而得到的视频，所述第二目标图像素材与所述目标文本片段相匹配；第三视频编辑数据导入模块，用于在视频编辑器中导入所述第三视频编辑数据，以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第二视频片段和所述至少一个音频片段，所述第二目标视频片段与所述目标音频片段的轨道时间线区间相同；第二目标视频生成模块，用于基于第三视频编辑数据，生成第二目标视频。
根据本公开的一个或多个实施例,本公开提供了一种视频生成装置,其中,所述装置还包括:视频制作页面显示模块,用于响应于对视频制作控件的触发操作,显示视频制作页面,其中,所述视频制作页面中包括第一控件、第二控件和文本编辑区域,所述第一控件用于响应用户的触发操作触发所述第一指令,所述第二控件用于响应用户的触发操作触发所述第三指令,所述文本编辑区域用于响应用户的编辑操作获取输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成装置,其中,响应用户的编辑操作获取输入文本,包括:响应于对所述文本编辑区域的触发操作,显示文本输入页面;响应于对所述文本输入页面的输入操作,获取所述输入操作对应的输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成装置,其中,所述文本编辑区域中包括网络地址复制控件,响应用户的编辑操作获取输入文本,包括:响应于对所述网络地址复制控件的触发操作,显示网络地址输入区域;响应于对所述网络地址输入区域的操作,接收所述输入操作对应的网络地址;获取所述网络地址对应的输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成装置,其中,所述获取所述网络地址对应的输入文本,包括:判断所述文本编辑区域内是否存在原始输入文本;如果所述文本编辑区域内存在原始输入文本,则将原始输入文本删除,并获取所述网络地址对应的输入文本。
根据本公开的一个或多个实施例,本公开提供了一种视频生成装置,其中,所述响应于对视频制作控件的触发操作之后,还包括:获取剪切板上携带的网络地址;获取所述网络地址对应的输入文本;显示视频制作页面,所述视频制作页面中包括文本编辑区域,所述文本编辑区域用于显示所述网络地址对应的输入文本。
根据本公开的一个或多个实施例,本公开提供了一种电子设备,包括:
一个或多个处理器;
存储器,用于存储一个或多个程序;
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如本公开提供的任一所述的视频生成方法。
根据本公开的一个或多个实施例,本公开提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如本公开提供的任一所述的视频生成方法。
本公开实施例还提供了一种计算机程序产品,该计算机程序产品包括计算机程序或指令,该计算机程序或指令被处理器执行时实现如上所述的视频生成方法。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。

Claims (12)

  1. 一种视频生成方法,包括:
    响应于针对输入文本触发的第一指令,基于所述输入文本生成第一视频编辑数据;其中,第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段,所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段,所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应,所述目标音频片段用于填充所述目标文本片段匹配的朗读语音,所述第一目标视频片段为空置片段;
    在视频编辑器中导入所述第一视频编辑数据,以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段,所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同;
    响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令,基于所述第一视频编辑数据,在所述第一目标视频片段中填充第一目标视频,得到第二视频编辑数据;所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频;
    基于第二视频编辑数据，生成第一目标视频。
  2. 根据权利要求1所述的方法,其中,所述第一视频编辑数据包括至少一个字幕片段,所述至少一个字幕片段中的目标字幕片段与所述至少一个文本片段中的目标文本片段相对应,所述目标字幕片段用于填充所述目标文本片段匹配的文本字幕。
  3. 根据权利要求1或2所述的方法,还包括:
    响应于针对输入文本触发的第三指令，基于所述输入文本生成第三视频编辑数据；其中，第三视频编辑数据包括至少一个第二视频片段和所述至少一个音频片段，所述至少一个第二视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第二视频片段中的第二目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述第二目标视频片段为基于第二目标图像素材而得到的视频，所述第二目标图像素材与所述目标文本片段相匹配；
    在视频编辑器中导入所述第三视频编辑数据,以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第二视频片段和所述至少一个音频片段,所述第二目标视频片段与所述目标音频片段的轨道时间线区间相同;
    基于第三视频编辑数据，生成第二目标视频。
  4. 根据权利要求3所述的方法,还包括:
    响应于对视频制作控件的触发操作,显示视频制作页面,其中,所述视频制作页面中包括第一控件、第二控件和文本编辑区域,所述第一控件用于响应用户的触发操作触发所述第一指令,所述第二控件用于响应用户的触发操作触发所述第三指令,所述文本编辑区域用于响应用户的编辑操作获取输入文本。
  5. 根据权利要求4所述的方法,其中,所述响应用户的编辑操作获取输入文本,包括:
    响应于对所述文本编辑区域的触发操作,显示文本输入页面;
    响应于对所述文本输入页面的输入操作,获取所述输入操作对应的输入文本。
  6. 根据权利要求4所述的方法,其中,所述文本编辑区域中包括网络地址复制控件,所述响应用户的编辑操作获取输入文本,包括:
    响应于对所述网络地址复制控件的触发操作,显示网络地址输入区域;
    响应于对所述网络地址输入区域的操作,接收所述输入操作对应的网络地址;
    获取所述网络地址对应的输入文本。
  7. 根据权利要求6所述的方法,其中,所述获取所述网络地址对应的输入文本,包括:
    判断所述文本编辑区域内是否存在原始输入文本;
    如果所述文本编辑区域内存在原始输入文本,则将原始输入文本删除,并获取所述网络地址对应的输入文本。
  8. 根据权利要求6所述的方法,其中,所述响应于对视频制作控件的触发操作之后,还包括:
    获取剪切板上携带的网络地址;
    获取所述网络地址对应的输入文本;
    显示视频制作页面,所述视频制作页面中包括文本编辑区域,所述文本编辑区域用于显示所述网络地址对应的输入文本。
  9. 一种视频生成装置,包括:
    第一视频编辑数据确定模块，用于响应于针对输入文本触发的第一指令，基于所述输入文本生成第一视频编辑数据；其中，第一视频编辑数据包括至少一个第一视频片段和至少一个音频片段，所述至少一个第一视频片段、所述至少一个音频片段分别对应于所述输入文本划分的至少一个文本片段，所述至少一个第一视频片段中的第一目标视频片段、所述至少一个音频片段中的目标音频片段与所述至少一个文本片段中的目标文本片段相对应，所述目标音频片段用于填充所述目标文本片段匹配的朗读语音，所述第一目标视频片段为空置片段；
    第一视频编辑数据导入模块,用于在视频编辑器中导入所述第一视频编辑数据,以使得在所述视频编辑器的视频编辑轨道上显示所述至少一个第一视频片段和所述至少一个音频片段,所述第一目标视频片段与所述目标音频片段的轨道时间线区间相同;
    第二视频编辑数据确定模块,用于响应于在所述视频编辑器上触发针对所述第一目标视频片段的第二指令,基于所述第一视频编辑数据,在所述第一目标视频片段中填充第一目标视频,得到第二视频编辑数据;所述第一目标视频为基于所述第二指令所指示的第一目标图像素材而得到的视频;
    第一目标视频生成模块，用于基于第二视频编辑数据，生成第一目标视频。
  10. 一种电子设备,包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-8中任一项所述的方法。
  11. 一种计算机可读存储介质，其上存储有计算机程序，其中，所述计算机程序被处理器执行时实现如权利要求1-8中任一项所述的方法。
  12. 一种计算机程序产品,包括计算机程序或指令,所述计算机程序或指令被处理器执行时实现如权利要求1-8中任一项所述的方法。
PCT/CN2023/116765 2022-09-02 2023-09-04 视频生成方法、装置、设备、存储介质和程序产品 WO2024046484A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP23821469.6A EP4354885A1 (en) 2022-09-02 2023-09-04 Video generation method and apparatus, device, storage medium, and program product
US18/391,576 US20240127859A1 (en) 2022-09-02 2023-12-20 Video generation method, apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211075071.9A CN117692699A (zh) 2022-09-02 2022-09-02 视频生成方法、装置、设备、存储介质和程序产品
CN202211075071.9 2022-09-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/391,576 Continuation US20240127859A1 (en) 2022-09-02 2023-12-20 Video generation method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024046484A1 true WO2024046484A1 (zh) 2024-03-07

Family

Family ID: 89427041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116765 WO2024046484A1 (zh) Video generation method and apparatus, device, storage medium, and program product 2022-09-02 2023-09-04

Country Status (4)

Country Link
US (1) US20240127859A1 (zh)
EP (1) EP4354885A1 (zh)
CN (1) CN117692699A (zh)
WO (1) WO2024046484A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140219633A1 (en) * 2013-02-07 2014-08-07 Cyberlink Corp. Systems and Methods for Performing Selective Video Rendering
CN111935537A (zh) * 2020-06-30 2020-11-13 百度在线网络技术(北京)有限公司 Music clip video generation method and apparatus, electronic device and storage medium
CN112423023A (zh) * 2020-12-09 2021-02-26 珠海九松科技有限公司 Intelligent automatic video mixing and editing method
CN114390220A (zh) * 2022-01-19 2022-04-22 中国平安人寿保险股份有限公司 Animation video generation method and related apparatus
CN114827752A (zh) * 2022-04-25 2022-07-29 中国平安人寿保险股份有限公司 Video generation method, video generation system, electronic device and storage medium
CN114998484A (zh) * 2022-05-27 2022-09-02 中国平安人寿保险股份有限公司 Audio and video generation method and apparatus, computer device and storage medium

Also Published As

Publication number Publication date
CN117692699A (zh) 2024-03-12
EP4354885A1 (en) 2024-04-17
US20240127859A1 (en) 2024-04-18


Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2023821469

Country of ref document: EP

Effective date: 20231220