WO2022121626A1 - Video display and processing method, apparatus, system, device, and medium

Video display and processing method, apparatus, system, device, and medium

Info

Publication number
WO2022121626A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
text
target
subtitle
original data
Application number
PCT/CN2021/130581
Other languages
English (en)
French (fr)
Inventor
陈奇
周诗文
Original Assignee
北京字节跳动网络技术有限公司
Application filed by 北京字节跳动网络技术有限公司
Priority to US 18/256,014 (published as US20240107127A1)
Publication of WO2022121626A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/635 Overlay text, e.g. embedded captions in a TV program
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19173 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4318 Generation of visual interfaces for content selection or interaction; Content or additional data rendering by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/488 Data services, e.g. news ticker
    • H04N 21/4884 Data services, e.g. news ticker for displaying subtitles

Definitions

  • the present disclosure relates to the field of multimedia technologies, and in particular, to a video display and processing method, apparatus, system, device, and medium.
  • When producing a video, the user first needs to find material by himself and then perform a series of complex video editing operations on the material to finally generate a video work. If the material found by the user is not rich enough, the quality of the manually edited video cannot be guaranteed; in addition, the manual editing steps are complicated and time-consuming, making the time cost of video production relatively high.
  • the present disclosure provides a video display and processing method, apparatus, system, device, and medium, which can reduce the time cost of video production.
  • the present disclosure provides a video display method, including:
  • receiving a video generation operation for the original data, where the original data is used to obtain the target text, and the video generation operation is used to trigger generation of the target video corresponding to the target text;
  • in response to the video generation operation, displaying the generated target video.
  • the video elements of the target video include subtitle text and multimedia material corresponding to the subtitle text.
  • the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
  • the present disclosure provides a video processing method, including:
  • the present disclosure provides a video display device, comprising:
  • a first receiving unit configured to receive a video generation operation for the original data, the original data is used to obtain the target text, and the video generation operation is used to trigger the generation of the target video corresponding to the target text;
  • the first display unit is configured to display the generated target video in response to the video generation operation, the video elements of the target video include subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
  • the present disclosure provides a video processing apparatus, including:
  • a second receiving unit configured to receive a video generation request carrying original data sent by the electronic device
  • a first obtaining unit configured to obtain the target text according to the original data in response to the video generation request
  • a first generating unit configured to generate subtitle text according to the target text
  • a second obtaining unit configured to obtain multimedia material corresponding to the subtitle text
  • the second generation unit is configured to generate the target video according to the subtitle text and the multimedia material
  • the first sending unit is configured to send the target video to the electronic device.
  • the present disclosure provides a video processing system, including an electronic device and a server, wherein:
  • the electronic device is used to: receive the video generation operation for the original data, where the original data is used to obtain the target text and the video generation operation is used to trigger generation of the target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video;
  • the server is used to: receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate the subtitle text according to the target text; obtain the multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
  • the present disclosure provides a computing device, comprising:
  • a memory and a processor, wherein the processor is configured to read executable instructions from the memory and execute the executable instructions to implement the video display method described in the first aspect or the video processing method described in the second aspect.
  • the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, enables the processor to implement the video display method described in the first aspect or the video processing method described in the second aspect.
  • the present disclosure provides a computer program product comprising a computer program carried on a computer-readable medium which, when executed by a processor, causes the processor to implement the video display method described in the first aspect or the video processing method described in the second aspect.
  • The video display and processing methods, apparatuses, system, device, and medium of the embodiments of the present disclosure can receive a video generation operation for raw data. Since the original data can be used to obtain the target text, and the video generation operation can be used to trigger generation of the target video corresponding to the target text, the target video generated in response to the video generation operation can be displayed after the operation is received. The video elements of the target video can include subtitle text and the multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. Rich multimedia material can thus be found automatically during generation of the target video, and users do not need to search manually for material to make the video, which not only reduces the time cost of video production but can also improve the quality of the produced video.
  • FIG. 1 is an architectural diagram of a video production scene provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a video display method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a raw data input interface provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of another original data input interface provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a video display interface provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of another video display interface provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • FIG. 14 is a schematic flowchart of another video display method provided by an embodiment of the present disclosure.
  • FIG. 15 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • FIG. 16 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
  • FIG. 17 is a schematic diagram of an interaction flow of a video processing system according to an embodiment of the present disclosure.
  • FIG. 18 is a schematic structural diagram of a video display device according to an embodiment of the present disclosure.
  • FIG. 19 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure.
  • FIG. 20 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
  • the term “including” and variations thereof are open-ended inclusions, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • the video display and processing method provided by the present disclosure can be applied to the architecture shown in FIG. 1 , and will be described in detail with reference to FIG. 1 .
  • FIG. 1 shows an architecture diagram of a video production scenario provided by an embodiment of the present disclosure.
  • the architecture diagram may include at least one electronic device 101 on the client side and at least one server 102 on the server side.
  • the electronic device 101 may establish a connection with the server 102 and perform information exchange through a network protocol such as Hyper Text Transfer Protocol over Secure Socket Layer (HTTPS).
  • the electronic device 101 may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle-mounted terminal, a wearable device, an all-in-one computer, a smart home device, or another device with a communication function, and may also include a device simulated by a virtual machine or a simulator.
  • the server 102 may include a device with storage and computing functions, such as a cloud server or a server cluster.
  • the user can make a video in a designated platform on the electronic device 101, and the designated platform can be a designated application program or a designated website.
  • the user can send the video to the server 102 of the designated platform, and the server 102 can receive the video sent by the electronic device 101 and store the received video, so as to send the video to the electronic device that needs to play the video.
  • In the embodiments of the present disclosure, in order to reduce the time cost of producing a video and improve the quality of the produced video, the electronic device 101 can receive a user's video generation operation for raw data. Since the raw data can be used to obtain target text, and the video generation operation can be used to trigger generation of the target video corresponding to the target text, the electronic device 101 can display the target video generated in response to the video generation operation after receiving it. The video elements of the target video can include subtitle text and the multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. Rich multimedia material can thus be found automatically during generation of the target video without the user manually searching for material, which not only reduces the time cost of producing the video but can also improve its quality.
  • In some embodiments, the electronic device 101 may obtain the target text according to the original data, generate subtitle text according to the target text, obtain the multimedia material corresponding to the subtitle text, and then generate the target video according to the subtitle text and the multimedia material, so that the electronic device 101 locally obtains the target text based on the original data and generates the target video corresponding to the target text, further reducing the time cost of video production.
  • the electronic device 101 may also send a video generation request carrying the original data to the server 102 after receiving the video generation operation.
  • In these embodiments, the server 102 may, in response to the video generation request, obtain the target text according to the original data, generate the subtitle text according to the target text, obtain the multimedia material corresponding to the subtitle text, generate the target video according to the subtitle text and the multimedia material, and send the generated target video to the electronic device 101. The electronic device 101 can thereby request the server 102 to obtain the target text based on the original data and generate the target video corresponding to the target text, which further improves the quality of the produced video and reduces the data processing load on the electronic device 101.
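  • The request/response exchange between the electronic device 101 and the server 102 can be illustrated with a minimal client-side sketch in Python. The endpoint path, payload fields, and response format below are illustrative assumptions; the disclosure does not specify a concrete wire format.

```python
# Minimal sketch of the client side of the exchange described above.
# Assumptions (not from the disclosure): a hypothetical HTTPS endpoint
# "/video/generate", a JSON payload, and raw video bytes in the response.
import requests

def request_target_video(server_url: str, original_data: dict) -> bytes:
    """Send a video generation request carrying the original data and
    receive the generated target video from the server."""
    resp = requests.post(
        f"{server_url}/video/generate",
        json={"original_data": original_data},
        timeout=600,  # generating the target video may take a while
    )
    resp.raise_for_status()
    return resp.content  # bytes of the generated target video

# Hypothetical usage:
# video_bytes = request_target_video(
#     "https://example.com",
#     {"article_title": "...", "article_content": "..."})
```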
  • the video display method may be performed by an electronic device, for example, the electronic device 101 in the client shown in FIG. 1 .
  • the electronic devices may include devices with communication functions such as mobile phones, tablet computers, desktop computers, notebook computers, vehicle terminals, wearable devices, all-in-one computers, and smart home devices, and may also include devices simulated by virtual machines or simulators.
  • FIG. 2 shows a schematic flowchart of a video display method provided by an embodiment of the present disclosure.
  • the video display method may include the following S210-S220.
  • S210: Receive a video generation operation for the original data, where the original data is used to obtain target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text.
  • the target text may be all text contents involved in the original data.
  • the original data can be data input by the user, or data sent by other devices to the electronic device.
  • the video display method may further include:
  • in response to a user's data input operation, the raw data input by the user is displayed in real time.
  • the user input operation may include an operation of adding original data, or may include an operation of inputting original data, which is not limited herein.
  • Specifically, the user can trigger a data input operation on the electronic device to input the original data that the user wants to enter, and the electronic device can respond to the data input operation in real time and display the raw data input by the user in real time.
  • the raw data may include text.
  • For example, the electronic device may display a first input box for entering text, and the user may perform an input operation for entering text in the first input box, so that the electronic device can display the entered text in the first input box.
  • the first input box can be used to input text such as article title and article content, and the user can input the article title and article content in the first input box.
  • FIG. 3 shows a schematic diagram of a raw data input interface provided by an embodiment of the present disclosure.
  • a plurality of first input boxes may be displayed in the original data input interface 301 , such as an “article title” input box 302 and a “article content” input box 303 .
  • the user may perform an input operation in the "article title” input box 302 to input the article title, and may also perform an input operation in the "article content” input box 303 to input the article content.
  • FIG. 4 shows a schematic diagram of another original data input interface provided by an embodiment of the present disclosure.
  • a plurality of first input boxes may be displayed in the original data input interface 401 , for example, an “article title” input box 402 and a “subtitle” input box 403 .
  • the user can perform an input operation in the "article title” input box 402 to input the article title, and can also perform an input operation in the "subtitle” input box 403 to input the subtitles to be displayed in the video.
  • FIG. 5 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
  • a plurality of first input boxes may be displayed in the original data input interface 501 , such as an “article title” input box 502 and a “article content” input box 503 .
  • the user may perform an input operation in the "article title” input box 502 to input the article title, and may also perform an input operation in the "article content” input box 503 to input the article content.
  • the target text can be obtained through the text in the original data.
  • the original data may further include a link address
  • the link address may be used to obtain article content.
  • For example, the electronic device may display a second input box for entering the link address, and the user may perform an input operation for entering the link address in the second input box, so that the electronic device can display the link address input by the user in the second input box.
  • the second input box can be used to input a link address such as the URL of the article or an identity document (ID), and the user can input the URL or ID of the article in the second input box.
  • The link address can be a string in any form, such as a URL or an ID, as long as the string can be used to obtain the article content required by the user; this is not limited here.
  • an “article link” input box 304 may be displayed in the original data input interface 301 .
  • the user may perform an input operation in the "article link” input box 304 to input the URL or ID of the article.
  • an “article link” input box 504 may be displayed in the original data input interface 501 .
  • the user may perform an input operation in the "article link” input box 504 to input the URL or ID of the article.
  • the target text can be obtained through the article content obtained based on the link address in the original data.
  • the original data may further include a link address, and the link address may also be used to obtain video content.
  • the method for the user to enter the link address for obtaining the video content is similar to the above-mentioned method for entering the link address for obtaining the article content, which will not be repeated here.
  • the video content obtained based on the link address in the original data can be used to obtain the target text.
  • the raw data may include multimedia files.
  • the multimedia files may include at least one of image files, audio files and video files.
  • the electronic device may display an add control for adding a multimedia file, and a user may input an add operation for adding a multimedia file to the electronic device through an add button, so that the electronic device may display the multimedia file added by the user.
  • the add control may be an add button
  • The add operation may include a trigger operation on the add button, such as a click, long press, or double click; a selection operation on a multimedia file, such as a click or long press; and a trigger operation on the selection confirmation button, such as a click, long press, or double click. For example, the user can click the add button to enter the multimedia file selection interface, browse the multimedia files in that interface and click the desired multimedia file, and finally click the selection confirmation button to complete the add operation.
  • a plurality of adding controls may be displayed in the raw data input interface 401 , such as an “image material” adding control 404 .
  • the user can perform an adding operation through the "image material” adding control 404 to add a picture file.
  • the raw data input interface 501 may display an add control, such as an “image/video material” add control 505 .
  • the user can perform an adding operation through the “image/video material” adding control 505 to add a picture file or a video file.
  • the video file may include a video file captured by the user in real time, or may include a video file specified by the user in the video file stored locally on the electronic device.
  • In some embodiments, the video files may also include video files generated based on the embodiments of the present disclosure, as well as video files obtained by editing such generated video files, and so on, so that the generated videos can be further optimized and edited.
  • the target text can be obtained through the multimedia file in the original data.
  • In some embodiments, before receiving a user's data input operation, the video display method may further include:
  • in response to a user's mode selection operation, displaying a raw data input interface corresponding to the selected input mode.
  • receiving the user's data input operation may specifically include:
  • The input modes may include an automatic-entry input mode and a manual-entry input mode.
  • In the automatic-entry input mode, the user can input the above-mentioned raw data, so that the target text can be obtained through the raw data.
  • In the manual-entry input mode, the user can directly input the multimedia material and subtitle text used to generate the target video.
  • the mode selection operation may include a user input gesture operation that triggers opening of different input modes.
  • the electronic device may be preset with multiple input modes and multiple gesture operations, and one gesture operation may be used to trigger opening of a corresponding input mode.
  • The user can determine the input mode they want to select and input the gesture operation corresponding to that input mode to the electronic device, so that after receiving the gesture operation, the electronic device enables the input mode corresponding to it and displays the raw data input interface corresponding to the enabled input mode.
  • The raw data input interface can display controls for inputting the raw data supported by the enabled input mode, and the user can perform the corresponding data input operations through the displayed controls.
  • the mode selection operation may include a user's selection operation on selection controls of different input modes, such as operations such as clicking, long pressing, and double-clicking on the selection control.
  • the electronic device can display multiple selection controls, and one selection control can correspond to one input mode.
  • The user can determine the input mode they want to select and input a selection operation on the selection control corresponding to that input mode, so that the electronic device displays the selected selection control in the selected state, enables the input mode corresponding to the selection control in the selected state, and displays the original data input interface corresponding to the enabled input mode.
  • the raw data input interface may display a control for inputting raw data supported by the enabled input mode, and the user may input a data input operation corresponding to the control through the displayed control.
  • a plurality of selection controls may be displayed in the raw data input interface 301 , such as an “automatic entry” selection control 305 and a “manual entry” selection control 306 .
  • The raw data input interface 301 may display controls corresponding to the automatic-entry input mode, such as the “article title” input box 302, the “article content” input box 303, and the “article link” input box 304.
  • a plurality of selection controls may be displayed in the raw data input interface 401 , such as an “automatic entry” selection control 405 and a “manual entry” selection control 406 .
  • The original data input interface 401 may display controls corresponding to the manual-entry input mode, such as the “article title” input box 402, the “subtitle” input box 403, and the “image material” add control 404.
  • the user can manually enter the article title and subtitle text in the "article title” input box 402 and the "subtitle” input box 403, respectively.
  • One page may correspond to one “image material” add control 404, and one “image material” add control 404 may correspond to at least one “subtitle” input box 403.
  • a page editing area 407 may be set for the page corresponding to each page number, and at least one "subtitle” input box 403 and "image material” adding control 404 of the page may be located in the page editing area 407 of the page.
  • For example, a page editing area 407 may be set to the right of the page number “1”; the subtitle text and images manually entered by the user through the “subtitle” input box 403 and the “image material” add control 404 in that page editing area 407 are the subtitle text and images of page 1.
  • The display order of the subtitle text and images of page 1 corresponds to the order in which the corresponding “subtitle” input boxes 403 and “image material” add controls 404 are arranged.
  • In some embodiments, the raw data input interface 401 may also display material addition controls, such as an “Add” button 408, so that the user can use the “Add” button 408 to add a “subtitle” input box 403 and an “image material” add control 404 corresponding to a new page, or to add a “subtitle” input box 403 within a displayed page.
  • In some embodiments, the original data input interface 401 may also display a subtitle deletion control, such as a “-” button 409; each “-” button 409 corresponds to one “subtitle” input box 403, and the user can delete the corresponding “subtitle” input box 403 through the “-” button 409.
  • the user can sequentially input the title of the article, the subtitles of each clause, and the image material corresponding to each subtitle in the manual input mode.
  • a plurality of selection controls may be displayed in the raw data input interface 501 , such as an “automatic entry” selection control 506 and a “manual entry” selection control 507 .
  • The raw data input interface 501 may display controls corresponding to the automatic-entry input mode, such as the “article title” input box 502, the “article content” input box 503, the “article link” input box 504, and the “image/video material” add control 505.
  • the video display method may further include: receiving and displaying the original data.
  • the original data may include at least one of text, link addresses, and multimedia files, which will not be repeated here.
  • the electronic device can provide the user with rich input methods of raw data for the user to select according to needs, which further improves the user's experience.
  • the user may input a video generation operation for the original data to the electronic device, so as to trigger generation and display of the target video corresponding to the target text.
  • the video generation operation may be a trigger operation such as a long press, double click, voice control, or expression control on the original data
  • the video generation operation may also be a trigger operation such as a click, a long press, or a double click on the video generation trigger control.
  • In some embodiments, the raw data input interface 301 may display a video generation trigger control, such as a “Generate Video” button 307, and the user can input a trigger operation to the “Generate Video” button 307 to trigger generation and display of the target video corresponding to the target text involved in the raw data.
  • The raw data input interface 401 may display a video generation trigger control, such as a “Generate Video” button 410, and the user can input a trigger operation to the “Generate Video” button 410 to trigger generation and display of the target video corresponding to the target text involved in the raw data.
  • The raw data input interface 501 may display a video generation trigger control, such as a “Generate Video” button 508, and the user can input a trigger operation to the “Generate Video” button 508 to trigger generation and display of the target video corresponding to the target text involved in the raw data.
  • the electronic device may display the target video generated based on the target text involved in the original data in response to the video generation operation.
  • the electronic device may display the target video in a full screen, or may display the target video in a partial display area.
  • the partial display area may be a part of the display area in the display screen.
  • In the case of an electronic device with multiple display screens, the partial display area may be a part of the display area of any one display screen, or may be an entire display screen.
  • FIG. 6 shows a schematic diagram of a video display interface provided by an embodiment of the present disclosure.
  • the electronic device may display a video display interface 601 in response to the video generation operation, and display a full-screen display window 602 of the target video in the video display interface 601 .
  • FIG. 7 shows a schematic diagram of another video display interface provided by an embodiment of the present disclosure.
  • the electronic device may display a video display interface 701 in response to the video generation operation, and a play window 702 of the target video may be displayed in the central display area of the video display interface 701 .
  • the video element of the target video may include at least one subtitle text, and one subtitle text may correspond to at least one multimedia material.
  • the multimedia material may include at least one of images, videos, audios, and the like.
  • the subtitle text can be automatically generated according to the target text obtained from the original data.
  • the subtitle text may be generated by an electronic device. In other embodiments, the subtitle text can also be generated by the server, which will be described in detail later.
  • the server may be the server 102 in the server shown in FIG. 1 .
  • the multimedia material can be automatically obtained from a plurality of local or Internet materials according to the subtitle text.
  • the multimedia material may be acquired by an electronic device.
  • the multimedia material can also be obtained by the server, which will be described in detail later.
  • In the embodiments of the present disclosure, a video generation operation for raw data can be received. Since the raw data can be used to obtain target text, and the video generation operation can be used to trigger generation of the target video corresponding to the target text, the target video generated in response to the video generation operation can be displayed after the operation is received. The video elements of the target video can include subtitle text and the multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. Rich multimedia material can thus be found automatically during generation of the target video, and users do not need to search manually for material, which not only reduces the time cost of making videos but can also improve their quality.
  • the video display method may further include:
  • displaying a video editing interface, where the video editing interface includes editable elements, and the editable elements include at least one of video elements and display effect elements corresponding to the video elements;
  • in response to an element modification operation, displaying a modified target video, where the modified target video includes the editable elements displayed within the video editing interface when the element modification operation is completed.
  • the electronic device may display a video editing interface for adjusting the target video in response to the video generation operation.
  • the user can adjust at least one of the video element of the target video and the display effect element corresponding to the video element in the video editing interface. Therefore, at least one of the video element and the display effect element corresponding to the video element can be used as an editable element in an editable state in the video editing interface.
  • the display effect elements may include transition effect elements, playback effects, special effects, decoration effect elements, and the like.
  • the modification effect element may include an effect element that plays a role in modifying the video element, such as the tone, size, contrast, color, modified text, and the like of the video element.
  • Specifically, the user can input an element modification operation on the editable element they want to adjust in the video editing interface, so that the electronic device can display the editable element being adjusted in real time during the element modification operation, and then display the modified target video generated according to the editable elements displayed in the video editing interface when the user completes the element modification operation.
  • The element modification operation may include a modification operation on the subtitle text, an addition operation on the subtitle text, a deletion operation on the subtitle text, a replacement operation on the image material in the multimedia material, an addition operation on the multimedia material, and the like.
  • In some embodiments, a completion indicator control may also be displayed in the video editing interface, and the user can input a completion trigger operation, such as a click, long press, or double click, on the completion indicator control. In response to receiving the completion trigger operation input on the completion indicator control, the electronic device determines that the user has completed the element modification operation and displays the modified target video generated according to the editable elements displayed in the video editing interface at that moment.
  • the video editing interface may be displayed in the same interface as the target video.
  • the playback window of the target video is displayed in the video display interface, and the video editing interface is displayed below the playback window.
  • FIG. 8 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • The electronic device may display a video display interface 801 in response to the video generation operation, display a playback window 802 of the target video in the video display interface 801, and display a video editing interface 803 below the playback window 802.
  • a scroll bar 804 may be displayed on the right side of the video display interface 801 , and the user may view the content of the video editing interface 803 by dragging the scroll bar 804 .
  • the user can perform element modification operations on editable elements such as article title, subtitle text, image material, etc. in the video editing interface 803 .
  • For example, the user can modify the article title through the “article title” input box 805, add the subtitle text of a new page or of a displayed page through the “Add” button 806, modify subtitle text through the “subtitle” input box 807, delete subtitle text through the “-” button 808, and replace or add image material and video material through the “image material” add control 809.
  • In some embodiments, a completion indicator control may be displayed at the bottom of the video editing interface 803, such as a “Submit Modification” button 810, and the user can input a completion trigger operation to the “Submit Modification” button 810 to trigger generation and display of the modified target video.
  • FIG. 9 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • The electronic device may display a video display interface 901 in response to the video generation operation, display a playback window 902 of the target video in the video display interface 901, and display a video editing interface 903 below the playback window 902.
  • the user can perform element modification operations on editable elements such as article title, subtitle text, image material, etc. in the video editing interface 903 .
  • For example, the user can modify the article title through the “article title” input box 904, add the subtitle text of a new page or of a displayed page through the “Add” button 905, modify subtitle text through the “subtitle” input box 906, delete subtitle text through the “-” button 907, and replace or add image material and video material through the “image material” add control 908.
  • a scroll bar 909 may be displayed on the right side of the video editing interface 903 , and the user may view the content not displayed in the video editing interface 903 by dragging the scroll bar 909 .
  • In some embodiments, a completion indicator control may be displayed at the bottom of the video editing interface 903, such as a “Submit Modification” button 910, and the user can input a completion trigger operation to the “Submit Modification” button 910 to trigger generation and display of the modified target video.
  • FIG. 10 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • In the embodiment shown in FIG. 10, after receiving the video generation operation, the electronic device may display the video display interface 901 in response to the video generation operation, and display a video editing interface 903 below the play window 902 within the video display interface 901.
  • the play window 902 may be located in the first display screen 912
  • the video editing interface 903 may be located in the second display screen 913 .
  • The video editing interface 903 is similar to that in the embodiment shown in FIG. 9, and details are not described here.
  • In some embodiments, a video export control may also be displayed in the video display interface. If the user is satisfied with the video effect, the user can input an export trigger operation, such as a click, long press, or double click, on the video export control, so that, in response to receiving the export trigger operation input on the video export control, the electronic device saves the video displayed in the video display interface locally.
  • In the embodiment shown in FIG. 8, a video export control, such as an “Export Video” button 811, may be displayed at the bottom of the video editing interface 803, to the right of the “Submit Modification” button 810. The user can input an export trigger operation to the “Export Video” button 811, so that the electronic device saves the video displayed in the video display interface 801 locally.
  • In the embodiment shown in FIG. 9, a video export control, such as an “Export Video” button 911, may be displayed at the bottom of the video display interface 901, and the user can input an export trigger operation to the “Export Video” button 911, so that the electronic device saves the video displayed in the video display interface 901 locally.
  • In this way, the user can view the elements related to the video and the adjusted video on the same page, which improves the user experience.
  • In other embodiments, the video editing interface and the target video may be displayed in different interfaces.
  • For example, the playback window of the target video and a modification trigger control, such as a “video modification” button, are displayed in the video display interface.
  • The user can input a modification trigger operation, such as a click, long press, or double click, on the modification trigger control, so that, in response to receiving the modification trigger operation input on the modification trigger control, the electronic device jumps from the video display interface to the video editing interface.
  • the video editing interface is the same as the above-mentioned embodiment, which is not repeated here.
  • After the user completes the element modification operation, the video editing interface can also jump back to the video display interface, so as to display the modified target video generated according to the editable elements displayed in the video editing interface when the user completed the element modification operation.
  • In some embodiments, a video export control may also be displayed in the video display interface, and the user can input an export trigger operation, such as a click, long press, or double click, on the video export control, so that, in response to receiving the export trigger operation input on the video export control, the electronic device saves the video displayed in the video display interface locally.
  • the target video and the video editing interface can be displayed independently, and the user experience can be improved.
  • the video display method may further include:
  • displaying an indicator, where the indicator is used to indicate that the target video has been generated;
  • in response to an indicator trigger operation, hiding the raw data.
  • In some embodiments, after the target video is generated, the electronic device may not display the target video directly, but instead display an indicator indicating that the target video has been generated.
  • From the indicator, the user can learn that the target video has been produced, and can input an indicator trigger operation for the indicator to the electronic device to trigger display of the generated target video.
  • In response to the indicator trigger operation, the electronic device can hide the currently displayed original data and display a video display interface in which the target video is displayed.
  • The indicator trigger operation may be a trigger operation such as a click, long press, or double click on the indicator, which is not limited here.
  • FIG. 11 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
  • FIG. 12 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
  • Fig. 13 shows a schematic diagram of another video display interface provided by an embodiment of the present disclosure.
  • The top display area of the display screen of the electronic device may display multiple page identifiers, such as an “input article” page identifier 1101 and an “output video” page identifier 1102. When the “input article” page identifier 1101 is displayed in the selected state, a raw data input interface 1103 may be displayed below the top display area.
  • the display content and interaction method of the original data input interface 1103 are similar to those shown in FIG. 3 and FIG. 4 , and will not be repeated here.
  • The electronic device can wait for the target video to be generated, and while the electronic device is waiting, the raw data input interface 1103 can be displayed in an inoperable state, such as grayed out.
  • After the target video has been generated, an indicator, such as the icon 1104, may be displayed; the user can input an indicator trigger operation to the icon 1104, so that the electronic device displays the target video.
  • In response to the indicator trigger operation, the original data input interface 1103 can be hidden, and after it is hidden, a video display interface 1105 is displayed below the top display area.
  • the video display interface 1105 may display a playback window 1106 of the target video.
  • In some embodiments, a progress prompt for indicating the production progress of the target video, for example a progress prompt bar, may be superimposed on the raw data input interface 1103, to let users know the production progress of the target video.
  • In some embodiments, the electronic device can locally and automatically generate subtitle text according to the target text obtained through the original data, automatically obtain multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material, so as to reduce the generation time of the target video.
  • Specifically, when the user inputs raw data to the electronic device through the automatic-entry input mode, or the electronic device receives raw data sent by another device, the electronic device can locally and automatically generate subtitle text according to the target text obtained through the raw data, automatically acquire multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material.
  • FIG. 14 shows a schematic flowchart of another video display method provided by an embodiment of the present disclosure.
  • the video display method may include the following S1410-S1460.
  • S1410: Receive a video generation operation for the raw data, where the raw data is used to obtain the target text, and the video generation operation is used to trigger generation of the target video corresponding to the target text.
  • S1410 is similar to S210 in the embodiment shown in FIG. 2 , and details are not described here.
  • S1420 may specifically include:
  • In some embodiments, the electronic device may determine the data type of the original data in response to the video generation operation, and, when the data type of the original data is determined to be the text type, directly extract the text from the original data and use the extracted text as the target text.
  • the electronic device may determine the data type of the raw data through a control that receives the raw data.
  • the electronic device may, in response to receiving a trigger operation input by the user on the “Generate Video” button 307 , determine that the controls for receiving the raw data are the “Article Title” input box 302 and the “Article Content” input box 303 .
  • In response, the electronic device may determine that the data type of the original data is the text type, then directly extract the text in the original data, such as the article title and article content, and use the extracted text as the target text.
  • S1420 may specifically include:
  • the electronic device may determine the data type of the original data in response to the video generation operation, and in the case of determining that the data type of the original data is the multimedia file type, perform text conversion on the multimedia file to obtain the converted text, and convert the converted text as the target text.
  • the electronic device may determine the data type of the raw data through a control that receives the raw data.
  • For example, the electronic device may determine that the control that received the raw data is the “image/video material” add control 505, determine that the data type of the original data is the multimedia file type, and then perform text conversion on the multimedia file to obtain the converted text, using the converted text as the target text.
  • For an image file, text conversion can be performed using Optical Character Recognition (OCR) technology to obtain the converted text; alternatively, the image content can be understood and the text conversion performed by summarizing the content of the image, to obtain converted text describing the image content, which is not limited here.
  • For a video file, text conversion can be performed on each image frame using OCR technology to obtain the converted text; alternatively, the image content of each image frame of the video file can be understood and the text conversion performed by summarizing that content, to obtain converted text describing it. The audio in the video file can also be converted to text through speech recognition to obtain the converted text, which is not limited here.
  • the audio file may be converted to text through speech recognition to obtain the converted text.
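  • As a concrete illustration of this conversion step, the following Python sketch uses the pytesseract and SpeechRecognition packages as stand-ins for the unspecified OCR and speech-recognition components; the disclosure does not name any particular library.

```python
# A sketch of converting multimedia files into target text.
# Assumptions (not from the disclosure): pytesseract for OCR and the
# SpeechRecognition package for audio; any equivalent backend would do.
import pytesseract
import speech_recognition as sr
from PIL import Image

def image_to_text(image_path: str) -> str:
    """OCR an image file to obtain the converted text."""
    return pytesseract.image_to_string(Image.open(image_path))

def audio_to_text(wav_path: str) -> str:
    """Convert a WAV audio file to text via speech recognition."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)
```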
  • S1420 may specifically include:
  • the electronic device may determine the data type of the original data in response to the video generation operation, and in the case where the data type of the original data is determined to be the address type, obtain the target article based on the link address, and directly extract the article in the target article text, and then use the extracted article text as the target text.
  • the electronic device may determine the data type of the raw data through a control that receives the raw data.
  • For example, the electronic device may determine that the data type of the original data is the address type, extract the link address from the original data, obtain the target article based on the link address, directly extract the article text from the target article, and then use the extracted article text as the target text.
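  • Taken together, S1420 amounts to a dispatch over the three data types. The Python sketch below assumes hypothetical helpers extract_text, convert_multimedia_to_text, and fetch_article_by_link for the per-type handling described above; none of these names come from the disclosure.

```python
# Minimal dispatch over the data types described for S1420.
# extract_text, convert_multimedia_to_text, and fetch_article_by_link are
# hypothetical helpers standing in for the per-type handling above.
def obtain_target_text(original_data: dict) -> str:
    data_type = original_data["type"]  # inferred from the receiving control
    if data_type == "text":
        # Text type: use the entered title/content directly.
        return extract_text(original_data)
    if data_type == "multimedia":
        # Multimedia file type: OCR / speech-recognize the file.
        return convert_multimedia_to_text(original_data["file"])
    if data_type == "address":
        # Address type: resolve the URL or ID to the target article.
        return fetch_article_by_link(original_data["link"])
    raise ValueError(f"unsupported data type: {data_type}")
```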
  • In some embodiments, generating subtitle text according to the target text may specifically include:
  • extracting an abstract of the target text;
  • performing text typesetting on the abstract to obtain the subtitle text corresponding to the abstract.
  • Specifically, the electronic device can directly input the target text into a preset abstract extraction model to obtain an abstract of the target text, and then directly input the obtained abstract into a preset text typesetting model to obtain subtitle text that is organized in sentence units, with words handled across lines and pages, and configured with matching punctuation marks.
  • In some embodiments, the electronic device may also, according to at least one of the title and the text content of the target text, filter Internet article texts or local article texts for similar article texts whose similarity to the target text meets a preset text similarity threshold. If no similar article text is found, the target text can be input directly into the preset abstract extraction model to obtain the abstract of the target text. If similar article texts are found, a weighted sum can be taken over the text length, the number of likes, and the number of retweets of the target text and of each similar article text to obtain a text score for each text; the text with the highest text score is selected and input into the preset abstract extraction model to obtain the abstract of the target text. After the electronic device obtains the abstract, the abstract can be directly input into the preset text typesetting model to obtain subtitle text in sentence units, with words handled across lines and pages, and configured with matching punctuation marks.
  • text cleaning may also be performed on sensitive keywords in the abstract, such as institution name and user personal information, as well as special symbols that cannot generate voice and audio, and then the cleaned Text typesetting is performed on the abstract, and the subtitle text corresponding to the abstract is obtained.
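The candidate-text scoring mentioned above (a weighted sum over text length, likes, and retweets) can be sketched as follows; the weights and field names are assumptions for illustration only:

```python
# Score each candidate text by a weighted sum of engagement signals and
# keep the highest-scoring one for abstract extraction. Weights are assumed.
W_LENGTH, W_LIKES, W_RETWEETS = 0.2, 0.4, 0.4  # illustrative weights

def text_score(candidate: dict) -> float:
    return (W_LENGTH * candidate["length"]
            + W_LIKES * candidate["likes"]
            + W_RETWEETS * candidate["retweets"])

def pick_best_text(target: dict, similar: list[dict]) -> dict:
    return max([target, *similar], key=text_score)
```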
  • generating subtitle text according to the target text may further specifically include:
  • the method for obtaining the abstract of the target text by the electronic device is similar to the above-mentioned embodiment, and details are not described here.
  • S1440 may specifically include:
  • the multiple preset materials include materials obtained according to the original data, and the preset materials include at least one of images and videos;
  • the electronic device may first acquire a plurality of preset materials, and the preset materials may include at least one of materials in a material library and materials in the Internet.
  • the electronic device may acquire at least one material from images and videos based on the original data, and the preset material may also include materials acquired according to the original data.
  • the electronic device may determine the degree of matching between each preset material and each subtitle text, then select, for each subtitle text, a preset number of preset materials with the highest matching degree, and use the selected preset materials as the target materials corresponding to the subtitle text.
  • the electronic device may input each preset material and each corresponding subtitle text into a preset graphic-text matching model to obtain a graphic-text matching score between each preset material and each corresponding subtitle text; then calculate the text similarity between the text contained in each preset material and each corresponding subtitle text; then determine whether each preset material and each subtitle text have the same source, obtaining a source similarity; and finally, for each subtitle text, perform a weighted sum using at least one of the graphic-text matching score, the text similarity, the source similarity, the image clustering score of the preset material, and the text weight of the text to which the preset material belongs, to obtain the matching degree of each preset material with respect to the subtitle text. A preset number of preset materials with the highest matching degree are then selected, and the selected preset materials are used as the target materials corresponding to the subtitle text.
  • the text similarity may be calculated based on a preset text similarity algorithm according to the subtitle text and the text contained in the preset material.
  • the preset text similarity algorithm may be a text semantic similarity algorithm or a literal text similarity algorithm, which is not limited herein.
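Putting these pieces together, the matching degree amounts to a weighted sum over several signals followed by a top-k selection; the following sketch uses assumed weights and field names:

```python
# Combine per-pair signals into a matching degree, then keep the top-k
# preset materials for one subtitle text. All weights are assumptions.
WEIGHTS = {"it_match": 0.4, "text_sim": 0.2, "source_sim": 0.1,
           "cluster": 0.15, "text_weight": 0.15}

def matching_degree(signals: dict) -> float:
    return sum(WEIGHTS[name] * value for name, value in signals.items())

def select_target_materials(materials: list[dict], k: int = 3) -> list[dict]:
    ranked = sorted(materials, key=lambda m: matching_degree(m["signals"]),
                    reverse=True)
    return ranked[:k]
```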
  • S1440 may specifically include:
  • the electronic device may perform voice conversion on each subtitle text based on a text-to-speech conversion technology to obtain subtitle audio corresponding to each subtitle text.
  • the electronic device may input the subtitle text into a preset text-to-speech conversion model to perform speech conversion to obtain subtitle audio.
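A minimal sketch of this step, using the pyttsx3 library as a stand-in for the preset text-to-speech conversion model (an assumption; the disclosure names no specific engine):

```python
# Render one subtitle text to an audio file via text-to-speech.
import pyttsx3

def subtitle_to_audio(subtitle_text: str, out_path: str) -> None:
    engine = pyttsx3.init()
    engine.save_to_file(subtitle_text, out_path)  # queue the rendering job
    engine.runAndWait()                           # block until the file is written
```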
  • S1440 may specifically include:
  • the electronic device may be preset with multiple preset background audios, one preset background audio may correspond to one emotion category, and the emotion categories may include a happy category, a sad category, a serious category, a nervous category, etc., which are used to represent the category to which the emotion of the target text belongs. The electronic device inputs the target text into a preset text emotion classification model for classification, obtains the emotion category to which the target text belongs, determines the target background audio corresponding to the emotion category among the multiple preset background audios, and then uses the target background audio as the multimedia material.
  • the electronic device can select appropriate background music from multiple preset background audios by performing sentiment analysis and classification on the target text to generate the target video.
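The emotion-to-background-audio selection could look like the following toy sketch; the classifier is a stand-in for the preset text emotion classification model, and the track paths are invented:

```python
# Map the predicted emotion category of the target text to a preset
# background audio track. classify_emotion is a hypothetical callable.
BACKGROUND_AUDIO = {
    "happy":   "bgm/upbeat.mp3",
    "sad":     "bgm/slow.mp3",
    "serious": "bgm/ambient.mp3",
    "nervous": "bgm/tense.mp3",
}

def pick_background_audio(target_text: str, classify_emotion) -> str:
    emotion = classify_emotion(target_text)  # e.g. "happy"
    return BACKGROUND_AUDIO.get(emotion, BACKGROUND_AUDIO["serious"])
```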
  • subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of their corresponding subtitle texts to obtain a dynamic image; video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the display effect of the subtitle text and the display effect of the images and videos preset in the video template, to obtain the target video.
  • subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos are sorted according to the arrangement order of the respective subtitle texts; then, according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video, the display time and display duration of each image and video are determined; a dynamic image is obtained according to the sorting of the images and videos and their display times and display durations; video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the display effects of the subtitle text and the images and videos preset in the video template, to obtain the target video; finally, the target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamp of each audio frame of the subtitle audio, to obtain the fused target video.
  • subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos are sorted according to the arrangement order of the respective subtitle texts; then, according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video, the display time and display duration of each image and video are determined; a dynamic image is obtained according to the sorting of the images and videos and their display times and display durations; video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the display effects preset in the video template, to obtain the target video; finally, the target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamp of each audio frame of the subtitle audio, to obtain the target video after preliminary fusion.
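The final timestamp-aligned fusion of the rendered target video with the subtitle audio can be sketched with moviepy 1.x (an assumed library choice; the disclosure names no tooling):

```python
# Fuse the rendered target video with the subtitle audio track; frames
# and audio samples are aligned by their timestamps starting at t=0.
from moviepy.editor import AudioFileClip, VideoFileClip

def fuse_video_and_audio(video_path: str, audio_path: str, out_path: str) -> None:
    video = VideoFileClip(video_path)
    audio = AudioFileClip(audio_path)
    video.set_audio(audio).write_videofile(out_path)
```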
  • the video element of the target video includes subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
  • S1460 is similar to S220 in the embodiment shown in FIG. 2 , and details are not described here.
  • images, videos, audios and other materials can be automatically matched according to the target text involved in the original data, and a target video corresponding to a piece of target text can be automatically rendered and generated, thereby improving the intelligence of the electronic device.
  • the video display method may further include:
  • S1450 may specifically include:
  • the process of determining the target video template corresponding to the target text may be performed in parallel with S1430 and S1440, or may be performed sequentially with S1430 and S1440 in a preset sequence.
  • the electronic device may be preset with multiple video classification templates, one video classification template may correspond to one content category, and the content categories may include a news category, a story category, a diary category, a variety show category, etc., which are used to represent the category to which the text content of the target text belongs.
  • the electronic device can input the target text into a preset text content classification model for classification, obtain the content category to which the target text belongs, determine the target video template corresponding to the content category among the multiple preset video templates, and then generate the target video according to the subtitle text, the multimedia material and the target video template.
  • the content categories may also be classified according to other classification methods, for example, the content of the target text is classified according to the keywords contained in the target text, which is not limited herein.
  • different video templates may also include different display effect elements. Therefore, the target video can be obtained by performing video rendering on the subtitle text and the multimedia material according to the preset display effect of the subtitle text and the display effect of the multimedia material in the target video template.
  • an appropriate target video template for generating the target video can be selected according to the content category to which the target text belongs, and then an appropriate display effect can be set for the subtitle text and the multimedia material.
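As a toy illustration of the keyword-based variant of content classification mentioned above, the following sketch picks a video template from the resulting category; all keyword lists and template identifiers are invented for illustration:

```python
# Classify the target text by keywords, then look up the matching template.
TEMPLATE_BY_CATEGORY = {"news": "tpl_news", "story": "tpl_story",
                        "diary": "tpl_diary", "variety": "tpl_variety"}
KEYWORDS = {"news": ("breaking", "reported"), "story": ("once upon",),
            "diary": ("today i",), "variety": ("episode", "guest")}

def pick_template(target_text: str) -> str:
    lowered = target_text.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return TEMPLATE_BY_CATEGORY[category]
    return TEMPLATE_BY_CATEGORY["news"]  # assumed default category
```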
  • when the user inputs original data to the electronic device through the manual input mode, the electronic device can directly acquire the subtitle text and multimedia material input by the user, and automatically generate the target video.
  • the electronic device may acquire the subtitle text and images input by the user, determine, based on the subtitle text, the image or video with the highest matching degree with the subtitle text among the images input by the user and the preset images and videos, and then automatically generate the target video by using the subtitle text and the determined image or video.
  • the electronic device may also acquire the subtitle text and images input by the user, input the subtitle text into a preset text content classification model for classification, obtain the content category to which the subtitle text belongs, determine the target video template corresponding to the content category, and then automatically generate the target video according to the subtitle text, the multimedia material and the target video template.
  • in this way, for original data such as subtitle text and multimedia materials manually input by the user, the subtitle text can be used as the target text, and the target video corresponding to the subtitle text and multimedia materials manually input by the user can be automatically generated, which further improves the user experience.
  • the electronic device may generate the target video through the server, so as to reduce the data processing amount of the electronic device and further improve the quality of the produced video.
  • the video display method may further include:
  • the electronic device may send a video generation request carrying the original data to the server, so that the server generates and feeds back the target video corresponding to the target text based on the original data in response to the video generation request.
  • the electronic device may receive the target video fed back by the server, and display the target video.
  • the server can automatically obtain the target text from the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material, in a manner similar to the method for generating the target video by the aforementioned electronic device, which will not be repeated here.
  • the target video can be generated based on the original data in a fast and high-quality manner through the interaction between the electronic device and the server, so as to improve the user's experience.
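A minimal device-side sketch of this request/response exchange, assuming an HTTP transport and an invented endpoint URL (the disclosure specifies neither):

```python
# Send the original data to the server and receive the rendered target video.
import requests

def request_target_video(original_data: dict,
                         server: str = "https://example.com/api") -> bytes:
    resp = requests.post(f"{server}/video/generate",
                         json={"original_data": original_data},
                         timeout=300)
    resp.raise_for_status()
    return resp.content  # encoded target video, ready for display
```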
  • in order to improve the interest of the target video, the video elements may further include a preset virtual object and a pose of the virtual object.
  • the pose of the virtual object can be determined according to the subtitle text.
  • the virtual object may be a virtual character object with a character image, a virtual cartoon object with a cartoon image, etc., and the present disclosure does not limit the specific type of the virtual object.
  • FIG. 15 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
  • the electronic device may display a video display interface 1501 in response to the video generation operation, display a full-screen playback window 1502 of the target video in the video display interface 1501, and display the target video in the playback window 1502; the target video may include a virtual object 1503, such as a virtual character object.
  • the virtual character object 1503 can be, for example, a virtual anchor of the news broadcast.
  • the pose of the virtual object may include at least one of a mouth pose, a facial expression pose, a gesture pose, and a body pose, which is not limited herein.
  • the pose of the virtual object may include a mouth pose and a facial expression pose.
  • the poses of the virtual object may include mouth poses, facial expression poses, and gesture poses.
  • the poses of the virtual object may include mouth poses, facial expression poses, gesture poses, and body poses. The present disclosure does not limit the gesture type of the virtual object.
  • the pose of the virtual object may be automatically determined according to the subtitle text.
  • the pose of the virtual object may be determined by the electronic device. In other embodiments, the pose of the virtual object may also be determined by the server.
  • the method for determining the pose of the virtual object by the electronic device is similar to the method for determining the pose of the virtual object by the server, the following takes the virtual object including the virtual character object and the method for determining the pose of the character by the electronic device as an example for detailed description.
  • the subtitle audio can be input into a preset pose generation model to obtain a real-time character pose animation, and the real-time character pose animation can be migrated to the object model of the virtual character object by using pose migration technology, to obtain an object model of the virtual character object that broadcasts the subtitle text; then, according to the obtained object model, a character pose image of the virtual character object corresponding to each audio frame of the subtitle audio is obtained, and, according to the timestamp of each audio frame in the target video generated from the subtitle text and the multimedia material, the corresponding character pose images are fused into the target video, to obtain a fused target video with the virtual character object.
  • the preset pose generation model can be used to generate a mouth pose animation and a facial expression pose animation.
  • the preset pose generation model can be used to generate the mouth pose, the facial expression pose, and the gesture pose.
  • the preset pose generation model can be used to generate a mouth pose, a facial expression pose, a gesture pose, and a body pose.
  • a virtual object similar to the user image may also be generated.
  • the virtual object may be generated before the subtitle audio is generated, or may be generated after the subtitle audio is generated, which is not limited herein.
  • the electronic device may first collect the user image, then input the user image into a preset biometric feature extraction model to extract the user's biometric features in the user image, then input the extracted user biometric features into a preset object generation model to obtain the initial object model of the virtual object with the user's biometric features, and finally fuse the preset clothing model into the initial object model to obtain the final object model of the virtual object.
  • the user image may be an image captured by the user through a camera, or an image selected by the user from preset images.
  • the user image may be a face image, an upper-body image, or a whole-body image of the user, which is not limited herein.
  • the user biometric feature extracted by the electronic device may include at least one of the user's face feature, head and shoulder feature, and body shape feature, which is not limited herein.
  • the extracted biometric features of the user may include the user's facial features.
  • the extracted biometric features of the user may include the user's facial features and body shape features.
  • when the user image is a face image, a preset object generation model can be used to generate a head model; when the user image is an upper-body image, a preset object generation model can be used to generate an upper-body model; and when the user image is a whole-body image, a preset object generation model can be used to generate a whole-body model.
  • the electronic device may first collect the user image, then extract the user biometrics in the user image, and then send the extracted user biometrics to the server, so that the server can Generate object models of virtual objects based on user biometrics.
  • the method for generating the object model by the server is similar to the above-mentioned method for generating the object model by the electronic device, and details are not described here.
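A hedged sketch of the biometric extraction step, using the face_recognition library as a stand-in for the preset biometric feature extraction model (an assumption; only facial features are covered here):

```python
# Extract a 128-dimensional facial feature vector from the user image.
import face_recognition

def extract_facial_features(image_path: str):
    image = face_recognition.load_image_file(image_path)
    encodings = face_recognition.face_encodings(image)  # one vector per detected face
    return encodings[0] if encodings else None
```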
  • a virtual object similar to the user's image and the user's attire may also be generated.
  • the virtual object may be generated before the subtitle audio is generated, or may be generated after the subtitle audio is generated, which is not limited herein.
  • the electronic device may first collect the user image, input the user image into a preset biometric feature extraction model to extract the user's biometric features, and input the user image into a preset dressing feature extraction model to extract the user's dressing features; the extracted biometric features are then input into the preset object generation model to obtain an initial object model of a virtual object with the user's biometric features; and, according to the corresponding relationship between preset dressing styles and dressing models, the target dressing model corresponding to the user dressing style to which the user's dressing features belong is queried among the preset dressing models, and the target dressing model and the initial object model are fused to obtain the object model of the virtual object with the user's dressing characteristics.
  • the extracted user dressing features may include at least one of the user's facial decoration features, headgear features, clothing features, and clothing accessories features.
  • the extracted user biometric features may include the user's facial features
  • the extracted user dressing features may include headwear features.
  • the extracted user biometric features may include the user's facial features and body shape features
  • the extracted user dressing features may include facial decoration features, headgear features, clothing features, and clothing accessories features.
  • the electronic device may input the user's dressing feature into a preset dressing style classification model, so as to determine the user's dressing style to which the user's dressing feature belongs.
  • the dressing style can include intellectual, cute, handsome, calm, sunny and so on.
  • the electronic device may first collect the user image, then extract the user's biometric features and dressing features from the user image, and send the extracted features to the server, so that the server generates an object model of the virtual object according to the user's biometric features and dressing features.
  • the method for generating the object model by the server is similar to the above-mentioned method for generating the object model by the electronic device, and details are not described here.
  • the electronic device or server may also generate the subtitle audio according to the dressing style of the virtual object, where the subtitle audio is audio with sound characteristics consistent with the dressing characteristics of the virtual object.
  • the electronic device can be preset with multiple text-to-speech conversion models, each corresponding to a dressing style, so the electronic device can select, among the multiple text-to-speech conversion models, the target text-to-speech conversion model corresponding to the dressing style of the virtual object, and input the subtitle text into the target model for voice conversion to obtain subtitle audio whose sound characteristics are consistent with the dressing characteristics of the virtual object, further improving the user experience.
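The style-to-voice selection could be as simple as the following toy mapping; the style names follow the examples above, and the voice identifiers are invented:

```python
# Select a preset text-to-speech voice consistent with the dressing style.
TTS_VOICE_BY_STYLE = {"intellectual": "voice_calm",
                      "cute":         "voice_bright",
                      "handsome":     "voice_deep",
                      "sunny":        "voice_warm"}

def select_tts_voice(dressing_style: str) -> str:
    return TTS_VOICE_BY_STYLE.get(dressing_style, "voice_default")
```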
  • the video processing method may be executed by a server, for example, the server 102 in the server shown in FIG. 1 .
  • the server may include a device with storage and computing functions, such as a cloud server or a server cluster.
  • FIG. 16 shows a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
  • the video processing method may include the following S1610-S1660.
  • the electronic device may, in response to the video generation operation, send a video generation request carrying the original data to the server, so that the server can receive the video generation request sent by the electronic device and, in response to the video generation request, feed back the target video corresponding to the target text based on the original data.
  • the electronic device may be the electronic device 101 in the client shown in FIG. 1 .
  • the raw data may include text.
  • S1620 may specifically include:
  • the server may directly extract the text in the original data when it is determined that the data type of the original data is the text type, and use the extracted text as the target text.
  • the video generation request may carry the data type of the original data, and the data type of the original data may be determined by the electronic device through a control that receives the original data.
  • the original data may include multimedia files.
  • S1620 may specifically include:
  • the server may perform text conversion on the multimedia file when it is determined that the data type of the original data is the multimedia file type to obtain the converted text, and use the converted text as the target text.
  • the video generation request may carry the data type of the original data, and the data type of the original data may be determined by the electronic device through the control that receives the original data.
  • when the multimedia file includes an image file, the image file can be converted to text through OCR technology to obtain the converted text; the image content of the image file can also be learned, and text conversion performed by summarizing the image content, to obtain converted text describing the image content, which is not limited here.
  • when the multimedia file includes a video file, OCR technology can be used to perform text conversion on each image frame of the video file to obtain the converted text; the image content of each image frame can also be learned and summarized to obtain converted text describing the image content; text conversion can also be performed on the audio in the video file through speech recognition to obtain the converted text, which is not limited here.
  • the audio file may be converted to text through speech recognition to obtain the converted text.
  • the original data may include a link address, and the link address may be used to obtain article content.
  • S1620 may specifically include:
  • the server may obtain the target article based on the link address when it is determined that the data type of the original data is the address type, and directly extract the article text in the target article, and then use the extracted article text as the target text.
  • the video generation request may carry the data type of the original data, and the data type of the original data may be determined by the electronic device through a control that receives the original data.
  • S1630 may specifically include:
  • Text typesetting is performed on the abstract to obtain the subtitle text corresponding to the abstract.
  • the server may directly input the target text into a preset abstract extraction model to obtain an abstract of the target text, and then directly input the obtained abstract into a preset text typesetting model to obtain subtitle text in sentence units, with words processed across lines and pages, and configured with matching punctuation marks.
  • the server may also filter, in Internet article texts or local article texts, based on at least one of the title and the text content of the target text, similar article texts whose similarity with the target text meets a preset text similarity threshold. If no similar article texts are found, the target text can be directly input into the preset abstract extraction model to obtain the abstract of the target text. If similar article texts are found, a weighted sum can be performed based on the text length, the number of likes, and the number of retweets of the target text and the similar article texts to obtain a text score for each text; the text with the highest text score is selected and input into the preset abstract extraction model to obtain the abstract of the target text. After the server obtains the abstract, the abstract can be directly input into a preset text typesetting model to obtain subtitle text in sentence units, with words processed across lines and pages, and configured with matching punctuation marks.
  • text cleaning may also be performed on the abstract to remove sensitive keywords, such as institution names and user personal information, as well as special symbols for which voice audio cannot be generated; text typesetting is then performed on the cleaned abstract to obtain the subtitle text corresponding to the abstract.
  • S1630 may specifically include:
  • the method for the server to obtain the abstract of the target text is similar to the above-mentioned embodiment, and is not repeated here.
  • S1640 may specifically include:
  • the multiple preset materials include materials obtained according to the original data, and the preset materials include at least one of images and videos;
  • the server may first acquire a plurality of preset materials, and the preset materials may include at least one of materials in a material library and materials in the Internet.
  • the server may acquire at least one material among images and videos based on the original data, and the preset material may also include material acquired according to the original data.
  • the server may determine the degree of matching between each preset material and each subtitle text, then select, for each subtitle text, a preset number of preset materials with the highest matching degree, and use the selected preset materials as the target materials corresponding to the subtitle text.
  • the server may input each preset material and each corresponding subtitle text into a preset graphic-text matching model to obtain a graphic-text matching score between each preset material and each corresponding subtitle text; then calculate the text similarity between the text contained in each preset material and each corresponding subtitle text; then determine whether each preset material and each subtitle text have the same source, obtaining a source similarity; and finally, for each subtitle text, perform a weighted sum using at least one of the graphic-text matching score, the text similarity, the source similarity, the image clustering score of the preset material, and the text weight of the text to which the preset material belongs, to obtain the matching degree of each preset material with respect to the subtitle text. A preset number of preset materials with the highest matching degree are then selected, and the selected preset materials are used as the target materials corresponding to the subtitle text.
  • the text similarity may be calculated based on a preset text similarity algorithm according to the subtitle text and the text contained in the preset material.
  • the preset text similarity algorithm may be a text semantic similarity algorithm or a literal text similarity algorithm, which is not limited herein.
  • S1640 may specifically include:
  • the server may perform voice conversion on each subtitle text based on a text-to-speech conversion technology to obtain subtitle audio corresponding to each subtitle text.
  • the server may input the subtitle text into a preset text-to-speech conversion model to perform speech conversion to obtain subtitle audio.
  • S1640 may specifically include:
  • the server may be preset with multiple preset background audios, one preset background audio may correspond to one emotion category, and the emotion categories may include a happy category, a sad category, a serious category, a nervous category, etc., which are used to represent the category to which the emotion of the target text belongs.
  • the server inputs the target text into a preset text emotion classification model for classification, obtains the emotion category to which the target text belongs, and determines the target background audio corresponding to the emotion category among the plurality of preset background audios, and then uses the target background audio as a multimedia material .
  • the server can select appropriate background music from multiple preset background audios by performing sentiment analysis and classification on the target text to generate the target video.
  • the server may directly generate the target video according to the subtitle text and the multimedia material.
  • the video element may include subtitle text and multimedia material.
  • subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of their corresponding subtitle texts to obtain a dynamic image; video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the display effect of the subtitle text and the display effect of the images and videos preset in the video template, to obtain the target video.
  • subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos are sorted according to the arrangement order of the respective subtitle texts; then, according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video, the display time and display duration of each image and video are determined; a dynamic image is obtained according to the sorting of the images and videos and their display times and display durations; video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the display effects of the subtitle text and the images and videos preset in the video template, to obtain the target video; finally, the target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamp of each audio frame of the subtitle audio, to obtain the fused target video.
  • subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos are sorted according to the arrangement order of the respective subtitle texts; then, according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video, the display time and display duration of each image and video are determined; a dynamic image is obtained according to the sorting of the images and videos and their display times and display durations; video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the display effects preset in the video template, to obtain the target video; finally, the target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamp of each audio frame of the subtitle audio, to obtain the target video after preliminary fusion.
  • the video processing method may further include:
  • a target video template corresponding to the content category is determined.
  • S1650 may specifically include:
  • the server may be preset with multiple video classification templates, one video classification template may correspond to one content category, and the content categories may include a news category, a story category, a diary category, a variety show category, etc., which are used to represent the category to which the text content of the target text belongs.
  • the server can input the target text into a preset text content classification model for classification, obtain the content category to which the target text belongs, determine the target video template corresponding to the content category among the multiple preset video templates, and then generate the target video according to the subtitle text, the multimedia material and the target video template.
  • the content categories may also be classified according to other classification methods, for example, the content of the target text is classified according to the keywords contained in the target text, which is not limited herein.
  • different video templates may also include different display effect elements. Therefore, the target video can be obtained by performing video rendering on the subtitle text and the multimedia material according to the preset display effect of the subtitle text and the display effect of the multimedia material in the target video template.
  • an appropriate target video template for generating the target video can be selected according to the content category to which the target text belongs, and then an appropriate display effect can be set for the subtitle text and the multimedia material.
  • the video processing method may further include:
  • the pose of the virtual object is determined.
  • S1650 may specifically include:
  • the target video is generated according to the subtitle text, multimedia material, virtual object and the pose of the virtual object.
  • the video element may further include a preset virtual object and a pose of the virtual object.
  • the virtual object may be a virtual character object with a character image, a virtual cartoon object with a cartoon image, etc., and the present disclosure does not limit the specific type of the virtual object.
  • the pose of the virtual object may include at least one of a mouth pose, a facial expression pose, a gesture pose, and a body pose, which is not limited herein.
  • the pose of the virtual object may include a mouth pose and a facial expression pose.
  • the poses of the virtual object may include mouth poses, facial expression poses, and gesture poses.
  • the poses of the virtual object may include mouth poses, facial expression poses, gesture poses, and body poses. The present disclosure does not limit the gesture type of the virtual object.
  • the subtitle audio can be input into a preset pose generation model to obtain a real-time character pose animation, and pose migration technology is used to transfer the real-time character pose animation to the object model of the virtual character object, to obtain an object model of the virtual character object that broadcasts the subtitle text; then, according to the obtained object model, a character pose image of the virtual character object corresponding to each audio frame of the subtitle audio is obtained, and, according to the timestamp of each audio frame in the target video generated from the subtitle text and the multimedia material, the corresponding character pose images are fused into the target video, to obtain a fused target video with the virtual character object.
  • the preset pose generation model can be used to generate a mouth pose animation and a facial expression pose animation.
  • the preset pose generation model can be used to generate a mouth pose, a facial expression pose, and a gesture pose.
  • the preset pose generation model can be used to generate a mouth pose, a facial expression pose, a gesture pose, and a body pose.
  • the video processing method may further include:
  • the electronic device may send the user image to the server, and the server may input the received user image into a preset biometric feature extraction model to extract the user's biometric features in the user image; the extracted biometric features are then input into the preset object generation model to obtain the initial object model of the virtual object with the user's biometric features, and finally the preset clothing model is fused into the initial object model to obtain the final object model of the virtual object.
  • the electronic device may send the user image to the server, and the server may input the received user image into a preset biometric feature extraction model to extract the user's biometric features, and input the user image into a preset dressing feature extraction model to extract the user's dressing features; the extracted biometric features can then be input into the preset object generation model to obtain the initial object model of the virtual object with the user's biometric features, and, according to the corresponding relationship between preset dressing styles and dressing models, the target dressing model corresponding to the user dressing style to which the user's dressing features belong is queried among the preset dressing models, and the target dressing model and the initial object model are fused to obtain the object model of the virtual object with the user's dressing characteristics.
  • the extracted user dressing features may include at least one of the user's facial decoration features, headgear features, clothing features, and clothing accessories features.
  • the server may also generate the subtitle audio according to the dressing style of the virtual object, where the subtitle audio is audio with sound characteristics consistent with the dressing characteristics of the virtual object.
  • the server may send the target video to the electronic device, so that the electronic device displays the target video.
  • according to the present disclosure, it is possible to receive a video generation request carrying original data sent by an electronic device and, in response to the video generation request, automatically obtain the target text according to the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material corresponding to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material.
  • the rich multimedia material can be automatically found during the generation of the target video, without the need for the user to manually search for the material for making the video.
  • the time cost of producing the video can be reduced, and the quality of the produced video can also be improved.
  • An embodiment of the present disclosure further provides a video processing system, where the video processing system may include an electronic device and a server, thereby implementing the architecture shown in FIG. 1 .
  • the electronic device can be used to: receive a video generation operation for the original data, where the original data is used to obtain the target text and the video generation operation is used to trigger the generation of a target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video;
  • the server can be used to: receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate the subtitle text according to the target text; obtain the multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
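A server-side counterpart to the device request sketched earlier, written with Flask (an assumption; the disclosure names no framework) and a stubbed pipeline:

```python
# Receive a video generation request, run the (stubbed) pipeline, and
# return the rendered target video to the electronic device.
from flask import Flask, request, send_file

app = Flask(__name__)

def run_pipeline(original_data: dict) -> str:
    # Stub for: obtain target text -> generate subtitle text ->
    # obtain multimedia material -> render and fuse the target video.
    return "/tmp/target_video.mp4"

@app.post("/video/generate")
def generate_video():
    original_data = request.get_json()["original_data"]
    return send_file(run_pipeline(original_data), mimetype="video/mp4")
```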
  • in the video processing system, the electronic device may perform the various steps in the method embodiments shown in FIG. 2 to FIG. 15 and realize the corresponding processes and effects, and the server may perform the various steps in the method embodiment shown in FIG. 16 and realize the corresponding processes and effects, which will not be repeated here.
  • the electronic device may send a video generation request carrying the original data to the server, and the server may generate the target video based on the original data sent by the electronic device and feed it back to the electronic device, so that the electronic device can display the target video after receiving it.
  • FIG. 17 shows a schematic diagram of an interaction flow of a video processing system provided by an embodiment of the present disclosure.
  • the video display method may include the following S1710-S1770.
  • the electronic device may receive the original data input by the user's data input operation, or receive the original data sent by other devices to the electronic device.
  • the original data may include at least one of text, link addresses, and multimedia files, which will not be repeated here.
  • the electronic device may display the received raw data.
  • a video generation operation for the original data may be input to the electronic device.
  • the video generation operation may be a trigger operation such as a long press, double click, voice control or expression control on the original data, or a trigger operation such as a click, long press or double click on a video generation trigger control, which will not be repeated here.
  • the electronic device may send a video generation request carrying original data to the server.
  • the server may generate the target video based on the original data in response to the video generation request.
  • the server can automatically obtain the target text through the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material. This will not be repeated.
  • the server may send the target video to the electronic device.
  • after receiving the target video sent by the server, the electronic device may display the received target video.
  • FIG. 18 shows a schematic structural diagram of a video display device provided by an embodiment of the present disclosure.
  • the video display apparatus 1800 shown in FIG. 18 may be provided in an electronic device, for example, the electronic device 101 in the client shown in FIG. 1 .
  • electronic devices may include mobile phones, tablet computers, desktop computers, notebook computers, vehicle terminals, wearable devices, all-in-one computers, smart home devices and other devices with communication functions, and may also include devices simulated by virtual machines or simulators.
  • the video display apparatus 1800 may include a first receiving unit 1810 and a first display unit 1820 .
  • the first receiving unit 1810 may be configured to receive a video generation operation for original data, where the original data is used to obtain the target text, and the video generation operation is used to trigger the generation of a target video corresponding to the target text.
  • the first display unit 1820 may be configured to display the generated target video in response to the video generation operation, where the video elements of the target video include subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text .
  • a video generation operation for original data can be received. Since the original data can be used to obtain the target text, and the video generation operation can be used to trigger the generation of a target video corresponding to the target text, after receiving the video generation operation, the target video generated in response to the operation can be displayed. The video elements of the target video may include subtitle text and multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text, and the multimedia material can be automatically obtained according to the subtitle text. It can be seen that rich multimedia materials can be found automatically during the generation of the target video, and users do not need to manually search for materials for making videos, which can not only reduce the time cost of making videos, but also improve the quality of the videos.
  • the video display apparatus 1800 may further include a second display unit, a third receiving unit, and a third display unit.
  • the second display unit may be configured to display a video editing interface in response to the video generation operation, the video editing interface includes editable elements, and the editable elements include at least one of video elements and display effect elements corresponding to the video elements.
  • the third receiving unit may be configured to receive an element modification operation on the editable element.
  • the third display unit may be configured to display a modified target video in response to the element modification operation, where the modified target video includes editable elements displayed in the video editing interface when the element modification operation is completed.
  • the video display apparatus 1800 may further include a fourth display unit, a fourth receiving unit, and a fifth display unit.
  • the fourth display unit may be configured to display an indication mark for indicating that the target video has been generated.
  • the fourth receiving unit may be configured to receive an identification-triggered operation for the indication identification.
  • the fifth display unit may be configured to hide the original data in response to the identification triggering operation.
  • the video display apparatus 1800 may further include a second sending unit and a fifth receiving unit.
  • the second sending unit may be configured to send a video generation request carrying original data to the server, where the video generation request is used to make the server feed back a target video corresponding to the target text based on the original data.
  • the fifth receiving unit may be configured to receive the target video fed back by the server.
  • the video display apparatus 1800 may further include a third acquiring unit, a third generating unit, a fourth acquiring unit, and a fourth generating unit.
  • the third obtaining unit may be configured to obtain the target text according to the original data.
  • the third generating unit may be configured to generate subtitle text according to the target text.
  • the fourth obtaining unit may be configured to obtain the multimedia material corresponding to the subtitle text.
  • the fourth generating unit may be configured to generate the target video according to the subtitle text and the multimedia material.
  • the video element may further include a preset virtual object and a pose of the virtual object, and the pose of the virtual object may be determined according to the subtitle text.
  • the video display apparatus 1800 shown in FIG. 18 can perform the various steps in the method embodiments shown in FIGS. 2 to 15 and realize the various processes and effects in those method embodiments, which will not be repeated here.
  • FIG. 19 shows a schematic structural diagram of a video processing apparatus provided by an embodiment of the present disclosure.
  • the video processing apparatus 1900 may be a server, for example, the server 102 in the server shown in FIG. 1 .
  • the server may include a device with storage and computing functions, such as a cloud server or a server cluster.
  • the video processing apparatus 1900 may include a second receiving unit 1910 , a first obtaining unit 1920 , a first generating unit 1930 , a second obtaining unit 1940 , a second generating unit 1950 and a first sending unit 1960 .
  • the second receiving unit 1910 may be configured to receive a video generation request that carries original data and is sent by the electronic device.
  • the first obtaining unit 1920 may be configured to obtain the target text according to the original data in response to the video generation request.
  • the first generating unit 1930 may be configured to generate subtitle text according to the target text.
  • the second obtaining unit 1940 may be configured to obtain the multimedia material corresponding to the subtitle text.
  • the second generating unit 1950 may be configured to generate the target video according to the subtitle text and the multimedia material.
  • the first sending unit 1960 may be configured to send the target video to the electronic device.
  • according to the present disclosure, it is possible to receive a video generation request carrying original data sent by an electronic device and, in response to the video generation request, automatically obtain the target text according to the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material corresponding to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material.
  • the rich multimedia material can be automatically found during the generation of the target video, without the need for the user to manually search for the material for making the video.
  • the time cost of producing the video can be reduced, and the quality of the produced video can also be improved.
  • the first generating unit 1930 may include an abstract extraction subunit and a text typesetting subunit.
  • the abstract extraction subunit can be configured to perform text abstract extraction on the target text to obtain the abstract of the target text.
  • the text typesetting subunit may be configured to perform text typesetting on the abstract to obtain subtitle text corresponding to the abstract.
  • the raw data may include text.
  • the first obtaining unit 1920 may include a text extraction subunit and a first processing subunit.
  • the text extraction subunit can be configured to extract text in the original data.
  • the first processing subunit may be configured to take the text as the target text.
  • the original data may include multimedia files.
  • the first acquisition unit 1920 may include a text conversion subunit and a second processing subunit.
  • the text conversion subunit may be configured to perform text conversion on the multimedia file to obtain converted text.
  • the second processing subunit may be configured to take the converted text as the target text.
  • the original data may include a link address, and the link address may be used to obtain article content.
  • the first acquisition unit 1920 may include an article acquisition subunit, a text extraction subunit, and a third processing subunit.
  • the article obtaining subunit can be configured to obtain the target article based on the link address.
  • the text extraction subunit can be configured to extract article text in the target article.
  • the third processing subunit may be configured to use the article text as the target text.
  • the second obtaining unit 1940 may include a fourth processing subunit and a fifth processing subunit.
  • the fourth processing subunit may be configured to determine, among multiple preset materials, a target material with the highest degree of matching with the subtitle text, where the multiple preset materials include materials obtained according to the original data, and the preset materials include at least one of images and videos.
  • the fifth processing subunit may be configured to use the target material as a multimedia material.
  • the second obtaining unit 1940 may include a speech conversion subunit and a sixth processing subunit.
  • the voice conversion subunit may be configured to perform text-to-speech conversion on the subtitle text to obtain subtitle audio corresponding to the subtitle text.
  • the sixth processing subunit may be configured to use the subtitle audio as the multimedia material.
  • the second obtaining unit 1940 may include an emotion classification subunit, a seventh processing subunit, and an eighth processing subunit.
  • the emotion classification subunit may be configured to input the target text into a preset text emotion classification model for classification, and obtain the emotion category to which the target text belongs.
  • the seventh processing subunit may be configured to determine a target background audio corresponding to an emotion category among a plurality of preset background audios.
  • the eighth processing subunit may be configured to use the target background audio as the multimedia material.
  • the video processing apparatus 1900 may further include a content classification unit and a template determination unit.
  • the content classification unit may be configured to input the target text into a preset text content classification model for classification, and obtain the content category to which the target text belongs.
  • the template determining unit may be configured to determine a target video template corresponding to a content category from among a plurality of preset video templates.
  • the second generating unit 1950 may be further configured to generate the target video according to the subtitle text, the multimedia material and the target video template.
  • the video processing apparatus 1900 shown in FIG. 19 can execute the various steps in the method embodiment shown in FIG. 16 and realize the various processes and effects in that method embodiment, which will not be repeated here.
  • Embodiments of the present disclosure also provide a computing device.
  • the computing device may include a processor and a memory, and the memory may be used to store executable instructions.
  • the processor may be configured to read the executable instructions from the memory, and execute the executable instructions to implement the video display method or the video processing method in the foregoing embodiments.
  • FIG. 20 shows a schematic structural diagram of a computing device 2000 suitable for implementing an embodiment of the present disclosure, described in detail below with reference to FIG. 20.
  • the computing device may be an electronic device or a server.
  • Electronic devices may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and wearable devices, as well as stationary terminals such as digital TVs, desktop computers, and smart home devices.
  • the server may include a device with storage and computing functions, such as a cloud server or a server cluster.
  • computing device 2000 shown in FIG. 20 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • the computing device 2000 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 2001, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 2002 or a program loaded from a storage device 2008 into a random access memory (RAM) 2003.
  • the RAM 2003 also stores various programs and data required for the operation of the computing device 2000.
  • the processing device 2001, the ROM 2002, and the RAM 2003 are connected to each other through a bus 2004.
  • An input/output (I/O) interface 2005 is also connected to the bus 2004 .
  • the following devices can be connected to the I/O interface 2005: input devices 2006 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 2007 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 2008 including, for example, a magnetic tape and a hard disk; and a communication device 2009. The communication device 2009 may allow the computing device 2000 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 20 shows the computing device 2000 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, where the storage medium stores a computer program, and when the computer program is executed by the processor, enables the processor to implement the video display method or the video processing method in the foregoing embodiments.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from a network through the communication device 2009, or installed from the storage device 2008, or installed from the ROM 2002.
  • when the computer program is executed by the processing device 2001, the above-mentioned functions defined in the video display method or in the video processing method of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to an electrical wire, an optical fiber cable, RF (radio frequency), or any suitable combination of the foregoing.
  • clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP, and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned computing device; or may exist alone without being assembled into the computing device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the computing device, the computing device is caused to:
  • receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text;
  • in response to the video generation operation, display the generated target video, the video elements of the target video including subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text;
  • computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
  • exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video display and processing method, apparatus, system, device, and medium. The video display method includes: receiving a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text (S210); and in response to the video generation operation, displaying the generated target video, the video elements of the target video including subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text (S220). The above video display and processing method, apparatus, system, device, and medium can reduce the time cost of producing a video and improve the quality of the produced video.

Description

Video display and processing method, apparatus, system, device, and medium
This application claims priority to Chinese Patent Application No. 202011437788.4, filed with the China Patent Office on December 7, 2020 and entitled "Video display and processing method, apparatus, system, device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of multimedia technologies, and in particular, to a video display and processing method, apparatus, system, device, and medium.
Background
With the rapid development of computer technology and mobile communication technology, various video platforms based on electronic devices have been widely used, greatly enriching people's daily lives. More and more users are willing to share their video works on video platforms for other users to watch.
In the related art, to produce a video, a user first needs to find materials on his or her own and then perform a series of complicated video editing operations on them to finally generate a video work. If the materials found by the user are not rich enough, the quality of the manually edited video cannot be guaranteed, and the manual editing operations are tedious and time-consuming, so the time cost of producing a video is also high.
Summary
In order to solve the above technical problem, or at least partially solve it, the present disclosure provides a video display and processing method, apparatus, system, device, and medium, which can reduce the time cost of producing a video.
In a first aspect, the present disclosure provides a video display method, including:
receiving a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text;
in response to the video generation operation, displaying the generated target video, the video elements of the target video including subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text.
In a second aspect, the present disclosure provides a video processing method, including:
receiving a video generation request carrying original data and sent by an electronic device;
in response to the video generation request, obtaining a target text according to the original data;
generating subtitle text according to the target text;
obtaining multimedia material corresponding to the subtitle text;
generating a target video according to the subtitle text and the multimedia material;
sending the target video to the electronic device.
In a third aspect, the present disclosure provides a video display apparatus, including:
a first receiving unit configured to receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text;
a first display unit configured to display, in response to the video generation operation, the generated target video, the video elements of the target video including subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text.
In a fourth aspect, the present disclosure provides a video processing apparatus, including:
a second receiving unit configured to receive a video generation request carrying original data and sent by an electronic device;
a first acquisition unit configured to obtain, in response to the video generation request, a target text according to the original data;
a first generating unit configured to generate subtitle text according to the target text;
a second acquisition unit configured to obtain multimedia material corresponding to the subtitle text;
a second generating unit configured to generate a target video according to the subtitle text and the multimedia material;
a first sending unit configured to send the target video to the electronic device.
In a fifth aspect, the present disclosure provides a video processing system, including an electronic device and a server, where:
the electronic device is configured to receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video;
the server is configured to receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate subtitle text according to the target text; obtain multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
In a sixth aspect, the present disclosure provides a computing device, including:
a processor;
a memory for storing executable instructions;
wherein the processor is configured to read the executable instructions from the memory and execute them to implement the video display method described in the first aspect or the video processing method described in the second aspect.
In a seventh aspect, the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the video display method described in the first aspect or the video processing method described in the second aspect.
In an eighth aspect, the present disclosure provides a computer program product including a computer program carried on a computer-readable medium which, when executed by a processor, causes the processor to implement the video display method described in the first aspect or the video processing method described in the second aspect.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have at least the following advantages:
The video display and processing method, apparatus, system, device, and medium of the embodiments of the present disclosure can receive a video generation operation for original data. Since the original data can be used to obtain a target text and the video generation operation can be used to trigger generation of a target video corresponding to the target text, after the video generation operation is received, the target video generated in response to it can be displayed; the video elements of the target video can include subtitle text and multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. Rich multimedia materials can thus be found automatically during the generation of the target video, without requiring the user to search for video-production materials manually, which not only reduces the time cost of producing the video but also improves its quality.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent in combination with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is an architecture diagram of a video production scenario provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a video display method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an original data input interface provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another original data input interface provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of yet another original data input interface provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a video display interface provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of another video display interface provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of yet another video display interface provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of yet another video display interface provided by an embodiment of the present disclosure;
FIG. 14 is a schematic flowchart of another video display method provided by an embodiment of the present disclosure;
FIG. 15 is a schematic diagram of yet another video display interface provided by an embodiment of the present disclosure;
FIG. 16 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure;
FIG. 17 is a schematic diagram of an interaction flow of a video processing system provided by an embodiment of the present disclosure;
FIG. 18 is a schematic structural diagram of a video display apparatus provided by an embodiment of the present disclosure;
FIG. 19 is a schematic structural diagram of a video processing apparatus provided by an embodiment of the present disclosure;
FIG. 20 is a schematic structural diagram of a computing device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the following description.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
The video display and processing method provided by the present disclosure can be applied to the architecture shown in FIG. 1, which is described in detail below with reference to FIG. 1.
FIG. 1 shows an architecture diagram of a video production scenario provided by an embodiment of the present disclosure.
As shown in FIG. 1, the architecture may include at least one electronic device 101 on the client side and at least one server 102 on the server side. The electronic device 101 can establish a connection and exchange information with the server 102 through a network protocol such as Hyper Text Transfer Protocol over Secure Socket Layer (HTTPS). The electronic device 101 may include devices with communication functions such as a mobile phone, a tablet computer, a desktop computer, a notebook computer, an in-vehicle terminal, a wearable device, an all-in-one machine, and a smart home device, and may also include a device simulated by a virtual machine or an emulator. The server 102 may include a device with storage and computing functions, such as a cloud server or a server cluster.
Based on the above architecture, a user can produce a video within a designated platform on the electronic device 101, where the designated platform may be a designated application or a designated website. After producing a video, the user can send it to the server 102 of the designated platform; the server 102 can receive the video sent by the electronic device 101 and store it, so as to send the video to any electronic device that needs to play it.
In the embodiments of the present disclosure, in order to reduce the time cost of producing a video and improve the quality of the produced video, the electronic device 101 can receive a user's video generation operation for original data. Since the original data can be used to obtain a target text and the video generation operation can be used to trigger generation of a target video corresponding to the target text, after receiving the operation, the electronic device 101 can display the target video generated in response to it. The video elements of the target video may include subtitle text and multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. Rich multimedia materials can thus be found automatically during the generation of the target video, without requiring the user to search for materials manually, which not only reduces the time cost of producing the video but also improves its quality.
In a possible implementation, based on the above architecture, after receiving the video generation operation, the electronic device 101 can obtain the target text according to the original data, generate the subtitle text according to the target text, obtain the multimedia material corresponding to the subtitle text, and generate the target video according to the subtitle text and the multimedia material, so that the target text is obtained and the corresponding target video is generated locally on the electronic device 101 based on the original data, further reducing the time cost of producing the video.
In another possible implementation, based on the above architecture, after receiving the video generation operation, the electronic device 101 can also send a video generation request carrying the original data to the server 102. After receiving the request, the server 102 can, in response to it, obtain the target text according to the original data, generate the subtitle text according to the target text, obtain the multimedia material corresponding to the subtitle text, generate the target video according to the subtitle text and the multimedia material, and send the generated target video to the electronic device 101. In this way, the electronic device 101 can request the server 102 to obtain the target text based on the original data and generate the corresponding target video, which further improves the quality of the produced video and reduces the data processing load of the electronic device 101.
The video display method provided by the embodiments of the present disclosure is first described below with reference to FIG. 2 to FIG. 15 based on the above architecture. In the embodiments of the present disclosure, the video display method can be executed by an electronic device, for example, the electronic device 101 on the client side shown in FIG. 1. The electronic device may include devices with communication functions such as a mobile phone, a tablet computer, a desktop computer, a notebook computer, an in-vehicle terminal, a wearable device, an all-in-one machine, and a smart home device, and may also include a device simulated by a virtual machine or an emulator.
FIG. 2 shows a schematic flowchart of a video display method provided by an embodiment of the present disclosure.
As shown in FIG. 2, the video display method may include the following steps S210-S220.
S210: Receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text.
In the embodiments of the present disclosure, the target text may be all the text content involved in the original data. The original data may be data input by a user, or data sent to the electronic device by another device.
In some embodiments of the present disclosure, when the original data is data input by a user, before S210, the video display method may further include:
receiving a data input operation of the user, the data input operation being used to input the original data;
in response to the data input operation, displaying the original data input by the user in real time.
In a possible implementation, the user input operation may include an operation of adding the original data or an operation of entering the original data, which is not limited here.
Specifically, the user can trigger a data input operation on the electronic device to input the desired original data; after receiving the data input operation, the electronic device can respond to it in real time and display the original data input by the user in real time.
In some embodiments, the original data may include text. In a possible implementation, the electronic device may display a first input box for entering text, and the user may input an entry operation in the first input box so that the electronic device displays the entered text in the first input box. For example, the first input box may be used to input text such as an article title and article content, and the user may enter the article title and article content in the first input box.
FIG. 3 shows a schematic diagram of an original data input interface provided by an embodiment of the present disclosure. As shown in FIG. 3, multiple first input boxes, such as an "article title" input box 302 and an "article content" input box 303, can be displayed in the original data input interface 301. The user can perform an entry operation in the "article title" input box 302 to enter an article title, and can also perform an entry operation in the "article content" input box 303 to enter article content.
FIG. 4 shows a schematic diagram of another original data input interface provided by an embodiment of the present disclosure. As shown in FIG. 4, multiple first input boxes, such as an "article title" input box 402 and a "subtitle" input box 403, can be displayed in the original data input interface 401. The user can perform an entry operation in the "article title" input box 402 to enter an article title, and can also perform an entry operation in the "subtitle" input box 403 to enter the subtitles to be displayed in the video.
FIG. 5 shows a schematic diagram of yet another original data input interface provided by an embodiment of the present disclosure. As shown in FIG. 5, multiple first input boxes, such as an "article title" input box 502 and an "article content" input box 503, can be displayed in the original data input interface 501. The user can perform an entry operation in the "article title" input box 502 to enter an article title, and can also perform an entry operation in the "article content" input box 503 to enter article content.
Thus, in the above embodiments, the target text can be obtained from the text in the original data.
In other embodiments, the original data may also include a link address, and the link address can be used to obtain article content. In a possible implementation, the electronic device may display a second input box for entering a link address, and the user may input an entry operation in the second input box so that the electronic device displays the entered link address in the second input box. For example, the second input box may be used to input a link address such as the URL or identity document (ID) of an article, and the user may enter the article's URL or ID in the second input box.
It should be noted that the link address may be a character string in any form, such as a URL or an ID, as long as the character string can be used to obtain the article content the user needs, which is not limited here.
Continuing to refer to FIG. 3, a second input box, such as an "article link" input box 304, can be displayed in the original data input interface 301. The user can perform an entry operation in the "article link" input box 304 to enter the URL or ID of an article.
Continuing to refer to FIG. 5, a second input box, such as an "article link" input box 504, can be displayed in the original data input interface 501. The user can perform an entry operation in the "article link" input box 504 to enter the URL or ID of an article.
Thus, in the above embodiments, the target text can be obtained from the article content obtained based on the link address in the original data.
In still other embodiments, the original data may also include a link address used to obtain video content.
It should be noted that the method by which the user enters a link address for obtaining video content is similar to the above method of entering a link address for obtaining article content, and is not repeated here.
Thus, in these embodiments, the video content obtained based on the link address in the original data can be used to obtain the target text.
In still other embodiments, the original data may include a multimedia file. In a possible implementation, the multimedia file may include at least one of an image file, an audio file, and a video file. The electronic device can display an add control for adding a multimedia file, and the user can input an add operation for adding a multimedia file through the add control, so that the electronic device displays the multimedia file added by the user. For example, the add control may be an add button, and the add operation may include a trigger operation on the add button such as a click, long press, or double click, a selection operation on a multimedia file such as a click, long press, or check, and a trigger operation on a selection confirmation button such as a click, long press, or double click. The user can click the add button to enter a multimedia file selection interface, browse and click the desired multimedia file in that interface, and finally click the selection confirmation button to complete the add operation.
Continuing to refer to FIG. 4, multiple add controls, such as an "image material" add control 404, can be displayed in the original data input interface 401. The user can perform an add operation through the "image material" add control 404 to add an image file.
Continuing to refer to FIG. 5, an add control, such as an "image/video material" add control 505, can be displayed in the original data input interface 501. The user can perform an add operation through the "image/video material" add control 505 to add an image file or a video file.
In one example, taking the user adding a video file through the "image/video material" add control 505 as an example, the video file may include a video file shot by the user in real time, a video file designated by the user among those stored locally on the electronic device, a video file generated based on the embodiments of the present disclosure, or a video file obtained after editing a video file generated based on the embodiments of the present disclosure, among others, so that the video file can be further optimized and edited.
Thus, in the above embodiments, the target text can be obtained from the multimedia file in the original data.
In other embodiments of the present disclosure, before receiving the data input operation of the user, the video display method may further include:
receiving a mode selection operation of the user, the mode selection operation being used for the user to select an input mode;
in response to the mode selection operation, displaying the original data input interface corresponding to the selected input mode.
Correspondingly, receiving the data input operation of the user may specifically include:
receiving, in the displayed original data input interface, a data input operation corresponding to the selected input mode.
In a possible implementation, the input modes may include an automatic entry input mode and a manual entry input mode. In the automatic entry input mode, the user can input the above original data so that the target text can be obtained from the original data. In the manual entry input mode, the user can directly input the multimedia material and subtitle text used to generate the target video.
In some embodiments, the mode selection operation may include a gesture operation input by the user to trigger enabling of a different input mode. Specifically, the electronic device may be preset with multiple input modes and multiple gesture operations, where one gesture operation can be used to trigger enabling of one corresponding input mode. The user can determine the input mode to be selected and input the corresponding gesture operation to the electronic device, so that after receiving it, the electronic device enables the input mode corresponding to the received gesture operation and displays the original data input interface corresponding to the enabled input mode. Controls supported by the enabled input mode for inputting original data can be displayed in that interface, and the user can input, through a displayed control, the data input operation corresponding to that control.
In other embodiments, the mode selection operation may include a selection operation by the user on the selection controls of different input modes, for example, a click, long press, or double click on a selection control. Specifically, the electronic device can display multiple selection controls, one selection control corresponding to one input mode. The user can determine the input mode to be selected and input a selection operation on the corresponding selection control, so that after receiving the selection operation, the electronic device displays the selected control in a selected state, enables the input mode corresponding to the selected control, and displays the original data input interface corresponding to the enabled input mode. Controls supported by the enabled input mode for inputting original data can be displayed in that interface, and the user can input, through a displayed control, the data input operation corresponding to that control.
Continuing to refer to FIG. 3, multiple selection controls, such as an "automatic entry" selection control 305 and a "manual entry" selection control 306, can be displayed in the original data input interface 301. When the user inputs a selection operation on the "automatic entry" selection control 305, controls corresponding to the automatic entry input mode, such as the "article title" input box 302, the "article content" input box 303, and the "article link" input box 304, can be displayed in the original data input interface 301.
Continuing to refer to FIG. 4, multiple selection controls, such as an "automatic entry" selection control 405 and a "manual entry" selection control 406, can be displayed in the original data input interface 401. When the user inputs a selection operation on the "manual entry" selection control 406, controls corresponding to the manual entry input mode, such as the "article title" input box 402, the "subtitle" input box 403, and the "image material" add control 404, can be displayed in the original data input interface 401.
In this way, the user can manually input the article title and the subtitle text in the "article title" input box 402 and the "subtitle" input box 403, respectively.
One page may correspond to one "image material" add control 404, and one "image material" add control 404 may correspond to at least one "subtitle" input box 403. Specifically, a page editing area 407 can be set for the page corresponding to each page number, and the at least one "subtitle" input box 403 and the "image material" add control 404 of that page can be located within the page editing area 407 of the page. For example, a page editing area 407 can be set to the right of page number "1", and the subtitle text and images manually entered by the user through the "subtitle" input box 403 and the "image material" add control 404 in that page editing area 407 are all the subtitle text and images of page 1. Moreover, the display order of the subtitle text and images of page 1 corresponds to the arrangement order of the corresponding "subtitle" input boxes 403 and "image material" add controls 404.
In a possible implementation, in the manual entry input mode, a material add control, such as a "new" button 408, can also be displayed in the original data input interface 401, so that the user can use the "new" button 408 to add a "subtitle" input box 403 and an "image material" add control 404 corresponding to a new page, or to add a "subtitle" input box 403 within a page that is already displayed.
In a possible implementation, in the manual entry input mode, a subtitle delete control, such as a "-" button 409, can also be displayed in the original data input interface 401, where one "-" button 409 corresponds to one "subtitle" input box 403, and the user can delete the corresponding "subtitle" input box 403 through the "-" button 409.
In this way, in the manual entry input mode, the user can enter, in order, the article title, the subtitle of each sentence, and the image material corresponding to each subtitle.
Continuing to refer to FIG. 5, multiple selection controls, such as an "automatic entry" selection control 506 and a "manual entry" selection control 507, can be displayed in the original data input interface 501. When the user inputs a selection operation on the "automatic entry" selection control 506, controls corresponding to the automatic entry input mode, such as the "article title" input box 502, the "article content" input box 503, the "article link" input box 504, and the "image/video material" add control 505, can be displayed in the original data input interface 501.
It should be noted that when the user inputs a selection operation on the "manual entry" selection control 507, the controls corresponding to the manual entry input mode displayed in the original data input interface 501 are similar to those in FIG. 4 and are not repeated here.
In still other embodiments of the present disclosure, when the original data is data sent to the electronic device by another device, before S210, the video display method may further include: receiving and displaying the original data.
In a possible implementation, the original data may include at least one of text, a link address, and a multimedia file, which is not repeated here.
In summary, in the embodiments of the present disclosure, the electronic device can provide the user with rich ways of inputting original data for the user to choose as needed, further improving the user experience.
Returning to S210, the user can input a video generation operation for the original data to the electronic device to trigger generation and display of the target video corresponding to the target text.
Specifically, the video generation operation may be a trigger operation on the original data, such as a long press, double click, voice control, or expression control; the video generation operation may also be a trigger operation on a video generation trigger control, such as a click, long press, or double click.
Continuing to refer to FIG. 3, a video generation trigger control, such as a "generate video" button 307, can be displayed in the original data input interface 301; the user can input a trigger operation on the "generate video" button 307 to trigger generation and display of the target video corresponding to the target text involved in the original data.
Continuing to refer to FIG. 4, a video generation trigger control, such as a "generate video" button 410, can be displayed in the original data input interface 401; the user can input a trigger operation on the "generate video" button 410 to trigger generation and display of the target video corresponding to the target text involved in the original data.
Continuing to refer to FIG. 5, a video generation trigger control, such as a "generate video" button 508, can be displayed in the original data input interface 501; the user can input a trigger operation on the "generate video" button 508 to trigger generation and display of the target video corresponding to the target text involved in the original data.
S220: In response to the video generation operation, display the generated target video, the video elements of the target video including subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text.
Specifically, after receiving the video generation operation, the electronic device can, in response to it, display the target video generated based on the target text involved in the original data.
In a possible implementation, the electronic device can display the target video in full screen, or within a partial display area. When the electronic device includes one display screen, the partial display area can be a part of that display screen; when the electronic device includes two or more display screens, the partial display area can be a part of any display screen, or any one display screen.
FIG. 6 shows a schematic diagram of a video display interface provided by an embodiment of the present disclosure. As shown in FIG. 6, after receiving the video generation operation, the electronic device can, in response to it, display a video display interface 601 and display a playback window 602 of the target video in full screen within the video display interface 601.
FIG. 7 shows a schematic diagram of another video display interface provided by an embodiment of the present disclosure. As shown in FIG. 7, after receiving the video generation operation, the electronic device can, in response to it, display a video display interface 701 and display a playback window 702 of the target video in the central display area of the video display interface 701.
In the embodiments of the present disclosure, the video elements of the target video may include at least one subtitle text, and one subtitle text may correspond to at least one multimedia material, where the multimedia material may include at least one of an image, a video, audio, and the like.
Further, the subtitle text can be automatically generated according to the target text obtained from the original data.
In some embodiments, the subtitle text may be generated by the electronic device. In other embodiments, the subtitle text may also be generated by a server, which will be described in detail later. The server may be the server 102 on the server side shown in FIG. 1.
Further, the multimedia material can be automatically obtained, according to the subtitle text, from multiple local or Internet materials.
In some embodiments, the multimedia material may be obtained by the electronic device. In other embodiments, the multimedia material may also be obtained by the server, which will be described in detail later.
In the embodiments of the present disclosure, a video generation operation for original data can be received. Since the original data can be used to obtain the target text and the video generation operation can be used to trigger generation of the corresponding target video, after the operation is received, the target video generated in response to it can be displayed; its video elements can include subtitle text automatically generated according to the target text and multimedia material automatically obtained according to the subtitle text. Rich multimedia materials can thus be found automatically during generation, without requiring the user to search for materials manually, which not only reduces the time cost of producing the video but also improves its quality.
In another implementation of the present disclosure, in order to further improve the user experience, after S210, the video display method may further include:
in response to the video generation operation, displaying a video editing interface, the video editing interface including editable elements, and the editable elements including at least one of the video elements and display effect elements corresponding to the video elements;
receiving an element modification operation on the editable elements;
in response to the element modification operation, displaying a modified target video, the modified target video including the editable elements displayed in the video editing interface when the element modification operation is completed.
Specifically, after receiving the video generation operation, the electronic device can, in response to it, display a video editing interface for adjusting the target video.
In a possible implementation, the user can adjust, in the video editing interface, at least one of the video elements of the target video and the display effect elements corresponding to the video elements. Therefore, at least one of the video elements and the corresponding display effect elements can serve as editable elements in an editable state within the video editing interface.
Further, the display effect elements may include transition effect elements, playback effects, special effects, decoration effect elements, and the like. Specifically, the decoration effect elements may include effect elements that decorate a video element, such as its hue, size, contrast, color, and decorative text.
In this way, the user can input an element modification operation on the editable element to be adjusted in the video editing interface, so that the electronic device displays in real time the editable elements as adjusted during the operation, and after the user completes the operation, displays the modified target video generated from the editable elements displayed in the video editing interface at the time the element modification operation was completed.
In a possible implementation, the element modification operation may include at least one of: modifying, adding, or deleting subtitle text; replacing, adding, or deleting an image material in the multimedia material; replacing, adding, or deleting a video material in the multimedia material; replacing or deleting background audio in the multimedia material; adjusting the decoration effect or modifying the transition effect of the subtitle text; adjusting the decoration effect or modifying the transition effect of an image material; and adjusting the decoration effect or modifying the transition effect of a video material; which is not limited here.
Further, a completion indication control can also be displayed in the video editing interface, and the user can input a completion trigger operation such as a click, long press, or double click on it. After the user inputs the completion trigger operation, the electronic device can, in response to receiving it, determine that the user has completed the element modification operation, and display the modified target video generated from the editable elements displayed in the video editing interface at the time the element modification operation was completed.
In some embodiments of the present disclosure, the video editing interface and the target video may be displayed within the same interface. For example, a playback window of the target video is displayed in the video display interface, and the video editing interface is displayed below the playback window.
FIG. 8 shows a schematic diagram of yet another video display interface provided by an embodiment of the present disclosure. As shown in FIG. 8, after receiving the video generation operation, the electronic device can, in response to it, display a video display interface 801, display a playback window 802 of the target video in the video display interface 801, and display a video editing interface 803 below the playback window 802.
In a possible implementation, a scroll bar 804 can be displayed on the right side of the video display interface 801, and the user can view the content of the video editing interface 803 by dragging the scroll bar 804.
In the video editing interface 803, the user can perform element modification operations on editable elements such as the article title, subtitle text, and image materials; for example, modifying the article title through the "article title" input box 805, adding subtitle text for a new page or for a displayed page through the "new" button 806, modifying subtitle text through the "subtitle" input box 807, deleting a subtitle through the "-" button 808, and replacing or adding image and video materials through the "image material" add control 809.
In a possible implementation, a completion indication control, such as a "submit modification" button 810, can be displayed at the bottom of the video editing interface 803; the user can input a completion trigger operation on the "submit modification" button 810 to trigger generation and display of the modified target video.
FIG. 9 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure. As shown in FIG. 9, after receiving the video generation operation, the electronic device can, in response to it, display a video display interface 901, display a playback window 902 of the target video in the video display interface 901, and display a video editing interface 903 below the playback window 902.
In the video editing interface 903, the user can perform element modification operations on editable elements such as the article title, subtitle text, and image materials; for example, modifying the article title through the "article title" input box 904, adding subtitle text for a new page or for a displayed page through the "new" button 905, modifying subtitle text through the "subtitle" input box 906, deleting a subtitle through the "-" button 907, and replacing or adding image and video materials through the "image material" add control 908.
In a possible implementation, a scroll bar 909 can be displayed on the right side of the video editing interface 903, and the user can view the content not yet displayed in the video editing interface 903 by dragging the scroll bar 909.
In a possible implementation, a completion indication control, such as a "submit modification" button 910, can be displayed at the bottom of the video editing interface 903; the user can input a completion trigger operation on the "submit modification" button 910 to trigger generation and display of the modified target video.
FIG. 10 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure. When the electronic device has a first display screen 912 and a second display screen 913, after receiving the video generation operation, the electronic device can, in response to it, display the video display interface 901, display the playback window 902 of the target video in the video display interface 901, and display the video editing interface 903 below the playback window 902. The playback window 902 can be located within the first display screen 912, and the video editing interface 903 can be located within the second display screen 913.
It should be noted that the video editing interface 903 is similar to the embodiment shown in FIG. 9 and is not repeated here.
In a possible implementation, a video export control can also be displayed in the video display interface. If the user is satisfied with the video effect, the user can input an export trigger operation such as a click, long press, or double click on the video export control, so that the electronic device, in response to receiving it, saves the video displayed in the video display interface locally on the electronic device.
Continuing to refer to FIG. 8, a video export control, such as an "export video" button 811, can be displayed at the bottom of the video editing interface 803; in a possible implementation, the "export video" button 811 can be located to the right of the "submit modification" button 810. The user can input an export trigger operation on the "export video" button 811, so that the electronic device saves the video displayed in the video display interface 801 locally on the electronic device.
Continuing to refer to FIG. 9 and FIG. 10, a video export control, such as an "export video" button 911, can be displayed at the bottom of the video display interface 901; the user can input an export trigger operation on the "export video" button 911, so that the electronic device saves the video displayed in the video display interface 901 locally on the electronic device.
Thus, in the embodiments of the present disclosure, the user can view the video and the adjusted elements within the video in the same page, improving the user experience.
In other embodiments of the present disclosure, the video editing interface and the target video may be displayed in different interfaces. For example, a playback window of the target video and a modification trigger control, such as a "video modification" button, are displayed in the video display interface; the user can input a modification trigger operation such as a click, long press, or double click on the modification trigger control, so that the electronic device, in response to receiving it, jumps from the video display interface to displaying the video editing interface.
The video editing interface is the same as in the above embodiments and is not repeated here.
In a possible implementation, after the electronic device determines that the user has completed the element modification operation, it can also jump from the video editing interface back to the video display interface, so as to display the modified target video generated from the editable elements displayed in the video editing interface when the element modification operation was completed.
In a possible implementation, a video export control can also be displayed in the video display interface; the user can input an export trigger operation such as a click, long press, or double click on it, so that the electronic device, in response to receiving the export trigger operation, saves the video displayed in the video display interface locally on the electronic device.
Thus, in the embodiments of the present disclosure, independent display of the target video and the video editing interface can be achieved, improving the user experience.
In yet another implementation of the present disclosure, in order to further improve the user experience, after S210 and before the displaying of the generated target video in S220, the video display method may further include:
displaying an indication identifier, the indication identifier being used to indicate that the target video has been generated;
receiving an identifier trigger operation on the indication identifier;
in response to the identifier trigger operation, hiding the original data.
Specifically, after the target video is generated, the electronic device may not display it directly, but instead display an indication identifier indicating that the target video has been generated. After seeing the indication identifier, the user can learn that the target video has been produced and can input an identifier trigger operation on it to trigger display of the generated target video. After receiving the identifier trigger operation, the electronic device can, in response to it, hide the currently displayed original data and display the video display interface, in which the target video can be displayed.
In a possible implementation, the identifier trigger operation may be a trigger operation on the indication identifier such as a click, long press, or double click, which is not limited here.
A detailed description is given below with an example.
FIG. 11 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure. FIG. 12 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure. FIG. 13 shows a schematic diagram of yet another video display interface provided by an embodiment of the present disclosure.
Referring to FIG. 11, multiple page identifiers, such as an "input article" page identifier 1101 and an "output video" page identifier 1102, can be displayed in the top display area of the display screen of the electronic device. When the "input article" page identifier 1101 is displayed in a selected state, an original data input interface 1103 can be displayed below the top display area. The display content and interaction of the original data input interface 1103 are similar to those of FIG. 3 and FIG. 4 and are not repeated here.
Referring to FIG. 12, after the user finishes inputting the original data and inputs a trigger operation on the "generate video" button, the electronic device can wait for the target video to be generated; while waiting, the original data input interface 1103 can be displayed in an inoperable state, for example, grayed out. After the electronic device determines that the target video has been generated, an indication identifier, for example a "☆" icon 1104, can be displayed after the "output video" page identifier 1102, and the user can input an identifier trigger operation on the "☆" icon 1104 to cause the electronic device to display the target video.
Referring to FIG. 13, after the electronic device receives the user's identifier trigger operation on the "☆" icon 1104, the original data input interface 1103 can be hidden below the top display area, and after it is hidden, a video display interface 1105 can be displayed below the top display area, where a playback window 1106 of the target video can be displayed in the video display interface 1105.
In other embodiments of the present disclosure, while the electronic device is waiting for the target video to be generated, a progress prompt indicating the production progress of the target video, for example a progress bar, can be superimposed on the original data input interface 1103, so that the user can learn the production progress of the target video.
In this way, accidental modification of the original data by the user while waiting for the target video to be generated can be avoided, and the production progress of the target video is indicated to the user, further improving the user experience.
In still another implementation of the present disclosure, the electronic device can locally and automatically generate the subtitle text according to the target text obtained from the original data, obtain the multimedia material according to the subtitle text, and generate the target video according to the subtitle text and the multimedia material, so as to reduce the generation time of the target video.
In some embodiments of the present disclosure, when the user inputs the original data to the electronic device through the automatic entry input mode, or the electronic device receives original data sent by another device, the electronic device can locally and automatically generate the subtitle text according to the target text obtained from the original data, obtain the multimedia material according to the subtitle text, and generate the target video according to the subtitle text and the multimedia material.
FIG. 14 shows a schematic flowchart of another video display method provided by an embodiment of the present disclosure.
As shown in FIG. 14, the video display method may include the following steps S1410-S1460.
S1410: Receive a video generation operation for original data.
The original data is used to obtain a target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text.
It should be noted that S1410 is similar to S210 in the embodiment shown in FIG. 2 and is not repeated here.
S1420: In response to the video generation operation, obtain the target text according to the original data.
In some embodiments, when the original data includes text, S1420 may specifically include:
extracting the text from the original data;
taking the text as the target text.
Specifically, the electronic device can, in response to the video generation operation, determine the data type of the original data, and when determining that the data type is the text type, directly extract the text from the original data and take the extracted text as the target text.
In a possible implementation, the electronic device can determine the data type of the original data through the control that received the original data.
Continuing to refer to FIG. 3, in response to receiving the user's trigger operation on the "generate video" button 307, when determining that the controls that received the original data are the "article title" input box 302 and the "article content" input box 303, the electronic device can determine that the data type of the original data is the text type, then directly extract the text from the original data, such as the text title and text content, and take the extracted text as the target text.
In other embodiments, when the original data includes a multimedia file, S1420 may specifically include:
performing text conversion on the multimedia file to obtain converted text;
taking the converted text as the target text.
Specifically, the electronic device can, in response to the video generation operation, determine the data type of the original data, and when determining that the data type is the multimedia file type, perform text conversion on the multimedia file to obtain converted text, and take the converted text as the target text.
In a possible implementation, the electronic device can determine the data type of the original data through the control that received the original data.
Continuing to refer to FIG. 5, in response to receiving the user's trigger operation on the "generate video" button 508, when determining that the control that received the original data is the "image/video material" add control 505, the electronic device can determine that the data type of the original data is the file type, then perform text conversion on the multimedia file to obtain converted text, and take the converted text as the target text.
In a possible implementation, when the multimedia file includes an image file, text conversion can be performed on the image file through optical character recognition (OCR) technology to obtain the converted text; alternatively, the image content of the image file can be learned and summarized to perform text conversion, obtaining converted text describing the image content, which is not limited here.
In a possible implementation, when the multimedia file includes a video file, text conversion can be performed on each image frame of the video file through OCR technology to obtain the converted text; alternatively, the image content of each image frame can be learned and summarized to obtain converted text describing it; the audio in the video file can also be converted into text through speech recognition to obtain the converted text, which is not limited here.
In a possible implementation, when the multimedia file includes an audio file, text conversion can be performed on the audio file through speech recognition to obtain the converted text.
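The text-conversion step described above can be illustrated with a short sketch. The following Python code is a minimal illustration only, not the disclosed implementation: it assumes the third-party packages OpenCV, pytesseract (with a local Tesseract install), and SpeechRecognition are available, and the frame-sampling interval is an arbitrary illustrative choice.

import cv2
import pytesseract
import speech_recognition as sr

def text_from_image(image_path: str) -> str:
    """Run OCR over a single image file."""
    image = cv2.imread(image_path)
    return pytesseract.image_to_string(image)

def text_from_video(video_path: str, frame_step: int = 30) -> str:
    """OCR every frame_step-th frame and join the recognized fragments."""
    capture = cv2.VideoCapture(video_path)
    pieces, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_step == 0:
            text = pytesseract.image_to_string(frame).strip()
            if text:
                pieces.append(text)
        index += 1
    capture.release()
    return "\n".join(pieces)

def text_from_audio(audio_path: str) -> str:
    """Transcribe a WAV/AIFF/FLAC file with a generic recognizer."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)  # any ASR backend would do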
In still other embodiments, when the original data includes a link address used to obtain article content, S1420 may specifically include:
obtaining a target article based on the link address;
extracting the article text from the target article;
taking the article text as the target text.
Specifically, the electronic device can, in response to the video generation operation, determine the data type of the original data, and when determining that the data type is the address type, obtain the target article based on the link address, directly extract the article text from it, and then take the extracted article text as the target text.
In a possible implementation, the electronic device can determine the data type of the original data through the control that received the original data.
Continuing to refer to FIG. 3, in response to receiving the user's trigger operation on the "generate video" button 307, when determining that the control that received the original data is the "article link" input box 304, the electronic device can determine that the data type of the original data is the address type, then extract the link address from the original data, obtain the target article based on the link address, directly extract the article text from it, and take the extracted article text as the target text.
S1430: Generate the subtitle text according to the target text.
In some embodiments of the present disclosure, generating the subtitle text according to the target text may specifically include:
performing text summary extraction on the target text to obtain a summary of the target text;
performing text layout on the summary to obtain the subtitle text corresponding to the summary.
In some embodiments, the electronic device can directly input the target text into a preset summary extraction model to obtain the summary of the target text, and then directly input the obtained summary into a preset text layout model to obtain subtitle text that takes the sentence as its unit, handles words across lines and across pages, and is configured with appropriate punctuation.
In other embodiments, the electronic device can also, according to at least one of the title and the text content of the target text, filter Internet article texts or local article texts for similar article texts whose similarity with the target text meets a preset text similarity threshold. If no similar article text is found, the target text can be input directly into the preset summary extraction model to obtain the summary of the target text. If similar article texts are found, a weighted summation based on text length, number of likes, and number of reposts can be performed over the target text and the similar article texts to obtain a text score for each text; the text with the highest score is then selected and input into the preset summary extraction model to obtain the summary of the target text. After the electronic device obtains the summary, it can input the summary directly into the preset text layout model to obtain subtitle text that takes the sentence as its unit, handles words across lines and across pages, and is configured with appropriate punctuation.
In a possible implementation, before text layout is performed on the summary, text cleaning can also be performed on sensitive keywords in the summary, such as organization names and users' personal information, and on special symbols for which speech audio cannot be generated; text layout is then performed on the cleaned summary to obtain the subtitle text corresponding to the summary.
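The summary-to-subtitle pipeline just described can be sketched schematically. In the following Python sketch, summary_model and layout_model are hypothetical stand-ins for the preset summary extraction model and preset text layout model named above; the cleaning list and the trivial stand-in callables are illustrative only.

import re
from typing import Callable, List

SENSITIVE_TERMS = ["org-name-placeholder"]  # illustrative cleaning list

def clean(summary: str) -> str:
    """Drop sensitive keywords and symbols that cannot be spoken by TTS."""
    for term in SENSITIVE_TERMS:
        summary = summary.replace(term, "")
    return re.sub(r"[^\w\s,.!?;:'\"-]", "", summary)

def make_subtitles(target_text: str,
                   summary_model: Callable[[str], str],
                   layout_model: Callable[[str], List[str]]) -> List[str]:
    summary = summary_model(target_text)      # text summary extraction
    summary = clean(summary)                  # pre-layout text cleaning
    return layout_model(summary)              # sentence-level subtitle lines

# Trivial stand-ins so the sketch runs end to end.
subtitles = make_subtitles(
    "Some long target text. It has several sentences. They become subtitles.",
    summary_model=lambda text: text,                       # identity "summary"
    layout_model=lambda s: [p.strip() + "." for p in s.split(".") if p.strip()],
)
print(subtitles)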
In other embodiments of the present disclosure, generating the subtitle text according to the target text may also specifically include:
performing text summary extraction on the target text to obtain a summary of the target text;
taking the summary as the subtitle text.
The method by which the electronic device obtains the summary of the target text is similar to the above embodiments and is not repeated here.
S1440: Obtain the multimedia material corresponding to the subtitle text.
In some embodiments, S1440 may specifically include:
determining, among multiple preset materials, the target material with the highest degree of matching with the subtitle text, the multiple preset materials including materials obtained according to the original data, and each preset material including at least one of an image and a video;
taking the target material as the multimedia material.
Specifically, the electronic device can first obtain multiple preset materials, which may include at least one of materials in a material library and materials from the Internet. When the original data includes a multimedia file and a link address, the electronic device can obtain at least one of image and video materials based on the original data, and the preset materials may further include the materials obtained according to the original data.
In a possible implementation, the electronic device can determine the degree of matching between each preset material and each subtitle text, and then, for each subtitle text, select a preset number of preset materials with the highest degree of matching and take the selected preset materials as the target materials corresponding to that subtitle text.
Further, the electronic device can input each preset material and each corresponding subtitle text into a preset image-text matching model to obtain an image-text matching score between them; then calculate the text similarity between the text contained in each preset material and each corresponding subtitle text; next, determine whether each preset material and each subtitle text come from the same source to obtain a source similarity; and finally, for each subtitle text, perform a weighted summation using at least one of the image-text matching score, the text similarity, the source similarity, an image clustering score of the preset material, and a text weight of the text to which the preset material belongs, to obtain the degree of matching of each preset material with respect to that subtitle text, then select a preset number of preset materials with the highest degree of matching and take them as the target materials corresponding to that subtitle text.
The text similarity can be calculated from the subtitle text and the text contained in the preset material based on a preset text similarity algorithm. Specifically, the preset text similarity algorithm may be a text semantic similarity algorithm or a text literal similarity algorithm, which is not limited here.
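The weighted summation described above can be made concrete with a small sketch. The component scores and the weights below are illustrative assumptions; the disclosure does not fix their values, only that at least one of the listed terms enters a weighted sum, with each score presumed normalized to [0, 1].

from dataclasses import dataclass
from typing import List

@dataclass
class PresetMaterial:
    name: str
    match_score: float        # from a preset image-text matching model
    text_similarity: float    # subtitle text vs. text found in the material
    source_similarity: float  # 1.0 when subtitle and material share a source
    cluster_score: float      # image clustering score of the material
    text_weight: float        # weight of the text the material belongs to

WEIGHTS = (0.4, 0.25, 0.15, 0.1, 0.1)  # illustrative weighted-sum coefficients

def degree_of_matching(m: PresetMaterial) -> float:
    parts = (m.match_score, m.text_similarity, m.source_similarity,
             m.cluster_score, m.text_weight)
    return sum(w * p for w, p in zip(WEIGHTS, parts))

def pick_target_materials(materials: List[PresetMaterial], k: int = 1):
    """Select the k preset materials that best match one subtitle line."""
    return sorted(materials, key=degree_of_matching, reverse=True)[:k]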
In other embodiments, S1440 may specifically include:
performing text-to-speech conversion on the subtitle text to obtain the subtitle audio corresponding to the subtitle text;
taking the subtitle audio as the multimedia material.
Specifically, the electronic device can perform speech conversion on each subtitle text based on text-to-speech technology to obtain the subtitle audio corresponding to each subtitle text. For example, the electronic device can input the subtitle text into a preset text-to-speech model for speech conversion to obtain the subtitle audio.
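As a hedged illustration of this step, the sketch below uses the offline pyttsx3 package as a stand-in for the preset text-to-speech model; any other synthesis backend could be substituted, and the output file naming is illustrative.

import pyttsx3

def subtitle_audio(subtitle_lines, out_pattern="subtitle_{:03d}.wav"):
    """Synthesize one audio file per subtitle line and return the file names."""
    engine = pyttsx3.init()
    paths = []
    for i, line in enumerate(subtitle_lines):
        path = out_pattern.format(i)
        engine.save_to_file(line, path)  # queue synthesis of this line
        paths.append(path)
    engine.runAndWait()                  # flush the queued synthesis to disk
    return paths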
In still other embodiments, S1440 may specifically include:
inputting the target text into a preset text emotion classification model for classification to obtain the emotion category to which the target text belongs;
determining, among multiple preset background audios, the target background audio corresponding to the emotion category;
taking the target background audio as the multimedia material.
Specifically, the electronic device may be preset with multiple background audios, where one preset background audio can correspond to one emotion category; the emotion categories may include categories representing the classification of the emotion of the target text, such as a joyful category, a sad category, a serious category, and a tense category. The electronic device inputs the target text into the preset text emotion classification model for classification to obtain the emotion category to which it belongs, determines the target background audio corresponding to that emotion category among the multiple preset background audios, and then takes the target background audio as the multimedia material.
In this way, by performing sentiment analysis and classification on the target text, the electronic device can select suitable background music from the multiple preset background audios for generating the target video.
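A minimal sketch of this lookup follows. The classifier is passed in as a callable because the preset text emotion classification model is not specified; the category names, audio paths, and fallback choice are illustrative.

from typing import Callable

PRESET_BACKGROUND_AUDIO = {
    "joyful":  "bgm/joyful.mp3",
    "sad":     "bgm/sad.mp3",
    "serious": "bgm/serious.mp3",
    "tense":   "bgm/tense.mp3",
}

def pick_background_audio(target_text: str,
                          classify: Callable[[str], str]) -> str:
    emotion = classify(target_text)  # preset text emotion classifier
    return PRESET_BACKGROUND_AUDIO.get(emotion,
                                       PRESET_BACKGROUND_AUDIO["serious"])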
S1450: Generate the target video according to the subtitle text and the multimedia material.
When the multimedia material includes at least one of images and videos, the subtitle text can be added at a preset position in each image and in each image frame of each video, and the images and videos can be sorted according to the arrangement order of the respective subtitle texts to obtain a dynamic image; video rendering can then be performed on the subtitle text, images, and videos in the dynamic image according to the preset display effects for subtitle text, images, and videos in a preset video template, to obtain the target video.
When the multimedia material includes at least one of images and videos as well as subtitle audio, the subtitle text can be added at a preset position in each image and in each image frame of each video, and the images and videos can be sorted according to the arrangement order of the respective subtitle texts; then, the display time and display duration of each image and video can be determined according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video; next, a dynamic image can be obtained according to the sorting of the images and videos and their display times and display durations, and video rendering can be performed on the subtitle text, images, and videos in the dynamic image according to the preset display effects in the preset video template to obtain the target video; finally, the target video and the subtitle audio can be fused according to the correspondence between the timestamp of each video frame in the target video and the timestamps of the audio frames of the subtitle audio, to obtain the fused target video.
When the multimedia material includes at least one of images and videos, subtitle audio, and target background audio, the subtitle text is likewise added at a preset position in each image and in each image frame of each video; the images and videos are sorted according to the arrangement order of the respective subtitle texts; the display time and display duration of each image and video are determined according to the audio duration of the corresponding subtitle audio, the number of corresponding images and videos, and the duration of each video; a dynamic image is obtained accordingly and rendered according to the preset display effects in the preset video template to obtain the target video; the target video and the subtitle audio are fused according to the correspondence between the timestamps of the video frames and those of the audio frames to obtain a preliminarily fused target video; and finally, the target background audio is fused into the audio of the target video according to the timestamps of the audio frames in the preliminarily fused target video and the target background audio, to obtain the finally fused target video.
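A greatly simplified sketch of the assembly described above is given below, using moviepy 1.x as a stand-in renderer (TextClip additionally requires an ImageMagick install). It times each image by its subtitle audio and overlays the subtitle at a preset position; template-driven display effects and the separate background-audio mix are omitted for brevity, and all file names are illustrative.

from moviepy.editor import (AudioFileClip, CompositeVideoClip, ImageClip,
                            TextClip, concatenate_videoclips)

def assemble(pages):
    """pages: list of (subtitle_text, image_path, subtitle_audio_path)."""
    clips = []
    for text, image_path, audio_path in pages:
        audio = AudioFileClip(audio_path)
        base = ImageClip(image_path).set_duration(audio.duration)
        caption = (TextClip(text, fontsize=40, color="white")
                   .set_position(("center", "bottom"))  # preset position
                   .set_duration(audio.duration))
        clips.append(CompositeVideoClip([base, caption]).set_audio(audio))
    concatenate_videoclips(clips).write_videofile("target_video.mp4", fps=24)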
S1460: Display the generated target video.
The video elements of the target video include the subtitle text and the multimedia material corresponding to the subtitle text; the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
It should be noted that S1460 is similar to S220 in the embodiment shown in FIG. 2 and is not repeated here.
Thus, in the embodiments of the present disclosure, materials such as images, videos, and audio can be automatically matched according to the target text involved in the original data, and a target video corresponding to the target text can be automatically rendered and generated, improving the intelligence of the electronic device.
In other embodiments of the present disclosure, after S1420 and before S1450, the video display method may further include:
inputting the target text into a preset text content classification model for classification to obtain the content category to which the target text belongs;
determining, among multiple preset video templates, the target video template corresponding to the content category.
Correspondingly, S1450 may specifically include:
generating the target video according to the subtitle text, the multimedia material, and the target video template.
In a possible implementation, the process of determining the target video template corresponding to the target text can be performed in parallel with S1430 and S1440, or sequentially with S1430 and S1440 in a preset order.
Specifically, the electronic device may be preset with multiple video templates, where one video template can correspond to one content category; the content categories may include categories representing the classification of the text content of the target text, such as a news category, a story category, a diary category, and a variety-show category. After obtaining the target text, the electronic device can input it into the preset text content classification model for classification to obtain the content category to which it belongs, determine the corresponding target video template among the multiple preset video templates, and then generate the target video according to the subtitle text, the multimedia material, and the target video template. In addition, the content categories may also be classified in other ways, for example, classifying the content of the target text according to keywords contained in it, which is not limited here.
Further, different video templates may also include different display effect elements. Therefore, video rendering can be performed on the subtitle text and the multimedia material according to the preset display effects for subtitle text and multimedia material in the target video template, to obtain the target video.
Thus, in the embodiments of the present disclosure, a suitable target video template for generating the target video can be selected according to the content category to which the target text belongs, so that suitable display effects are set for the subtitle text and the multimedia material.
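Template selection by content category reduces to a classify-then-look-up step, as in the sketch below; the classifier and the template registry are hypothetical stand-ins for the preset text content classification model and preset video templates, and the template fields are illustrative.

from typing import Callable, Dict

PRESET_TEMPLATES: Dict[str, dict] = {
    "news":    {"font": "SimHei",   "transition": "cut"},
    "story":   {"font": "KaiTi",    "transition": "fade"},
    "diary":   {"font": "FangSong", "transition": "slide"},
    "variety": {"font": "YouYuan",  "transition": "zoom"},
}

def pick_template(target_text: str, classify: Callable[[str], str]) -> dict:
    category = classify(target_text)  # e.g. returns "news"
    return PRESET_TEMPLATES.get(category, PRESET_TEMPLATES["news"])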
In other embodiments of the present disclosure, when the user inputs the original data to the electronic device through the manual entry input mode, the electronic device can directly obtain the subtitle text and multimedia material input by the user and automatically generate the target video.
In a possible implementation, when the multimedia material includes at least one of images and videos, taking images as an example, the electronic device can obtain the subtitle text and images input by the user and, based on the subtitle text, determine, among the images input by the user and the preset images and videos, the image or video with the highest degree of matching with the subtitle text, and then automatically generate the target video using the subtitle text and the determined image or video.
In a possible implementation, the electronic device can also obtain the subtitle text and images input by the user, input the subtitle text into a preset text content classification model for classification to obtain the content category to which the subtitle text belongs, then determine the corresponding target video template among multiple preset video templates, and automatically generate the target video according to the subtitle text, the multimedia material, and the target video template.
Thus, in the embodiments of the present disclosure, original data such as the subtitle text and multimedia material manually input by the user can be used, with the subtitle text serving as the target text, to automatically generate a target video corresponding to them, further improving the user experience.
In still another implementation of the present disclosure, the electronic device can generate the target video through the server, so as to reduce the data processing load of the electronic device and further improve the quality of the produced video.
In a possible implementation, after S210 and before S220, the video display method may further include:
sending a video generation request carrying the original data to the server, the video generation request being used to cause the server to feed back the target video corresponding to the target text based on the original data;
receiving the target video fed back by the server.
Specifically, after receiving the video generation operation, the electronic device can send a video generation request carrying the original data to the server, so that the server, in response to the request, generates and feeds back the target video corresponding to the target text based on the original data. The electronic device can receive the target video fed back by the server and display it.
The server can automatically obtain the target text from the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material, similarly to the above-described method by which the electronic device generates the target video, and this is not repeated here.
Thus, in the embodiments of the present disclosure, the target video can be generated quickly and with high quality based on the original data through the interaction between the electronic device and the server, improving the user experience.
In still other implementations of the present disclosure, in order to make the target video more interesting, the video elements may further include a preset virtual object and a pose of the virtual object, where the pose of the virtual object can be determined according to the subtitle text.
In the embodiments of the present disclosure, the virtual object may be a virtual character object with a human image, a virtual cartoon object with a cartoon image, or the like; the present disclosure does not limit the specific type of the virtual object.
FIG. 15 shows a schematic diagram of yet another video display interface provided by an embodiment of the present disclosure. As shown in FIG. 15, after receiving the video generation operation, the electronic device can, in response to it, display a video display interface 1501 and display a playback window 1502 of the target video in full screen within the video display interface 1501; the target video displayed in the playback window 1502 may include a virtual object 1503, for example a virtual character object. Taking a news broadcast video as an example of the target video, the virtual character object 1503 can serve, for example, as a virtual anchor of the news broadcast.
In the embodiments of the present disclosure, the pose of the virtual object may include at least one of a mouth pose, a facial expression pose, a gesture pose, and a body pose, which is not limited here.
When the head image of the virtual object is displayed in the target video, the pose of the virtual object may include a mouth pose and a facial expression pose. When the upper-body image is displayed, the pose may include a mouth pose, a facial expression pose, and a gesture pose. When the full-body image is displayed, the pose may include a mouth pose, a facial expression pose, a gesture pose, and a body pose. The present disclosure does not limit the pose types of the virtual object.
In the embodiments of the present disclosure, the pose of the virtual object can be automatically determined according to the subtitle text.
In some embodiments, the pose of the virtual object may be determined by the electronic device. In other embodiments, the pose of the virtual object may also be determined by the server.
Since the method by which the electronic device determines the pose of the virtual object is similar to that of the server, a detailed description is given below taking as an example the case where the virtual object includes a virtual character object and the electronic device determines the character pose.
Specifically, after the electronic device obtains the subtitle audio, it can input the subtitle audio into a preset pose generation model to obtain a real-time character pose animation, and use pose migration technology to migrate the real-time character pose animation to the object model of the virtual character object, obtaining an object model of the virtual character object broadcasting the subtitle text; it can then obtain, from the object model, the character pose image of the virtual character object corresponding to each audio frame of the subtitle audio, and fuse the corresponding character pose image into the target video generated from the subtitle text and the multimedia material according to the timestamp of each audio frame within that target video, to obtain a fused target video with the virtual character object.
When the object model of the virtual character object is a head model, the preset pose generation model can be used to generate mouth pose animation and facial expression pose animation. When the object model is an upper-body model, the preset pose generation model can be used to generate mouth poses, facial expression poses, and gesture poses. When the object model is a full-body model, the preset pose generation model can be used to generate mouth poses, facial expression poses, gesture poses, and body poses.
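The final fusion of character pose images into the target video can be sketched at a high level. The pose generation and pose migration models themselves are outside the scope of this sketch; it only shows the timestamp alignment, with pose_images assumed to be keyed by audio-frame timestamp and the frame representation being an illustrative assumption.

from typing import Dict, List

def fuse_pose_frames(video_frames: List[dict],
                     pose_images: Dict[float, object]) -> List[dict]:
    """video_frames: [{'t': timestamp, 'frame': image}, ...] in order."""
    fused = []
    for item in video_frames:
        # pick the pose image whose audio timestamp is closest to this frame
        nearest_t = min(pose_images, key=lambda t: abs(t - item["t"]))
        fused.append({"t": item["t"],
                      "frame": item["frame"],
                      "overlay": pose_images[nearest_t]})
    return fused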
In some embodiments of the present disclosure, before the pose of the virtual object is determined, a virtual object similar to the user's image can also be generated.
In a possible implementation, the virtual object may be generated before or after the subtitle audio is generated, which is not limited here.
In some embodiments, when the pose of the virtual object is determined by the electronic device, the electronic device can first capture a user image, then input the user image into a preset biometric feature extraction model to extract the user's biometric features from it, then input the extracted biometric features into a preset object generation model to obtain an initial object model of a virtual object having those biometric features, and finally fuse a preset clothing model into the initial object model to obtain the final object model of the virtual object.
In a possible implementation, the user image may be a captured image taken by the user with a camera, or a captured image selected by the user from preset images.
In a possible implementation, the user image may be a face image, an upper-body image, or a full-body image of the user, which is not limited here.
In a possible implementation, the biometric features extracted by the electronic device may include at least one of the user's facial features, head-and-shoulder features, and body shape features, which is not limited here.
For example, if the user image is a face image, the extracted biometric features may include the user's facial features. For another example, if the user image is a full-body image, the extracted biometric features may include the user's facial features and body shape features.
In a possible implementation, when the user image is a face image, the preset object generation model can be used to generate a head model; when the user image is an upper-body image, an upper-body model; and when the user image is a full-body image, a full-body model.
In other embodiments, when the pose of the virtual object is determined by the server, the electronic device can first capture a user image, then extract the user's biometric features from it, and then send the extracted biometric features to the server so that the server generates the object model of the virtual object from them. The method by which the server generates the object model is similar to the above method by which the electronic device generates it, and is not repeated here.
In other embodiments of the present disclosure, before the pose of the virtual object is determined, a virtual object similar to both the user's image and the user's dress can also be generated.
In a possible implementation, the virtual object may be generated before or after the subtitle audio is generated, which is not limited here.
In some embodiments, when the pose of the virtual object is determined by the electronic device, the electronic device can first capture a user image, then input it into a preset biometric feature extraction model to extract the user's biometric features and into a preset dress feature extraction model to extract the user's dress features; it can then input the extracted biometric features into a preset object generation model to obtain an initial object model of a virtual object having those biometric features, and, according to a preset correspondence between dress styles and dress models, look up, among the preset dress models, the target dress model corresponding to the user dress style to which the user's dress features belong, and fuse the target dress model with the initial object model to obtain the object model of a virtual object having the user's dress features.
In a possible implementation, the extracted dress features may include at least one of the user's facial decoration features, headwear features, clothing features, and clothing accessory features.
For example, if the user image is a face image, the extracted biometric features may include the user's facial features, and the extracted dress features may include headwear features. For another example, if the user image is a full-body image, the extracted biometric features may include the user's facial features and body shape features, and the extracted dress features may include facial decoration features, headwear features, clothing features, and clothing accessory features.
In a possible implementation, the electronic device can input the user's dress features into a preset dress style classification model to determine the user dress style to which the dress features belong.
The dress styles may include intellectual, cute, handsome, steady, sunny, and the like.
In other embodiments, when the pose of the virtual object is determined by the server, the electronic device can first capture a user image, then extract the user's biometric features and dress features from it, and then send them to the server so that the server generates the object model of the virtual object from the biometric features and dress features. The method by which the server generates the object model is similar to the above method by which the electronic device generates it, and is not repeated here.
In still other embodiments of the present disclosure, when a virtual object similar to the user's image and dress is generated before the subtitle audio is generated, the electronic device or the server can also generate the subtitle audio according to the dress style of the virtual object, the subtitle audio being audio with voice characteristics consistent with the dress features of the virtual object.
Taking the electronic device generating the subtitle audio as an example, the electronic device may be preset with multiple text-to-speech models, each corresponding to one dress style; the electronic device can therefore select, among the multiple text-to-speech models, the target text-to-speech model corresponding to the dress style of the virtual object, and input the subtitle text into the target model for speech conversion to obtain the subtitle audio, so as to generate audio with voice characteristics consistent with the dress features of the virtual object, further improving the user experience.
The video processing method provided by the embodiments of the present disclosure is described below with reference to FIG. 16 based on the architecture shown in FIG. 1. In the embodiments of the present disclosure, the video processing method can be executed by a server, for example, the server 102 on the server side shown in FIG. 1, where the server may include a device with storage and computing functions such as a cloud server or a server cluster.
FIG. 16 shows a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
As shown in FIG. 16, the video processing method may include the following steps S1610-S1660.
S1610: Receive a video generation request carrying original data and sent by an electronic device.
Specifically, after receiving a user's video generation operation for original data, the electronic device can, in response to the operation, send a video generation request carrying the original data to the server, so that the server receives the request and, in response to it, feeds back the target video corresponding to the target text based on the original data.
The electronic device may be the electronic device 101 on the client side shown in FIG. 1.
S1620: In response to the video generation request, obtain a target text according to the original data.
In some embodiments of the present disclosure, the original data may include text.
Correspondingly, S1620 may specifically include:
extracting the text from the original data;
taking the text as the target text.
Specifically, when determining that the data type of the original data is the text type, the server can directly extract the text from the original data and take the extracted text as the target text.
In a possible implementation, the video generation request may carry the data type of the original data, and the data type can be determined by the electronic device through the control that received the original data.
In other embodiments of the present disclosure, the original data may include a multimedia file.
Correspondingly, S1620 may specifically include:
performing text conversion on the multimedia file to obtain converted text;
taking the converted text as the target text.
Specifically, when determining that the data type of the original data is the multimedia file type, the server can perform text conversion on the multimedia file to obtain converted text and take the converted text as the target text.
In a possible implementation, the video generation request may carry the data type of the original data, and the data type can be determined by the electronic device through the control that received the original data.
In a possible implementation, when the multimedia file includes an image file, text conversion can be performed on the image file through OCR technology to obtain the converted text; alternatively, the image content of the image file can be learned and summarized to perform text conversion, obtaining converted text describing the image content, which is not limited here.
In a possible implementation, when the multimedia file includes a video file, text conversion can be performed on each image frame of the video file through OCR technology to obtain the converted text; alternatively, the image content of each image frame can be learned and summarized to obtain converted text describing it; the audio in the video file can also be converted into text through speech recognition, which is not limited here.
In a possible implementation, when the multimedia file includes an audio file, text conversion can be performed on the audio file through speech recognition to obtain the converted text.
In still other embodiments of the present disclosure, the original data may include a link address, and the link address can be used to obtain article content.
Correspondingly, S1620 may specifically include:
obtaining a target article based on the link address;
extracting the article text from the target article;
taking the article text as the target text.
Specifically, when determining that the data type of the original data is the address type, the server can obtain the target article based on the link address, directly extract the article text from it, and then take the extracted article text as the target text.
In a possible implementation, the video generation request may carry the data type of the original data, and the data type can be determined by the electronic device through the control that received the original data.
S1630: Generate subtitle text according to the target text.
In some embodiments of the present disclosure, S1630 may specifically include:
performing text summary extraction on the target text to obtain a summary of the target text;
performing text layout on the summary to obtain the subtitle text corresponding to the summary.
In some embodiments, the server can directly input the target text into a preset summary extraction model to obtain the summary of the target text, and then directly input the obtained summary into a preset text layout model to obtain subtitle text that takes the sentence as its unit, handles words across lines and across pages, and is configured with appropriate punctuation.
In other embodiments, the server can also, according to at least one of the title and the text content of the target text, filter Internet article texts or local article texts for similar article texts whose similarity with the target text meets a preset text similarity threshold. If no similar article text is found, the target text can be input directly into the preset summary extraction model to obtain the summary of the target text. If similar article texts are found, a weighted summation based on text length, number of likes, and number of reposts can be performed over the target text and the similar article texts to obtain a text score for each text; the text with the highest score is then selected and input into the preset summary extraction model to obtain the summary of the target text. After the summary is obtained, it can be input directly into the preset text layout model to obtain subtitle text that takes the sentence as its unit, handles words across lines and across pages, and is configured with appropriate punctuation.
In a possible implementation, before text layout is performed on the summary, text cleaning can also be performed on sensitive keywords in the summary, such as organization names and users' personal information, and on special symbols for which speech audio cannot be generated; text layout is then performed on the cleaned summary to obtain the subtitle text corresponding to the summary.
In other embodiments of the present disclosure, S1630 may specifically include:
performing text summary extraction on the target text to obtain a summary of the target text;
taking the summary as the subtitle text.
The method by which the server obtains the summary of the target text is similar to the above embodiments and is not repeated here.
S1640: Obtain multimedia material corresponding to the subtitle text.
In some embodiments of the present disclosure, S1640 may specifically include:
determining, among multiple preset materials, the target material with the highest degree of matching with the subtitle text, the multiple preset materials including materials obtained according to the original data, and each preset material including at least one of an image and a video;
taking the target material as the multimedia material.
Specifically, the server can first obtain multiple preset materials, which may include at least one of materials in a material library and materials from the Internet. When the original data includes a multimedia file and a link address, the server can obtain at least one of image and video materials based on the original data, and the preset materials may further include the materials obtained according to the original data.
In a possible implementation, the server can determine the degree of matching between each preset material and each subtitle text, and then, for each subtitle text, select a preset number of preset materials with the highest degree of matching and take them as the target materials corresponding to that subtitle text.
Further, the server can input each preset material and each corresponding subtitle text into a preset image-text matching model to obtain an image-text matching score between them; then calculate the text similarity between the text contained in each preset material and each corresponding subtitle text; next, determine whether each preset material and each subtitle text come from the same source to obtain a source similarity; and finally, for each subtitle text, perform a weighted summation using at least one of the image-text matching score, the text similarity, the source similarity, an image clustering score of the preset material, and a text weight of the text to which the preset material belongs, to obtain the degree of matching of each preset material with respect to that subtitle text, then select a preset number of preset materials with the highest degree of matching and take them as the target materials corresponding to that subtitle text.
The text similarity can be calculated from the subtitle text and the text contained in the preset material based on a preset text similarity algorithm, which may be a text semantic similarity algorithm or a text literal similarity algorithm, and is not limited here.
In other embodiments of the present disclosure, S1640 may specifically include:
performing text-to-speech conversion on the subtitle text to obtain the subtitle audio corresponding to the subtitle text;
taking the subtitle audio as the multimedia material.
Specifically, the server can perform speech conversion on each subtitle text based on text-to-speech technology to obtain the subtitle audio corresponding to each subtitle text. For example, the server can input the subtitle text into a preset text-to-speech model for speech conversion to obtain the subtitle audio.
In still other embodiments of the present disclosure, S1640 may specifically include:
inputting the target text into a preset text emotion classification model for classification to obtain the emotion category to which the target text belongs;
determining, among multiple preset background audios, the target background audio corresponding to the emotion category;
taking the target background audio as the multimedia material.
Specifically, the server may be preset with multiple background audios, where one preset background audio can correspond to one emotion category; the emotion categories may include categories representing the classification of the emotion of the target text, such as a joyful category, a sad category, a serious category, and a tense category. The server inputs the target text into the preset text emotion classification model for classification to obtain the emotion category to which it belongs, determines the target background audio corresponding to that category among the multiple preset background audios, and takes the target background audio as the multimedia material.
Thus, by performing sentiment analysis and classification on the target text, the server can select suitable background music from the multiple preset background audios for generating the target video.
S1650: Generate a target video according to the subtitle text and the multimedia material.
In some embodiments of the present disclosure, the server can generate the target video directly according to the subtitle text and the multimedia material.
In this case, the video elements may include the subtitle text and the multimedia material.
When the multimedia material includes at least one of images and videos, the subtitle text can be added at a preset position in each image and in each image frame of each video, and the images and videos can be sorted according to the arrangement order of the respective subtitle texts to obtain a dynamic image; video rendering can then be performed on the subtitle text, images, and videos in the dynamic image according to the preset display effects for subtitle text, images, and videos in a preset video template, to obtain the target video.
When the multimedia material includes at least one of images and videos as well as subtitle audio, the subtitle text can be added at a preset position in each image and in each image frame of each video, and the images and videos can be sorted according to the arrangement order of the respective subtitle texts; then, the display time and display duration of each image and video can be determined according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video; next, a dynamic image can be obtained according to the sorting of the images and videos and their display times and display durations, and video rendering can be performed according to the preset display effects in the preset video template to obtain the target video; finally, the target video and the subtitle audio can be fused according to the correspondence between the timestamp of each video frame in the target video and the timestamps of the audio frames of the subtitle audio, to obtain the fused target video.
When the multimedia material includes at least one of images and videos, subtitle audio, and target background audio, the subtitle text is likewise added at a preset position in each image and in each image frame of each video; the images and videos are sorted according to the arrangement order of the respective subtitle texts; the display time and display duration of each image and video are determined according to the audio duration of the corresponding subtitle audio, the number of corresponding images and videos, and the duration of each video; a dynamic image is obtained accordingly and rendered according to the preset display effects in the preset video template to obtain the target video; the target video and the subtitle audio are fused according to the correspondence between the timestamps of the video frames and those of the audio frames to obtain a preliminarily fused target video; and finally, the target background audio is fused into the audio of the target video according to the timestamps of the audio frames in the preliminarily fused target video and the target background audio, to obtain the finally fused target video.
In other embodiments of the present disclosure, after S1620 and before S1650, the video processing method may further include:
inputting the target text into a preset text content classification model for classification to obtain the content category to which the target text belongs;
determining, among multiple preset video templates, the target video template corresponding to the content category.
Correspondingly, S1650 may specifically include:
generating the target video according to the subtitle text, the multimedia material, and the target video template.
Specifically, the server may be preset with multiple video templates, where one video template can correspond to one content category; the content categories may include categories representing the classification of the text content of the target text, such as a news category, a story category, a diary category, and a variety-show category. After obtaining the target text, the server can input it into the preset text content classification model for classification to obtain the content category to which it belongs, determine the corresponding target video template among the multiple preset video templates, and then generate the target video according to the subtitle text, the multimedia material, and the target video template. In addition, the content categories may also be classified in other ways, for example, classifying the content of the target text according to keywords contained in it, which is not limited here.
Further, different video templates may also include different display effect elements. Therefore, video rendering can be performed on the subtitle text and the multimedia material according to the preset display effects for subtitle text and multimedia material in the target video template, to obtain the target video.
Thus, in the embodiments of the present disclosure, a suitable target video template for generating the target video can be selected according to the content category to which the target text belongs, so that suitable display effects are set for the subtitle text and the multimedia material.
In still other embodiments of the present disclosure, after S1620 and before S1650, the video processing method may further include:
determining the pose of a virtual object according to the subtitle text.
Correspondingly, S1650 may specifically include:
generating the target video according to the subtitle text, the multimedia material, the virtual object, and the pose of the virtual object.
In this case, the video elements may further include the preset virtual object and the pose of the virtual object.
In the embodiments of the present disclosure, the virtual object may be a virtual character object with a human image, a virtual cartoon object with a cartoon image, or the like; the present disclosure does not limit the specific type of the virtual object.
In the embodiments of the present disclosure, the pose of the virtual object may include at least one of a mouth pose, a facial expression pose, a gesture pose, and a body pose, which is not limited here.
When the head image of the virtual object is displayed in the target video, the pose may include a mouth pose and a facial expression pose; when the upper-body image is displayed, the pose may include a mouth pose, a facial expression pose, and a gesture pose; when the full-body image is displayed, the pose may include a mouth pose, a facial expression pose, a gesture pose, and a body pose. The present disclosure does not limit the pose types of the virtual object.
Specifically, after the server obtains the subtitle audio, it can input the subtitle audio into a preset pose generation model to obtain a real-time character pose animation, and use pose migration technology to migrate it to the object model of the virtual character object, obtaining an object model of the virtual character object broadcasting the subtitle text; it can then obtain, from the object model, the character pose image corresponding to each audio frame of the subtitle audio, and fuse the corresponding character pose image into the target video generated from the subtitle text and the multimedia material according to the timestamp of each audio frame within that target video, obtaining a fused target video with the virtual character object.
When the object model of the virtual character object is a head model, the preset pose generation model can be used to generate mouth pose animation and facial expression pose animation; when it is an upper-body model, mouth poses, facial expression poses, and gesture poses; when it is a full-body model, mouth poses, facial expression poses, gesture poses, and body poses.
In a possible implementation, before determining the pose of the virtual object according to the subtitle text, the video processing method may further include:
generating a virtual object similar to the user's image.
In some embodiments, the electronic device can send a user image to the server; the server can input the received user image into a preset biometric feature extraction model to extract the user's biometric features, then input the extracted biometric features into a preset object generation model to obtain an initial object model of a virtual object having those biometric features, and finally fuse a preset clothing model into the initial object model to obtain the final object model of the virtual object.
In other embodiments, the electronic device can send a user image to the server; the server can input the received user image into a preset biometric feature extraction model to extract the user's biometric features and into a preset dress feature extraction model to extract the user's dress features; it can then input the extracted biometric features into a preset object generation model to obtain an initial object model of a virtual object having those biometric features, and, according to a preset correspondence between dress styles and dress models, look up, among the preset dress models, the target dress model corresponding to the user dress style to which the user's dress features belong, and fuse the target dress model with the initial object model to obtain the object model of a virtual object having the user's dress features.
In a possible implementation, the extracted dress features may include at least one of the user's facial decoration features, headwear features, clothing features, and clothing accessory features.
In still other embodiments, when a virtual object similar to the user's image and dress is generated before the subtitle audio is generated, the server can also generate the subtitle audio according to the dress style of the virtual object, the subtitle audio being audio with voice characteristics consistent with the dress features of the virtual object.
S1660: Send the target video to the electronic device.
Specifically, after generating the target video, the server can send it to the electronic device so that the electronic device displays it.
In the embodiments of the present disclosure, a video generation request carrying original data and sent by an electronic device can be received, and in response to the request, the target text is automatically obtained according to the original data, the subtitle text is automatically generated according to the target text, the multimedia material corresponding to the subtitle text is automatically obtained, and the target video is automatically generated according to the subtitle text and the multimedia material. Rich multimedia materials can thus be found automatically during the generation of the target video, without requiring the user to search for materials manually, which not only reduces the time cost of producing the video but also improves its quality.
An embodiment of the present disclosure further provides a video processing system, which may include an electronic device and a server, implementing the architecture shown in FIG. 1.
The electronic device can be configured to: receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video.
The server can be configured to: receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate subtitle text according to the target text; obtain multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
It should be noted that the video display device can execute the steps in the method embodiments shown in FIG. 2 to FIG. 15 and realize the processes and effects therein, and the video processing device can execute the steps in the method embodiment shown in FIG. 16 and realize the processes and effects therein, which is not repeated here.
In the embodiments of the present disclosure, after receiving the video generation operation for the original data, the electronic device can send a video generation request carrying the original data to the server; after receiving the request, the server can automatically obtain the target text according to the original data, automatically generate the subtitle text according to the target text, automatically obtain the corresponding multimedia material, automatically generate the target video according to the subtitle text and the multimedia material, and send the target video to the electronic device, so that the electronic device can display the target video after receiving it. Rich multimedia materials can thus be found automatically during generation, without requiring the user to search for materials manually, which not only reduces the time cost of producing the video but also improves its quality.
FIG. 17 shows a schematic diagram of an interaction flow of a video processing system provided by an embodiment of the present disclosure. As shown in FIG. 17, the interaction flow may include the following steps S1710-S1770.
S1710: The electronic device can receive original data input through a user's data input operation, or receive original data sent to it by another device.
The original data may include at least one of text, a link address, and a multimedia file, which is not repeated here.
S1720: The electronic device can display the received original data.
S1730: If the user wants to generate a video from the original data, the user can input a video generation operation for the original data to the electronic device.
The video generation operation may be a trigger operation on the original data such as a long press, double click, voice control, or expression control, or a trigger operation on a video generation trigger control such as a click, long press, or double click, which is not repeated here.
S1740: The electronic device can send a video generation request carrying the original data to the server.
S1750: After receiving the video generation request, the server can, in response to it, generate the target video based on the original data.
The server can automatically obtain the target text from the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material, similarly to the target video generation method described above, which is not repeated here.
S1760: After generating the target video, the server can send it to the electronic device.
S1770: After receiving the target video sent by the server, the electronic device can display the received target video.
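From the electronic-device side, the S1740-S1770 exchange can be sketched as a single request-response round trip. The endpoint URL and JSON field names below are hypothetical, since the disclosure does not specify a wire format; the sketch uses the requests package.

import requests

def request_target_video(original_data: dict,
                         endpoint: str = "https://example.com/api/generate-video",
                         out_path: str = "target_video.mp4") -> str:
    # S1740: send a video generation request carrying the original data
    response = requests.post(endpoint, json={"original_data": original_data},
                             timeout=600)
    response.raise_for_status()
    # S1760/S1770: receive the generated target video and store it for display
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path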
FIG. 18 shows a schematic structural diagram of a video display apparatus provided by an embodiment of the present disclosure.
In the embodiments of the present disclosure, the video display apparatus 1800 shown in FIG. 18 can be provided in an electronic device, for example, the electronic device 101 on the client side shown in FIG. 1. The electronic device may include devices with communication functions such as a mobile phone, a tablet computer, a desktop computer, a notebook computer, an in-vehicle terminal, a wearable device, an all-in-one machine, and a smart home device, and may also include a device simulated by a virtual machine or an emulator.
As shown in FIG. 18, the video display apparatus 1800 may include a first receiving unit 1810 and a first display unit 1820.
The first receiving unit 1810 may be configured to receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text.
The first display unit 1820 may be configured to display, in response to the video generation operation, the generated target video, the video elements of the target video including subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text.
In the embodiments of the present disclosure, a video generation operation for original data can be received; since the original data can be used to obtain the target text and the video generation operation can be used to trigger generation of the corresponding target video, the target video generated in response to the operation can be displayed after the operation is received, its video elements including subtitle text automatically generated according to the target text and multimedia material automatically obtained according to the subtitle text. Rich multimedia materials can thus be found automatically during generation, without requiring the user to search manually, which reduces the time cost of producing the video and improves its quality.
In some embodiments of the present disclosure, the video display apparatus 1800 may further include a second display unit, a third receiving unit, and a third display unit.
The second display unit may be configured to display, in response to the video generation operation, a video editing interface, the video editing interface including editable elements, and the editable elements including at least one of the video elements and display effect elements corresponding to the video elements.
The third receiving unit may be configured to receive an element modification operation on the editable elements.
The third display unit may be configured to display, in response to the element modification operation, a modified target video, the modified target video including the editable elements displayed in the video editing interface when the element modification operation is completed.
In some embodiments of the present disclosure, the video display apparatus 1800 may further include a fourth display unit, a fourth receiving unit, and a fifth display unit.
The fourth display unit may be configured to display an indication identifier, the indication identifier being used to indicate that the target video has been generated.
The fourth receiving unit may be configured to receive an identifier trigger operation on the indication identifier.
The fifth display unit may be configured to hide the original data in response to the identifier trigger operation.
In some embodiments of the present disclosure, the video display apparatus 1800 may further include a second sending unit and a fifth receiving unit.
The second sending unit may be configured to send a video generation request carrying the original data to the server, the video generation request being used to cause the server to feed back the target video corresponding to the target text based on the original data.
The fifth receiving unit may be configured to receive the target video fed back by the server.
In some embodiments of the present disclosure, the video display apparatus 1800 may further include a third acquisition unit, a third generating unit, a fourth acquisition unit, and a fourth generating unit.
The third acquisition unit may be configured to obtain the target text according to the original data.
The third generating unit may be configured to generate the subtitle text according to the target text.
The fourth acquisition unit may be configured to obtain the multimedia material corresponding to the subtitle text.
The fourth generating unit may be configured to generate the target video according to the subtitle text and the multimedia material.
In some embodiments of the present disclosure, the video elements may further include a preset virtual object and a pose of the virtual object, and the pose of the virtual object can be determined according to the subtitle text.
It should be noted that the video display apparatus 1800 shown in FIG. 18 can execute the steps in the method embodiments shown in FIG. 2 to FIG. 15 and realize the processes and effects therein, which is not repeated here.
FIG. 19 shows a schematic structural diagram of a video processing apparatus provided by an embodiment of the present disclosure.
In the embodiments of the present disclosure, the video processing apparatus 1900 may be a server, for example, the server 102 on the server side shown in FIG. 1, where the server may include a device with storage and computing functions such as a cloud server or a server cluster.
As shown in FIG. 19, the video processing apparatus 1900 may include a second receiving unit 1910, a first acquisition unit 1920, a first generating unit 1930, a second acquisition unit 1940, a second generating unit 1950, and a first sending unit 1960.
The second receiving unit 1910 may be configured to receive a video generation request carrying original data and sent by an electronic device.
The first acquisition unit 1920 may be configured to obtain, in response to the video generation request, a target text according to the original data.
The first generating unit 1930 may be configured to generate subtitle text according to the target text.
The second acquisition unit 1940 may be configured to obtain multimedia material corresponding to the subtitle text.
The second generating unit 1950 may be configured to generate a target video according to the subtitle text and the multimedia material.
The first sending unit 1960 may be configured to send the target video to the electronic device.
In the embodiments of the present disclosure, a video generation request carrying original data and sent by an electronic device can be received, and in response to the request, the target text is automatically obtained according to the original data, the subtitle text is automatically generated according to the target text, the corresponding multimedia material is automatically obtained, and the target video is automatically generated according to the subtitle text and the multimedia material. Rich multimedia materials can thus be found automatically during generation, without requiring the user to search manually, which reduces the time cost of producing the video and improves its quality.
In some embodiments of the present disclosure, the first generating unit 1930 may include a summary extraction subunit and a text layout subunit.
The summary extraction subunit may be configured to perform text summary extraction on the target text to obtain a summary of the target text.
The text layout subunit may be configured to perform text layout on the summary to obtain the subtitle text corresponding to the summary.
In some embodiments of the present disclosure, the original data may include text.
Correspondingly, the first acquisition unit 1920 may include a text extraction subunit and a first processing subunit.
The text extraction subunit may be configured to extract the text from the original data.
The first processing subunit may be configured to take the text as the target text.
In other embodiments of the present disclosure, the original data may include a multimedia file.
Correspondingly, the first acquisition unit 1920 may include a text conversion subunit and a second processing subunit.
The text conversion subunit may be configured to perform text conversion on the multimedia file to obtain converted text.
The second processing subunit may be configured to take the converted text as the target text.
In still other embodiments of the present disclosure, the original data may include a link address, and the link address can be used to obtain article content.
Correspondingly, the first acquisition unit 1920 may include an article acquisition subunit, a text extraction subunit, and a third processing subunit.
The article acquisition subunit may be configured to obtain the target article based on the link address.
The text extraction subunit may be configured to extract the article text from the target article.
The third processing subunit may be configured to take the article text as the target text.
In some embodiments of the present disclosure, the second acquisition unit 1940 may include a fourth processing subunit and a fifth processing subunit.
The fourth processing subunit may be configured to determine, among multiple preset materials, the target material with the highest degree of matching with the subtitle text, the multiple preset materials including materials obtained according to the original data, and each preset material including at least one of an image and a video.
The fifth processing subunit may be configured to take the target material as the multimedia material.
In other embodiments of the present disclosure, the second acquisition unit 1940 may include a speech conversion subunit and a sixth processing subunit.
The speech conversion subunit may be configured to perform text-to-speech conversion on the subtitle text to obtain the subtitle audio corresponding to the subtitle text.
The sixth processing subunit may be configured to take the subtitle audio as the multimedia material.
In still other embodiments of the present disclosure, the second acquisition unit 1940 may include an emotion classification subunit, a seventh processing subunit, and an eighth processing subunit.
The emotion classification subunit may be configured to input the target text into a preset text emotion classification model for classification to obtain the emotion category to which the target text belongs.
The seventh processing subunit may be configured to determine, among multiple preset background audios, the target background audio corresponding to the emotion category.
The eighth processing subunit may be configured to take the target background audio as the multimedia material.
In still other embodiments of the present disclosure, the video processing apparatus 1900 may further include a content classification unit and a template determination unit.
The content classification unit may be configured to input the target text into a preset text content classification model for classification to obtain the content category to which the target text belongs.
The template determination unit may be configured to determine, among multiple preset video templates, the target video template corresponding to the content category.
Correspondingly, the second generating unit 1950 may be further configured to generate the target video according to the subtitle text, the multimedia material, and the target video template.
It should be noted that the video processing apparatus 1900 shown in FIG. 19 can execute the steps in the method embodiment shown in FIG. 16 and realize the processes and effects therein, which is not repeated here.
An embodiment of the present disclosure further provides a computing device, which may include a processor and a memory, the memory being used to store executable instructions. The processor can be used to read the executable instructions from the memory and execute them to implement the video display method or the video processing method in the above embodiments.
FIG. 20 shows a schematic structural diagram of a computing device provided by an embodiment of the present disclosure, specifically a computing device 2000 suitable for implementing an embodiment of the present disclosure.
In the embodiments of the present disclosure, the computing device may be an electronic device or a server. The electronic device may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), an in-vehicle terminal (e.g., an in-vehicle navigation terminal), and a wearable device, as well as stationary terminals such as a digital TV, a desktop computer, and a smart home device. The server may include a device with storage and computing functions, such as a cloud server or a server cluster.
It should be noted that the computing device 2000 shown in FIG. 20 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 20, the computing device 2000 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 2001, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 2002 or a program loaded from a storage device 2008 into a random access memory (RAM) 2003. The RAM 2003 also stores various programs and data required for the operation of the computing device 2000. The processing device 2001, the ROM 2002, and the RAM 2003 are connected to each other through a bus 2004. An input/output (I/O) interface 2005 is also connected to the bus 2004.
Generally, the following devices can be connected to the I/O interface 2005: input devices 2006 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 2007 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 2008 including, for example, a magnetic tape and a hard disk; and a communication device 2009. The communication device 2009 may allow the computing device 2000 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 20 shows the computing device 2000 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the video display method or the video processing method in the above embodiments.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowcharts can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 2009, or installed from the storage device 2008, or installed from the ROM 2002. When the computer program is executed by the processing device 2001, the above-mentioned functions defined in the video display method or in the video processing method of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code; such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and can send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to an electrical wire, an optical fiber cable, RF (radio frequency), or any suitable combination of the foregoing.
In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP, and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above computing device, or may exist alone without being assembled into the computing device.
The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the computing device, the computing device is caused to:
receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text; and in response to the video generation operation, display the generated target video, the video elements of the target video including subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text;
or,
receive a video generation request carrying original data and sent by an electronic device; in response to the video generation request, obtain a target text according to the original data; generate subtitle text according to the target text; obtain multimedia material corresponding to the subtitle text; generate a target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
In the embodiments of the present disclosure, computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium, and may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
The above description is only an illustration of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved in the present disclosure is not limited to technical solutions formed by a specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments, separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely example forms of implementing the claims.

Claims (20)

  1. A video display method, comprising:
    receiving a video generation operation for original data, wherein the original data is used to obtain a target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text;
    in response to the video generation operation, displaying the generated target video, wherein video elements of the target video comprise subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
  2. The method according to claim 1, wherein after the receiving a video generation operation for original data, the method further comprises:
    in response to the video generation operation, displaying a video editing interface, wherein the video editing interface comprises editable elements, and the editable elements comprise at least one of the video elements and display effect elements corresponding to the video elements;
    receiving an element modification operation on the editable elements;
    in response to the element modification operation, displaying a modified target video, wherein the modified target video comprises the editable elements displayed in the video editing interface when the element modification operation is completed.
  3. The method according to claim 1, wherein before the displaying the generated target video, the method further comprises:
    displaying an indication identifier, wherein the indication identifier is used to indicate that the target video has been generated;
    receiving an identifier trigger operation on the indication identifier;
    in response to the identifier trigger operation, hiding the original data.
  4. The method according to claim 1, wherein before the displaying the generated target video, the method further comprises:
    sending a video generation request carrying the original data to a server, wherein the video generation request is used to cause the server to feed back, based on the original data, the target video corresponding to the target text;
    receiving the target video fed back by the server.
  5. The method according to claim 1, wherein before the displaying the generated target video, the method further comprises:
    obtaining the target text according to the original data;
    generating the subtitle text according to the target text;
    obtaining the multimedia material corresponding to the subtitle text;
    generating the target video according to the subtitle text and the multimedia material.
  6. A video processing method, comprising:
    receiving a video generation request carrying original data and sent by an electronic device;
    in response to the video generation request, obtaining a target text according to the original data;
    generating subtitle text according to the target text;
    obtaining multimedia material corresponding to the subtitle text;
    generating a target video according to the subtitle text and the multimedia material;
    sending the target video to the electronic device.
  7. The method according to claim 6, wherein the generating subtitle text according to the target text comprises:
    performing text summary extraction on the target text to obtain a summary of the target text;
    performing text layout on the summary to obtain the subtitle text corresponding to the summary.
  8. The method according to claim 6, wherein the original data comprises text; and
    the obtaining a target text according to the original data comprises:
    extracting the text from the original data;
    taking the text as the target text.
  9. The method according to claim 6, wherein the original data comprises a multimedia file; and
    the obtaining a target text according to the original data comprises:
    performing text conversion on the multimedia file to obtain converted text;
    taking the converted text as the target text.
  10. The method according to claim 6, wherein the original data comprises a link address, and the link address is used to obtain article content; and
    the obtaining a target text according to the original data comprises:
    obtaining a target article based on the link address;
    extracting article text from the target article;
    taking the article text as the target text.
  11. The method according to claim 6, wherein the obtaining multimedia material corresponding to the subtitle text comprises:
    determining, among multiple preset materials, a target material with the highest degree of matching with the subtitle text, wherein the multiple preset materials comprise materials obtained according to the original data, and each preset material comprises at least one of an image and a video;
    taking the target material as the multimedia material.
  12. The method according to claim 6, wherein the obtaining multimedia material corresponding to the subtitle text comprises:
    performing text-to-speech conversion on the subtitle text to obtain subtitle audio corresponding to the subtitle text;
    taking the subtitle audio as the multimedia material.
  13. The method according to claim 6, wherein the obtaining multimedia material corresponding to the subtitle text comprises:
    inputting the target text into a preset text emotion classification model for classification to obtain an emotion category to which the target text belongs;
    determining, among multiple preset background audios, a target background audio corresponding to the emotion category;
    taking the target background audio as the multimedia material.
  14. The method according to claim 6, wherein after the obtaining a target text and before the generating a target video according to the subtitle text and the multimedia material, the method further comprises:
    inputting the target text into a preset text content classification model for classification to obtain a content category to which the target text belongs;
    determining, among multiple preset video templates, a target video template corresponding to the content category; and
    the generating a target video according to the subtitle text and the multimedia material comprises:
    generating the target video according to the subtitle text, the multimedia material, and the target video template.
  15. A video display apparatus, comprising:
    a first receiving unit configured to receive a video generation operation for original data, wherein the original data is used to obtain a target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text;
    a first display unit configured to display, in response to the video generation operation, the generated target video, wherein video elements of the target video comprise subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
  16. A video processing apparatus, comprising:
    a second receiving unit configured to receive a video generation request carrying original data and sent by an electronic device;
    a first acquisition unit configured to obtain, in response to the video generation request, a target text according to the original data;
    a first generating unit configured to generate subtitle text according to the target text;
    a second acquisition unit configured to obtain multimedia material corresponding to the subtitle text;
    a second generating unit configured to generate a target video according to the subtitle text and the multimedia material;
    a first sending unit configured to send the target video to the electronic device.
  17. A video processing system, comprising an electronic device and a server, wherein:
    the electronic device is configured to: receive a video generation operation for original data, wherein the original data is used to obtain a target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video;
    the server is configured to: receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate subtitle text according to the target text; obtain multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
  18. A computing device, comprising:
    a processor;
    a memory for storing executable instructions;
    wherein the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the video display method according to any one of claims 1-5 or the video processing method according to any one of claims 6-14.
  19. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the video display method according to any one of claims 1-5 or the video processing method according to any one of claims 6-14.
  20. A computer program product comprising a computer program carried on a computer-readable medium which, when executed by a processor, causes the processor to implement the video display method according to any one of claims 1-5 or the video processing method according to any one of claims 6-14.
PCT/CN2021/130581 2020-12-07 2021-11-15 Video display and processing method, apparatus, system, device, and medium WO2022121626A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/256,014 US20240107127A1 (en) 2020-12-07 2021-11-15 Video display method and apparatus, video processing method, apparatus, and system, device, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011437788.4 2020-12-07
CN202011437788.4A CN112579826A (zh) Video display and processing method, apparatus, system, device, and medium

Publications (1)

Publication Number Publication Date
WO2022121626A1 true WO2022121626A1 (zh) 2022-06-16

Family

ID=75132044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/130581 WO2022121626A1 (zh) 2020-12-07 2021-11-15 视频显示及处理方法、装置、系统、设备、介质

Country Status (3)

Country Link
US (1) US20240107127A1 (zh)
CN (1) CN112579826A (zh)
WO (1) WO2022121626A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115145452A (zh) * 2022-07-01 2022-10-04 Hangzhou NetEase Cloud Music Technology Co., Ltd. Post generation method, medium, terminal device, and computing device

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579826A (zh) 2020-12-07 2021-03-30 Beijing ByteDance Network Technology Co., Ltd. Video display and processing method, apparatus, system, device, and medium
CN113364999B (zh) * 2021-05-31 2022-12-27 Beijing Dajia Internet Information Technology Co., Ltd. Video generation method and apparatus, electronic device, and storage medium
CN113473204B (zh) * 2021-05-31 2023-10-13 Beijing Dajia Internet Information Technology Co., Ltd. Information display method and apparatus, electronic device, and storage medium
CN113365134B (zh) * 2021-06-02 2022-11-01 Beijing Zitiao Network Technology Co., Ltd. Audio sharing method, apparatus, device, and medium
CN113497899A (zh) * 2021-06-22 2021-10-12 Shenzhen Datou Brothers Technology Co., Ltd. Method, apparatus, device, and storage medium for matching text with pictures
CN113630644B (zh) * 2021-06-29 2024-01-30 Beijing Sogou Technology Development Co., Ltd. Editing method and apparatus for a video content editor, and storage medium
CN113778717A (zh) * 2021-09-14 2021-12-10 Beijing Baidu Netcom Science and Technology Co., Ltd. Content sharing method, apparatus, device, and storage medium
CN115811632A (zh) * 2021-09-15 2023-03-17 Beijing Zitiao Network Technology Co., Ltd. Video processing method, apparatus, device, and storage medium
CN114297150A (zh) * 2021-11-19 2022-04-08 Beijing Dajia Internet Information Technology Co., Ltd. Media file processing method, apparatus, device, and storage medium
CN114900711A (zh) * 2022-05-27 2022-08-12 Beijing Zitiao Network Technology Co., Ltd. Method, apparatus, device, and storage medium for generating media content
CN114968463A (zh) * 2022-05-31 2022-08-30 Beijing ByteDance Network Technology Co., Ltd. Entity display method, apparatus, device, and medium
CN115334367B (zh) * 2022-07-11 2023-10-17 Beijing Dajia Internet Information Technology Co., Ltd. Method, apparatus, server, and storage medium for generating video summary information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080284910A1 (en) * 2007-01-31 2008-11-20 John Erskine Text data for streaming video
CN108322800A (zh) * 2017-01-18 2018-07-24 阿里巴巴集团控股有限公司 字幕信息处理方法及装置
CN109729420A (zh) * 2017-10-27 2019-05-07 腾讯科技(深圳)有限公司 图片处理方法及装置、移动终端及计算机可读存储介质
CN109756751A (zh) * 2017-11-07 2019-05-14 腾讯科技(深圳)有限公司 多媒体数据处理方法及装置、电子设备、存储介质
CN112579826A (zh) * 2020-12-07 2021-03-30 北京字节跳动网络技术有限公司 视频显示及处理方法、装置、系统、设备、介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107172485B (zh) * 2017-04-25 2020-01-31 北京百度网讯科技有限公司 一种用于生成短视频的方法与装置、输入设备
CN108965737B (zh) * 2017-05-22 2022-03-29 腾讯科技(深圳)有限公司 媒体数据处理方法、装置及存储介质
CN109257659A (zh) * 2018-11-16 2019-01-22 北京微播视界科技有限公司 字幕添加方法、装置、电子设备及计算机可读存储介质
CN111787395B (zh) * 2020-05-27 2023-04-18 北京达佳互联信息技术有限公司 视频生成方法、装置、电子设备及存储介质


Also Published As

Publication number Publication date
US20240107127A1 (en) 2024-03-28
CN112579826A (zh) 2021-03-30

Similar Documents

Publication Publication Date Title
WO2022121626A1 (zh) Video display and processing method, apparatus, system, device, and medium
CN109688463B (zh) Clip video generation method and apparatus, terminal device, and storage medium
TWI720062B (zh) Voice input method, apparatus, and terminal device
WO2022068533A1 (zh) Interaction information processing method, apparatus, device, and medium
WO2022042593A1 (zh) Subtitle editing method and apparatus, and electronic device
EP3198381B1 (en) Interactive video generation
CN107517323B (zh) Information sharing method, apparatus, and storage medium
CN113365134B (zh) Audio sharing method, apparatus, device, and medium
WO2022105862A1 (zh) Video generation and display method, apparatus, device, and medium
CN110602516A (zh) Information interaction method and apparatus based on live video streaming, and electronic device
CN107211198A (zh) Apparatus and method for editing content
CN115082602B (zh) Digital human generation method, model training method, apparatus, device, and medium
CN112929746B (zh) Video generation method and apparatus, storage medium, and electronic device
WO2021238084A1 (zh) Voice packet recommendation method, apparatus, device, and storage medium
WO2023016349A1 (zh) Text input method and apparatus, electronic device, and storage medium
US20230214423A1 (en) Video generation
WO2022105760A1 (zh) Multimedia browsing method, apparatus, device, and medium
CN113746875A (zh) Voice packet recommendation method, apparatus, device, and storage medium
WO2023134568A1 (zh) Display method and apparatus, electronic device, and storage medium
WO2022252806A1 (zh) Information processing method, apparatus, device, and medium
EP4099711A1 (en) Method and apparatus and storage medium for processing video and timing of subtitles
EP4088216A1 (en) Presenting intelligently suggested content enhancements
CN111443794A (zh) Reading interaction method, apparatus, device, server, and storage medium
WO2022156557A1 (zh) Image display method, apparatus, device, and medium
WO2022262560A1 (zh) Image display method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21902333
    Country of ref document: EP
    Kind code of ref document: A1
WWE Wipo information: entry into national phase
    Ref document number: 18256014
    Country of ref document: US
NENP Non-entry into the national phase
    Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 220923)
122 Ep: pct application non-entry in european phase
    Ref document number: 21902333
    Country of ref document: EP
    Kind code of ref document: A1