WO2022121626A1 - Video display and processing method, apparatus, system, device, and medium - Google Patents
Video display and processing method, apparatus, system, device, and medium
- Publication number
- WO2022121626A1 (PCT/CN2021/130581)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- text
- target
- subtitle
- original data
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g. 3D video
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/635—Overlay text, e.g. embedded captions in a TV program
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4318—Generation of visual interfaces for content selection or interaction; Content or additional data rendering by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Definitions
- the present disclosure relates to the field of multimedia technologies, and in particular, to a video display and processing method, apparatus, system, device, and medium.
- when producing a video, the user first needs to find material on their own, and then perform a series of complex video editing operations on the material to finally generate a video work. If the material the user finds is not rich enough, the quality of the manually edited video cannot be guaranteed; moreover, the manual editing steps are complicated and time-consuming, making the time cost of video production relatively high.
- the present disclosure provides a video display and processing method, apparatus, system, device, and medium, which can reduce the time cost of video production.
- the present disclosure provides a video display method, including:
- receiving a video generation operation for original data, where the original data is used to obtain target text and the video generation operation is used to trigger generation of a target video corresponding to the target text;
- in response to the video generation operation, the generated target video is displayed.
- the video elements of the target video include subtitle text and multimedia material corresponding to the subtitle text.
- the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
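The relationship between these video elements can be illustrated with a minimal sketch. The sentence-splitting rule and the material-lookup function below are hypothetical stand-ins for whatever text-processing and material-retrieval components an actual implementation would use:

```python
# Minimal sketch of the video-element pipeline described above:
# target text -> subtitle text -> multimedia material per subtitle.
# split_into_subtitles and find_material are illustrative placeholders.

def split_into_subtitles(target_text):
    """Generate subtitle text: one subtitle line per sentence."""
    normalized = target_text.replace("!", ".").replace("?", ".")
    return [s.strip() for s in normalized.split(".") if s.strip()]

def find_material(subtitle):
    """Obtain multimedia material corresponding to a subtitle (placeholder lookup)."""
    return {"type": "image", "query": subtitle}

def build_video_elements(target_text):
    """Pair each subtitle with its multimedia material, as the claims describe."""
    return [{"subtitle": s, "material": find_material(s)}
            for s in split_into_subtitles(target_text)]

elements = build_video_elements("A cat sits on a mat. It watches the birds.")
# Each element pairs one subtitle line with one piece of material.
```

The key structural point is that material is looked up per subtitle line, so richer target text automatically yields more (and more varied) material.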
- the present disclosure provides a video processing method, including:
- the present disclosure provides a video display device, comprising:
- a first receiving unit configured to receive a video generation operation for original data, where the original data is used to obtain target text and the video generation operation is used to trigger generation of a target video corresponding to the target text;
- a first display unit configured to display the generated target video in response to the video generation operation, where the video elements of the target video include subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
- the present disclosure provides a video processing apparatus, including:
- a second receiving unit configured to receive a video generation request carrying original data sent by the electronic device
- a first obtaining unit configured to obtain the target text according to the original data in response to the video generation request
- a first generating unit configured to generate subtitle text according to the target text
- a second obtaining unit configured to obtain multimedia material corresponding to the subtitle text
- the second generation unit is configured to generate the target video according to the subtitle text and the multimedia material
- the first sending unit is configured to send the target video to the electronic device.
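The ordered units above can be mirrored as a hedged end-to-end sketch. All helper names here are illustrative stand-ins, not part of the disclosure; each function corresponds to one unit of the apparatus:

```python
# Hedged sketch of the video processing apparatus's units as plain functions.

def obtain_target_text(original_data):
    # In practice this may fetch an article from a link, run OCR, etc.
    return original_data.get("text", "")

def generate_subtitle_text(target_text):
    # One subtitle line per sentence (simplistic placeholder rule).
    return [s.strip() for s in target_text.split(".") if s.strip()]

def obtain_multimedia_material(subtitles):
    # Placeholder material lookup keyed by subtitle content.
    return [{"subtitle": s, "material": f"material-for:{s}"} for s in subtitles]

def generate_target_video(pairs):
    # Stand-in for actual rendering: describe the composed video.
    return {"frames": pairs, "duration_s": 3 * len(pairs)}

def handle_video_generation_request(request):
    """End-to-end flow mirroring the apparatus units, in order."""
    target_text = obtain_target_text(request["original_data"])
    subtitles = generate_subtitle_text(target_text)
    pairs = obtain_multimedia_material(subtitles)
    return generate_target_video(pairs)

video = handle_video_generation_request(
    {"original_data": {"text": "First scene. Second scene."}})
```

The "send the target video to the electronic device" step is omitted here since it is transport-specific (e.g. an HTTPS response in the architecture of FIG. 1).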
- the present disclosure provides a video processing system, including an electronic device and a server, wherein:
- the electronic device is used to receive a video generation operation for original data, where the original data is used to obtain target text and the video generation operation is used to trigger generation of a target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video;
- the server is used to receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate the subtitle text according to the target text; obtain the multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
- the present disclosure provides a computing device, comprising a processor and a memory storing executable instructions;
- the processor is configured to read the executable instructions from the memory and execute them to implement the video display method described in the first aspect or the video processing method described in the second aspect.
- the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, enables the processor to implement the video display method described in the first aspect or the video processing method described in the second aspect.
- the present disclosure provides a computer program product comprising a computer program carried on a computer-readable medium which, when executed by a processor, causes the processor to implement the video display method described in the first aspect or the video processing method described in the second aspect.
- the video display and processing methods, apparatuses, system, device, and media of the embodiments of the present disclosure can receive a video generation operation for original data. Since the original data can be used to obtain target text, and the video generation operation can be used to trigger generation of the target video corresponding to that text, the target video generated in response to the operation can be displayed once the operation is received. The video elements of the target video can include subtitle text and the multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. Rich multimedia material can therefore be found automatically while the target video is being generated, so users do not need to search for material manually, which not only reduces the time cost of video production but can also improve the quality of the resulting video.
- FIG. 1 is an architectural diagram of a video production scene provided by an embodiment of the present disclosure
- FIG. 2 is a schematic flowchart of a video display method according to an embodiment of the present disclosure
- FIG. 3 is a schematic diagram of a raw data input interface provided by an embodiment of the present disclosure.
- FIG. 4 is a schematic diagram of another original data input interface provided by an embodiment of the present disclosure.
- FIG. 5 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
- FIG. 6 is a schematic diagram of a video display interface provided by an embodiment of the present disclosure.
- FIG. 7 is a schematic diagram of another video display interface provided by an embodiment of the present disclosure.
- FIG. 8 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- FIG. 9 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- FIG. 10 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- FIG. 11 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
- FIG. 12 is a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
- FIG. 13 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- FIG. 14 is a schematic flowchart of another video display method provided by an embodiment of the present disclosure.
- FIG. 15 is a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- FIG. 16 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
- FIG. 17 is a schematic diagram of an interaction flow of a video processing system according to an embodiment of the present disclosure.
- FIG. 18 is a schematic structural diagram of a video display device according to an embodiment of the present disclosure.
- FIG. 19 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure.
- FIG. 20 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
- the term “including” and variations thereof denote open-ended inclusion, i.e., “including but not limited to”.
- the term “based on” is “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- the video display and processing method provided by the present disclosure can be applied to the architecture shown in FIG. 1 , and will be described in detail with reference to FIG. 1 .
- FIG. 1 shows an architecture diagram of a video production scenario provided by an embodiment of the present disclosure.
- the architecture diagram may include at least one electronic device 101 on the client side and at least one server 102 on the server side.
- the electronic device 101 may establish a connection with the server 102 and perform information exchange through a network protocol such as Hyper Text Transfer Protocol over Secure Socket Layer (HTTPS).
- the electronic device 101 may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle-mounted terminal, a wearable device, an all-in-one computer, a smart home device, or another device with communication functions, and may also include a device simulated by a virtual machine or a simulator.
- the server 102 may include a device with storage and computing functions, such as a cloud server or a server cluster.
- the user can make a video in a designated platform on the electronic device 101, and the designated platform can be a designated application program or a designated website.
- the user can send the video to the server 102 of the designated platform, and the server 102 can receive the video sent by the electronic device 101 and store the received video, so as to send the video to the electronic device that needs to play the video.
- in order to reduce the time cost of producing a video and improve the quality of the produced video, the electronic device 101 can receive a user's video generation operation for raw data. Since the raw data can be used to obtain target text and the video generation operation can be used to trigger generation of the target video corresponding to the target text, the electronic device 101 can display the target video generated in response to the operation after receiving it.
- the video elements of the target video can include subtitle text and the multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. Rich multimedia material can thus be found automatically during generation of the target video, without the user having to search for material manually, which not only reduces the time cost of producing the video but also improves its quality.
- the electronic device 101 may obtain the target text according to the original data, generate subtitle text according to the target text, obtain the multimedia material corresponding to the subtitle text, and then generate the target video according to the subtitle text and the multimedia material. In this way, the electronic device 101 locally obtains the target text based on the original data and generates the corresponding target video, further reducing the time cost of video production.
- the electronic device 101 may also send a video generation request carrying the original data to the server 102 after receiving the video generation operation.
- the server 102 may, in response to the video generation request, obtain the target text according to the original data, generate the subtitle text according to the target text, obtain the multimedia material corresponding to the subtitle text, generate the target video according to the subtitle text and the multimedia material, and send the generated target video to the electronic device 101. In this way, the electronic device 101 can request the server 102 to obtain the target text based on the original data and generate the corresponding target video, further improving the quality of the produced video and reducing the data processing load on the electronic device 101.
- the video display method may be performed by an electronic device, for example, the electronic device 101 in the client shown in FIG. 1 .
- the electronic devices may include devices with communication functions such as mobile phones, tablet computers, desktop computers, notebook computers, vehicle terminals, wearable devices, all-in-one computers, and smart home devices, and may also include devices simulated by virtual machines or simulators.
- FIG. 2 shows a schematic flowchart of a video display method provided by an embodiment of the present disclosure.
- the video display method may include the following S210-S220.
- S210 Receive a video generation operation for the original data, where the original data is used to obtain target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text.
- the target text may be all text contents involved in the original data.
- the original data can be data input by the user, or data sent by other devices to the electronic device.
- the video display method may further include:
- in response to a user's data input operation, the raw data input by the user is displayed in real time.
- the user input operation may include an operation of adding original data, or may include an operation of inputting original data, which is not limited herein.
- the user can trigger a data input operation on the electronic device to input the desired original data.
- the electronic device can respond to the data input operation in real time and display the raw data input by the user in real time.
- the raw data may include text.
- the electronic device may display a first input box for entering text, and the user may perform an input operation for entering text in the first input box, so that the electronic device can display the entered text in the first input box.
- the first input box can be used to enter text such as an article title and article content, and the user can enter the article title and article content in the first input box.
- FIG. 3 shows a schematic diagram of a raw data input interface provided by an embodiment of the present disclosure.
- a plurality of first input boxes may be displayed in the original data input interface 301 , such as an “article title” input box 302 and a “article content” input box 303 .
- the user may perform an input operation in the "article title” input box 302 to input the article title, and may also perform an input operation in the "article content” input box 303 to input the article content.
- FIG. 4 shows a schematic diagram of another original data input interface provided by an embodiment of the present disclosure.
- a plurality of first input boxes may be displayed in the original data input interface 401 , for example, an “article title” input box 402 and a “subtitle” input box 403 .
- the user can perform an input operation in the "article title” input box 402 to input the article title, and can also perform an input operation in the "subtitle” input box 403 to input the subtitles to be displayed in the video.
- FIG. 5 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
- a plurality of first input boxes may be displayed in the original data input interface 501 , such as an “article title” input box 502 and a “article content” input box 503 .
- the user may perform an input operation in the "article title” input box 502 to input the article title, and may also perform an input operation in the "article content” input box 503 to input the article content.
- the target text can be obtained through the text in the original data.
- the original data may further include a link address, and the link address may be used to obtain article content.
- the electronic device may display a second input box for entering the link address, and the user may perform an input operation for entering the link address in the second input box, so that the electronic device can display the link address input by the user in the second input box.
- the second input box can be used to enter the URL of an article or a link address such as an identity document (Identity Document, ID), and the user can enter the URL or ID of the article in the second input box.
- the link address can be a string in any form, such as a website address or an ID, as long as the string can be used to obtain the article content required by the user; this is not limited here.
- a second input box, such as an “article link” input box 304, may be displayed in the original data input interface 301.
- the user may perform an input operation in the "article link” input box 304 to input the URL or ID of the article.
- a second input box, such as an “article link” input box 504, may be displayed in the original data input interface 501.
- the user may perform an input operation in the "article link” input box 504 to input the URL or ID of the article.
- the target text can be obtained through the article content obtained based on the link address in the original data.
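One possible design for resolving such a link address can be sketched as follows. A URL is fetched over HTTP, while a bare ID is looked up in an article store; both branches, and the `fetch`/`article_store` parameters, are assumptions made for illustration rather than details of the disclosure:

```python
# Illustrative resolver for the "link address" described above: it accepts
# either a URL or a bare article ID string. The article_store mapping and
# the injected fetch function are hypothetical components.

def resolve_article_content(link_address, article_store, fetch=None):
    """Return article content for a URL or an article-ID link address."""
    if link_address.startswith(("http://", "https://")):
        if fetch is None:
            raise ValueError("a fetch function is required for URLs")
        return fetch(link_address)  # e.g. an urllib.request-based download
    # Otherwise treat the string as an article ID and look it up.
    try:
        return article_store[link_address]
    except KeyError:
        raise LookupError(f"unknown article ID: {link_address}")

store = {"a123": "Article body for ID a123."}
content = resolve_article_content("a123", store)
```

Injecting `fetch` keeps the resolver testable without network access, which matches the document's point that the link can be "a string in any form" as long as it leads to the article.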
- the original data may further include a link address, and the link address may also be used to obtain video content.
- the method for the user to enter the link address for obtaining the video content is similar to the above-mentioned method for entering the link address for obtaining the article content, which will not be repeated here.
- the video content obtained based on the link address in the original data can be used to obtain the target text.
- the raw data may include multimedia files.
- the multimedia files may include at least one of image files, audio files and video files.
- the electronic device may display an add control for adding a multimedia file, and a user may input an add operation for adding a multimedia file to the electronic device through an add button, so that the electronic device may display the multimedia file added by the user.
- the add control may be an add button
- the add operation may include trigger operations such as clicking, long-pressing, or double-clicking the add button; selection operations such as clicking, long-pressing, or otherwise selecting a multimedia file; and trigger operations such as clicking, long-pressing, or double-clicking the selection confirmation button. For example, the user can click the add button to enter the multimedia file selection interface, browse the multimedia files there and click the desired file, and finally click the selection confirmation button to complete the add operation.
- a plurality of adding controls may be displayed in the raw data input interface 401 , such as an “image material” adding control 404 .
- the user can perform an adding operation through the "image material” adding control 404 to add a picture file.
- the raw data input interface 501 may display an add control, such as an “image/video material” add control 505 .
- the user can perform an adding operation through the “image/video material” adding control 505 to add a picture file or a video file.
- the video file may include a video file captured by the user in real time, or may include a video file specified by the user in the video file stored locally on the electronic device.
- the video files may also include video files generated based on the embodiments of the present disclosure, as well as video files obtained after editing such generated video files, and so on, so that the video files can be further optimized and edited.
- the target text can be obtained through the multimedia file in the original data.
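Deriving target text from a multimedia file can be sketched as a dispatch on file type. The extractor names (`run_ocr`, `transcribe_audio`) are placeholders for the real optical character recognition and speech recognition components the description alludes to:

```python
# Sketch of extracting target text from an added multimedia file.
# run_ocr and transcribe_audio are hypothetical stand-ins for real
# OCR / speech-recognition components.

def run_ocr(path):
    return f"<text recognized in {path}>"

def transcribe_audio(path):
    return f"<speech transcribed from {path}>"

EXTRACTORS = {
    ".png": run_ocr, ".jpg": run_ocr,                    # image files -> OCR
    ".mp3": transcribe_audio, ".wav": transcribe_audio,  # audio files -> ASR
    ".mp4": transcribe_audio,                            # video -> ASR on its audio track
}

def target_text_from_file(path):
    """Pick an extractor by file extension and return the recognized text."""
    for suffix, extractor in EXTRACTORS.items():
        if path.endswith(suffix):
            return extractor(path)
    raise ValueError(f"unsupported multimedia file: {path}")

text = target_text_from_file("photo.png")
```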
- before receiving the user's data input operation, the video display method may further include:
- in response to a user's mode selection operation, a raw data input interface corresponding to the selected input mode is displayed.
- receiving the user's data input operation may specifically include:
- the input modes may include an automatic entry input mode and a manual entry input mode.
- in the automatic entry input mode, the user can input the above-mentioned raw data, so that the target text can be obtained from the raw data.
- in the manual entry input mode, the user can directly input the multimedia material and subtitle text used to generate the target video.
- the mode selection operation may include a gesture operation input by the user that triggers opening of a corresponding input mode.
- the electronic device may be preset with multiple input modes and multiple gesture operations, and one gesture operation may be used to trigger opening of a corresponding input mode.
- the user can determine the input mode they want to select and input the corresponding gesture operation to the electronic device; after receiving the gesture operation, the electronic device enables the input mode corresponding to it and displays the raw data input interface for the enabled input mode.
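The gesture-to-mode dispatch just described amounts to a lookup table. In this minimal model, the gesture names and the exact mode labels are illustrative assumptions; only the automatic/manual entry distinction comes from the document:

```python
# Minimal model of the gesture -> input-mode dispatch described above.
# Gesture names are hypothetical; the two modes mirror the document's
# "automatic entry" and "manual entry" input modes.

GESTURE_TO_MODE = {
    "swipe_left": "automatic entry",
    "swipe_right": "manual entry",
}

def open_input_mode(gesture):
    """Return the raw-data input interface to display for a recognized gesture."""
    mode = GESTURE_TO_MODE.get(gesture)
    if mode is None:
        return None  # unrecognized gestures change nothing
    return {"mode": mode, "interface": f"{mode} raw-data input interface"}

result = open_input_mode("swipe_left")
```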
- the raw data input interface can display controls for inputting the raw data supported by the enabled input mode, and the user can perform the data input operation corresponding to each control through the displayed controls.
- the mode selection operation may include a user's selection operation on selection controls of different input modes, such as operations such as clicking, long pressing, and double-clicking on the selection control.
- the electronic device can display multiple selection controls, and one selection control can correspond to one input mode.
- the user can determine the input mode they want to select and input a selection operation on the corresponding selection control, so that the electronic device displays the selected control in the selected state, enables the input mode corresponding to that control, and displays the original data input interface for the enabled input mode.
- the raw data input interface may display a control for inputting raw data supported by the enabled input mode, and the user may input a data input operation corresponding to the control through the displayed control.
- a plurality of selection controls may be displayed in the raw data input interface 301 , such as an “automatic entry” selection control 305 and a “manual entry” selection control 306 .
- the raw data input interface 301 may display controls corresponding to the automatic entry input mode, such as the “article title” input box 302, the “article content” input box 303, and the “article link” input box 304.
- a plurality of selection controls may be displayed in the raw data input interface 401 , such as an “automatic entry” selection control 405 and a “manual entry” selection control 406 .
- the original data input interface 401 may display controls corresponding to the manual entry input mode, such as the "article title" input box 402, the "subtitle" input box 403 and the "image material" add control 404.
- the user can manually enter the article title and subtitle text in the "article title” input box 402 and the "subtitle” input box 403, respectively.
- One page may correspond to one “image material” adding control 404
- one “image material” adding control 404 may correspond to at least one “subtitle” input box 403
- a page editing area 407 may be set for the page corresponding to each page number, and at least one "subtitle” input box 403 and "image material” adding control 404 of the page may be located in the page editing area 407 of the page.
- a page editing area 407 may be correspondingly set on the right side of the page number "1", and the subtitle text and image manually entered by the user through the "subtitle" input box 403 and the "image material" add control 404 in that page editing area 407 are both the subtitle text and image of page 1.
- the display order of the subtitle text and images of page 1 corresponds to the setting order of the corresponding "subtitle" input box 403 and "image material" adding control 404.
- the raw data input interface 401 may also display material addition controls, such as the "Add" button 408, so that the user can add the "subtitle" input box 403 and "image material" add control 404 corresponding to a new page through the "Add" button 408, or add a "subtitle" input box 403 in a displayed page.
- the original data input interface 401 may also display a subtitle deletion control, such as a "-" button 409; one "-" button 409 corresponds to one "subtitle" input box 403, and the user can delete the corresponding "subtitle" input box 403 through the "-" button 409.
- the user can sequentially input the title of the article, the subtitles of each clause, and the image material corresponding to each subtitle in the manual input mode.
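The manual entry flow above collects an article title, per-page subtitle texts, and per-page image material. A minimal sketch of a data structure that could hold this input (class and field names are illustrative, not from the disclosure):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Page:
    """One page: at least one subtitle text, each with corresponding image material."""
    subtitles: List[str] = field(default_factory=list)   # entered via "subtitle" input boxes
    materials: List[str] = field(default_factory=list)   # added via "image material" controls

@dataclass
class ManualEntry:
    """Everything the manual entry mode gathers before video generation."""
    title: str
    pages: List[Page] = field(default_factory=list)

entry = ManualEntry(title="Example article")
entry.pages.append(Page(subtitles=["First clause subtitle"], materials=["img_001.png"]))
```

The display order of subtitles and materials within a page simply follows list order, matching the setting order of the corresponding input boxes and add controls.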
- a plurality of selection controls may be displayed in the raw data input interface 501 , such as an “automatic entry” selection control 506 and a “manual entry” selection control 507 .
- the raw data input interface 501 may display controls corresponding to the automatic entry input mode, such as the "Article Title" input box 502, the "Article Content" input box 503, an "article link" input box 504 and an "image/video clip" add control 505.
- the video display method may further include: receiving and displaying the original data.
- the original data may include at least one of text, link addresses, and multimedia files, which will not be repeated here.
- the electronic device can provide the user with rich ways of inputting raw data, which the user can select as needed, further improving the user experience.
- the user may input a video generation operation for the original data to the electronic device, so as to trigger generation and display of the target video corresponding to the target text.
- the video generation operation may be a trigger operation such as a long press, double click, voice control, or expression control on the original data
- the video generation operation may also be a trigger operation such as a click, a long press, or a double click on the video generation trigger control.
- the raw data input interface 301 may display a video generation trigger control, such as a "Generate Video" button 307, and the user can input a trigger operation to the "Generate Video" button 307 to trigger the generation and display of the target video corresponding to the target text involved in the raw data.
- the raw data input interface 401 may display a video generation trigger control, such as the "Generate Video" button 410, and the user can input a trigger operation to the "Generate Video" button 410 to trigger the generation and display of the target video corresponding to the target text involved in the raw data.
- the raw data input interface 501 may display a video generation trigger control, such as a "Generate Video" button 508, and the user can input a trigger operation to the "Generate Video" button 508 to trigger the generation and display of the target video corresponding to the target text involved in the raw data.
- the electronic device may display the target video generated based on the target text involved in the original data in response to the video generation operation.
- the electronic device may display the target video in a full screen, or may display the target video in a partial display area.
- the partial display area may be a part of the display area in the display screen.
- the partial display area may be a part of the display area of any one of the display screens, or may be any one of the display screens.
- FIG. 6 shows a schematic diagram of a video display interface provided by an embodiment of the present disclosure.
- the electronic device may display a video display interface 601 in response to the video generation operation, and display a full-screen display window 602 of the target video in the video display interface 601 .
- FIG. 7 shows a schematic diagram of another video display interface provided by an embodiment of the present disclosure.
- the electronic device may display a video display interface 701 in response to the video generation operation, and a play window 702 of the target video may be displayed in the central display area of the video display interface 701 .
- the video element of the target video may include at least one subtitle text, and one subtitle text may correspond to at least one multimedia material.
- the multimedia material may include at least one of images, videos, audios, and the like.
- the subtitle text can be automatically generated according to the target text obtained from the original data.
- the subtitle text may be generated by the electronic device; in other embodiments, the subtitle text can also be generated by a server, which will be described in detail later.
- the server may be the server 102 shown in FIG. 1.
- the multimedia material can be automatically obtained from a plurality of local or Internet materials according to the subtitle text.
- the multimedia material may be acquired by an electronic device.
- the multimedia material can also be obtained by the server, which will be described in detail later.
- a video generation operation for raw data can be received. Since the raw data can be used to obtain target text, the video generation operation can be used to trigger the generation of a target video corresponding to the target text. Therefore, after receiving the video generation operation, the target video generated in response to it can be displayed, and the video elements of the target video can include subtitle text and multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained based on the subtitle text. Rich multimedia materials can thus be found automatically during generation of the target video, and users do not need to manually search for materials to make videos, which not only reduces the time cost of making videos but can also improve their quality.
- the video display method may further include:
- the video editing interface includes editable elements, and the editable elements include at least one of video elements and display effect elements corresponding to the video elements;
- a modified target video is displayed, and the modified target video includes editable elements displayed within the video editing interface when the element modification operation is completed.
- the electronic device may display a video editing interface for adjusting the target video in response to the video generation operation.
- the user can adjust at least one of the video elements of the target video and the display effect elements corresponding to the video elements in the video editing interface; therefore, at least one of the video elements and the corresponding display effect elements can be presented as an editable element in an editable state in the video editing interface.
- the display effect elements may include transition effect elements, playback effects, special effects, decoration effect elements, and the like.
- the modification effect elements may include effect elements that play a role in modifying a video element, such as the video element's tone, size, contrast, color, decorating text, and the like.
- the user can input an element modification operation on the editable element that he wants to adjust in the video editing interface, so that the electronic device displays the editable element being adjusted in real time during the element modification operation, and then displays the modified target video generated according to the editable elements shown in the video editing interface when the user completes the element modification operation.
- the element modification operation may include a modification, addition, or deletion operation on the subtitle text, a replacement operation on the image material in the multimedia material, and other operations on the multimedia material.
- a completion indicator control may also be displayed in the video editing interface, and the user can input a completion trigger operation, such as a click, long press, or double-click, to the completion indicator control.
- the electronic device can, in response to receiving the completion trigger operation input by the user to the completion indicator control, determine that the user has completed the element modification operation, and display the modified target video generated according to the editable elements shown in the video editing interface when the element modification operation was completed.
- the video editing interface may be displayed in the same interface as the target video.
- the playback window of the target video is displayed in the video display interface, and the video editing interface is displayed below the playback window.
- FIG. 8 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- the electronic device may display a video display interface 801 in response to the video generation operation, display a playback window 802 of the target video in the video display interface 801, and display a video editing interface 803 below the playback window 802.
- a scroll bar 804 may be displayed on the right side of the video display interface 801 , and the user may view the content of the video editing interface 803 by dragging the scroll bar 804 .
- the user can perform element modification operations on editable elements such as article title, subtitle text, image material, etc. in the video editing interface 803 .
- the user can modify the article title through the "Article Title" input box 805, add the subtitle text of a new page or add subtitle text to a displayed page through the "Add" button 806, modify the subtitle text through the "Subtitle" input box 807, delete a subtitle through the "-" button 808, and replace or add image material and video material through the "image material" add control 809.
- a completion indicator control may be displayed at the bottom of the video editing interface 803, such as a "Submit Modification" button 810, and the user can input a completion trigger operation to the "Submit Modification" button 810 to trigger the generation and display of the modified target video.
- FIG. 9 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- the electronic device may display a video display interface 901 in response to the video generation operation, display a playback window 902 of the target video in the video display interface 901, and display a video editing interface 903 below the playback window 902.
- the user can perform element modification operations on editable elements such as article title, subtitle text, image material, etc. in the video editing interface 903 .
- the user can modify the article title through the "Article Title" input box 904, add the subtitle text of a new page or add subtitle text to a displayed page through the "Add" button 905, modify the subtitle text through the "Subtitle" input box 906, delete a subtitle through the "-" button 907, and replace or add image material and video material through the "image material" add control 908.
- a scroll bar 909 may be displayed on the right side of the video editing interface 903 , and the user may view the content not displayed in the video editing interface 903 by dragging the scroll bar 909 .
- a completion indicator control may be displayed at the bottom of the video editing interface 903, such as a "Submit Modification" button 910, and the user can input a completion trigger operation to the "Submit Modification" button 910 to trigger the generation and display of the modified target video.
- FIG. 10 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- after receiving the video generation operation, the electronic device may, in response, display the video display interface 901, display the playback window 902 of the target video within the video display interface 901, and display a video editing interface 903 below the playback window 902.
- the play window 902 may be located in the first display screen 912
- the video editing interface 903 may be located in the second display screen 913 .
- the video editing interface 903 is similar to that in the embodiment shown in FIG. 9, and details are not described here.
- a video export control may also be displayed in the video display interface. If the user is satisfied with the video effect, the user can input an export trigger operation, such as a click, long press, or double-click, to the video export control, so that the electronic device, in response to receiving the export trigger operation, saves the video displayed in the video display interface locally to the electronic device.
- video export controls may be displayed at the bottom of the video editing interface 803 , such as an “Export Video” button 811 .
- the "Export Video" button 811 may be located to the right of the "Submit Modification" button 810.
- the user can input an export trigger operation to the "Export Video" button 811, so that the electronic device saves the video displayed in the video display interface locally to the electronic device.
- a video export control may be displayed at the bottom of the video display interface 901, such as an "Export Video" button 911, and the user can input an export trigger operation to the "Export Video" button 911, so that the electronic device saves the video displayed in the video display interface 901 locally to the electronic device.
- the user can view the related elements in the video and the adjusted video on the same page, so as to improve the user's experience.
- the video editing interface and the target video may be displayed in a different interface.
- the playback window of the target video and a modification trigger control, such as a "video modification" button, are displayed in the video display interface.
- the user can input a modification trigger operation, such as a click, long press, or double-click, on the modification trigger control, so that the electronic device, in response to receiving the modification trigger operation input by the user to the modification trigger control, jumps from the video display interface to the video editing interface.
- the video editing interface is the same as the above-mentioned embodiment, which is not repeated here.
- after the user completes the element modification operation, the electronic device can also jump from the video editing interface back to the video display interface, so as to display the modified target video generated according to the editable elements displayed in the video editing interface when the element modification operation was completed.
- a video export control may also be displayed in the video display interface, and the user can input an export trigger operation, such as a click, long press, or double-click, on the video export control, so that the electronic device, in response to receiving the export trigger operation, saves the video displayed in the video display interface locally to the electronic device.
- the target video and the video editing interface can be displayed independently, and the user experience can be improved.
- the video display method may further include:
- the indicator is used to indicate that the target video has been generated
- the raw data is hidden in response to the identification trigger operation.
- the electronic device may not directly display the target video, but display an indicator for indicating that the target video has been generated.
- the user can learn that the target video has been produced, and can input an identification trigger operation on the indicator to the electronic device to trigger display of the generated target video.
- the electronic device can hide the currently displayed original data in response to the identification trigger operation and display a video display interface in which the target video can be displayed.
- the identification trigger operation may be a trigger operation such as a click, a long press, or a double click on the indicator, which is not limited herein.
- FIG. 11 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
- FIG. 12 shows a schematic diagram of still another original data input interface provided by an embodiment of the present disclosure.
- Fig. 13 shows a schematic diagram of another video display interface provided by an embodiment of the present disclosure.
- the top display area of the display screen of the electronic device may display multiple page identifiers, such as the "input article" page identifier 1101 and the "output video" page identifier 1102; when the "input article" page identifier 1101 is displayed in a selected state, a raw data input interface 1103 may be displayed below the top display area.
- the display content and interaction method of the original data input interface 1103 are similar to those shown in FIG. 3 and FIG. 4 , and will not be repeated here.
- the electronic device can wait to generate the target video, and while it is waiting, the raw data input interface 1103 can be displayed in an inoperable state, such as grayed out.
- an indicator, such as the icon 1104, may be displayed, and the user can input an identification trigger operation to the icon 1104, so that the electronic device displays the target video.
- the original data input interface 1103 can be hidden, and after the original data input interface 1103 is hidden, a video display interface 1105 is displayed below the top display area.
- the video display interface 1105 may display a playback window 1106 of the target video.
- a progress prompt screen for indicating the production progress of the target video, for example a progress prompt bar, may be superimposed on the raw data input interface 1103 to let users know the production progress of the target video.
- the electronic device can locally and automatically generate subtitle text according to the target text obtained from the original data, automatically obtain multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material, so as to reduce the generation time of the target video.
- when the user inputs raw data to the electronic device through the automatic entry input mode, or when the electronic device receives raw data sent by other devices, the electronic device can locally and automatically generate subtitle text according to the target text obtained from the raw data, automatically acquire multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material.
- FIG. 14 shows a schematic flowchart of another video display method provided by an embodiment of the present disclosure.
- the video display method may include the following S1410-S1460.
- the raw data is used to obtain the target text, and the video generation operation is used to trigger the generation of the target video corresponding to the target text.
- S1410 is similar to S210 in the embodiment shown in FIG. 2 , and details are not described here.
- S1420 may specifically include:
- the electronic device may determine the data type of the original data in response to the video generation operation and, when determining that the data type of the original data is a text type, directly extract the text in the original data and use the extracted text as the target text.
- the electronic device may determine the data type of the raw data through a control that receives the raw data.
- the electronic device may, in response to receiving a trigger operation input by the user on the “Generate Video” button 307 , determine that the controls for receiving the raw data are the “Article Title” input box 302 and the “Article Content” input box 303 .
- the electronic device may determine that the data type of the original data is a text type, then directly extract the text in the original data, such as the article title and article content, and use the extracted text as the target text.
- S1420 may specifically include:
- the electronic device may determine the data type of the original data in response to the video generation operation and, when determining that the data type of the original data is a multimedia file type, perform text conversion on the multimedia file to obtain converted text and use the converted text as the target text.
- the electronic device may determine the data type of the raw data through a control that receives the raw data.
- the electronic device may determine that the control that received the raw data is the "Image/Video Material" add control 505, determine that the data type of the original data is the multimedia file type, then perform text conversion on the multimedia file to obtain converted text, and use the converted text as the target text.
- the text in an image file can be converted by using Optical Character Recognition (OCR) technology to obtain the converted text; alternatively, the image content can be learned, and text conversion can be performed by summarizing the content of the image to obtain converted text describing the image content, which is not limited here.
- text conversion can be performed on each image frame of a video file by using Optical Character Recognition (OCR) technology to obtain the converted text; alternatively, the image content of each image frame of the video file can be learned, and text conversion can be performed by summarizing the content of the images to obtain converted text describing the image content; the audio in the video file can also be converted to text through speech recognition to obtain the converted text, which is not limited here.
- the audio file may be converted to text through speech recognition to obtain the converted text.
- S1420 may specifically include:
- the electronic device may determine the data type of the original data in response to the video generation operation and, when determining that the data type of the original data is an address type, obtain the target article based on the link address, directly extract the article text in the target article, and use the extracted article text as the target text.
- the electronic device may determine the data type of the raw data through a control that receives the raw data.
- the electronic device may determine that the data type of the original data is an address type, extract the link address in the original data, obtain the target article based on the link address, directly extract the article text in the target article, and use the extracted article text as the target text.
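The three branches of S1420 (text type, multimedia file type, address type) amount to a dispatch on the data type of the raw data. A sketch of that dispatch, with the OCR/speech-recognition and article-fetching steps replaced by illustrative stubs (all function names here are assumptions, not from the disclosure):

```python
def extract_target_text(raw, raw_type):
    """Route the raw data to a text-extraction path according to its data type."""
    if raw_type == "text":
        return raw                                        # use the entered text directly
    if raw_type == "multimedia":
        return convert_media_to_text(raw)                 # OCR / speech recognition (stub)
    if raw_type == "address":
        return extract_article_text(fetch_article(raw))   # follow the link address (stub)
    raise ValueError(f"unknown data type: {raw_type}")

# Stubs standing in for the real OCR/speech-recognition and fetching steps:
def convert_media_to_text(path):
    return f"converted text from {path}"

def fetch_article(url):
    return {"title": "t", "body": "article text from " + url}

def extract_article_text(article):
    return article["body"]
```

In practice the data type itself could be inferred from which control received the input (e.g. an input box vs. the "Image/Video Material" add control), as the surrounding bullets describe.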
- generating subtitle text may specifically include:
- Text typesetting is performed on the abstract to obtain the subtitle text corresponding to the abstract.
- the electronic device can directly input the target text into a preset abstract extraction model to obtain an abstract of the target text, and then directly input the obtained abstract into a preset text typesetting model to obtain subtitle text that is in sentence units, has words processed across lines and pages, and is configured with matching punctuation marks.
- the electronic device may also, according to at least one of the title and the text content of the target text, filter Internet article texts or local article texts for similar article texts whose similarity with the target text meets a preset text similarity threshold. If no similar article text is found, the target text can be directly input into the preset abstract extraction model to obtain the abstract of the target text. If similar article texts are found, a weighted sum can be performed on the text length, the number of likes, and the number of retweets of the target text and the similar article texts to obtain a text score for each text; the text with the highest text score is then selected and input into the preset abstract extraction model to obtain the abstract of the target text. After the electronic device obtains the abstract, the abstract can be directly input into a preset text typesetting model to obtain subtitle text in sentence units, with words processed across lines and pages, and configured with matching punctuation marks.
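The selection among the target text and its similar article texts can be pictured as a weighted sum over text length, like count, and retweet count. A sketch with made-up weights (the disclosure names the signals but not their weights):

```python
def text_score(length, likes, retweets, weights=(0.2, 0.5, 0.3)):
    """Weighted sum of text length, like count, and retweet count (illustrative weights)."""
    w_len, w_like, w_rt = weights
    return w_len * length + w_like * likes + w_rt * retweets

# Score the target text against one similar article text:
candidates = {
    "target": text_score(length=800, likes=120, retweets=30),      # 160 + 60 + 9  = 229
    "similar_a": text_score(length=1200, likes=90, retweets=80),   # 240 + 45 + 24 = 309
}
# The highest-scoring text is the one fed into the abstract extraction model.
best = max(candidates, key=candidates.get)
```

Here the similar article scores higher, so its text would be used for abstract extraction instead of the original target text.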
- text cleaning may also be performed on sensitive keywords in the abstract, such as institution names and user personal information, as well as special symbols for which voice audio cannot be generated, and then text typesetting is performed on the cleaned abstract to obtain the subtitle text corresponding to the abstract.
- generating subtitle text according to the target text may further specifically include:
- the method for obtaining the abstract of the target text by the electronic device is similar to the above-mentioned embodiment, and details are not described here.
- S1440 may specifically include:
- the multiple preset materials include materials obtained according to the original data, and the preset materials include at least one of images and videos;
- the electronic device may first acquire a plurality of preset materials, and the preset materials may include at least one of materials in a material library and materials in the Internet.
- the electronic device may acquire at least one of images and videos as material based on the original data, so the preset materials may also include materials acquired according to the original data.
- the electronic device may determine the degree of matching between each preset material and each subtitle text, then select a preset number of preset materials with the highest matching degree for each subtitle text, and use the selected preset materials as the target materials corresponding to the subtitle text.
- the electronic device may input each preset material and each corresponding subtitle text into a preset graphic-text matching model to obtain a graphic-text matching score between each preset material and each corresponding subtitle text; then calculate the text similarity between the text contained in each preset material and each corresponding subtitle text; then determine whether each preset material and each subtitle text have the same source to obtain a source similarity; and finally, for each subtitle text, perform a weighted sum over at least one of the graphic-text matching score, the text similarity, the source similarity, the image clustering score of the preset material, and the text weight of the text to which the preset material belongs, to obtain the matching degree of each preset material with respect to the subtitle text. A preset number of preset materials with the highest matching degree are then selected, and the selected preset materials are used as the target materials corresponding to the subtitle text.
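The matching degree described above is a weighted sum of several per-pair signals, followed by a top-k selection. A sketch of that scoring (the disclosure names the signals; the weight values and all identifiers here are illustrative):

```python
def matching_degree(scores, weights):
    """Weighted sum of the per-signal scores for one (material, subtitle) pair."""
    return sum(weights[k] * scores.get(k, 0.0) for k in weights)

# Illustrative weights over the signals named in the disclosure:
WEIGHTS = {
    "image_text_match": 0.4,    # graphic-text matching model score
    "text_similarity": 0.3,     # subtitle vs. text contained in the material
    "source_similarity": 0.1,   # whether material and subtitle share a source
    "cluster_score": 0.1,       # image clustering score of the material
    "text_weight": 0.1,         # weight of the text the material belongs to
}

def top_materials(materials, subtitle_scores, k=2):
    """Pick the k preset materials with the highest matching degree for one subtitle."""
    ranked = sorted(materials,
                    key=lambda m: matching_degree(subtitle_scores[m], WEIGHTS),
                    reverse=True)
    return ranked[:k]

# Example: three candidate materials scored against one subtitle text.
scores = {
    "material_a": {"image_text_match": 0.9, "text_similarity": 0.8},   # 0.36 + 0.24 = 0.60
    "material_b": {"image_text_match": 0.2},                           # 0.08
    "material_c": {"image_text_match": 0.5, "source_similarity": 1.0}, # 0.20 + 0.10 = 0.30
}
picked = top_materials(["material_a", "material_b", "material_c"], scores, k=2)
```

Missing signals simply contribute zero, matching the "at least one of" phrasing: any subset of the signals can participate in the weighted sum.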
- the text similarity may be calculated based on a preset text similarity algorithm according to the subtitle text and the text contained in the preset material.
- the preset text similarity algorithm may be a text semantic similarity algorithm or a literal text similarity algorithm, which is not limited herein.
- S1440 may specifically include:
- the electronic device may perform voice conversion on each subtitle text based on a text-to-speech conversion technology to obtain subtitle audio corresponding to each subtitle text.
- the electronic device may input the subtitle text into a preset text-to-speech conversion model to perform speech conversion to obtain subtitle audio.
- S1440 may specifically include:
- the electronic device may be preset with multiple preset background audios; one preset background audio may correspond to one emotion category, and the emotion categories, used to represent the category to which the emotion of the target text belongs, may include a happy category, a sad category, a serious category, a nervous category, and the like. The electronic device inputs the target text into a preset text emotion classification model for classification, obtains the emotion category to which the target text belongs, determines the target background audio corresponding to that emotion category among the multiple preset background audios, and then uses the target background audio as multimedia material.
- the electronic device can select appropriate background music from multiple preset background audios by performing sentiment analysis and classification on the target text to generate the target video.
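The emotion-to-audio step reduces to classifying the target text and looking up the matching preset background audio. A minimal sketch, with a trivial keyword stand-in for the preset text emotion classification model (the mapping, filenames, and classifier here are all illustrative):

```python
# Illustrative mapping from emotion category to preset background audio.
PRESET_BACKGROUND_AUDIO = {
    "happy": "bgm_upbeat.mp3",
    "sad": "bgm_slow.mp3",
    "serious": "bgm_formal.mp3",
    "nervous": "bgm_tense.mp3",
}

def classify_emotion(text):
    """Stand-in for the preset text emotion classification model."""
    return "happy" if "celebrate" in text else "serious"

def pick_background_audio(target_text):
    """Classify the target text, then look up the corresponding background audio."""
    category = classify_emotion(target_text)
    return PRESET_BACKGROUND_AUDIO[category]
```

A real implementation would replace `classify_emotion` with an actual text classification model; the lookup structure stays the same.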
- subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of the respective subtitle texts to obtain a dynamic image; then video rendering is performed on the subtitle text, images, and videos in the dynamic image according to the display effect of the subtitle text and the display effect of the images and videos preset in the preset video template to obtain the target video.
- subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of the subtitle texts.
- then, according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video, the display time and display duration of each image and video are determined.
- a dynamic image is obtained from the sorted images and videos and their display times and display durations. Video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the preset display effect of the subtitle text and the display effect of the images and videos in the preset video template, to obtain the target video.
- finally, the target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamps of the audio frames of the subtitle audio, to obtain the fused target video.
- subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of the subtitle texts. The display time and display duration of each image and video are then determined according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video. A dynamic image is obtained from the sorted images and videos and their display times and display durations.
- video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the preset display effect of the subtitle text and the display effect of the images and videos in the preset video template, to obtain the target video. The target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamps of the audio frames of the subtitle audio, to obtain the target video after preliminary fusion.
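The timing step above — deriving each material's display start time and duration from its subtitle's audio length and the number of materials sharing that subtitle — can be sketched as follows. The even split across materials is an assumption for illustration; the source only states that audio duration, material count, and video duration feed the computation.

```python
def schedule_materials(subtitles):
    """subtitles: list of (audio_duration_s, material_count) tuples,
    in subtitle arrangement order. Returns one (start_time, duration)
    per material, in playback order."""
    schedule, t = [], 0.0
    for audio_dur, n_materials in subtitles:
        per_material = audio_dur / n_materials  # assumed even split
        for _ in range(n_materials):
            schedule.append((round(t, 2), round(per_material, 2)))
            t += per_material
    return schedule
```

The resulting schedule is what the dynamic image is built from before rendering and audio fusion.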
- the video element of the target video includes subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
- S1460 is similar to S220 in the embodiment shown in FIG. 2 , and details are not described here.
- images, videos, audios and other materials can be automatically matched according to the target text involved in the original data, and a target video corresponding to a piece of target text can be automatically rendered and generated, thereby improving the intelligence of the electronic device.
- the video display method may further include:
- S1450 may specifically include:
- the process of determining the target video template corresponding to the target text may be performed in parallel with S1430 and S1440, or may be performed sequentially with S1430 and S1440 in a preset sequence.
- the electronic device may be preset with multiple video classification templates, one video classification template may correspond to one content category, and the content category may include a news category, a story category, a diary category, a variety show category, etc., which are used to represent the category to which the text content of the target text belongs.
- the electronic device can input the target text into a preset text content classification model for classification, obtain the content category to which the target text belongs, determine the target video template corresponding to the content category among the multiple preset video templates, and then generate the target video according to the subtitle text, the multimedia material and the target video template.
- the content categories may also be classified according to other classification methods, for example, the content of the target text is classified according to the keywords contained in the target text, which is not limited herein.
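One way to realize the keyword-based variant mentioned above can be sketched as follows. This is illustrative only: the keyword lists, template names, and fallback rule are all invented, and a deployed system would use the preset text content classification model instead.

```python
TEMPLATES = {"news": "template_news", "story": "template_story",
             "diary": "template_diary", "variety": "template_variety"}

# Invented keyword lists standing in for a trained classifier.
CATEGORY_KEYWORDS = {
    "news": ["reported", "announced", "official"],
    "story": ["once", "character", "ending"],
    "diary": ["today", "i felt", "my day"],
}

def pick_template(target_text):
    text = target_text.lower()
    scores = {cat: sum(text.count(w) for w in words)
              for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the variety template when no keyword matches.
    return TEMPLATES[best] if scores[best] > 0 else TEMPLATES["variety"]
```

The chosen template then supplies the display effects applied to the subtitle text and multimedia material during rendering.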
- different video templates may also include different display effect elements. Therefore, the target video can be obtained by performing video rendering on the subtitle text and the multimedia material according to the preset display effect of the subtitle text and the display effect of the multimedia material in the target video template.
- an appropriate target video template for generating the target video can be selected according to the content category to which the target text belongs, and then an appropriate display effect can be set for the subtitle text and the multimedia material.
- when the user inputs original data to the electronic device through the manual input mode, the electronic device can directly acquire the subtitle text and multimedia material input by the user, and automatically generate the target video.
- the electronic device may acquire the subtitle text and images input by the user and, based on the subtitle text, determine, among the images input by the user and the preset images and videos, the image or video with the highest matching degree with the subtitle text, and then automatically generate the target video using the subtitle text and the determined image or video.
- the electronic device may also acquire the subtitle text and images input by the user, input the subtitle text into a preset text content classification model for classification, and obtain the content category to which the subtitle text belongs.
- the target video template corresponding to that content category is then determined, and the target video is automatically generated according to the subtitle text, the multimedia material and the target video template.
- when the original data is subtitle text and multimedia materials manually input by the user, the subtitle text can be used as the target text, and the target video corresponding to the manually input subtitle text and multimedia materials can be automatically generated, which further improves the user experience.
- the electronic device may generate the target video through the server, so as to reduce the data processing amount of the electronic device and further improve the quality of the produced video.
- the video display method may further include:
- the electronic device may send a video generation request carrying the original data to the server, so that the server generates and feeds back the target video corresponding to the target text based on the original data in response to the video generation request.
- the electronic device may receive the target video fed back by the server, and display the target video.
- the server can automatically obtain the target text through the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material, which is similar to the method for generating the target video by the aforementioned electronic device and will not be repeated here.
- the target video can be generated based on the original data in a fast and high-quality manner through the interaction between the electronic device and the server, so as to improve the user's experience.
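The request/response exchange above can be sketched as follows. The source only specifies that the request carries the original data; the field names and the `action` value below are invented for illustration.

```python
import json

def build_video_generation_request(original_data, data_type):
    """Package original data into a video generation request body.
    data_type might be e.g. "text", "multimedia", or "link" (see the
    data-type handling described later in this document)."""
    return json.dumps({"action": "generate_video",
                       "data_type": data_type,
                       "original_data": original_data})

req = build_video_generation_request("Some article text", "text")
```

Carrying the data type with the request lets the server pick the right extraction path (direct text, multimedia conversion, or article fetch) without re-detecting it.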
- in order to improve the interest of the target video, the video element may further include a preset virtual object and a pose of the virtual object.
- the pose of the virtual object can be determined according to the subtitle text.
- the virtual object may be a virtual character object with a character image, a virtual cartoon object with a cartoon image, etc., and the present disclosure does not limit the specific type of the virtual object.
- FIG. 15 shows a schematic diagram of still another video display interface provided by an embodiment of the present disclosure.
- the electronic device may display a video display interface 1501 in response to the video generation operation, and display a full-screen playback window 1502 of the target video in the video display interface 1501.
- the target video displayed in the playback window 1502 may include a virtual object 1503, such as a virtual character object.
- the virtual character object 1503 can be, for example, a virtual anchor of a news broadcast.
- the pose of the virtual object may include at least one of a mouth pose, a facial expression pose, a gesture pose, and a body pose, which is not limited herein.
- the pose of the virtual object may include a mouth pose and a facial expression pose.
- the poses of the virtual object may include mouth poses, facial expression poses, and gesture poses.
- the poses of the virtual object may include mouth poses, facial expression poses, gesture poses, and body poses. The present disclosure does not limit the gesture type of the virtual object.
- the pose of the virtual object may be automatically determined according to the subtitle text.
- the pose of the virtual object may be determined by the electronic device. In other embodiments, the pose of the virtual object may also be determined by the server.
- since the method by which the electronic device determines the pose of the virtual object is similar to the method by which the server does so, the following takes a virtual character object whose pose is determined by the electronic device as an example for detailed description.
- the subtitle audio can be input into a preset pose generation model to obtain a real-time character pose animation, and the real-time character pose animation can be migrated to the object model of the virtual character object using pose migration technology, so as to obtain an object model of the virtual character object that broadcasts the subtitle text. A character pose image of the virtual character object corresponding to each audio frame of the subtitle audio is then obtained from the resulting object model.
- the corresponding character pose images are fused into the target video generated from the subtitle text and multimedia materials according to the timestamp of each audio frame, to obtain a fused target video containing the virtual character object.
- the preset pose generation model can be used to generate a mouth pose animation and a facial expression pose animation.
- the preset pose generation model can be used to generate the mouth pose, the facial expression pose, and the gesture pose.
- the preset pose generation model can be used to generate a mouth pose, a facial expression pose, a gesture pose, and a body pose.
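The timestamp-based fusion described above — pairing each video frame with the character pose image generated for the matching audio frame — can be sketched as follows. The nearest-timestamp rule is an assumption for illustration; the source only states that fusion follows the timestamp correspondence.

```python
def align_pose_frames(video_timestamps, pose_frames):
    """pose_frames: list of (timestamp_s, pose_id) from the pose
    generation model. For each video frame timestamp, pick the pose
    frame with the nearest timestamp (assumed matching rule)."""
    aligned = []
    for vt in video_timestamps:
        nearest = min(pose_frames, key=lambda p: abs(p[0] - vt))
        aligned.append((vt, nearest[1]))
    return aligned
```

The aligned pairs determine which pose image is composited into each rendered video frame.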
- a virtual object similar to the user image may also be generated.
- the virtual object may be generated before the subtitle audio is generated, or may be generated after the subtitle audio is generated, which is not limited herein.
- the electronic device may first collect the user image, then input the user image into a preset biometric feature extraction model to extract the user biometric features in the user image, then input the extracted user biometric features into a preset object generation model to obtain an initial object model of the virtual object with the user's biometric features, and finally fuse the preset clothing model into the initial object model to obtain the final object model of the virtual object.
- the user image may be a captured image taken by the user through a camera, or may be an image selected by the user from preset images.
- the user image may be a face image, an upper body image, or a whole body image of the user, which is not limited herein.
- the user biometric feature extracted by the electronic device may include at least one of the user's face feature, head and shoulder feature, and body shape feature, which is not limited herein.
- the extracted biometric features of the user may include the user's facial features.
- the extracted biometric features of the user may include the user's facial features and body shape features.
- when the user image is a face image, a preset object generation model can be used to generate a head model; when the user image is an upper body image, a preset object generation model can be used to generate an upper body model; and when the user image is a whole body image, a preset object generation model can be used to generate a whole body model.
- the electronic device may first collect the user image, then extract the user biometrics in the user image, and then send the extracted user biometrics to the server, so that the server can Generate object models of virtual objects based on user biometrics.
- the method for generating the object model by the server is similar to the above-mentioned method for generating the object model by the electronic device, and details are not described here.
- a virtual object similar to the user's image and the user's attire may also be generated.
- the virtual object may be generated before the subtitle audio is generated, or may be generated after the subtitle audio is generated, which is not limited herein.
- the electronic device may first collect the user image, input the user image into a preset biometric feature extraction model to extract the user biometric features in the user image, and input the user image into a preset dressing feature extraction model to extract the user dressing features in the user image. The extracted user biometric features are then input into a preset object generation model to obtain an initial object model of the virtual object with the user's biometric features. According to the preset correspondence between dressing styles and dressing models, the target dressing model corresponding to the user dressing style to which the user dressing features belong is queried among the preset dressing models, and the target dressing model is fused with the initial object model to obtain an object model of the virtual object with the user's dressing characteristics.
- the extracted user dressing features may include at least one of the user's facial decoration features, headgear features, clothing features, and clothing accessories features.
- the extracted user biometric features may include the user's facial features
- the extracted user dressing features may include headwear features.
- the extracted user biometric features may include the user's facial features and body shape features
- the extracted user dressing features may include facial decoration features, headgear features, clothing features, and clothing accessories features.
- the electronic device may input the user's dressing feature into a preset dressing style classification model, so as to determine the user's dressing style to which the user's dressing feature belongs.
- the dressing style can include intellectual, cute, handsome, calm, sunny and so on.
- the electronic device may first collect the user image, then extract the user biometric features and user dressing features in the user image, and then send the extracted user biometric features and user dressing features to the server, so that the server generates an object model of the virtual object according to the user's biometric features and dressing features.
- the method for generating the object model by the server is similar to the above-mentioned method for generating the object model by the electronic device, and details are not described here.
- the electronic device or server may also generate the subtitle audio according to the dressing style of the virtual object, where the subtitle audio is audio whose sound characteristics are consistent with the dress-up characteristics of the virtual object.
- the electronic device can be preset with multiple text-to-speech conversion models, each corresponding to one dressing style, so the electronic device can select, among the multiple text-to-speech conversion models, the target text-to-speech conversion model corresponding to the dressing style of the virtual object.
- the subtitle text is input into the target text-to-speech conversion model for voice conversion to obtain the subtitle audio, so that audio with sound characteristics consistent with the dress-up characteristics of the virtual object is generated, further improving the user experience.
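The style-to-voice selection above can be sketched as follows. The style names come from the dressing styles listed earlier in this document; the model registry, voice labels, and fallback choice are invented for illustration.

```python
# Mocked registry: one text-to-speech model per dressing style.
TTS_MODELS = {
    "intellectual": lambda t: f"[calm voice] {t}",
    "cute": lambda t: f"[bright voice] {t}",
    "sunny": lambda t: f"[warm voice] {t}",
}

def synthesize_for_style(subtitle_text, dress_style):
    # Assumed fallback: unknown styles use the intellectual voice.
    model = TTS_MODELS.get(dress_style, TTS_MODELS["intellectual"])
    return model(subtitle_text)
```

Keying the voice on the dressing style is what keeps the virtual object's sound consistent with its appearance.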
- the video processing method may be executed by a server, for example, the server 102 on the server side shown in FIG. 1 .
- the server may include a device with storage and computing functions, such as a cloud server or a server cluster.
- FIG. 16 shows a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
- the video processing method may include the following S1610-S1660.
- the electronic device may, in response to the video generation operation, send a video generation request carrying the original data to the server, so that the server can receive the video generation request sent by the electronic device and, in response to the video generation request, feed back the target video corresponding to the target text based on the original data.
- the electronic device may be the electronic device 101 in the client shown in FIG. 1 .
- the raw data may include text.
- S1620 may specifically include:
- the server may directly extract the text in the original data when it is determined that the data type of the original data is the text type, and use the extracted text as the target text.
- the video generation request may carry the data type of the original data, and the data type of the original data may be determined by the electronic device through a control that receives the original data.
- the original data may include multimedia files.
- S1620 may specifically include:
- the server may perform text conversion on the multimedia file when it is determined that the data type of the original data is the multimedia file type to obtain the converted text, and use the converted text as the target text.
- the video generation request may carry the data type of the original data, and the data type of the original data may be determined by the electronic device through the control that receives the original data.
- when the multimedia file includes an image file, the image file can be converted to text through OCR technology to obtain the converted text; the image content of the image file can also be learned and summarized into text, to obtain converted text describing the content of the image, which is not limited here.
- when the multimedia file includes a video file, OCR technology can be used to perform text conversion on each image frame of the video file to obtain the converted text; the image content of the frames can also be learned and summarized into text, to obtain converted text describing the content of the frames; and the audio in the video file can be converted to text through speech recognition to obtain the converted text, which is not limited here.
- when the multimedia file includes an audio file, the audio file may be converted to text through speech recognition to obtain the converted text.
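The three conversion paths above can be sketched as a single dispatch. The `ocr` and `asr` callables are stand-ins for real OCR and speech recognition engines, not a specific library's API; the frame-joining rule for video is an assumption for illustration.

```python
def convert_to_text(file_kind, payload, ocr, asr):
    """Dispatch a multimedia file to the matching text-conversion path."""
    if file_kind == "image":
        return ocr(payload)
    if file_kind == "audio":
        return asr(payload)
    if file_kind == "video":
        frames, audio = payload  # (image frames, audio track)
        return " ".join(ocr(f) for f in frames) + " " + asr(audio)
    raise ValueError(f"unsupported file kind: {file_kind}")

# Mock engines for demonstration.
mock_ocr = lambda x: f"ocr:{x}"
mock_asr = lambda x: f"asr:{x}"
```

The converted text then becomes the target text from which the abstract and subtitle text are derived.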
- the original data may include a link address, and the link address may be used to obtain article content.
- S1620 may specifically include:
- the server may obtain the target article based on the link address when it is determined that the data type of the original data is the address type, and directly extract the article text in the target article, and then use the extracted article text as the target text.
- the video generation request may carry the data type of the original data, and the data type of the original data may be determined by the electronic device through a control that receives the original data.
- S1630 may specifically include:
- Text typesetting is performed on the abstract to obtain the subtitle text corresponding to the abstract.
- the server may directly input the target text into a preset abstract extraction model to obtain an abstract of the target text, and then directly input the obtained abstract into a preset text typesetting model to obtain subtitle text in sentence units, with words processed across lines and pages and configured with matching punctuation.
- the server may also filter, from Internet article texts or local article texts, based on at least one of the title and the text content of the target text, article texts whose similarity with the target text meets a preset text similarity threshold. If no similar article texts are found, the target text can be directly input into the preset abstract extraction model to obtain the abstract of the target text. If similar article texts are found, a weighted sum can be performed over the text length, the number of likes, and the number of retweets of the target text and each similar article text to obtain a text score for each text; the text with the highest score is selected and input into the preset abstract extraction model to obtain the abstract of the target text. After the abstract is obtained, it can be directly input into a preset text typesetting model to obtain subtitle text in sentence units, with words processed across lines and pages, and configured with matching punctuation marks.
- text cleaning may also be performed on sensitive keywords in the abstract, such as institution names and user personal information, as well as on special symbols for which voice audio cannot be generated, and then text typesetting is performed on the cleaned abstract to obtain the subtitle text corresponding to the abstract.
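The candidate-scoring step above can be sketched as follows. The weighted sum over text length, like count, and retweet count follows the description; the specific weight values are invented for illustration and are not from the source.

```python
# Illustrative weights; the source does not specify values.
WEIGHTS = {"length": 0.2, "likes": 0.5, "retweets": 0.3}

def score_text(length, likes, retweets):
    """Weighted sum of the three signals named in the text."""
    return (WEIGHTS["length"] * length +
            WEIGHTS["likes"] * likes +
            WEIGHTS["retweets"] * retweets)

def pick_best(candidates):
    """candidates: list of (name, length, likes, retweets).
    Returns the name of the highest-scoring text, which then feeds
    the abstract extraction model."""
    return max(candidates, key=lambda c: score_text(*c[1:]))[0]
```

Only the winning text is sent to the abstract extraction model, so a widely shared similar article can stand in for a weak target text.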
- S1630 may specifically include:
- the method for the server to obtain the abstract of the target text is similar to the above-mentioned embodiment, and is not repeated here.
- S1640 may specifically include:
- the multiple preset materials include materials obtained according to the original data, and the preset materials include at least one of images and videos;
- the server may first acquire a plurality of preset materials, and the preset materials may include at least one of materials in a material library and materials in the Internet.
- the server may acquire at least one material among images and videos based on the original data, and the preset material may also include material acquired according to the original data.
- the server may determine the degree of matching between each preset material and each subtitle text, then select, for each subtitle text, a preset number of preset materials with the highest matching degree, and use the selected preset materials as the target materials corresponding to the subtitle text.
- the server may input each preset material and each corresponding subtitle text into a preset image-text matching model to obtain an image-text matching score between each preset material and each corresponding subtitle text; then calculate the text similarity between the text contained in each preset material and each corresponding subtitle text; then determine whether each preset material and each subtitle text come from the same source, to obtain a source similarity; and finally, for each subtitle text, perform a weighted sum over at least one of the image-text matching score, the text similarity, the source similarity, the image clustering score of the preset material, and the text weight of the text to which the preset material belongs, to obtain the matching degree of each preset material relative to the subtitle text.
- a preset number of preset materials with the highest matching degree are then selected, and the selected preset materials are used as the target materials corresponding to the subtitle text.
- the text similarity may be calculated based on a preset text similarity algorithm according to the subtitle text and the text contained in the preset material.
- the preset text similarity algorithm may be a text semantic similarity algorithm or a literal text similarity algorithm, which is not limited herein.
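The matching-degree computation above can be sketched as a weighted ranking. For brevity this sketch combines only three of the listed signals (image-text matching score, text similarity, source similarity); the weights and the top-k count are invented for illustration.

```python
def rank_materials(materials, k=2,
                   w_match=0.6, w_text=0.3, w_source=0.1):
    """materials: list of (name, match_score, text_sim, source_sim),
    each signal in [0, 1]. Returns the names of the top-k materials
    by weighted matching degree."""
    scored = sorted(
        materials,
        key=lambda m: w_match * m[1] + w_text * m[2] + w_source * m[3],
        reverse=True)
    return [m[0] for m in scored[:k]]
```

The top-k materials become the target materials paired with the subtitle text during video assembly.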
- S1640 may specifically include:
- the server may perform voice conversion on each subtitle text based on a text-to-speech conversion technology to obtain subtitle audio corresponding to each subtitle text.
- the server may input the subtitle text into a preset text-to-speech conversion model to perform speech conversion to obtain subtitle audio.
- S1640 may specifically include:
- the server may be preset with multiple preset background audios, one preset background audio may correspond to one emotion category, and the emotion category may include a happy category, a sad category, a serious category, a nervous category, etc., which are used to represent the category to which the emotion of the target text belongs.
- the server inputs the target text into a preset text emotion classification model for classification, obtains the emotion category to which the target text belongs, and determines the target background audio corresponding to the emotion category among the plurality of preset background audios, and then uses the target background audio as a multimedia material .
- the server can select appropriate background music from multiple preset background audios by performing sentiment analysis and classification on the target text to generate the target video.
- the server may directly generate the target video according to the subtitle text and the multimedia material.
- the video element may include subtitle text and multimedia material.
- subtitle texts may be added at preset positions in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of the respective subtitle texts to obtain a dynamic image. Video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the preset display effect of the subtitle text and the display effect of the images and videos in the preset video template, to obtain the target video.
- subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of the subtitle texts.
- then, according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video, the display time and display duration of each image and video are determined.
- a dynamic image is obtained from the sorted images and videos and their display times and display durations. Video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the preset display effect of the subtitle text and the display effect of the images and videos in the preset video template, to obtain the target video.
- finally, the target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamps of the audio frames of the subtitle audio, to obtain the fused target video.
- subtitle text may be added at a preset position in each image frame of each image and video, and the images and videos may be sorted according to the arrangement order of the subtitle texts. The display time and display duration of each image and video are then determined according to the audio duration of the subtitle audio corresponding to each subtitle text, the number of images and videos corresponding to each subtitle text, and the duration of each video. A dynamic image is obtained from the sorted images and videos and their display times and display durations.
- video rendering is then performed on the subtitle text, images and videos in the dynamic image according to the preset display effect of the subtitle text and the display effect of the images and videos in the preset video template, to obtain the target video. The target video and the subtitle audio are fused according to the correspondence between the timestamp of each video frame in the target video and the timestamps of the audio frames of the subtitle audio, to obtain the target video after preliminary fusion.
- the video processing method may further include:
- a target video template corresponding to the content category is determined.
- S1650 may specifically include:
- the server may be preset with multiple video classification templates, one video classification template may correspond to one content category, and the content category may include a news category, a story category, a diary category, a variety show category, etc., which are used to represent the category to which the text content of the target text belongs.
- the server can input the target text into a preset text content classification model for classification, obtain the content category to which the target text belongs, determine the target video template corresponding to the content category among the multiple preset video templates, and then generate the target video according to the subtitle text, the multimedia material and the target video template.
- the content categories may also be classified according to other classification methods, for example, the content of the target text is classified according to the keywords contained in the target text, which is not limited herein.
- different video templates may also include different display effect elements. Therefore, the target video can be obtained by performing video rendering on the subtitle text and the multimedia material according to the preset display effect of the subtitle text and the display effect of the multimedia material in the target video template.
- an appropriate target video template for generating the target video can be selected according to the content category to which the target text belongs, and then an appropriate display effect can be set for the subtitle text and the multimedia material.
- the video processing method may further include:
- the pose of the virtual object is determined.
- S1650 may specifically include:
- the target video is generated according to the subtitle text, multimedia material, virtual object and the pose of the virtual object.
- the video element may further include a preset virtual object and a pose of the virtual object.
- the virtual object may be a virtual character object with a character image, a virtual cartoon object with a cartoon image, etc., and the present disclosure does not limit the specific type of the virtual object.
- the gesture of the virtual object may include at least one of a mouth gesture, a facial expression gesture, a gesture gesture, and a body gesture, which is not limited herein.
- the pose of the virtual object may include a mouth pose and a facial expression pose.
- the poses of the virtual object may include mouth poses, facial expression poses, and gesture poses.
- the poses of the virtual object may include mouth poses, facial expression poses, gesture poses, and body poses. The present disclosure does not limit the gesture type of the virtual object.
- the subtitle audio can be input into a preset pose generation model to obtain a real-time character pose animation, and pose migration technology can be used to transfer the real-time character pose animation onto the object model of the virtual character object, obtaining an object model of the virtual character object broadcasting the subtitle text. From the obtained object model, a character pose image of the virtual character object can then be obtained for each audio frame of the subtitle audio, and according to the timestamp of each audio frame in the target video generated from the subtitle text and the multimedia material, the corresponding character pose images can be fused into the target video to obtain a fused target video containing the virtual character object.
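As an illustration of the fusion step, the per-audio-frame pose images can be aligned with the target video's timestamps before fusion. The following is a minimal sketch under assumed data shapes; `PoseFrame` and `fuse_pose_frames` are illustrative names, not from the disclosure, and a real system would operate on rendered images rather than string identifiers.

```python
# Sketch: align per-audio-frame character pose images with video timestamps.
# All names here are hypothetical stand-ins for the disclosed pipeline.
from dataclasses import dataclass

@dataclass
class PoseFrame:
    timestamp_ms: int   # timestamp of the audio frame in the subtitle audio
    image_id: str       # identifier of the rendered pose image for this frame

def fuse_pose_frames(video_timestamps_ms, pose_frames):
    """For each video timestamp, pick the pose frame closest in time."""
    fused = []
    for t in video_timestamps_ms:
        nearest = min(pose_frames, key=lambda p: abs(p.timestamp_ms - t))
        fused.append((t, nearest.image_id))
    return fused

frames = [PoseFrame(0, "mouth_closed"), PoseFrame(40, "mouth_open"),
          PoseFrame(80, "mouth_closed")]
print(fuse_pose_frames([0, 33, 66], frames))
```

The nearest-timestamp rule is one simple alignment policy; interpolation between adjacent pose frames would be another.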
- the preset pose generation model can be used to generate a mouth pose animation and a facial expression pose animation.
- the preset pose generation model can be used to generate a mouth pose, a facial expression pose, and a gesture pose.
- the preset pose generation model can be used to generate a mouth pose, a facial expression pose, a gesture pose, and a body pose.
- the video processing method may further include:
- the electronic device may send the user image to the server, and the server may input the received user image into a preset biometric feature extraction model to extract the user's biometric features from the user image, then input the extracted biometric features into a preset object generation model to obtain an initial object model of the virtual object with the user's biometric features, and finally fuse a preset clothing model into the initial object model to obtain the final object model of the virtual object.
- alternatively, the server may input the received user image into preset feature extraction models to extract both the user's biometric features and the user's dressing features from the user image. The extracted biometric features can be input into the preset object generation model to obtain an initial object model of the virtual object with the user's biometric features; then, according to a preset correspondence between dressing styles and dressing models, the target dressing model corresponding to the dressing style to which the user's dressing features belong can be queried among the preset dressing models, and the target dressing model can be fused with the initial object model to obtain an object model of the virtual object whose dressing matches the user's dressing features.
- the extracted user dressing features may include at least one of the user's facial decoration features, headgear features, clothing features, and clothing accessories features.
- the server may also generate the subtitle audio according to the costume style of the virtual object, where the subtitle audio is audio with a sound characteristic consistent with the costume of the virtual object.
- the server may send the target video to the electronic device, so that the electronic device displays the target video.
- according to the present disclosure, a video generation request carrying original data sent by an electronic device can be received, and in response to the video generation request, target text can be automatically obtained according to the original data, subtitle text can be automatically generated according to the target text, the multimedia material corresponding to the subtitle text can be automatically obtained, and the target video can be automatically generated according to the subtitle text and the multimedia material.
- the rich multimedia material can be automatically found during the generation of the target video, without the need for the user to manually search for the material for making the video.
- the time cost of producing the video can be reduced, and the quality of the produced video can also be improved.
- An embodiment of the present disclosure further provides a video processing system, where the video processing system may include an electronic device and a server, thereby implementing the architecture shown in FIG. 1 .
- the electronic device can be used to: receive a video generation operation for the original data, where the original data is used to obtain target text and the video generation operation is used to trigger the generation of a target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video.
- the server can be used to: receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate subtitle text according to the target text; obtain multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
- the video display device may perform the steps in the method embodiments shown in FIG. 2 to FIG. 15 and realize the processes and effects therein, and the video processing device may perform the steps in the method embodiment shown in FIG. 16 and realize the processes and effects therein, which will not be repeated here.
- the electronic device may send a video generation request carrying the original data to the server, the server can generate the target video based on the original data sent by the electronic device, and the electronic device can display the target video after receiving the target video fed back by the server.
- FIG. 17 shows a schematic diagram of an interaction flow of a video processing system provided by an embodiment of the present disclosure.
- the video display method may include the following S1710-S1770.
- the electronic device may receive the original data input by the user's data input operation, or receive the original data sent by other devices to the electronic device.
- the original data may include at least one of text, link addresses, and multimedia files, which will not be repeated here.
- the electronic device may display the received raw data.
- a video generation operation for the original data may be input to the electronic device.
- the video generation operation may be a trigger operation such as a long press, double click, voice control, or expression control on the original data, or a trigger operation such as a click, long press, or double click on a video generation trigger control, which will not be repeated here.
- the electronic device may send a video generation request carrying original data to the server.
- the server may generate the target video based on the original data in response to the video generation request.
- the server can automatically obtain the target text through the original data, automatically generate the subtitle text according to the target text, automatically obtain the multimedia material according to the subtitle text, and automatically generate the target video according to the subtitle text and the multimedia material. This will not be repeated.
- the server may send the target video to the electronic device.
- after receiving the target video sent by the server, the electronic device may display the received target video.
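The S1710-S1770 flow above is a request/response round trip between the electronic device and the server. The sketch below simulates it in a single process under simplifying assumptions; the summarization and material-lookup steps are trivial stand-ins, and all function names are hypothetical:

```python
# In-process sketch of the device/server interaction flow.
# server_generate_video stands in for the server side; client_flow for the device side.
def server_generate_video(raw_data: str) -> dict:
    target_text = raw_data.strip()              # obtain target text from the raw data
    subtitle_text = target_text[:50]            # stand-in for summary extraction
    material = f"material-for:{subtitle_text}"  # stand-in for material lookup
    return {"subtitles": subtitle_text, "material": material}

def client_flow(raw_data: str) -> dict:
    # device receives raw data, a video generation operation is triggered,
    # a request carrying the raw data goes to the server, the server feeds back
    # the target video, and the device displays it
    request = {"raw_data": raw_data}
    video = server_generate_video(request["raw_data"])
    return video  # would be displayed by the electronic device

video = client_flow("  A short news item about a new satellite launch. ")
print(video["subtitles"])
```

In the disclosed system the two halves run on separate machines and the request crosses a network boundary; only the control flow is illustrated here.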
- FIG. 18 shows a schematic structural diagram of a video display device provided by an embodiment of the present disclosure.
- the video display apparatus 1800 shown in FIG. 18 may be provided in an electronic device, for example, the electronic device 101 in the client shown in FIG. 1 .
- electronic devices may include mobile phones, tablet computers, desktop computers, notebook computers, vehicle terminals, wearable devices, all-in-one computers, smart home devices and other devices with communication functions, and may also include devices simulated by virtual machines or simulators.
- the video display apparatus 1800 may include a first receiving unit 1810 and a first display unit 1820 .
- the first receiving unit 1810 may be configured to receive a video generation operation for raw data, where the original data is used to obtain target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text.
- the first display unit 1820 may be configured to display the generated target video in response to the video generation operation, where the video elements of the target video include subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
- a video generation operation for raw data can be received. Since the raw data can be used to obtain the target text and the video generation operation can be used to trigger generation of the target video corresponding to the target text, after the video generation operation is received, the target video generated in response to it can be displayed. The video elements of the target video can include subtitle text and multimedia material corresponding to the subtitle text, where the subtitle text can be automatically generated according to the target text and the multimedia material can be automatically obtained according to the subtitle text. It can be seen that rich multimedia materials can be automatically found during the generation of the target video, and users do not need to manually search for materials for making videos, which can not only reduce the time cost of making videos but also improve the quality of the videos.
- the video display apparatus 1800 may further include a second display unit, a third receiving unit, and a third display unit.
- the second display unit may be configured to display a video editing interface in response to the video generation operation, the video editing interface includes editable elements, and the editable elements include at least one of video elements and display effect elements corresponding to the video elements.
- the third receiving unit may be configured to receive an element modification operation on the editable element.
- the third display unit may be configured to display a modified target video in response to the element modification operation, where the modified target video includes editable elements displayed in the video editing interface when the element modification operation is completed.
- the video display apparatus 1800 may further include a fourth display unit, a fourth receiving unit, and a fifth display unit.
- the fourth display unit may be configured to display an indication mark for indicating that the target video has been generated.
- the fourth receiving unit may be configured to receive an identification-triggered operation for the indication identification.
- the fifth display unit may be configured to hide the original data in response to the identification triggering operation.
- the video display apparatus 1800 may further include a second sending unit and a fifth receiving unit.
- the second sending unit may be configured to send a video generation request carrying original data to the server, where the video generation request is used to make the server feed back a target video corresponding to the target text based on the original data.
- the fifth receiving unit may be configured to receive the target video fed back by the server.
- the video display apparatus 1800 may further include a third acquiring unit, a third generating unit, a fourth acquiring unit, and a fourth generating unit.
- the third obtaining unit may be configured to obtain the target text according to the original data.
- the third generating unit may be configured to generate subtitle text according to the target text.
- the fourth obtaining unit may be configured to obtain the multimedia material corresponding to the subtitle text.
- the fourth generating unit may be configured to generate the target video according to the subtitle text and the multimedia material.
- the video element may further include a preset virtual object and a pose of the virtual object, and the pose of the virtual object may be determined according to the subtitle text.
- the video display apparatus 1800 shown in FIG. 18 can perform the steps in the method embodiments shown in FIGS. 2 to 15 and implement the processes and effects therein, which will not be repeated here.
- FIG. 19 shows a schematic structural diagram of a video processing apparatus provided by an embodiment of the present disclosure.
- the video processing apparatus 1900 may be a server, for example, the server 102 in the server shown in FIG. 1 .
- the server may include a device with storage and computing functions, such as a cloud server or a server cluster.
- the video processing apparatus 1900 may include a second receiving unit 1910 , a first obtaining unit 1920 , a first generating unit 1930 , a second obtaining unit 1940 , a second generating unit 1950 and a first sending unit 1960 .
- the second receiving unit 1910 may be configured to receive a video generation request that carries original data and is sent by the electronic device.
- the first obtaining unit 1920 may be configured to obtain the target text according to the original data in response to the video generation request.
- the first generating unit 1930 may be configured to generate subtitle text according to the target text.
- the second obtaining unit 1940 may be configured to obtain the multimedia material corresponding to the subtitle text.
- the second generating unit 1950 may be configured to generate the target video according to the subtitle text and the multimedia material.
- the first sending unit 1960 may be configured to send the target video to the electronic device.
- according to the present disclosure, a video generation request carrying original data sent by an electronic device can be received, and in response to the video generation request, target text can be automatically obtained according to the original data, subtitle text can be automatically generated according to the target text, the multimedia material corresponding to the subtitle text can be automatically obtained, and the target video can be automatically generated according to the subtitle text and the multimedia material.
- the rich multimedia material can be automatically found during the generation of the target video, without the need for the user to manually search for the material for making the video.
- the time cost of producing the video can be reduced, and the quality of the produced video can also be improved.
- the first generating unit 1930 may include an abstract extraction subunit and a text typesetting subunit.
- the abstract extraction subunit can be configured to perform text abstract extraction on the target text to obtain the abstract of the target text.
- the text typesetting subunit may be configured to perform text typesetting on the abstract to obtain subtitle text corresponding to the abstract.
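The summary-extraction and typesetting pair above can be illustrated with a naive extractive stand-in; the disclosure does not specify the summarization algorithm, so both helpers below are hypothetical:

```python
# Sketch: extract a short summary from the target text, then typeset it into
# subtitle lines. The "summary" here just keeps the first few sentences.
import textwrap

def extract_summary(text: str, max_sentences: int = 2) -> str:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def typeset_subtitles(summary: str, width: int = 20) -> list:
    # split the summary into short lines suitable for on-screen subtitles
    return textwrap.wrap(summary, width=width)

summary = extract_summary("First point. Second point. Third point.")
print(typeset_subtitles(summary))
```

A production system would use a learned summarizer and typesetting rules that respect reading speed and screen size; only the two-stage structure is shown.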
- the raw data may include text.
- the first obtaining unit 1920 may include a text extraction subunit and a first processing subunit.
- the text extraction subunit can be configured to extract text in the original data.
- the first processing subunit may be configured to take the text as the target text.
- the original data may include multimedia files.
- the first acquisition unit 1920 may include a text conversion subunit and a second processing subunit.
- the text conversion subunit may be configured to perform text conversion on the multimedia file to obtain converted text.
- the second processing subunit may be configured to take the converted text as the target text.
- the original data may include a link address, and the link address may be used to obtain article content.
- the first acquisition unit 1920 may include an article acquisition subunit, a text extraction subunit, and a third processing subunit.
- the article obtaining subunit can be configured to obtain the target article based on the link address.
- the text extraction subunit can be configured to extract article text in the target article.
- the third processing subunit may be configured to use the article text as the target text.
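As a rough illustration of extracting article text once the target article has been fetched from the link address, the following uses Python's standard-library `html.parser` on an inline HTML string; the network fetch itself is omitted, and treating only `<p>` content as the article body is an assumed heuristic:

```python
# Sketch: pull article text out of fetched HTML to use as the target text.
from html.parser import HTMLParser

class ArticleTextExtractor(HTMLParser):
    """Collects text inside <p> tags as the article body."""
    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []
    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
    def handle_data(self, data):
        if self._in_p and data.strip():
            self.paragraphs.append(data.strip())

html_doc = "<html><h1>Title</h1><p>Body one.</p><p>Body two.</p></html>"
parser = ArticleTextExtractor()
parser.feed(html_doc)
target_text = " ".join(parser.paragraphs)
print(target_text)
```

Real article pages need boilerplate removal (navigation, ads) beyond this minimal tag filter.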
- the second obtaining unit 1940 may include a fourth processing subunit and a fifth processing subunit.
- the fourth processing subunit may be configured to determine, among multiple preset materials, a target material with the highest degree of matching with the subtitle text, where the multiple preset materials include materials obtained according to the original data and the preset materials include at least one of images and videos.
- the fifth processing subunit may be configured to use the target material as a multimedia material.
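One simple way to realize a "degree of matching" between the subtitle text and preset materials is word/tag overlap. This is a hypothetical stand-in for whatever matching model the disclosure contemplates; the material names and tag sets are invented for illustration:

```python
# Sketch: pick the preset material whose tags best match the subtitle text.
def match_score(subtitle: str, material_tags: set) -> int:
    # count subtitle words that appear among the material's tags
    return len(set(subtitle.lower().split()) & material_tags)

def pick_material(subtitle: str, materials: dict) -> str:
    # materials maps a material name to its tag set; return the best match
    return max(materials, key=lambda name: match_score(subtitle, materials[name]))

materials = {
    "beach.jpg": {"sea", "sand", "sun"},
    "city.mp4": {"street", "traffic", "night"},
}
print(pick_material("a walk on the sand by the sea", materials))
```

Embedding-based similarity would replace `match_score` in a production system, but the selection step (argmax over candidate materials) stays the same.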
- the second obtaining unit 1940 may include a speech conversion subunit and a sixth processing subunit.
- the voice conversion subunit may be configured to perform text-to-speech conversion on the subtitle text to obtain subtitle audio corresponding to the subtitle text.
- the sixth processing subunit may be configured to use the subtitle audio as the multimedia material.
- the second obtaining unit 1940 may include an emotion classification subunit, a seventh processing subunit, and an eighth processing subunit.
- the emotion classification subunit may be configured to input the target text into a preset text emotion classification model for classification, and obtain the emotion category to which the target text belongs.
- the seventh processing subunit may be configured to determine a target background audio corresponding to an emotion category among a plurality of preset background audios.
- the eighth processing subunit may be configured to use the target background audio as the multimedia material.
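The emotion-category-to-background-audio mapping described by these subunits can be sketched as follows; the keyword lookup is a toy stand-in for the preset text emotion classification model, and all category names and file names are illustrative:

```python
# Sketch: classify the target text's emotion, then select matching background audio.
EMOTION_KEYWORDS = {
    "joy": {"happy", "celebrate", "win"},
    "sad": {"loss", "tears", "mourn"},
}
BACKGROUND_AUDIO = {"joy": "upbeat.mp3", "sad": "slow_piano.mp3",
                    "neutral": "ambient.mp3"}

def classify_emotion(text: str) -> str:
    words = set(text.lower().split())
    for emotion, keys in EMOTION_KEYWORDS.items():
        if words & keys:
            return emotion
    return "neutral"

def pick_background_audio(text: str) -> str:
    # map the emotion category to its preset background audio
    return BACKGROUND_AUDIO[classify_emotion(text)]

print(pick_background_audio("the team celebrate a big win"))
```

Only the two-stage structure (classify, then look up audio by category) corresponds to the disclosure; the classifier itself would be a trained model.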
- the video processing apparatus 1900 may further include a content classification unit and a template determination unit.
- the content classification unit may be configured to input the target text into a preset text content classification model for classification, and obtain the content category to which the target text belongs.
- the template determining unit may be configured to determine a target video template corresponding to a content category from among a plurality of preset video templates.
- the second generating unit 1950 may be further configured to generate the target video according to the subtitle text, the multimedia material and the target video template.
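Combining the subtitle text, multimedia material, and target video template might look like building a render plan in which the template supplies display-effect defaults. A minimal sketch with assumed field names (`font`, `transition`, `tracks` are not from the disclosure):

```python
# Sketch: merge subtitle lines and material into a template-driven render plan.
def build_render_plan(subtitles, material, template):
    # the template supplies display-effect defaults; subtitles and material
    # fill the content slots
    return {
        "font": template["font"],
        "transition": template["transition"],
        "tracks": [{"type": "subtitle", "content": line} for line in subtitles]
                  + [{"type": "material", "content": material}],
    }

template = {"font": "sans-bold", "transition": "fade"}
plan = build_render_plan(["line one", "line two"], "clip.mp4", template)
print(len(plan["tracks"]))
```

An actual renderer would consume such a plan to composite the target video; the dictionary here only illustrates how the three inputs combine.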
- the video processing apparatus 1900 shown in FIG. 19 can execute the steps in the method embodiment shown in FIG. 16 and realize the processes and effects in the method embodiment shown in FIG. 16, which will not be repeated here.
- Embodiments of the present disclosure also provide a computing device.
- the computing device may include a processor and a memory, and the memory may be used to store executable instructions.
- the processor may be configured to read the executable instructions from the memory, and execute the executable instructions to implement the video display method or the video processing method in the foregoing embodiments.
- FIG. 20 shows a schematic structural diagram of a computing device 2000 suitable for implementing an embodiment of the present disclosure.
- the computing device may be an electronic device or a server.
- electronic devices may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and wearable devices, as well as stationary terminals such as digital TVs, desktop computers, and smart home devices.
- the server may include a device with storage and computing functions, such as a cloud server or a server cluster.
- computing device 2000 shown in FIG. 20 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
- the computing device 2000 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 2001, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 2002 or a program loaded from a storage device 2008 into a random access memory (RAM) 2003.
- in the RAM 2003, various programs and data required for the operation of the computing device 2000 are also stored.
- the processing device 2001, the ROM 2002, and the RAM 2003 are connected to each other through a bus 2004.
- An input/output (I/O) interface 2005 is also connected to the bus 2004 .
- the following devices can be connected to the I/O interface 2005: input devices 2006 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 2007 including, for example, a liquid crystal display (LCD), speakers, a vibrator, etc.; storage devices 2008 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 2009. The communication device 2009 may allow the computing device 2000 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 20 shows the computing device 2000 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided; more or fewer means may alternatively be implemented or provided.
- Embodiments of the present disclosure also provide a computer-readable storage medium, where the storage medium stores a computer program, and when the computer program is executed by the processor, enables the processor to implement the video display method or the video processing method in the foregoing embodiments.
- embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from the network through the communication device 2009, or installed from the storage device 2008 or the ROM 2002; when the computer program is executed by the processing device 2001, the above-mentioned functions defined in the video display method or the video processing method of the embodiments of the present disclosure are executed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
- a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- the client and the server can communicate using any currently known or future-developed network protocol, such as HTTP, and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
- examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
- the above-mentioned computer-readable medium may be included in the above-mentioned computing device; or may exist alone without being assembled into the computing device.
- the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the computing device, the computing device is made to execute:
- receive a video generation operation for original data, where the original data is used to obtain target text and the video generation operation is used to trigger the generation of a target video corresponding to the target text;
- in response to the video generation operation, display the generated target video, where the video elements of the target video include subtitle text and multimedia material corresponding to the subtitle text, the subtitle text is generated according to the target text, and the multimedia material is obtained according to the subtitle text.
- computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., connected through the Internet using an Internet service provider).
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments of the present disclosure may be implemented in software or in hardware, where the name of a unit does not, in some cases, constitute a limitation of the unit itself.
- exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- more specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Claims (20)
- 一种视频显示方法,其特征在于,包括:接收针对原始数据的视频生成操作,所述原始数据用于获取目标文本,所述视频生成操作用于触发生成所述目标文本对应的目标视频;响应于所述视频生成操作,显示生成的所述目标视频,所述目标视频的视频元素包括字幕文本和所述字幕文本对应的多媒体素材,所述字幕文本根据所述目标文本生成,所述多媒体素材根据所述字幕文本获取。
- 根据权利要求1所述的方法,其特征在于,在所述接收针对原始数据的视频生成操作之后,所述方法还包括:响应于所述视频生成操作,显示视频编辑界面,所述视频编辑界面包括可编辑元素,所述可编辑元素包括所述视频元素和所述视频元素对应的显示效果元素中的至少一种;接收对所述可编辑元素的元素修改操作;响应于所述元素修改操作,显示修改后的目标视频,所述修改后的目标视频包括所述元素修改操作完成时在所述视频编辑界面内显示的可编辑元素。
- 根据权利要求1所述的方法,其特征在于,在所述显示生成的所述目标视频之前,所述方法还包括:显示指示标识,所述指示标识用于指示已生成所述目标视频;接收对所述指示标识的标识触发操作;响应于所述标识触发操作,隐藏所述原始数据。
- 根据权利要求1所述的方法,其特征在于,在所述显示生成的所述目标视频之前,所述方法还包括:向服务器发送携带有所述原始数据的视频生成请求,所述视频生成请求用于使所述服务器基于所述原始数据反馈所述目标文本对应的所述目标视频;接收所述服务器反馈的所述目标视频。
- 根据权利要求1所述的方法,其特征在于,在所述显示生成的所述目标视频之前,所述方法还包括:根据所述原始数据,获取所述目标文本;根据所述目标文本,生成所述字幕文本;获取所述字幕文本对应的所述多媒体素材;根据所述字幕文本和所述多媒体素材,生成所述目标视频。
- A video processing method, comprising: receiving a video generation request carrying original data sent by an electronic device; in response to the video generation request, obtaining a target text according to the original data; generating a subtitle text according to the target text; obtaining multimedia material corresponding to the subtitle text; generating a target video according to the subtitle text and the multimedia material; and sending the target video to the electronic device.
- The method according to claim 6, wherein generating the subtitle text according to the target text comprises: performing text summary extraction on the target text to obtain a summary of the target text; and performing text typesetting on the summary to obtain the subtitle text corresponding to the summary.
- The method according to claim 6, wherein the original data comprises text, and obtaining the target text according to the original data comprises: extracting the text from the original data; and using the text as the target text.
- The method according to claim 6, wherein the original data comprises a multimedia file, and obtaining the target text according to the original data comprises: performing text conversion on the multimedia file to obtain converted text; and using the converted text as the target text.
- The method according to claim 6, wherein the original data comprises a link address, the link address being used to obtain article content, and obtaining the target text according to the original data comprises: obtaining a target article based on the link address; extracting article text from the target article; and using the article text as the target text.
- The method according to claim 6, wherein obtaining the multimedia material corresponding to the subtitle text comprises: determining, among a plurality of preset materials, a target material having the highest degree of matching with the subtitle text, the plurality of preset materials comprising materials obtained according to the original data, the preset materials comprising at least one of images and videos; and using the target material as the multimedia material.
- The method according to claim 6, wherein obtaining the multimedia material corresponding to the subtitle text comprises: performing text-to-speech conversion on the subtitle text to obtain subtitle audio corresponding to the subtitle text; and using the subtitle audio as the multimedia material.
- The method according to claim 6, wherein obtaining the multimedia material corresponding to the subtitle text comprises: inputting the target text into a preset text emotion classification model for classification to obtain an emotion category to which the target text belongs; determining, among a plurality of preset background audios, a target background audio corresponding to the emotion category; and using the target background audio as the multimedia material.
- The method according to claim 6, wherein after obtaining the target text and before generating the target video according to the subtitle text and the multimedia material, the method further comprises: inputting the target text into a preset text content classification model for classification to obtain a content category to which the target text belongs; and determining, among a plurality of preset video templates, a target video template corresponding to the content category; and wherein generating the target video according to the subtitle text and the multimedia material comprises: generating the target video according to the subtitle text, the multimedia material, and the target video template.
- A video display apparatus, comprising: a first receiving unit configured to receive a video generation operation for original data, wherein the original data is used to obtain a target text, and the video generation operation is used to trigger generation of a target video corresponding to the target text; and a first display unit configured to, in response to the video generation operation, display the generated target video, wherein video elements of the target video comprise a subtitle text and multimedia material corresponding to the subtitle text, the subtitle text being generated according to the target text, and the multimedia material being obtained according to the subtitle text.
- A video processing apparatus, comprising: a second receiving unit configured to receive a video generation request carrying original data sent by an electronic device; a first obtaining unit configured to, in response to the video generation request, obtain a target text according to the original data; a first generating unit configured to generate a subtitle text according to the target text; a second obtaining unit configured to obtain multimedia material corresponding to the subtitle text; a second generating unit configured to generate a target video according to the subtitle text and the multimedia material; and a first sending unit configured to send the target video to the electronic device.
- A video processing system, comprising an electronic device and a server, wherein the electronic device is configured to: receive a video generation operation for original data, the original data being used to obtain a target text, and the video generation operation being used to trigger generation of a target video corresponding to the target text; in response to the video generation operation, send a video generation request carrying the original data to the server; receive the target video sent by the server; and display the target video; and wherein the server is configured to: receive the video generation request sent by the electronic device; in response to the video generation request, obtain the target text according to the original data; generate a subtitle text according to the target text; obtain multimedia material corresponding to the subtitle text; generate the target video according to the subtitle text and the multimedia material; and send the target video to the electronic device.
- A computing device, comprising: a processor; and a memory configured to store executable instructions; wherein the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the video display method according to any one of claims 1-5 or the video processing method according to any one of claims 6-14.
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to implement the video display method according to any one of claims 1-5 or the video processing method according to any one of claims 6-14.
- A computer program product, comprising a computer program carried on a computer-readable medium, wherein the computer program, when executed by a processor, causes the processor to implement the video display method according to any one of claims 1-5 or the video processing method according to any one of claims 6-14.
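The server-side flow recited in claims 6-8, 11, and 13 (extract a target text, summarize it into subtitles, match preset materials, pick background audio by emotion) can be sketched as a minimal pipeline. The claims do not name concrete algorithms, so the summarizer, the "matching degree" metric, the keyword-based emotion classifier, and the dict-shaped "video" below are all illustrative stand-ins under simplified assumptions, not the patented implementation.

```python
# Hypothetical sketch of the pipeline in claims 6-8, 11 and 13.

def extract_target_text(original_data: str) -> str:
    # Claim 8 case: the original data is plain text, used directly as the target text.
    return original_data.strip()

def generate_subtitle_text(target_text: str, max_sentences: int = 2) -> list:
    # Claim 7: summary extraction (naively, the leading sentences) followed by
    # typesetting (one subtitle line per sentence).
    sentences = [s.strip() for s in target_text.split(".") if s.strip()]
    return sentences[:max_sentences]

def match_material(subtitle: str, preset_materials: dict) -> str:
    # Claim 11: among preset materials (name -> description), pick the one with
    # the highest matching degree; here "matching degree" is crude word overlap.
    words = set(subtitle.lower().split())
    return max(preset_materials,
               key=lambda name: len(words & set(preset_materials[name].lower().split())))

def classify_emotion(target_text: str) -> str:
    # Claim 13: stand-in for the preset text emotion classification model.
    positive_words = ("great", "happy", "win")
    return "positive" if any(w in target_text.lower() for w in positive_words) else "neutral"

# Preset background audios keyed by emotion category (hypothetical file names).
BACKGROUND_AUDIO = {"positive": "upbeat.mp3", "neutral": "ambient.mp3"}

def generate_target_video(original_data: str, preset_materials: dict) -> dict:
    # Claim 6: text -> subtitles -> materials -> "video"; actual frame rendering
    # is out of scope, so the video is represented as a plain dict here.
    target_text = extract_target_text(original_data)
    subtitles = generate_subtitle_text(target_text)
    return {
        "subtitles": subtitles,
        "materials": [match_material(s, preset_materials) for s in subtitles],
        "background_audio": BACKGROUND_AUDIO[classify_emotion(target_text)],
    }

materials = {"launch.jpg": "team launch event photo", "feature.png": "new feature screenshot"}
video = generate_target_video("Our team had a great launch. Users love the new feature.", materials)
print(video["background_audio"])  # upbeat.mp3
print(video["materials"])         # ['launch.jpg', 'feature.png']
```

A production system would replace each stub with a real model (extractive summarization, cross-modal text-image matching, a trained emotion classifier) and a renderer, but the data flow between the claimed steps is as above.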
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/256,014 US20240107127A1 (en) | 2020-12-07 | 2021-11-15 | Video display method and apparatus, video processing method, apparatus, and system, device, and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011437788.4 | 2020-12-07 | ||
CN202011437788.4A CN112579826A (zh) | 2020-12-07 | 2020-12-07 | Video display and processing method, apparatus, system, device, and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022121626A1 (zh) | 2022-06-16 |
Family
ID=75132044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/130581 WO2022121626A1 (zh) | Video display and processing method, apparatus, system, device, and medium | 2020-12-07 | 2021-11-15 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240107127A1 (zh) |
CN (1) | CN112579826A (zh) |
WO (1) | WO2022121626A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115145452A (zh) * | 2022-07-01 | 2022-10-04 | Hangzhou NetEase Cloud Music Technology Co., Ltd. | Post generation method, medium, terminal device, and computing device |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112579826A (zh) | 2020-12-07 | 2021-03-30 | Beijing ByteDance Network Technology Co., Ltd. | Video display and processing method, apparatus, system, device, and medium |
CN113364999B (zh) * | 2021-05-31 | 2022-12-27 | Beijing Dajia Internet Information Technology Co., Ltd. | Video generation method and apparatus, electronic device, and storage medium |
CN113473204B (zh) * | 2021-05-31 | 2023-10-13 | Beijing Dajia Internet Information Technology Co., Ltd. | Information display method and apparatus, electronic device, and storage medium |
CN113365134B (zh) * | 2021-06-02 | 2022-11-01 | Beijing Zitiao Network Technology Co., Ltd. | Audio sharing method, apparatus, device, and medium |
CN113497899A (zh) * | 2021-06-22 | 2021-10-12 | Shenzhen Datou Brothers Technology Co., Ltd. | Method, apparatus, device, and storage medium for matching text with images |
CN113630644B (zh) * | 2021-06-29 | 2024-01-30 | Beijing Sogou Technology Development Co., Ltd. | Editing method and apparatus for a video content editor, and storage medium |
CN113778717A (zh) * | 2021-09-14 | 2021-12-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Content sharing method, apparatus, device, and storage medium |
CN115811632A (zh) * | 2021-09-15 | 2023-03-17 | Beijing Zitiao Network Technology Co., Ltd. | Video processing method, apparatus, device, and storage medium |
CN114297150A (zh) * | 2021-11-19 | 2022-04-08 | Beijing Dajia Internet Information Technology Co., Ltd. | Media file processing method, apparatus, device, and storage medium |
CN114900711A (zh) * | 2022-05-27 | 2022-08-12 | Beijing Zitiao Network Technology Co., Ltd. | Media content generation method, apparatus, device, and storage medium |
CN114968463A (zh) * | 2022-05-31 | 2022-08-30 | Beijing ByteDance Network Technology Co., Ltd. | Entity display method, apparatus, device, and medium |
CN115334367B (zh) * | 2022-07-11 | 2023-10-17 | Beijing Dajia Internet Information Technology Co., Ltd. | Method and apparatus for generating video summary information, server, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080284910A1 (en) * | 2007-01-31 | 2008-11-20 | John Erskine | Text data for streaming video |
CN108322800A (zh) * | 2017-01-18 | 2018-07-24 | Alibaba Group Holding Limited | Subtitle information processing method and apparatus |
CN109729420A (zh) * | 2017-10-27 | 2019-05-07 | Tencent Technology (Shenzhen) Co., Ltd. | Image processing method and apparatus, mobile terminal, and computer-readable storage medium |
CN109756751A (zh) * | 2017-11-07 | 2019-05-14 | Tencent Technology (Shenzhen) Co., Ltd. | Multimedia data processing method and apparatus, electronic device, and storage medium |
CN112579826A (zh) * | 2020-12-07 | 2021-03-30 | Beijing ByteDance Network Technology Co., Ltd. | Video display and processing method, apparatus, system, device, and medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107172485B (zh) * | 2017-04-25 | 2020-01-31 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and apparatus for generating short videos, and input device |
CN108965737B (zh) * | 2017-05-22 | 2022-03-29 | Tencent Technology (Shenzhen) Co., Ltd. | Media data processing method and apparatus, and storage medium |
CN109257659A (zh) * | 2018-11-16 | 2019-01-22 | Beijing Microlive Vision Technology Co., Ltd. | Subtitle adding method and apparatus, electronic device, and computer-readable storage medium |
CN111787395B (zh) * | 2020-05-27 | 2023-04-18 | Beijing Dajia Internet Information Technology Co., Ltd. | Video generation method and apparatus, electronic device, and storage medium |
- 2020
  - 2020-12-07 CN CN202011437788.4A patent/CN112579826A/zh active Pending
- 2021
  - 2021-11-15 WO PCT/CN2021/130581 patent/WO2022121626A1/zh active Application Filing
  - 2021-11-15 US US18/256,014 patent/US20240107127A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20240107127A1 (en) | 2024-03-28 |
CN112579826A (zh) | 2021-03-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022121626A1 (zh) | Video display and processing method, apparatus, system, device, and medium | |
CN109688463B (zh) | Clipped video generation method and apparatus, terminal device, and storage medium | |
TWI720062B (zh) | Voice input method, apparatus, and terminal device | |
WO2022068533A1 (zh) | Interactive information processing method, apparatus, device, and medium | |
WO2022042593A1 (zh) | Subtitle editing method and apparatus, and electronic device | |
EP3198381B1 (en) | Interactive video generation | |
CN107517323B (zh) | Information sharing method, apparatus, and storage medium | |
CN113365134B (zh) | Audio sharing method, apparatus, device, and medium | |
WO2022105862A1 (zh) | Video generation and display method, apparatus, device, and medium | |
CN110602516A (zh) | Information interaction method and apparatus based on live video streaming, and electronic device | |
CN107211198A (zh) | Apparatus and method for editing content | |
CN115082602B (zh) | Method for generating a digital human, model training method, apparatus, device, and medium | |
CN112929746B (zh) | Video generation method and apparatus, storage medium, and electronic device | |
WO2021238084A1 (zh) | Voice packet recommendation method, apparatus, device, and storage medium | |
WO2023016349A1 (zh) | Text input method and apparatus, electronic device, and storage medium | |
US20230214423A1 (en) | Video generation | |
WO2022105760A1 (zh) | Multimedia browsing method, apparatus, device, and medium | |
CN113746875A (zh) | Voice packet recommendation method, apparatus, device, and storage medium | |
WO2023134568A1 (zh) | Display method and apparatus, electronic device, and storage medium | |
WO2022252806A1 (zh) | Information processing method, apparatus, device, and medium | |
EP4099711A1 (en) | Method and apparatus and storage medium for processing video and timing of subtitles | |
EP4088216A1 (en) | Presenting intelligently suggested content enhancements | |
CN111443794A (zh) | Reading interaction method, apparatus, device, server, and storage medium | |
WO2022156557A1 (zh) | Image display method, apparatus, device, and medium | |
WO2022262560A1 (zh) | Image display method, apparatus, device, and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21902333; Country of ref document: EP; Kind code of ref document: A1
| WWE | WIPO information: entry into national phase | Ref document number: 18256014; Country of ref document: US
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | EP: public notification in the EP bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 220923)
| 122 | EP: PCT application non-entry in European phase | Ref document number: 21902333; Country of ref document: EP; Kind code of ref document: A1