WO2023202361A1 - Video generation method and apparatus, medium, and electronic device - Google Patents

Video generation method and apparatus, medium, and electronic device

Info

Publication number
WO2023202361A1
Authority
WO
WIPO (PCT)
Prior art keywords
web page
video
target
candidate
elements
Application number
PCT/CN2023/085775
Other languages
English (en)
French (fr)
Inventor
Xu Chong (许冲)
Li Donglin (李冬琳)
Zhang Wei (张伟)
Wang Lixin (王立鑫)
Original Assignee
Beijing Youzhuju Network Technology Co., Ltd. (北京有竹居网络技术有限公司)
Application filed by Beijing Youzhuju Network Technology Co., Ltd. (北京有竹居网络技术有限公司)
Publication of WO2023202361A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web

Definitions

  • the present disclosure relates to the field of data processing technology, and specifically, to a video generation method, device, medium and electronic equipment.
  • the present disclosure provides a video generation method, which includes: obtaining web page elements of a target web page; extracting visual features and semantic features of the web page elements; determining, based on the web page elements, the visual features, and the semantic features, video material matching the target web page; and generating a target video based on the video material and the web page elements.
  • the present disclosure provides a video generation device, including: an acquisition module configured to acquire web page elements of a target web page; an extraction module configured to extract visual features and semantic features of the web page elements acquired by the acquisition module; a determination module configured to determine video material matching the target web page based on the web page elements, the visual features and the semantic features extracted by the extraction module; and a generation module configured to generate a target video based on the video material determined by the determination module and the web page elements extracted by the extraction module.
  • the present disclosure provides a computer-readable medium on which a computer program is stored.
  • when the program is executed by a processing device, the steps of the video generation method provided in the first aspect of the present disclosure are implemented.
  • the present disclosure provides an electronic device, including: a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of the video generation method provided in the first aspect of the present disclosure.
  • the present disclosure provides a computer program product, including a computer program that, when executed by a processing device, implements the steps of the video generation method provided in the first aspect of the present disclosure.
  • an embodiment of the present disclosure provides a computer program that, when executed by a processing device, implements the steps of the video generation method provided in the first aspect of the present disclosure.
  • Figure 1 is a flow chart of a video generation method according to an exemplary embodiment.
  • Figure 2 is a flowchart of a method for generating a target video based on video material and web page elements according to an exemplary embodiment.
  • FIG. 3 is a flowchart of a method for generating a target video based on video material, web page elements, and a target video template according to an exemplary embodiment.
  • FIG. 4 is a block diagram of a video generating device according to an exemplary embodiment.
  • FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment.
  • videos are mainly generated manually or in a semi-automatic manner through human-computer interaction.
  • manual production costs are high and the cycle is long.
  • the quality of videos produced by different designers is uneven and difficult to control, and manual production cannot meet the need to convert large amounts of web content into videos.
  • although semi-automatic generation through human-computer interaction can improve generation efficiency to a certain extent, it still requires manual participation, such as screening the content extracted from the web page, selecting video templates, and performing secondary editing of the resulting videos. Production costs therefore remain high.
  • the present disclosure provides a video generation method, device, medium and electronic equipment.
  • the term “include” and its variations are open-ended, ie, “including but not limited to.”
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • a prompt message is sent to the user to clearly remind the user that the operation requested will require the acquisition and use of the user's personal information. Therefore, users can autonomously choose whether to provide personal information to software or hardware such as electronic devices, applications, servers or storage media that perform the operations of the technical solution of the present disclosure based on the prompt information.
  • the method of sending prompt information to the user may be, for example, a pop-up window, and the prompt information may be presented in the form of text in the pop-up window.
  • the pop-up window can also contain a selection control for the user to choose "agree" or "disagree" to provide personal information to the electronic device.
  • Figure 1 is a flow chart of a video generation method according to an exemplary embodiment. As shown in Figure 1, the method may include the following S101 to S104.
  • the target web page may be any type of web page, such as blog type, news type, forum type, etc.
  • Web page elements can include static elements such as fonts, text, images, and links, as well as dynamic elements such as videos, special effects, and animations.
  • visual features refer to explicit attribute features of web page elements that can be directly obtained from web page elements, such as color, visual size, spatial coordinates and other attribute features of web page elements.
  • the visual characteristics of the web page elements can be obtained by parsing the target web page.
  • Semantic features refer to deep-level semantic features obtained from further processing and mining of web page elements.
  • the semantic features may include at least one of: the web page type of the target web page, the category of the web page elements, latent attribute features of the web page elements (for example, the semantic representation of text), the spatial layout of the target web page, and the positional relationship between different elements among the web page elements.
  • for web page elements such as text, the categories may include action points, prices, etc.; for web page elements such as images, the categories may include people, animals, landscapes, etc.
  • the spatial layout of the target web page can include centered layout, grid layout, etc.; the positional relationship between different elements in the web page elements can include juxtaposition, superposition, etc. Among them, the spatial layout of the target web page and the positional relationship between different elements in the web page elements can be determined based on the spatial coordinates of the web page elements.
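As a rough illustration of how layout and positional relationships could be derived from spatial coordinates, the Python sketch below assumes each element comes with an axis-aligned bounding box in page pixels. The `Box` type, the overlap rule separating superposition from juxtaposition, and the 5% centring tolerance are hypothetical choices for illustration, not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a web page element, in page pixels (assumed)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def positional_relation(a: Box, b: Box) -> str:
    """Hypothetical rule: overlapping elements are superposed, all others juxtaposed."""
    return "superposition" if overlap_area(a, b) > 0 else "juxtaposition"


def is_centered_layout(boxes: list[Box], page_width: float, tol: float = 0.05) -> bool:
    """Hypothetical rule: every element's horizontal centre lies within tol * page_width of the page centre."""
    centre = page_width / 2
    return all(abs(b.x + b.w / 2 - centre) <= tol * page_width for b in boxes)
```

A grid layout could be detected in the same spirit, for example by clustering the boxes' `x` and `y` edges, but that is left out of this sketch.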
  • video material matching the target web page is determined based on web page elements, visual features, and semantic features.
  • video materials that match the target web page can be filtered from the video material library through feature matching based on web page elements, visual features, and semantic features.
  • video materials can include font file packages, stickers, music, images, etc.
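The feature-matching recall step can be pictured as cosine similarity between a page feature vector and precomputed material feature vectors. This minimal sketch assumes both have already been encoded into the same vector space (for example, from the visual and semantic features). `recall_materials`, the dictionary-based library, and the `k`/`min_score` parameters are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recall_materials(page_vec: list[float],
                     library: dict[str, list[float]],
                     k: int = 2,
                     min_score: float = 0.0) -> list[str]:
    """Return up to k material names whose feature vectors best match the page."""
    scored = [(cosine(page_vec, vec), name) for name, vec in library.items()]
    scored = [item for item in scored if item[0] >= min_score]  # drop weak matches
    scored.sort(key=lambda item: item[0], reverse=True)         # best match first
    return [name for _, name in scored[:k]]
```

A production system would more likely use an approximate nearest-neighbour index over a large material library; the linear scan here only shows the ranking logic.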
  • a target video is generated based on the video material and web page elements.
  • in the above technical solution, the web page elements of the target web page are obtained and their visual features and semantic features are extracted; then, the video material matching the target web page is determined based on the web page elements, visual features and semantic features; finally, the target video is generated from the video material and the web page elements.
  • rapid and automatic generation of web content into videos can be achieved, greatly reducing user operating costs and saving video production costs.
  • automatic recall of video materials can be achieved without manual collection of video materials, saving time and effort.
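Steps S101 to S104 can be sketched as a pipeline with pluggable stages. Each callable stands in for a component the disclosure leaves unspecified, and the function and parameter names are illustrative assumptions rather than the disclosure's interfaces.

```python
from typing import Any, Callable, Sequence


def generate_video(
    page: Any,
    acquire: Callable[[Any], Sequence[Any]],
    extract: Callable[[Sequence[Any]], tuple[Any, Any]],
    match: Callable[[Sequence[Any], Any, Any], Sequence[Any]],
    render: Callable[[Sequence[Any], Sequence[Any]], Any],
) -> Any:
    elements = acquire(page)                        # S101: web page elements
    visual, semantic = extract(elements)            # S102: visual and semantic features
    materials = match(elements, visual, semantic)   # S103: matching video material
    return render(materials, elements)              # S104: target video
```

Injecting the stages keeps the flow fixed while allowing, for example, a different material-matching strategy to be swapped in without touching the rest.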
  • in response to receiving the web page link of the target web page, the web page content of the target web page is obtained according to the link, and web page elements are then extracted from the web page content.
  • the web page content of the target web page can be obtained according to the web page link as follows: first, load and render the web page corresponding to the link; then, crawl the web page content from the loaded and rendered page using headless browser technology or crawler technology.
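Once the page has been rendered and its HTML captured (the headless-browser or crawler step is not shown), web page elements could be pulled out with Python's standard `html.parser`. This sketch assumes a simplified element model of `(kind, value)` pairs covering text, images, links, and `<video>` sources. It skips script and style content and is not the disclosure's extractor.

```python
from html.parser import HTMLParser


class ElementExtractor(HTMLParser):
    """Collects (kind, value) pairs for text, images, links and videos."""

    SKIPPED = {"script", "style"}  # content of these tags is not treated as a page element

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[tuple[str, str]] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img" and attributes.get("src"):
            self.elements.append(("image", attributes["src"]))
        elif tag == "a" and attributes.get("href"):
            self.elements.append(("link", attributes["href"]))
        elif tag == "video" and attributes.get("src"):
            self.elements.append(("video", attributes["src"]))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.elements.append(("text", text))


def extract_elements(html: str) -> list[tuple[str, str]]:
    """Parse already-rendered HTML and return its elements in document order."""
    parser = ElementExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements
```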
  • based on natural language processing and visual understanding, the latent attribute features of the web page elements are extracted and the categories of the web page elements are determined; then, the web page type of the target web page is determined based on the categories and latent attribute features of the web page elements.
  • when the target web page includes a video, the above-mentioned semantic features may also include highlight segments of that video.
  • using highlight segments for feature matching can improve the efficiency and richness of video material screening, as well as how well the screened material matches the target web page.
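The second stage, mapping element categories to a web page type, is shown below as a simple majority vote. The category names and the `CATEGORY_TO_TYPE` table are hypothetical. This sketch also ignores the latent attribute features that the disclosure feeds into the decision; a trained classifier would normally take both as input.

```python
from collections import Counter

# Hypothetical mapping from element categories to the page type they suggest.
CATEGORY_TO_TYPE = {
    "price": "e-commerce",
    "action_point": "e-commerce",
    "landscape": "travel",
    "person": "news",
}


def infer_page_type(categories: list[str], default: str = "other") -> str:
    """Majority vote over the page types suggested by the element categories."""
    votes = Counter(CATEGORY_TO_TYPE[c] for c in categories if c in CATEGORY_TO_TYPE)
    if not votes:
        return default  # no category carries a page-type signal
    return votes.most_common(1)[0][0]
```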
  • a preset video template can be used as the target video template.
  • the target video template matching the target webpage can be determined from the video template library through feature matching based on webpage elements, visual features, and semantic features. In this way, the target video template can be more closely matched with the web content of the target web page, and the quality of subsequent video generation can be improved.
  • one target video template can be obtained, or multiple target video templates can be obtained, which is not specifically limited in this disclosure.
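Template matching could combine a feasibility check with a preference ranking. The sketch below is an assumed rule-based alternative to learned feature matching. Each hypothetical `Template` declares how many elements of each kind it needs and the page type it was designed for. Templates the page cannot fill are dropped, templates designed for the detected page type are ranked first, and larger templates rank ahead of smaller ones.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    page_type: str                        # page type the template targets (hypothetical field)
    slots: tuple[tuple[str, int], ...]    # (element kind, number required)


def match_templates(templates: list[Template], element_kinds: list[str],
                    page_type: str) -> list[Template]:
    """Drop templates the page cannot fill, then rank: same page type first, then larger templates."""
    available = Counter(element_kinds)
    feasible = [t for t in templates if all(available[kind] >= n for kind, n in t.slots)]
    return sorted(feasible, key=lambda t: (t.page_type != page_type, -sum(n for _, n in t.slots)))
```

Returning a ranked list rather than a single template fits the text above, which allows either one or several target video templates.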
  • a target video is generated based on the video material, web page elements and target video template.
  • when a single target video template is obtained in S201, the web page materials and web page elements can be filled into the target video template, and the video is then rendered and generated in the time domain and spatial domain to obtain the target video.
  • the independent variable in the time domain is time, that is, the horizontal axis is time and the vertical axis is the change of the signal.
  • the spatial domain is the so-called pixel domain.
  • the processing in the spatial domain is the processing at the pixel level, such as image superposition at the pixel level.
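To make the time-domain and spatial-domain distinction concrete, the following sketch renders a single grayscale frame from template layers. It assumes each layer has a visibility interval in seconds (time domain) and a pixel position (spatial domain), and it overlays layers using simple alpha blending. The `Layer` fields and the blending rule are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    start: float                 # time domain: first visible moment, in seconds
    end: float                   # time domain: visible until this moment (exclusive)
    x: int                       # spatial domain: left pixel column
    y: int                       # spatial domain: top pixel row
    pixels: list[list[float]]    # grayscale intensities in [0, 1]
    alpha: float = 1.0           # opacity used for blending


def render_frame(t: float, width: int, height: int, layers: list[Layer],
                 background: float = 0.0) -> list[list[float]]:
    """Render one grayscale frame at time t; later layers are drawn on top."""
    frame = [[background] * width for _ in range(height)]
    for layer in layers:
        if not (layer.start <= t < layer.end):
            continue  # layer is not visible at time t
        for dy, row in enumerate(layer.pixels):
            for dx, value in enumerate(row):
                px, py = layer.x + dx, layer.y + dy
                if 0 <= px < width and 0 <= py < height:
                    # pixel-level superposition: alpha-blend the layer over the frame
                    frame[py][px] = layer.alpha * value + (1 - layer.alpha) * frame[py][px]
    return frame
```

A full video would be produced by calling `render_frame(i / fps, ...)` for each frame index `i`.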
  • multiple candidate videos can be generated based on video materials, web page elements, and target video templates, and the multiple candidate videos can be directly used as target videos. In this way, the diversity of generated videos can be guaranteed to meet the needs of different users.
  • the target video may be generated through S301 to S304 shown in FIG. 3.
  • multiple candidate videos are generated based on the video material, web page elements and target video template.
  • a target number of image frames of the candidate video and the sound information in the candidate video can be obtained; then, the target number of image frames and the sound information are analyzed to obtain the video description information of the candidate video.
  • the video description information may be a text used to describe the candidate video.
  • for example, the video description information may be "West Lake Tourism".
  • the video description information may also be text describing characteristics of the video. For example, when the candidate video matches a currently popular internet buzzword, such as when the buzzword "pan ta" appears in the candidate video, the video description information may be "pan ta".
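The disclosure does not say how the target number of frames is chosen. A common assumption is uniform sampling, taking the centre frame of each of `target` equal-length segments, as in this hypothetical helper.

```python
def sample_frame_indices(total_frames: int, target: int) -> list[int]:
    """Pick `target` evenly spaced frame indices (centre of each equal segment)."""
    if total_frames <= 0 or target <= 0:
        return []
    target = min(target, total_frames)  # cannot sample more frames than exist
    return [int((i + 0.5) * total_frames / target) for i in range(target)]
```

Taking segment centres rather than segment starts avoids always sampling the very first frame, which is often a fade-in or a blank title card.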
  • the aesthetic quality and/or delivery effect of the candidate video is predicted based on the candidate video and the video description information of the candidate video, and a prediction result of the candidate video is obtained.
  • aesthetics is the discipline that studies the nature and meaning of beauty and is an important branch of philosophy.
  • N candidate videos are determined from the multiple candidate videos as target videos based on the prediction result of each candidate video, where N ≥ 1.
  • screening the candidate videos based on aesthetic quality ensures that the target video meets visual aesthetic standards.
  • screening the candidate videos based on delivery effect (i.e., delivery performance) enables the target video to achieve better delivery results.
  • when multiple target video templates are obtained in S201, the web page materials and web page elements can be filled into each target video template, and the video is then rendered in the time domain and spatial domain to obtain multiple candidate videos.
  • when a single target video template is obtained in S201, different web page materials and web page elements can be filled into that template each time, and the video is then rendered in the time domain and spatial domain to generate multiple candidate videos.
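Both candidate-generation routes (several templates, or one template filled differently each time) can be expressed as a Cartesian product of templates and fill variants. `fill_variants`, `build_candidates`, and the slot names are hypothetical, and `render` stands in for the time-domain and spatial-domain rendering step.

```python
from itertools import product
from typing import Any, Callable


def fill_variants(slot_options: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every way of choosing one option per slot (e.g. one music track, one font)."""
    slots = list(slot_options)
    return [dict(zip(slots, combo)) for combo in product(*(slot_options[s] for s in slots))]


def build_candidates(templates: list[Any], variants: list[dict[str, str]],
                     render: Callable[[Any, dict[str, str]], Any]) -> list[Any]:
    """Render one candidate video per (template, fill variant) pair."""
    return [render(template, variant) for template, variant in product(templates, variants)]
```

With one template the product reduces to the single-template case. The number of candidates grows multiplicatively, so a real system would probably cap or sample the variants.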
  • aesthetic quality prediction can be performed on the candidate video based on the candidate video and the video description information of the candidate video, and the aesthetic quality score corresponding to the candidate video can be obtained and used as the prediction result of the candidate video.
  • the candidate video and the video description information of the candidate video can be input into a pre-trained aesthetic quality prediction model to obtain the aesthetic quality score corresponding to the candidate video.
  • the delivery effect of the candidate video can be predicted based on the candidate video and its video description information, and the resulting delivery effect score of the candidate video can be used as the prediction result of the candidate video.
  • the candidate video and the video description information of the candidate video can be input into a pre-trained delivery effect prediction model to obtain the delivery effect score corresponding to the candidate video.
  • the aesthetic quality and the delivery effect of the candidate video can also be predicted separately based on the candidate video and its video description information to obtain an aesthetic quality score and a delivery effect score, and the sum of these two scores is then used as the prediction result of the candidate video.
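The three prediction options and the top-N selection of S304 can be sketched as follows. The predictors are hypothetical stand-ins for the pre-trained aesthetic quality and delivery effect models, each assumed to take a video identifier and its description and return a float score.

```python
from typing import Callable, Sequence

Predictor = Callable[[str, str], float]  # (video id, description) -> score (assumed signature)


def select_targets(candidates: Sequence[tuple[str, str]],
                   aesthetic: Predictor,
                   delivery: Predictor,
                   n: int = 1,
                   mode: str = "sum") -> list[tuple[str, str]]:
    """Rank (video, description) candidates and keep the best n (N >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if mode not in {"aesthetic", "delivery", "sum"}:
        raise ValueError(f"unknown mode: {mode}")

    def score(candidate: tuple[str, str]) -> float:
        video, description = candidate
        if mode == "aesthetic":
            return aesthetic(video, description)
        if mode == "delivery":
            return delivery(video, description)
        return aesthetic(video, description) + delivery(video, description)  # sum of both scores

    return sorted(candidates, key=score, reverse=True)[:n]
```

Summing raw scores assumes both models output values on comparable scales; if they do not, normalising each score first would be a reasonable variation.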
  • FIG. 4 is a block diagram of a video generating device according to an exemplary embodiment. As shown in Figure 4, the device 400 may include:
  • the acquisition module 401 is used to obtain the web page elements of the target web page;
  • the extraction module 402 is used to extract the visual features and semantic features of the web page elements acquired by the acquisition module 401;
  • the determination module 403 is configured to determine video material matching the target web page based on the web page elements, the visual features and the semantic features extracted by the extraction module 402;
  • the generation module 404 is configured to generate a target video based on the video material determined by the determination module 403 and the web page elements extracted by the extraction module 402 .
  • the semantic features include at least one of: the web page type of the target web page, the category of the web page element, the latent attribute features of the web page element, the spatial layout of the target web page, and the positional relationship between different elements among the web page elements.
  • if the target web page includes a video, the semantic features further include highlight segments of the video.
  • when the semantic features include the web page type of the target web page, the extraction module 402 includes:
  • the first determination sub-module is used to extract the latent attribute characteristics of the web page element and determine the category of the web page element based on natural language processing and visual understanding;
  • the second determination sub-module is used to determine the web page type of the target web page according to the category and the latent attribute feature.
  • the generation module 404 includes:
  • the first acquisition sub-module is used to acquire the target video template;
  • the first generation sub-module is used to generate a target video according to the video material, the web page element and the target video template.
  • the first acquisition sub-module is used to determine a target video template matching the target web page based on the web page element, the visual feature and the semantic feature.
  • the first generation sub-module includes:
  • the second generation sub-module is used to generate multiple candidate videos according to the video material, the web page element and the target video template;
  • the second acquisition sub-module is used to acquire the video description information of each candidate video;
  • a prediction sub-module configured to predict the aesthetic quality and/or delivery effect of each candidate video according to the candidate video and the video description information of the candidate video, and obtain the prediction result of the candidate video;
  • the third determination sub-module is used to determine N candidate videos from the plurality of candidate videos as target videos according to the prediction results of each candidate video, where N ≥ 1.
  • the acquisition module 401 includes:
  • the third acquisition sub-module is used to obtain the web page content of the target web page according to the web page link in response to receiving the web page link of the target web page;
  • the extraction sub-module is used to extract web page elements from the web page content.
  • the present disclosure also provides a computer-readable medium on which a computer program is stored.
  • when the program is executed by a processing device, the steps of the above video generation method provided by the present disclosure are implemented.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (Portable Android Devices), PMPs (Portable Multimedia Players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 5 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 600 may include a processing device 601 (such as a central processing unit or a graphics processor), which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, ROM 602 and RAM 603 are connected to each other via a bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604.
  • the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 608 including, for example, magnetic tapes, hard disks, etc.; and a communication device 609.
  • Communication device 609 may allow electronic device 600 to communicate wirelessly or wiredly with other devices to exchange data.
  • although FIG. 5 illustrates the electronic device 600 with various devices, it should be understood that it is not required to implement or provide all of the illustrated devices. More or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication device 609, or from storage device 608, or from ROM 602.
  • when the computer program is executed by the processing device 601, the above functions defined in the method of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and server can communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (e.g., the Internet), and end-to-end networks (e.g., ad hoc end-to-end networks), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or it may exist alone without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the electronic device, the electronic device: obtains the web page elements of a target web page; extracts visual features and semantic features of the web page elements; determines video material matching the target web page based on the web page elements, the visual features and the semantic features; and generates a target video based on the video material and the web page elements.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments of the present disclosure can be implemented in software or hardware.
  • the name of the module does not constitute a limitation on the module itself under certain circumstances.
  • the acquisition module can also be described as "a module that acquires web page elements of the target web page."
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include: an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • Example 1 provides a video generation method, including: obtaining web page elements of a target web page; extracting visual features and semantic features of the web page elements; determining, according to the web page elements, the visual features, and the semantic features, video material that matches the target web page; and generating a target video based on the video material and the web page elements.
  • Example 2 provides the method of Example 1, wherein the semantic features include at least one of: the web page type of the target web page, the category of the web page element, the latent attribute features of the web page element, the spatial layout of the target web page, and the positional relationship between different elements among the web page elements.
  • Example 3 provides the method of Example 2. If the target web page includes a video, the semantic feature further includes a highlight segment of the video.
  • Example 4 provides the method of Example 2, the semantic features include the web page type of the target web page; the web page type is determined in the following manner: based on natural language processing and visual understanding, Extract latent attribute features of the web page element and determine the category of the web page element; determine the web page type of the target web page based on the category and the latent attribute feature.
  • Example 5 provides the method described in any one of Example 1 to Example 4, wherein generating a target video according to the video material and the web page element includes: obtaining the target video Template: generate a target video according to the video material, the web page element and the target video template.
  • Example 6 provides the method of Example 5.
  • Obtaining a target video template includes: determining, according to the web page element, the visual feature, and the semantic feature, the target video template. The web page matches the target video template.
  • Example 7 provides the method of Example 5, wherein generating a target video according to the video material, the web page element, and the target video template includes: according to the video material ,Place Use the web page elements and the target video template to generate multiple candidate videos; obtain the video description information of each candidate video; for each candidate video, based on the candidate video and the video description information of the candidate video, Predict the aesthetic quality and/or delivery effect of the candidate video to obtain the prediction result of the candidate video; according to the prediction result of each candidate video, determine N candidate videos from the plurality of candidate videos as target videos, Among them, N ⁇ 1.
  • Example 8 provides the method described in any one of Examples 1 to 4.
  • Obtaining webpage elements of the target webpage includes: in response to receiving a webpage link of the target webpage, according to The webpage link obtains the webpage content of the target webpage; and extracts webpage elements from the webpage content.
  • Example 9 provides a video generation device, including: an acquisition module, configured to acquire web page elements of a target web page; and an extraction module, configured to extract the web page elements acquired by the acquisition module. visual features and semantic features of web page elements; a determination module configured to determine video material matching the target web page based on the web page elements, the visual features and the semantic features extracted by the extraction module; generate A module configured to generate a target video based on the video material determined by the determination module and the web page elements extracted by the extraction module.
  • Example 10 provides a computer-readable medium having a computer program stored thereon, which implements the steps of the method in any one of Examples 1-8 when executed by a processing device. .
  • Example 11 provides an electronic device, including: a storage device having a computer program stored thereon; and a processing device configured to execute the computer program in the storage device, to Implement the steps of the method described in any of Examples 1-8.
  • Example 12 provides a computer program product, including a computer program that, when executed by a processing device, implements the steps of the method in any one of Examples 1-8.
  • Example 13 provides a computer program that, when executed by a processing device, implements the steps of the method in any one of Examples 1-8.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

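The present disclosure relates to a video generation method and apparatus, a medium, and an electronic device. The method includes:
- obtaining web page elements of a target web page;
- extracting visual features and semantic features of the web page elements;
- determining, according to the web page elements, the visual features, and the semantic features, video material that matches the target web page; and
- generating a target video according to the video material and the web page elements.

In this way, web page content can be converted into video quickly and automatically, which greatly reduces the user's operation cost and lowers video production cost. Video material can also be recalled automatically, with no manual collection, which saves time and effort. In addition, the choice of video material takes into account the deep semantic features of the web page elements, not only the elements themselves and their visual features. Richer and more relevant video material can therefore be retrieved, which improves the quality of the generated video.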

Description

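Video Generation Method and Apparatus, Medium, and Electronic Device
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese Patent Application No. 202210432243.7, filed with the China Patent Office on April 22, 2022 and entitled "Video Generation Method and Apparatus, Medium, and Electronic Device". The entire content of that application is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of data processing and, in particular, to a video generation method and apparatus, a medium, and an electronic device.
BACKGROUND
Information on the Internet currently exists mostly as text, images, audio, and video. Video presents content in a rich, visual, and intuitive way, so users often rely on it as their main way of learning about things. Users therefore want to convert traditional web page content into short videos to meet their promotion and conversion goals in the short-video domain. At present, videos are made from web page content either manually or semi-automatically through human-computer interaction. Both approaches are costly, slow, and complex.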
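SUMMARY
This Summary introduces, in simplified form, concepts that are described in detail in the Detailed Description below. It is not intended to identify key or essential features of the claimed technical solution, nor to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a video generation method, including: obtaining web page elements of a target web page; extracting visual features and semantic features of the web page elements; determining, according to the web page elements, the visual features, and the semantic features, video material matching the target web page; and generating a target video according to the video material and the web page elements.
In a second aspect, the present disclosure provides a video generation apparatus. The apparatus includes:
- an obtaining module, configured to obtain web page elements of a target web page;
- an extraction module, configured to extract visual features and semantic features of the web page elements obtained by the obtaining module;
- a determination module, configured to determine, according to the web page elements, the visual features, and the semantic features extracted by the extraction module, video material matching the target web page; and
- a generation module, configured to generate a target video according to the video material determined by the determination module and the web page elements extracted by the extraction module.
In a third aspect, the present disclosure provides a computer-readable medium storing a computer program that, when executed by a processing apparatus, implements the steps of the video generation method provided in the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device, including: a storage apparatus storing a computer program; and a processing apparatus configured to execute the computer program in the storage apparatus to implement the steps of the video generation method provided in the first aspect of the present disclosure.
In a fifth aspect, the present disclosure provides a computer program product, including a computer program that, when executed by a processing apparatus, implements the steps of the video generation method provided in the first aspect of the present disclosure.
In a sixth aspect, an embodiment of the present disclosure provides a computer program that, when executed by a processing apparatus, implements the steps of the video generation method provided in the first aspect of the present disclosure.
Other features and advantages of the present disclosure are described in detail in the Detailed Description that follows.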
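BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description read with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. The drawings are schematic, and components and elements are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart of a video generation method according to an exemplary embodiment.
FIG. 2 is a flowchart of a method for generating a target video according to video material and web page elements, according to an exemplary embodiment.
FIG. 3 is a flowchart of a method for generating a target video according to video material, web page elements, and a target video template, according to an exemplary embodiment.
FIG. 4 is a block diagram of a video generation apparatus according to an exemplary embodiment.
FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment.
DETAILED DESCRIPTION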
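As discussed in the Background, videos are currently made from web page content either manually or semi-automatically through human-computer interaction. Manual production is costly and slow. The quality of videos made by different designers also varies widely and is hard to control, so manual production cannot keep up with demand when large volumes of web page content must be converted into video. Semi-automatic generation improves efficiency to some extent, but it still needs human work. For example, someone must screen the content extracted from the web page, choose a video template, and edit the video afterward. Production cost therefore remains high.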
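In view of this, the present disclosure provides a video generation method and apparatus, a medium, and an electronic device.
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. The drawings show some embodiments of the present disclosure, but the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here. These embodiments are provided so that the present disclosure is understood more thoroughly and completely. The drawings and embodiments of the present disclosure are exemplary only and do not limit its scope of protection.
The steps recorded in the method implementations of the present disclosure may be performed in a different order and/or in parallel. The method implementations may also include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, meaning "include but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; and the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
Concepts such as "first" and "second" in the present disclosure are used only to distinguish different apparatuses, modules, or units. They do not limit the order of, or the dependency between, the functions that these apparatuses, modules, or units perform.
The modifiers "one" and "a plurality of" in the present disclosure are illustrative rather than restrictive. Those skilled in the art should read them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between apparatuses in the implementations of the present disclosure are for illustration only and do not limit the scope of those messages or information.
Before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to an active request from the user, prompt information is sent to the user. It explicitly tells the user that the requested operation will require obtaining and using the user's personal information. The user can then decide, based on the prompt information, whether to provide personal information to the software or hardware that performs the operations of the technical solutions of the present disclosure, such as an electronic device, application program, server, or storage medium.
As an optional but non-limiting implementation, the prompt information may be sent in response to the user's active request by way of, for example, a pop-up window that presents the prompt information as text. The pop-up window may also carry a selection control that lets the user choose to "agree" or "disagree" to providing personal information to the electronic device.
The above process of notifying the user and obtaining authorization is illustrative only and does not limit the implementations of the present disclosure. Other methods that comply with relevant laws and regulations may also be used.
The data involved in this technical solution, including but not limited to the data itself and its acquisition or use, shall comply with the applicable laws, regulations, and relevant provisions.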
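FIG. 1 is a flowchart of a video generation method according to an exemplary embodiment. As shown in FIG. 1, the method may include the following steps S101 to S104.
In S101, web page elements of a target web page are obtained.
In the present disclosure, the target web page may be any type of web page, such as a blog, news, or forum page. The web page elements may include static elements, such as fonts, text, images, and links, as well as dynamic elements, such as videos, special effects, and animated images.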
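Python's standard `html.parser` module can show how elements might be split into static and dynamic groups. This is a minimal illustration, not the disclosed implementation. The following are all assumptions made for the example:
- the tag-to-kind tables;
- the rule that a `.gif` image counts as an animated (dynamic) element;
- the function name `extract_elements`.

```python
from html.parser import HTMLParser

# Hypothetical tag tables: the disclosure lists element kinds, not parsing rules.
STATIC_TAGS = {"img": "image", "a": "link"}
DYNAMIC_TAGS = {"video": "video", "canvas": "special_effect"}


class ElementExtractor(HTMLParser):
    """Collects (kind, static_or_dynamic, payload) tuples in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        if tag in DYNAMIC_TAGS:
            self.elements.append((DYNAMIC_TAGS[tag], "dynamic", src))
        elif tag == "img" and src.lower().endswith(".gif"):
            # Assumption: GIF images are treated as animated (dynamic) elements.
            self.elements.append(("animated_image", "dynamic", src))
        elif tag in STATIC_TAGS:
            payload = src or attrs.get("href") or ""
            self.elements.append((STATIC_TAGS[tag], "static", payload))

    def handle_data(self, data):
        # Note: this sketch does not filter out <script> or <style> contents.
        text = data.strip()
        if text:
            self.elements.append(("text", "static", text))


def extract_elements(html):
    """Parse raw HTML and return its elements as tuples."""
    parser = ElementExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements
```

This sketch works at the tag level only. Fonts (defined in CSS) and effects (driven by scripts or CSS animations) would need a rendering engine to detect.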
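In S102, visual features and semantic features of the web page elements are extracted.
In the present disclosure, visual features are explicit attribute features that can be read directly from the web page elements, such as their color, visual size, and spatial coordinates. The visual features may be obtained by parsing the target web page.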
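This minimal sketch reads visual features from an inline `style` attribute, assuming the values are given in pixels. In practice, color, size, and coordinates would more likely come from the computed layout of a rendered page. The `VisualFeatures` class and the `parse_visual_features` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VisualFeatures:
    color: str
    width: float   # pixels
    height: float  # pixels
    x: float       # left edge, pixels
    y: float       # top edge, pixels

    @property
    def visual_size(self):
        # Approximate "visual size" as bounding-box area.
        return self.width * self.height


def _px(value):
    # Accepts "120px" or "120"; other CSS units are out of scope here.
    value = value.strip()
    if value.endswith("px"):
        value = value[:-2]
    return float(value)


def parse_visual_features(style):
    """Parse an inline CSS style string such as 'color: red; width: 10px'."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip()
    return VisualFeatures(
        color=props.get("color", "inherit"),
        width=_px(props.get("width", "0")),
        height=_px(props.get("height", "0")),
        x=_px(props.get("left", "0")),
        y=_px(props.get("top", "0")),
    )
```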
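Semantic features are deeper features obtained by further processing and mining of the web page elements. Specifically, the semantic features may include at least one of the following:
- the web page type of the target web page;
- the categories of the web page elements;
- latent attribute features of the web page elements (for example, a semantic representation of text);
- the spatial layout of the target web page;
- the positional relationships between different web page elements.
For example, text elements may fall into categories such as call-to-action and price. Image elements may fall into categories such as portraits, animal images, and landscape images.
The spatial layout of the target web page may be, for example, a centered layout or a grid layout. The positional relationship between two elements may be, for example, side-by-side or overlapping. Both the spatial layout and the positional relationships may be determined from the spatial coordinates of the web page elements.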
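This sketch shows one way to derive the spatial layout and pairwise positional relationships from element bounding boxes. The following are illustrative assumptions, not rules taken from the disclosure:
- the tolerance value;
- the rule that a page is "centered" when every box's horizontal center lies near the page's center line;
- the coarse bucketing used to detect a grid.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def relation(a: Box, b: Box) -> str:
    """Positional relationship of two elements (illustrative rules)."""
    overlap_w = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    overlap_h = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    if overlap_w > 0 and overlap_h > 0:
        return "overlapping"      # one element is layered on another
    if overlap_h > 0:
        return "side_by_side"     # same row, no horizontal overlap
    return "other"                # e.g. one above the other


def layout(boxes: List[Box], page_width: float, tol: float = 5.0) -> str:
    """Guess the page layout from element coordinates (coarse heuristic)."""
    if not boxes:
        return "other"
    page_center = page_width / 2
    if all(abs(b.x + b.w / 2 - page_center) <= tol for b in boxes):
        return "centered"
    # Coarse bucketing of left/top edges; values near a bucket boundary may split.
    columns = {round(b.x / tol) for b in boxes}
    rows = {round(b.y / tol) for b in boxes}
    if len(columns) >= 2 and len(rows) >= 2:
        return "grid"
    return "other"
```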
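In S103, video material matching the target web page is determined according to the web page elements, the visual features, and the semantic features.
In the present disclosure, video material matching the target web page may be selected from a video material library by feature matching, based on the web page elements, the visual features, and the semantic features. The video material may include font packages, stickers, music, images, and the like.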
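The text does not describe the feature matching against the material library in more detail. As one hypothetical instance, the page can be summarized as a bag of tags derived from its elements and features, then compared with tagged library entries by cosine similarity. The tag vocabulary, the library record format, `top_k`, and `min_score` are all assumptions. A production system would more plausibly compare learned embeddings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_materials(page_tags, library, top_k=3, min_score=0.1):
    """Return ids of library materials most similar to the page's tag profile.

    page_tags: tags derived from elements, visual and semantic features (hypothetical).
    library:   list of {"id": ..., "tags": [...]} records (hypothetical format).
    """
    query = Counter(page_tags)
    scored = [(cosine(query, Counter(item["tags"])), item["id"]) for item in library]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by id
    return [material_id for _, material_id in scored[:top_k]]
```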
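In S104, a target video is generated according to the video material and the web page elements.
In the above technical solution, the web page elements of the target web page are first obtained and their visual and semantic features are extracted. Video material matching the target web page is then determined from the web page elements, the visual features, and the semantic features. Finally, a target video is generated from the video material and the web page elements. This converts web page content into video quickly and automatically, greatly reducing the user's operation cost and lowering video production cost. Video material is also recalled automatically rather than collected by hand, which saves time and effort. Moreover, the choice of video material considers the deep semantic features of the web page elements as well as the elements themselves and their visual features. Richer and more relevant video material can therefore be retrieved, which improves the quality of the generated video.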
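Steps S101 to S104 form a simple pipeline. The sketch below shows only the data flow between the four steps. Each stage is an injected callable that stands in for a model or service the disclosure leaves unspecified. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class VideoPipeline:
    """S101 to S104 as pluggable stages (all stage implementations are hypothetical)."""

    get_elements: Callable[[Any], List[Any]]                       # S101
    extract_features: Callable[[List[Any]], Tuple[dict, dict]]     # S102: (visual, semantic)
    match_materials: Callable[[List[Any], dict, dict], List[Any]]  # S103
    render: Callable[[List[Any], List[Any]], Any]                  # S104

    def run(self, target_page):
        elements = self.get_elements(target_page)
        visual, semantic = self.extract_features(elements)
        materials = self.match_materials(elements, visual, semantic)
        return self.render(materials, elements)
```

Because each stage is passed in, one step can be swapped (for example, a different material-matching strategy) without touching the others.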
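The implementation of S101, obtaining the web page elements of the target web page, is described in detail below. It can be implemented in several ways. In one implementation, the user provides a page image (for example, a screenshot) containing the target web page. The web page content is obtained from this image through image recognition, and web page elements are then extracted from the content. The user only needs to supply a screenshot to have the corresponding video generated automatically and quickly, which is convenient and greatly reduces the user's operation cost.
In another implementation, the user provides a web page link of the target web page. The web page content is obtained through the link, and web page elements are then extracted from the content. The user only needs to supply a link to have the corresponding video generated automatically and quickly, which is convenient and greatly reduces the user's operation cost.
Specifically, the web page content may be obtained from the link as follows. First, the web page that the link points to is loaded and rendered. The web page content is then captured from the rendered page using a headless browser or a web crawler.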
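The crawler route of this step can be sketched with the standard library alone. The sketch downloads the raw HTML behind a link and decodes it using the charset the server declares. It does not execute JavaScript, so pages that build their content on the client side would instead need the headless-browser route, such as a headless Chromium driven by Playwright or Selenium. The function name and User-Agent string are assumptions.

```python
import urllib.request


def fetch_page_content(url, timeout=10.0):
    """Fetch the raw HTML behind a web page link (crawler route, no JavaScript)."""
    request = urllib.request.Request(url, headers={"User-Agent": "page-to-video-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned HTML string is what the element-extraction step would then parse.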
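The way the web page type of the target web page is determined is described in detail below. Specifically:
First, latent attribute features of the web page elements are extracted and the categories of the elements are determined, using natural language processing and visual understanding. The web page type of the target web page is then determined from the element categories and the latent attribute features.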
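Once element categories and latent attribute features are available, the page type can be chosen by combining the evidence. The sketch below uses weighted voting, a substitute technique chosen for illustration. The following are all assumptions:
- the category names;
- the weights in `PAGE_TYPE_EVIDENCE`;
- treating latent features as discrete hints rather than embedding vectors.

```python
from collections import Counter

# Hypothetical evidence table: how strongly each element category (or latent hint)
# suggests a page type. The disclosure does not define these values.
PAGE_TYPE_EVIDENCE = {
    "e-commerce": {"price": 2.0, "call_to_action": 1.5, "product_image": 1.0},
    "news": {"headline": 2.0, "dateline": 1.5, "paragraph": 1.0},
    "blog": {"paragraph": 1.5, "author": 1.0, "portrait": 0.5},
}


def infer_page_type(categories, latent_hints=()):
    """Weighted vote over element categories plus discretized latent hints."""
    counts = Counter(categories) + Counter(latent_hints)
    scores = {
        page_type: sum(weight * counts[cue] for cue, weight in evidence.items())
        for page_type, evidence in PAGE_TYPE_EVIDENCE.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```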
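When the target web page includes a video, the semantic features may also include highlight segments of that video. These segments can then be used for feature matching when determining the video material for the target web page, which makes material selection more efficient, more varied, and more relevant.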
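The disclosure does not say how highlights are detected. Assume some upstream model gives each second of the page's video an interest score. A highlight segment can then be found as the fixed-length window with the highest total score. The sketch uses a running sum, so each window is evaluated in constant time. The scoring model and the window length are hypothetical.

```python
def highlight_segment(scores, window):
    """Return (start, end) indices of the window with the highest total score.

    scores: per-second interest scores from a hypothetical upstream model.
    window: highlight length in seconds.
    """
    if not 0 < window <= len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    best_start = 0
    current = best = sum(scores[:window])
    for start in range(1, len(scores) - window + 1):
        # Slide the window one step: add the new right edge, drop the old left edge.
        current += scores[start + window - 1] - scores[start - 1]
        if current > best:
            best, best_start = current, start
    return best_start, best_start + window
```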
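The implementation of S104, generating the target video from the video material and the web page elements, is described in detail below. Specifically, it may be carried out through S201 and S202 shown in FIG. 2.
In S201, a target video template is obtained.
In one implementation, a preset video template may be used as the target video template.
In another implementation, a target video template that matches the target web page may be selected from a video template library by feature matching, based on the web page elements, the visual features, and the semantic features. The selected template then fits the content of the target web page more closely, which improves the quality of the generated video.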
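Template selection follows the same feature-matching idea as material retrieval. As an illustrative variant, the sketch ranks templates by Jaccard overlap between page features and template tags. It returns the best `k` templates, so one or several target templates can be obtained. When no template overlaps, it falls back to a preset template. The template records, the tag names, and the fallback name `default` are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_templates(page_features, templates, k=1, preset="default"):
    """Pick up to k templates that best match the page; fall back to a preset.

    templates: list of {"name": ..., "tags": [...]} records (hypothetical format).
    """
    scored = sorted(
        ((jaccard(page_features, t["tags"]), t["name"]) for t in templates),
        key=lambda pair: (-pair[0], pair[1]),
    )
    chosen = [name for score, name in scored[:k] if score > 0]
    return chosen or [preset]
```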
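Either one target video template or several may be obtained; the present disclosure does not limit this.
In S202, a target video is generated according to the video material, the web page elements, and the target video template.
In one implementation, S201 yields a single target video template. The video material and the web page elements are filled into this template, and the video is then rendered in the temporal and spatial domains to obtain the target video. In the temporal domain, the independent variable is time: the horizontal axis is time and the vertical axis is the change in the signal. The spatial domain is also called the pixel domain. Spatial-domain processing works at the pixel level, for example by overlaying images pixel by pixel.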
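The fill-then-render step can be modeled in three parts:
- binding content to timed template slots;
- working out which slots are visible at each instant (temporal domain);
- compositing overlapping layers pixel by pixel (spatial domain).

The sketch below is a toy model under these assumptions. The `Slot` fields, the slot names, and the use of standard "over" alpha compositing on a single grayscale value are illustrative choices, not details from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str      # hypothetical slot name, e.g. "title" or "background"
    start: float   # seconds
    end: float     # seconds (exclusive)
    z: int         # stacking order; higher draws on top


def fill_template(slots, content):
    """Bind each slot to a piece of video material or a web page element."""
    missing = [s.name for s in slots if s.name not in content]
    if missing:
        raise KeyError(f"unfilled slots: {missing}")
    return [(s, content[s.name]) for s in slots]


def active_layers(filled, t):
    """Temporal domain: filled slots visible at time t, bottom to top."""
    visible = [pair for pair in filled if pair[0].start <= t < pair[0].end]
    return sorted(visible, key=lambda pair: pair[0].z)


def blend_pixel(layers):
    """Spatial domain: 'over' compositing of (value, alpha) pairs, bottom to top."""
    out = 0.0
    for value, alpha in layers:
        out = alpha * value + (1 - alpha) * out
    return out
```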
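In another implementation, multiple candidate videos may be generated from the video material, the web page elements, and the target video template, and all of them may be used directly as target videos. This keeps the generated videos diverse enough to meet different users' needs.
In yet another implementation, the target video may be generated through S301 to S304 shown in FIG. 3.
In S301, multiple candidate videos are generated according to the video material, the web page elements, and the target video template.
In S302, video description information of each candidate video is obtained.
In the present disclosure, a target number of image frames and the audio information of each candidate video may be obtained. The frames and the audio are then analyzed to produce the video description information of that candidate video.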
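The disclosure leaves open how the target number of frames is chosen. One common and simple choice, assumed here, is to split the video into equal segments and take the middle frame of each. The chosen frames and the audio track would then go to captioning and audio-analysis models (not shown) that produce the description text.

```python
def sample_frame_indices(total_frames, target_count):
    """Pick target_count frame indices spread evenly over the video (segment midpoints)."""
    if total_frames <= 0 or target_count <= 0:
        return []
    target_count = min(target_count, total_frames)  # cannot sample more frames than exist
    step = total_frames / target_count
    return [int(step * i + step / 2) for i in range(target_count)]
```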
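The video description information may be a short piece of text describing the candidate video. For example, for a video about touring West Lake, the description may be "West Lake tourism".
The video description information may also be text that describes a characteristic of the video. For example, if a candidate video uses currently popular Internet slang, such as the catchphrase "盘他" (pán tā), the description may be "盘他".
In S303, the aesthetic quality and/or delivery effect of each candidate video is predicted from the candidate video and its video description information, giving a prediction result for that candidate video.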
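Aesthetics is the branch of philosophy concerned with the nature and meaning of beauty. In the present disclosure, aesthetic quality refers to how visually appealing a candidate video is.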
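In S304, N candidate videos are selected from the candidate videos as target videos, based on the prediction result of each candidate video.
In the present disclosure, N ≥ 1.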
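Selecting N target videos from the prediction results is a top-N query. This minimal sketch assumes that higher prediction results are better and that the results are stored in a dict keyed by a hypothetical candidate id.

```python
import heapq


def select_top_n(predictions, n):
    """Return ids of the n candidates with the highest prediction results (N >= 1)."""
    if n < 1:
        raise ValueError("N must be at least 1")
    best = heapq.nlargest(n, predictions.items(), key=lambda item: item[1])
    return [candidate_id for candidate_id, _ in best]
```

`heapq.nlargest` returns every candidate, best first, when N exceeds the number of candidates.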
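In the above implementation, screening the candidate videos by aesthetic quality ensures that the target videos are visually appealing. Screening them by delivery effect (that is, how well they perform when delivered) helps the target videos achieve better delivery results.
The ways of generating multiple candidate videos from the video material, the web page elements, and the target video template are described in detail below. In one implementation, S201 yields multiple target video templates. The video material and the web page elements are filled into each template in turn, and temporal and spatial rendering is then performed to obtain multiple candidate videos.
In another implementation, S201 yields one target video template. Different combinations of video material and web page elements are filled into the template on each pass, followed by temporal and spatial rendering, to obtain multiple candidate videos.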
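Both ways of producing multiple candidates come down to enumerating (template, content) pairs. The sketch uses a Cartesian product. Several templates with one material set, or one template with several material sets, each yield one render plan per pair. The plan dictionary format is an assumption, and the rendering itself is not shown.

```python
from itertools import product


def candidate_plans(templates, material_sets):
    """One render plan per (template, material set) pair.

    Several templates x one material set covers the first implementation;
    one template x several material sets covers the second.
    """
    return [{"template": t, "materials": m} for t, m in product(templates, material_sets)]
```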
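The implementation of S303 is described in detail below: predicting the aesthetic quality and/or delivery effect of a candidate video from the candidate video and its video description information to obtain its prediction result.
In one implementation, the aesthetic quality of the candidate video is predicted from the candidate video and its video description information. The resulting aesthetic quality score is used as the prediction result. Specifically, the candidate video and its description are input into a pre-trained aesthetic quality prediction model, which outputs the aesthetic quality score.
In another implementation, the delivery effect of the candidate video is predicted from the candidate video and its video description information. The resulting delivery effect score is used as the prediction result. Specifically, the candidate video and its description are input into a pre-trained delivery effect prediction model, which outputs the delivery effect score.
In yet another implementation, both the aesthetic quality and the delivery effect of the candidate video are predicted, giving an aesthetic quality score and a delivery effect score. The sum of the two scores is used as the prediction result of the candidate video.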
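The three scoring options can be written as one function that adds up whichever predictions are available. The two scorers stand in for the pre-trained aesthetic quality and delivery effect models, whose inputs, outputs, and scales the disclosure does not specify. The plain sum follows the third implementation. Any weighting or normalization of the two scores would be an extra assumption.

```python
from typing import Callable, Optional

# (candidate video, video description) -> score; stands in for a trained model.
Scorer = Callable[[str, str], float]


def prediction_result(video: str, description: str,
                      aesthetic: Optional[Scorer] = None,
                      delivery: Optional[Scorer] = None) -> float:
    """Aesthetic score, delivery effect score, or their sum, depending on the models supplied."""
    if aesthetic is None and delivery is None:
        raise ValueError("at least one predictor is required")
    total = 0.0
    if aesthetic is not None:
        total += aesthetic(video, description)
    if delivery is not None:
        total += delivery(video, description)
    return total
```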
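FIG. 4 is a block diagram of a video generation apparatus according to an exemplary embodiment. As shown in FIG. 4, the apparatus 400 may include:
an obtaining module 401, configured to obtain web page elements of a target web page;
an extraction module 402, configured to extract visual features and semantic features of the web page elements obtained by the obtaining module 401;
a determination module 403, configured to determine, according to the web page elements, the visual features, and the semantic features extracted by the extraction module 402, video material matching the target web page; and
a generation module 404, configured to generate a target video according to the video material determined by the determination module 403 and the web page elements extracted by the extraction module 402.
In the above technical solution, the web page elements of the target web page are first obtained and their visual and semantic features are extracted. Video material matching the target web page is then determined from the web page elements, the visual features, and the semantic features. Finally, a target video is generated from the video material and the web page elements. This converts web page content into video quickly and automatically, greatly reducing the user's operation cost and lowering video production cost. Video material is also recalled automatically rather than collected by hand, which saves time and effort. Moreover, the choice of video material considers the deep semantic features of the web page elements as well as the elements themselves and their visual features. Richer and more relevant video material can therefore be retrieved, which improves the quality of the generated video.
Optionally, the semantic features include at least one of: the web page type of the target web page, the categories of the web page elements, latent attribute features of the web page elements, the spatial layout of the target web page, and the positional relationships between different web page elements.
Optionally, when the target web page includes a video, the semantic features further include highlight segments of the video.
Optionally, the semantic features include the web page type of the target web page, and
the extraction module 402 includes:
a first determination submodule, configured to extract latent attribute features of the web page elements and determine the categories of the web page elements based on natural language processing and visual understanding; and
a second determination submodule, configured to determine the web page type of the target web page according to the categories and the latent attribute features.
Optionally, the generation module 404 includes:
a first obtaining submodule, configured to obtain a target video template; and
a first generation submodule, configured to generate a target video according to the video material, the web page elements, and the target video template.
Optionally, the first obtaining submodule is configured to determine, according to the web page elements, the visual features, and the semantic features, a target video template matching the target web page.
Optionally, the first generation submodule includes:
a second generation submodule, configured to generate multiple candidate videos according to the video material, the web page elements, and the target video template;
a second obtaining submodule, configured to obtain video description information of each candidate video;
a prediction submodule, configured to, for each candidate video, predict the aesthetic quality and/or delivery effect of the candidate video according to the candidate video and its video description information, to obtain a prediction result of the candidate video; and
a third determination submodule, configured to determine N candidate videos from the multiple candidate videos as target videos according to the prediction result of each candidate video, where N ≥ 1.
Optionally, the obtaining module 401 includes:
a third obtaining submodule, configured to, in response to receiving a web page link of the target web page, obtain the web page content of the target web page according to the web page link; and
an extraction submodule, configured to extract web page elements from the web page content.
The present disclosure further provides a computer-readable medium storing a computer program that, when executed by a processing apparatus, implements the steps of the above video generation method provided by the present disclosure.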
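Referring now to FIG. 5, a schematic structural diagram is shown of an electronic device 600 (for example, a terminal device or a server) suitable for implementing embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to:
- mobile terminals, such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers, Portable Android Devices), PMPs (Portable Media Players), and in-vehicle terminals (for example, in-vehicle navigation terminals);
- fixed terminals, such as digital TVs (televisions) and desktop computers.
The electronic device shown in FIG. 5 is only an example and does not limit the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 600 may include a processing apparatus 601, such as a central processing unit or a graphics processing unit. The processing apparatus 601 may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores the programs and data that the electronic device 600 needs to operate. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to one another by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605:
- an input apparatus 606, for example a touchscreen, touchpad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope;
- an output apparatus 607, for example a liquid crystal display (LCD), speaker, or vibrator;
- a storage apparatus 608, for example a magnetic tape or hard disk;
- a communication apparatus 609.
The communication apparatus 609 may allow the electronic device 600 to exchange data with other devices over wireless or wired connections. FIG. 5 shows the electronic device 600 with various apparatuses, but not all of them must be implemented or present; more or fewer apparatuses may be implemented or present instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product that includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the processing apparatus 601 executes the computer program, the above functions defined in the methods of the embodiments of the present disclosure are performed.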
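The computer-readable medium described above may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.

A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples include, but are not limited to:
- an electrical connection having one or more wires;
- a portable computer diskette;
- a hard disk;
- a random access memory (RAM);
- a read-only memory (ROM);
- an erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber;
- a portable compact disc read-only memory (CD-ROM);
- an optical storage device;
- a magnetic storage device;
- any suitable combination of the above.

In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of these. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to electrical wire, optical cable, RF (radio frequency), or any suitable combination of these.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol). They may be interconnected by digital data communication in any form or medium, for example a communication network. Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (for example, the Internet), a peer-to-peer network (for example, an ad hoc peer-to-peer network), and any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into that device.
The above computer-readable medium carries one or more programs. When the electronic device executes these programs, the programs cause it to:
- obtain web page elements of a target web page;
- extract visual features and semantic features of the web page elements;
- determine, according to the web page elements, the visual features, and the semantic features, video material matching the target web page; and
- generate a target video according to the video material and the web page elements.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination of them. These include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar languages. The program code may run:
- entirely on the user's computer;
- partly on the user's computer;
- as a standalone software package;
- partly on the user's computer and partly on a remote computer;
- entirely on a remote computer or server.

Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computer, for example through the Internet using an Internet service provider.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the drawings. For example, two blocks shown in succession may in fact run substantially in parallel, or sometimes in reverse order, depending on the functions involved. Each block in the block diagrams and/or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, limit the module itself. For example, the obtaining module may also be described as "a module for obtaining web page elements of a target web page".
The functions described herein may be performed at least in part by one or more hardware logic components. Exemplary types of hardware logic components that may be used include, without limitation, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), and complex programmable logic devices (CPLDs).
In the context of the present disclosure, a machine-readable medium may be a tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device. It may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of these. More specific examples of a machine-readable storage medium include:
- an electrical connection based on one or more wires;
- a portable computer disk;
- a hard disk;
- a random access memory (RAM);
- a read-only memory (ROM);
- an erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber;
- a portable compact disc read-only memory (CD-ROM);
- an optical storage device;
- a magnetic storage device;
- any suitable combination of the above.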
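According to one or more embodiments of the present disclosure, Example 1 provides a video generation method, including: obtaining web page elements of a target web page; extracting visual features and semantic features of the web page elements; determining, according to the web page elements, the visual features, and the semantic features, video material matching the target web page; and generating a target video according to the video material and the web page elements.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, where the semantic features include at least one of: the web page type of the target web page, the categories of the web page elements, latent attribute features of the web page elements, the spatial layout of the target web page, and the positional relationships between different web page elements.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 2, where, when the target web page includes a video, the semantic features further include highlight segments of the video.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 2, where the semantic features include the web page type of the target web page. The web page type is determined by: extracting, based on natural language processing and visual understanding, latent attribute features of the web page elements and determining the categories of the web page elements; and determining the web page type of the target web page according to the categories and the latent attribute features.
According to one or more embodiments of the present disclosure, Example 5 provides the method of any one of Examples 1 to 4, where generating a target video according to the video material and the web page elements includes: obtaining a target video template; and generating a target video according to the video material, the web page elements, and the target video template.
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 5, where obtaining a target video template includes: determining, according to the web page elements, the visual features, and the semantic features, a target video template matching the target web page.
According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 5, where generating a target video according to the video material, the web page elements, and the target video template includes:
- generating multiple candidate videos according to the video material, the web page elements, and the target video template;
- obtaining video description information of each candidate video;
- for each candidate video, predicting its aesthetic quality and/or delivery effect according to the candidate video and its video description information, to obtain a prediction result of the candidate video; and
- determining, according to the prediction result of each candidate video, N candidate videos from the multiple candidate videos as target videos, where N ≥ 1.
According to one or more embodiments of the present disclosure, Example 8 provides the method of any one of Examples 1 to 4, where obtaining web page elements of a target web page includes: in response to receiving a web page link of the target web page, obtaining the web page content of the target web page according to the web page link; and extracting web page elements from the web page content.
According to one or more embodiments of the present disclosure, Example 9 provides a video generation apparatus, including:
- an obtaining module, configured to obtain web page elements of a target web page;
- an extraction module, configured to extract visual features and semantic features of the web page elements obtained by the obtaining module;
- a determination module, configured to determine, according to the web page elements, the visual features, and the semantic features extracted by the extraction module, video material matching the target web page; and
- a generation module, configured to generate a target video according to the video material determined by the determination module and the web page elements extracted by the extraction module.
According to one or more embodiments of the present disclosure, Example 10 provides a computer-readable medium storing a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1 to 8.
According to one or more embodiments of the present disclosure, Example 11 provides an electronic device, including: a storage apparatus storing a computer program; and a processing apparatus configured to execute the computer program in the storage apparatus to implement the steps of the method of any one of Examples 1 to 8.
According to one or more embodiments of the present disclosure, Example 12 provides a computer program product, including a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1 to 8.
According to one or more embodiments of the present disclosure, Example 13 provides a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1 to 8.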
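The above description covers only preferred embodiments of the present disclosure and the technical principles they apply. Those skilled in the art should understand that the scope of this disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of those features, or of their equivalents, without departing from the disclosed concept. One example is a technical solution formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the present disclosure.
Although the operations are depicted in a particular order, this should not be understood as requiring that they be performed in that order or in sequence. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, the several specific implementation details included in the above discussion should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented together in a single embodiment. Conversely, features described in the context of a single embodiment may also be implemented in multiple embodiments, either separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, those features and acts are merely example forms of implementing the claims. For the apparatus in the above embodiments, the specific way each module performs its operations has already been described in detail in the method embodiments and is not repeated here.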

Claims (13)

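  1. A video generation method, comprising:
    obtaining web page elements of a target web page;
    extracting visual features and semantic features of the web page elements;
    determining, according to the web page elements, the visual features, and the semantic features, video material matching the target web page; and
    generating a target video according to the video material and the web page elements.
  2. The method according to claim 1, wherein the semantic features comprise at least one of: a web page type of the target web page, categories of the web page elements, latent attribute features of the web page elements, a spatial layout of the target web page, and positional relationships between different elements among the web page elements.
  3. The method according to claim 2, wherein, in a case that the target web page comprises a video, the semantic features further comprise a highlight segment of the video.
  4. The method according to claim 2 or 3, wherein the semantic features comprise the web page type of the target web page; and
    the web page type is determined by:
    extracting, based on natural language processing and visual understanding, the latent attribute features of the web page elements, and determining the categories of the web page elements; and
    determining the web page type of the target web page according to the categories and the latent attribute features.
  5. The method according to any one of claims 1 to 4, wherein generating a target video according to the video material and the web page elements comprises:
    obtaining a target video template; and
    generating the target video according to the video material, the web page elements, and the target video template.
  6. The method according to claim 5, wherein obtaining a target video template comprises:
    determining, according to the web page elements, the visual features, and the semantic features, a target video template matching the target web page.
  7. The method according to claim 5 or 6, wherein generating the target video according to the video material, the web page elements, and the target video template comprises:
    generating a plurality of candidate videos according to the video material, the web page elements, and the target video template;
    obtaining video description information of each of the candidate videos;
    for each of the candidate videos, predicting aesthetic quality and/or delivery effect of the candidate video according to the candidate video and the video description information of the candidate video, to obtain a prediction result of the candidate video; and
    determining, according to the prediction result of each of the candidate videos, N candidate videos from the plurality of candidate videos as target videos, wherein N ≥ 1.
  8. The method according to any one of claims 1 to 7, wherein obtaining web page elements of a target web page comprises:
    in response to receiving a web page link of the target web page, obtaining web page content of the target web page according to the web page link; and
    extracting the web page elements from the web page content.
  9. A video generation apparatus, comprising:
    an obtaining module, configured to obtain web page elements of a target web page;
    an extraction module, configured to extract visual features and semantic features of the web page elements obtained by the obtaining module;
    a determination module, configured to determine, according to the web page elements, the visual features, and the semantic features extracted by the extraction module, video material matching the target web page; and
    a generation module, configured to generate a target video according to the video material determined by the determination module and the web page elements extracted by the extraction module.
  10. A computer-readable medium storing a computer program, wherein the computer program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1 to 8.
  11. An electronic device, comprising:
    a storage apparatus storing a computer program; and
    a processing apparatus configured to execute the computer program in the storage apparatus to implement the steps of the method according to any one of claims 1 to 8.
  12. A computer program product, comprising a computer program, wherein the computer program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1 to 8.
  13. A computer program, wherein the computer program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1 to 8.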
PCT/CN2023/085775 2022-04-22 2023-03-31 Video generation method and apparatus, medium, and electronic device WO2023202361A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210432243.7A CN114786069A (zh) 2022-04-22 2022-04-22 Video generation method and apparatus, medium, and electronic device
CN202210432243.7 2022-04-22

Publications (1)

Publication Number Publication Date
WO2023202361A1 true WO2023202361A1 (zh) 2023-10-26

Family

ID=82433466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/085775 WO2023202361A1 (zh) 2022-04-22 2023-03-31 Video generation method and apparatus, medium, and electronic device

Country Status (2)

Country Link
CN (1) CN114786069A (zh)
WO (1) WO2023202361A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786069A (zh) * 2022-04-22 2022-07-22 北京有竹居网络技术有限公司 Video generation method and apparatus, medium, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
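CN104731960A (zh) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Method, apparatus, and system for generating a video summary based on e-commerce web page content
CN108965737A (zh) * 2017-05-22 2018-12-07 腾讯科技(深圳)有限公司 Media data processing method and apparatus, and storage medium
CN110309351A (zh) * 2018-02-14 2019-10-08 阿里巴巴集团控股有限公司 Video image generation for data objects, apparatus, and computer system
CN114786069A (zh) * 2022-04-22 2022-07-22 北京有竹居网络技术有限公司 Video generation method and apparatus, medium, and electronic device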

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10694222B2 (en) * 2016-01-07 2020-06-23 Microsoft Technology Licensing, Llc Generating video content items using object assets
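WO2022061806A1 (zh) * 2020-09-27 2022-03-31 深圳市大疆创新科技有限公司 Film generation method, terminal device, photographing device, and film generation system
CN112287168A (zh) * 2020-10-30 2021-01-29 北京有竹居网络技术有限公司 Method and apparatus for generating video
CN113033680A (zh) * 2021-03-31 2021-06-25 北京有竹居网络技术有限公司 Video classification method and apparatus, readable medium, and electronic device
CN113596579B (zh) * 2021-07-29 2023-04-07 北京字节跳动网络技术有限公司 Video generation method and apparatus, medium, and electronic device
CN114363701A (zh) * 2021-12-29 2022-04-15 四川启睿克科技有限公司 Method for converting a web page into a short video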

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
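CN104731960A (zh) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Method, apparatus, and system for generating a video summary based on e-commerce web page content
CN108965737A (zh) * 2017-05-22 2018-12-07 腾讯科技(深圳)有限公司 Media data processing method and apparatus, and storage medium
CN110309351A (zh) * 2018-02-14 2019-10-08 阿里巴巴集团控股有限公司 Video image generation for data objects, apparatus, and computer system
CN114786069A (zh) * 2022-04-22 2022-07-22 北京有竹居网络技术有限公司 Video generation method and apparatus, medium, and electronic device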

Also Published As

Publication number Publication date
CN114786069A (zh) 2022-07-22

Similar Documents

Publication Publication Date Title
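CN109168026B (zh) Instant video display method and apparatus, terminal device, and storage medium
CN110046021B (zh) Page display method, apparatus, system, device, and storage medium
CN109460233B (zh) Method and apparatus for displaying and updating the native interface of a page, terminal device, and medium
WO2021196903A1 (zh) Video processing method and apparatus, readable medium, and electronic device
WO2021179882A1 (zh) Image drawing method and apparatus, readable medium, and electronic device
CN111277892B (zh) Method, apparatus, server, and medium for selecting video segments
WO2020233166A1 (zh) Comment data providing and displaying methods and apparatuses, electronic device, and storage medium
WO2021135626A1 (zh) Menu item selection method and apparatus, readable medium, and electronic device
WO2021223752A1 (zh) Display method and apparatus, and electronic device
CN110516159B (zh) Information recommendation method and apparatus, electronic device, and storage medium
CN110070593B (zh) Method, apparatus, device, and medium for displaying picture preview information
WO2020199749A1 (zh) Feedback-based information push method and apparatus, and electronic device
US11785195B2 (en) Method and apparatus for processing three-dimensional video, readable storage medium and electronic device
WO2020220776A1 (zh) Method, apparatus, device, and medium for displaying picture-type comment data
WO2021218981A1 (zh) Method, apparatus, device, and medium for generating interaction records
US11818491B2 (en) Image special effect configuration method, image recognition method, apparatus and electronic device
WO2023202361A1 (zh) Video generation method and apparatus, medium, and electronic device
CN111723309B (zh) Method and apparatus for web page search
WO2024104336A1 (zh) Information collection method and apparatus, storage medium, and electronic device
CN112492399B (zh) Information display method and apparatus, and electronic device
CN116596748A (zh) Image stylization processing method, apparatus, device, storage medium, and program product
WO2022262824A1 (zh) Interaction method and apparatus, electronic device, and computer-readable storage medium
CN114520928B (zh) Display information generation method, information display method, apparatus, and electronic device
WO2022245280A1 (zh) Feature construction method, content display method, and related apparatus
CN111641692B (zh) Session data processing method and apparatus, and electronic device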

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23791017

Country of ref document: EP

Kind code of ref document: A1