CN112565875B - Method, device, equipment and computer readable storage medium for automatically generating video - Google Patents


Info

Publication number: CN112565875B
Application number: CN202011383389.4A
Authority: CN (China)
Other versions: CN112565875A (Chinese)
Inventors: 卞东海, 彭卫华, 罗雨, 蒋帅
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Filing: application CN202011383389.4A filed by Beijing Baidu Netcom Science and Technology Co Ltd; publication of CN112565875A; application granted; publication of CN112565875B
Legal status: Active

Classifications

    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/4355: Processing of additional data involving reformatting operations, e.g. HTML pages on a television screen
    • H04N21/4398: Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N21/4884: Data services, e.g. news ticker, for displaying subtitles
    • G06F16/367: Creation of semantic tools; Ontology
    • G06F40/279: Natural language analysis; Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking


Abstract

According to example embodiments of the present disclosure, a method, apparatus, device, computer-readable storage medium, and computer program product for automatically generating a video are provided, relating to the fields of knowledge graphs, deep learning, and video creation. The method comprises: receiving user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; determining at least one node from a pre-built knowledge graph based on the key phrase; obtaining, based on the first multimedia content, second multimedia content associated with the at least one node; and generating the video based on the first multimedia content and the second multimedia content. Videos can thereby be generated automatically and efficiently.

Figure 202011383389

Description

Method, apparatus, device, and computer-readable storage medium for automatically generating a video

Technical Field

Embodiments of the present disclosure relate generally to the field of information processing and, more specifically, to a method, apparatus, device, computer-readable storage medium, and computer program product for automatically generating a video.

Background

With the development of mobile data networks, video is gradually overtaking text as a share of the data on the Internet. In the technical field of video production, innovative applications based on artificial intelligence are still lacking, and no scheme for automatically producing videos exists. The traditional video production process has the following drawbacks: it places high demands on users, since producing a qualified video requires mastery of many complex software tools, and the materials used to make videos are difficult to collect. A solution that automatically generates high-quality videos is therefore needed.

Summary

According to example embodiments of the present disclosure, a solution for automatically generating a video is provided.

In a first aspect of the present disclosure, a method for automatically generating a video is provided, comprising: receiving user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; determining at least one node from a pre-built knowledge graph based on the key phrase; obtaining, based on the first multimedia content, second multimedia content associated with the at least one node; and generating the video based on the first multimedia content and the second multimedia content.

In a second aspect of the present disclosure, an apparatus for automatically generating a video is provided, comprising: an input receiving module configured to receive user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; a first node determination module configured to determine at least one node from a pre-built knowledge graph based on the key phrase; a first multimedia content acquisition module configured to obtain, based on the first multimedia content, second multimedia content associated with the at least one node; and a first video generation module configured to generate the video based on the first multimedia content and the second multimedia content.

In a third aspect of the present disclosure, an electronic device is provided, comprising one or more processors, and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.

In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored that, when executed by a processor, implements the method according to the first aspect of the present disclosure.

In a fifth aspect of the present disclosure, a computer program product is provided, comprising computer program instructions which, when executed by a processor, implement the method according to the first aspect of the present disclosure.

It should be understood that the content described in this Summary is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

The above and other features, advantages, and aspects of the various embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, identical or similar reference numerals denote identical or similar elements, wherein:

Fig. 1 shows a schematic diagram of an example environment in which various embodiments of the present disclosure can be implemented;

Fig. 2 shows a flowchart of an example of a process of automatically generating a video according to some embodiments of the present disclosure;

Fig. 3 shows a flowchart of another example of a process of automatically generating a video according to some embodiments of the present disclosure;

Fig. 4 shows a schematic block diagram of an apparatus for automatically generating a video according to an embodiment of the present disclosure; and

Fig. 5 shows a block diagram of a computing device capable of implementing various embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.

In the description of the embodiments of the present disclosure, the term "comprising" and similar expressions should be interpreted as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The terms "one embodiment" or "the embodiment" should be read as "at least one embodiment". The terms "first", "second", and so on may refer to different or identical objects. Other explicit and implicit definitions may also be included below.

As mentioned above, traditional approaches to video creation have the following drawbacks: (1) during video production, in addition to the content shot by the user, a wide variety of video materials are often needed to achieve the desired expressive effect, but such materials present an insurmountable professional barrier for users, being difficult to obtain, limited in variety, and expensive; (2) users must spend a great deal of time compositing materials, for example determining transition effects between materials of different formats or determining the position of text in the video.

Example embodiments of the present disclosure propose a solution for automatically generating a video. In this solution, first multimedia content input by the user and a description of the video to be generated are first received. Second multimedia content is then obtained according to the multimedia content and the description. Finally, the first multimedia content input by the user and the obtained second multimedia content are combined to generate the video. In this way, high-quality multimedia content for generating the video can be obtained automatically, so that the video is generated efficiently.

Fig. 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. As shown, the example environment 100 includes a computing device 110, a database 120, a video 130, and a user 140. The computing device 110 may be connected to the database 120. The computing device 110 may also receive input from the user 140. The database 120 may be any suitable centralized or distributed database, including but not limited to databases based on knowledge graph technology and retrieval-based databases.

In one embodiment, to generate the video 130, the computing device 110 may obtain multimedia content for the video 130, that is, material, from a knowledge graph stored in the database. A knowledge graph is essentially a semantic network describing knowledge that exists objectively in the real world and the relationships between pieces of knowledge. Based on their application fields, knowledge graphs are usually divided into general knowledge graphs and vertical knowledge graphs (also known as industry knowledge graphs). A general knowledge graph is not oriented to a specific field and can be likened to structured encyclopedic knowledge; it contains a large amount of common-sense knowledge and emphasizes breadth. A vertical knowledge graph is oriented to a specific field, is built on industry knowledge, and emphasizes depth. The knowledge graph herein may be a knowledge graph dedicated to video, in which each node stores multimodal data associated with that node; it may also be a general knowledge graph, which is not limited in the present disclosure.

In an alternative embodiment, some data in the knowledge graph in the database 120 is incomplete. Taking an engine as an example, when the knowledge graph is initially constructed, the concept of an engine may include attributes such as fuel consumption, color, displacement, brand, and model. These attributes are common-sense knowledge, well known to the public, so when the knowledge graph is initially constructed these attribute concepts can be added to the graph, with one concept per node, preliminarily building the overall framework of the knowledge graph. However, knowledge such as the specific fuel consumption, which colors are available, how many liters of displacement there are, and which brands and models exist is ever-changing and does not belong to common-sense knowledge, so it cannot be given in advance; such data still needs to be collected. In addition, the data already in the knowledge graph may be only unimodal, for example only in text format, which cannot be used for video creation.

Given such an incomplete knowledge graph, the computing device 110 may obtain network-wide data from outside the database via a network to rebuild the knowledge graph. The network may be any suitable network, including but not limited to the Internet, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), wired networks such as fiber-optic networks and coaxial cable, and wireless networks such as Wi-Fi, cellular telecommunication networks, and Bluetooth.

The computing device 110 may be any suitable centralized or distributed computing device, including but not limited to a personal computer, server, client, handheld or laptop device, multiprocessor, microprocessor, set-top box, programmable consumer electronics, network PC, minicomputer, mainframe computer system, or distributed cloud, as well as combinations thereof.

The computing device 110 may also use the multimedia content obtained above to synthesize the video 130; the detailed process of generating the video is described below.

Fig. 2 shows a flowchart of an example of a process 200 of automatically generating a video according to some embodiments of the present disclosure. The process 200 may be implemented by the computing device 110.

At 210, the computing device 110 receives user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats.

The computing device 110 may receive user input from the user 140, which includes first multimedia content provided by the user for video creation; the computing device 110 uses this first multimedia content as base data to generate the video. The first multimedia content may be data in one of the formats of text, image, video, or audio, or it may be multimodal data. The first multimedia content may be of a common structured type such as Excel, JSON, or CSV.
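Ingesting the structured input types mentioned above into one uniform shape might look like the following minimal Python sketch; the helper name and the row-dict representation are illustrative assumptions, and Excel handling is omitted since it would need a third-party reader:

```python
import csv
import io
import json

def load_structured_input(raw: str, fmt: str) -> list[dict]:
    """Normalize a structured user upload (JSON or CSV here) into a
    list of row dicts; the function name and shape are illustrative."""
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {fmt}")

# e.g. user A's February temperature sheet, exported as CSV
rows = load_structured_input("date,temp\n2020-02-01,3\n2020-02-02,5", "csv")
```

Normalizing every input to the same row shape lets the later steps (cleaning, charting, subtitling) ignore where the data came from.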

The user input also includes a key phrase for describing the video, that is, a short description by the user 140 of what the video to be created should reflect; it may be the topic of the video. For example, the first multimedia content input by user A is Excel text data of weather temperatures in February, and the key phrase input by user A is "February temperature trend chart"; or the first multimedia content input by user B is a picture of user B, and the key phrase input by user B is "User B takes you on a tour of world-famous sights".

In one embodiment, the computing device 110 may clean the text data in the first multimedia content and perform word segmentation on the text data to extract entity-attribute information. The cleaned text data is then used to generate subtitles, charts, titles, and other information for the video.
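The cleaning and segmentation step is not specified in detail; a minimal regex-only stand-in could look like the following (a production system would use a proper Chinese word segmenter rather than this naive tokenizer):

```python
import re

def clean_text(text: str) -> str:
    """Strip markup remnants and collapse whitespace; a minimal
    stand-in for the unspecified cleaning step."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop stray HTML tags
    return re.sub(r"\s+", " ", text).strip()  # normalize whitespace

def segment_words(text: str) -> list[str]:
    """Naive whitespace/punctuation tokenizer, for illustration only."""
    return re.findall(r"\w+", text.lower())

cleaned = clean_text("<b>February</b>   temperature  trend")
tokens = segment_words(cleaned)
```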

In one embodiment, the computing device 110 may summarize the video data in the first multimedia content to obtain the important material for the final composite video. The computing device 110 may also segment the video data into scenes and then apply a specific algorithm to each scene to understand the video, obtaining the topic, category, people involved, and other information for each scene, and then select the most suitable scenes as material according to the key phrase input by the user.
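The video-understanding algorithm itself is unspecified; as a crude illustration of the selection step, scenes already tagged with topic words could be ranked against the tokens of the user's key phrase by simple overlap:

```python
def score_scene(scene_tags: set[str], key_tokens: set[str]) -> float:
    """Fraction of the key phrase's tokens covered by a scene's tags;
    a toy stand-in for the unspecified understanding algorithm."""
    if not key_tokens:
        return 0.0
    return len(scene_tags & key_tokens) / len(key_tokens)

# Hypothetical per-scene tags produced by an upstream analysis step
scenes = {
    "intro":   {"city", "street"},
    "weather": {"temperature", "february", "chart"},
}
key = {"february", "temperature"}
best = max(scenes, key=lambda name: score_scene(scenes[name], key))
```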

In one embodiment, the computing device 110 may analyze the image data in the first multimedia content and filter out images smaller than 300x300 pixels or containing advertisements, pornographic content, political content, or the like; if too few images remain after filtering, the multimodal knowledge graph is queried to supplement them intelligently.
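The size rule above can be sketched directly; the content checks (advertisements and the like) are assumed to be done by an upstream classifier and are represented here only by a boolean flag:

```python
from dataclasses import dataclass

@dataclass
class ImageMeta:
    name: str
    width: int
    height: int
    flagged: bool = False  # set upstream for ads/disallowed content

MIN_SIDE = 300  # the 300x300 minimum from the description

def filter_images(images: list[ImageMeta]) -> list[ImageMeta]:
    """Keep images that meet the minimum size and are not flagged."""
    return [im for im in images
            if im.width >= MIN_SIDE and im.height >= MIN_SIDE and not im.flagged]

kept = filter_images([
    ImageMeta("a.jpg", 640, 480),
    ImageMeta("b.jpg", 120, 90),         # too small: dropped
    ImageMeta("c.jpg", 800, 600, True),  # flagged: dropped
])
```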

In one embodiment, the computing device 110 may convert audio data in the first multimedia content into text data and/or convert text data into audio data, to be used as subtitle or audio information in the final video generation.

The processing of the different types of data described above will be elaborated further below.

At 220, the computing device 110 determines at least one node from the pre-built knowledge graph based on the key phrase. The pre-built multimodal knowledge graph may contain multiple nodes; each node may correspond to several similar key phrases and stores associated multimodal data. The pre-built multimodal knowledge graph may be stored in the database 120 or may be constructed by the computing device 110, which is not limited in the present disclosure.
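As a sketch of the node structure described here, several near-synonymous key phrases plus per-format material, one might model a node as follows; the field names are illustrative, not from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class GraphNode:
    """One node of the multimodal knowledge graph: a cluster of
    near-synonymous key phrases and asset ids grouped by data format."""
    key_phrases: list[str]
    assets: dict[str, list[str]] = field(default_factory=dict)

node_a = GraphNode(
    key_phrases=["trend", "tendency", "situation"],
    assets={"image": ["curve_template.png"], "audio": ["thunder.wav"]},
)
```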

In one embodiment, the computing device 110 determines a matching degree between the key phrase and a target key phrase corresponding to a target node in the knowledge graph, and, if the matching degree is greater than a first predetermined threshold, determines the target node as the at least one node. For example, continuing with users A and B above: the computing device 110 determines that the matching degree between target key phrases such as "trend", "tendency", and "situation" in the knowledge graph and the key phrase "trend chart" input by user A is greater than a first predetermined threshold of 0.8 (with a maximum of 1), and therefore determines node A, corresponding to those target key phrases, as a target node; node A', corresponding to the target key phrases "weather" and "temperature", may likewise be determined as a target node via "temperature". The matching degree between key phrases can be obtained by computing the Euclidean distance between the phrases' vector representations, which is not described further here. Similarly, the computing device 110 may determine the node corresponding to the target key phrase "monuments" as a target node, where the matching degree between the target key phrase "monuments" and the key phrase "famous sights" input by user B is greater than the first predetermined threshold of 0.8. The value 0.8 of the first predetermined threshold is illustrative and is not limited by the present disclosure.
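The matching step can be illustrated with toy phrase vectors; mapping Euclidean distance to a score in (0, 1] is an assumption, since the text only states that the distance between phrase vectors underlies the matching degree:

```python
import math

# Toy 3-d embeddings; a real system would use learned phrase vectors.
EMBED = {
    "trend chart": [0.90, 0.10, 0.0],
    "trend":       [0.92, 0.08, 0.0],
    "weather":     [0.10, 0.90, 0.1],
}

FIRST_THRESHOLD = 0.8  # illustrative value from the description

def match_degree(a: str, b: str) -> float:
    """Score in (0, 1]: identical vectors give 1.0, larger Euclidean
    distance gives a smaller score. The exact mapping is an assumption."""
    return 1.0 / (1.0 + math.dist(EMBED[a], EMBED[b]))

def is_target(query: str, target: str) -> bool:
    return match_degree(query, target) > FIRST_THRESHOLD

matched = is_target("trend chart", "trend")  # close vectors: a match
```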

At 230, the computing device 110 obtains, based on the first multimedia content, second multimedia content associated with the at least one node. For example, after determining at least one node at 220 above, the computing device 110 may determine, through that node, associated second multimedia content in the knowledge graph as supplementary material for the first multimedia content.

In one embodiment, the computing device 110 determines at least one data format not included in the first multimedia content, and obtains data in that at least one data format associated with the at least one node as the second multimedia content. For example, continuing with users A and B: the computing device 110 determines that the weather data in the first multimedia content input by user A is Excel text data, and therefore that the first multimedia content does not include image, video, or audio data. The computing device 110 may then determine, from the knowledge graph in the database 120, data associated with nodes A and A' in image, video, and audio formats, including but not limited to weather icons, the sound of thunder, a snowy background animation, and an animation template for a trend curve.
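Determining the data formats absent from the first multimedia content reduces to a set difference over the predetermined formats; the format names here are illustrative:

```python
PREDETERMINED_FORMATS = {"text", "image", "video", "audio"}

def missing_formats(present: set[str]) -> set[str]:
    """Formats the user's first multimedia content lacks and that
    should be fetched from the graph as second multimedia content."""
    return PREDETERMINED_FORMATS - present

gap = missing_formats({"text"})  # e.g. user A supplied only Excel text
```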

For user B, the computing device 110 determines that the first multimedia content is a picture of user B, and may then determine, from the knowledge graph in the database 120, data associated with node B in text, video, and audio formats, including but not limited to videos, text introductions, and animations of world-famous sights.

In another embodiment, the computing device 110 determines the data volume of the multimedia content in at least one data format in the first multimedia content, and, if the data volume is less than a second predetermined threshold, obtains data in that at least one data format associated with the at least one node as the second multimedia content. For example, continuing with users A and B: the computing device 110 determines that the volume of the Excel text data in the first multimedia content input by user A is less than a second predetermined threshold of 8 KB, for example because it records weather information for only 10 days in February, and therefore obtains data in at least one data format associated with node A' as the second multimedia content, for example text information on the weather for the whole of February.

For user B, the computing device 110 determines that the data amount of the first multimedia content input by the user 140 is less than a second predetermined threshold of, for example, 20 MB, and then obtains data associated with the node B in at least one data format as the second multimedia content, such as videos, text introductions, and animations of world-famous attractions.
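The two supplementation conditions above (a format missing entirely, or a format present but below a size threshold) can be sketched as follows. This is an illustrative assumption, not the patented implementation; the format names, the 8 KB threshold, and the dictionary shape are invented for the example.

```python
# Hypothetical sketch of the supplementation decision described above.
PREDETERMINED_FORMATS = {"text", "picture", "video", "sound"}

def formats_to_supplement(first_content, size_threshold_bytes):
    """Return the formats that should be fetched from the knowledge graph.

    `first_content` maps a data format to the byte size of the user's
    material in that format, e.g. {"text": 4096}.
    """
    missing = PREDETERMINED_FORMATS - set(first_content)       # formats absent entirely
    too_small = {fmt for fmt, size in first_content.items()    # present but below threshold
                 if size < size_threshold_bytes}
    return missing | too_small

# User A supplied only a 4 KB Excel sheet; with an 8 KB threshold, every
# format (including the thin text data) needs supplementation.
print(sorted(formats_to_supplement({"text": 4096}, 8192)))
```

The returned set would then drive which node-associated data (icons, sounds, animations) is pulled from the graph.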

Note that the values of the first and second predetermined thresholds above are merely exemplary; different thresholds may be set according to user input and the computing device 110.

By automatically analyzing the basic material provided by the user and the user's requirements for the video, and matching the analysis results against the multimodal knowledge graph, the system automatically obtains data that the basic material lacks in volume, or high-quality data in formats the basic material does not include. This overcomes the difficulty of obtaining material in traditional solutions.

In an alternative embodiment, the computing device 110 may use preset modules such as a video retrieval module, a similar-picture retrieval module, and a topic retrieval module to look up relevant fields in the multimodal graph and supplement missing information in the given data.

At 240, the computing device 110 generates the video based on the first multimedia content and the second multimedia content. For example, the computing device 110 may further process the multimedia content obtained above, in its various formats, to generate the video.

In one embodiment, the computing device 110 performs semantic analysis on the text content in the first multimedia content and the second multimedia content to generate text elements. The computing device 110 then determines at least one of the position of a text element in the video, the font size within the text element, the display effect of the text element, and the display time of the text element, in order to generate the video.

For example, the computing device 110 performs semantic analysis on the text content in the multimedia content, associates it with picture information, and further determines its position within the associated picture, its display duration, dynamic changes in the size and position of the text, dynamic text effects, and so on. Through these operations, text content can be associated with image frames so that the text information clearly describes each image frame.
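A minimal sketch of such a text element, with an assumed lower-third placement and a simple length-based font-size rule (both invented for illustration, not taken from the disclosure):

```python
# Illustrative text-element layout: one caption per image frame, each with a
# position, font size, and display window. All layout values are assumptions.
from dataclasses import dataclass

@dataclass
class TextElement:
    text: str
    frame_index: int      # image frame the text describes
    position: tuple       # (x, y) in pixels, assumed top-left origin
    font_size: int
    start_s: float        # display window in seconds
    end_s: float

def layout_captions(captions, seconds_per_frame=3.0):
    """Give each caption one frame and an even time slice."""
    elements = []
    for i, text in enumerate(captions):
        elements.append(TextElement(
            text=text,
            frame_index=i,
            position=(40, 600),                      # assumed lower-third placement
            font_size=36 if len(text) < 20 else 28,  # shrink long captions
            start_s=i * seconds_per_frame,
            end_s=(i + 1) * seconds_per_frame,
        ))
    return elements

elems = layout_captions(["Sunny, 12 C", "Heavy snow expected through Friday"])
print(elems[1].start_s, elems[1].font_size)
```

A real system would derive the display window from the semantic analysis rather than slicing time evenly.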

In another embodiment, the computing device 110 obtains the video content in the first multimedia content and the second multimedia content. The computing device 110 then determines a plurality of image frames in the video content that are associated with the key phrase. The computing device 110 next determines the order of those image frames in the video and the transition effects between them, and finally generates the video in that order using the transition effects. For example, the computing device 110 may determine key image frames in the video content according to the key phrase input by the user 140, taking as the associated image frames those whose degree of match with the key phrase exceeds a threshold. Such frames tend to best reflect the final effect the user expects the video to have. The computing device 110 can then determine three types of transition from the associated frames: frame-to-frame, frame-to-video, and video-to-frame. Taking user B as an example, the computing device 110 can determine the order in which pictures and videos appear in the final composed video according to the different styles of the world-famous attractions, and add transition effects between the picture and video data so that the changes appear more natural.
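The frame-selection and ordering step above can be sketched as follows. The match scores, the threshold, and the choice of an ordering by score are all assumptions made for the example; the disclosure does not fix a particular metric or ordering rule.

```python
# Minimal sketch: keep frames whose match score against the key phrase
# exceeds a threshold, order them, and pair consecutive frames with a
# transition effect. Frame ids and scores are invented.
def select_and_order(frame_scores, threshold=0.6):
    """frame_scores: {frame_id: match score in [0, 1]}."""
    kept = [(fid, s) for fid, s in frame_scores.items() if s > threshold]
    kept.sort(key=lambda p: p[1], reverse=True)   # best-matching frame first
    ordered = [fid for fid, _ in kept]
    # one transition between each consecutive pair of frames
    transitions = [("crossfade", a, b) for a, b in zip(ordered, ordered[1:])]
    return ordered, transitions

order, fx = select_and_order({"f1": 0.9, "f2": 0.3, "f3": 0.7})
print(order)   # f2 falls below the threshold and is dropped
```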

By analyzing the properties of the text, pictures, and videos in the material, the system can automatically choose a preferred order in which they appear in the video according to those properties, and automatically set transition effects, making the video smoother and more natural overall.

In this way, high-quality multimedia content for generating a video can be obtained automatically, and a high-quality video can be generated efficiently according to the relationships between the pieces of multimedia content. This solves the key problems that producing a video demands technical skill from users and that material is difficult to obtain.

FIG. 3 shows a flowchart of another example of a process for automatically generating a video according to some embodiments of the present disclosure. The process 300 may be implemented by the computing device 110. The computing device 110 may implement the steps of the present disclosure on top of an FFMPEG-based underlying framework, that is, a process that combines text, pictures, video, sound, and so on into a single video through a series of operations. FFMPEG is a relatively low-level video processing tool on Linux-like systems whose main functions include video encoding and decoding. Other frameworks may of course be used; the present disclosure is not intended to be limiting in this respect.

At 310, the computing device 110 receives from the user 140 the first multimedia content and the key phrase describing the video.

At 320, the computing device 110 processes the first multimedia content input by the user and, according to the user input, obtains second multimedia content from the knowledge graph 360 to supplement the first multimedia content.

At 330, the computing device 110 performs basic processing on the first and second multimedia content. This basic processing includes but is not limited to: sound-related operations, including multi-source sound synthesis, sound transformation, and sound clipping; custom mask operations, which mainly serve the subsequent application processing 340 and can be used to produce various animation effects; and video-related functions, specifically dynamic-effect operations on the size and position of the video, its color, frame count, and duration.
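As one concrete example of the sound operations, multi-source synthesis and clipping map naturally onto standard ffmpeg command-line options (`-i`, `-filter_complex` with the `amix` filter, `-t`). The sketch below only assembles such a command; the file names and duration are invented, and this is an assumed wrapping, not the disclosed framework.

```python
# Build (but do not run) an ffmpeg command that mixes several audio
# sources into one track and optionally clips the result.
def build_mix_command(inputs, output, duration_s=None):
    """Return an ffmpeg argv list mixing `inputs` into `output`."""
    cmd = ["ffmpeg", "-y"]                 # -y: overwrite output if present
    for path in inputs:
        cmd += ["-i", path]
    cmd += ["-filter_complex", f"amix=inputs={len(inputs)}"]
    if duration_s is not None:             # optional clipping of the result
        cmd += ["-t", str(duration_s)]
    cmd.append(output)
    return cmd

cmd = build_mix_command(["narration.wav", "thunder.wav"], "mixed.wav", 30)
print(" ".join(cmd))
```

To actually execute it, the list can be passed to `subprocess.run(cmd, check=True)` on a system with ffmpeg installed.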

At 340, the computing device 110 performs further application processing on top of the basic processing 330. Compared with the basic processing 330, the application processing 340 implements a higher-level encapsulation of FFMPEG's functions.

For example, the computing device 110 may apply the application processing 340 to generate a chart video. The computing device 110 may analyze the data input by the user 140 and then, as required by the user 140, select different charts for adding the data to the video 130.

In one embodiment, the computing device 110 may use a custom-mask method, which mainly applies when a chart can be displayed all at once. To implement it, a dynamic template is first selected and then used as a mask over the currently filled chart, producing an animation effect from which the video is generated.

In another embodiment, the computing device 110 may use a coordinate-axis normalization method, which mainly applies when a chart cannot be displayed all at once. Taking user A's "February temperature trend chart" as an example, only 7 days are shown at a time, so 28 seven-day windows of trend information need to be displayed. These trend lines are shown continuously in the video, so the content of the chart keeps changing. To reflect this change while the video size stays fixed, the computing device 110 computes the current data and continuously shifts the coordinate axes, displaying the changing trend in real time and generating the required chart video.
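The sliding-window idea behind the axis shifting can be sketched as follows: for a month of daily values, emit successive fixed-width views, each with its own axis limits, so every chart frame shows a window that slides across the data. The window width, step, and temperature values are invented for the example.

```python
# Illustrative sliding 7-day windows over daily values, each frame carrying
# the y-axis limits (min/max) that a chart renderer would use for that view.
def sliding_windows(values, width=7):
    """Yield (start_day, window, y_min, y_max) for each chart frame."""
    frames = []
    for start in range(len(values) - width + 1):
        window = values[start:start + width]
        frames.append((start, window, min(window), max(window)))
    return frames

feb_temps = list(range(-3, 25))           # 28 made-up daily temperatures
frames = sliding_windows(feb_temps, width=7)
print(len(frames), frames[0][2], frames[0][3])
```

Rendering one chart image per frame and concatenating them (e.g. with ffmpeg) would yield the moving-axis effect described above.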

In the application processing 340, the computing device 110 may use an AR-character interface provided by the database 120 to automatically generate an AR-character broadcast from the input data. The computing device 110 may convert sound data into video data by first transcribing the sound into text and then computing the relationship between the text and time points in the sound, determining the size and position of the text at each time point to form a sequence of timed text. The computing device 110 may also compute statistics for a given text string, such as its length, the number of entity words, and the amount of redundancy, shuffle and mix the strings according to these statistics, and then set the position, movement speed, color, font size, and so on of the bullet comments according to the video time, in order to generate automatic subtitles.
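A minimal sketch of timing transcribed text against the audio track: each word receives a time slice proportional to its length. This is an assumed heuristic standing in for the unspecified speech-alignment step, not the disclosed method.

```python
# Assign each transcribed word a (start, end) slice of the audio, with the
# slice width proportional to the word's character count.
def time_words(words, audio_duration_s):
    total_chars = sum(len(w) for w in words)
    timed, cursor = [], 0.0
    for w in words:
        span = audio_duration_s * len(w) / total_chars  # word's share of the audio
        timed.append((w, round(cursor, 2), round(cursor + span, 2)))
        cursor += span
    return timed

timed = time_words(["snow", "expected", "today"], audio_duration_s=3.4)
print(timed[0])
```

Each timed word could then be rendered as a subtitle or bullet comment positioned according to the video time.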

Finally, at 350, the computing device 110 composes the video 130. The computing device may first store the video 130 in the cloud and then send the storage address to the user 140 for viewing and sharing.

According to the present disclosure, high-quality videos can be composed automatically by analyzing user input data, supplementing the user data, and then processing the data as described above.

For the specific implementation of each of steps 310 through 350, refer to the description of FIG. 2, which is not repeated here.

FIG. 4 shows a schematic block diagram of an apparatus 400 for automatically generating a video according to an embodiment of the present disclosure. As shown in FIG. 4, the apparatus 400 includes: an input receiving module 410 configured to receive user input, where the user input includes first multimedia content and a key phrase describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; a first node determination module 420 configured to determine at least one node from a pre-constructed knowledge graph based on the key phrase; a first multimedia content acquisition module 430 configured to obtain, based on the first multimedia content, second multimedia content associated with the at least one node; and a first video generation module 440 configured to generate the video based on the first multimedia content and the second multimedia content.

In some embodiments, the first node determination module 420 may include: a matching module configured to determine a degree of match between the key phrase and a target key phrase corresponding to a target node in the knowledge graph; and a second node determination module configured to determine the target node as the at least one node if the degree of match is determined to be greater than a first predetermined threshold.
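The matching module's behavior can be sketched as below. The disclosure does not specify the matching metric, so token-level Jaccard similarity is used here purely as an assumed stand-in, and the node phrases and threshold are invented.

```python
# Illustrative matching: degree of match between the user's key phrase and a
# node's target key phrase, computed as token-set Jaccard similarity.
def match_degree(key_phrase, node_phrase):
    a, b = set(key_phrase.lower().split()), set(node_phrase.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def matching_nodes(key_phrase, node_phrases, first_threshold=0.5):
    """Return nodes whose target key phrase matches above the threshold."""
    return [n for n, p in node_phrases.items()
            if match_degree(key_phrase, p) > first_threshold]

nodes = matching_nodes("february weather trend",
                       {"A": "february weather", "B": "world attractions"})
print(nodes)
```

A production system would more likely use embedding-based similarity, but the threshold-gated selection structure is the same.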

In some embodiments, the first multimedia content acquisition module 430 includes: a data format determination module configured to determine at least one data format not included in the first multimedia content; and a second multimedia content acquisition module configured to obtain data in the at least one data format associated with the at least one node as the second multimedia content.

In some embodiments, the first multimedia content acquisition module 430 includes: a data amount determination module configured to determine the data amount of multimedia content in at least one data format in the first multimedia content; and a third multimedia content acquisition module configured to obtain, if the data amount is determined to be less than a second predetermined threshold, data in the at least one data format associated with the at least one node as the second multimedia content.

In some embodiments, the first video generation module 440 includes: a text element generation module configured to perform semantic analysis on the text content in the first multimedia content and the second multimedia content to generate text elements; and a second video generation module configured to generate the video based on the text elements.

In some embodiments, the second video generation module includes: a third video generation module configured to determine at least one of the position of a text element in the video, the font size within the text element, the display effect of the text element, and the display time of the text element, and to generate the video.

In some embodiments, the first video generation module 440 includes: a video content acquisition module configured to obtain the video content in the first multimedia content and the second multimedia content; an image frame determination module configured to determine a plurality of image frames in the video content associated with the key phrase; and a fourth video generation module configured to generate the video based on the plurality of image frames.

In some embodiments, the fourth video generation module includes: a transition effect determination module configured to determine the order of the plurality of image frames in the video and the transition effects between the image frames; and a fifth video generation module configured to generate the video in that order using the transition effects.

In some embodiments, the plurality of predetermined data formats includes at least one of a text data format, a picture data format, a video data format, and a sound data format.

FIG. 5 shows a schematic block diagram of an example device 500 that can be used to implement embodiments of the present disclosure. The device 500 may be used to implement the computing device 110 of FIG. 1. As shown, the device 500 includes a central processing unit (CPU) 510 that can perform various appropriate actions and processes according to computer program instructions stored in a read-only memory (ROM) 520 or loaded from a storage unit 580 into a random access memory (RAM) 530. The RAM 530 can also store the various programs and data needed for the operation of the device 500. The CPU 510, the ROM 520, and the RAM 530 are connected to one another by a bus 540. An input/output (I/O) interface 550 is also connected to the bus 540.

Multiple components of the device 500 are connected to the I/O interface 550, including: an input unit 560 such as a keyboard or mouse; an output unit 570 such as various types of displays and speakers; the storage unit 580, such as a magnetic disk or optical disc; and a communication unit 590 such as a network card, modem, or wireless communication transceiver. The communication unit 590 allows the device 500 to exchange information and data with other devices over computer networks such as the Internet and/or various telecommunication networks.

The processing unit 510 performs the various methods and processes described above, such as the process 200 and/or the process 300. For example, in some embodiments the process 200 and/or the process 300 may be implemented as a computer software program tangibly embodied in a machine-readable medium such as the storage unit 580. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 520 and/or the communication unit 590. When the computer program is loaded into the RAM 530 and executed by the CPU 510, one or more steps of the process 200 and/or the process 300 described above may be performed. Alternatively, in other embodiments, the CPU 510 may be configured to perform the process 200 and/or the process 300 in any other suitable manner (for example, by means of firmware).

The functions described above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions and operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that they be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the discussion above contains several specific implementation details, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations, separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (20)

1. A method of automatically generating video, comprising:
receiving user input comprising first multimedia content and a key phrase describing the video, the first multimedia content having at least one of a plurality of predetermined data formats, and the first multimedia content being multimedia content for the video obtained from a knowledge-graph stored in a database;
determining at least one node from a pre-constructed knowledge graph based on the key phrase;
obtaining second multimedia content associated with the at least one node based on the first multimedia content; and
generating the video based on the first multimedia content and the second multimedia content.
2. The method of claim 1, wherein determining at least one node from a pre-constructed knowledge-graph based on the key phrase comprises:
determining a matching degree between the key phrase and a target key phrase corresponding to a target node in the knowledge graph; and
if it is determined that the matching degree is greater than a first predetermined threshold, determining the target node as the at least one node.
3. The method of claim 1, wherein obtaining second multimedia content associated with the at least one node based on the first multimedia content comprises:
determining at least one data format not included in the first multimedia content; and
obtaining data associated with the at least one node in the at least one data format as the second multimedia content.
4. The method of claim 1, wherein obtaining second multimedia content associated with the at least one node based on the first multimedia content comprises:
determining a data amount of multimedia content of at least one data format in the first multimedia content; and
if it is determined that the data amount is less than a second predetermined threshold, obtaining data associated with the at least one node in the at least one data format as the second multimedia content.
5. The method of claim 1, wherein generating the video based on the first multimedia content and the second multimedia content comprises:
performing semantic analysis on text content in the first multimedia content and the second multimedia content to generate text elements; and
generating the video based on the text element.
6. The method of claim 5, wherein generating the video based on the text element comprises:
determining at least one of a position of the text element in a video, a word size in the text element, a display effect of the text element, and a display time of the text element to generate the video.
7. The method of claim 1, wherein generating the video based on the first multimedia content and the second multimedia content comprises:
acquiring video contents in the first multimedia content and the second multimedia content;
determining a plurality of image frames in the video content associated with the key phrase; and
generating the video based on the plurality of image frames.
8. The method of claim 7, wherein generating the video based on the plurality of image frames comprises:
determining an order of the plurality of image frames in the video and a transition effect between the plurality of image frames; and
generating the video using the transition effect in the order.
9. The method of claim 1, wherein the plurality of predetermined data formats includes at least one of a text data format, a picture data format, a video data format, and a sound data format.
10. An apparatus for automatically generating video, comprising:
an input receiving module configured to receive a user input comprising first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats, and the first multimedia content being multimedia content for the video obtained from a knowledge-graph stored in a database;
a first node determination module configured to determine at least one node from a pre-constructed knowledge-graph based on the key phrase;
a first multimedia content acquisition module configured to acquire second multimedia content associated with the at least one node based on the first multimedia content; and
a first video generation module configured to generate the video based on the first multimedia content and the second multimedia content.
11. The apparatus of claim 10, wherein the first node determination module comprises:
a matching module configured to determine a matching degree between the key phrase and a target key phrase corresponding to a target node in the knowledge-graph; and
a second node determination module configured to determine the target node as the at least one node if it is determined that the degree of match is greater than a first predetermined threshold.
12. The apparatus of claim 10, wherein the first multimedia content acquisition module comprises:
a data format determination module configured to determine at least one data format not included in the first multimedia content; and
a second multimedia content acquisition module configured to acquire data associated with the at least one node in the at least one data format as second multimedia content.
13. The apparatus of claim 10, wherein the first multimedia content acquisition module comprises:
a data amount determination module configured to determine a data amount of multimedia content of at least one data format in the first multimedia content; and
a third multimedia content obtaining module configured to obtain data associated with the at least one node in the at least one data format as second multimedia content if it is determined that the amount of data is less than a second predetermined threshold.
14. The apparatus of claim 10, wherein the first video generation module comprises:
a text element generation module configured to perform semantic analysis on text content in the first multimedia content and the second multimedia content to generate text elements; and
a second video generation module configured to generate the video based on the text element.
15. The apparatus of claim 14, wherein the second video generation module comprises:
a third video generating module configured to determine at least one of a position of the text element in a video, a size of a word in the text element, a display effect of the text element, and a display time of the text element, and generate the video.
16. The apparatus of claim 10, wherein the first video generation module comprises:
a video content obtaining module configured to obtain video content of the first multimedia content and the second multimedia content;
an image frame determination module configured to determine a plurality of image frames in the video content associated with the key phrase; and
a fourth video generation module configured to generate the video based on the plurality of image frames.
17. The apparatus of claim 16, wherein the fourth video generation module comprises:
a transition effect determination module configured to determine an order of the plurality of image frames in the video and a transition effect between the plurality of image frames; and
a fifth video generation module configured to generate the video using the transition effect in the order.
18. The apparatus of claim 10, wherein the plurality of predetermined data formats includes at least one of a text data format, a picture data format, a video data format, and a sound data format.
19. An electronic device, the electronic device comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any one of claims 1 to 8.
20. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 9.
CN202011383389.4A 2020-11-30 2020-11-30 Method, device, equipment and computer readable storage medium for automatically generating video Active CN112565875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011383389.4A CN112565875B (en) 2020-11-30 2020-11-30 Method, device, equipment and computer readable storage medium for automatically generating video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011383389.4A CN112565875B (en) 2020-11-30 2020-11-30 Method, device, equipment and computer readable storage medium for automatically generating video

Publications (2)

Publication Number Publication Date
CN112565875A CN112565875A (en) 2021-03-26
CN112565875B true CN112565875B (en) 2023-03-03

Family

ID=75045967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011383389.4A Active CN112565875B (en) 2020-11-30 2020-11-30 Method, device, equipment and computer readable storage medium for automatically generating video

Country Status (1)

Country Link
CN (1) CN112565875B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113438538B (en) * 2021-06-28 2023-02-10 康键信息技术(深圳)有限公司 Short video preview method, device, equipment and storage medium
CN114979054B (en) * 2022-05-13 2024-06-18 维沃移动通信有限公司 Video generation method, device, electronic device and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309351A (en) * 2018-02-14 2019-10-08 阿里巴巴集团控股有限公司 Video image generation, device and the computer system of data object
CN109189938B (en) * 2018-08-31 2025-09-30 北京字节跳动网络技术有限公司 Method and device for updating knowledge graph
CN109344291B (en) * 2018-09-03 2020-08-25 腾讯科技(武汉)有限公司 Video generation method and device
CN109614537A (en) * 2018-12-06 2019-04-12 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating video
CN110532404B (en) * 2019-09-03 2023-08-04 北京百度网讯科技有限公司 Source multimedia determining method, device, equipment and storage medium
CN111767796B (en) * 2020-05-29 2023-12-15 北京奇艺世纪科技有限公司 Video association method, device, server and readable storage medium

Also Published As

Publication number Publication date
CN112565875A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN114880441B (en) Visual content generation method, device, system, device and medium
JP7240505B2 (en) Voice packet recommendation method, device, electronic device and program
KR20210153009A (en) Method for automatically generating advertisement, apparatus, device, and computer-readable storage medium
CN111935537A (en) Music video generation method and device, electronic equipment and storage medium
CN115496550A (en) Text generation method and device
US12277766B2 (en) Information generation method and apparatus
CN114066718A (en) Image style migration method and device, storage medium and terminal
WO2024235271A1 (en) Movement generation method and apparatus for virtual character, and construction method and apparatus for movement library of virtual avatar
CN116611496A (en) Text-to-image generation model optimization method, device, equipment and storage medium
CN111883101B (en) A model training and speech synthesis method, device, equipment and medium
Kuroczyński et al. 3D models on triple paths-new pathways for documenting and visualizing virtual reconstructions
CN117011875A (en) Method, device, equipment, medium and program product for generating multimedia page
CN110706300A (en) Virtual image generation method and device
US20240195765A1 (en) Personality reply for digital content
CN111667557A (en) Animation production method and device, storage medium and terminal
CN114638232A (en) Method and device for converting text into video, electronic equipment and storage medium
CN112565875B (en) Method, device, equipment and computer readable storage medium for automatically generating video
CN108305306B (en) An animation data organization method based on sketch interaction
CN118474476A (en) AIGC-based travel scene video generation method, system, equipment and storage medium
CN111626023A (en) Automatic generation method, device and system for visualization chart highlighting and annotation
CN113240780B (en) Method and device for generating animation
Hu et al. Efficient procedural modelling of building façades based on windows from sketches
Wang et al. Integrated design system of voice-visual VR based on multi-dimensional information analysis
CN112559758A (en) Method, device and equipment for constructing knowledge graph and computer readable storage medium
CN113704488A (en) Content generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant