CN112565875B - Method, device, equipment and computer readable storage medium for automatically generating video - Google Patents
- Publication number
- CN112565875B (application CN202011383389.4A)
- Authority
- CN
- China
- Prior art keywords
- multimedia content
- video
- node
- data
- module configured
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H04N21/4355—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream involving reformatting operations of additional data, e.g. HTML pages on a television screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47205—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Data Mining & Analysis (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
According to example embodiments of the present disclosure, a method, an apparatus, a device, a computer-readable storage medium, and a computer program product for automatically generating a video are provided, relating to the fields of knowledge graphs, deep learning, and video creation. A method for automatically generating a video includes: receiving user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; determining at least one node from a pre-built knowledge graph based on the key phrase; acquiring, based on the first multimedia content, second multimedia content associated with the at least one node; and generating the video based on the first multimedia content and the second multimedia content. Videos can thereby be generated automatically and efficiently.
Description
Technical field
Embodiments of the present disclosure relate mainly to the field of information processing and, more specifically, to a method, apparatus, device, computer-readable storage medium, and computer program product for automatically generating a video.
Background
With the development of mobile data networks, video is gradually overtaking text as a share of the data on the Internet. In the technical field of video production, innovative applications based on artificial intelligence are still lacking, and no scheme exists for producing videos automatically. The traditional video production workflow has the following drawbacks: it places high demands on users, since producing an acceptable video requires the creator to operate a range of complex software, and the material needed to make a video is hard to collect. A scheme for automatically generating high-quality videos is therefore needed.
Summary
According to example embodiments of the present disclosure, a scheme for automatically generating a video is provided.
In a first aspect of the present disclosure, a method for automatically generating a video is provided, including: receiving user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; determining at least one node from a pre-built knowledge graph based on the key phrase; acquiring, based on the first multimedia content, second multimedia content associated with the at least one node; and generating the video based on the first multimedia content and the second multimedia content.
In a second aspect of the present disclosure, an apparatus for automatically generating a video is provided, including: an input receiving module configured to receive user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; a first node determination module configured to determine at least one node from a pre-built knowledge graph based on the key phrase; a first multimedia content acquisition module configured to acquire, based on the first multimedia content, second multimedia content associated with the at least one node; and a first video generation module configured to generate the video based on the first multimedia content and the second multimedia content.
In a third aspect of the present disclosure, an electronic device is provided, including one or more processors and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, the program implementing the method according to the first aspect of the present disclosure when executed by a processor.
In a fifth aspect of the present disclosure, a computer program product is provided, including computer program instructions which, when executed by a processor, implement the method according to the first aspect of the present disclosure.
It should be understood that the content described in this Summary is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, identical or similar reference numerals denote identical or similar elements, in which:
FIG. 1 shows a schematic diagram of an example environment in which various embodiments of the present disclosure can be implemented;
FIG. 2 shows a flowchart of an example of a process for automatically generating a video according to some embodiments of the present disclosure;
FIG. 3 shows a flowchart of another example of a process for automatically generating a video according to some embodiments of the present disclosure;
FIG. 4 shows a schematic block diagram of an apparatus for automatically generating a video according to an embodiment of the present disclosure; and
FIG. 5 shows a block diagram of a computing device capable of implementing various embodiments of the present disclosure.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the disclosure. The drawings and embodiments are for illustrative purposes only and are not intended to limit the scope of protection of the disclosure.
In the description of the embodiments of the present disclosure, the term "including" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and so on may refer to different objects or to the same object. Other explicit and implicit definitions may also be included below.
As mentioned above, traditional approaches to creating a video have the following drawbacks: (1) during production, in addition to the content shot by the user, a wide variety of video material is often needed to achieve the effect the user wants, and this material presents a professional barrier that is hard for users to overcome, chiefly because it is difficult to obtain, limited in variety, and expensive; (2) users must spend a great deal of time compositing the material, for example deciding on the transition effects between materials of different formats or the position of text in the video.
Example embodiments of the present disclosure propose a scheme for automatically generating a video. In this scheme, first multimedia content input by the user and a description of the video to be generated are received first. Second multimedia content is then acquired according to that multimedia content and description. Finally, the first multimedia content input by the user and the acquired second multimedia content are combined to generate the video. High-quality multimedia content for generating the video can thus be acquired automatically, so that the video is generated efficiently.
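FIG. 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. As shown, the example environment 100 includes a computing device 110, a database 120, a video 130, and a user 140. The computing device 110 may be connected to the database 120 and may also receive input from the user 140. The database 120 may be any suitable database, centralized or distributed, including but not limited to a database based on knowledge graph technology and a retrieval-based database.
In one embodiment, to generate the video 130, the computing device 110 may acquire multimedia content for the video 130, that is, material, from a knowledge graph stored in the database. A knowledge graph is essentially a semantic network that describes knowledge that objectively exists in the real world and the relationships between pieces of knowledge. By application domain, knowledge graphs are commonly divided into general knowledge graphs and vertical knowledge graphs (also called industry knowledge graphs). A general knowledge graph is not tied to a particular domain and can be likened to structured encyclopedic knowledge; it contains a large amount of common-sense knowledge and emphasizes breadth. A vertical knowledge graph targets a particular domain, is built on industry knowledge, and emphasizes depth. The knowledge graph here may be one dedicated to video, in which each node stores multimodal data associated with that node, but it may also be a general knowledge graph; the present disclosure imposes no limitation in this respect.
In an alternative embodiment, some of the data in the knowledge graph in the database 120 is incomplete. Take an engine as an example. When the knowledge graph is first built, the concept of an engine may include attributes such as fuel consumption, color, displacement, brand, and model. These attributes are common-sense knowledge known to the public, so the concepts describing engine attributes can be added to the knowledge graph at that stage, one concept per node, giving an initial overall framework for the graph. However, the actual fuel consumption, the available colors, the displacement in liters, and the specific brands and models vary widely and are not common-sense knowledge, so they cannot be given at that stage and must be collected separately. In addition, the data already in the knowledge graph may be unimodal, for example text only, which cannot be used for video creation.
Where the knowledge graph is incomplete in this way, the computing device 110 may acquire data from across the network, outside the database, to rebuild the knowledge graph. The network may be any suitable network, including but not limited to the Internet, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), wired networks such as fiber-optic networks and coaxial cable, and wireless networks such as Wi-Fi, cellular telecommunication networks, and Bluetooth.
The computing device 110 may be any suitable computing device, centralized or distributed, including but not limited to personal computers, servers, clients, handheld or laptop devices, multiprocessors, microprocessors, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computer systems, distributed clouds, and combinations thereof.
The computing device 110 may also use the acquired multimedia content to synthesize the video 130; the detailed process of generating the video is described below.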
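FIG. 2 shows a flowchart of an example of a process 200 for automatically synthesizing a video according to some embodiments of the present disclosure. The process 200 may be implemented by the computing device 110.
At 210, the computing device 110 receives user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats.
The computing device 110 may receive user input from the user 140. The user input includes first multimedia content that the user provides for creating the video, and the computing device 110 uses this first multimedia content as the base data for generating the video. The first multimedia content may be data in a single format (text data, picture data, video data, or sound data) or multimodal data. It may be in a common structured type such as Excel, JSON, or CSV.
The user input also includes a key phrase for describing the video, that is, the user 140's brief description of what the video to be created should convey, which may be the topic of the video. For example, user A inputs an Excel file of February temperatures as the first multimedia content and the key phrase "February temperature trend chart"; user B inputs a picture of himself or herself as the first multimedia content and the key phrase "User B takes you on a tour of the world's famous sights".
In one embodiment, the computing device 110 may clean the text data in the first multimedia content and perform word segmentation on it to extract entity and attribute information from the text. The cleaned text data is used to generate the video's subtitles, charts, titles, and similar information.
In one embodiment, the computing device 110 may summarize the video data in the first multimedia content to obtain the important material in it for use in the final synthesized video. The computing device 110 may also split the video data into scenes, apply a specific algorithm to understand each scene and obtain information such as its topic, category, and the people involved, and then select the most suitable scenes as material according to the key phrase input by the user.
In one embodiment, the computing device 110 may perform recognition on the picture data in the first multimedia content and filter out pictures that are smaller than 300*300, contain advertisements, or involve pornographic or politically sensitive content. If too few pictures remain after filtering, the multimodal graph is queried to supplement them.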
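The picture filter described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Picture` type, the `flagged` field (standing in for the output of some advertisement or sensitive-content classifier), the reading of "smaller than 300*300" as either side being below 300 pixels, and the `MIN_KEPT` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass

MIN_SIDE = 300  # from the text: pictures smaller than 300*300 are dropped
MIN_KEPT = 3    # hypothetical cutoff below which the graph would be queried


@dataclass
class Picture:
    name: str
    width: int
    height: int
    flagged: bool = False  # hypothetical classifier result (ads, sensitive content)


def filter_pictures(pictures):
    """Keep pictures whose sides are both at least MIN_SIDE and that are not flagged."""
    return [
        p for p in pictures
        if p.width >= MIN_SIDE and p.height >= MIN_SIDE and not p.flagged
    ]


def needs_supplement(kept):
    """True when too few pictures survive and more should be fetched from the graph."""
    return len(kept) < MIN_KEPT
```

When `needs_supplement` returns true, a caller would query the multimodal graph for additional pictures; that lookup is outside the scope of this sketch.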
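In one embodiment, the computing device 110 may convert the sound data in the first multimedia content into text data and/or convert text data into sound data, to form subtitles or audio in the final generated video.
The processing of these different types of data is described further below.
At 220, the computing device 110 determines at least one node from a pre-built knowledge graph based on the key phrase. The pre-built multimodal knowledge graph may contain multiple nodes; each node may correspond to several similar key phrases and stores the multimodal data associated with it. The pre-built multimodal knowledge graph may be stored in the database 120 or may be built by the computing device 110; the present disclosure imposes no limitation in this respect.
In one embodiment, the computing device 110 determines the degree of match between the key phrase and a target key phrase corresponding to a target node in the knowledge graph and, if the degree of match is greater than a first predetermined threshold, determines the target node to be one of the at least one node. Continuing with users A and B as examples: the computing device 110 determines that the degree of match between the target key phrases "trend", "tendency", and "situation" in the knowledge graph and the key phrase "trend chart" input by user A is greater than the first predetermined threshold of 0.8 (the maximum being 1), and therefore determines node A, which corresponds to those target key phrases, to be a target node. It may likewise use "temperature" to determine node A', which corresponds to the target key phrases "weather" and "temperature", to be a target node. The degree of match between key phrases can be obtained by computing the Euclidean distance between the phrases' vectors, which is not elaborated here. Similarly, the computing device 110 may determine the node corresponding to the target key phrase "historic sites" (node B) to be a target node, the degree of match between "historic sites" and the key phrase "famous sights" input by user B being greater than the first predetermined threshold of 0.8. The value 0.8 is illustrative, and the present disclosure imposes no limitation in this respect.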
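The threshold-based node selection above can be illustrated with a short sketch. The text only states that the degree of match may be derived from the Euclidean distance between phrase vectors; the mapping `1 / (1 + distance)` used here, the function names, and the toy two-dimensional vectors are assumptions for illustration, and a real system would obtain the vectors from some embedding model.

```python
import math

FIRST_THRESHOLD = 0.8  # illustrative value from the text


def match_degree(u, v):
    """Map Euclidean distance to a score in (0, 1]; identical vectors score 1.

    The 1 / (1 + d) mapping is an assumption; the text does not specify one.
    """
    return 1.0 / (1.0 + math.dist(u, v))


def select_nodes(query_vec, node_phrase_vecs, threshold=FIRST_THRESHOLD):
    """Return ids of nodes with at least one target key phrase scoring above threshold."""
    selected = []
    for node_id, phrase_vecs in node_phrase_vecs.items():
        best = max(match_degree(query_vec, v) for v in phrase_vecs)
        if best > threshold:
            selected.append(node_id)
    return selected
```

A node is selected as soon as any one of its target key phrases clears the threshold, which mirrors the example in which several similar phrases map to the same node.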
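At 230, the computing device 110 acquires, based on the first multimedia content, second multimedia content associated with the at least one node. For example, after determining the at least one node at 220, the computing device 110 may use that node to determine associated second multimedia content in the knowledge graph as supplementary material for the first multimedia content.
In one embodiment, the computing device 110 determines at least one data format not included in the first multimedia content and acquires data that is associated with the at least one node and is in that at least one data format as the second multimedia content. Continuing with users A and B as examples: the computing device 110 determines that the weather data in user A's first multimedia content is Excel text data and therefore that the first multimedia content includes no picture data, video data, or sound data. The computing device 110 may then determine, from the knowledge graph in the database 120, data associated with nodes A and A' in picture, video, and sound formats, including but not limited to weather icons, the sound of thunder, a background animation of falling snow, and animation templates for trend curves.
For user B, the computing device 110 determines that the first multimedia content is a picture of user B and may then determine, from the knowledge graph in the database 120, data associated with node B in text, video, and sound formats, including but not limited to videos, text introductions, and animations of the world's famous sights.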
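The missing-format rule in the two examples above amounts to a set difference. The sketch below assumes content is represented as a dictionary from format name to a list of items; that representation and the function names are hypothetical.

```python
ALL_FORMATS = {"text", "picture", "video", "sound"}


def missing_formats(first_content):
    """Formats for which the user supplied nothing (keys are format names)."""
    present = {fmt for fmt, items in first_content.items() if items}
    return ALL_FORMATS - present


def second_content(first_content, node_data):
    """Select node data only in the formats the user did not supply."""
    wanted = missing_formats(first_content)
    return {fmt: items for fmt, items in node_data.items() if fmt in wanted}
```

For user A, who supplies only text, this selects the node's picture, video, and sound data and leaves its text data out.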
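In another embodiment, the computing device 110 determines the data volume of the multimedia content in at least one data format in the first multimedia content and, if the data volume is less than a second predetermined threshold, acquires data that is associated with the at least one node and is in that at least one data format as the second multimedia content. Continuing with users A and B as examples: the computing device 110 determines that the Excel text data in user A's first multimedia content is smaller than a second predetermined threshold of 8 KB, for example because it records the weather for only 10 days of February. The computing device 110 then acquires data associated with node A' in at least one data format as the second multimedia content, for example text information on the weather for the whole of February.
For user B, the computing device 110 determines that the data volume of the first multimedia content input by the user 140 is, for example, less than a second predetermined threshold of 20 MB, and then acquires data associated with node B in at least one data format as the second multimedia content, for example videos, text introductions, and animations of the world's famous sights.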
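The data-volume rule can be sketched in the same style. The two threshold values come from the examples above; keeping one threshold per format in a dictionary, and attaching the 20 MB value to pictures, are assumptions for illustration.

```python
# Bytes; values taken from the examples in the text, per-format layout is assumed.
SECOND_THRESHOLDS = {"text": 8 * 1024, "picture": 20 * 1024 * 1024}


def formats_to_supplement(sizes_by_format, thresholds=SECOND_THRESHOLDS):
    """Formats whose supplied data volume falls below the threshold for that format."""
    return sorted(
        fmt for fmt, size in sizes_by_format.items()
        if fmt in thresholds and size < thresholds[fmt]
    )
```

A caller would then fetch node data in each returned format, for example the full month of weather text when the user's spreadsheet covers only 10 days.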
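Note that the values of the first and second predetermined thresholds above are merely examples; different thresholds may be set depending on the user input and the computing device 110.
The base material provided by the user and the user's requirements for the video are analyzed automatically, and the analysis results are matched against the multimodal knowledge graph to automatically acquire high-quality data in the formats that the base material lacks or has too little of. This addresses the difficulty of obtaining material in traditional approaches.
In an alternative embodiment, the computing device 110 may use preset modules such as a video retrieval module, a similar-picture retrieval module, and a topic retrieval module to look up relevant fields in the multimodal graph and fill in information that is missing from the given data.
At 240, the computing device 110 generates the video based on the first multimedia content and the second multimedia content. For example, the computing device 110 may further process the acquired multimedia content in its various formats to generate the video.
In one embodiment, the computing device 110 performs semantic analysis on the text content in the first and second multimedia content to generate a text element. The computing device 110 then determines at least one of the position of the text element in the video, the font size in the text element, the display effect of the text element, and the display time of the text element, to generate the video.
For example, the computing device 110 performs semantic analysis on the text content in the multimedia content, associates it with picture information, and further determines its bounded position within the associated picture, how long it stays on screen, its size, how its position changes over time, its dynamic effects, and so on. Through these operations, the text content can be associated with image frames so that the text clearly describes each image frame.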
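To make the text-element attributes concrete, the sketch below places a caption near the bottom of a frame and derives its font size and display time. The text does not say how these values are computed, so every heuristic here (font size as one twentieth of the frame height, one font-size unit of width per character, four characters per second of reading time) is an assumption chosen only to give a runnable example.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    text: str
    x: int
    y: int
    font_px: int
    start_s: float
    duration_s: float


def layout_caption(text, frame_w, frame_h, start_s, chars_per_second=4.0):
    """Centre a caption near the bottom of the frame (all heuristics are assumed)."""
    font_px = max(16, frame_h // 20)
    # Approximate the rendered width as one font_px per character; clamp at 0.
    x = max(0, (frame_w - font_px * len(text)) // 2)
    y = frame_h - 2 * font_px
    duration_s = max(1.0, len(text) / chars_per_second)
    return TextElement(text, x, y, font_px, start_s, duration_s)
```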
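In another embodiment, the computing device 110 acquires the video content in the first and second multimedia content. The computing device 110 then determines multiple image frames in the video content that are associated with the key phrase, determines the order of those image frames in the video and the transition effects between them, and finally generates the video in that order using the transition effects. For example, the computing device 110 may determine the key image frames in the video content according to the key phrase input by the user 140, taking the image frames whose degree of match with the key phrase is greater than a threshold as the image frames associated with the key phrase. Such frames tend to best reflect the final result the user has in mind. Based on the associated image frames, the computing device 110 may then determine three types of transition: image frame to image frame, image frame to video, and video to image frame. Taking user B as an example, the computing device 110 may decide the order in which pictures and videos appear in the final synthesized video according to the different styles of the world's famous sights, and add transition effects between the picture and video data so that the changes appear more natural.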
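The frame selection and the three transition types above can be sketched as a filter followed by a lookup table. The effect names ("crossfade", "fade_in", "fade_out") and the "cut" fallback for pairs the text does not list, such as video to video, are hypothetical; only the three pair types come from the text.

```python
# Hypothetical effect names for the three pair types named in the text.
TRANSITIONS = {
    ("image", "image"): "crossfade",
    ("image", "video"): "fade_in",
    ("video", "image"): "fade_out",
}


def select_frames(scored_frames, threshold):
    """Keep frames whose degree of match with the key phrase exceeds the threshold."""
    return [frame for frame, score in scored_frames if score > threshold]


def pick_transitions(clips):
    """clips: ordered list of (name, kind), kind being "image" or "video".

    Returns one transition per adjacent pair; unlisted pairs fall back to "cut".
    """
    return [
        TRANSITIONS.get((a[1], b[1]), "cut")
        for a, b in zip(clips, clips[1:])
    ]
```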
By analyzing the attributes of the text, pictures, and videos in the video material, the order in which they appear in the video can be optimized automatically according to those attributes, and transition effects can be set automatically, making the video as a whole smoother and more natural.
In this way, high-quality multimedia content for generating a video can be acquired automatically, and a high-quality video can be generated efficiently according to the relationships among the multimedia content. This addresses the key problems that producing a video places high technical demands on users and that material is hard to obtain.
FIG. 3 shows a flowchart of another example of a process 300 for automatically generating a video according to some embodiments of the present disclosure. The process 300 may be implemented by the computing device 110. The computing device 110 may implement the steps of the present disclosure on an underlying framework based on FFmpeg, that is, the process of combining text, pictures, video, sound, and so on into a single video through a series of operations. FFmpeg is a relatively low-level video processing tool commonly used on Linux-like systems, whose main functions include video encoding and decoding. Other frameworks may of course be used; the present disclosure is not intended to be limiting.
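As one concrete illustration of driving FFmpeg from code, the sketch below builds the input list and command line for FFmpeg's concat demuxer, which joins clips without re-encoding. The text does not say which FFmpeg features the framework uses, so this is an assumed example; the helper names are hypothetical, and the command is only constructed here, not executed.

```python
def concat_list(paths):
    """Text of a concat-demuxer list file: one "file '<path>'" line per clip.

    Paths containing single quotes would need escaping, which this sketch omits.
    """
    return "".join(f"file '{p}'\n" for p in paths)


def concat_command(list_path, out_path):
    """Argument list for joining the listed clips with stream copy (no re-encode)."""
    return [
        "ffmpeg",
        "-f", "concat",  # use the concat demuxer
        "-safe", "0",    # allow paths the demuxer would otherwise reject as unsafe
        "-i", list_path,
        "-c", "copy",    # copy streams; clips must share codecs and parameters
        out_path,
    ]
```

The returned list can be passed to `subprocess.run` once the list file has been written to disk. Stream copy only works when all clips share the same codec parameters; otherwise the clips would have to be re-encoded.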
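At 310, the computing device 110 receives from the user 140 the first multimedia content and the key phrase describing the video.
At 320, the computing device 110 processes the first multimedia content input by the user and, according to the user input, acquires second multimedia content from the knowledge graph 360 to supplement the first multimedia content.
At 330, the computing device 110 performs basic processing on the first and second multimedia content. The basic processing includes, but is not limited to: sound-related operations, including mixing of multiple sound sources, sound transformation, and sound trimming; custom mask operations, which mainly serve the subsequent application processing 340 and can be used to produce various animation effects; and video-related functions, which implement dynamic-effect operations on a video's size and position, color, frame count, and duration.
At 340, the computing device 110 performs further application processing on top of the basic processing 330. Compared with the basic processing 330, the application processing 340 is a higher-level wrapper around FFmpeg's functions.
For example, the computing device 110 may apply the application processing 340 to generate a chart video. The computing device 110 may analyze the data input by the user 140 and then, as the user 140 requires, select different charts to add the data to the video 130.
In one embodiment, the computing device 110 may use a method based on custom masks, which mainly applies when the chart can be displayed in full at once. To implement it, a dynamic template is first selected and then used as the mask for the chart currently being filled, producing an animation effect from which the video is generated.
In another embodiment, the computing device 110 may use a method based on coordinate-axis normalization, which mainly applies when the chart cannot be displayed in full at once. Taking user A's "February temperature trend chart" as an example, only 7 days are shown at a time, so 28 seven-day trend views need to be shown. The trend lines appear continuous in the video, which means the chart content changes constantly. To convey this change while the video size stays fixed, the computing device 110 computes the current data and keeps moving the coordinate axes so that the change in trend is displayed in real time, generating the desired chart video.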
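The moving-axis idea can be sketched as computing, for each day, the axis ranges of a 7-day view ending on that day. This is one possible reading of the passage, chosen because it yields one view per day (28 for February); the function name, the dictionary layout, and the handling of the first six days as partially filled views are assumptions.

```python
def trailing_windows(values, size=7):
    """One view per data point: the last `size` values ending at that point.

    Early views hold fewer than `size` values, but the x axis always spans
    `size` days so the frame size stays fixed while the axis moves.
    """
    frames = []
    for end in range(1, len(values) + 1):
        start = max(0, end - size)
        window = values[start:end]
        frames.append({
            "x_range": (start + 1, start + size),    # day numbers shown on the x axis
            "y_range": (min(window), max(window)),   # y axis rescaled to current data
            "values": window,
        })
    return frames
```

Rendering each returned frame in turn, with the axes redrawn from `x_range` and `y_range`, gives the impression of a continuous trend line sliding across a fixed-size chart.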
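In the application processing 340, the computing device 110 may use an AR character interface provided by the database 120 to automatically generate an AR character broadcast from the input data. The computing device 110 may convert sound data into video data by first transcribing the sound into text, then computing the relationship between the text and time points in the sound, and determining the size, position, and other properties of the text at each time point to form a sequence of timed text. The computing device 110 may also compute statistics on a given text string, such as its length, number of entity words, and redundancy, then split and mix the text according to those statistics, and then set the position, movement speed, color, font size, and so on of bullet comments according to the video timeline, to generate automatic subtitles.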
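The step of turning transcribed speech into timed text can be sketched as grouping word-level timestamps into subtitle cues. The input format (a list of `(text, start, end)` tuples, as a speech recognizer might produce) and the character limit per cue are assumptions; the text does not describe the actual algorithm.

```python
def build_cues(words, max_chars=12):
    """Group timed words into cues of at most max_chars characters.

    words: list of (text, start_s, end_s) in speaking order.
    Returns a list of (start_s, end_s, text) tuples.
    """
    cues, current, start, prev_end = [], [], None, None
    for text, w_start, w_end in words:
        candidate = " ".join(current + [text])
        if current and len(candidate) > max_chars:
            # Close the current cue before this word would overflow it.
            cues.append((start, prev_end, " ".join(current)))
            current, start = [], None
        if start is None:
            start = w_start
        current.append(text)
        prev_end = w_end
    if current:
        cues.append((start, prev_end, " ".join(current)))
    return cues
```

Each cue's start and end times could then be combined with a position and font size to place the text on the corresponding frames.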
Finally, at 350, the computing device 110 synthesizes the video 130. The computing device may first store the video 130 in the cloud and then send the storage address to the user 140 so that the user can view and share it.
According to the present disclosure, high-quality videos can be synthesized automatically by analyzing the user's input data, supplementing that data, and then processing the data as described above.
For the specific implementation of each of steps 310 to 350, refer to the description of FIG. 2; details are not repeated here.
FIG. 4 shows a schematic block diagram of an apparatus 400 for automatically generating a video according to an embodiment of the present disclosure. As shown in FIG. 4, the apparatus 400 includes: an input receiving module 410 configured to receive user input, the user input including first multimedia content and a key phrase for describing the video, the first multimedia content having at least one of a plurality of predetermined data formats; a first node determination module 420 configured to determine at least one node from a pre-built knowledge graph based on the key phrase; a first multimedia content acquisition module 430 configured to acquire, based on the first multimedia content, second multimedia content associated with the at least one node; and a first video generation module 440 configured to generate the video based on the first multimedia content and the second multimedia content.
In some embodiments, the first node determination module 420 may include: a matching module configured to determine the degree of match between the key phrase and a target key phrase corresponding to a target node in the knowledge graph; and a second node determination module configured to determine the target node to be one of the at least one node if the degree of match is greater than a first predetermined threshold.
In some embodiments, the first multimedia content acquisition module 430 includes: a data format determination module configured to determine at least one data format not included in the first multimedia content; and a second multimedia content acquisition module configured to acquire data that is associated with the at least one node and is in the at least one data format as the second multimedia content.
In some embodiments, the first multimedia content acquisition module 430 includes: a data volume determination module configured to determine the data volume of the multimedia content in at least one data format in the first multimedia content; and a third multimedia content acquisition module configured to acquire, if the data volume is less than a second predetermined threshold, data that is associated with the at least one node and is in the at least one data format as the second multimedia content.
In some embodiments, the first video generation module 440 includes: a text element generation module configured to perform semantic analysis on the text content in the first and second multimedia content to generate a text element; and a second video generation module configured to generate the video based on the text element.
In some embodiments, the second video generation module includes: a third video generation module configured to determine at least one of the position of the text element in the video, the font size in the text element, the display effect of the text element, and the display time of the text element, to generate the video.
In some embodiments, the first video generation module 440 includes: a video content acquisition module configured to acquire the video content in the first and second multimedia content; an image frame determination module configured to determine multiple image frames in the video content that are associated with the key phrase; and a fourth video generation module configured to generate the video based on the multiple image frames.
In some embodiments, the fourth video generation module includes: a transition effect determination module configured to determine the order of the multiple image frames in the video and the transition effects between them; and a fifth video generation module configured to generate the video in that order using the transition effects.
In some embodiments, the plurality of predetermined data formats includes at least one of a text data format, a picture data format, a video data format, and a sound data format.
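FIG. 5 shows a schematic block diagram of an example device 500 that can be used to implement embodiments of the present disclosure. The device 500 may be used to implement the computing device 110 of FIG. 1. As shown, the device 500 includes a central processing unit (CPU) 510, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 520 or loaded from a storage unit 580 into a random access memory (RAM) 530. The RAM 530 can also store the various programs and data needed for the operation of the device 500. The CPU 510, the ROM 520, and the RAM 530 are connected to one another through a bus 540. An input/output (I/O) interface 550 is also connected to the bus 540.
Multiple components of the device 500 are connected to the I/O interface 550, including: an input unit 560, such as a keyboard or mouse; an output unit 570, such as various types of displays and speakers; a storage unit 580, such as a magnetic disk or optical disc; and a communication unit 590, such as a network card, modem, or wireless communication transceiver. The communication unit 590 allows the device 500 to exchange information and data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processing unit 510 performs the methods and processing described above, such as the process 200 and/or the process 300. For example, in some embodiments, the process 200 and/or the process 300 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 580. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 520 and/or the communication unit 590. When the computer program is loaded into the RAM 530 and executed by the CPU 510, one or more steps of the process 200 and/or the process 300 described above may be performed. Alternatively, in other embodiments, the CPU 510 may be configured to perform the process 200 and/or the process 300 in any other suitable manner (for example, by means of firmware).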
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、芯片上系统的系统(SOC)、负载可编程逻辑设备(CPLD)等等。The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate array (FPGA), application specific integrated circuit (ASIC), application specific standard product (ASSP), system on a chip (SOC), load programmable logic device (CPLD), etc.
用于实施本公开的方法的程序代码可以采用一个或多个编程语言的任何组合来编写。这些程序代码可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理器或控制器,使得程序代码当由处理器或控制器执行时使流程图和/或框图中所规定的功能/操作被实施。程序代码可以完全在机器上执行、部分地在机器上执行,作为独立软件包部分地在机器上执行且部分地在远程机器上执行或完全在远程机器或服务器上执行。Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special purpose computer, or other programmable data processing devices, so that the program codes, when executed by the processor or controller, make the functions/functions specified in the flow diagrams and/or block diagrams Action is implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
此外,虽然采用特定次序描绘了各操作,但是这应当理解为要求这样操作以所示出的特定次序或以顺序次序执行,或者要求所有图示的操作应被执行以取得期望的结果。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实现中。相反地,在单个实现的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实现中。In addition, while operations are depicted in a particular order, this should be understood to require that such operations be performed in the particular order shown, or in sequential order, or that all illustrated operations should be performed to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011383389.4A CN112565875B (en) | 2020-11-30 | 2020-11-30 | Method, device, equipment and computer readable storage medium for automatically generating video |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011383389.4A CN112565875B (en) | 2020-11-30 | 2020-11-30 | Method, device, equipment and computer readable storage medium for automatically generating video |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112565875A (en) | 2021-03-26 |
| CN112565875B (en) | 2023-03-03 |
Family
ID=75045967
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011383389.4A Active CN112565875B (en) | 2020-11-30 | 2020-11-30 | Method, device, equipment and computer readable storage medium for automatically generating video |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112565875B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113438538B (en) * | 2021-06-28 | 2023-02-10 | 康键信息技术(深圳)有限公司 | Short video preview method, device, equipment and storage medium |
| CN114979054B (en) * | 2022-05-13 | 2024-06-18 | 维沃移动通信有限公司 | Video generation method, device, electronic device and readable storage medium |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110309351A (en) * | 2018-02-14 | 2019-10-08 | 阿里巴巴集团控股有限公司 | Video image generation, device and the computer system of data object |
| CN109189938B (en) * | 2018-08-31 | 2025-09-30 | 北京字节跳动网络技术有限公司 | Method and device for updating knowledge graph |
| CN109344291B (en) * | 2018-09-03 | 2020-08-25 | 腾讯科技(武汉)有限公司 | Video generation method and device |
| CN109614537A (en) * | 2018-12-06 | 2019-04-12 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for generating video |
| CN110532404B (en) * | 2019-09-03 | 2023-08-04 | 北京百度网讯科技有限公司 | Source multimedia determining method, device, equipment and storage medium |
| CN111767796B (en) * | 2020-05-29 | 2023-12-15 | 北京奇艺世纪科技有限公司 | Video association method, device, server and readable storage medium |
- 2020-11-30: CN application CN202011383389.4A (CN112565875B), status Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN112565875A (en) | 2021-03-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN114880441B (en) | | Visual content generation method, device, system, device and medium |
| JP7240505B2 (en) | | Voice packet recommendation method, device, electronic device and program |
| KR20210153009A (en) | | Method for automatically generating advertisement, apparatus, device, and computer-readable storage medium |
| CN111935537A (en) | | Music video generation method and device, electronic equipment and storage medium |
| CN115496550A (en) | | Text generation method and device |
| US12277766B2 (en) | | Information generation method and apparatus |
| CN114066718A (en) | | Image style migration method and device, storage medium and terminal |
| WO2024235271A1 (en) | | Movement generation method and apparatus for virtual character, and construction method and apparatus for movement library of virtual avatar |
| CN116611496A (en) | | Text-to-image generation model optimization method, device, equipment and storage medium |
| CN111883101B (en) | | A model training and speech synthesis method, device, equipment and medium |
| Kuroczyński et al. | | 3D models on triple paths-new pathways for documenting and visualizing virtual reconstructions |
| CN117011875A (en) | | Method, device, equipment, medium and program product for generating multimedia page |
| CN110706300A (en) | | Virtual image generation method and device |
| US20240195765A1 (en) | | Personality reply for digital content |
| CN111667557A (en) | | Animation production method and device, storage medium and terminal |
| CN114638232A (en) | | Method and device for converting text into video, electronic equipment and storage medium |
| CN112565875B (en) | | Method, device, equipment and computer readable storage medium for automatically generating video |
| CN108305306B (en) | | An animation data organization method based on sketch interaction |
| CN118474476A (en) | | AIGC-based travel scene video generation method, system, equipment and storage medium |
| CN111626023A (en) | | Automatic generation method, device and system for visualization chart highlighting and annotation |
| CN113240780B (en) | | Method and device for generating animation |
| Hu et al. | | Efficient procedural modelling of building Façades based on windows from sketches |
| Wang et al. | | Integrated design system of voice-visual VR based on multi-dimensional information analysis |
| CN112559758A (en) | | Method, device and equipment for constructing knowledge graph and computer readable storage medium |
| CN113704488A (en) | | Content generation method and device, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
