CN113794930A - Video generation method, device, equipment and storage medium - Google Patents

Video generation method, device, equipment and storage medium Download PDF

Info

Publication number
CN113794930A
Authority
CN
China
Prior art keywords
video
multimedia
target
structured knowledge
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111064510.1A
Other languages
Chinese (zh)
Other versions
CN113794930B (en)
Inventor
于向丽
张煜
刘驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202111064510.1A priority Critical patent/CN113794930B/en
Publication of CN113794930A publication Critical patent/CN113794930A/en
Application granted granted Critical
Publication of CN113794930B publication Critical patent/CN113794930B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application provides a video generation method, apparatus, device, and storage medium. In response to a user's video generation operation, the method acquires the target structured knowledge and target template selected by the user; triggers a conversation recording function to acquire a conversation content video; inputs the conversation content video, the target structured knowledge, and the target template into a preset training model, which outputs target multimedia material; and splices the target multimedia material to obtain the target video. This solves the technical problems that manual video production consumes substantial manpower and material resources, is costly, time-consuming, and inefficient, and that the quality of videos generated by keyword search is difficult to guarantee.

Description

Video generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a video generation method, apparatus, device, and storage medium.
Background
Short video is an internet content transmission format. With the popularization of mobile terminals and faster networks, short, fast, high-traffic content has gradually been applied in various fields; for example, it has developed rapidly in customer service systems in the telecommunications field. In this environment, interaction between operators and the outside world is no longer limited to traditional text and voice, and increasingly takes the form of short videos, such as brand publicity, package introduction, and activity promotion.
Existing videos are recorded and post-edited manually, or formed by searching the internet for related pictures according to keywords in a text description and splicing them into a video.
However, in the prior art, manually editing videos consumes substantial manpower and material resources, and is costly, time-consuming, and inefficient; meanwhile, it is difficult to guarantee the quality of videos generated by keyword search.
Disclosure of Invention
The application provides a video generation method, apparatus, device, and storage medium, to solve the technical problems in the prior art that manually editing videos consumes substantial manpower and material resources, is costly, time-consuming, and inefficient, and that the quality of videos generated by keyword search is difficult to guarantee.
In a first aspect, the present application provides a video generation method, including:
responding to video generation operation of a user, and acquiring target structured knowledge and a target template selected by the user, wherein the target structured knowledge comprises rule information for generating a target video;
triggering a conversation recording function to acquire a conversation content video;
inputting the conversation content video, the target structured knowledge and the target template into a preset training model, and outputting to obtain a target multimedia material;
and splicing the target multimedia material to obtain a target video.
Here, when a user needs to generate a video, the present application acquires the target structured knowledge and target template selected by the user, then starts the conversation recording function and determines the video content to be generated from the recorded conversation content video. The conversation content video, together with the user-selected target structured knowledge and target template, is input into a preset training model, which screens the multimedia material; the target video is then generated from the screened material. Because the video material is screened and assembled automatically rather than manually, the present application saves manpower and material costs, shortens video generation time, and improves the efficiency of video generation. Compared with the prior-art method of generating videos through keywords, screening video material through a training model, with the conversation content video recorded in real time as a parameter, yields a generated video that better fits the required content and style, so the quality of the generated video is guaranteed.
Optionally, before the inputting the dialog content video, the target structured knowledge and the target template into a preset training model, the method further includes:
acquiring a plurality of multimedia videos in a preset knowledge base;
splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
and performing model training according to the multimedia material sample and the multimedia video to obtain a preset training model.
The method processes multimedia videos acquired from a preset knowledge base to obtain corresponding multimedia material samples, then trains on the multimedia videos and material samples. With the knowledge in the preset knowledge base as reference, a preset training model that can accurately screen video material is obtained, so as to ensure the quality of the generated video.
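The patent does not disclose the architecture of the preset training model. As a hedged illustration only, a model that learns to score materials could be sketched as a simple linear regressor trained by gradient descent on (material feature vector, material score) pairs; `train_score_model` and its loop are assumptions for illustration, not the patent's actual neural network.

```python
import numpy as np

def train_score_model(features, scores, lr=0.1, epochs=2000):
    """Fit a linear scorer mapping a material feature vector to a
    material score (a stand-in for the patent's learned model)."""
    w = np.zeros(features.shape[1])
    b = 0.0
    n = len(scores)
    for _ in range(epochs):
        err = features @ w + b - scores   # prediction error
        w -= lr * features.T @ err / n    # gradient step on weights
        b -= lr * err.mean()              # gradient step on bias
    return w, b
```

A real implementation would replace this with a deep network (the description mentions TensorFlow), but the training signal, material samples paired with scores, is the same.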
Optionally, the splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video includes:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and material scores corresponding to the multimedia material samples in a template;
correspondingly, the performing model training according to the multimedia material sample and the multimedia video to obtain a preset training model comprises:
and performing model optimization on the training model according to the multimedia material sample and the material score to obtain a preset training model.
During model training, the method first judges whether a multimedia video was formed by splicing multimedia materials. A multimedia video obtained by splicing can be split to obtain each multimedia material sample and the material score corresponding to each sample in the template, and the training model can then be optimized with this data and these scores. This yields an accurate, optimized preset training model and thereby ensures the accuracy of the model weights and the quality of video generation.
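As a sketch of the splitting step above: given a spliced multimedia video's total duration and the template's slot boundary timestamps, the video can be cut into material segments. `split_by_template` and its timestamp inputs are hypothetical names, assuming the boundaries are already known from the template.

```python
def split_by_template(duration, slot_bounds):
    """Cut a spliced multimedia video of the given duration (seconds)
    into material segments at the template's slot boundaries."""
    bounds = [0.0] + sorted(float(b) for b in slot_bounds) + [float(duration)]
    return [(bounds[i], bounds[i + 1])
            for i in range(len(bounds) - 1)
            if bounds[i + 1] > bounds[i]]  # drop empty segments
```

Each returned (start, end) pair would then be extracted as one multimedia material sample and paired with its material score for training.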
Optionally, the splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video includes:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing multimedia materials, acquiring a template from the preset knowledge base;
splitting the multimedia video according to a template to obtain a plurality of multimedia materials, and acquiring a structured knowledge sample corresponding to the template;
extracting information from the multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
correspondingly, the performing model training according to the multimedia material sample and the multimedia video to obtain a preset training model comprises:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
In the model training process, the method first judges whether the multimedia video was formed by splicing multimedia materials. A multimedia video that was not spliced from materials can be annotated and split according to a template: the video is split into multimedia materials, and the structured knowledge information in the split materials is extracted. Meanwhile, based on the split content, the part of the same template that corresponds to the structured knowledge is located, its structured knowledge data is extracted, and that data is bound to the text content of the corresponding multimedia material to form a group of training data. This data is input into the training model for model training, producing an accurate, optimized preset training model that can automatically screen materials, which further ensures the accuracy of the model weights and the quality of video generation.
Optionally, the extracting information from the multiple multimedia materials to obtain the structured knowledge information corresponding to the multimedia video includes:
and extracting images, videos and character contents in the multimedia material through a symbol numerology system based on data stream programming and a natural language processing identification technology to obtain the structured knowledge information.
The method and the device identify images, videos and text content in the multimedia material using a symbolic mathematical system based on dataflow programming (TensorFlow) and natural language processing technology, so the structured knowledge information in the multimedia material can be extracted accurately, which further ensures the accuracy of the model weights and the quality of video generation.
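TensorFlow and OCR/NLP recognition models would be required for the actual frame and speech recognition; the downstream step, turning recognized raw text into structured knowledge entries, can however be sketched without them. `extract_structured_info` is a hypothetical helper that assumes the recognized text lines take a "key: value" form (half- or full-width colons).

```python
import re

def extract_structured_info(recognized_lines):
    """Convert raw text lines recognized from video frames (e.g. by
    upstream TensorFlow OCR / NLP models) into key-value knowledge."""
    info = {}
    for line in recognized_lines:
        m = re.match(r"\s*([^:：]+)[:：]\s*(.+)", line)
        if m:  # keep only lines that look like "key: value"
            info[m.group(1).strip()] = m.group(2).strip()
    return info
```

Lines without a recognizable delimiter are discarded, mirroring the filtering of irrelevant recognized content.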
Optionally, the splicing the target multimedia material to obtain the target video includes:
and splicing the target multimedia material according to the target template to obtain a target video.
The method and the device can splice the target multimedia material according to the target template, thereby automatically generating multimedia material and realizing automatic construction of the multimedia video. This saves manpower and improves video output, further saving manpower and material resources, reducing cost, and improving the efficiency of video generation.
Optionally, after the splicing processing is performed on the target multimedia material to obtain the target video, the method further includes:
and pushing the target video to the client of the user.
According to the method and the device, after the target video is generated, it can be pushed directly to the user's client, so the user can conveniently obtain the desired information from the target video, which improves the user experience.
In a second aspect, the present application provides a video generating apparatus, comprising:
a first acquisition module, configured to respond to a video generation operation of a user and acquire the target structured knowledge and target template selected by the user, wherein the target structured knowledge comprises rule information for generating a target video;
the first processing module is used for triggering a conversation recording function and acquiring a conversation content video;
the second processing module is used for inputting the conversation content video, the target structured knowledge and the target template into a preset training model and outputting to obtain a target multimedia material;
and the third processing module is used for splicing the target multimedia material to obtain a target video.
Optionally, before the second processing module inputs the dialog content video, the target structured knowledge and the target template into a preset training model, the apparatus further includes:
the second acquisition module is used for acquiring a plurality of multimedia videos in a preset knowledge base;
the splitting module is used for splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
and the training module is used for carrying out model training according to the multimedia material samples and the multimedia video to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and material scores corresponding to the multimedia material samples in a template;
correspondingly, the training module is specifically configured to:
and performing model optimization on the training model according to the multimedia material sample and the material score to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing multimedia materials, acquiring a template from the preset knowledge base;
splitting the multimedia video according to a template to obtain a plurality of multimedia materials, and acquiring a structured knowledge sample corresponding to the template;
extracting information from the multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
correspondingly, the training module is specifically configured to:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
Optionally, the splitting module is further specifically configured to:
and extracting images, videos and character contents in the multimedia material through a symbol numerology system based on data stream programming and a natural language processing identification technology to obtain the structured knowledge information.
Optionally, the third processing module is specifically configured to:
and splicing the target multimedia material according to the target template to obtain a target video.
Optionally, after the third processing module performs splicing processing on the target multimedia material to obtain a target video, the apparatus further includes:
and the pushing module is used for pushing the target video to the client of the user.
In a third aspect, the present application provides a video generating device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the video generation method according to the first aspect and its various possible designs.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement a video generation method as set forth in the first aspect and various possible designs of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements a video generation method as described above in the first aspect and various possible designs of the first aspect.
When a user needs to generate a video, the method acquires the target structured knowledge and target template selected by the user, then starts a conversation recording function and determines the video content to be generated from the recorded conversation content video. The conversation content video, the target structured knowledge, and the target template are input into a preset training model, which screens the multimedia materials; the target video is then generated from the screened materials. Because the video material is screened and assembled automatically rather than manually, the method saves manpower and material costs, shortens the time for video generation, and improves its efficiency. Compared with the prior-art method of generating videos through keywords, screening material through a training model, with the conversation content video recorded in real time as a parameter, yields a generated video that better fits the required content and style, so the quality of the generated video is guaranteed.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a video generation system according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a video generation method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another video generation method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video generating device according to an embodiment of the present application.
With the foregoing drawings, certain embodiments of the present disclosure are shown and described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," "third," and "fourth," if any, in the description and claims of this application and the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the rapid development of customer service systems, the popularization of mobile terminals, and faster networks, knowledge bases are no longer limited to the traditional text form, and information is becoming increasingly diversified. Short, fast, high-traffic content is gradually favored by more and more users and enterprises, and interaction between operators and the outside world increasingly takes the form of short videos, such as brand publicity, package introduction, and activity promotion. However, producing such short videos often incurs high labor costs, shooting time, and editing time.
Existing videos are recorded and post-edited manually, or formed by searching the internet for related pictures according to keywords in a text description and splicing them into a video. However, with manual recording, substantial manpower and material resources are consumed; and if the video is generated only through keywords, the generated content lacks coherence, while a video spliced from network pictures retrieved by keywords suffers from large differences in content style and the like, so its quality cannot be guaranteed.
To solve the above problems, embodiments of the present application provide a video generation method, apparatus, device, and storage medium. When a user needs to generate a video, the method acquires the target structured knowledge and target template selected by the user, then starts a session recording function and determines the video content to be generated from the recorded session content video. The session content video, together with the user-selected target structured knowledge and target template, is input into a preset training model, which screens the multimedia materials; the target video is then generated from the screened materials. This realizes automatic construction of multimedia video, saves manpower, and improves video yield and quality.
Optionally, fig. 1 is a schematic diagram of a video generation system architecture provided in an embodiment of the present application. In fig. 1, the above-described architecture includes at least one of a receiving device 101, a processor 102, and a display device 103.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the architecture of the video generation system. In other possible embodiments of the present application, the foregoing architecture may include more or fewer components than shown in the drawings, combine some components, split some components, or arrange the components differently, as determined by the practical application scenario; this is not limited herein. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
In a specific implementation process, the receiving device 101 may be an input/output interface or a communication interface.
The processor 102 may acquire the target structured knowledge and target template selected by the user when the user needs to generate a video, then start a conversation recording function and determine the video content to be generated from the recorded conversation content video. It inputs the conversation content video and the user-selected target structured knowledge and target template into a preset training model, screens multimedia materials through the model, and generates the target video from the screened materials, thereby realizing automatic construction of multimedia video, saving manpower, and improving video output and quality.
The display device 103 may be used to display the above results, or may be used to interact with the user through the display device.
The display device can also be a touch display screen for receiving user instructions while displaying the content so as to realize interaction with the user.
It should be understood that the processor may be implemented by reading instructions in the memory and executing the instructions, or may be implemented by a chip circuit.
In addition, the network architecture and the service scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not constitute a limitation to the technical solution provided in the embodiment of the present application, and it can be known by a person skilled in the art that along with the evolution of the network architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
The technical scheme of the application is described in detail below with reference to specific embodiments:
optionally, fig. 2 is a schematic flow chart of a video generation method provided in the embodiment of the present application. The execution subject of the embodiment of the present application may be the processor 102 in fig. 1, and the specific execution subject may be determined according to an actual application scenario. As shown in fig. 2, the method comprises the steps of:
s201: and responding to the video generation operation of the user, and acquiring the target structured knowledge and the target template selected by the user.
Wherein the target structured knowledge comprises rule information for generating the target video.
Here, the user's video generation operation may be a click or input operation on a user terminal or video generation device, through which the user selects the target structured knowledge and target template. Optionally, the input operation may take multiple forms, such as voice input or text input.
Each template has corresponding structured knowledge, the structured knowledge comprises rule information for generating the video, and a user can select a target template and the target structured knowledge from the pre-stored templates.
Optionally, the templates and the structured knowledge are pre-stored in a preset knowledge base.
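The patent does not specify a data model for templates or structured knowledge. As a minimal sketch, a template in the preset knowledge base could pair an ordered list of slots with its associated rule information; `Template`, `StructuredKnowledge`, and `select_target` and all their fields are assumed names for illustration.

```python
from dataclasses import dataclass

@dataclass
class StructuredKnowledge:
    topic: str   # what the target video is about
    rules: dict  # rule information for generating the target video

@dataclass
class Template:
    name: str
    slots: list  # ordered slot names, e.g. ["intro", "details", "outro"]
    knowledge: StructuredKnowledge

def select_target(templates, name):
    """Return the user-selected target template and its structured knowledge."""
    for t in templates:
        if t.name == name:
            return t, t.knowledge
    raise KeyError(f"no template named {name!r}")
```

In this sketch, the user's selection operation in S201 reduces to a `select_target` lookup over the pre-stored templates.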
A knowledge base, in one sense, is the set of rules applied in the design of an expert system, together with the facts and data associated with those rules; such a knowledge base is tied to a specific expert system, so no sharing concern arises. In another sense, a knowledge base is a consultative resource that is shared rather than unique to any single system.
Structuring refers to the process of organizing gradually accumulated knowledge so that it becomes systematic and can be compiled into an outline.
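As a concrete illustration, the rule information that a piece of structured knowledge binds to a template might be represented as follows; every field name here is a hypothetical schema chosen for this sketch, not one defined in the application:

```python
# Hypothetical representation of a template and its bound structured knowledge.
# All field names ("slots", "rules", "keywords", ...) are illustrative.
target_template = {
    "template_id": "tpl-product-intro",
    "slots": ["opening", "feature_demo", "pricing", "closing"],
}

target_structured_knowledge = {
    "template_id": "tpl-product-intro",
    "rules": {
        "opening": {"max_seconds": 10, "keywords": ["welcome", "hello"]},
        "feature_demo": {"max_seconds": 60, "keywords": ["feature", "demo"]},
        "pricing": {"max_seconds": 20, "keywords": ["price", "plan"]},
        "closing": {"max_seconds": 10, "keywords": ["thanks", "goodbye"]},
    },
}

def knowledge_matches_template(knowledge, template):
    """Check that the rule information covers every slot of the chosen template."""
    return set(template["slots"]) <= set(knowledge["rules"])

print(knowledge_matches_template(target_structured_knowledge, target_template))  # True
```

In this sketch the "rule information for generating the target video" is simply a per-slot constraint set, which is enough to drive the screening and completion steps described below.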
S202: and triggering a conversation recording function to acquire a conversation content video.
Optionally, a user operation may be received here to trigger the session recording function, and a corresponding recording function may also be automatically triggered.
S203: and inputting the conversation content video, the target structured knowledge and the target template into a preset training model, and outputting to obtain a target multimedia material.
The preset training model is a model realized based on deep learning.
The preset training model is used for selecting video contents with high matching degree, deleting irrelevant segments in the conversation process, complementing the lack of contents according to corresponding structured knowledge, and generating screened and optimized multimedia materials.
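The screening just described (keep well-matched content, drop irrelevant segments, report what must still be completed) can be sketched as follows. The keyword-overlap score is a deliberately simple stand-in for the deep-learning model, and every data structure and name is an illustrative assumption:

```python
# Stand-in for the preset training model's screening step: assign each
# conversation segment to its best-matching slot, drop segments that match
# nothing, and report slots still missing (to be completed from the
# structured knowledge). Keyword overlap replaces the learned matcher.
def screen_segments(segments, rules, threshold=0.5):
    kept, covered = [], set()
    for seg in segments:
        best_slot, best_score = None, 0.0
        words = set(seg["transcript"].lower().split())
        for slot, rule in rules.items():
            kw = set(rule["keywords"])
            score = len(words & kw) / len(kw) if kw else 0.0
            if score > best_score:
                best_slot, best_score = slot, score
        if best_score >= threshold:
            kept.append({**seg, "slot": best_slot})
            covered.add(best_slot)
    missing = set(rules) - covered  # content to complete from the knowledge base
    return kept, missing

rules = {
    "opening": {"keywords": ["hello", "welcome"]},
    "pricing": {"keywords": ["price", "plan"]},
}
segments = [
    {"transcript": "Hello and welcome everyone"},
    {"transcript": "Let me check my notes"},  # irrelevant, dropped
]
kept, missing = screen_segments(segments, rules)
print([s["slot"] for s in kept], missing)  # ['opening'] {'pricing'}
```

The returned `missing` set corresponds to the "lack of contents" that the application says is completed from the corresponding structured knowledge.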
S204: and splicing the target multimedia material to obtain a target video.
Optionally, the splicing processing is performed on the target multimedia material to obtain a target video, and the method includes:
and splicing the target multimedia material according to the target template to obtain a target video.
Here, the target multimedia material may be directly filled into the target template to obtain the target video, or the target template may be a format of the target template to perform splicing of the multimedia material.
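A minimal sketch of splicing according to the target template: each slot of the template is filled, in template order, with the material assigned to it. Slot and clip names are illustrative, and string concatenation stands in for real clip concatenation, which a practical implementation would do with a video-editing library:

```python
# Splice screened materials by template order. Slots with no assigned
# material are skipped (in the full method they would have been completed
# from structured knowledge beforehand).
def splice(template_slots, materials):
    by_slot = {m["slot"]: m for m in materials}
    timeline = [by_slot[s]["clip"] for s in template_slots if s in by_slot]
    return "+".join(timeline)  # stands in for clip concatenation

materials = [
    {"slot": "closing", "clip": "closing.mp4"},
    {"slot": "opening", "clip": "opening.mp4"},
]
print(splice(["opening", "feature_demo", "closing"], materials))
# opening.mp4+closing.mp4
```

Note that the template, not the arrival order of the materials, dictates the final sequence, which is what "splicing according to the target template" amounts to.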
Optionally, after the target multimedia material is spliced to obtain the target video, the method further comprises:
pushing the target video to the client of the user.
Here, after the target video is generated, it can be pushed directly to the user's client, making it convenient for the user to obtain the desired information and improving the user experience.
According to the method and the device, the target multimedia materials can be spliced according to the target template, giving the system the ability to assemble multimedia materials automatically and realizing automatic construction of multimedia video. This saves labor and material resources, reduces cost, increases video output, and improves video generation efficiency.
In the embodiment of the present application, when a user needs to generate a video, the target structured knowledge and the target template selected by the user are acquired, and the conversation recording function is then started, so that the content of the video to be generated can be obtained from the recorded conversation content video. The conversation content video, the target structured knowledge and the target template are input into the preset training model, which screens the multimedia material, and the target video is generated from the screened material. Because the video material is screened and assembled automatically rather than manually, the embodiment saves labor and material costs, shortens the time needed to generate a video, and improves video generation efficiency. Moreover, compared with prior-art methods that generate videos from keywords, screening the video material with a trained model while using the conversation content video recorded in real time as an input parameter ensures that the generated video fits the required content and style closely, guaranteeing the quality of the generated video.
In a possible implementation, an embodiment of the present application provides a model that can be trained in advance so that multimedia material is screened according to it. Accordingly, fig. 3 is a schematic flow chart of another video generation method provided by an embodiment of the present application. As shown in fig. 3, the method comprises:
s301: and responding to the video generation operation of the user, and acquiring the target structured knowledge and the target template selected by the user.
S302: and triggering a conversation recording function to acquire a conversation content video.
S303: and acquiring a plurality of multimedia videos in a preset knowledge base.
Optionally, a plurality of multimedia videos may be pre-stored in the preset knowledge base, and the knowledge base may be updated in real time to add or remove multimedia videos, thereby improving its quality.
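The real-time maintenance of the preset knowledge base can be sketched as a minimal in-memory store supporting addition and removal of multimedia videos; the class and its methods are illustrative assumptions, not an interface defined in the application:

```python
# Minimal in-memory stand-in for the preset knowledge base described above.
class KnowledgeBase:
    def __init__(self):
        self._videos = {}

    def add_video(self, video_id, video):
        """Add or overwrite a multimedia video (real-time update)."""
        self._videos[video_id] = video

    def remove_video(self, video_id):
        """Remove a multimedia video; ignore unknown ids."""
        self._videos.pop(video_id, None)

    def all_videos(self):
        return list(self._videos.values())

kb = KnowledgeBase()
kb.add_video("v1", {"title": "promo", "spliced": True})
kb.add_video("v2", {"title": "interview", "spliced": False})
kb.remove_video("v1")
print(len(kb.all_videos()))  # 1
```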
S304: and splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video.
Optionally, splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video, including:
judging whether the multimedia video is formed by splicing multimedia materials or not; and if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and material scores corresponding to the multimedia material samples in the template.
Correspondingly, model training is carried out according to the multimedia material samples and the multimedia video, and a preset training model is obtained, wherein the model training comprises the following steps: and performing model optimization on the training model according to the multimedia material samples and the material scores to obtain a preset training model.
Here, in the embodiment of the application, when the model is trained, it is first judged whether a multimedia video is formed by splicing multimedia materials. A multimedia video obtained by splicing can be split to recover each multimedia material sample together with its corresponding material score, and the training model can be optimized with these samples and scores. This yields an accurate, optimized preset training model, further ensuring the accuracy of the model weights and the quality of video generation.
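A minimal sketch of this training branch, under the assumption of the simplest possible scoring model: the spliced video is split back into (material, score) pairs, and a one-weight linear scorer is optimized against the material scores by gradient descent. Both the model and the data are illustrative stand-ins for the deep-learning model described in the application:

```python
# Split a known-spliced video into (material feature, material score) pairs.
def split_spliced_video(video):
    return [(seg["material"], seg["score"]) for seg in video["segments"]]

# Optimize a one-weight linear scorer (predicted_score = w * feature) by
# per-sample gradient descent on squared error; stands in for deep learning.
def train(samples, lr=0.1, epochs=200):
    w = 0.0
    for _ in range(epochs):
        for feature, score in samples:
            pred = w * feature
            w -= lr * (pred - score) * feature  # squared-error gradient step
    return w

video = {"segments": [
    {"material": 1.0, "score": 0.9},
    {"material": 2.0, "score": 1.8},
]}
samples = split_spliced_video(video)
w = train(samples)
print(round(w, 2))  # 0.9
```

The illustrative data are consistent with a single weight of 0.9, so the optimization converges to it; in the full method the scorer would be a deep network and the features would be learned representations of each material.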
Optionally, splitting the multimedia video to obtain the multimedia material samples corresponding to it comprises:
judging whether the multimedia video is formed by splicing multimedia materials; if not, acquiring a template from the preset knowledge base; splitting the multimedia video according to the template to obtain a plurality of multimedia materials, and acquiring the structured knowledge sample corresponding to the template; and extracting information from the plurality of multimedia materials to obtain the structured knowledge information corresponding to the multimedia video.
Correspondingly, performing model training according to the multimedia material samples and the multimedia video to obtain the preset training model comprises: inputting the structured knowledge sample, the multimedia video, the multimedia materials and the structured knowledge information into a training model for training to obtain the preset training model.
Optionally, performing information extraction on the plurality of multimedia materials to obtain the structured knowledge information corresponding to the multimedia video comprises: identifying the images, videos and text content in the multimedia materials through a symbolic mathematical system based on dataflow programming and a natural language processing recognition technology to obtain the structured knowledge information.
Here, if the multimedia video is not obtained by splicing, for example if it was captured and edited by an editor, the structurally associated multimedia video can be marked and split manually in the form of a template into multimedia materials, and the images, video content and text content in the split materials can be extracted using TensorFlow and natural language processing technology. Meanwhile, the part of the same template corresponding to the structured knowledge is found according to the split content, and the structured knowledge data of that part is extracted and bound to the text content of the corresponding multimedia material to form a group of training data; different weights are assigned to the corresponding image, video content and knowledge evaluation as input parameters, and model training is performed using deep learning technology.
The embodiment of the application identifies the images, videos and text content in the multimedia material using a symbolic mathematical system based on dataflow programming (TensorFlow) and natural language processing technology, so the structured knowledge information in the material can be extracted accurately, further ensuring the accuracy of the model weights and the quality of video generation.
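The application names TensorFlow and natural language processing for this extraction step. As a dependency-free stand-in, the sketch below pulls simple key-value "structured knowledge" out of a material's caption text with regular expressions; the caption format and the field names are assumptions made purely for illustration:

```python
import re

# Stand-in for the TensorFlow + NLP extraction step: recover key-value
# structured knowledge from a material's caption text. A real pipeline
# would run recognition models over images and video frames as well.
def extract_structured_knowledge(caption):
    info = {}
    for key, value in re.findall(r"(\w+)\s*[:=]\s*([\w.]+)", caption):
        info[key.lower()] = value
    return info

caption = "Product: X100 Price: 99.9 Duration: 30"
print(extract_structured_knowledge(caption))
# {'product': 'X100', 'price': '99.9', 'duration': '30'}
```

The extracted dictionary corresponds to the structured knowledge information that is later bound to the material's text content to form training data.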
Here, in the embodiment of the present application, during model training it is first judged whether a multimedia video is formed by splicing multimedia materials. A multimedia video not formed by splicing can be marked and split in the form of a template into multimedia materials, and the structured knowledge information in the split materials can be extracted. Meanwhile, the part of the same template corresponding to the structured knowledge is found according to the split content, and the structured knowledge data of that part is extracted and bound to the text content of the corresponding multimedia material to form a group of training data, which is input into the training model for model training. This yields an accurate, optimized preset training model that can be used to screen materials automatically, further ensuring the accuracy of the model weights and the quality of video generation.
S305: and performing model training according to the multimedia material sample and the multimedia video to obtain a preset training model.
S306: and inputting the conversation content video, the target structured knowledge and the target template into a preset training model, and outputting to obtain a target multimedia material.
S307: and splicing the target multimedia material to obtain a target video.
The embodiment of the application provides a training method for the preset training model: multimedia videos acquired from the preset knowledge base are processed to obtain the corresponding multimedia material samples, and training is performed according to the multimedia videos and these samples. With the knowledge in the preset knowledge base as reference, a preset training model that can accurately screen video materials is obtained, ensuring the quality of the generated video.
Fig. 4 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus includes: a first obtaining module 401, a first processing module 402, a second processing module 403 and a third processing module 404. The video generation apparatus may be the processor 102 itself, or a chip or integrated circuit that implements the functions of the processor 102. It should be noted that the division into these four modules is only a division of logical functions; physically, they may be integrated or independent.
The first acquisition module is used for responding to video generation operation of a user and acquiring target structured knowledge and a target template selected by the user, wherein the target structured knowledge comprises rule information for generating a target video;
the first processing module is used for triggering a conversation recording function and acquiring a conversation content video;
the second processing module is used for inputting the conversation content video, the target structured knowledge and the target template into a preset training model and outputting to obtain a target multimedia material;
and the third processing module is used for splicing the target multimedia material to obtain a target video.
Optionally, before the second processing module inputs the dialog content video, the target structured knowledge and the target template into the preset training model, the apparatus further includes:
the second acquisition module is used for acquiring a plurality of multimedia videos in a preset knowledge base;
the splitting module is used for splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
and the training module is used for carrying out model training according to the multimedia material samples and the multimedia video to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, the multimedia video is split to obtain a plurality of multimedia material samples and material scores corresponding to the multimedia material samples in the template;
correspondingly, the training module is specifically configured to:
and performing model optimization on the training model according to the multimedia material samples and the material scores to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing multimedia materials, acquiring a template from a preset knowledge base;
splitting the multimedia video according to the template to obtain a plurality of multimedia materials, and acquiring a structured knowledge sample corresponding to the template;
extracting information from the multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
correspondingly, the training module is specifically configured to:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
Optionally, the splitting module is further specifically configured to:
and extracting images, videos and character contents in the multimedia material through a symbol numerology system based on data stream programming and a natural language processing identification technology to obtain structured knowledge information.
Optionally, the third processing module is specifically configured to:
and splicing the target multimedia material according to the target template to obtain a target video.
Optionally, after the third processing module performs splicing processing on the target multimedia material to obtain the target video, the apparatus further includes:
and the pushing module is used for pushing the target video to the client of the user.
Fig. 5 is a schematic structural diagram of a video generating device according to an embodiment of the present disclosure, where the video generating device may be the processor 102. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not limiting to the implementations of the present application described and/or claimed herein.
As shown in fig. 5, the video generating apparatus includes: a processor 501 and a memory 502, the various components being interconnected by different buses and mountable on a common motherboard or in other manners as desired. The processor 501 may process instructions executed within the video generating device, including instructions stored in or on the memory for displaying graphical information on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. In fig. 5, one processor 501 is taken as an example.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods of the video generation apparatus in the embodiments of the present application (for example, the first acquisition module 401, the first processing module 402, the second processing module 403, and the third processing module 404 shown in fig. 4). By running the non-transitory software programs, instructions and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the video generation device, i.e. implements the video generation method in the above-described method embodiments.
The video generation device may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the video generating device; it may be, for example, a touch screen, a keypad, a mouse with one or more buttons, a trackball, a joystick, or another input device. The output device 504 may be, for example, the display device of the video generation apparatus. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
The video generation device in the embodiment of the present application may be configured to execute the technical solutions in the method embodiments of the present application, and the implementation principles and technical effects are similar, which are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-readable storage medium is configured to implement the video generation method according to any one of the foregoing embodiments.
An embodiment of the present application further provides a computer program product, which includes a computer program; when the computer program is executed by a processor, it implements the video generation method of any one of the foregoing embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of video generation, comprising:
responding to video generation operation of a user, and acquiring target structured knowledge and a target template selected by the user, wherein the target structured knowledge comprises rule information for generating a target video;
triggering a conversation recording function to acquire a conversation content video;
inputting the conversation content video, the target structured knowledge and the target template into a preset training model, and outputting to obtain a target multimedia material;
and splicing the target multimedia material to obtain a target video.
2. The method of claim 1, further comprising, prior to said inputting said dialog content video, said target structured knowledge and said target template into a preset training model:
acquiring a plurality of multimedia videos in a preset knowledge base;
splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
and performing model training according to the multimedia material sample and the multimedia video to obtain a preset training model.
3. The method according to claim 2, wherein the splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video comprises:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and material scores corresponding to the multimedia material samples in a template;
correspondingly, the performing model training according to the multimedia material sample and the multimedia video to obtain a preset training model comprises:
and performing model optimization on the training model according to the multimedia material sample and the material score to obtain a preset training model.
4. The method according to claim 2, wherein the splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video comprises:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing multimedia materials, acquiring a template from the preset knowledge base;
splitting the multimedia video according to a template to obtain a plurality of multimedia materials, and acquiring a structured knowledge sample corresponding to the template;
extracting information from the multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
correspondingly, the performing model training according to the multimedia material sample and the multimedia video to obtain a preset training model comprises:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
5. The method of claim 4, wherein the performing information extraction on the plurality of multimedia materials to obtain the corresponding structured knowledge information of the multimedia video comprises:
and extracting images, videos and character contents in the multimedia material through a symbol numerology system based on data stream programming and a natural language processing identification technology to obtain the structured knowledge information.
6. The method according to any one of claims 1 to 5, wherein the splicing the target multimedia material to obtain the target video comprises:
and splicing the target multimedia material according to the target template to obtain a target video.
7. The method according to any one of claims 1 to 5, further comprising, after said splicing the target multimedia material to obtain a target video:
and pushing the target video to the client of the user.
8. A video generation apparatus, comprising:
the video processing device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for responding to video generation operation of a user and acquiring target structured knowledge and a target template selected by the user, and the target structured knowledge comprises rule information for generating a target video;
the first processing module is used for triggering a conversation recording function and acquiring a conversation content video;
the second processing module is used for inputting the conversation content video, the target structured knowledge and the target template into a preset training model and outputting to obtain a target multimedia material;
and the third processing module is used for splicing the target multimedia material to obtain a target video.
9. A video generation device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video generation method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the video generation method of any one of claims 1 to 7.
CN202111064510.1A 2021-09-10 2021-09-10 Video generation method, device, equipment and storage medium Active CN113794930B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111064510.1A CN113794930B (en) 2021-09-10 2021-09-10 Video generation method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113794930A true CN113794930A (en) 2021-12-14
CN113794930B CN113794930B (en) 2023-11-24

Family

ID=79183264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111064510.1A Active CN113794930B (en) 2021-09-10 2021-09-10 Video generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113794930B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117891971B (en) * 2024-03-18 2024-05-14 吉林省通泰信息技术有限公司 Video editing system management method

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120021389A1 (en) * 2004-12-23 2012-01-26 Carl Isamu Wakamoto Interactive immersion system for movies, television, animation, music videos, language training, entertainment, video games and social networking
US20120195573A1 (en) * 2011-01-28 2012-08-02 Apple Inc. Video Defect Replacement
US20130272679A1 (en) * 2012-04-12 2013-10-17 Mario Luis Gomes Cavalcanti Video Generator System
US20170213469A1 (en) * 2016-01-25 2017-07-27 Wespeke, Inc. Digital media content extraction and natural language processing system
CN109120992A (en) * 2018-09-13 2019-01-01 北京金山安全软件有限公司 Video generation method and device, electronic equipment and storage medium
CN109660865A (en) * 2018-12-17 2019-04-19 杭州柚子街信息科技有限公司 Make method and device, medium and the electronic equipment of video tab automatically for video
CN109819179A (en) * 2019-03-21 2019-05-28 腾讯科技(深圳)有限公司 A kind of video clipping method and device
CN110855904A (en) * 2019-11-26 2020-02-28 Oppo广东移动通信有限公司 Video processing method, electronic device and storage medium
CN111105817A (en) * 2018-10-25 2020-05-05 国家新闻出版广电总局广播科学研究院 Training data generation method and device for intelligent program production
CN111209435A (en) * 2020-01-10 2020-05-29 上海摩象网络科技有限公司 Method and device for generating video data, electronic equipment and computer storage medium
CN111866585A (en) * 2020-06-22 2020-10-30 北京美摄网络科技有限公司 Video processing method and device
CN111914523A (en) * 2020-08-19 2020-11-10 腾讯科技(深圳)有限公司 Multimedia processing method and device based on artificial intelligence and electronic equipment
CN112073649A (en) * 2020-09-04 2020-12-11 北京字节跳动网络技术有限公司 Multimedia data processing method, multimedia data generating method and related equipment
CN112565825A (en) * 2020-12-02 2021-03-26 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and medium
CN112784078A (en) * 2021-01-22 2021-05-11 哈尔滨玖楼科技有限公司 Video automatic editing method based on semantic recognition
CN113079326A (en) * 2020-01-06 2021-07-06 北京小米移动软件有限公司 Video editing method and device and storage medium


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117891971B (en) * 2024-03-18 2024-05-14 Jilin Tongtai Information Technology Co., Ltd. Video editing system management method

Also Published As

Publication number Publication date
CN113794930B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
CN110418151B (en) Bullet screen information sending and processing method, device, equipment and medium in live game
CN112698769B (en) Information interaction method, device, equipment, storage medium and program product
CN110493653B (en) Barrage play control method, device, equipment and storage medium
EP4345591A1 (en) Prop processing method and apparatus, and device and medium
US20230013601A1 (en) Program trial method, system, apparatus, and device, and medium
CN114339285B (en) Knowledge point processing method, video processing method, device and electronic equipment
CN113746874B (en) Voice package recommendation method, device, equipment and storage medium
CN113253880B (en) Method and device for processing pages of interaction scene and storage medium
CN111209417A (en) Information display method, server, terminal and storage medium
CN108491188A (en) Development management method and device for voice dialogue products
CN112988185A (en) Cloud application updating method, device and system, electronic equipment and storage medium
CN112395027A (en) Widget interface generation method and device, storage medium and electronic equipment
CN113010698A (en) Multimedia interaction method, information interaction method, device, equipment and medium
CN114449327B (en) Video clip sharing method and device, electronic equipment and readable storage medium
CN110688569B (en) Information search method, device, medium and equipment
CN111767109A (en) H5 page display method and device based on terminal application and readable storage medium
CN109146540A (en) Method, mobile device and server for monitoring visible exposure of advertisements
CN113626624B (en) Resource identification method and related device
EP4124025A1 (en) Interaction information processing method and apparatus, electronic device and storage medium
CN112148395A (en) Page display method, device, equipment and storage medium
CN113986083A (en) File processing method and electronic equipment
CN107357481B (en) Message display method and message display device
CN105095398B (en) Information providing method and device
CN113794930B (en) Video generation method, device, equipment and storage medium
CN115379136A (en) Special effect prop processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant