CN117135416A - Video generation system based on AIGC - Google Patents
- Publication number: CN117135416A
- Application number: CN202311393909.3A
- Authority: CN (China)
- Prior art keywords: model, unit, information, keywords, video
- Legal status: Granted (status assumed by Google; not a legal conclusion)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an AIGC-based video generation system comprising an information input module, a text analysis module, a model selection module and a video generation module. The information input module receives a user's requirement information; the text analysis module performs content analysis on that information; the model selection module selects a basic model for the video; and the video generation module processes the basic model to obtain a complete video. The system intelligently identifies the text information input by the user, selects a suitable model from the model library, and virtually shoots the model's processing to obtain the required video, greatly reducing the difficulty of video production and improving its efficiency.
Description
Technical Field
The invention relates to the field of electric digital data processing, in particular to an AIGC-based video generation system.
Background
AIGC (artificial-intelligence-generated content) is the field in which artificial intelligence technology is applied to create content. With the continuing development of deep learning and generative models, AIGC has made remarkable progress in many areas, from image generation to music creation to text writing and video generation, and has shown great potential. AIGC technology can greatly reduce the difficulty of many kinds of work; video production is one of them, and a system is needed that improves video production efficiency by building on AIGC technology.
The foregoing discussion of the background art is intended only to facilitate an understanding of the present invention. It is not an acknowledgement or admission that any of the material referred to was common general knowledge.
Many video production systems have been developed; one such system is disclosed in publication number CN113784167B. It generally comprises: editing a behavior tree, using a behavior-tree editor, based on the performance content and interaction content required by all existing objects in the 3D scene, and outputting the corresponding behavior-tree data, with the interaction content forming the condition nodes of the behavior tree; inputting the behavior-tree data into a pre-rendering host and disassembling each complete segment of behavior-tree data under a condition node for 3D real-time rendering and video recording, obtaining a plurality of video files; simplifying the behavior-tree data already converted into video files into a play-video behavior, obtaining simplified behavior-tree data; and inputting the simplified behavior-tree data and the video files into an interactive video player. However, that system still produces 3D video in the traditional way; it is inefficient, difficult to use, and places technical demands on producers.
Disclosure of Invention
The invention aims to overcome the above defects and provides an AIGC-based video generation system.
The invention adopts the following technical scheme:
the AIGC-based video generation system comprises an information input module, a text analysis module, a model selection module and a video generation module;
the information input module is used for inputting the requirement information of a user, the text analysis module is used for carrying out content analysis on the requirement information, the model selection module is used for selecting a basic model of the video, and the video generation module is used for processing the basic model to obtain a complete video;
the text analysis module comprises a classification disassembly unit, a combination unit and a conversion unit, wherein the classification disassembly unit is used for disassembling text information into a plurality of keywords and classifying the keywords, the combination unit is used for combining the classified keywords based on the positions of descriptive keywords to generate three word packages, and the conversion unit is used for converting the keywords in the word packages into corresponding codes to form code packages;
the model selection module comprises a model library storage unit and a model matching unit, wherein the model library storage unit is used for storing a background model and an active subject model, and the model matching unit obtains the corresponding background model and active subject model based on the coding information in the coding packet;
the video generation module comprises a model placement unit, an action unit and a shooting unit, wherein the model placement unit is used for placing a background model and a movable main body model, the action unit enables the movable main body model to complete corresponding actions according to event coding packet information, and the shooting unit is used for shooting picture information of the movable main body model when the action is completed in the background model to obtain video data;
further, the classification disassembly unit comprises a keyword training processor, a keyword recognition processor and a keyword recording processor, wherein the keyword training processor is used for training a large number of texts to obtain recognition parameters of keywords, the recognition parameters are sent to the keyword recognition processor, the keyword recognition processor recognizes the keywords in the text information based on the recognition parameters, and each time a keyword is recognized, the keyword recognition processor sends the category and position information of the keyword and the keyword to the keyword recording processor, and the keyword recording processor is used for recording the received keyword information;
further, the model matching unit comprises an encoding processor, an annotation retrieval processor and a matching calculation processor, wherein the encoding processor is used for processing the received encoding packet, the annotation retrieval processor reduces the retrieval range of annotation information according to the processing result, and the matching calculation processor is used for carrying out matching calculation on the annotation information and the encoding packet;
further, the matching calculation processor calculates a matching degree Pm between the retrieved model annotation information and the encoded packet according to the following formula:
Pm = Σ (i = 1 to m) Ic(i) · Na(i) / Nd(i);

wherein Ic(i) is an indicator of whether the annotation information contains the ith body code (1 if contained, 0 if not), Nd(i) is the number of description codes for the ith body, and Na(i) is the number of those description codes that the annotation information contains;
the matching calculation processor selects model annotation information with the largest matching degree, acquires a model from a model register according to a corresponding storage address and sends the model to the model placement unit or the action unit;
further, the process of generating the video by the video generation module comprises the following steps:
s31, the model placement unit performs initialization placement on a background model and a movable main body model;
s32, the action unit disassembles the event code packet to obtain a plurality of action instructions;
s33, the model placement unit performs action processing on the movable main body model based on action instructions to obtain end positions of the movable main body model under each action instruction, and node positions are formed by the initial positions and the end positions in sequence;
s34, the model placement unit calculates a time frame number according to the positions of two adjacent nodes;
s35, the model placement unit calculates the same number of frame supplementing positions according to the time length frame number;
s36, the model placement unit changes the movable main body model according to the sequence of the node positions and the plurality of frame supplementing positions, signals are sent to the shooting unit at each change sequence position, and the shooting unit shoots a frame of picture data after receiving the signals;
s37, the shooting unit combines all picture data to form video data.
The beneficial effects obtained by the invention are as follows:
the system carries out intelligent analysis on the demand information, divides an analysis result into a model result and an action result, screens out a proper model in a model library based on the model result, presents a model and a model action in a virtual space based on the action result, shoots the whole action process to realize the generation of a video, and a user only needs to input text information in the whole process without complex operation, thereby reducing the difficulty of video production and improving the efficiency of video production.
For a further understanding of the nature and the technical aspects of the present invention, reference should be made to the following detailed description of the invention and the accompanying drawings, which are provided for purposes of reference only and are not intended to limit the invention.
Drawings
FIG. 1 is a schematic diagram of the overall structural framework of the present invention;
FIG. 2 is a schematic diagram of a text parsing module according to the present invention;
FIG. 3 is a schematic diagram of a video generation module according to the present invention;
FIG. 4 is a schematic diagram of a model matching unit of the present invention;
FIG. 5 is a schematic diagram of the model placement unit of the present invention.
Detailed Description
The following embodiments of the present invention are described in terms of specific examples, and those skilled in the art will appreciate the advantages and effects of the present invention from the disclosure herein. The invention is capable of other and different embodiments and its several details are capable of modification and variation in various respects, all without departing from the spirit of the present invention. The drawings of the present invention are merely schematic illustrations, and are not intended to be drawn to actual dimensions. The following embodiments will further illustrate the related art content of the present invention in detail, but the disclosure is not intended to limit the scope of the present invention.
Embodiment one: the embodiment provides a video generation system based on AIGC, which comprises an information input module, a text analysis module, a model selection module and a video generation module, and is combined with FIG. 1;
the information input module is used for inputting the requirement information of a user, the text analysis module is used for carrying out content analysis on the requirement information, the model selection module is used for selecting a basic model of the video, and the video generation module is used for processing the basic model to obtain a complete video;
the text analysis module comprises a classification disassembly unit, a combination unit and a conversion unit, wherein the classification disassembly unit is used for disassembling text information into a plurality of keywords and classifying the keywords, the combination unit is used for combining the classified keywords based on the positions of descriptive keywords to generate three word packages, and the conversion unit is used for converting the keywords in the word packages into corresponding codes to form code packages;
the model selection module comprises a model library storage unit and a model matching unit, wherein the model library storage unit is used for storing a background model and an active subject model, and the model matching unit obtains the corresponding background model and active subject model based on the coding information in the coding packet;
the video generation module comprises a model placement unit, an action unit and a shooting unit, wherein the model placement unit is used for placing a background model and a movable main body model, the action unit enables the movable main body model to complete corresponding actions according to event coding packet information, and the shooting unit is used for shooting picture information of the movable main body model when the action is completed in the background model to obtain video data;
the classifying and disassembling unit comprises a keyword training processor, a keyword recognition processor and a keyword recording processor, wherein the keyword training processor is used for training a large number of texts to obtain recognition parameters of keywords, the recognition parameters are sent to the keyword recognition processor, the keyword recognition processor recognizes the keywords in the text information based on the recognition parameters, and each time a keyword is recognized, the keyword recognition processor sends the category and position information of the keyword and the keyword to the keyword recording processor, and the keyword recording processor is used for recording the received keyword information;
the model matching unit comprises an encoding processor, an annotation retrieval processor and a matching calculation processor, wherein the encoding processor is used for processing a received encoding packet, the annotation retrieval processor reduces the retrieval range of annotation information according to a processing result, and the matching calculation processor is used for carrying out matching calculation on the annotation information and the encoding packet;
the matching calculation processor calculates the matching degree Pm between the retrieved model annotation information and the coding packet according to the following formula:
Pm = Σ (i = 1 to m) Ic(i) · Na(i) / Nd(i);

wherein Ic(i) is an indicator of whether the annotation information contains the ith body code (1 if contained, 0 if not), Nd(i) is the number of description codes for the ith body, and Na(i) is the number of those description codes that the annotation information contains;
the matching calculation processor selects model annotation information with the largest matching degree, acquires a model from a model register according to a corresponding storage address and sends the model to the model placement unit or the action unit;
the process of generating the video by the video generation module comprises the following steps:
s31, the model placement unit performs initialization placement on a background model and a movable main body model;
s32, the action unit disassembles the event code packet to obtain a plurality of action instructions;
s33, the model placement unit performs action processing on the movable main body model based on action instructions to obtain end positions of the movable main body model under each action instruction, and node positions are formed by the initial positions and the end positions in sequence;
s34, the model placement unit calculates a time frame number according to the positions of two adjacent nodes;
s35, the model placement unit calculates the same number of frame supplementing positions according to the time length frame number;
s36, the model placement unit changes the movable main body model according to the sequence of the node positions and the plurality of frame supplementing positions, signals are sent to the shooting unit at each change sequence position, and the shooting unit shoots a frame of picture data after receiving the signals;
s37, the shooting unit combines all picture data to form video data.
Embodiment two: the embodiment includes the whole content in the first embodiment, and provides a video generation system based on AIGC, which comprises an information input module, a text analysis module, a model selection module and a video generation module;
the information input module is used for inputting the requirement information of a user, the text analysis module is used for carrying out content analysis on the requirement information, the model selection module is used for selecting a basic model of the video, and the video generation module is used for processing the basic model to obtain a complete video;
the information input module comprises an interactive display unit and an information acquisition unit, wherein the interactive display unit is used for displaying window information and inputting text information in a window through input equipment, and the information acquisition unit is used for acquiring the text information in the window and sending the text information to the text analysis module;
referring to fig. 2, the text parsing module includes a classification disassembly unit, a combination unit and a conversion unit, where the classification disassembly unit is configured to disassemble text information into a plurality of keywords and divide the keywords into descriptive keywords, static main keywords, dynamic main keywords and event keywords, the combination unit combines descriptive keyword codes with the static main keywords, the dynamic main keywords and the event keywords respectively based on the positions of the descriptive keywords to generate three word packages, and the conversion unit is configured to convert the keywords in the word packages into corresponding codes to form code packages, which are respectively called a background code package, an activity code package and an event code package, and the background code package and the activity code package are sent to the model selection module, and the event code package is sent to the video generation module;
the model selection module comprises a model library storage unit and a model matching unit, wherein the model library storage unit is used for storing a background model and an active subject model, the model matching unit obtains a corresponding background model and an active subject model based on the coding information in the coding packet, and model data obtained by matching is sent to the video generation module;
referring to fig. 3, the video generating module includes a model placement unit, an action unit and a shooting unit, where the model placement unit is used to place a background model and a movable main body model, the action unit makes the movable main body model complete corresponding actions according to event coding packet information, and the shooting unit is used to shoot picture information of the movable main body model when the action is completed in the background model to obtain video data;
the classifying and disassembling unit comprises a keyword training processor, a keyword recognition processor and a keyword recording processor, wherein the keyword training processor trains a large number of texts based on an AIGC technology to obtain recognition parameters of keywords, the recognition parameters are sent to the keyword recognition processor, the keyword recognition processor recognizes the keywords in the text information based on the recognition parameters, and each time a keyword is recognized, the keyword recognition processor sends the category and position information of the keyword and the keyword to the keyword recording processor, and the keyword recording processor is used for recording the received keyword information;
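The patent does not specify the recognizer produced by the keyword training processor, so the classification step can be illustrated only under assumptions. The sketch below stands in a simple lexicon lookup for the trained recognition parameters; every keyword, category label and example sentence is invented for illustration.

```python
# Hypothetical sketch of the classification/disassembly step. A dictionary lookup
# stands in for the trained keyword-recognition processor; the lexicon entries
# are invented examples, not part of the patent.
CATEGORY_LEXICON = {
    "red": "descriptive", "fast": "descriptive",
    "forest": "static_subject", "house": "static_subject",
    "dog": "dynamic_subject", "car": "dynamic_subject",
    "runs": "event", "jumps": "event",
}

def disassemble(text):
    """Split text into keywords and record each with its category and position."""
    records = []
    for position, token in enumerate(text.lower().split()):
        category = CATEGORY_LEXICON.get(token)
        if category is not None:  # only recognized keywords are recorded
            records.append({"keyword": token, "category": category,
                            "position": position})
    return records

records = disassemble("the red dog runs in the forest")
```

Each record corresponds to one message from the keyword recognition processor to the keyword recording processor: the keyword itself plus its category and position.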
the process of combining the keywords by the combining unit comprises the following steps:
s1, selecting descriptive keywords in the keyword recording processor as first target keywords, using the rest keywords as second target keywords, and establishing a description package for each second target keyword;
s2, selecting a first target keyword, and calculating a combination index Cb with the rest second target keywords respectively:
Cb = |Np1 − Np2| / α;

wherein Np1 is the position number of the first target keyword, Np2 is the position number of the second target keyword, and α is the description association coefficient of the first and second target keywords;
s3, adding the first target keywords into a description packet of a second target keyword with the smallest combination index;
s4, repeating the step S2 and the step S3 until all the first target keywords are added into the description package;
s5, integrating all the static main body keywords and the description packages into a word package, integrating all the dynamic main body keywords and the description packages into a word package, and integrating all the event keywords and the description packages into a word package;
the conversion unit comprises a code mapping register and a conversion processor, wherein the code mapping register is used for storing the mapping relation between keywords and codes, one code can map a plurality of keywords, and the conversion processor converts the keywords in the word packet into corresponding codes according to the mapping relation;
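The many-to-one code mapping register can be sketched as a plain dictionary; the code values below are invented for illustration.

```python
# Sketch of the conversion unit: several keywords may map to the same code
# (e.g. synonyms). The numeric codes are illustrative only.
CODE_MAP = {"dog": 1102, "puppy": 1102, "forest": 2301, "woods": 2301}

def to_code_packet(word_packet):
    """Convert every known keyword in a word packet to its code."""
    return [CODE_MAP[w] for w in word_packet if w in CODE_MAP]

codes = to_code_packet(["dog", "woods", "unknown"])
```

Mapping synonyms onto one code keeps the later annotation matching independent of the user's exact wording.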
the model storage unit comprises a model register and a model annotation register, the model register divides a model into a background model and an active subject model, the model is stored in a single model unit, and the model annotation register is used for recording annotation information of each model, the type of the corresponding model and a storage address in the model register;
referring to fig. 4, the model matching unit includes an encoding processor, an annotation retrieval processor, and a matching calculation processor, where the encoding processor is configured to process a received encoded packet, the annotation retrieval processor reduces a retrieval range of annotation information according to a processing result, and the matching calculation processor is configured to perform matching calculation on the annotation information and the encoded packet;
the codes in the code packet are divided into main codes and description codes: a main code is converted from a keyword outside any description packet, a description code is converted from a keyword inside a description packet, and the positions of a main code represent progressively finer classification from the first position to the last;
the process of processing the main body code in the code packet by the code processor comprises the following steps:
s21, acquiring a main body code in a code packet;
s22, selecting two main body codes with the smallest difference value, and respectively marking the two main body codes as Cd (1) and Cd (2), wherein the average value of the two main body codes is Cd';
s23, sorting the rest main body codes according to the absolute value of the difference value with Cd' from small to large, respectively marking the main body codes as Cd (3), cd (4), and Cd (m), wherein m is the number of the main body codes in the code packet;
s24, determining a search code set according to Cd (1) and Cd (2), wherein the search code set comprises search codes which have the same bit number as the main body code and are a plurality of continuous 0S; the search code Sc satisfies the following relation:
;
wherein n is the number of continuous 0 s of the last bit of the search code, and Ys is a search threshold;
s25, calculating a retrieval index Ps of each retrieval code according to the following formula:
(formula not reproduced in the source);

wherein β is a range control parameter set by the user;
s26, sending the retrieval code with the minimum retrieval index to the annotation retrieval processor;
the matching calculation processor calculates the matching degree Pm between the retrieved model annotation information and the coding packet according to the following formula:
Pm = Σ (i = 1 to m) Ic(i) · Na(i) / Nd(i);

wherein Ic(i) is an indicator of whether the annotation information contains the ith body code (1 if contained, 0 if not), Nd(i) is the number of description codes for the ith body, and Na(i) is the number of those description codes that the annotation information contains;
the matching calculation processor selects model annotation information with the largest matching degree, acquires a model from a model register according to a corresponding storage address and sends the model to the model placement unit or the action unit;
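The matching-degree calculation can be sketched under an assumption about the omitted formula: each body code found in the annotation contributes the fraction of its description codes that the annotation also contains (Σ Ic(i) · Na(i)/Nd(i)); all codes in the example are invented.

```python
# Sketch of the matching-degree calculation between a code packet and one
# model's annotation information.
# ASSUMPTION: Pm = sum over i of Ic(i) * Na(i) / Nd(i), since the source
# omits the formula.
def matching_degree(code_packet, annotation):
    """code_packet: {body_code: [description codes]}; annotation: set of codes."""
    pm = 0.0
    for body, descs in code_packet.items():
        if body not in annotation:  # Ic(i) = 0
            continue
        if descs:
            pm += sum(d in annotation for d in descs) / len(descs)
        else:
            pm += 1.0  # a body code with no description codes counts fully
    return pm

pm = matching_degree({10: [11, 12], 20: [21]}, annotation={10, 11, 20})
```

The model whose annotation yields the largest Pm would then be fetched from the model register by its storage address.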
the process of generating the video by the video generation module comprises the following steps:
s31, the model placement unit performs initialization placement on a background model and a movable main body model;
s32, the action unit disassembles the event code packet to obtain a plurality of action instructions;
s33, the model placement unit performs action processing on the movable main body model based on action instructions to obtain end positions of the movable main body model under each action instruction, and node positions are formed by the initial positions and the end positions in sequence;
s34, the model placement unit calculates a time frame number according to the positions of two adjacent nodes;
s35, the model placement unit calculates the same number of frame supplementing positions according to the time length frame number;
s36, the model placement unit changes the movable main body model according to the sequence of the node positions and the plurality of frame supplementing positions, signals are sent to the shooting unit at each change sequence position, and the shooting unit shoots a frame of picture data after receiving the signals;
s37, the shooting unit combines all picture data to form video data;
referring to fig. 5, the model placement unit includes a model display processor, an action processor, and a position processor, where the model display processor is configured to provide a virtual space for placing a model, the action processor is configured to execute an action instruction to make the model generate an action, and the position processor is configured to record position information of the model and perform calculation processing according to the position information;
the position processor calculates a time frame number T according to the following formula:
T = L / v;
wherein L is the maximum moving distance of the model at two node positions, and v is the single frame moving speed of the action instruction;
the position processor calculates the frame compensating position according to the following formula:
P(j) = P1 + j · (P2 − P1) / (T + 1), j = 1, 2, …, T;

wherein P1 is the previous node position, P2 is the latter node position, and j is the complement-frame number.
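The position processor's two calculations can be sketched as follows. T = L/v follows directly from the variable definitions; the linear-interpolation form of the frame-supplementing positions is an assumption, since the source reproduces only the variable definitions.

```python
# Sketch of the position processor: duration in frames and the in-between
# ("frame supplementing") positions between two node positions.
# ASSUMPTION: positions are linearly interpolated, P(j) = P1 + j*(P2-P1)/(T+1).
def duration_frames(l_max, v):
    """T = L / v: frames needed for the largest move at v units per frame."""
    return int(round(l_max / v))

def fill_frames(p1, p2, t):
    """t interpolated positions strictly between node positions p1 and p2."""
    return [tuple(a + j * (b - a) / (t + 1) for a, b in zip(p1, p2))
            for j in range(1, t + 1)]

t = duration_frames(4.0, 2.0)
mids = fill_frames((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), t)
```

Each node position and frame-supplementing position then triggers one capture signal to the shooting unit, producing one frame of picture data per position as in steps S36–S37.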
The foregoing disclosure is only a preferred embodiment of the present invention and is not intended to limit the scope of the invention, so that all equivalent technical changes made by applying the description of the present invention and the accompanying drawings are included in the scope of the present invention, and in addition, elements in the present invention can be updated as the technology develops.
Claims (5)
1. The AIGC-based video generation system is characterized by comprising an information input module, a text analysis module, a model selection module and a video generation module;
the information input module is used for inputting the requirement information of a user, the text analysis module is used for carrying out content analysis on the requirement information, the model selection module is used for selecting a basic model of the video, and the video generation module is used for processing the basic model to obtain a complete video;
the text analysis module comprises a classification disassembly unit, a combination unit and a conversion unit, wherein the classification disassembly unit is used for disassembling text information into a plurality of keywords and classifying the keywords, the combination unit is used for combining the classified keywords based on the positions of descriptive keywords to generate three word packages, and the conversion unit is used for converting the keywords in the word packages into corresponding codes to form code packages;
the model selection module comprises a model library storage unit and a model matching unit, wherein the model library storage unit is used for storing a background model and an active subject model, and the model matching unit obtains the corresponding background model and active subject model based on the coding information in the coding packet;
the video generation module comprises a model placement unit, an action unit and a shooting unit, wherein the model placement unit is used for placing a background model and an active main body model, the action unit enables the active main body model to complete corresponding actions according to event coding packet information, and the shooting unit is used for shooting picture information of the active main body model when the active main body model completes actions in the background model to obtain video data.
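The disassemble → combine → convert pipeline of the text analysis module in claim 1 can be illustrated with a toy sketch; the coding scheme and vocabulary below are invented for illustration, as the patent does not specify them:

```python
# Illustrative only: the patent does not define the coding scheme, so this sketch
# assumes a simple keyword -> code lookup table (all codes are hypothetical).
KEYWORD_CODES = {
    "forest": "B01",  # background keyword
    "cat": "S01",     # active-subject keyword
    "run": "E01",     # event keyword
}

def build_code_package(word_package):
    """Convert one word package (a list of keywords) into a code package (conversion unit)."""
    return [KEYWORD_CODES[w] for w in word_package if w in KEYWORD_CODES]

# three word packages, as produced by the combination unit
packages = {"background": ["forest"], "subject": ["cat"], "event": ["run"]}
code_packages = {name: build_code_package(words) for name, words in packages.items()}
```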
2. The AIGC-based video generation system of claim 1, wherein the classification disassembly unit includes a keyword training processor, a keyword recognition processor and a keyword recording processor, the keyword training processor being configured to train on a plurality of texts to obtain recognition parameters for keywords, the keyword recognition processor being configured to recognize keywords in text information based on the recognition parameters and, each time one keyword is recognized, to transmit the keyword itself together with its category and position information to the keyword recording processor, and the keyword recording processor being configured to record the received keyword information.
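A minimal sketch of the recognize-and-record flow in claim 2; the recognition parameters are reduced here to a hypothetical keyword → category lookup (in the patent they would come from the keyword training processor):

```python
from typing import NamedTuple

class KeywordRecord(NamedTuple):
    """What the keyword recording processor stores per recognized keyword."""
    keyword: str
    category: str
    position: int   # character offset of the keyword in the text

# hypothetical "recognition parameters": a keyword -> category table
RECOGNITION_PARAMS = {"forest": "background", "cat": "subject", "run": "event"}

def recognize_keywords(text):
    """Recognize keywords and forward each with its category and position (claim 2)."""
    records = []
    for kw, cat in RECOGNITION_PARAMS.items():
        pos = text.find(kw)
        if pos != -1:
            # transmit keyword, category and position to the recording processor
            records.append(KeywordRecord(kw, cat, pos))
    return sorted(records, key=lambda r: r.position)
```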
3. The AIGC-based video generation system of claim 2, wherein the model matching unit includes an encoding processor for processing the received encoded packets, an annotation retrieval processor for narrowing retrieval range of annotation information according to the processing result, and a matching calculation processor for performing matching calculation of the annotation information with the encoded packets.
4. The AIGC-based video generation system of claim 3, wherein the matching calculation processor calculates the matching degree Pm between the retrieved model annotation information and the encoded package according to the following formula:
Pm = Σ_{i=1}^{n} Ic(i) · m_i / N_i

wherein Ic(i) represents an indicator for whether the annotation information contains the ith body code, the indicator being 1 when contained and 0 when not contained, N_i is the number of description codes for the ith body, and m_i is the number of description codes corresponding to the ith body code that the annotation information contains;
and the matching calculation processor selects model annotation information with the maximum matching degree, acquires a model from a model register according to the corresponding storage address and sends the model to the model placement unit or the action unit.
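The matching-and-selection rule in claim 4 can be sketched as follows, under the assumed reading Pm = Σᵢ Ic(i)·mᵢ/Nᵢ; the data shapes (an encoded package as a body-code → description-codes mapping) and all names are illustrative:

```python
def matching_degree(annotation_codes, encoded_package):
    """Matching degree Pm between one model's annotation codes and an encoded package.

    encoded_package: dict mapping each body code to its list of description codes.
    annotation_codes: set of codes attached to a stored model.
    """
    pm = 0.0
    for body_code, desc_codes in encoded_package.items():
        ic = 1 if body_code in annotation_codes else 0              # indicator Ic(i)
        n_i = max(len(desc_codes), 1)                               # description-code count N_i
        m_i = sum(1 for d in desc_codes if d in annotation_codes)   # contained description codes
        pm += ic * m_i / n_i
    return pm

def best_match(models, encoded_package):
    """Select the model whose annotation information has the maximum matching degree."""
    return max(models, key=lambda name: matching_degree(models[name], encoded_package))
```

For example, a model annotated with a subject code and one of two requested description codes scores 0.5 and beats a model sharing no codes.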
5. The AIGC-based video generation system of claim 4, wherein the process of generating video by the video generation module comprises the steps of:
s31, the model placement unit performs initialization placement on a background model and a movable main body model;
s32, the action unit disassembles the event code packet to obtain a plurality of action instructions;
s33, the model placement unit performs action processing on the movable main body model based on action instructions to obtain end positions of the movable main body model under each action instruction, and node positions are formed by the initial positions and the end positions in sequence;
s34, the model placement unit calculates a time frame number according to the positions of two adjacent nodes;
s35, the model placement unit calculates an equal number of frame supplementing positions from the time-length frame number;
s36, the model placement unit moves the movable main body model through the node positions and the plurality of frame supplementing positions in sequence, sends a signal to the shooting unit at each position in the sequence, and the shooting unit shoots one frame of picture data after receiving each signal;
s37, the shooting unit combines all the picture data to form the video data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311393909.3A CN117135416B (en) | 2023-10-26 | 2023-10-26 | Video generation system based on AIGC |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117135416A true CN117135416A (en) | 2023-11-28 |
CN117135416B CN117135416B (en) | 2023-12-22 |
Family
ID=88858486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311393909.3A Active CN117135416B (en) | 2023-10-26 | 2023-10-26 | Video generation system based on AIGC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117135416B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108597025A (en) * | 2018-04-03 | 2018-09-28 | 中国传媒大学 | Accelerated model construction method and device based on artificial intelligence Virtual reality |
US20220108726A1 (en) * | 2020-10-01 | 2022-04-07 | Loop Now Technologies, Inc. | Machine learned video template usage |
US20230089566A1 (en) * | 2020-05-30 | 2023-03-23 | Huawei Technologies Co., Ltd. | Video generation method and related apparatus |
US20230118572A1 (en) * | 2021-10-20 | 2023-04-20 | Snap Inc. | Generating ground truths for machine learning |
US20230237772A1 (en) * | 2022-01-21 | 2023-07-27 | Salesforce, Inc. | Systems and methods for unified vision-language understanding and generation |
KR20230133059A (en) * | 2022-03-10 | 2023-09-19 | 주식회사 엘지유플러스 | Ai-based digital contents automated production method, apparatus and system |
CN116881426A (en) * | 2023-08-30 | 2023-10-13 | 环球数科集团有限公司 | AIGC-based self-explanatory question-answering system |
Also Published As
Publication number | Publication date |
---|---|
CN117135416B (en) | 2023-12-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ferreira et al. | Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio | |
CN111581437A (en) | Video retrieval method and device | |
Yuan et al. | Spatiotemporal modeling for video summarization using convolutional recurrent neural network | |
US20230185848A1 (en) | Systems and methods for generating improved content based on matching mappings | |
CN111626049B (en) | Title correction method and device for multimedia information, electronic equipment and storage medium | |
CN111429885A (en) | Method for mapping audio clip to human face-mouth type key point | |
JP2009503732A (en) | Associative matrix method, system and computer program product using bit-plane representation of selected segments | |
WO2021238081A1 (en) | Voice packet recommendation method, apparatus and device, and storage medium | |
CN112489676A (en) | Model training method, device, equipment and storage medium | |
US20240177697A1 (en) | Audio data processing method and apparatus, computer device, and storage medium | |
CN114390217A (en) | Video synthesis method and device, computer equipment and storage medium | |
CN108805036A (en) | A kind of new non-supervisory video semanteme extracting method | |
CN116051688A (en) | Transition animation generation method and device, computer readable storage medium and terminal | |
CN117376502A (en) | Video production system based on AI technology | |
US11615132B2 (en) | Feature amount generation method, feature amount generation device, and feature amount generation program | |
CN113569068B (en) | Descriptive content generation method, visual content encoding and decoding method and device | |
CN117135416B (en) | Video generation system based on AIGC | |
CN109670071B (en) | Serialized multi-feature guided cross-media Hash retrieval method and system | |
Li et al. | Rhythm-aware sequence-to-sequence learning for labanotation generation with gesture-sensitive graph convolutional encoding | |
CN113761843A (en) | Voice editing method, electronic device and computer readable storage medium | |
JP2011221845A (en) | Text processing apparatus, text processing method and text processing program | |
CN116309965A (en) | Animation generation method and device, computer readable storage medium and terminal | |
JPH09128401A (en) | Moving picture retrieval device and video-on-demand device | |
KR20220051575A (en) | Deep learning-based movie scene creation method | |
Hammad et al. | Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||