CN117676270A - Video data generation method, device, computer equipment and storage medium - Google Patents

Video data generation method, device, computer equipment and storage medium

Info

Publication number: CN117676270A
Authority: CN (China)
Application number: CN202311674444.9A
Other languages: Chinese (zh)
Inventors: 高伟, 施好健, 钟春彬, 蔡为彬
Current and original assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Prior art keywords: video, image data, text, image, data
Priority date / filing date: 2023-12-07; publication date: 2024-03-08
Legal status: Pending

Abstract

The application relates to a video data generation method, apparatus, computer device, storage medium, and computer program product in the technical fields of artificial intelligence and finance, aimed at improving both the efficiency and the quality of video data generation. The method comprises the following steps: acquiring a video text matched with a target video type; inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text; performing style migration processing on the image data to obtain a processed image of the image data; and performing video generation processing on the processed image to obtain video data corresponding to the target video type.

Description

Video data generation method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for generating video data.
Background
Video analysis is frequently needed at offline venues of financial institutions such as bank branches and treasuries, yet few public data sets cover these offline scenes, and the available video data lacks diversity; video data sets of double-record signature actions are one example. Training a data-driven artificial intelligence model on a financial-scene video data set that is small and low in diversity leaves the model with too few samples to learn from, resulting in poor performance and low accuracy.
In conventional approaches, the video data set is typically enlarged by installing additional cameras at the offline scene. However, this approach suffers from several drawbacks: the types of video that can be captured are limited, footage irrelevant to the business scenario is frequently mixed into the recordings, the amount of usable video is small, and the overall efficiency of acquiring video data remains low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video data generation method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the efficiency of generating video data for offline financial-industry scenes.
In a first aspect, the present application provides a video data generation method. The method comprises the following steps:
acquiring a video text matched with a target video type;
inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text;
performing style migration processing on the image data to obtain a processed image of the image data;
and performing video generation processing on the processed image to obtain video data corresponding to the target video type.
In one embodiment, the image generation model corresponding to the target video type is trained as follows:
acquiring sample image data of the target video type and labeling text of the sample image data;
inputting the labeling text into an image generation model to be trained to obtain predicted image data corresponding to the labeling text;
and carrying out iterative updating on the image generation model to be trained according to the difference between the predicted image data and the sample image data to obtain the image generation model corresponding to the target video type.
In one embodiment, obtaining sample image data of the target video type and annotation text of the sample image data includes:
acquiring historical video data of the target video type;
performing image extraction processing on the historical video data to obtain sample image data;
and performing text labeling processing on the sample image data to obtain labeled text of the sample image data.
In one embodiment, performing text labeling processing on the sample image data to obtain labeled text of the sample image data, including:
performing target detection processing on the sample image data to obtain client position information and customer service position information in the sample image data;
according to the client position information, performing text labeling processing on the client in the sample image data to obtain client labeling information;
performing text labeling processing on customer service in the sample image data according to the customer service position information to obtain customer service labeling information;
and obtaining the labeling text of the sample image data according to the client labeling information and the customer service labeling information.
In one embodiment, performing style migration processing on the image data to obtain a processed image of the image data, including:
acquiring image style information to be processed of the image data;
and inputting the image data and the image style information into a style migration model to obtain a processed image corresponding to the image data and the image style information.
In one embodiment, performing video generation processing on the processed image to obtain video data corresponding to the target video type includes:
acquiring a pre-trained video generation model;
and inputting the processed image and the target video type into the pre-trained video generation model to obtain video data corresponding to the target video type.
In one embodiment, inputting the processed image and the target video type into the pre-trained video generation model to obtain video data corresponding to the target video type includes:
performing latent space mapping processing on the processed image to obtain latent features of the processed image;
setting the latent features and the target video type as constraints of the pre-trained video generation model;
and processing the processed image through the pre-trained video generation model based on the constraint condition to obtain video data corresponding to the target video type.
In a second aspect, the present application further provides a video data generating apparatus. The device comprises:
the text acquisition module is used for acquiring video texts matched with the target video types;
the image generation module is used for inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text;
the style migration module is used for performing style migration processing on the image data to obtain a processed image of the image data;
and the video generation module is used for carrying out video generation processing on the processed image to obtain video data corresponding to the target video type.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a video text matched with a target video type;
inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text;
performing style migration processing on the image data to obtain a processed image of the image data;
and performing video generation processing on the processed image to obtain video data corresponding to the target video type.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a video text matched with a target video type;
inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text;
performing style migration processing on the image data to obtain a processed image of the image data;
and performing video generation processing on the processed image to obtain video data corresponding to the target video type.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring a video text matched with a target video type;
inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text;
performing style migration processing on the image data to obtain a processed image of the image data;
and performing video generation processing on the processed image to obtain video data corresponding to the target video type.
The video data generation method, apparatus, computer device, storage medium, and computer program product acquire a video text matched with a target video type; input the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text; perform style migration processing on the image data to obtain a processed image of the image data; and perform video generation processing on the processed image to obtain video data corresponding to the target video type. By combining text-to-image generation, style migration, and conditional image-to-video techniques, the method can generate the corresponding video data directly from the video text of the target video type, which improves the efficiency of video data generation and provides reliable data for video analysis models of offline financial-industry scenes.
Drawings
FIG. 1 is a flow diagram of a video data generation method in one embodiment;
FIG. 2 is a flowchart illustrating steps for obtaining sample image data of a target video type and labeling text of the sample image data in one embodiment;
FIG. 3 is a flowchart illustrating steps for obtaining video data corresponding to a target video type according to an embodiment;
FIG. 4 is a flow chart of a video data generating method according to another embodiment;
FIG. 5 is a flow chart of a video data generation method according to yet another embodiment;
FIG. 6 is a block diagram showing the construction of a video data generating apparatus in one embodiment;
FIG. 7 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that the information (including but not limited to user equipment information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present application are information and data fully authorized by all parties. The collection, use, storage, processing, transmission, provision, disclosure, and application of such data comply with the relevant regulations, necessary security measures are taken, public order and good customs are not violated, and corresponding operation entries are provided for users to authorize or refuse.
In one embodiment, as shown in FIG. 1, a video data generation method is provided. The method is described here as applied to a terminal by way of illustration; it may equally be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between them. In this embodiment, the method includes the following steps:
step S101, obtaining a video text matching the target video type.
Wherein the target video type refers to the type of video data that is desired to be generated. For example, if it is desired to generate a video data set of signature actions in a bank double-record scene, the target video type may be the signature action type in the bank double-record scene.
Wherein the video text refers to text data describing the content of video data that is desired to be generated.
Specifically, the type of video data to be generated is determined by the requirements of the business scenario; for example, it may be the signature-action type in a branch double-record scene. The target video type may then be input into the terminal, or the terminal may present a set of video-type options from which the target video type is selected. Text describing the desired video is then input into the terminal according to the selected target video type, so that the terminal obtains a video text matched with the target video type.
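As a concrete illustration, the matching of target video types to descriptive video text might be organized as a small lookup, as in the following sketch; the type names, prompt strings, and helper function are hypothetical and not part of this application.

```python
# Illustrative sketch only: the type names and prompt templates below are
# assumptions, not content of this application.
PROMPT_TEMPLATES = {
    "signature_action": (
        "A customer signs a document at a bank counter while a customer "
        "service manager observes, double-record scene, fixed camera view"
    ),
    "cash_handover": (
        "A teller hands cash to a customer across the counter in a bank branch"
    ),
}

def get_video_text(target_video_type: str) -> str:
    """Return descriptive video text matched to the target video type."""
    try:
        return PROMPT_TEMPLATES[target_video_type]
    except KeyError:
        raise ValueError(f"No video text registered for type: {target_video_type}")
```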
Step S102, inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text.
The image generation model refers to a model capable of generating corresponding image data from input text data (such as the video text). The image generation model may be a shifted diffusion model.
Specifically, the terminal trains in advance, using sample image data of the target video type and labels of the sample image data, to obtain an image generation model corresponding to the target video type. The terminal then inputs the video text into this image generation model, which performs image generation processing on the video text and outputs image data corresponding to the video text.
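By way of a hedged example, the text-to-image step could be exercised with an off-the-shelf diffusion pipeline. The application describes a shifted diffusion model; the sketch below substitutes Stable Diffusion through the Hugging Face diffusers library, and the model id, prompt, and step count are assumptions.

```python
# Sketch under assumptions: Stable Diffusion via diffusers stands in for the
# application's shifted diffusion model; prompt and model id are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

video_text = "A customer signs a document at a bank counter, double-record scene"
image = pipe(video_text, num_inference_steps=30).images[0]  # PIL.Image
image.save("signature_action.png")
```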
Step S103, performing style migration processing on the image data to obtain a processed image of the image data.
The processed image is image data in which the style of the image is changed but the content of the image is not changed.
To improve the diversity of the generated video data, the image data can be processed into different image styles to obtain processed images in those styles. Specifically, the terminal first determines the desired image style, that is, the image style information of the image data, and then performs style migration processing on the image data according to that information; the terminal thereby obtains a processed image of the image data.
Step S104, video generation processing is carried out on the processed image, and video data corresponding to the target video type is obtained.
Specifically, the terminal may pre-train a video generation model based on a deep learning algorithm and use it to convert the processed image into video data corresponding to the target video type. Alternatively, the terminal may convert the processed image into video data using a video coding tool.
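For the video coding route, a minimal sketch using OpenCV's VideoWriter follows; the frame rate, codec, and file names are assumptions.

```python
# Minimal sketch of the "video coding tool" route using OpenCV.
import cv2

def frames_to_video(frame_paths, out_path="generated.mp4", fps=25):
    """Encode a list of image files into a video file."""
    first = cv2.imread(frame_paths[0])
    height, width = first.shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for path in frame_paths:
        frame = cv2.imread(path)
        writer.write(cv2.resize(frame, (width, height)))
    writer.release()
```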
In this video data generation method, a video text matched with the target video type is acquired; the video text is input into an image generation model corresponding to the target video type to obtain image data corresponding to the video text; style migration processing is performed on the image data to obtain a processed image of the image data; and video generation processing is performed on the processed image to obtain video data corresponding to the target video type. By combining text-to-image generation, style migration, and conditional image-to-video techniques, the method can generate the corresponding video data directly from the video text of the target video type, improving the efficiency of video data generation and providing reliable data for video analysis models of offline financial-industry scenes.
In one embodiment, the image generation model corresponding to the target video type in step S102 may be trained as follows: acquiring sample image data of a target video type and labeling text of the sample image data; inputting the labeling text into an image generation model to be trained, and obtaining predicted image data corresponding to the labeling text; and carrying out iterative updating on the image generation model to be trained according to the difference between the predicted image data and the sample image data to obtain the image generation model corresponding to the target video type.
The labeling text refers to text information describing the key events or key persons in the sample image data.
Specifically, the terminal first collects historical images of the target video type and uses them as sample image data; it then performs text labeling on the sample image data to obtain the labeled text of the sample image data. Each sample image and its labeled text form a text-image pair, with the labeled text serving as input data and the sample image serving as the label. The image generation model to be trained is then iteratively trained on these text-image pairs: the labeled text is input into the model, which outputs predicted image data corresponding to the labeled text; the terminal determines a loss value for the model from the difference between the predicted image data and the sample image data; and the model parameters are iteratively updated according to this loss value. After a number of iterations, the terminal obtains the trained image generation model and designates it as the image generation model corresponding to the target video type.
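The iterative update described above can be pictured with the following simplified PyTorch sketch. It is not the application's training procedure: a real diffusion model is trained with a denoising objective, and the pixel-space MSE here merely stands in for "the difference between the predicted image data and the sample image data"; `model` and `dataloader` are assumed to exist.

```python
import torch

def train_image_generation_model(model, dataloader, num_epochs=10, lr=1e-5):
    """Iteratively update `model` on (labeling text, sample image) pairs.

    Simplified sketch; the pixel-space MSE only illustrates the difference
    between predicted image data and sample image data.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(num_epochs):
        for labeling_text, sample_image in dataloader:
            predicted_image = model(labeling_text)         # predicted image data
            loss = loss_fn(predicted_image, sample_image)  # difference -> loss value
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                               # iterative parameter update
    return model
```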
In practical applications, the image generation model can be built on a shifted diffusion model. Compared with an image generation model built on DALL-E 2 (an OpenAI text-to-image system), the shifted diffusion model generates higher-quality image embedding features more efficiently, which improves the quality and accuracy of the resulting image data.
In this embodiment, the image generation model to be trained is iteratively updated using sample image data of the target video type and the labeled text of that sample image data, so that the trained model can accurately convert text data of the target video type into corresponding image data, laying the foundation for generating video data from that image data in the subsequent steps.
In one embodiment, as shown in fig. 2, sample image data of a target video type and a labeling text of the sample image data are obtained, which specifically includes the following contents:
step S201, obtain the historical video data of the target video type.
Step S202, performing image extraction processing on the historical video data to obtain sample image data.
And step S203, performing text labeling processing on the sample image data to obtain labeled text of the sample image data.
Specifically, the terminal may obtain historical video data of the target video type from a database or a cloud server. It then extracts sample image data from the historical video data, either by cropping images directly from the video or by screening the video frames for those whose degree of association with the target video type satisfies a preset association condition. Finally, the terminal performs text labeling on the key events or key-person characteristics in the sample image data to obtain the labeled text of the sample image data.
For example, when an image generation model for the signature action of a bank double-record scene is required, the terminal can obtain historical video data of past signature actions from a database of bank double-record scenes, then crop signature-action images from that video data and use them as sample image data.
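A sketch of the image extraction step follows, assuming OpenCV is used and that frames are sampled at a fixed stride with an optional relevance filter; both the stride and the filter are assumptions.

```python
import cv2

def extract_sample_images(video_path, stride=30, is_relevant=None):
    """Yield every `stride`-th frame; optionally filter by a relevance check."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0 and (is_relevant is None or is_relevant(frame)):
            yield frame
        index += 1
    cap.release()
```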
In this embodiment, sample image data is extracted from historical video data of the target video type and then put through text labeling processing to obtain its labeled text. This provides a sound way of acquiring the sample image data and labeled text, which can be used in the subsequent steps to train the image generation model.
In one embodiment, the step S203 performs a text labeling process on the sample image data to obtain a labeled text of the sample image data, which specifically includes the following contents: performing target detection processing on the sample image data to obtain client position information and customer service position information in the sample image data; according to the client position information, performing text labeling processing on the client in the sample image data to obtain client labeling information; according to the customer service position information, performing text labeling processing on the customer service in the sample image data to obtain customer service labeling information; and obtaining the labeling text of the sample image data according to the customer labeling information and the customer service labeling information.
Specifically, the terminal may run a target detection model over the sample image data to detect the key persons or key events associated with the target video type. Taking the signature-action type of the above embodiment as an example, the key persons associated with signing are the client and the customer service manager (referred to simply as customer service). The terminal detects the position and image characteristics of the client in the sample image data through the target detection model, obtaining client position information and client characteristic information; it then performs text labeling on the client according to this information to obtain client labeling information, which may include, for example, whether the client wears glasses, the client's hairstyle, which hand signs, the pixel position of the signature in the sample image, and the client's position in the sample image. Similarly, the terminal detects the position and image characteristics of the customer service person, obtaining customer service position information and customer service characteristic information, and performs text labeling accordingly to obtain customer service labeling information, for example whether the customer service person wears glasses, their hairstyle, and their position in the sample image. Finally, the terminal fuses the client labeling information and the customer service labeling information to generate the labeled text of the sample image data.
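The fusion of client and customer service annotations into a labeled text might look like the sketch below; the `detect` callable and its output schema are hypothetical placeholders for whatever target detection model is used.

```python
def build_labeling_text(sample_image, detect):
    """Fuse client and customer-service annotations into one labeled text.

    `detect` is a hypothetical target detection callable returning, for each
    role, a bounding box and attribute fields like those named above.
    """
    results = detect(sample_image)
    client, service = results["client"], results["customer_service"]
    client_info = (
        f"client at {client['box']}, glasses: {client['glasses']}, "
        f"hairstyle: {client['hairstyle']}, signing hand: {client['hand']}"
    )
    service_info = (
        f"customer service at {service['box']}, glasses: {service['glasses']}, "
        f"hairstyle: {service['hairstyle']}"
    )
    return client_info + "; " + service_info  # fused labeling text
```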
In this embodiment, text labeling is performed on the client in the sample image data according to the client position information to obtain client labeling information, and on the customer service person according to the customer service position information to obtain customer service labeling information; the labeled text of the sample image data is then derived from the two. Labeling the sample image data with the relevant characteristics and positions of the client and customer service provides reliable sample data for training the image generation model.
In one embodiment, the step S103 performs style migration processing on the image data to obtain a processed image of the image data, which specifically includes the following contents: acquiring image style information to be processed of image data; and inputting the image data and the image style information into a style migration model to obtain a processed image of which the image data corresponds to the image style information.
The image style information describes the image style into which the image data is to be processed. For example, the image style information may include time information and illumination information.
Specifically, the terminal may select a number of time periods and illumination conditions as the pieces of image style information to be processed for the image data. The terminal may then input each time period, each illumination condition, and the image data into the style migration model, which converts the image data into processed images under each time period and illumination condition. The style migration model may be a deep-learning-based model; for example, it may be trained on a neural network.
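As one hedged possibility, the style migration step could be approximated with an image-to-image diffusion pipeline, expressing the time and illumination information as a prompt; the application's own style migration model is not specified, and the model id, strength value, and file name below are assumptions.

```python
# Stand-in sketch for style migration using an image-to-image diffusion
# pipeline from the diffusers library; not the application's own model.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

style_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

source = Image.open("signature_action.png")        # image data from step S102
style_info = "evening, warm indoor lighting"       # time + illumination style
processed = style_pipe(
    prompt=style_info, image=source, strength=0.4  # low strength preserves content
).images[0]
```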
In this embodiment, inputting the image data and the image style information into the style migration model yields processed images of the image data under each image style. This realizes the conversion of image data across different image styles, effectively increases the diversity of the processed images, and in turn helps to increase the diversity of the subsequently generated video data.
In one embodiment, the step S104 performs video generation processing on the processed image to obtain video data corresponding to the target video type, and specifically includes the following contents: acquiring a pre-trained video generation model; and inputting the processed image and the target video type into a pre-trained video generation model to obtain video data corresponding to the target video type.
Specifically, the terminal may pre-train a video generation model based on a deep learning algorithm. The terminal takes the target video type as a constraint condition of the video generation model and uses the constrained model to convert the processed image into video data corresponding to the target video type. The video generation model may be built on a latent flow diffusion model (Latent Flow Diffusion Models, LFDM).
In this embodiment, compared with synthesizing video directly from the processed image, generating video from the processed image under the constraint condition (i.e., the target video type) with a pre-trained video generation model makes better use of the spatial content in the processed image, so video data in different image styles is generated more accurately and the synthesis accuracy of the video data is improved.
In one embodiment, as shown in fig. 3, the processed image and the target video type are input into a pre-trained video generation model to obtain video data corresponding to the target video type, which specifically includes the following contents:
step S301, performing latent space mapping processing on the processed image to obtain latent features of the processed image.
Step S302, setting the latent features and the target video type as constraint conditions of the pre-trained video generation model.
Step S303, processing the processed image through a pre-trained video generation model based on constraint conditions to obtain video data corresponding to the target video type.
Here, the latent features refer to features that are implicit in the image.
Specifically, the terminal uses an encoder to map the processed image into the latent space, obtaining the latent features of the processed image. The latent features and the target video type are set as constraint conditions of the pre-trained video generation model. Based on these constraints, the model converts the processed image into an optical-flow map sequence and an occlusion map sequence in the latent space; the latent features are then deformed in the latent space using the optical-flow and occlusion map sequences to obtain a corresponding latent feature sequence; and a decoder decodes this latent feature sequence to obtain video data corresponding to the target video type.
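A heavily simplified sketch in the spirit of latent flow diffusion follows. The encoder, flow model, and decoder are hypothetical modules, and the flow model is assumed to emit per-frame normalized sampling grids and occlusion maps, so that only the warping via torch.nn.functional.grid_sample is a concrete operation.

```python
import torch
import torch.nn.functional as F

def generate_video(processed_image, video_type, encoder, flow_model, decoder):
    """Deform the latent feature frame by frame and decode to video frames."""
    z = encoder(processed_image)                   # latent feature, (1, C, H, W)
    # Conditioned on the latent feature and the target video type, the flow
    # model predicts per-frame sampling grids (optical flow) and occlusion maps.
    grids, occlusions = flow_model(z, video_type)  # (T, H, W, 2), (T, 1, H, W)
    frames = []
    for grid, occ in zip(grids, occlusions):
        warped = F.grid_sample(z, grid.unsqueeze(0), align_corners=False)
        frames.append(decoder(warped * occ))       # decode deformed latent
    return torch.stack(frames)                     # (T, ...) video tensor
```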
In this embodiment, the video generation model is adapted from the latent flow diffusion model: with the latent features of the processed image and the target video type as constraint conditions, the processed image is deformed in the latent space, so that video data with more detail and richer features is synthesized and the accuracy and quality of the synthesized video data are greatly improved.
In one embodiment, as shown in fig. 4, another video data generating method is provided, and the method is applied to a terminal for illustration, and includes the following steps:
step S401, obtaining video text matched with the target video type.
Step S402, inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text.
Step S403, obtaining image style information to be processed of the image data.
Step S404, inputting the image data and the image style information into a style migration model to obtain a processed image corresponding to the image data and the image style information.
Step S405, a pre-trained video generation model is acquired.
Step S406, the processed image and the target video type are input into a pre-trained video generation model, and video data corresponding to the target video type is obtained.
This video data generation method achieves the following beneficial effects: by combining text-to-image generation, style migration, and conditional image-to-video techniques, the corresponding video data can be generated from the video text of the target video type, improving the efficiency of video data generation and providing reliable data for video analysis models of offline financial-industry scenes.
To set out the video data generation method provided by the embodiments of the present disclosure more clearly, it is described below through a specific embodiment. As shown in FIG. 5, yet another video data generation method is provided, which may be applied to a terminal and includes the following:
step S501, selecting a video data set type to be generated. The terminal can determine the data level type to be enhanced according to the service scene requirement, such as signature action data sets in a double-record scene of the website.
Step S502, according to the video data set type, inputting a passage of text into the text-to-image model to obtain a picture of the corresponding scene, such as a signature-action picture.
Step S503, performing style migration on the obtained picture to produce processed pictures for different time periods and illumination conditions, thereby expanding picture diversity. Taking signature-action pictures as an example, the terminal inputs the text-generated signature-action picture together with historical time-of-day and illumination conditions into the style migration model, obtaining processed pictures under the different time periods and illumination conditions.
Step S504, inputting the processed pictures obtained for the different time periods and illumination conditions, together with the video data set type (for example, signature action), into a conditional image-to-video model (for example, one based on a latent flow diffusion model) to generate signature-action videos under different time periods and illumination conditions.
Step S505, repeating steps S501 to S504 to obtain signature-action video data sets for double-record scenes across multiple dimensions, such as different clients and different customer service staff.
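Chaining steps S501 to S504 might look like the following sketch, with the three model components passed in as callables; all names here are assumptions rather than the application's implementation.

```python
# Sketch of the end-to-end loop in steps S501-S504. The three callables stand
# in for the text-to-image model, the style migration model, and the
# conditional image-to-video model described above.
def generate_dataset(video_type, video_text, style_conditions,
                     text_to_image, style_migrate, image_to_video):
    videos = []
    picture = text_to_image(video_text)                    # S502: text -> picture
    for style in style_conditions:                         # S503: style migration
        styled = style_migrate(picture, style)
        videos.append(image_to_video(styled, video_type))  # S504: conditional video
    return videos

# Hypothetical usage for a branch double-record signature-action data set:
# generate_dataset("signature_action",
#                  "A customer signs a document at a bank counter",
#                  ["morning, natural light", "evening, warm indoor lighting"],
#                  text_to_image, style_migrate, image_to_video)
```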
In this embodiment, text-to-image generation, style migration, and conditional image-to-video techniques are combined so that the corresponding video data can be generated from the video text of the target video type, improving the efficiency of video data generation and providing reliable data for video analysis models of offline financial-industry scenes.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in those flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times, and whose execution order need not be sequential; they may be executed in turn or alternately with at least part of the other steps or sub-steps.
Based on the same inventive concept, the embodiment of the application also provides a video data generating device for realizing the video data generating method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the video data generating apparatus provided below may refer to the limitation of the video data generating method described above, and will not be repeated here.
In one embodiment, as shown in fig. 6, there is provided a video data generating apparatus 600 including: a text acquisition module 601, an image generation module 602, a style migration module 603, and a video generation module 604, wherein:
the text obtaining module 601 is configured to obtain video text that matches the target video type.
The image generation module 602 is configured to input the video text into an image generation model corresponding to the target video type, so as to obtain image data corresponding to the video text.
The style migration module 603 is configured to perform style migration processing on the image data, so as to obtain a processed image of the image data.
The video generating module 604 is configured to perform video generating processing on the processed image, so as to obtain video data corresponding to the target video type.
In one embodiment, the video data generating apparatus 600 further includes a model training module for acquiring sample image data of the target video type and a labeling text of the sample image data; inputting the labeling text into an image generation model to be trained, and obtaining predicted image data corresponding to the labeling text; and carrying out iterative updating on the image generation model to be trained according to the difference between the predicted image data and the sample image data to obtain the image generation model corresponding to the target video type.
In one embodiment, the video data generating apparatus 600 further includes a sample acquisition module for acquiring historical video data of the target video type; carrying out image extraction processing on the historical video data to obtain sample image data; and performing text labeling processing on the sample image data to obtain labeled text of the sample image data.
In one embodiment, the video data generating apparatus 600 further includes a text labeling module, configured to perform a target detection process on the sample image data, so as to obtain client location information and customer service location information in the sample image data; according to the client position information, performing text labeling processing on the client in the sample image data to obtain client labeling information; according to the customer service position information, performing text labeling processing on the customer service in the sample image data to obtain customer service labeling information; and obtaining the labeling text of the sample image data according to the customer labeling information and the customer service labeling information.
In one embodiment, the style migration module 603 is further configured to obtain image style information to be processed of the image data; and inputting the image data and the image style information into a style migration model to obtain a processed image of which the image data corresponds to the image style information.
In one embodiment, the video generation module 604 is further configured to obtain a pre-trained video generation model; and inputting the processed image and the target video type into a pre-trained video generation model to obtain video data corresponding to the target video type.
In one embodiment, the video data generating apparatus 600 further includes a conditional image module, configured to perform latent space mapping processing on the processed image to obtain the latent features of the processed image; set the latent features and the target video type as constraint conditions of the pre-trained video generation model; and process the processed image through the pre-trained video generation model based on the constraint conditions to obtain video data corresponding to the target video type.
The modules in the above video data generating apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in hardware in, or independent of, a processor of the computer device, or stored in software in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The input/output interface of the computer device exchanges information between the processor and external devices. The communication interface of the computer device performs wired or wireless communication with external terminals; the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a video data generation method. The display unit of the computer device forms a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of those technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only a few implementations of the present application; their descriptions are relatively specific and detailed, but they are not to be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (11)

1. A method of generating video data, the method comprising:
acquiring a video text matched with a target video type;
inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text;
performing style migration processing on the image data to obtain a processed image of the image data;
and performing video generation processing on the processed image to obtain video data corresponding to the target video type.
2. The method of claim 1, wherein the image generation model corresponding to the target video type is trained by:
acquiring sample image data of the target video type and labeling text of the sample image data;
inputting the labeling text into an image generation model to be trained to obtain predicted image data corresponding to the labeling text;
and carrying out iterative updating on the image generation model to be trained according to the difference between the predicted image data and the sample image data to obtain the image generation model corresponding to the target video type.
3. The method of claim 2, wherein the obtaining sample image data of the target video type and the annotation text of the sample image data comprises:
acquiring historical video data of the target video type;
performing image extraction processing on the historical video data to obtain sample image data;
and performing text labeling processing on the sample image data to obtain labeled text of the sample image data.
4. A method according to claim 3, wherein the text labeling the sample image data to obtain labeled text of the sample image data comprises:
performing target detection processing on the sample image data to obtain client position information and customer service position information in the sample image data;
according to the client position information, performing text labeling processing on the client in the sample image data to obtain client labeling information;
performing text labeling processing on customer service in the sample image data according to the customer service position information to obtain customer service labeling information;
and obtaining the labeling text of the sample image data according to the client labeling information and the customer service labeling information.
5. The method according to claim 1, wherein performing style migration processing on the image data to obtain a processed image of the image data comprises:
acquiring image style information to be processed of the image data;
and inputting the image data and the image style information into a style migration model to obtain a processed image corresponding to the image data and the image style information.
6. The method according to claim 1, wherein the performing video generation processing on the processed image to obtain video data corresponding to the target video type includes:
acquiring a pre-trained video generation model;
and inputting the processed image and the target video type into the pre-trained video generation model to obtain video data corresponding to the target video type.
7. The method according to claim 6, wherein inputting the processed image and the target video type into the pre-trained video generation model to obtain video data corresponding to the target video type comprises:
performing latent space mapping processing on the processed image to obtain latent features of the processed image;
setting the latent features and the target video type as constraints of the pre-trained video generation model;
and processing the processed image through the pre-trained video generation model based on the constraint condition to obtain video data corresponding to the target video type.
8. A video data generating apparatus, the apparatus comprising:
the text acquisition module is used for acquiring video texts matched with the target video types;
the image generation module is used for inputting the video text into an image generation model corresponding to the target video type to obtain image data corresponding to the video text;
the style migration module is used for performing style migration processing on the image data to obtain a processed image of the image data;
and the video generation module is used for carrying out video generation processing on the processed image to obtain video data corresponding to the target video type.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202311674444.9A, priority date 2023-12-07, filing date 2023-12-07: Video data generation method, device, computer equipment and storage medium. Status: Pending. Publication: CN117676270A (en).

Priority Applications (1)

CN202311674444.9A, priority date 2023-12-07, filing date 2023-12-07: Video data generation method, device, computer equipment and storage medium

Publications (1)

CN117676270A (A publication), publication date 2024-03-08

Family

ID=90073006

Country Status (1)

CN: CN117676270A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination