CN116600155A - Video generation method, device, equipment and storage medium

Video generation method, device, equipment and storage medium

Info

Publication number
CN116600155A
CN116600155A (application CN202310557882.0A)
Authority
CN
China
Prior art keywords
image
feature
video
position information
training set
Prior art date
Legal status
Pending
Application number
CN202310557882.0A
Other languages
Chinese (zh)
Inventor
舒畅
陈又新
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310557882.0A
Publication of CN116600155A
Legal status: Pending

Classifications

    • G06N3/08 Computing arrangements based on biological models; neural networks; learning methods
    • G06N3/0455 Neural network architectures; combinations of networks; auto-encoder networks; encoder-decoder networks
    • H04N21/23418 Server-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/234309 Server-side reformatting of video signals by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N21/234345 Server-side reformatting performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N21/44008 Client-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/440218 Client-side reformatting of video signals by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N21/440245 Client-side reformatting performed only on part of the stream, e.g. a region of the image or a time segment
    • Y02T10/40 Engine management systems (cross-sectional climate-change tagging)


Abstract

The invention relates to artificial intelligence technology in the field of financial technology, and discloses a video generation method comprising the following steps: training an image feature prediction model with a standard image feature set carrying position information to obtain an image feature generation model; using the image feature generation model to generate context image features for an image to be analyzed that carries initial position information, obtaining context image features with predicted position information; performing image restoration on the context image features to obtain a context image set; and concatenating the image to be analyzed with the images in the context image set according to the initial position information and the predicted position information, obtaining a concatenated video. The invention also relates to blockchain technology: the concatenated video may be stored in nodes of a blockchain. The invention further provides a video generating apparatus, an electronic device and a readable storage medium. In the financial field, locating context information through position information can improve the accuracy of generating video from images.

Description

Video generation method, device, equipment and storage medium
Technical Field
The present invention relates to the fields of financial technology and artificial intelligence, and in particular to a video generation method, apparatus, electronic device and readable storage medium.
Background
With the rapid development of artificial intelligence, AI technology is widely applied in various fields. In financial technology, for example, face video is generated from face images by artificial intelligence during online transactions to improve the accuracy of face authentication and recognition.
The prior art generally uses one of the following methods for video generation: 1. moving the picture by frame interpolation; 2. generating multiple pictures with a generative adversarial network (GAN) and splicing consecutive pictures into a video. These methods do not consider the context information between frames in the video, i.e., the position information between different frame images cannot be determined, so the generated video lacks logical coherence (for example, when face images are used to generate a face video, wrong frame position information causes face authentication to fail), and the accuracy of generating video from images is low.
Disclosure of Invention
The invention provides a video generation method, a video generation apparatus, an electronic device and a readable storage medium, aiming to improve the accuracy of generating video from images.
In order to achieve the above object, the present invention provides a video generating method, including:
acquiring a video training set, and performing frame-by-frame image segmentation on videos in the video training set to obtain an original image training set;
adding position information to the images of the original image training set to obtain a standard image training set, and extracting features from the images in the standard image training set by using a pre-trained feature extraction model to obtain an original image feature set;
performing position coding on images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set;
training a pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model;
acquiring an image to be analyzed with initial position information, and generating context image features of the image to be analyzed by using the image feature generation model to obtain the context image features with predicted position information;
performing image restoration on the context image features by using a preset image decoder to obtain a context image set;
and concatenating the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features, to obtain a concatenated video.
Optionally, the performing frame-by-frame image segmentation on the video in the video training set to obtain an original image training set includes:
downsampling the videos in the video training set to obtain a sampled video set;
and extracting the images frame by frame from the videos in the sampling video set to obtain an original image training set.
Optionally, adding position information to the images in the original image training set to obtain a standard image training set, including:
dividing the images in the original image training set into pixel blocks according to a preset sliding window to obtain a pixel block set;
randomly covering the pixel blocks in the pixel block set according to a preset first random value to obtain a standard pixel block set;
tiling pixel blocks in the standard pixel block set to obtain an image sequence set;
and adding position information into the image sequence set, and summarizing all the image sequences added with the position information to obtain the standard image training set.
Optionally, the performing position coding on the image in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set includes:
Performing position coding on the images in the standard image training set based on the position information of the images in the standard image training set to obtain a position feature vector set;
and adding the vectors in the position feature vector set and the feature vectors in the original image feature set based on the position information to obtain the standard image feature set.
Optionally, the images in the standard image training set are position coded by the following formula:
PE_{(pos,2i)} = \sin\big(pos / 10000^{2i/d_{model}}\big), \qquad PE_{(pos,2i+1)} = \cos\big(pos / 10000^{2i/d_{model}}\big)
where pos represents the position information, i represents the dimension index within the feature, d_{model} represents the overall dimension of a feature, PE_{(pos,2i)} represents the encoding parameter of the 2i-th dimension for the image whose position information is pos, and PE_{(pos,2i+1)} represents the encoding parameter of the (2i+1)-th dimension for the image whose position information is pos.
Optionally, training the pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model, including:
randomly masking the image features in the standard image feature set according to a preset second random value to obtain a masking feature vector;
carrying out feature prediction on the masking feature vector by using the image feature prediction model to obtain prediction probability;
and calculating a prediction loss value based on the prediction probability; when the prediction loss value does not meet a preset loss threshold, adjusting the network parameters of the image feature prediction model and returning to the step of performing feature prediction on the masking feature vector with the image feature prediction model; and when the prediction loss value meets the loss threshold, stopping training and taking the trained image feature prediction model as the image feature generation model.
Optionally, the prediction loss value is calculated using the following formula:
L(\theta) = -\mathbb{E}_{x \sim D} \sum_{l=1}^{L} \log p(x_l \mid x_{\setminus l}; \theta)
where L(θ) represents the prediction loss value, θ represents the model parameters, E represents the expectation, x represents an image in the original image training set, D represents the original image training set, x_l represents the l-th standard image feature in the standard image feature set, x_{\l} represents the masking feature vector corresponding to the l-th standard image feature, and L represents the total number of features in the standard image feature set.
In order to solve the above problems, the present invention also provides a video generating apparatus, including:
the image feature extraction module is used for acquiring a video training set, carrying out frame-by-frame image segmentation on videos in the video training set to obtain an original image training set, adding position information into images of the original image training set to obtain a standard image training set, and carrying out feature extraction on the images in the standard image training set by utilizing a pre-trained feature extraction model to obtain an original image feature set;
The position feature coding module is used for carrying out position coding on the images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set;
the model training module is used for training a pre-constructed image feature prediction model by utilizing the standard image feature set to obtain an image feature generation model;
the video generation module is used for acquiring an image to be analyzed with initial position information, generating context image features of the image to be analyzed by using the image feature generation model to obtain context image features with predicted position information, performing image restoration on the context image features by using a preset image decoder to obtain a context image set, and concatenating the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features to obtain a concatenated video.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one computer program; and
a processor executing the computer program stored in the memory to implement the video generation method described above.
In order to solve the above-described problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the video generation method described above.
According to the invention, position information is added to the images of the original image training set, and the images in the standard image training set are position-coded based on that position information, so that the training data of the image feature prediction model contains position features and the resulting image feature generation model can accurately predict the image sequence of a video. The images are then concatenated according to the predicted position information, which improves the logical coherence between the concatenated images and allows video to be generated accurately from images; for example, face images in the financial field are anchored by their position information, which improves the accuracy of generating face video from face images. Therefore, the video generation method, video generation apparatus, electronic device and computer-readable storage medium provided by the invention can improve the accuracy of generating video from images.
Drawings
Fig. 1 is a flowchart of a video generating method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a video generating apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device for implementing the video generating method according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides a video generation method. The execution subject of the video generation method includes, but is not limited to, at least one of a server, a terminal, and the like that can be configured to execute the method provided by the embodiment of the invention. In other words, the video generation method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 1, a flowchart of a video generating method according to an embodiment of the invention is shown. In this embodiment, the video generation method includes the following steps S1 to S7:
s1, acquiring a video training set, and performing frame-by-frame image segmentation on videos in the video training set to obtain an original image training set.
In the embodiment of the invention, the video training set may be the public data set X4K1000FPS, which contains 175 video scene clips; each clip lasts 5 s, contains a sequence of 5000 frames at 1000 frames per second, and each frame has a high resolution of 4096×2160.
In detail, the performing frame-by-frame image segmentation on the video in the video training set to obtain an original image training set includes:
downsampling the videos in the video training set to obtain a sampled video set;
and extracting the images frame by frame from the videos in the sampling video set to obtain an original image training set.
In an alternative embodiment of the present invention, because the videos in the video training set have an excessive number of frames per second, the 1000 frames per second are downsampled to 100 frames per second, so that a 5 s video has 500 frames of images; the images are then extracted frame by frame to obtain the original image training set.
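As a concrete illustration of this step, the sketch below shows one way the downsampling and frame-by-frame extraction could be implemented. It is a minimal sketch only: the patent names no library, so OpenCV is assumed here, and the function and variable names are illustrative.

```python
# Minimal sketch of S1 (assumes OpenCV; the patent does not prescribe a library).
# Keeping every 10th frame downsamples 1000 fps to 100 fps, so a 5 s clip yields 500 frames.
import cv2

def extract_frames(video_path: str, keep_every: int = 10) -> list:
    """Split one video into frames, keeping one frame out of every `keep_every`."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of stream
            break
        if idx % keep_every == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

# Hypothetical usage: build the original image training set from a list of clip paths.
# original_image_training_set = [extract_frames(p) for p in video_paths]
```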
S2, adding position information into the images of the original image training set to obtain a standard image training set, and extracting features of the images in the standard image training set by utilizing a pre-trained feature extraction model to obtain an original image feature set.
In the embodiment of the present invention, the pre-trained feature extraction model may be a BEiT model. BEiT converts a picture into two representation views: first, the image is turned into discrete visual tokens through a learned tokenizer, making it resemble a text; second, the image is cut into multiple small pixel blocks (patches), each of which corresponds to a character. During BEiT pre-training, the model randomly covers part of an image's pixel blocks and replaces them with the special mask symbol [M], then continually learns to predict the actual picture content in the ViT backbone, thereby obtaining a feature vector for the whole picture.
In detail, adding position information in the images of the original image training set to obtain a standard image training set includes:
dividing the images in the original image training set into pixel blocks according to a preset sliding window to obtain a pixel block set;
Randomly covering the pixel blocks in the pixel block set according to a preset first random value to obtain a standard pixel block set;
tiling pixel blocks in the standard pixel block set to obtain an image sequence set;
and adding position information into the image sequence set, and summarizing all the image sequences added with the position information to obtain the standard image training set.
In an alternative embodiment of the present invention, a complete image is split by the preset sliding window into a 4×4 grid of pixel blocks, giving 16 pixel blocks; some of these pixel blocks are randomly masked, and then all the pixel blocks are tiled, i.e., the pixel blocks of the remaining rows are appended in order after the first row, yielding an image sequence. At the same time, position information (a position embedding) is added for each pixel block in the image sequence; for example, the position information of the 16 pixel blocks runs sequentially from 0 to 15. Finally the sequence is input into the BEiT model, so that each picture is encoded by BEiT into an image feature vector. For example, a 5 s video has 500 frames, each frame being one image, so 500 feature vectors can be obtained through BEiT.
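A minimal sketch of the patch splitting, random masking and tiling just described. The 4×4 grid follows the example above; the concrete mask ratio and the zeroing-out of masked patches are assumptions, since the patent only specifies "a preset first random value".

```python
import numpy as np

def patchify_and_mask(img: np.ndarray, grid: int = 4, mask_ratio: float = 0.4, rng=None):
    """Split an image into grid x grid pixel blocks, randomly mask some of them,
    then tile row by row and attach position ids 0 .. grid*grid - 1."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    ph, pw = h // grid, w // grid
    patches = [img[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]   # row-major order = tiling
               for r in range(grid) for c in range(grid)]
    masked = rng.random(len(patches)) < mask_ratio             # preset first random value (assumed 40%)
    patches = [np.zeros_like(p) if m else p for p, m in zip(patches, masked)]
    position_ids = np.arange(len(patches))                     # position embedding indices 0..15
    return patches, position_ids, masked
```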
S3, carrying out position coding on the images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set.
In detail, the performing position coding on the images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set, including:
performing position coding on the images in the standard image training set based on the position information of the images in the standard image training set to obtain a position feature vector set;
and adding the vectors in the position feature vector set and the feature vectors in the original image feature set based on the position information to obtain the standard image feature set.
In an alternative embodiment of the present invention, the images in the standard image training set are position coded by the following formula:
PE_{(pos,2i)} = \sin\big(pos / 10000^{2i/d_{model}}\big), \qquad PE_{(pos,2i+1)} = \cos\big(pos / 10000^{2i/d_{model}}\big)
where pos represents the position information, i represents the dimension index within the feature, d_{model} represents the overall dimension of a feature, PE_{(pos,2i)} represents the encoding parameter of the 2i-th dimension for the image whose position information is pos, and PE_{(pos,2i+1)} represents the encoding parameter of the (2i+1)-th dimension for the image whose position information is pos.
In the embodiment of the invention, for example, a 5 s video has 500 frames, so 500 feature vectors can be obtained through BEiT; position encodings are added to the 500 feature vectors in the order of their position information to obtain position feature vectors. Here the position information pos runs from 0 to 499, the dimension index i starts from 0, and d_model is the total number of position feature dimensions, here 512. The positions of the 500 frames of images can be encoded by the above formula to obtain the position feature vectors, and each position feature vector (512 dimensions) is added to the corresponding original BEiT image feature (512 dimensions) to obtain the standard image feature vector x_l.
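A sketch of this encoding step: the sinusoidal position encoding is computed for 500 positions at 512 dimensions and summed element-wise with the BEiT features. `beit_features` is an assumed name for the 500 feature vectors.

```python
import numpy as np

def positional_encoding(num_positions: int = 500, d_model: int = 512) -> np.ndarray:
    """Sinusoidal position encoding from the formula above: one d_model vector per frame."""
    pos = np.arange(num_positions)[:, None]          # pos = 0 .. 499
    i = np.arange(d_model // 2)[None, :]             # dimension index i
    angle = pos / np.power(10000, 2 * i / d_model)
    pe = np.empty((num_positions, d_model))
    pe[:, 0::2] = np.sin(angle)                      # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                      # PE(pos, 2i+1)
    return pe

# beit_features: (500, 512) array of original image features (assumed name).
# standard_features = beit_features + positional_encoding()   # element-wise sum -> x_l
```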
And S4, training a pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model.
In the embodiment of the present invention, the pre-constructed image feature prediction model may be a BERT model.
In detail, training the pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model, including:
randomly masking the image features in the standard image feature set according to a preset second random value to obtain a masking feature vector;
carrying out feature prediction on the masking feature vector by using the image feature prediction model to obtain prediction probability;
and calculating a prediction loss value based on the prediction probability; when the prediction loss value does not meet a preset loss threshold, adjusting the network parameters of the image feature prediction model and returning to the step of performing feature prediction on the masking feature vector with the image feature prediction model; and when the prediction loss value meets the loss threshold, stopping training and taking the trained image feature prediction model as the image feature generation model.
In an alternative embodiment of the invention, the predicted loss value is calculated using the following formula:
L(\theta) = -\mathbb{E}_{x \sim D} \sum_{l=1}^{L} \log p(x_l \mid x_{\setminus l}; \theta)
where L(θ) represents the prediction loss value, θ represents the model parameters, E represents the expectation, x represents an image in the original image training set, D represents the original image training set, x_l represents the l-th standard image feature in the standard image feature set, x_{\l} represents the masking feature vector corresponding to the l-th standard image feature, and L represents the total number of features in the standard image feature set.
In an alternative embodiment of the present invention, the input standard image feature vectors are masked according to the preset second random value (60%); that is, of 500 input frames, 300 frames are masked, so that the model learns to generate the masked pictures through the loss function. The final predicted output is the feature vector of the original picture, not the x_l vector to which the position feature vector has been added.
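A hedged PyTorch sketch of one training step for S4. A generic Transformer encoder stands in for the BERT-style backbone, and a mean-squared error on the masked positions stands in for the log-likelihood loss L(θ) above; the patent does not spell out the network or the output head, so both are assumptions.

```python
import torch
import torch.nn as nn

d_model, num_frames, mask_ratio = 512, 500, 0.6       # second random value: 60% -> 300 of 500 frames

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)  # stand-in for the BERT-style backbone
head = nn.Linear(d_model, d_model)                    # predicts the original BEiT feature vector
mask_token = nn.Parameter(torch.zeros(d_model))       # learned replacement for masked frames

x = torch.randn(1, num_frames, d_model)               # standard image features x_l (BEiT + position)
target = torch.randn(1, num_frames, d_model)          # original BEiT features: the prediction targets

mask = torch.rand(1, num_frames) < mask_ratio                       # choose positions to mask
inp = torch.where(mask.unsqueeze(-1), mask_token.expand_as(x), x)   # swap in the mask token
pred = head(encoder(inp))
loss = ((pred - target) ** 2)[mask].mean()            # penalize only the masked positions
loss.backward()                                       # an optimizer step would follow
```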
S5, acquiring an image to be analyzed with initial position information, and generating context image features of the image to be analyzed by using the image feature generation model to obtain the context image features with predicted position information.
In the embodiment of the present invention, the initial position information refers to the position at which the image to be analyzed appears in the video, for example, the starting frame of the video, an intermediate frame of the video, and the like.
S6, performing image restoration on the context image features by using a preset image decoder to obtain a context image set.
In an alternative embodiment of the present invention, since the original image features are extracted by the BEiT model, the preset image decoder may be a BEiT image decoder in order to ensure that the images can be accurately restored. The image decoder is trained as a pixel-level variational autoencoder: an original image is input and encoded by the BEiT encoder to obtain encoded features, and the encoded features are then input into the decoder, which restores them to an image.
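A toy sketch of the autoencoder-style decoder training just described: encode an image, decode it back, and regress against the original pixels. Both networks here are small illustrative stand-ins, not the actual BEiT encoder or decoder.

```python
import torch
import torch.nn as nn

# Toy encoder/decoder pair; in the patent the encoder would be the pretrained BEiT model.
encoder = nn.Sequential(nn.Conv2d(3, 64, 4, stride=4), nn.ReLU(),
                        nn.Conv2d(64, 512, 4, stride=4))
decoder = nn.Sequential(nn.ConvTranspose2d(512, 64, 4, stride=4), nn.ReLU(),
                        nn.ConvTranspose2d(64, 3, 4, stride=4), nn.Sigmoid())

img = torch.rand(1, 3, 224, 224)            # original image, pixel values in [0, 1]
recon = decoder(encoder(img))               # encode to features, then restore to an image
loss = nn.functional.mse_loss(recon, img)   # pixel-level reconstruction loss
loss.backward()
```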
And S7, concatenating the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features, to obtain a concatenated video.
In the embodiment of the invention, the context of a picture can be generated through the image feature generation model. If a given image to be analyzed is the starting frame of a video, it is placed at the first position and the subsequent 499 positions are masked, so that the model predicts the picture features at the masked positions; if the initial position information of the given image to be analyzed is in the middle of the video, the image is placed at the middle position and the positions to its left and right are masked. Since the output context image features are sequence information with positions, the features can be restored to pictures by the BEiT image decoder, and finally the output pictures are concatenated according to the predicted position information and the initial position information to form a concatenated video.
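Putting S5 to S7 together, the following sketch shows the inference flow. `model` and `decoder` are hypothetical callables for the trained image feature generation model and the BEiT image decoder (their real interfaces are not given in the patent), and OpenCV is assumed for writing the concatenated video.

```python
import numpy as np
import cv2  # assumed for video writing; any frame writer would do

def generate_video(image_feat: np.ndarray, start_pos: int, model, decoder,
                   num_frames: int = 500, fps: int = 100, out_path: str = "generated.mp4"):
    """Place the known frame feature at its initial position, mask every other position,
    predict the masked features, decode them to frames, and concatenate in position order."""
    feats = np.zeros((num_frames, image_feat.shape[-1]))
    feats[start_pos] = image_feat            # 0 for a starting frame; e.g. 250 for a middle frame
    masked = np.ones(num_frames, dtype=bool)
    masked[start_pos] = False                # all other positions are masked
    context = model(feats, masked)           # hypothetical API: one predicted feature per position
    frames = [decoder(context[p]) for p in range(num_frames)]  # restored images, assumed HxWx3 uint8 (BGR)

    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for f in frames:                         # concatenation in position order
        writer.write(f)
    writer.release()
```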
For example, during an online transaction or online shopping, a coherent video is generated from a face image or a body image; encoding position information improves the accuracy and coherence of the generated video, thereby improving the accuracy of face authentication and recognition or how clearly worn products are presented.
According to the invention, position information is added to the images of the original image training set, and the images in the standard image training set are position-coded based on that position information, so that the training data of the image feature prediction model contains position features and the resulting image feature generation model can accurately predict the image sequence of a video. The images are then concatenated according to the predicted position information, which improves the logical coherence between the concatenated images and allows video to be generated accurately from images; for example, face images in the financial field are anchored by their position information, which improves the accuracy of generating face video from face images. Therefore, the video generation method provided by the invention can improve the accuracy of generating video from images.
Fig. 2 is a functional block diagram of a video generating apparatus according to an embodiment of the present invention.
The video generating apparatus 100 of the present invention may be installed in an electronic device. Depending on the implemented functionality, the video generating apparatus 100 may include an image feature extraction module 101, a position feature encoding module 102, a model training module 103, and a video generation module 104. A module of the invention, which may also be referred to as a unit, is a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the image feature extraction module 101 is configured to obtain a video training set, divide a video in the video training set into frame-by-frame images to obtain an original image training set, add position information to images in the original image training set to obtain a standard image training set, and perform feature extraction on images in the standard image training set by using a pre-trained feature extraction model to obtain an original image feature set;
the position feature encoding module 102 is configured to perform position encoding on images in the standard image training set to obtain a position feature vector set, and construct a standard image feature set based on the position feature vector set and the original image feature set;
The model training module 103 is configured to train a pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model;
the video generating module 104 is configured to obtain an image to be analyzed with initial position information, generate context image features of the image to be analyzed by using the image feature generation model to obtain context image features with predicted position information, perform image restoration on the context image features by using a preset image decoder to obtain a context image set, and concatenate the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features to obtain a concatenated video.
In detail, the specific embodiments of the modules of the video generating apparatus 100 are as follows:
step one, acquiring a video training set, and performing frame-by-frame image segmentation on videos in the video training set to obtain an original image training set.
In the embodiment of the invention, the video training set may be the public data set X4K1000FPS, which contains 175 video scene clips; each clip lasts 5 s, contains a sequence of 5000 frames at 1000 frames per second, and each frame has a high resolution of 4096×2160.
In detail, the performing frame-by-frame image segmentation on the video in the video training set to obtain an original image training set includes:
downsampling the videos in the video training set to obtain a sampled video set;
and extracting the images frame by frame from the videos in the sampling video set to obtain an original image training set.
In an alternative embodiment of the present invention, because the videos in the video training set have an excessive number of frames per second, the 1000 frames per second are downsampled to 100 frames per second, so that a 5 s video has 500 frames of images; the images are then extracted frame by frame to obtain the original image training set.
Adding position information into the images of the original image training set to obtain a standard image training set, and extracting features of the images in the standard image training set by utilizing a pre-trained feature extraction model to obtain an original image feature set.
In the embodiment of the present invention, the pre-trained feature extraction model may be a BEiT model. BEiT converts a picture into two representation views: first, the image is turned into discrete visual tokens through a learned tokenizer, making it resemble a text; second, the image is cut into multiple small pixel blocks (patches), each of which corresponds to a character. During BEiT pre-training, the model randomly covers part of an image's pixel blocks and replaces them with the special mask symbol [M], then continually learns to predict the actual picture content in the ViT backbone, thereby obtaining a feature vector for the whole picture.
In detail, adding position information in the images of the original image training set to obtain a standard image training set includes:
dividing the images in the original image training set into pixel blocks according to a preset sliding window to obtain a pixel block set;
randomly covering the pixel blocks in the pixel block set according to a preset first random value to obtain a standard pixel block set;
tiling pixel blocks in the standard pixel block set to obtain an image sequence set;
and adding position information into the image sequence set, and summarizing all the image sequences added with the position information to obtain the standard image training set.
In an alternative embodiment of the present invention, a complete image is split by the preset sliding window into a 4×4 grid of pixel blocks, giving 16 pixel blocks; some of these pixel blocks are randomly masked, and then all the pixel blocks are tiled, i.e., the pixel blocks of the remaining rows are appended in order after the first row, yielding an image sequence. At the same time, position information (a position embedding) is added for each pixel block in the image sequence; for example, the position information of the 16 pixel blocks runs sequentially from 0 to 15. Finally the sequence is input into the BEiT model, so that each picture is encoded by BEiT into an image feature vector. For example, a 5 s video has 500 frames, each frame being one image, so 500 feature vectors can be obtained through BEiT.
Thirdly, carrying out position coding on images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set.
In detail, the performing position coding on the images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set, including:
performing position coding on the images in the standard image training set based on the position information of the images in the standard image training set to obtain a position feature vector set;
and adding the vectors in the position feature vector set and the feature vectors in the original image feature set based on the position information to obtain the standard image feature set.
In an alternative embodiment of the present invention, the images in the standard image training set are position coded by the following formula:
PE_{(pos,2i)} = \sin\big(pos / 10000^{2i/d_{model}}\big), \qquad PE_{(pos,2i+1)} = \cos\big(pos / 10000^{2i/d_{model}}\big)
where pos represents the position information, i represents the dimension index within the feature, d_{model} represents the overall dimension of a feature, PE_{(pos,2i)} represents the encoding parameter of the 2i-th dimension for the image whose position information is pos, and PE_{(pos,2i+1)} represents the encoding parameter of the (2i+1)-th dimension for the image whose position information is pos.
In the embodiment of the invention, for example, a 5 s video has 500 frames, so 500 feature vectors can be obtained through BEiT; position encodings are added to the 500 feature vectors in the order of their position information to obtain position feature vectors. Here the position information pos runs from 0 to 499, the dimension index i starts from 0, and d_model is the total number of position feature dimensions, here 512. The positions of the 500 frames of images can be encoded by the above formula to obtain the position feature vectors, and each position feature vector (512 dimensions) is added to the corresponding original BEiT image feature (512 dimensions) to obtain the standard image feature vector x_l.
Training a pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model.
In the embodiment of the present invention, the pre-constructed image feature prediction model may be a BERT model.
In detail, training the pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model, including:
randomly masking the image features in the standard image feature set according to a preset second random value to obtain a masking feature vector;
Carrying out feature prediction on the masking feature vector by using the image feature prediction model to obtain prediction probability;
and calculating a prediction loss value based on the prediction probability; when the prediction loss value does not meet a preset loss threshold, adjusting the network parameters of the image feature prediction model and returning to the step of performing feature prediction on the masking feature vector with the image feature prediction model; and when the prediction loss value meets the loss threshold, stopping training and taking the trained image feature prediction model as the image feature generation model.
In an alternative embodiment of the invention, the predicted loss value is calculated using the following formula:
L(\theta) = -\mathbb{E}_{x \sim D} \sum_{l=1}^{L} \log p(x_l \mid x_{\setminus l}; \theta)
where L(θ) represents the prediction loss value, θ represents the model parameters, E represents the expectation, x represents an image in the original image training set, D represents the original image training set, x_l represents the l-th standard image feature in the standard image feature set, x_{\l} represents the masking feature vector corresponding to the l-th standard image feature, and L represents the total number of features in the standard image feature set.
In an alternative embodiment of the present invention, the input standard image feature vectors are masked according to the preset second random value (60%); that is, of 500 input frames, 300 frames are masked, so that the model learns to generate the masked pictures through the loss function. The final predicted output is the feature vector of the original picture, not the x_l vector to which the position feature vector has been added.
And fifthly, acquiring an image to be analyzed with initial position information, and generating context image features of the image to be analyzed by using the image feature generation model to obtain the context image features with predicted position information.
In the embodiment of the present invention, the initial position information refers to the position at which the image to be analyzed appears in the video, for example, the starting frame of the video, an intermediate frame of the video, and the like.
And step six, performing image restoration on the context image features by using a preset image decoder to obtain a context image set.
In an alternative embodiment of the present invention, since the original image features are extracted by the BEiT model, the preset image decoder may be a BEiT image decoder in order to ensure that the images can be accurately restored. The image decoder is trained as a pixel-level variational autoencoder: an original image is input and encoded by the BEiT encoder to obtain encoded features, and the encoded features are then input into the decoder, which restores them to an image.
And step seven, concatenating the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features, to obtain a concatenated video.
In the embodiment of the invention, the context of a picture can be generated through the image feature generation model. If a given image to be analyzed is the starting frame of a video, it is placed at the first position and the subsequent 499 positions are masked, so that the model predicts the picture features at the masked positions; if the initial position information of the given image to be analyzed is in the middle of the video, the image is placed at the middle position and the positions to its left and right are masked. Since the output context image features are sequence information with positions, the features can be restored to pictures by the BEiT image decoder, and finally the output pictures are concatenated according to the predicted position information and the initial position information to form a concatenated video.
According to the invention, position information is added to the images of the original image training set, and the images in the standard image training set are position-coded based on that position information, so that the training data of the image feature prediction model contains position features and the resulting image feature generation model can accurately predict the image sequence of a video. The images are then concatenated according to the predicted position information, which improves the logical coherence between the concatenated images and allows video to be generated accurately from images; for example, face images in the financial field are anchored by their position information, which improves the accuracy of generating face video from face images. Therefore, the video generating apparatus provided by the invention can improve the accuracy of generating video from images.
Fig. 3 is a schematic structural diagram of an electronic device implementing the video generating method according to an embodiment of the present invention.
The electronic device may comprise a processor 10, a memory 11, a communication interface 12 and a bus 13, and may further comprise a computer program, such as a video generation program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, such as a mobile hard disk of the electronic device. The memory 11 may in other embodiments also be an external storage device of the electronic device, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only for storing application software installed in an electronic device and various types of data, such as codes of video generation programs, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device and processes data by running or executing programs or modules (e.g., video generation programs, etc.) stored in the memory 11, and calling data stored in the memory 11.
The communication interface 12 is used for communication between the electronic device and other devices, including network interfaces and user interfaces. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), or alternatively a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device and for displaying a visual user interface.
The bus 13 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The bus 13 may be classified into an address bus, a data bus, a control bus, and the like. The bus 13 is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the electronic device and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power source (such as a battery) for supplying power to the respective components, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device may further include various sensors, bluetooth modules, wi-Fi modules, etc., which are not described herein.
Further, the electronic device may also include a network interface, optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.
The video generation program stored in the memory 11 in the electronic device is a combination of instructions that, when executed in the processor 10, may implement:
Acquiring a video training set, and performing frame-by-frame image segmentation on videos in the video training set to obtain an original image training set;
adding position information into images of the original image training set to obtain a standard image training set, and extracting features of the images in the standard image training set by utilizing a pre-trained feature extraction model to obtain an original image feature set;
performing position coding on images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set;
training a pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model;
acquiring an image to be analyzed with initial position information, and generating context image features of the image to be analyzed by using the image feature generation model to obtain the context image features with predicted position information;
performing image restoration on the context image features by using a preset image decoder to obtain a context image set;
and concatenating the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features, so as to obtain a concatenated video.
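To make the instruction flow above concrete, the following Python outline sketches the inference path under stated assumptions: feature_model and image_decoder are hypothetical stand-ins for the trained image feature generation model and the preset image decoder, and each generated feature is assumed to carry its predicted position information as an integer index; none of these names come from the disclosure.

import numpy as np

def generate_video(image, init_pos, feature_model, image_decoder):
    """Hypothetical sketch of the claimed inference flow; the model objects
    and their .generate()/.decode() methods are illustrative assumptions."""
    # Generate context image features, each paired with predicted position info.
    context_feats = feature_model.generate(image, init_pos)  # [(pos, feature), ...]
    # Restore every context feature to an image with the preset decoder.
    frames = [(pos, image_decoder.decode(feat)) for pos, feat in context_feats]
    frames.append((init_pos, image))             # the analyzed image keeps its slot
    frames.sort(key=lambda item: item[0])        # concatenation order = position order
    return np.stack([img for _, img in frames])  # (T, H, W, C) concatenated video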
For the specific implementation of the above instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiments corresponding to the drawings, which is not repeated herein.
Further, if the integrated modules/units of the electronic device are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
acquiring a video training set, and performing frame-by-frame image segmentation on videos in the video training set to obtain an original image training set;
adding position information into images of the original image training set to obtain a standard image training set, and extracting features of the images in the standard image training set by utilizing a pre-trained feature extraction model to obtain an original image feature set;
performing position coding on images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set;
training a pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model;
acquiring an image to be analyzed with initial position information, and generating context image features of the image to be analyzed by using the image feature generation model to obtain the context image features with predicted position information;
performing image restoration on the context image features by using a preset image decoder to obtain a context image set;
and concatenating the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features, so as to obtain a concatenated video.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The embodiments of the present invention may acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic means, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names rather than any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A method of video generation, the method comprising:
acquiring a video training set, and performing frame-by-frame image segmentation on videos in the video training set to obtain an original image training set;
adding position information into images of the original image training set to obtain a standard image training set, and extracting features of the images in the standard image training set by utilizing a pre-trained feature extraction model to obtain an original image feature set;
performing position coding on images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set;
training a pre-constructed image feature prediction model by using the standard image feature set to obtain an image feature generation model;
acquiring an image to be analyzed with initial position information, and generating context image features of the image to be analyzed by using the image feature generation model to obtain the context image features with predicted position information;
performing image restoration on the context image features by using a preset image decoder to obtain a context image set;
and concatenating the image to be analyzed with the images in the context image set by using the initial position information and the predicted position information in the context image features, so as to obtain a concatenated video.
2. The method of generating video according to claim 1, wherein said performing frame-by-frame image segmentation on the video in the video training set to obtain an original image training set comprises:
downsampling the videos in the video training set to obtain a sampled video set;
and extracting images frame by frame from the videos in the sampled video set to obtain the original image training set.
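A minimal sketch of this claim, assuming OpenCV is used (the patent names no library) and that downsampling means keeping every k-th frame:

import cv2

def sample_frames(video_path, keep_every=2):
    """Temporally downsample a video, then extract its frames one by one;
    `keep_every` is an illustrative assumption, not a disclosed parameter."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of stream
            break
        if index % keep_every == 0:     # downsampling: keep every k-th frame
            frames.append(frame)
        index += 1
    cap.release()
    return frames                       # contributes to the original image training set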
3. The method of generating video according to claim 1, wherein adding position information to the images of the original image training set to obtain a standard image training set includes:
dividing the images in the original image training set into pixel blocks according to a preset sliding window to obtain a pixel block set;
randomly covering the pixel blocks in the pixel block set according to a preset first random value to obtain a standard pixel block set;
tiling pixel blocks in the standard pixel block set to obtain an image sequence set;
and adding position information into the image sequence set, and summarizing all the image sequences added with the position information to obtain the standard image training set.
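A NumPy sketch of this preprocessing chain; the block size, masking ratio, and zero-fill covering are illustrative assumptions standing in for the preset sliding window and the first random value:

import numpy as np

def to_image_sequence(image, block=16, mask_ratio=0.15, rng=None):
    """Window split, random cover, tiling, and position tagging (a sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = image.shape
    # Divide the image into non-overlapping pixel blocks with the sliding window
    # (assumes h and w are divisible by the block size).
    blocks = [image[y:y + block, x:x + block]
              for y in range(0, h - block + 1, block)
              for x in range(0, w - block + 1, block)]
    # Randomly cover (here: zero out) a fraction of the pixel blocks.
    covered = rng.random(len(blocks)) < mask_ratio
    blocks = [np.zeros_like(b) if m else b for b, m in zip(blocks, covered)]
    # Tile each block into a flat vector and attach its position information.
    return [(pos, b.reshape(-1)) for pos, b in enumerate(blocks)]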
4. The method for generating video according to claim 1, wherein the performing position coding on the image in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set includes:
performing position coding on the images in the standard image training set based on the position information of the images in the standard image training set to obtain a position feature vector set;
and adding the vectors in the position feature vector set and the feature vectors in the original image feature set based on the position information to obtain the standard image feature set.
5. The video generation method of claim 4, wherein images in the standard image training set are position coded by the following formula:

PE_(pos,2i) = sin(pos / 10000^(2i/d_model))

PE_(pos,2i+1) = cos(pos / 10000^(2i/d_model))

where pos represents the position information, i represents the dimension index within the feature, d_model represents the overall dimension of the feature, PE_(pos,2i) represents the feature encoding parameter at the 2i-th dimension of the image whose position information is pos, and PE_(pos,2i+1) represents the feature encoding parameter at the (2i+1)-th dimension of the image whose position information is pos.
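The formula matches the standard Transformer-style sinusoidal encoding; a NumPy sketch (assuming an even d_model) that also performs the position-wise addition from claim 4:

import numpy as np

def positional_encoding(num_positions, d_model):
    """Sinusoidal position coding per the formula above; a standard
    reconstruction, so the patent's exact variant may differ in detail."""
    pos = np.arange(num_positions)[:, None]           # position information pos
    i = np.arange(d_model // 2)[None, :]              # dimension index i
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angle)                       # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                       # PE(pos, 2i+1)
    return pe

# Claim 4: the standard image feature set is the position-wise sum of the
# original image features and their position feature vectors, e.g.:
# standard_feats = original_feats + positional_encoding(len(original_feats), d_model)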
6. The method for generating video according to claim 4, wherein training the pre-constructed image feature prediction model using the standard image feature set to obtain the image feature generation model comprises:
randomly masking the image features in the standard image feature set according to a preset second random value to obtain a masking feature vector;
carrying out feature prediction on the masking feature vector by using the image feature prediction model to obtain a prediction probability;
and calculating a prediction loss value based on the prediction probability; when the prediction loss value does not meet a preset loss threshold, adjusting the network parameters of the image feature prediction model and returning to the step of performing feature prediction on the masking feature vector by using the image feature prediction model; stopping training once the prediction loss value meets the loss threshold, and taking the trained image feature prediction model as the image feature generation model.
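A PyTorch sketch of this training loop under stated assumptions: the second random value becomes a masking ratio, masked vectors are zero-filled, and a mean-squared error on masked positions is swapped in for claim 7's log-likelihood loss; none of these choices are fixed by the claim.

import torch

def train_feature_model(model, standard_feats, mask_ratio=0.15,
                        loss_threshold=0.1, lr=1e-4, max_steps=10_000):
    """Masked-feature training sketch for claim 6; every hyperparameter name
    here is an illustrative assumption, not the disclosed configuration."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_steps):
        feats = standard_feats.clone()                   # (L, d_model) feature set
        masked = torch.rand(feats.shape[0]) < mask_ratio
        if not masked.any():                             # ensure something is masked
            continue
        feats[masked] = 0.0                              # masking feature vector
        pred = model(feats)                              # feature prediction
        loss = torch.nn.functional.mse_loss(pred[masked], standard_feats[masked])
        if loss.item() <= loss_threshold:                # loss threshold met: stop
            break
        opt.zero_grad()
        loss.backward()                                  # adjust network parameters
        opt.step()
    return model                                         # image feature generation model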
7. The video generation method of claim 6, wherein the prediction loss value is calculated using the following formula:

L(θ) = −E_(x∼D) Σ_(l=1)^(L) log P(x_l | x_(\l); θ)

wherein L(θ) represents the prediction loss value, θ represents the model parameters, E represents the expectation, x represents an image in the original image training set, D represents the original image training set, x_l represents the l-th standard image feature in the standard image feature set, x_(\l) represents the masking feature vector corresponding to the l-th standard image feature, and L represents the total number of features in the standard image feature set.
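Read as a masked-prediction objective, the formula can be estimated per batch as below; the discrete-feature framing (the model emits log-probabilities over a feature vocabulary) is an assumption made only so the log-likelihood is computable:

import torch

def prediction_loss(log_probs, targets):
    """Monte-Carlo estimate of L(theta): the negated sum of log P(x_l | x_\\l)
    over the L features, averaged over the sampled images x ~ D.

    log_probs: (B, L, V) log-probabilities computed from masked inputs x_\\l;
    targets:   (B, L) vocabulary indices of the true features x_l.
    """
    picked = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (B, L)
    return -picked.sum(dim=-1).mean()  # sum over l = 1..L, expectation over x ~ D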
8. A video generating apparatus, the apparatus comprising:
The image feature extraction module is used for acquiring a video training set, carrying out frame-by-frame image segmentation on videos in the video training set to obtain an original image training set, adding position information into images of the original image training set to obtain a standard image training set, and carrying out feature extraction on the images in the standard image training set by utilizing a pre-trained feature extraction model to obtain an original image feature set;
the position feature coding module is used for carrying out position coding on the images in the standard image training set to obtain a position feature vector set, and constructing a standard image feature set based on the position feature vector set and the original image feature set;
the model training module is used for training a pre-constructed image feature prediction model by utilizing the standard image feature set to obtain an image feature generation model;
the video generation module is used for acquiring an image to be analyzed with initial position information, generating context image characteristics of the image to be analyzed by using the image characteristic generation model to obtain context image characteristics with predicted position information, performing image restoration on the context image characteristics by using a preset image decoder to obtain a context image set, and connecting the image to be analyzed and the image in the context image set in series by using the initial position information and the predicted position information in the context image characteristics to obtain a serial video.
9. An electronic device, the electronic device comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the video generation method of any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the video generation method according to any one of claims 1 to 7.
CN202310557882.0A 2023-05-17 2023-05-17 Video generation method, device, equipment and storage medium Pending CN116600155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310557882.0A CN116600155A (en) 2023-05-17 2023-05-17 Video generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310557882.0A CN116600155A (en) 2023-05-17 2023-05-17 Video generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116600155A true CN116600155A (en) 2023-08-15

Family

ID=87605759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310557882.0A Pending CN116600155A (en) 2023-05-17 2023-05-17 Video generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116600155A (en)

Similar Documents

Publication Publication Date Title
CN112396613B (en) Image segmentation method, device, computer equipment and storage medium
CN111898696A (en) Method, device, medium and equipment for generating pseudo label and label prediction model
CN112446025A (en) Federal learning defense method and device, electronic equipment and storage medium
CN111681681A (en) Voice emotion recognition method and device, electronic equipment and storage medium
CN111274937B (en) Tumble detection method, tumble detection device, electronic equipment and computer-readable storage medium
CN112767320A (en) Image detection method, image detection device, electronic equipment and storage medium
CN114021582B (en) Spoken language understanding method, device, equipment and storage medium combined with voice information
CN113705462A (en) Face recognition method and device, electronic equipment and computer readable storage medium
CN112651342A (en) Face recognition method and device, electronic equipment and storage medium
CN114511038A (en) False news detection method and device, electronic equipment and readable storage medium
CN116630457A (en) Training method and device for picture generation model, electronic equipment and storage medium
CN115205225A (en) Training method, device and equipment of medical image recognition model and storage medium
CN112686232B (en) Teaching evaluation method and device based on micro expression recognition, electronic equipment and medium
CN116630712A (en) Information classification method and device based on modal combination, electronic equipment and medium
CN116680580A (en) Information matching method and device based on multi-mode training, electronic equipment and medium
CN113792801B (en) Method, device, equipment and storage medium for detecting face dazzling degree
CN113887408B (en) Method, device, equipment and storage medium for detecting activated face video
CN113255456B (en) Inactive living body detection method, inactive living body detection device, electronic equipment and storage medium
CN116600155A (en) Video generation method, device, equipment and storage medium
CN113239814B (en) Facial expression recognition method, device, equipment and medium based on optical flow reconstruction
CN114881103A (en) Countermeasure sample detection method and device based on universal disturbance sticker
CN113806540A (en) Text labeling method and device, electronic equipment and storage medium
CN113989548A (en) Certificate classification model training method and device, electronic equipment and storage medium
CN113627394A (en) Face extraction method and device, electronic equipment and readable storage medium
CN112329599A (en) Digital signature identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination