CN117893419A - Video generation method, device, electronic equipment and readable storage medium


Info

Publication number
CN117893419A
Authority
CN
China
Prior art keywords: image, processed, parameter, initial, fused
Prior art date
Legal status: Pending
Application number
CN202410059702.0A
Other languages
Chinese (zh)
Inventor
王凡祎
苏婧文
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202410059702.0A
Publication of CN117893419A

Landscapes

  • Image Processing (AREA)

Abstract

The application discloses a video generation method, a device, electronic equipment and a readable storage medium. The method comprises the following steps: acquiring a video to be fused and an initial image, and performing frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; acquiring a mask image of a target object in the initial image; fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image; performing re-lighting on each image to be processed based on the second color parameter, the mask image, the initial image and the first color parameter corresponding to each image to be processed to obtain a target image; and generating a target video based on the target image. Because the video to be fused and the initial image are fused through the specified algorithm, and the resulting images are re-lit using the color parameters, a target video with more lifelike illumination and shadows is obtained.

Description

Video generation method, device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a video generating method, apparatus, electronic device, and readable storage medium.
Background
At present, with the development of electronic information technology, elements from dynamic videos can be added to images to generate new dynamic videos. However, current methods for generating such dynamic videos are labor-intensive, and the illumination and shadows of the generated videos are not lifelike.
Disclosure of Invention
The application provides a video generation method, a video generation device, electronic equipment and a readable storage medium.
In a first aspect, an embodiment of the present application provides a video generating method, including: acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; acquiring a mask image of a target object in the initial image; fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image; performing re-lighting on each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image; and generating a target video based on the target image.
In a second aspect, an embodiment of the present application further provides a video generating apparatus, including: the device comprises a first acquisition unit, a second acquisition unit, a fusion unit, a color parameter determination unit, a re-lighting unit and a video generation unit. The first acquisition unit is used for acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; a second obtaining unit, configured to obtain a mask image of a target object in the initial image; the fusion unit is used for respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; a color parameter determining unit, configured to determine a first color parameter corresponding to each image to be processed based on the mask image, and determine a second color parameter based on the initial image; the re-lighting unit is used for re-lighting each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image; and the video generation unit is used for generating a target video based on the target image.
In a third aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored therein program code that is callable by a processor to perform the method according to the first aspect.
In the video generation method, apparatus, electronic device and readable storage medium provided by the embodiments of the application, a video to be fused and an initial image are first obtained, and the video to be fused is subjected to frame decomposition to obtain a plurality of frame images to be fused; a mask image of the target object in the initial image is then obtained; the initial image is fused with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; a first color parameter corresponding to each image to be processed is determined based on the mask image, and a second color parameter is determined based on the initial image; re-lighting is performed on each image to be processed based on the second color parameter, the mask image, the initial image and the first color parameter corresponding to each image to be processed to obtain a target image; and finally, a target video is generated based on the target image. Because the initial image is fused with each frame image to be fused through the specified algorithm to obtain the plurality of images to be processed, manual editing is avoided, labor cost is saved, and the efficiency of acquiring the images to be processed is improved. In addition, the obtained images are re-lit using the color parameters, so that a target video with more lifelike illumination and shadows can be obtained.
Additional features and advantages of embodiments of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of embodiments of the application. The objectives and other advantages of embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows an application scene graph of a video generation method provided by an embodiment of the present application;
fig. 2 shows a method flowchart of a video generation method according to an embodiment of the present application;
FIG. 3 shows a schematic diagram of an initial image provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a frame image to be fused according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a mask image according to an embodiment of the present application;
FIG. 6 shows a schematic diagram of a target image provided in an embodiment of the application;
FIG. 7 is a flow chart of a video generation method according to another embodiment of the present application;
fig. 8 shows a schematic diagram of a to-be-fused line manuscript provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of an initial line manuscript provided by an embodiment of the present application;
fig. 10 shows a schematic diagram of a line manuscript image to be processed provided by an embodiment of the present application;
FIG. 11 is a flow chart of a video generation method according to another embodiment of the present application;
FIG. 12 is a flow chart of a video generation method according to still another embodiment of the present application;
FIG. 13 is a flow chart of a video generating method according to still another embodiment of the present application;
fig. 14 is a block diagram showing the configuration of a video generating apparatus according to an embodiment of the present application;
Fig. 15 shows a block diagram of an electronic device according to an embodiment of the present application;
fig. 16 shows a block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
At present, with the development of electronic information technology, elements from dynamic videos can be added to images to generate new dynamic videos. However, current methods for generating such dynamic videos are labor-intensive, and the illumination and shadows of the generated videos are not lifelike. How to reduce the manpower consumed in generating a dynamic video, and how to improve the fidelity of the illumination and shadows of the generated video, are therefore problems to be solved urgently.
A dynamic video here is a video generated by editing dynamic elements onto a still image, where the dynamic elements may be moving objects extracted from another video, for example from an existing dynamic video.
In the prior art, dynamic elements may be added manually on a still image, for example, by using image editing software or animation software.
However, the inventors found in their research that manually adding dynamic elements to a static image is time-consuming and requires considerable manual effort. Moreover, the illumination and shadows of the dynamic video generated in this way are not lifelike.
Accordingly, in order to solve or partially solve the above-described problems, the present application provides a video generation method, apparatus, electronic device, and readable storage medium.
Referring to fig. 1, fig. 1 shows an application scenario diagram of a video generating method, that is, a video generating scenario 100, where the video generating scenario 100 may include an electronic device 110 and a server 120, where the electronic device 110 is connected to the server 120.
The electronic device 110 may establish a connection with the server 120, which also accesses the internet, by accessing the internet itself. The electronic device 110 may access the internet wirelessly, for example through wireless communication technologies such as Wi-Fi or Bluetooth; the electronic device 110 may also access the internet by wired means, for example through an RJ45 network cable or optical fiber.
The user may control the electronic device 110 so that the electronic device performs the video generating method, for example, the user may directly operate the electronic device 110 so that the electronic device performs the video generating method, where the electronic device 110 may be locally deployed with a specified algorithm, for example, may be pre-stored with the specified algorithm, so that the specified algorithm may be invoked to implement video generation, and for details, reference may be made to the following embodiments. Optionally, the electronic device 110 may also invoke a specified algorithm deployed in the server 120 to perform video generation.
The server 120 may be a cloud server or a local server.
For some embodiments, the video generation method may be applied to entertainment and gaming; for example, in the entertainment industry it may be used to generate corresponding video content for movies, television shows, cartoons, games, and so on. It may also be applied to advertising and marketing, for example to generate attractive advertising and marketing video content such as product presentations, dynamic advertisement banners, and animated logos. It may further be applied to virtual reality (VR) and augmented reality (AR), where the generated video content may include virtual scenes, dynamic elements, and interactive effect presentations. It may likewise be applied to design and the creative arts, where the generated video content can assist the design and creative process, for example dynamic design prototypes, artistic effects, and visual effect previews. In addition, it may be applied to social media and expression (sticker) packages, for example to generate video content for social media platforms and chat applications, such as dynamic expression packages and images that are interesting and express rich emotions. Furthermore, it may be applied to data visualization, for example to generate dynamic charts, interactive graphics, and dynamic data presentations.
It should be noted that, the application scenarios of the video generating method provided by the present application shown above are only examples, and do not limit the embodiments of the present application.
Referring to fig. 2, fig. 2 is a flowchart of a video generating method according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S110 to S160.
Step S110: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
Some dynamic effects in a video can be fused into an image, so that a new video with those dynamic effects is generated. Thus, an initial image and a video to be fused may be acquired first. The initial image is the image into which the dynamic effect needs to be fused, or, put differently, the dynamic effect is fused on the basis of the initial image; the video to be fused is the video containing the dynamic object.
In some embodiments, the electronic device may store some image files and video files in advance, so that one image file may be selected from the pre-stored image files as an initial image; one video file can be selected from the pre-stored video files to serve as a video to be fused. For example, the electronic device may run a photo and a video application, thereby selecting an image file in the photo and the video application as an initial image; and selecting one video file as a video to be fused.
In other embodiments, the electronic device may obtain, via the application, an image file that is not stored locally as the initial image, and a video file as the video to be fused. By way of example, the electronic device may run a web browsing application that may correspond to image files as well as video files. Thus, the electronic equipment can acquire the image file in the webpage browsing application program as an initial image; and acquiring a video file in the webpage browsing application program as a video to be fused.
The video to be fused may be in MP4, MKV, MOV or a similar format, or may be a GIF file, which is not particularly limited in the embodiments of the present application.
For example, referring to fig. 3, fig. 3 is a schematic diagram illustrating an initial image provided by an embodiment of the present application. The initial image 300 shown in fig. 3 includes a target object 301 and a background 302. Wherein the background 302 may be an area other than the target area to which the target object 301 corresponds.
It will be appreciated that the video to be fused may be made up of a plurality of frame images; for example, each second of the video to be fused may include a specified number of frame images. Specifically, the specified number may be 24, 30, 50, 60, 120, or the like. The specified number of frame images per second may also be referred to as the frame rate of the video to be fused; for example, if the specified number is 24, the frame rate of the video to be fused is 24, and if the specified number is 60, the frame rate is 60. Therefore, in order to fuse the content in the video to be fused into the initial image, the video to be fused can be subjected to frame decomposition to obtain a plurality of frame images to be fused. The plurality of frame images to be fused are the plurality of frame images in the video to be fused.
For example, the video to be fused includes N frames of images, and each frame of image to be fused can be sequentially represented by a_n, where n=1, 2,3 … N.
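For illustration only, the frame decomposition described above can be sketched with OpenCV as below; the file name and the in-memory list of frames are placeholders rather than part of the claimed method.

```python
import cv2

def decompose_video(path):
    """Split a video to be fused into a list of BGR frame images A_1 ... A_N."""
    capture = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:  # no more frames in the video
            break
        frames.append(frame)
    capture.release()
    return frames

# e.g. frames_to_fuse = decompose_video("video_to_fuse.mp4")
```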
For example, referring to fig. 4, fig. 4 is a schematic diagram of a frame image to be fused according to an embodiment of the present application. The frame image 400 to be fused shown in fig. 4 includes an object 401 to be fused.
Step S120: and acquiring a mask image of the target object in the initial image.
In order to facilitate the subsequent acquisition of a target video with more realistic illumination and shadow, a mask image of a target object in an initial image can be acquired first, so that the mask image can be directly invoked later.
Wherein, through the mask image, a plurality of designated areas in the image can be conveniently operated without influencing the areas except the designated areas in the image. In some embodiments, the mask image may be a binary or boolean image of the same size as the original image, with selected regions marked 1 (or True) and the remaining regions marked 0 (or False). The selected area is a target area corresponding to the target object in the initial image; the rest areas are the areas except the target areas corresponding to the target objects in the initial image. Thus, visually, the target area corresponding to the target object in the mask image may be one color, and the area other than the target area corresponding to the target object may be another color.
The target object may be a subject person in the initial image, such as the target object 301 in fig. 3 described above.
For some embodiments, a mask image of the target object in the initial image may be acquired by a subject matting model. The subject matting model can identify the subject person in the input image, thereby obtaining a mask image of the target object in the initial image.
For some embodiments, the subject matting model may be obtained in advance by adjusting or training a pre-trained model. The subject matting model may be implemented based on the U2Net algorithm, which is an image segmentation network based on the U-Net structure.
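As a purely illustrative sketch of obtaining a binary mask from a U2Net-style subject matting model, the snippet below thresholds the model's saliency output; the forward-pass signature and the 0.5 threshold are assumptions, not the application's actual model.

```python
import numpy as np
import torch

def get_mask(model, image_tensor, threshold=0.5):
    """Run a subject-matting network and binarize its output.

    `model` is assumed to be a pre-trained U2Net-style network mapping a
    1x3xHxW tensor to a 1x1xHxW saliency map in [0, 1].
    """
    with torch.no_grad():
        saliency = model(image_tensor)  # hypothetical forward signature
    mask = (saliency.squeeze().cpu().numpy() > threshold).astype(np.uint8)
    return mask  # 1 inside the target region, 0 elsewhere
```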
For example, referring to fig. 5, fig. 5 shows a schematic diagram of a mask image according to an embodiment of the present application. The mask image 500 includes a target area 501 corresponding to the target object, and an area 502 other than the target area corresponding to the target object.
Step S130: and respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed.
It will be appreciated that the subsequent generation of the video requires the merging of multiple frame images to obtain the video. That is, the initial image may be fused with each frame image to be fused, respectively, so that a plurality of images to be processed may be obtained. The video may be generated after the plurality of images to be processed are subsequently processed.
For some embodiments, in order to reduce the requirement of manpower for generating the video and reduce the time required for generating the video, the initial image may be fused with each frame image to be fused by a designated algorithm, so as to obtain a plurality of images to be processed.
The specified algorithm may be a generative graphics algorithm. The specified algorithm may be executed, for example, by a Stable Diffusion model running in the electronic device. Intelligent optimization through the generation capability of Stable Diffusion makes the process of generating the target video more efficient and accurate.
Optionally, an image to be processed with a better effect can be obtained by invoking an inpainting algorithm in combination with the graphics algorithm; the detailed description can refer to the following embodiments. Inpainting is an image restoration technique mainly used to restore missing or damaged parts of an image. By analyzing information from surrounding regions and using the undamaged information in the image, an inpainting algorithm can restore damaged pictures and remove or reduce noise in the image.
It can be understood that each image to be processed includes the target object and a dynamic object, the dynamic object being the object to be fused in the corresponding frame image to be fused. That is, the target object in each image to be processed is the same, namely the target object in the initial image, while the dynamic object differs between images to be processed: the dynamic object in each image to be processed corresponds to the object to be fused in the frame image to be fused from which that image to be processed was generated.
Therefore, in the embodiment of the application, the initial image is fused with each frame image to be fused by directly calling the designated algorithm to obtain a plurality of images to be processed, so that the initial image is prevented from being processed on each frame image to be fused by manpower in sequence, the requirement on manpower is reduced, the efficiency of acquiring the plurality of images to be processed is improved, and the efficiency of generating the video is improved as a whole. Meanwhile, the error generated by manpower can be reduced, and the stability of the video acquisition method is further improved.
Step S140: and determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image.
After the plurality of images to be processed are acquired, because the area of each image to be processed other than the target area of the target object has changed compared with the initial image, re-lighting can be performed on the target object based on the color parameters of the area of the image to be processed other than the target area of the target object, so that the light and shadow on the target object are more lifelike.
The first color parameter corresponding to each image to be processed may be obtained, where the first color parameter may be a color parameter corresponding to an area of each image to be processed except for a target area of a target object. Specifically, the first color parameter corresponding to each image to be processed may be determined based on the mask image. In some embodiments, the mask image may be used to determine an area outside the target area corresponding to the target object in the image to be processed, and then determine the first color parameter, which may be described in detail in the following examples.
Further, a second color parameter may be determined based on the initial image, where the second color parameter is a color parameter corresponding to the initial image.
The color parameters, namely the first color parameter and the second color parameter, can be determined from the parameters of the red, green and blue (RGB) color channels. Specific determination methods can be found in the following embodiments.
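The exact computation is left to the later embodiments; as a placeholder, one plausible reading is the per-channel mean over the relevant region, sketched below (the averaging choice and the variable names are assumptions).

```python
import numpy as np

def mean_rgb(image, mask=None):
    """Mean R, G, B values of an HxWx3 image, optionally restricted by a 0/1 mask."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    if mask is not None:
        pixels = pixels[mask.reshape(-1) > 0]
    return pixels.mean(axis=0)

# second_color = mean_rgb(initial_image)                        # second color parameter
# first_color  = mean_rgb(image_to_process, mask=1 - mask_img)  # area outside the target region
```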
Step S150: and re-polishing each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image.
Further, after the first color parameter corresponding to each image to be processed and the second color parameter of the initial image are obtained, the image to be processed can be subjected to re-lighting, and then the image with more vivid illumination and shadow is obtained. The adjustment of the re-lighting can eliminate the problem of uncoordinated light shadow, so that the generated target image is more consistent with the light shadow of the initial image, the immersion and realism of the audience are enhanced, and the picture of the target image is more coordinated and natural.
Specifically, the target area of the target object in the initial image can be extracted through the mask image, and re-lighting is then performed on the extracted target area of the target object in the initial image. The re-lighting may be performed based on a specified color parameter, which may be determined from the second color parameter and the first color parameter corresponding to each image to be processed; the detailed description will refer to the following embodiments.
As for the area of the image to be processed other than the target area corresponding to the target object, it is kept as determined in the image to be processed, and re-lighting is not performed on it.
Furthermore, the re-lit target area of the target object from the initial image can be fused with the area, other than the target area corresponding to the target object, of the image to be processed on which re-lighting has not been performed, so as to obtain a target image; the detailed description can refer to the subsequent embodiments.
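As a purely illustrative sketch of this compositing, the snippet below scales the target region of the initial image by the ratio of the two color parameters and pastes it over the untouched background of the image to be processed; the gain rule is only one possible reading of the "specified color parameter", which the patent defines in later embodiments.

```python
import numpy as np

def relight_composite(initial, processed, mask, first_color, second_color):
    """Re-light the target region of the initial image and composite it onto the
    background of the image to be processed (all arrays HxWx3 except the HxW mask)."""
    # Assumed gain: new background color over original image color, per channel.
    gain = np.asarray(first_color) / np.asarray(second_color)
    relit = np.clip(initial.astype(np.float64) * gain, 0, 255).astype(np.uint8)
    m = mask[..., None].astype(bool)       # HxW -> HxWx1 for broadcasting
    return np.where(m, relit, processed)   # target region relit, background reused
```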
It can be understood that each frame image to be fused can correspondingly generate one image to be processed, and further, one target image is determined to be obtained, so that if N frame images to be fused exist, N images to be processed and N target images can exist.
Referring to fig. 6, fig. 6 is a schematic diagram of a target image according to an embodiment of the present application. Fig. 6 illustrates a target image 600 including a target object 601 and a background 602, where an area corresponding to the background 602 is an area other than an area where the target object 601 is located. Also included in the background 602 is a dynamic object 603.
Step S160: and generating a target video based on the target image.
After the target images are acquired, a target video may be generated based on the target images. The plurality of target images can be regarded as frame images, and the frame images are then combined to generate the target video. In some implementations, image and video editing software may be invoked to generate the target video based on the target images. The generated target video can include the target object and the dynamic object; in the target video, the dynamic object moves, while the content other than the dynamic object remains stationary.
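For instance, the target images can be written out as a video with OpenCV as sketched below; the frame rate, codec and output file name are placeholders.

```python
import cv2

def write_video(target_images, path="target_video.mp4", fps=30):
    """Merge a list of equally sized BGR target images into a video file."""
    height, width = target_images[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(path, fourcc, fps, (width, height))
    for frame in target_images:
        writer.write(frame)
    writer.release()
```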
The video generation method provided by the embodiment of the application comprises the steps of firstly obtaining a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; then obtaining a mask image of the target object in the initial image; fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image; performing re-lighting on each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image; and finally, generating a target video based on the target image. Firstly, fusing the initial image with each frame image to be fused respectively through a designated algorithm to obtain a plurality of images to be processed, so that editing through manpower is avoided, labor cost is saved, and efficiency of acquiring the images to be processed is improved. In addition, the obtained image is subjected to re-lighting through the color parameters, so that a target video with more vivid illumination and shadow can be obtained.
Referring to fig. 7, fig. 7 shows a method flowchart of a video generating method according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S210 to S2120.
Step S210: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
The step S210 is described in detail in the foregoing embodiments, and is not described herein.
Step S220: and determining an initial mask image of the target object from the initial image based on a pre-acquired main body matting model.
Step S230: and adjusting the resolution of the initial mask image to a specified resolution, so as to obtain the mask image, wherein the specified resolution is the resolution of a potential space corresponding to the initial image.
First, an initial mask image of the target object may be determined from the initial image; specifically, the initial mask image of the target object may be determined from the initial image through a pre-acquired subject matting model. The subject matting model may be obtained in advance by adjusting or training a pre-trained model, and can identify the subject person in the input image, thereby obtaining a mask image of the target object in the initial image.
For example, the initial image may be input into the subject matting model, and the image output by the subject matting model may be taken as the initial mask image.
In some implementations, the initial image may be encoded to obtain a latent feature image of the initial image in a latent space. That is, converting the initial image into the latent space and processing it there can save computing resources.
Therefore, optionally, the image output by the subject matting model can be used as the initial mask image, and the resolution of the initial mask image can then be adjusted so that it matches the resolution of the latent space. In this way the latent feature image and the mask image can be processed together in the latent space, saving computing resources.
Specifically, the resolution of the initial mask image may be adjusted to a specified resolution to obtain the mask image, where the specified resolution is the resolution of the latent space corresponding to the initial image. For example, the specified resolution may be 128×128; as another example, it may be 64×64.
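A minimal sketch of this resolution adjustment, assuming a 64×64 latent grid and nearest-neighbour interpolation (both of which are assumptions), is shown below.

```python
import cv2

def resize_mask_to_latent(initial_mask, latent_size=(64, 64)):
    """Downscale the 0/1 initial mask to the latent-space resolution."""
    return cv2.resize(initial_mask, latent_size, interpolation=cv2.INTER_NEAREST)
```

Nearest-neighbour interpolation keeps the mask strictly binary after resizing, which is why it is chosen here over bilinear interpolation.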
Step S240: and obtaining a to-be-fused line manuscript corresponding to each to-be-fused frame image.
In some embodiments, a to-be-fused line manuscript corresponding to each to-be-fused frame image may be obtained first. The to-be-fused line manuscript graph can be used for representing information such as lines, shapes and the like of to-be-fused frame images.
For example, the line manuscript to be fused corresponding to each frame image to be fused can be generated by a line manuscript extraction algorithm. Specifically, the line manuscript extraction algorithm can be the lineart detector (LineartDetector) trained to work with the control network (ControlNet). The lineart detector can be used to identify and locate target objects in an image, thereby supporting target tracking, recognition and similar functions in control tasks. It can classify and locate objects in images using a linear classifier (such as an SVM or logistic regression), classify the input image with the trained model, and output the type and position information of the objects, so as to determine the line manuscript to be fused corresponding to each frame image to be fused. The control network is a neural network architecture that can control a model and allow the model to support more input conditions. The original model accepts prompt words and an original image as input, while the control network provides various additional input conditions, including Canny edges, semantic segmentation maps, key points, graffiti and the like, which greatly improves the controllability of Artificial Intelligence Generated Content (AIGC).
As can be seen from the foregoing description, after the video to be fused is subjected to frame decomposition, a plurality of frame images to be fused are obtained. Therefore, the line manuscript to be fused corresponding to each frame image to be fused can be obtained through the line manuscript extraction algorithm.
Referring to fig. 8, fig. 8 is a schematic diagram of a to-be-fused line manuscript provided by an embodiment of the application. The to-be-fused document 800 shown in fig. 8 includes an object 801 to be fused. Referring to fig. 4 and fig. 8 together, it can be seen that the line manuscript 800 to be fused shown in fig. 8 is the line manuscript corresponding to the frame image 400 to be fused shown in fig. 4.
For some embodiments, the line manuscripts to be fused of the N frame images to be fused may be acquired sequentially; for example, for the frame images to be fused a_n, they may be acquired in order from n = 1 to N, or in order from N to 1, which is not particularly limited in the embodiments of the present application.
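A sketch of extracting a line manuscript with the LineartDetector shipped in the controlnet_aux package is given below; the "lllyasviel/Annotators" checkpoint is the one commonly used with that package and is an assumption rather than the application's own model.

```python
from controlnet_aux import LineartDetector
from PIL import Image

detector = LineartDetector.from_pretrained("lllyasviel/Annotators")

def to_line_manuscript(image_path):
    """Return the line manuscript of one frame image to be fused."""
    frame = Image.open(image_path).convert("RGB")
    return detector(frame)  # PIL image containing the extracted lines
```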
Step S250: and acquiring a depth image of the initial image and an initial line manuscript of the initial image.
A depth image of the initial image may also be acquired. Illustratively, the depth image of the initial image may be extracted by a deep learning model. For example, the deep learning model may be a monocular depth estimation algorithm, in particular one of the algorithms provided with diffusers. A monocular depth estimation algorithm predicts the depth of the scene from a single input image.
Further, an initial line manuscript of the initial image may also be obtained. The method of acquiring the initial line manuscript of the initial image is similar to the method of acquiring the line manuscript to be fused corresponding to a frame image to be fused in the preceding steps; the initial line manuscript of the initial image can likewise be obtained through the line manuscript extraction algorithm. Reference may be made to the foregoing steps for details, which are not repeated here.
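Monocular depth estimation can be sketched with the transformers depth-estimation pipeline; the Intel/dpt-large checkpoint below is only a stand-in for whatever depth model the application actually uses.

```python
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

def to_depth_image(image_path):
    """Predict a depth map for the initial image."""
    result = depth_estimator(Image.open(image_path).convert("RGB"))
    return result["depth"]  # PIL image holding the predicted depth map
```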
Referring to fig. 9, fig. 9 is a schematic diagram of an initial draft provided by an embodiment of the present application. The initial draft 900 in fig. 9 includes a target object 901 and a background 902. Wherein the background 902 may be an area other than the target area corresponding to the target object 901.
Step S260: and fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed.
Further, after the initial line manuscript and the plurality of line manuscripts to be fused are obtained, the initial line manuscripts can be fused with each line manuscripts to be fused respectively, so that a plurality of line manuscripts images to be processed can be obtained.
In some embodiments, a target position may be first determined in the initial line manuscript, where the target position is used to insert the line manuscripts to be fused, so as to implement fusion of the initial line manuscripts with each line manuscripts to be fused respectively.
A plane coordinate system can be established on the initial line manuscript; for example, one vertex of the initial line manuscript is used as the coordinate origin, and the two sides of the initial line manuscript connected to the origin are used as the x axis and the y axis respectively. The target position can thus be characterized by coordinates (x, y). Furthermore, a designated position of the line manuscript to be fused can be aligned with the target position, so that the initial line manuscript and the line manuscript to be fused are fused. For example, the designated position may be set as the top-left vertex of the line manuscript to be fused, so that the top-left vertex of the line manuscript to be fused is placed at the target position.
For example, the target location may be manually selected in the initial draft. For example, a point in the initial draft may be manually clicked, and the clicked point may be used as the target position. Still another exemplary, an arbitrary position other than the target region of the target object in the initial draft may be set as the target position.
If the object to be fused in the line manuscript to be fused is a fish, an area corresponding to water can be found in an area except for a target area of the target object in the initial line manuscript, so that a point is determined as a target position in the area of water. In another exemplary embodiment, if the object to be fused in the line manuscript to be fused is a bird, an area corresponding to the sky may be found in an area other than the target area of the target object in the initial line manuscript, so as to determine a point in the area of the sky as the target position.
Further, the initial line manuscript is fused with each line manuscript to be fused respectively, and after the line manuscript to be fused is inserted into the target position, pixel-by-pixel addition is performed on the line manuscript to be fused and the initial line manuscript, so that a plurality of line manuscripts to be processed are obtained. It should be noted that, each line manuscript to be fused is fused with the initial line manuscript at the same target position.
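The pixel-by-pixel fusion at a fixed target position can be sketched as follows; treating the line manuscripts as greyscale arrays and clipping the sum to the 8-bit range are assumptions about how the addition is realised.

```python
import numpy as np

def fuse_line_manuscripts(initial_lines, fuse_lines, target_xy):
    """Paste the line manuscript to be fused at target_xy (its top-left corner)
    and add it pixel by pixel onto the initial line manuscript.
    Assumes the pasted patch fits entirely inside the initial line manuscript."""
    fused = initial_lines.astype(np.int32)
    x, y = target_xy
    h, w = fuse_lines.shape[:2]
    fused[y:y + h, x:x + w] += fuse_lines.astype(np.int32)  # pixel-by-pixel addition
    return np.clip(fused, 0, 255).astype(np.uint8)
```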
For example, referring to fig. 10, fig. 10 shows a schematic diagram of a document image to be processed according to an embodiment of the present application. The to-be-processed document image 1000 shown in fig. 10 includes a target object 1010, an active object 1020, and a background 1030.
Step S270: and based on the self-encoder, encoding the initial image, and acquiring a potential characteristic image of the initial image in a potential space.
As can be seen from the foregoing description, the initial image may be converted into the potential space to obtain the potential feature image, so that the subsequent processing is performed in the potential space to save the computing resource.
For some embodiments, the initial image may be encoded by a self-encoder (Variational Autoencoder, VAE) to obtain a potential feature image of the initial image in a potential space. Wherein the self-encoder includes an encoder (Encoder) and a Decoder (Decoder). Thus, an initial image may be encoded by an encoder in the self-encoder to obtain a potential feature image of the initial image in a potential space. Wherein the self-encoder learns to generate a model of high-dimensional data with latent variable representations by combining an automatic encoder and variation inference.
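Using the diffusers implementation of the Stable Diffusion VAE as a stand-in (the checkpoint name is an assumption), encoding into the latent space can be sketched as follows.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

def encode_to_latent(image_tensor):
    """Encode a 1x3xHxW image tensor scaled to [-1, 1] into the latent space."""
    with torch.no_grad():
        latent = vae.encode(image_tensor).latent_dist.sample()
    return latent * vae.config.scaling_factor  # e.g. a 512x512 image -> 4x64x64 latent
```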
Step S280: and adding Gaussian distribution noise to the potential characteristic image to obtain the noise image.
Further, noise may be added to the latent feature image, resulting in the noisy image. For some embodiments, a gaussian distribution noise may be added to the latent feature image, resulting in a noisy image. For example, the gaussian distribution noise may be added to the feature image a plurality of times. Specifically, the noise image may be added with T times of gaussian distribution noise, where the forward step may be a markov chain, so that each step of adding gaussian distribution noise is only related to the last time, and a picture is changed into pure gaussian noise, so as to obtain the noise image.
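The repeated addition of Gaussian noise is the forward process of a diffusion model; with the diffusers DDPMScheduler (used here purely as an example) it can be written in closed form for any step t.

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def add_noise(latent, t):
    """Jump directly to step t of the Markov forward process."""
    noise = torch.randn_like(latent)
    timestep = torch.tensor([t])
    return scheduler.add_noise(latent, noise, timestep)

# noise_image = add_noise(latent_feature_image, t=999)  # close to pure Gaussian noise
```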
Step S290: and obtaining a to-be-processed image corresponding to each to-be-processed line manuscript image based on a specified algorithm through the depth image, the to-be-processed line manuscript image and the noise image, wherein the noise image is an image obtained by adding noise to the initial image.
And further, the depth image, the line manuscript image to be processed and the noise image can be combined and processed through a specified algorithm to obtain the image to be processed corresponding to the line manuscript image to be processed. And the noise image is the image obtained by adding noise to the initial image in the previous step. Specifically, an initial image may be first converted into a latent feature image located in a latent space, and then gaussian distribution noise is added to the latent feature image to obtain the noise image.
It will be appreciated that since the noise image is an image located in the latent space, the image obtained by the specified algorithm is also an image located in the latent space. Thus, the image obtained by the specified algorithm needs to be further decoded.
Specifically, step S290 may further include step S291 and step S292.
Step S291: and obtaining an image to be decoded corresponding to each line manuscript image to be processed through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm.
Step S292: and decoding each image to be decoded based on the self-encoder to obtain a corresponding image to be processed of each document image to be processed.
Therefore, the image to be decoded corresponding to each line manuscript image to be processed can be obtained through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm. However, the obtained to-be-decoded image corresponding to each to-be-processed line manuscript image is an image located in a potential space, so that each to-be-decoded image can be decoded by a decoder of a self-encoder to obtain the to-be-processed image corresponding to each to-be-processed line manuscript image.
In particular, each of the images to be decoded may be decoded by a decoder in the self-encoder.
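Decoding mirrors the encoding sketch above; again the diffusers VAE loaded earlier is only a stand-in.

```python
def decode_from_latent(latent):
    """Decode a latent image back to pixel space with the decoder of the VAE
    loaded in the encoding sketch above."""
    with torch.no_grad():
        image = vae.decode(latent / vae.config.scaling_factor).sample
    return image  # 1x3xHxW tensor in [-1, 1]
```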
Because the plurality of images to be processed are determined in the specified algorithm by combining the depth image, the line manuscript images to be processed and the noise image, and by drawing on the semantics and contextual understanding of the initial image, the generated images to be processed are more consistent with the theme and semantics of the initial image. This improves the overall quality and appeal of the subsequent target video, so that the target video and the initial image share a consistent visual style and theme.
The detailed description of the to-be-processed image corresponding to each to-be-processed line manuscript image is obtained through the depth image, the to-be-processed line manuscript image and the noise image based on a specified algorithm, and can refer to the following embodiments.
Step S2100: and determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image.
Step S2110: and re-polishing each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image.
Step S2120: and generating a target video based on the target image.
For a detailed description of step S2100 to step S2120, reference may be made to the foregoing embodiments, and details are not repeated here.
The video generation method provided by this embodiment of the application converts the initial image into the latent space and processes it there, which saves computing resources. Moreover, the plurality of images to be processed are determined in the specified algorithm by combining the depth image, the line manuscript images to be processed and the noise image, so that, by drawing on the semantics and contextual understanding of the initial image, the generated images to be processed are more consistent with the theme and semantics of the initial image. This improves the overall quality and appeal of the subsequent target video, so that the target video and the initial image share a consistent visual style and theme. That is, in this embodiment of the application, the target video is acquired through AIGC, so that the intent of the initial image can be better understood and expressed, and the visual effect and overall quality of the target video are improved.
Referring to fig. 11, fig. 11 is a flowchart illustrating a method for generating video according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generating method may include steps S310 to S3120.
Step S310: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
Step S320: and acquiring a mask image of the target object in the initial image.
Step S330: and obtaining a to-be-fused line manuscript corresponding to each to-be-fused frame image.
Step S340: and acquiring a depth image of the initial image and an initial line manuscript of the initial image.
Step S350: and fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed.
The steps S310 to S350 are described in detail in the foregoing embodiments, and are not repeated here.
Step S360: depth feature information of the depth image is extracted based on a depth model.
Step S370: and extracting the line manuscript characteristic information of the line manuscript image to be processed based on the line manuscript model.
In some embodiments, when the to-be-processed image corresponding to each to-be-processed line-manuscript image is obtained through the depth image, the to-be-processed line-manuscript image and the noise image based on a specified algorithm, depth feature information of the depth image and line-manuscript feature information of the to-be-processed line-manuscript image may be acquired first. That is, the depth characteristic information extracted from the depth image and the image to be processed corresponding to the line manuscript image to be processed may be included in the input to the specified algorithm. By means of the appointed algorithm, depth characteristic information and line manuscript characteristic information can be considered, and further characteristics, semantics and context information of an initial image are considered, so that a target video generated later can form a more coordinated and unified visual effect.
Specifically, depth feature information of the depth image can be extracted through a depth model (Controlnet-depth); and extracting the line manuscript characteristic information of the line manuscript image to be processed through a line manuscript model (Controlnet-lineart). The depth image can be used as a Condition (Condition) to be input into the depth model, so that depth characteristic information output by the depth model is obtained; the line manuscript image to be processed can be input into the line manuscript model as a condition, so that the line manuscript characteristic information output by the line manuscript model is obtained.
The line manuscript feature information can be used to represent the outline and edge characteristics of the input line manuscript to be processed.
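In the diffusers ecosystem, the two conditions can be attached through two ControlNet branches as sketched below; the checkpoints and the pipeline class are illustrative of the idea, not the application's actual models.

```python
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet_depth = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth")
controlnet_lineart = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_lineart")

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_depth, controlnet_lineart],  # depth and lineart branches
)

# Each call is conditioned on the depth image and one line manuscript image to be processed:
# result = pipe(prompt="", image=initial_image,
#               control_image=[depth_image, line_manuscript_to_process]).images[0]
```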
Step S380: and adding Gaussian noise to the target object in the noise image to obtain an intermediate image.
For some embodiments, noise may be further added to the target object in the noisy image, such as adding gaussian noise, to obtain an intermediate image. The noise image may be an image obtained by adding noise to the initial image. For a detailed description of acquiring the noise image, reference may be made to the description of the foregoing embodiments, which is not repeated here.
Specifically, step S380 may include steps S381 to S384.
Step S381: and determining a target area corresponding to a target object in the noise image based on the mask image as a first image area.
Step S382: and determining a region except the first image region in the noise image based on the mask image as a second image region.
Step S383: and adding Gaussian noise to the first image area to obtain a noise image area.
Step S384: and fusing the noise image area with the second image area to obtain the intermediate image.
It will be appreciated that, in order to add gaussian noise to a target object in a noisy image, the region corresponding to the target object in the noisy image may be first determined. Specifically, a target area corresponding to a target object in the noise image may be determined as the first image area by the mask image.
Further, an area other than the first image area in the noise image may also be determined. Similarly, an area other than the first image area in the noise image can also be determined by the mask image as the second image area.
In order to achieve an increase in gaussian noise to the target object in the noisy image, the first image region may be increased with gaussian noise, while the second region is not increased with gaussian noise. In some embodiments, gaussian noise may be added to the first image region to obtain a noisy image region. And then fusing the noise image area with the second image area to obtain the intermediate image. That is, the second image region included in the intermediate image is a region to which gaussian noise is not added again.
Illustratively, steps S381 through S384 may be characterized by the following equation (1):
E=Di*C+E*(1-C) (1)
Here C characterizes the mask image, which may be, for example, an image of 128 x 128 pixels, and Di characterizes the noise image. Di*C corresponds to adding Gaussian noise to the first image area, that is, to the target object in the noise image, so as to obtain the noise image area; (1-C) corresponds to the mask of the background, that is, the area of the mask image other than the target area of the target object; and E*(1-C) corresponds to not adding Gaussian noise to the second image area, that is, the area of the noise image other than the target area of the target object, which is directly reused. In this way the noise image area and the second image area are fused to obtain the intermediate image.
It should be noted that the output of equation (1) is the noise image after noise has been added again, that is, E, and this E can be used as the input the next time equation (1) is executed, thereby iterating on the noise image. E may be iterated multiple times through equation (1) so that Gaussian noise is added to the first image area multiple times. For example, in some embodiments the iteration may be performed i times, that is, E is iterated i times and Gaussian noise is added to the first image area i times, and the output E after the i-th iteration is the intermediate image.
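Equation (1) can be written directly on the latent tensors, as in the sketch below; it assumes the mask has already been broadcast to the latent shape, and the fixed noise scale sigma stands in for whatever noise schedule the application actually uses.

```python
import torch

def renoise_target_region(noise_image, mask, steps, sigma=1.0):
    """Iterate equation (1): E = D_i * C + E * (1 - C).
    Gaussian noise is re-added only inside the target region (C = 1);
    the background (C = 0) of the noise image is reused unchanged."""
    e = noise_image
    for _ in range(steps):
        d_i = e + sigma * torch.randn_like(e)  # D_i: E with fresh Gaussian noise added
        e = d_i * mask + e * (1 - mask)
    return e  # the intermediate image
```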
Step S390: and carrying out noise reduction processing on the intermediate image through the depth characteristic information and the line manuscript characteristic information based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
Further, after the depth feature information and the line manuscript feature information are obtained through the foregoing steps, the intermediate image may be subjected to noise reduction processing through a specified algorithm, so as to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
It should be noted that adding noise to an image and then performing noise reduction on the noised image is essentially an application of the inpainting algorithm. An inpainting algorithm is mainly used to repair missing or damaged portions of an image: by analyzing information from surrounding regions and using the undamaged information in the image, it can restore damaged pictures and remove or reduce noise. Therefore, by first adding noise to an image and then invoking the inpainting algorithm to perform noise reduction on the noised image, the information of the image before the noise was added can be recovered.
In addition, if an image is noised and then denoised without being guided by any other condition, details of the restored image may still be lost during encoding and decoding by the self-encoder, for example during up- and down-sampling, even if the image can be restored. Therefore, during the noise reduction of the intermediate image, the depth feature information and the line manuscript feature information are used to guide the specified algorithm. This improves the quality of the to-be-processed image obtained for each to-be-processed line manuscript image, and in particular avoids unnatural flaws (artifacts) at the boundary between the target region corresponding to the target object and the rest of the resulting to-be-processed image.
Optionally, in some embodiments, step S390 may further include step S391 and step S392.
Step S391: and obtaining an effect prompt word, wherein the effect prompt word is used for adjusting the light and shadow effect of the intermediate image.
Optionally, in some embodiments, besides guiding the specified algorithm to operate on the intermediate image through the depth feature information and the line manuscript feature information, the intermediate image may be further adjusted in combination with the effect prompt word.
The effect prompt word can be used for adjusting the light and shadow effect of the intermediate image. For example, some preset effect prompt words may be stored in advance, so that the user may directly select a desired effect prompt word from among the preset effect prompt words. For another example, the user may directly input a desired effect prompt.
By way of example, the effect cue words may be "gold light", "DARK NIGHT", and the like.
Step S392: and performing target operation on the intermediate image through the depth characteristic information, the line manuscript characteristic information and the effect prompt word based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image, wherein the target operation comprises noise reduction processing and light and shadow effect adjustment.
After the effect prompt word is obtained, the target operation may be performed on the intermediate image based on the specified algorithm, using the depth feature information, the line manuscript feature information and the effect prompt word, to obtain the to-be-processed image corresponding to each to-be-processed line manuscript image. The target operation includes noise reduction processing and light and shadow effect adjustment. The description of how the depth feature information and the line manuscript feature information guide the specified algorithm to denoise the intermediate image, and of how the to-be-processed image corresponding to each to-be-processed line manuscript image is obtained from the depth image, the to-be-processed line manuscript image and the noise image based on the specified algorithm, has been given above and is not repeated here.
Furthermore, the effect prompt word can be used for guiding a specified algorithm to adjust the light and shadow effect of the intermediate image, so that more realistic illumination and shadow effect can be obtained.
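A hedged sketch of the guided target operation of step S392 is given below. The denoiser here is a hypothetical PyTorch module standing in for the specified algorithm; its conditioning interface (depth features, line manuscript features, prompt embedding) and the simplified update rule are assumptions for illustration only, and a real sampler would follow its own noise schedule.

```python
import torch

@torch.no_grad()
def guided_denoise(denoiser, intermediate, depth_feat, line_feat, prompt_emb, steps=10):
    """Iteratively denoise the intermediate image while depth/line-manuscript features
    and the effect-prompt embedding steer the (hypothetical) specified algorithm."""
    x = intermediate
    for t in reversed(range(steps)):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        noise_pred = denoiser(x, t_batch,
                              depth=depth_feat,   # depth feature guidance
                              line=line_feat,     # line manuscript feature guidance
                              prompt=prompt_emb)  # light/shadow effect prompt
        x = x - noise_pred / steps                # simplified update; real samplers use a schedule
    return x                                      # latent result, later decoded by the self-encoder
```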
Step S3100: and determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image.
Step S3110: and re-polishing each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image.
Step S3120: and generating a target video based on the target image.
Step S3100 is described in detail in the foregoing embodiments, and is not described herein.
According to the video generation method provided by the embodiment of the application, the specified algorithm can take the depth feature information and the line manuscript feature information into account, so that the features, semantics and context information of the initial image are considered and the subsequently generated target video forms a more coordinated and unified visual effect. Moreover, unnatural flaws at the boundary between the target region corresponding to the target object and the rest of the obtained to-be-processed image can be avoided. In addition, in the embodiment of the application, the effect prompt word can guide the specified algorithm to adjust the light and shadow effect of the intermediate image, so that more realistic illumination and shadow effects are obtained.
It should be noted that the dynamic-effect object in the generated target video is determined by the dynamic-effect object in the video to be fused; for example, if the dynamic-effect object in the video to be fused is a dragon, the dynamic-effect object in the generated target video is also a dragon. However, the illumination, shadow, display transparency and the like of the dynamic-effect object in the target video can be flexibly adjusted by the video generation method provided by the application, so that the generated target video has a more coordinated and unified visual effect.
Referring to fig. 12, fig. 12 is a flowchart illustrating a method for generating video according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S410 to S4110.
Step S410: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
Step S420: and acquiring a mask image of the target object in the initial image.
Step S430: and respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed.
The steps S410 to S430 are described in detail in the foregoing embodiments, and are not repeated here.
Step S440: and determining an area outside a target area corresponding to the target object in the image to be processed based on the mask image, and taking the area as a third image area.
As can be seen from the description of the foregoing embodiments, the area of each image to be processed other than the target area of the target object has changed compared with the initial image, so the target object can be re-polished based on the color parameters of that area, making the light and shadow of the target object more realistic.
Therefore, the area other than the target area corresponding to the target object in the image to be processed can first be determined. Specifically, this area can be determined based on the mask image and taken as the third image area. For example, the first color parameter may be determined based on the third image area, and the second color parameter may be determined based on the initial image, so that the target area corresponding to the target object can be re-polished using the first color parameter and the second color parameter. For a detailed description, reference is made to the following steps.
Step S450: and acquiring a first red parameter corresponding to the red channel, a first green parameter corresponding to the green channel and a first blue parameter corresponding to the blue channel in each third image area.
It will be appreciated that the region of the image may comprise a plurality of pixels, each pixel being defined by three colour channels, red Green Blue (RGB), each colour channel having a different value, so that the pixels may exhibit different colours.
Thus, in some embodiments, the method for determining the first color parameter corresponding to the third image area may specifically be to first obtain the first red parameter corresponding to the red channel, the first green parameter corresponding to the green channel, and the first blue parameter corresponding to the blue channel in the third image area. The first red parameter may represent a color average of the red channel in the third image area, that is, an average of values of the red channel of each pixel point in the third image area. Similarly, the first green parameter may characterize the color average of the green channel in the third image area, i.e. the average of the values of the green channel for each pixel point in the third image area. The first blue parameter may characterize the color average of the blue channel in the third image area, i.e. the average of the values of the blue channel for each pixel point in the third image area.
It should be noted that, since the number of images to be processed is plural, the first red parameter, the first green parameter, and the first blue parameter determined based on each image to be processed may be slightly different. Specifically, color analysis may be performed on the image to be processed, so as to obtain a first red parameter corresponding to the red channel, a first green parameter corresponding to the green channel, and a first blue parameter corresponding to the blue channel in each third image area.
Step S460: and determining a first color parameter corresponding to each image to be processed based on the first red parameter, the first green parameter and the first blue parameter.
After the first red parameter, the first green parameter and the first blue parameter are acquired, the first color parameter corresponding to each image to be processed may be determined based on them. In some embodiments, the average value of the first red parameter, the first green parameter and the first blue parameter may be used as the first color parameter corresponding to each image to be processed. Illustratively, the first color parameter may be characterized by F_mean.
Step S470: and acquiring a second red parameter corresponding to the red channel, a second green parameter corresponding to the green channel and a second blue parameter corresponding to the blue channel in the initial image.
The method for determining the second color parameter corresponding to the initial image may specifically be that the second red parameter corresponding to the red channel, the second green parameter corresponding to the green channel, and the second blue parameter corresponding to the blue channel in the initial image are obtained first. The second red parameter may represent a color average of a red channel in the initial image, that is, an average of values of the red channel of each pixel point in the initial image. Similarly, the second green parameter may represent the color average of the green channel in the initial image, i.e., the average of the values of the green channel for each pixel point in the initial image. The second blue parameter may represent the color average of the blue channel in the initial image, i.e. the average of the values of the blue channel for each pixel point in the initial image.
It should be noted that, determining the second red parameter corresponding to the red channel, the second green parameter corresponding to the green channel, and the second blue parameter corresponding to the blue channel in the initial image may be determining the second red parameter, the second green parameter, and the second blue parameter corresponding to the entire image area in the initial image.
Step S480: the second color parameter is determined based on a second red parameter, a second green parameter, and a second blue parameter.
After the second red parameter, the second green parameter and the second blue parameter are acquired, the second color parameter corresponding to the initial image may be determined based on them. In some embodiments, the average of the second red parameter, the second green parameter and the second blue parameter may be used as the second color parameter corresponding to the initial image. Illustratively, the second color parameter may be characterized by A_mean.
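Steps S450 to S480 can be illustrated by the following sketch, which computes the per-channel means and then the first and second color parameters F_mean and A_mean. The helper name, the array layout (H x W x 3, RGB floats) and the use of a soft mask are assumptions made for the example.

```python
import numpy as np

def channel_means(image, mask=None):
    """Mean of the R, G, B channels, optionally restricted to pixels where mask > 0.5."""
    if mask is not None:
        pixels = image[mask[..., 0] > 0.5]        # (N, 3) pixels inside the chosen region
    else:
        pixels = image.reshape(-1, 3)
    r_mean, g_mean, b_mean = pixels.mean(axis=0)  # the red, green and blue parameters
    return (r_mean + g_mean + b_mean) / 3.0       # average of the three channel means

# F: an image to be processed, A: the initial image, C: the mask, all H x W (x 3 / x 1) floats
# F_mean = channel_means(F, mask=1.0 - C)         # first color parameter (third image area)
# A_mean = channel_means(A)                       # second color parameter (whole initial image)
```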
Step S490: and determining a target area corresponding to the target object in the initial image based on the mask image, and taking the target area as a fourth image area.
In order to re-light the target object in the image to be processed, note that the target object in each image to be processed is the target object in the initial image; therefore, the target area corresponding to the target object in the initial image can be determined.
Specifically, a target area corresponding to the target object in the initial image may be determined based on the mask image and used as the fourth image area.
Step S4100: and fusing a fourth image area which is re-polished based on a specified color parameter with a third image area to obtain a target image, wherein the specified color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area which is determined to be except for a target area corresponding to a target object in the image to be processed based on the mask image.
After the first color parameter and the second color parameter are acquired, the fourth image area and the third image area which are subjected to repeated lighting based on the designated color parameter can be fused to obtain the target image. The designated color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area except for a target area corresponding to a target object in the image to be processed determined based on the mask image.
By way of example, a specific method of obtaining the target image can be described by the following formula (2):
G=A*C*F_mean/A_mean+F*(1-C) (2)
Wherein G characterizes the target image; A characterizes the initial image; C characterizes the mask image; F characterizes the image to be processed; F*(1-C) corresponds to reusing the third image area directly, i.e. without re-polishing it; F_mean characterizes the first color parameter and A_mean characterizes the second color parameter; F_mean/A_mean characterizes the specified color parameter; A*C*F_mean/A_mean characterizes the re-polishing of the fourth image area. Thus A*C*F_mean/A_mean + F*(1-C) corresponds to fusing the fourth image area, re-polished based on the specified color parameter, with the third image area to obtain the target image G.
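Formula (2) translates directly into the following sketch. The epsilon guard against division by zero and the clipping to [0, 255] are additions for numerical safety and are not part of the formula itself.

```python
import numpy as np

def relight_and_fuse(A, F, C, F_mean, A_mean, eps=1e-6):
    """G = A * C * F_mean / A_mean + F * (1 - C), with clipping for safety."""
    gain = F_mean / (A_mean + eps)          # specified color parameter (quotient)
    fourth = A * C * gain                   # re-lit target region of the initial image
    third = F * (1.0 - C)                   # background of the processed frame, reused as is
    return np.clip(fourth + third, 0, 255)  # target image G
```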
Step S4110: and generating a target video based on the target image.
Step S4110 is described in detail in the foregoing embodiments, and is not repeated here.
According to the video generation method provided by the embodiment of the application, in the obtained target image the target object in the fourth image area has been re-lit based on the third image area. The illumination and shadow of the target area where the target object is located are therefore well matched to those of the surrounding area, and no unnatural flaws appear along the fused edge between the two areas.
Referring to fig. 13, fig. 13 is a flowchart illustrating a method for generating video according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S510 to S5250.
Step S510: starting.
Step S520: and obtaining the video to be fused.
Step S530: and carrying out frame decomposition on the video to be fused.
First, a video to be fused can be acquired; the video to be fused is a video containing the dynamic-effect object. Then the video to be fused is decomposed into frames; for example, if the video to be fused contains N frames, each frame image to be fused can be denoted in order by a_n, where n = 1, 2, 3, ..., N.
The specific manner of acquiring the video to be fused and the manner of performing frame decomposition can be referred to the description of the foregoing embodiments, and will not be repeated here.
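As one possible realization of steps S520 and S530, the frames can be extracted with OpenCV as sketched below; the file name is hypothetical and keeping every frame in memory is an assumption made for simplicity.

```python
import cv2

def decompose_frames(video_path):
    """Read every frame of the video to be fused, in order."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)      # one BGR frame image to be fused
    cap.release()
    return frames

frames_to_fuse = decompose_frames("video_to_fuse.mp4")  # hypothetical file name
```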
Step S540: and obtaining the manuscript graph to be fused.
Furthermore, the line manuscript diagrams to be fused corresponding to each frame image to be fused can be obtained, so that a plurality of line manuscripts are obtained.
Step S550: whether each to-be-fused line manuscript graph is traversed or not.
One to-be-fused line manuscript is processed at a time until every to-be-fused line manuscript has been traversed. If the traversal is not finished, the process jumps to step S560; if the traversal is finished, the process jumps to step S5240.
Step S560: an initial image is acquired.
Step S570: and acquiring a mask image.
Step S580: adjusted to a specified resolution.
An initial image may be acquired, and then a mask image of the initial image may be acquired by a subject matting algorithm, where the mask image may be used to characterize the target object in the initial image. The resolution of the mask image is then adjusted so that it matches the resolution of the latent space.
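Step S580 can be realized, for example, with a simple resize; the 128 x 128 latent resolution below is only an assumption consistent with the mask size mentioned for formula (1), and nearest-neighbor interpolation is chosen here to keep the mask binary.

```python
import cv2
import numpy as np

mask = np.zeros((1024, 1024), dtype=np.uint8)   # placeholder full-resolution mask image
mask[256:768, 256:768] = 255                    # toy target-object region
latent_h, latent_w = 128, 128                   # assumed latent-space resolution
mask_latent = cv2.resize(mask, (latent_w, latent_h), interpolation=cv2.INTER_NEAREST)
```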
Step S590: and (5) encoding by a self-encoder.
Step S5100: a latent feature image is acquired.
In some embodiments, the initial image may be encoded by a self-encoder to obtain a potential feature image located in a potential space.
Step S5110: increasing gaussian distributed noise.
Further, Gaussian-distributed noise can be added to the latent feature image to obtain the noise image. Specifically, Gaussian-distributed noise may be added T times; the forward process can be a Markov chain, so that each noise-adding step depends only on the previous step, and the picture gradually turns into pure Gaussian noise, yielding the noise image.
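The T-step forward process of step S5110 can be sketched in closed form, since chaining the per-step Gaussian transitions of the Markov chain yields x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear beta schedule and the value of T below are assumptions for illustration.

```python
import numpy as np

def forward_noise(x0, t, T=1000, rng=None):
    """Closed form of t forward Markov steps: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    rng = rng if rng is not None else np.random.default_rng()
    betas = np.linspace(1e-4, 2e-2, T)            # assumed linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)[t - 1]    # product of (1 - beta) over the first t steps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

latent = np.zeros((128, 128, 4), dtype=np.float32)   # latent feature image (assumed shape)
noise_image = forward_noise(latent, t=1000)           # after T steps it is close to pure noise
```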
Step S5120: a depth image is acquired.
The depth image of the initial image may be extracted by a deep learning model.
Step S5130: and obtaining an initial draft diagram.
The method for acquiring the initial line manuscript of the initial image is similar to the method for acquiring the line manuscript to be fused corresponding to the frame image to be fused in the previous steps, and the initial line manuscript of the initial image can be acquired through a line manuscript acquisition algorithm.
Step S5140: obtaining a to-be-processed manuscript graph.
In some embodiments, the initial line manuscript graph may be fused with each line manuscript graph to be fused, so as to obtain a plurality of line manuscript images to be processed. The specific fusion method may be referred to the description of the foregoing embodiments, and will not be repeated here.
Step S5150: and (5) a line manuscript model.
Step S5160: and (5) a depth model.
Depth feature information of the depth image can be extracted through a depth model; and extracting the line manuscript characteristic information of the line manuscript image to be processed through a line manuscript model.
Step S5170: and obtaining the effect prompt words.
Besides guiding a specified algorithm to operate on the intermediate image through the depth characteristic information and the line manuscript characteristic information, the intermediate image can be further adjusted by combining effect prompt words. The effect prompt word can be used for adjusting the light and shadow effect of the intermediate image. The specific acquisition manner may be referred to the description in the foregoing embodiments, and will not be repeated here.
Step S5180: noise reduction and light and shadow effect adjustment.
A specified algorithm is called to denoise the noise image and adjust the light and shadow effect. The specified algorithm can adjust the light and shadow effect under the guidance of the effect prompt word, and reduce noise under the guidance of the depth feature information and the line manuscript feature information.
Step S5190: and (3) whether the step i of noise reduction is finished.
Step S5200: and iterating the noise image to obtain an intermediate image.
The noise image is denoised by the specified algorithm, which may iterate for i steps. That is, the noise image may first be iterated to obtain an intermediate image. Denoising is then carried out, and whether the i-step denoising is complete is judged; if not, the process jumps to step S5200 and continues; if the i-step noise reduction has been completed, the process jumps to step S5210.
Alternatively, the adjustment of the shadow effect may be performed after each iteration to obtain the intermediate image.
Step S5210: decoding from the encoder to obtain the image to be processed.
And decoding the image obtained by the specified algorithm. Specifically, the image obtained by the specified algorithm can be decoded by the self-encoder to obtain the image to be processed.
Step S5220: the first color parameter and the second color parameter are acquired.
Step S5230: and re-polishing to obtain a target image.
The first color parameter is used for representing color parameters corresponding to areas except for a target area of a target object in the image to be processed. The second color parameter is used for representing the color parameter corresponding to the initial image. The specified color parameters may be determined based on the first color parameters and the second color parameters such that the target object in the image to be processed is re-illuminated by the specified color parameters. The detailed description will refer to the foregoing embodiments, and will not be repeated here.
A target image corresponding to the to-be-fused line manuscript currently being processed is thus obtained, and the process returns to step S550 for judgment.
Step S5240: a target video is generated.
In the case where it is determined that each of the line patterns to be fused has been traversed, a target video may be generated based on a plurality of target images at this time. The description of the specific generation of the target video may refer to the description of the foregoing embodiments, which is not repeated herein.
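As one possible way to assemble the target video of step S5240, the target images can be written out with OpenCV as sketched below; the codec, frame rate and output file name are assumptions.

```python
import cv2
import numpy as np

def write_video(target_images, out_path="target_video.mp4", fps=30):
    """Encode the per-frame target images into the target video."""
    h, w = target_images[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")          # assumed codec
    writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
    for img in target_images:
        writer.write(img.astype(np.uint8))            # frames must be uint8 BGR, same size
    writer.release()

# write_video(target_images)  # target_images: list of H x W x 3 arrays from the re-lighting step
```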
Step S5250: and (5) ending.
Referring to fig. 14, fig. 14 shows a block diagram of a video generating apparatus according to an embodiment of the present application, where the video generating apparatus 1400 includes: a first acquisition unit 1410, a second acquisition unit 1420, a fusion unit 1430, a color parameter determination unit 1440, a re-lighting unit 1450, and a video generation unit 1460.
The first obtaining unit 1410 is configured to obtain a video to be fused and an initial image, and perform frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
A second acquisition unit 1420, configured to acquire a mask image of the target object in the initial image.
Optionally, the second obtaining unit 1420 may be further configured to determine an initial mask image of the target object from the initial image based on a pre-obtained subject matting model; and adjusting the resolution of the initial mask image to a specified resolution, so as to obtain the mask image, wherein the specified resolution is the resolution of a potential space corresponding to the initial image.
And a fusion unit 1430, configured to fuse the initial image with each frame image to be fused based on a specified algorithm, so as to obtain a plurality of images to be processed.
Optionally, the fusion unit 1430 may be further configured to obtain a to-be-fused line manuscript corresponding to each to-be-fused frame image; acquiring a depth image of the initial image and an initial line manuscript of the initial image; fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed; and obtaining a to-be-processed image corresponding to each to-be-processed line manuscript image based on a specified algorithm through the depth image, the to-be-processed line manuscript image and the noise image, wherein the noise image is an image obtained by adding noise to the initial image.
Optionally, the fusion unit 1430 may be further configured to encode the initial image based on a self-encoder to obtain a latent feature image of the initial image in a latent space; and adding Gaussian distribution noise to the potential characteristic image to obtain the noise image.
Optionally, the fusion unit 1430 may be further configured to obtain an image to be decoded corresponding to each line manuscript image to be processed through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm; and decode each image to be decoded based on the self-encoder to obtain the image to be processed corresponding to each line manuscript image to be processed.
Optionally, the fusion unit 1430 may be further configured to extract depth feature information of the depth image based on a depth model; extracting line manuscript characteristic information of the line manuscript image to be processed based on a line manuscript model; adding Gaussian noise to a target object in the noise image to obtain an intermediate image; and carrying out noise reduction processing on the intermediate image through the depth characteristic information and the line manuscript characteristic information based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
Optionally, the fusion unit 1430 may be further configured to determine, based on the mask image, a target area corresponding to a target object in the noise image, as the first image area; determining a region of the noise image other than the first image region as a second image region based on the mask image; adding Gaussian noise to the first image area to obtain a noise image area; and fusing the noise image area with the second image area to obtain the intermediate image.
Optionally, the fusion unit 1430 may be further configured to obtain an effect prompt word, where the effect prompt word is used to adjust a light effect of the intermediate image; and performing target operation on the intermediate image through the depth characteristic information, the line manuscript characteristic information and the effect prompt word based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image, wherein the target operation comprises noise reduction processing and light and shadow effect adjustment.
The color parameter determining unit 1440 is configured to determine a first color parameter corresponding to each of the images to be processed based on the mask image, and determine a second color parameter based on the initial image.
Optionally, the color parameter determining unit 1440 may be further configured to determine, based on the mask image, an area other than the target area corresponding to the target object in the image to be processed, as a third image area; acquiring a first red parameter corresponding to a red channel, a first green parameter corresponding to a green channel and a first blue parameter corresponding to a blue channel in each third image area; determining a first color parameter corresponding to each image to be processed based on the first red parameter, the first green parameter and the first blue parameter; acquiring a second red parameter corresponding to a red channel, a second green parameter corresponding to a green channel and a second blue parameter corresponding to a blue channel in an initial image; the second color parameter is determined based on a second red parameter, a second green parameter, and a second blue parameter.
Optionally, the color parameter determining unit 1440 may be further configured to use an average value of the first red parameter, the first green parameter, and the first blue parameter as a first color parameter corresponding to each of the images to be processed; and taking the average value of the second red parameter, the second green parameter and the second blue parameter as the second color parameter.
And a re-lighting unit 1450, configured to re-light each of the images to be processed based on the second color parameter, the mask image, the initial image, and the first color parameter corresponding to each of the images to be processed, so as to obtain a target image.
Optionally, the re-lighting unit 1450 may be further configured to determine, based on the mask image, a target area corresponding to the target object in the initial image, as a fourth image area; and fusing a fourth image area which is re-polished based on a specified color parameter with a third image area to obtain a target image, wherein the specified color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area which is determined to be except for a target area corresponding to a target object in the image to be processed based on the mask image.
A video generating unit 1460 for generating a target video based on the target image.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In several embodiments provided by the present application, the coupling of the elements to each other may be electrical, mechanical, or other. In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Referring to fig. 15, fig. 15 shows a block diagram of an electronic device according to an embodiment of the present application. The electronic device 110 may be a smart phone, desktop computer, on-board computer, server, tablet computer, or the like. The electronic device 110 of the present application may include one or more of the following components: a processor 111, a memory 112 and one or more application programs, wherein the processor 111 is electrically connected to the memory 112, and the one or more application programs are configured to perform the methods described in the foregoing embodiments.
Processor 111 may include one or more processing cores. The processor 111 connects various parts of the overall electronic device 110 using various interfaces and lines, and performs the various functions of the electronic device 110 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 112 and invoking the data stored in the memory 112. Alternatively, the processor 111 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 111 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, computer programs and the like; the GPU is responsible for rendering and drawing display content; the modem is used to handle wireless communication. It will be appreciated that the modem may also not be integrated into the processor 111 and may be implemented by a communication chip alone. The methods described in the foregoing embodiments may specifically be executed by the one or more processors 111.
For some embodiments, memory 112 may include random access memory (Random Access Memory, RAM) or read-only memory (ROM). Memory 112 may be used to store instructions, programs, code sets, or instruction sets. The memory 112 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the various method embodiments described below, and the like. The storage data area may also store data created by the electronic device 110 in use, and the like.
Referring to fig. 16, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. Stored in the computer readable medium 1600 is program code that can be invoked by a processor to perform the methods described in the method embodiments above.
The computer readable storage medium 1600 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 1600 includes a non-transitory computer-readable storage medium. The computer readable storage medium 1600 has storage space for program code 1610 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 1610 may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. A video generation method, comprising:
Acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused;
Acquiring a mask image of a target object in the initial image;
fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed;
Determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image;
Performing re-lighting on each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image;
and generating a target video based on the target image.
2. The method according to claim 1, wherein the fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed includes:
acquiring a to-be-fused line manuscript corresponding to each to-be-fused frame image;
Acquiring a depth image of the initial image and an initial line manuscript of the initial image;
fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed;
and obtaining a to-be-processed image corresponding to each to-be-processed line manuscript image based on a specified algorithm through the depth image, the to-be-processed line manuscript image and the noise image, wherein the noise image is an image obtained by adding noise to the initial image.
3. The method according to claim 2, wherein before the obtaining the to-be-processed image corresponding to each to-be-processed line manuscript image based on the depth image, the to-be-processed line manuscript image, and the noise image according to the specified algorithm, the method further comprises:
encoding the initial image based on a self-encoder, and acquiring a potential characteristic image of the initial image in a potential space;
and adding Gaussian distribution noise to the potential characteristic image to obtain the noise image.
4. The method of claim 3, wherein the obtaining the to-be-processed image corresponding to each to-be-processed line manuscript image based on the specified algorithm through the depth image, the to-be-processed line manuscript image, and the noise image includes:
obtaining an image to be decoded corresponding to each line manuscript image to be processed through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm;
and decoding each image to be decoded based on the self-encoder to obtain the image to be processed corresponding to each line manuscript image to be processed.
5. The method according to claim 2, wherein the obtaining the to-be-processed image corresponding to each to-be-processed line manuscript image based on the specified algorithm through the depth image, the to-be-processed line manuscript image, and the noise image includes:
Extracting depth characteristic information of the depth image based on a depth model;
extracting line manuscript characteristic information of the line manuscript image to be processed based on a line manuscript model;
adding Gaussian noise to a target object in the noise image to obtain an intermediate image;
and carrying out noise reduction processing on the intermediate image through the depth characteristic information and the line manuscript characteristic information based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
6. The method of claim 5, wherein adding gaussian noise to the target object in the noisy image results in an intermediate image, comprising:
determining a target area corresponding to a target object in the noise image based on the mask image as a first image area;
Determining a region of the noise image other than the first image region as a second image region based on the mask image;
adding Gaussian noise to the first image area to obtain a noise image area;
And fusing the noise image area with the second image area to obtain the intermediate image.
7. The method according to claim 5, wherein the denoising the intermediate image based on the depth feature information and the draft feature information by using a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed draft image includes:
Obtaining an effect prompt word, wherein the effect prompt word is used for adjusting the light and shadow effect of the intermediate image;
And performing target operation on the intermediate image through the depth characteristic information, the line manuscript characteristic information and the effect prompt word based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image, wherein the target operation comprises noise reduction processing and light and shadow effect adjustment.
8. The method of claim 1, wherein the acquiring a mask image of the target object in the initial image comprises:
Determining an initial mask image of a target object from the initial image based on a pre-acquired main body matting model;
And adjusting the resolution of the initial mask image to a specified resolution, so as to obtain the mask image, wherein the specified resolution is the resolution of a potential space corresponding to the initial image.
9. The method of claim 1, wherein determining a first color parameter for each of the images to be processed based on the mask image and determining a second color parameter based on the initial image comprises:
determining an area outside a target area corresponding to a target object in the image to be processed based on the mask image, and taking the area as a third image area;
Acquiring a first red parameter corresponding to a red channel, a first green parameter corresponding to a green channel and a first blue parameter corresponding to a blue channel in each third image area;
determining a first color parameter corresponding to each image to be processed based on the first red parameter, the first green parameter and the first blue parameter;
Acquiring a second red parameter corresponding to a red channel, a second green parameter corresponding to a green channel and a second blue parameter corresponding to a blue channel in an initial image;
the second color parameter is determined based on a second red parameter, a second green parameter, and a second blue parameter.
10. The method of claim 9, wherein determining the first color parameter for each of the images to be processed based on the first red parameter, the first green parameter, and the first blue parameter comprises:
taking the average value of the first red parameter, the first green parameter and the first blue parameter as a first color parameter corresponding to each image to be processed;
The determining the second color parameter based on the second red parameter, the second green parameter, and the second blue parameter includes:
and taking the average value of the second red parameter, the second green parameter and the second blue parameter as the second color parameter.
11. The method according to claim 1, wherein the re-lighting each of the images to be processed based on the second color parameter, the mask image, the initial image, and the first color parameter corresponding to each of the images to be processed to obtain a target image includes:
Determining a target area corresponding to a target object in the initial image based on the mask image, and taking the target area as a fourth image area;
and fusing a fourth image area which is re-polished based on a specified color parameter with a third image area to obtain a target image, wherein the specified color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area which is determined to be except for a target area corresponding to a target object in the image to be processed based on the mask image.
12. A video generating apparatus, comprising:
The first acquisition unit is used for acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused;
A second obtaining unit, configured to obtain a mask image of a target object in the initial image;
The fusion unit is used for respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed;
a color parameter determining unit, configured to determine a first color parameter corresponding to each image to be processed based on the mask image, and determine a second color parameter based on the initial image;
The re-lighting unit is used for re-lighting each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image;
And the video generation unit is used for generating a target video based on the target image.
13. An electronic device, comprising: one or more processors;
A memory;
One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-11.
14. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, which is callable by a processor for executing the method according to any one of claims 1-11.
PB01 Publication