CN117893419A - Video generation method, device, electronic equipment and readable storage medium


Info

Publication number
CN117893419A
Authority
CN
China
Prior art keywords: image, processed, parameter, initial, fused
Prior art date
Legal status: Pending
Application number
CN202410059702.0A
Other languages
Chinese (zh)
Inventor
王凡祎
苏婧文
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202410059702.0A
Publication of CN117893419A

Landscapes

  • Image Processing (AREA)

Abstract

The application discloses a video generation method, a device, electronic equipment and a readable storage medium. The method comprises the following steps: acquiring a video to be fused and an initial image, and performing frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; acquiring a mask image of a target object in the initial image; fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image; performing re-lighting on each image to be processed based on the second color parameter, the mask image, the initial image and the first color parameter corresponding to each image to be processed to obtain a target image; and generating a target video based on the target image. Because the video to be fused and the initial image are fused through the specified algorithm, and the resulting images are re-lit using the color parameters, a target video with more lifelike illumination and shadows is obtained.

Description

Video generation method, device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a video generating method, apparatus, electronic device, and readable storage medium.
Background
At present, with the development of electronic information technology, elements from dynamic videos can be added to images to generate new dynamic videos. However, current methods for generating such dynamic videos are labor-intensive, and the illumination and shadows of the generated videos are not lifelike.
Disclosure of Invention
The application provides a video generation method, a video generation device, electronic equipment and a readable storage medium.
In a first aspect, an embodiment of the present application provides a video generating method, including: acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; acquiring a mask image of a target object in the initial image; fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image; performing re-lighting on each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image; and generating a target video based on the target image.
In a second aspect, an embodiment of the present application further provides a video generating apparatus, including: the device comprises a first acquisition unit, a second acquisition unit, a fusion unit, a color parameter determination unit, a re-lighting unit and a video generation unit. The first acquisition unit is used for acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; a second obtaining unit, configured to obtain a mask image of a target object in the initial image; the fusion unit is used for respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; a color parameter determining unit, configured to determine a first color parameter corresponding to each image to be processed based on the mask image, and determine a second color parameter based on the initial image; the re-lighting unit is used for re-lighting each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image; and the video generation unit is used for generating a target video based on the target image.
In a third aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored therein program code that is callable by a processor to perform the method according to the first aspect.
In the video generation method, apparatus, electronic device and readable storage medium provided by the embodiments of the application, a video to be fused and an initial image are first obtained, and the video to be fused is subjected to frame decomposition to obtain a plurality of frame images to be fused; a mask image of the target object in the initial image is then obtained; the initial image is fused with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; a first color parameter corresponding to each image to be processed is determined based on the mask image, and a second color parameter is determined based on the initial image; re-lighting is performed on each image to be processed based on the second color parameter, the mask image, the initial image and the first color parameter corresponding to each image to be processed to obtain a target image; and finally, a target video is generated based on the target image. Because the initial image is fused with each frame image to be fused through the specified algorithm to obtain the plurality of images to be processed, manual editing is avoided, labor cost is saved, and the efficiency of acquiring the images to be processed is improved. In addition, the obtained images are re-lit using the color parameters, so that a target video with more lifelike illumination and shadows can be obtained.
Additional features and advantages of embodiments of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of embodiments of the application. The objectives and other advantages of embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows an application scene graph of a video generation method provided by an embodiment of the present application;
fig. 2 shows a method flowchart of a video generation method according to an embodiment of the present application;
FIG. 3 shows a schematic diagram of an initial image provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a frame image to be fused according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a mask image according to an embodiment of the present application;
FIG. 6 shows a schematic diagram of a target image provided in an embodiment of the application;
FIG. 7 is a flow chart of a video generation method according to another embodiment of the present application;
fig. 8 shows a schematic diagram of a to-be-fused line manuscript provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of an initial line manuscript provided by an embodiment of the present application;
fig. 10 shows a schematic diagram of a line manuscript image to be processed provided by an embodiment of the present application;
FIG. 11 is a flow chart of a video generation method according to another embodiment of the present application;
FIG. 12 is a flow chart of a video generation method according to still another embodiment of the present application;
FIG. 13 is a flow chart of a video generating method according to still another embodiment of the present application;
fig. 14 is a block diagram showing the configuration of a video generating apparatus according to an embodiment of the present application;
Fig. 15 shows a block diagram of an electronic device according to an embodiment of the present application;
fig. 16 shows a block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
At present, with the development of electronic information technology, elements from dynamic videos can be added to images to generate new dynamic videos. However, current methods for generating such dynamic videos are labor-intensive, and the illumination and shadows of the generated videos are not lifelike. How to reduce the manpower consumed in generating a dynamic video, and how to improve the fidelity of the illumination and shadows of the generated video, are therefore problems to be solved urgently.
A dynamic video here is a video generated by editing dynamic elements onto a still image, where the dynamic elements may be moving objects extracted from another video, for example from an existing dynamic video.
In the prior art, dynamic elements may be added manually on a still image, for example, by using image editing software or animation software.
However, the inventors found in their research that manually adding dynamic elements to a static image is time-consuming and requires considerable manual effort. Moreover, the illumination and shadows of the dynamic video generated in this way are not lifelike.
Accordingly, in order to solve or partially solve the above-described problems, the present application provides a video generation method, apparatus, electronic device, and readable storage medium.
Referring to fig. 1, fig. 1 shows an application scenario diagram of a video generating method, that is, a video generating scenario 100, where the video generating scenario 100 may include an electronic device 110 and a server 120, where the electronic device 110 is connected to the server 120.
The electronic device 110 may establish a connection with the server 120, which also accesses the internet, by accessing the internet itself. The electronic device 110 may access the internet wirelessly, for example through wireless communication technologies such as Wi-Fi or Bluetooth; the electronic device 110 may also access the internet by wired means, for example through an RJ45 network cable or optical fiber.
The user may control the electronic device 110 so that the electronic device performs the video generating method, for example, the user may directly operate the electronic device 110 so that the electronic device performs the video generating method, where the electronic device 110 may be locally deployed with a specified algorithm, for example, may be pre-stored with the specified algorithm, so that the specified algorithm may be invoked to implement video generation, and for details, reference may be made to the following embodiments. Optionally, the electronic device 110 may also invoke a specified algorithm deployed in the server 120 to perform video generation.
The server 120 may be a cloud server or a local server.
For some embodiments, the video generation method may be applied to entertainment and gaming; for example, in the entertainment industry it may be used to generate corresponding video content for movies, television shows, cartoons, games, and so on. It may also be applied to advertising and marketing, for example to generate attractive advertising and marketing video content such as product presentations, dynamic advertisement banners, and animated logos. It may further be applied to virtual reality (VR) and augmented reality (AR), where the generated video content may include virtual scenes, dynamic elements, and interactive effect presentations. It may likewise be applied to design and the creative arts, where the generated video content can assist the design and creative process, for example dynamic design prototypes, artistic effects, and visual effect previews. In addition, it may be applied to social media and expression (sticker) packages, for example to generate video content for social media platforms and chat applications, such as dynamic expression packages and images that are interesting and express rich emotions. Furthermore, it may be applied to data visualization, for example to generate dynamic charts, interactive graphics, and dynamic data presentations.
It should be noted that, the application scenarios of the video generating method provided by the present application shown above are only examples, and do not limit the embodiments of the present application.
Referring to fig. 2, fig. 2 is a flowchart of a video generating method according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S110 to S160.
Step S110: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
Some dynamic effects in a video can be fused into an image, so that a new video with those dynamic effects is generated. Thus, an initial image and a video to be fused may be acquired first. The initial image is the image into which the dynamic effect needs to be fused, or, put differently, the dynamic effect is fused on the basis of the initial image; the video to be fused is the video containing the dynamic object.
In some embodiments, the electronic device may store some image files and video files in advance, so that one image file may be selected from the pre-stored image files as an initial image; one video file can be selected from the pre-stored video files to serve as a video to be fused. For example, the electronic device may run a photo and a video application, thereby selecting an image file in the photo and the video application as an initial image; and selecting one video file as a video to be fused.
In other embodiments, the electronic device may obtain, via the application, an image file that is not stored locally as the initial image, and a video file as the video to be fused. By way of example, the electronic device may run a web browsing application that may correspond to image files as well as video files. Thus, the electronic equipment can acquire the image file in the webpage browsing application program as an initial image; and acquiring a video file in the webpage browsing application program as a video to be fused.
The video to be fused may be in MP4, MKV, MOV or a similar format, or may be a GIF file, which is not particularly limited in the embodiments of the present application.
For example, referring to fig. 3, fig. 3 is a schematic diagram illustrating an initial image provided by an embodiment of the present application. The initial image 300 shown in fig. 3 includes a target object 301 and a background 302. Wherein the background 302 may be an area other than the target area to which the target object 301 corresponds.
It will be appreciated that the video to be fused may be made up of a plurality of frame images; for example, each second of the video to be fused may include a specified number of frame images. Specifically, the specified number may be 24, 30, 50, 60, 120, or the like. The specified number of frame images per second may also be referred to as the frame rate of the video to be fused; for example, if the specified number is 24, the frame rate of the video to be fused is 24, and if the specified number is 60, the frame rate is 60. Therefore, in order to fuse the content in the video to be fused into the initial image, the video to be fused can be subjected to frame decomposition to obtain a plurality of frame images to be fused. The plurality of frame images to be fused are the plurality of frame images in the video to be fused.
For example, the video to be fused includes N frames of images, and each frame of image to be fused can be sequentially represented by a_n, where n=1, 2,3 … N.
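For illustration only, the frame decomposition described above can be sketched with OpenCV as below; the file name and the in-memory list of frames are placeholders rather than part of the claimed method.

```python
import cv2

def decompose_video(path):
    """Split a video to be fused into a list of BGR frame images A_1 ... A_N."""
    capture = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:  # no more frames in the video
            break
        frames.append(frame)
    capture.release()
    return frames

# e.g. frames_to_fuse = decompose_video("video_to_fuse.mp4")
```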
For example, referring to fig. 4, fig. 4 is a schematic diagram of a frame image to be fused according to an embodiment of the present application. The frame image 400 to be fused shown in fig. 4 includes an object 401 to be fused.
Step S120: and acquiring a mask image of the target object in the initial image.
In order to facilitate the subsequent acquisition of a target video with more realistic illumination and shadow, a mask image of a target object in an initial image can be acquired first, so that the mask image can be directly invoked later.
Wherein, through the mask image, a plurality of designated areas in the image can be conveniently operated without influencing the areas except the designated areas in the image. In some embodiments, the mask image may be a binary or boolean image of the same size as the original image, with selected regions marked 1 (or True) and the remaining regions marked 0 (or False). The selected area is a target area corresponding to the target object in the initial image; the rest areas are the areas except the target areas corresponding to the target objects in the initial image. Thus, visually, the target area corresponding to the target object in the mask image may be one color, and the area other than the target area corresponding to the target object may be another color.
The target object may be a subject person in the initial image, such as the target object 301 in fig. 3 described above.
For some embodiments, a mask image of the target object in the initial image may be acquired by a subject matting model. The subject matting model can identify the subject person in the input image, thereby obtaining a mask image of the target object in the initial image.
For some embodiments, the subject matting model may be obtained in advance by adjusting or training a pre-trained model. The subject matting model may be implemented based on the U2Net algorithm, which is an image segmentation network based on the U-Net structure.
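As a purely illustrative sketch of obtaining a binary mask from a U2Net-style subject matting model, the snippet below thresholds the model's saliency output; the forward-pass signature and the 0.5 threshold are assumptions, not the application's actual model.

```python
import numpy as np
import torch

def get_mask(model, image_tensor, threshold=0.5):
    """Run a subject-matting network and binarize its output.

    `model` is assumed to be a pre-trained U2Net-style network mapping a
    1x3xHxW tensor to a 1x1xHxW saliency map in [0, 1].
    """
    with torch.no_grad():
        saliency = model(image_tensor)  # hypothetical forward signature
    mask = (saliency.squeeze().cpu().numpy() > threshold).astype(np.uint8)
    return mask  # 1 inside the target region, 0 elsewhere
```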
For example, referring to fig. 5, fig. 5 shows a schematic diagram of a mask image according to an embodiment of the present application. The mask image 500 includes a target area 501 corresponding to the target object, and an area 502 other than the target area corresponding to the target object.
Step S130: and respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed.
It will be appreciated that the subsequent generation of the video requires the merging of multiple frame images to obtain the video. That is, the initial image may be fused with each frame image to be fused, respectively, so that a plurality of images to be processed may be obtained. The video may be generated after the plurality of images to be processed are subsequently processed.
For some embodiments, in order to reduce the requirement of manpower for generating the video and reduce the time required for generating the video, the initial image may be fused with each frame image to be fused by a designated algorithm, so as to obtain a plurality of images to be processed.
The specified algorithm may be a generative graphics algorithm. The specified algorithm may be executed, for example, by a Stable Diffusion model running in the electronic device. Intelligent optimization through the generation capability of Stable Diffusion makes the process of generating the target video more efficient and accurate.
Optionally, an image to be processed with a better effect can be obtained by invoking an inpainting algorithm in combination with the graphics algorithm; the detailed description can refer to the following embodiments. Inpainting is an image restoration technique mainly used to restore missing or damaged parts of an image. By analyzing information from surrounding regions and using the undamaged information in the image, an inpainting algorithm can restore damaged pictures and remove or reduce noise in the image.
It can be understood that each image to be processed includes the target object and a dynamic object, the dynamic object being the object to be fused in the corresponding frame image to be fused. That is, the target object in each image to be processed is the same, namely the target object in the initial image, while the dynamic object differs between images to be processed: the dynamic object in each image to be processed corresponds to the object to be fused in the frame image to be fused from which that image to be processed was generated.
Therefore, in the embodiment of the application, the initial image is fused with each frame image to be fused by directly calling the designated algorithm to obtain a plurality of images to be processed, so that the initial image is prevented from being processed on each frame image to be fused by manpower in sequence, the requirement on manpower is reduced, the efficiency of acquiring the plurality of images to be processed is improved, and the efficiency of generating the video is improved as a whole. Meanwhile, the error generated by manpower can be reduced, and the stability of the video acquisition method is further improved.
Step S140: and determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image.
After the plurality of images to be processed are acquired, because the area of each image to be processed other than the target area of the target object has changed compared with the initial image, re-lighting can be performed on the target object based on the color parameters of the area of the image to be processed other than the target area of the target object, so that the light and shadow on the target object are more lifelike.
The first color parameter corresponding to each image to be processed may be obtained, where the first color parameter may be a color parameter corresponding to an area of each image to be processed except for a target area of a target object. Specifically, the first color parameter corresponding to each image to be processed may be determined based on the mask image. In some embodiments, the mask image may be used to determine an area outside the target area corresponding to the target object in the image to be processed, and then determine the first color parameter, which may be described in detail in the following examples.
Further, a second color parameter may be determined based on the initial image, where the second color parameter is a color parameter corresponding to the initial image.
The color parameters, namely the first color parameter and the second color parameter, can be determined from the parameters of the red, green and blue (RGB) color channels. Specific determination methods can be found in the following embodiments.
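The exact computation is left to the later embodiments; as a placeholder, one plausible reading is the per-channel mean over the relevant region, sketched below (the averaging choice and the variable names are assumptions).

```python
import numpy as np

def mean_rgb(image, mask=None):
    """Mean R, G, B values of an HxWx3 image, optionally restricted by a 0/1 mask."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    if mask is not None:
        pixels = pixels[mask.reshape(-1) > 0]
    return pixels.mean(axis=0)

# second_color = mean_rgb(initial_image)                        # second color parameter
# first_color  = mean_rgb(image_to_process, mask=1 - mask_img)  # area outside the target region
```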
Step S150: and re-polishing each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image.
Further, after the first color parameter corresponding to each image to be processed and the second color parameter of the initial image are obtained, the image to be processed can be subjected to re-lighting, and then the image with more vivid illumination and shadow is obtained. The adjustment of the re-lighting can eliminate the problem of uncoordinated light shadow, so that the generated target image is more consistent with the light shadow of the initial image, the immersion and realism of the audience are enhanced, and the picture of the target image is more coordinated and natural.
Specifically, the target area of the target object in the initial image can be extracted through the mask image, and re-lighting is then performed on the extracted target area of the target object in the initial image. The re-lighting may be performed based on a specified color parameter, which may be determined from the second color parameter and the first color parameter corresponding to each image to be processed; the detailed description will refer to the following embodiments.
As for the area of the image to be processed other than the target area corresponding to the target object, it is kept as determined in the image to be processed, and re-lighting is not performed on it.
Furthermore, the re-lit target area of the target object from the initial image can be fused with the area, other than the target area corresponding to the target object, of the image to be processed on which re-lighting has not been performed, so as to obtain a target image; the detailed description can refer to the subsequent embodiments.
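As a purely illustrative sketch of this compositing, the snippet below scales the target region of the initial image by the ratio of the two color parameters and pastes it over the untouched background of the image to be processed; the gain rule is only one possible reading of the "specified color parameter", which the patent defines in later embodiments.

```python
import numpy as np

def relight_composite(initial, processed, mask, first_color, second_color):
    """Re-light the target region of the initial image and composite it onto the
    background of the image to be processed (all arrays HxWx3 except the HxW mask)."""
    # Assumed gain: new background color over original image color, per channel.
    gain = np.asarray(first_color) / np.asarray(second_color)
    relit = np.clip(initial.astype(np.float64) * gain, 0, 255).astype(np.uint8)
    m = mask[..., None].astype(bool)       # HxW -> HxWx1 for broadcasting
    return np.where(m, relit, processed)   # target region relit, background reused
```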
It can be understood that each frame image to be fused can correspondingly generate one image to be processed, and further, one target image is determined to be obtained, so that if N frame images to be fused exist, N images to be processed and N target images can exist.
Referring to fig. 6, fig. 6 is a schematic diagram of a target image according to an embodiment of the present application. Fig. 6 illustrates a target image 600 including a target object 601 and a background 602, where an area corresponding to the background 602 is an area other than an area where the target object 601 is located. Also included in the background 602 is a dynamic object 603.
Step S160: and generating a target video based on the target image.
After the target images are acquired, a target video may be generated based on the target images. The plurality of target images can be regarded as frame images, and the frame images are then combined to generate the target video. In some implementations, image and video editing software may be invoked to generate the target video based on the target images. The generated target video can include the target object and the dynamic object; in the target video, the dynamic object moves, while the content other than the dynamic object remains stationary.
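For instance, the target images can be written out as a video with OpenCV as sketched below; the frame rate, codec and output file name are placeholders.

```python
import cv2

def write_video(target_images, path="target_video.mp4", fps=30):
    """Merge a list of equally sized BGR target images into a video file."""
    height, width = target_images[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(path, fourcc, fps, (width, height))
    for frame in target_images:
        writer.write(frame)
    writer.release()
```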
The video generation method provided by the embodiment of the application comprises the steps of firstly obtaining a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused; then obtaining a mask image of the target object in the initial image; fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed; determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image; performing re-lighting on each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image; and finally, generating a target video based on the target image. Firstly, fusing the initial image with each frame image to be fused respectively through a designated algorithm to obtain a plurality of images to be processed, so that editing through manpower is avoided, labor cost is saved, and efficiency of acquiring the images to be processed is improved. In addition, the obtained image is subjected to re-lighting through the color parameters, so that a target video with more vivid illumination and shadow can be obtained.
Referring to fig. 7, fig. 7 shows a method flowchart of a video generating method according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S210 to S2120.
Step S210: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
The step S210 is described in detail in the foregoing embodiments, and is not described herein.
Step S220: and determining an initial mask image of the target object from the initial image based on a pre-acquired main body matting model.
Step S230: and adjusting the resolution of the initial mask image to a specified resolution, so as to obtain the mask image, wherein the specified resolution is the resolution of a potential space corresponding to the initial image.
First, an initial mask image of the target object may be determined from the initial image; specifically, the initial mask image of the target object may be determined from the initial image through a pre-acquired subject matting model. The subject matting model may be obtained in advance by adjusting or training a pre-trained model, and can identify the subject person in the input image, thereby obtaining a mask image of the target object in the initial image.
For example, the initial image may be input into the subject matting model, and the image output by the subject matting model may be taken as the initial mask image.
In some implementations, the initial image may be encoded to obtain a latent feature image of the initial image in a latent space. That is, converting the initial image into the latent space and processing it there can save computing resources.
Therefore, optionally, the image output by the subject matting model can be used as the initial mask image, and the resolution of the initial mask image can then be adjusted so that it matches the resolution of the latent space. In this way the latent feature image and the mask image can be processed together in the latent space, saving computing resources.
Specifically, the resolution of the initial mask image may be adjusted to a specified resolution to obtain the mask image, where the specified resolution is the resolution of the latent space corresponding to the initial image. For example, the specified resolution may be 128×128; as another example, it may be 64×64.
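A minimal sketch of this resolution adjustment, assuming a 64×64 latent grid and nearest-neighbour interpolation (both of which are assumptions), is shown below.

```python
import cv2

def resize_mask_to_latent(initial_mask, latent_size=(64, 64)):
    """Downscale the 0/1 initial mask to the latent-space resolution."""
    return cv2.resize(initial_mask, latent_size, interpolation=cv2.INTER_NEAREST)
```

Nearest-neighbour interpolation keeps the mask strictly binary after resizing, which is why it is chosen here over bilinear interpolation.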
Step S240: and obtaining a to-be-fused line manuscript corresponding to each to-be-fused frame image.
In some embodiments, a to-be-fused line manuscript corresponding to each to-be-fused frame image may be obtained first. The to-be-fused line manuscript graph can be used for representing information such as lines, shapes and the like of to-be-fused frame images.
For example, the line manuscript to be fused corresponding to each frame image to be fused can be generated by a line manuscript extraction algorithm. Specifically, the line manuscript extraction algorithm can be the lineart detector (LineartDetector) trained to work with the control network (ControlNet). The lineart detector can be used to identify and locate target objects in an image, thereby supporting target tracking, recognition and similar functions in control tasks. It can classify and locate objects in images using a linear classifier (such as an SVM or logistic regression), classify the input image with the trained model, and output the type and position information of the objects, so as to determine the line manuscript to be fused corresponding to each frame image to be fused. The control network is a neural network architecture that can control a model and allow the model to support more input conditions. The original model accepts prompt words and an original image as input, while the control network provides various additional input conditions, including Canny edges, semantic segmentation maps, key points, graffiti and the like, which greatly improves the controllability of Artificial Intelligence Generated Content (AIGC).
As can be seen from the foregoing description, after the video to be fused is subjected to frame decomposition, a plurality of frame images to be fused are obtained. Therefore, the line manuscript to be fused corresponding to each frame image to be fused can be obtained through the line manuscript extraction algorithm.
Referring to fig. 8, fig. 8 is a schematic diagram of a to-be-fused line manuscript provided by an embodiment of the application. The to-be-fused document 800 shown in fig. 8 includes an object 801 to be fused. Referring to fig. 4 and fig. 8 together, it can be seen that the line manuscript 800 to be fused shown in fig. 8 is the line manuscript corresponding to the frame image 400 to be fused shown in fig. 4.
For some embodiments, the line manuscripts to be fused of the N frame images to be fused may be acquired sequentially; for example, for the frame images to be fused a_n, they may be acquired in order from n = 1 to N, or in order from N to 1, which is not particularly limited in the embodiments of the present application.
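A sketch of extracting a line manuscript with the LineartDetector shipped in the controlnet_aux package is given below; the "lllyasviel/Annotators" checkpoint is the one commonly used with that package and is an assumption rather than the application's own model.

```python
from controlnet_aux import LineartDetector
from PIL import Image

detector = LineartDetector.from_pretrained("lllyasviel/Annotators")

def to_line_manuscript(image_path):
    """Return the line manuscript of one frame image to be fused."""
    frame = Image.open(image_path).convert("RGB")
    return detector(frame)  # PIL image containing the extracted lines
```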
Step S250: and acquiring a depth image of the initial image and an initial line manuscript of the initial image.
A depth image of the initial image may also be acquired. Illustratively, the depth image of the initial image may be extracted by a deep learning model. For example, the deep learning model may be a monocular depth estimation algorithm, in particular one of the algorithms provided with diffusers. A monocular depth estimation algorithm predicts the depth of the scene from a single input image.
Further, an initial line manuscript of the initial image may also be obtained. The method of acquiring the initial line manuscript of the initial image is similar to the method of acquiring the line manuscript to be fused corresponding to a frame image to be fused in the preceding steps; the initial line manuscript of the initial image can likewise be obtained through the line manuscript extraction algorithm. Reference may be made to the foregoing steps for details, which are not repeated here.
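Monocular depth estimation can be sketched with the transformers depth-estimation pipeline; the Intel/dpt-large checkpoint below is only a stand-in for whatever depth model the application actually uses.

```python
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

def to_depth_image(image_path):
    """Predict a depth map for the initial image."""
    result = depth_estimator(Image.open(image_path).convert("RGB"))
    return result["depth"]  # PIL image holding the predicted depth map
```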
Referring to fig. 9, fig. 9 is a schematic diagram of an initial draft provided by an embodiment of the present application. The initial draft 900 in fig. 9 includes a target object 901 and a background 902. Wherein the background 902 may be an area other than the target area corresponding to the target object 901.
Step S260: and fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed.
Further, after the initial line manuscript and the plurality of line manuscripts to be fused are obtained, the initial line manuscripts can be fused with each line manuscripts to be fused respectively, so that a plurality of line manuscripts images to be processed can be obtained.
In some embodiments, a target position may be first determined in the initial line manuscript, where the target position is used to insert the line manuscripts to be fused, so as to implement fusion of the initial line manuscripts with each line manuscripts to be fused respectively.
A plane coordinate system can be established on the initial line manuscript; for example, one vertex of the initial line manuscript is used as the coordinate origin, and the two sides of the initial line manuscript connected to the origin are used as the x axis and the y axis respectively. The target position can thus be characterized by coordinates (x, y). Furthermore, a designated position of the line manuscript to be fused can be aligned with the target position, so that the initial line manuscript and the line manuscript to be fused are fused. For example, the designated position may be set as the top-left vertex of the line manuscript to be fused, so that the top-left vertex of the line manuscript to be fused is placed at the target position.
For example, the target location may be manually selected in the initial draft. For example, a point in the initial draft may be manually clicked, and the clicked point may be used as the target position. Still another exemplary, an arbitrary position other than the target region of the target object in the initial draft may be set as the target position.
If the object to be fused in the line manuscript to be fused is a fish, an area corresponding to water can be found in an area except for a target area of the target object in the initial line manuscript, so that a point is determined as a target position in the area of water. In another exemplary embodiment, if the object to be fused in the line manuscript to be fused is a bird, an area corresponding to the sky may be found in an area other than the target area of the target object in the initial line manuscript, so as to determine a point in the area of the sky as the target position.
Further, the initial line manuscript is fused with each line manuscript to be fused respectively, and after the line manuscript to be fused is inserted into the target position, pixel-by-pixel addition is performed on the line manuscript to be fused and the initial line manuscript, so that a plurality of line manuscripts to be processed are obtained. It should be noted that, each line manuscript to be fused is fused with the initial line manuscript at the same target position.
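The pixel-by-pixel fusion at a fixed target position can be sketched as follows; treating the line manuscripts as greyscale arrays and clipping the sum to the 8-bit range are assumptions about how the addition is realised.

```python
import numpy as np

def fuse_line_manuscripts(initial_lines, fuse_lines, target_xy):
    """Paste the line manuscript to be fused at target_xy (its top-left corner)
    and add it pixel by pixel onto the initial line manuscript.
    Assumes the pasted patch fits entirely inside the initial line manuscript."""
    fused = initial_lines.astype(np.int32)
    x, y = target_xy
    h, w = fuse_lines.shape[:2]
    fused[y:y + h, x:x + w] += fuse_lines.astype(np.int32)  # pixel-by-pixel addition
    return np.clip(fused, 0, 255).astype(np.uint8)
```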
For example, referring to fig. 10, fig. 10 shows a schematic diagram of a document image to be processed according to an embodiment of the present application. The to-be-processed document image 1000 shown in fig. 10 includes a target object 1010, an active object 1020, and a background 1030.
Step S270: and based on the self-encoder, encoding the initial image, and acquiring a potential characteristic image of the initial image in a potential space.
As can be seen from the foregoing description, the initial image may be converted into the potential space to obtain the potential feature image, so that the subsequent processing is performed in the potential space to save the computing resource.
For some embodiments, the initial image may be encoded by a self-encoder (Variational Autoencoder, VAE) to obtain a potential feature image of the initial image in a potential space. Wherein the self-encoder includes an encoder (Encoder) and a Decoder (Decoder). Thus, an initial image may be encoded by an encoder in the self-encoder to obtain a potential feature image of the initial image in a potential space. Wherein the self-encoder learns to generate a model of high-dimensional data with latent variable representations by combining an automatic encoder and variation inference.
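Using the diffusers implementation of the Stable Diffusion VAE as a stand-in (the checkpoint name is an assumption), encoding into the latent space can be sketched as follows.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

def encode_to_latent(image_tensor):
    """Encode a 1x3xHxW image tensor scaled to [-1, 1] into the latent space."""
    with torch.no_grad():
        latent = vae.encode(image_tensor).latent_dist.sample()
    return latent * vae.config.scaling_factor  # e.g. a 512x512 image -> 4x64x64 latent
```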
Step S280: and adding Gaussian distribution noise to the potential characteristic image to obtain the noise image.
Further, noise may be added to the latent feature image, resulting in the noisy image. For some embodiments, a gaussian distribution noise may be added to the latent feature image, resulting in a noisy image. For example, the gaussian distribution noise may be added to the feature image a plurality of times. Specifically, the noise image may be added with T times of gaussian distribution noise, where the forward step may be a markov chain, so that each step of adding gaussian distribution noise is only related to the last time, and a picture is changed into pure gaussian noise, so as to obtain the noise image.
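The repeated addition of Gaussian noise is the forward process of a diffusion model; with the diffusers DDPMScheduler (used here purely as an example) it can be written in closed form for any step t.

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def add_noise(latent, t):
    """Jump directly to step t of the Markov forward process."""
    noise = torch.randn_like(latent)
    timestep = torch.tensor([t])
    return scheduler.add_noise(latent, noise, timestep)

# noise_image = add_noise(latent_feature_image, t=999)  # close to pure Gaussian noise
```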
Step S290: and obtaining a to-be-processed image corresponding to each to-be-processed line manuscript image based on a specified algorithm through the depth image, the to-be-processed line manuscript image and the noise image, wherein the noise image is an image obtained by adding noise to the initial image.
And further, the depth image, the line manuscript image to be processed and the noise image can be combined and processed through a specified algorithm to obtain the image to be processed corresponding to the line manuscript image to be processed. And the noise image is the image obtained by adding noise to the initial image in the previous step. Specifically, an initial image may be first converted into a latent feature image located in a latent space, and then gaussian distribution noise is added to the latent feature image to obtain the noise image.
It will be appreciated that since the noise image is an image located in the latent space, the image obtained by the specified algorithm is also an image located in the latent space. Thus, the image obtained by the specified algorithm needs to be further decoded.
Specifically, step S290 may further include step S291 and step S292.
Step S291: and obtaining an image to be decoded corresponding to each line manuscript image to be processed through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm.
Step S292: and decoding each image to be decoded based on the self-encoder to obtain a corresponding image to be processed of each document image to be processed.
Therefore, the image to be decoded corresponding to each line manuscript image to be processed can be obtained through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm. However, the obtained to-be-decoded image corresponding to each to-be-processed line manuscript image is an image located in a potential space, so that each to-be-decoded image can be decoded by a decoder of a self-encoder to obtain the to-be-processed image corresponding to each to-be-processed line manuscript image.
In particular, each of the images to be decoded may be decoded by a decoder in the self-encoder.
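Decoding mirrors the encoding sketch above; again the diffusers VAE loaded earlier is only a stand-in.

```python
def decode_from_latent(latent):
    """Decode a latent image back to pixel space with the decoder of the VAE
    loaded in the encoding sketch above."""
    with torch.no_grad():
        image = vae.decode(latent / vae.config.scaling_factor).sample
    return image  # 1x3xHxW tensor in [-1, 1]
```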
Because the plurality of images to be processed are determined in the specified algorithm by combining the depth image, the line manuscript images to be processed and the noise image, and by drawing on the semantics and contextual understanding of the initial image, the generated images to be processed are more consistent with the theme and semantics of the initial image. This improves the overall quality and appeal of the subsequent target video, so that the target video and the initial image share a consistent visual style and theme.
The detailed description of the to-be-processed image corresponding to each to-be-processed line manuscript image is obtained through the depth image, the to-be-processed line manuscript image and the noise image based on a specified algorithm, and can refer to the following embodiments.
Step S2100: and determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image.
Step S2110: and re-polishing each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image.
Step S2120: and generating a target video based on the target image.
For a detailed description of step S2100 to step S2120, reference may be made to the foregoing embodiments, and details are not repeated here.
The video generation method provided by this embodiment of the application converts the initial image into the latent space and processes it there, which saves computing resources. Moreover, the plurality of images to be processed are determined in the specified algorithm by combining the depth image, the line manuscript images to be processed and the noise image, so that, by drawing on the semantics and contextual understanding of the initial image, the generated images to be processed are more consistent with the theme and semantics of the initial image. This improves the overall quality and appeal of the subsequent target video, so that the target video and the initial image share a consistent visual style and theme. That is, in this embodiment of the application, the target video is acquired through AIGC, so that the intent of the initial image can be better understood and expressed, and the visual effect and overall quality of the target video are improved.
Referring to fig. 11, fig. 11 is a flowchart illustrating a method for generating video according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generating method may include steps S310 to S3120.
Step S310: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
Step S320: and acquiring a mask image of the target object in the initial image.
Step S330: and obtaining a to-be-fused line manuscript corresponding to each to-be-fused frame image.
Step S340: and acquiring a depth image of the initial image and an initial line manuscript of the initial image.
Step S350: and fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed.
The steps S310 to S350 are described in detail in the foregoing embodiments, and are not repeated here.
Step S360: depth feature information of the depth image is extracted based on a depth model.
Step S370: and extracting the line manuscript characteristic information of the line manuscript image to be processed based on the line manuscript model.
In some embodiments, when the to-be-processed image corresponding to each to-be-processed line-manuscript image is obtained through the depth image, the to-be-processed line-manuscript image and the noise image based on a specified algorithm, depth feature information of the depth image and line-manuscript feature information of the to-be-processed line-manuscript image may be acquired first. That is, the depth characteristic information extracted from the depth image and the image to be processed corresponding to the line manuscript image to be processed may be included in the input to the specified algorithm. By means of the appointed algorithm, depth characteristic information and line manuscript characteristic information can be considered, and further characteristics, semantics and context information of an initial image are considered, so that a target video generated later can form a more coordinated and unified visual effect.
Specifically, depth feature information of the depth image can be extracted through a depth model (Controlnet-depth); and extracting the line manuscript characteristic information of the line manuscript image to be processed through a line manuscript model (Controlnet-lineart). The depth image can be used as a Condition (Condition) to be input into the depth model, so that depth characteristic information output by the depth model is obtained; the line manuscript image to be processed can be input into the line manuscript model as a condition, so that the line manuscript characteristic information output by the line manuscript model is obtained.
The line manuscript feature information can be used to represent the outline and edge characteristics of the input line manuscript to be processed.
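In the diffusers ecosystem, the two conditions can be attached through two ControlNet branches as sketched below; the checkpoints and the pipeline class are illustrative of the idea, not the application's actual models.

```python
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet_depth = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth")
controlnet_lineart = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_lineart")

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_depth, controlnet_lineart],  # depth and lineart branches
)

# Each call is conditioned on the depth image and one line manuscript image to be processed:
# result = pipe(prompt="", image=initial_image,
#               control_image=[depth_image, line_manuscript_to_process]).images[0]
```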
Step S380: and adding Gaussian noise to the target object in the noise image to obtain an intermediate image.
For some embodiments, noise may be further added to the target object in the noisy image, such as adding gaussian noise, to obtain an intermediate image. The noise image may be an image obtained by adding noise to the initial image. For a detailed description of acquiring the noise image, reference may be made to the description of the foregoing embodiments, which is not repeated here.
Specifically, step S380 may include steps S381 to S384.
Step S381: and determining a target area corresponding to a target object in the noise image based on the mask image as a first image area.
Step S382: and determining a region except the first image region in the noise image based on the mask image as a second image region.
Step S383: and adding Gaussian noise to the first image area to obtain a noise image area.
Step S384: and fusing the noise image area with the second image area to obtain the intermediate image.
It will be appreciated that, in order to add gaussian noise to a target object in a noisy image, the region corresponding to the target object in the noisy image may be first determined. Specifically, a target area corresponding to a target object in the noise image may be determined as the first image area by the mask image.
Further, an area other than the first image area in the noise image may also be determined. Similarly, an area other than the first image area in the noise image can also be determined by the mask image as the second image area.
In order to achieve an increase in gaussian noise to the target object in the noisy image, the first image region may be increased with gaussian noise, while the second region is not increased with gaussian noise. In some embodiments, gaussian noise may be added to the first image region to obtain a noisy image region. And then fusing the noise image area with the second image area to obtain the intermediate image. That is, the second image region included in the intermediate image is a region to which gaussian noise is not added again.
Illustratively, steps S381 through S384 may be characterized by the following equation (1):
E=Di*C+E*(1-C) (1)
Here C characterizes the mask image, which may be, for example, an image of 128 x 128 pixels, and Di characterizes the noise image. Di*C corresponds to adding Gaussian noise to the first image area, that is, to the target object in the noise image, so as to obtain the noise image area; (1-C) corresponds to the mask of the background, that is, the area of the mask image other than the target area of the target object; and E*(1-C) corresponds to not adding Gaussian noise to the second image area, that is, the area of the noise image other than the target area of the target object, which is directly reused. In this way the noise image area and the second image area are fused to obtain the intermediate image.
It should be noted that the output of equation (1) is the noise image after noise has been added again, that is, E, and this E can be used as the input the next time equation (1) is executed, thereby iterating on the noise image. E may be iterated multiple times through equation (1) so that Gaussian noise is added to the first image area multiple times. For example, in some embodiments the iteration may be performed i times, that is, E is iterated i times and Gaussian noise is added to the first image area i times, and the output E after the i-th iteration is the intermediate image.
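Equation (1) can be written directly on the latent tensors, as in the sketch below; it assumes the mask has already been broadcast to the latent shape, and the fixed noise scale sigma stands in for whatever noise schedule the application actually uses.

```python
import torch

def renoise_target_region(noise_image, mask, steps, sigma=1.0):
    """Iterate equation (1): E = D_i * C + E * (1 - C).
    Gaussian noise is re-added only inside the target region (C = 1);
    the background (C = 0) of the noise image is reused unchanged."""
    e = noise_image
    for _ in range(steps):
        d_i = e + sigma * torch.randn_like(e)  # D_i: E with fresh Gaussian noise added
        e = d_i * mask + e * (1 - mask)
    return e  # the intermediate image
```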
Step S390: and carrying out noise reduction processing on the intermediate image through the depth characteristic information and the line manuscript characteristic information based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
Further, after the depth feature information and the line manuscript feature information are obtained through the foregoing steps, the intermediate image may be subjected to noise reduction processing through a specified algorithm, so as to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
It should be noted that adding noise to an image and then performing noise reduction on the noised image is essentially an application of the inpainting algorithm. An inpainting algorithm is mainly used to repair missing or damaged portions of an image: by analyzing information from surrounding regions and using the undamaged information in the image, it can restore damaged pictures and remove or reduce noise. Therefore, by first adding noise to an image and then invoking the inpainting algorithm to perform noise reduction on the noised image, the information of the image before the noise was added can be recovered.
In addition, if an image is noised and then denoised without being guided by any other condition, details of the restored image may still be lost during encoding and decoding by the self-encoder, for example during up- and down-sampling, even if the image can be restored. Therefore, during the noise reduction of the intermediate image, the depth feature information and the line manuscript feature information are used to guide the specified algorithm. This improves the quality of the to-be-processed image obtained for each to-be-processed line manuscript image, and in particular avoids unnatural flaws (artifacts) at the boundary between the target region corresponding to the target object and the rest of the resulting to-be-processed image.
Optionally, in some embodiments, step S390 may further include step S391 and step S392.
Step S391: and obtaining an effect prompt word, wherein the effect prompt word is used for adjusting the light and shadow effect of the intermediate image.
Optionally, in some embodiments, besides guiding the specified algorithm to operate on the intermediate image through the depth feature information and the line manuscript feature information, the intermediate image may be further adjusted in combination with the effect prompt word.
The effect prompt word can be used for adjusting the light and shadow effect of the intermediate image. For example, some preset effect prompt words may be stored in advance, so that the user may directly select a desired effect prompt word from among the preset effect prompt words. For another example, the user may directly input a desired effect prompt.
By way of example, the effect cue words may be "gold light", "DARK NIGHT", and the like.
Step S392: and performing target operation on the intermediate image through the depth characteristic information, the line manuscript characteristic information and the effect prompt word based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image, wherein the target operation comprises noise reduction processing and light and shadow effect adjustment.
After the effect prompt word is obtained, the target operation may be performed on the intermediate image based on the specified algorithm, using the depth feature information, the line manuscript feature information and the effect prompt word, to obtain the to-be-processed image corresponding to each to-be-processed line manuscript image. The target operation includes noise reduction processing and light and shadow effect adjustment. The description of how the depth feature information and the line manuscript feature information guide the specified algorithm to denoise the intermediate image, and of how the to-be-processed image corresponding to each to-be-processed line manuscript image is obtained from the depth image, the to-be-processed line manuscript image and the noise image based on the specified algorithm, has been given above and is not repeated here.
Furthermore, the effect prompt word can be used for guiding a specified algorithm to adjust the light and shadow effect of the intermediate image, so that more realistic illumination and shadow effect can be obtained.
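A hedged sketch of the guided target operation of step S392 is given below. The denoiser here is a hypothetical PyTorch module standing in for the specified algorithm; its conditioning interface (depth features, line manuscript features, prompt embedding) and the simplified update rule are assumptions for illustration only, and a real sampler would follow its own noise schedule.

```python
import torch

@torch.no_grad()
def guided_denoise(denoiser, intermediate, depth_feat, line_feat, prompt_emb, steps=10):
    """Iteratively denoise the intermediate image while depth/line-manuscript features
    and the effect-prompt embedding steer the (hypothetical) specified algorithm."""
    x = intermediate
    for t in reversed(range(steps)):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        noise_pred = denoiser(x, t_batch,
                              depth=depth_feat,   # depth feature guidance
                              line=line_feat,     # line manuscript feature guidance
                              prompt=prompt_emb)  # light/shadow effect prompt
        x = x - noise_pred / steps                # simplified update; real samplers use a schedule
    return x                                      # latent result, later decoded by the self-encoder
```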
Step S3100: and determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image.
Step S3110: and re-polishing each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image.
Step S3120: and generating a target video based on the target image.
Step S3100 is described in detail in the foregoing embodiments, and is not described herein.
According to the video generation method provided by the embodiment of the application, the specified algorithm can take the depth feature information and the line manuscript feature information into account, so that the features, semantics and context information of the initial image are considered and the subsequently generated target video forms a more coordinated and unified visual effect. Moreover, unnatural flaws at the boundary between the target region corresponding to the target object and the rest of the obtained to-be-processed image can be avoided. In addition, in the embodiment of the application, the effect prompt word can guide the specified algorithm to adjust the light and shadow effect of the intermediate image, so that more realistic illumination and shadow effects are obtained.
It should be noted that the dynamic-effect object in the generated target video is determined by the dynamic-effect object in the video to be fused; for example, if the dynamic-effect object in the video to be fused is a dragon, the dynamic-effect object in the generated target video is also a dragon. However, the illumination, shadow, display transparency and the like of the dynamic-effect object in the target video can be flexibly adjusted by the video generation method provided by the application, so that the generated target video has a more coordinated and unified visual effect.
Referring to fig. 12, fig. 12 is a flowchart illustrating a method for generating video according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S410 to S4110.
Step S410: and acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
Step S420: and acquiring a mask image of the target object in the initial image.
Step S430: and respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed.
The steps S410 to S430 are described in detail in the foregoing embodiments, and are not repeated here.
Step S440: and determining an area outside a target area corresponding to the target object in the image to be processed based on the mask image, and taking the area as a third image area.
As can be seen from the description of the foregoing embodiments, the area of each image to be processed other than the target area of the target object has changed compared with the initial image, so the target object can be re-polished based on the color parameters of that area, making the light and shadow of the target object more realistic.
Therefore, the area other than the target area corresponding to the target object in the image to be processed can first be determined. Specifically, this area can be determined based on the mask image and taken as the third image area. For example, the first color parameter may be determined based on the third image area, and the second color parameter may be determined based on the initial image, so that the target area corresponding to the target object can be re-polished using the first color parameter and the second color parameter. For a detailed description, reference is made to the following steps.
Step S450: and acquiring a first red parameter corresponding to the red channel, a first green parameter corresponding to the green channel and a first blue parameter corresponding to the blue channel in each third image area.
It will be appreciated that the region of the image may comprise a plurality of pixels, each pixel being defined by three colour channels, red Green Blue (RGB), each colour channel having a different value, so that the pixels may exhibit different colours.
Thus, in some embodiments, the method for determining the first color parameter corresponding to the third image area may specifically be to first obtain the first red parameter corresponding to the red channel, the first green parameter corresponding to the green channel, and the first blue parameter corresponding to the blue channel in the third image area. The first red parameter may represent a color average of the red channel in the third image area, that is, an average of values of the red channel of each pixel point in the third image area. Similarly, the first green parameter may characterize the color average of the green channel in the third image area, i.e. the average of the values of the green channel for each pixel point in the third image area. The first blue parameter may characterize the color average of the blue channel in the third image area, i.e. the average of the values of the blue channel for each pixel point in the third image area.
It should be noted that, since the number of images to be processed is plural, the first red parameter, the first green parameter, and the first blue parameter determined based on each image to be processed may be slightly different. Specifically, color analysis may be performed on the image to be processed, so as to obtain a first red parameter corresponding to the red channel, a first green parameter corresponding to the green channel, and a first blue parameter corresponding to the blue channel in each third image area.
Step S460: and determining a first color parameter corresponding to each image to be processed based on the first red parameter, the first green parameter and the first blue parameter.
After the first red parameter, the first green parameter and the first blue parameter are acquired, the first color parameter corresponding to each image to be processed may be determined based on them. In some embodiments, the average value of the first red parameter, the first green parameter and the first blue parameter may be used as the first color parameter corresponding to each image to be processed. Illustratively, the first color parameter may be characterized by F_mean.
Step S470: and acquiring a second red parameter corresponding to the red channel, a second green parameter corresponding to the green channel and a second blue parameter corresponding to the blue channel in the initial image.
The method for determining the second color parameter corresponding to the initial image may specifically be that the second red parameter corresponding to the red channel, the second green parameter corresponding to the green channel, and the second blue parameter corresponding to the blue channel in the initial image are obtained first. The second red parameter may represent a color average of a red channel in the initial image, that is, an average of values of the red channel of each pixel point in the initial image. Similarly, the second green parameter may represent the color average of the green channel in the initial image, i.e., the average of the values of the green channel for each pixel point in the initial image. The second blue parameter may represent the color average of the blue channel in the initial image, i.e. the average of the values of the blue channel for each pixel point in the initial image.
It should be noted that, determining the second red parameter corresponding to the red channel, the second green parameter corresponding to the green channel, and the second blue parameter corresponding to the blue channel in the initial image may be determining the second red parameter, the second green parameter, and the second blue parameter corresponding to the entire image area in the initial image.
Step S480: the second color parameter is determined based on a second red parameter, a second green parameter, and a second blue parameter.
After the second red parameter, the second green parameter and the second blue parameter are acquired, the second color parameter corresponding to the initial image may be determined based on them. In some embodiments, the average of the second red parameter, the second green parameter and the second blue parameter may be used as the second color parameter corresponding to the initial image. Illustratively, the second color parameter may be characterized by A_mean.
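Steps S450 to S480 can be illustrated by the following sketch, which computes the per-channel means and then the first and second color parameters F_mean and A_mean. The helper name, the array layout (H x W x 3, RGB floats) and the use of a soft mask are assumptions made for the example.

```python
import numpy as np

def channel_means(image, mask=None):
    """Mean of the R, G, B channels, optionally restricted to pixels where mask > 0.5."""
    if mask is not None:
        pixels = image[mask[..., 0] > 0.5]        # (N, 3) pixels inside the chosen region
    else:
        pixels = image.reshape(-1, 3)
    r_mean, g_mean, b_mean = pixels.mean(axis=0)  # the red, green and blue parameters
    return (r_mean + g_mean + b_mean) / 3.0       # average of the three channel means

# F: an image to be processed, A: the initial image, C: the mask, all H x W (x 3 / x 1) floats
# F_mean = channel_means(F, mask=1.0 - C)         # first color parameter (third image area)
# A_mean = channel_means(A)                       # second color parameter (whole initial image)
```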
Step S490: and determining a target area corresponding to the target object in the initial image based on the mask image, and taking the target area as a fourth image area.
In order to re-light the target object in the image to be processed, note that the target object in each image to be processed is the target object in the initial image; therefore, the target area corresponding to the target object in the initial image can be determined.
Specifically, a target area corresponding to the target object in the initial image may be determined based on the mask image and used as the fourth image area.
Step S4100: and fusing a fourth image area which is re-polished based on a specified color parameter with a third image area to obtain a target image, wherein the specified color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area which is determined to be except for a target area corresponding to a target object in the image to be processed based on the mask image.
After the first color parameter and the second color parameter are acquired, the fourth image area and the third image area which are subjected to repeated lighting based on the designated color parameter can be fused to obtain the target image. The designated color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area except for a target area corresponding to a target object in the image to be processed determined based on the mask image.
By way of example, a specific method of obtaining the target image can be described by the following formula (2):
G=A*C*F_mean/A_mean+F*(1-C) (2)
Wherein G characterizes the target image; A characterizes the initial image; C characterizes the mask image; F characterizes the image to be processed; F*(1-C) corresponds to reusing the third image area directly, i.e. without re-polishing it; F_mean characterizes the first color parameter and A_mean characterizes the second color parameter; F_mean/A_mean characterizes the specified color parameter; A*C*F_mean/A_mean characterizes the re-polishing of the fourth image area. Thus A*C*F_mean/A_mean + F*(1-C) corresponds to fusing the fourth image area, re-polished based on the specified color parameter, with the third image area to obtain the target image G.
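Formula (2) translates directly into the following sketch. The epsilon guard against division by zero and the clipping to [0, 255] are additions for numerical safety and are not part of the formula itself.

```python
import numpy as np

def relight_and_fuse(A, F, C, F_mean, A_mean, eps=1e-6):
    """G = A * C * F_mean / A_mean + F * (1 - C), with clipping for safety."""
    gain = F_mean / (A_mean + eps)          # specified color parameter (quotient)
    fourth = A * C * gain                   # re-lit target region of the initial image
    third = F * (1.0 - C)                   # background of the processed frame, reused as is
    return np.clip(fourth + third, 0, 255)  # target image G
```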
Step S4110: and generating a target video based on the target image.
Step S4110 is described in detail in the foregoing embodiments, and is not repeated here.
According to the video generation method provided by the embodiment of the application, in the obtained target image the target object in the fourth image area has been re-lit based on the third image area. The illumination and shadow of the target area where the target object is located are therefore well matched to those of the surrounding area, and no unnatural flaws appear along the fused edge between the two areas.
Referring to fig. 13, fig. 13 is a flowchart illustrating a method for generating video according to an embodiment of the present application. The video generating method can be applied to the electronic device in the video generating scene shown in fig. 1, and specifically, a processor of the electronic device can be used as an execution subject for executing the video generating method. The video generation method may include steps S510 to S5250.
Step S510: starting.
Step S520: and obtaining the video to be fused.
Step S530: and carrying out frame decomposition on the video to be fused.
First, a video to be fused can be acquired; the video to be fused is a video containing the dynamic-effect object. Then the video to be fused is decomposed into frames; for example, if the video to be fused contains N frames, each frame image to be fused can be denoted in order by a_n, where n = 1, 2, 3, ..., N.
The specific manner of acquiring the video to be fused and the manner of performing frame decomposition can be referred to the description of the foregoing embodiments, and will not be repeated here.
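As one possible realization of steps S520 and S530, the frames can be extracted with OpenCV as sketched below; the file name is hypothetical and keeping every frame in memory is an assumption made for simplicity.

```python
import cv2

def decompose_frames(video_path):
    """Read every frame of the video to be fused, in order."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)      # one BGR frame image to be fused
    cap.release()
    return frames

frames_to_fuse = decompose_frames("video_to_fuse.mp4")  # hypothetical file name
```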
Step S540: and obtaining the manuscript graph to be fused.
Furthermore, the line manuscript diagrams to be fused corresponding to each frame image to be fused can be obtained, so that a plurality of line manuscripts are obtained.
Step S550: whether each to-be-fused line manuscript graph is traversed or not.
One to-be-fused line manuscript is processed at a time until every to-be-fused line manuscript has been traversed. If the traversal is not finished, the process jumps to step S560; if the traversal is finished, the process jumps to step S5240.
Step S560: an initial image is acquired.
Step S570: and acquiring a mask image.
Step S580: adjusted to a specified resolution.
An initial image may be acquired, and then a mask image of the initial image may be acquired by a subject matting algorithm, where the mask image may be used to characterize the target object in the initial image. The resolution of the mask image is then adjusted so that it matches the resolution of the latent space.
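Step S580 can be realized, for example, with a simple resize; the 128 x 128 latent resolution below is only an assumption consistent with the mask size mentioned for formula (1), and nearest-neighbor interpolation is chosen here to keep the mask binary.

```python
import cv2
import numpy as np

mask = np.zeros((1024, 1024), dtype=np.uint8)   # placeholder full-resolution mask image
mask[256:768, 256:768] = 255                    # toy target-object region
latent_h, latent_w = 128, 128                   # assumed latent-space resolution
mask_latent = cv2.resize(mask, (latent_w, latent_h), interpolation=cv2.INTER_NEAREST)
```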
Step S590: and (5) encoding by a self-encoder.
Step S5100: a latent feature image is acquired.
In some embodiments, the initial image may be encoded by a self-encoder to obtain a potential feature image located in a potential space.
Step S5110: increasing gaussian distributed noise.
Further, Gaussian-distributed noise can be added to the latent feature image to obtain the noise image. Specifically, Gaussian-distributed noise may be added T times; the forward process can be a Markov chain, so that each noise-adding step depends only on the previous step, and the picture gradually turns into pure Gaussian noise, yielding the noise image.
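The T-step forward process of step S5110 can be sketched in closed form, since chaining the per-step Gaussian transitions of the Markov chain yields x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear beta schedule and the value of T below are assumptions for illustration.

```python
import numpy as np

def forward_noise(x0, t, T=1000, rng=None):
    """Closed form of t forward Markov steps: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    rng = rng if rng is not None else np.random.default_rng()
    betas = np.linspace(1e-4, 2e-2, T)            # assumed linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)[t - 1]    # product of (1 - beta) over the first t steps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

latent = np.zeros((128, 128, 4), dtype=np.float32)   # latent feature image (assumed shape)
noise_image = forward_noise(latent, t=1000)           # after T steps it is close to pure noise
```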
Step S5120: a depth image is acquired.
The depth image of the initial image may be extracted by a deep learning model.
Step S5130: and obtaining an initial draft diagram.
The method for acquiring the initial line manuscript of the initial image is similar to the method for acquiring the line manuscript to be fused corresponding to the frame image to be fused in the previous steps, and the initial line manuscript of the initial image can be acquired through a line manuscript acquisition algorithm.
Step S5140: obtaining a to-be-processed manuscript graph.
In some embodiments, the initial line manuscript graph may be fused with each line manuscript graph to be fused, so as to obtain a plurality of line manuscript images to be processed. The specific fusion method may be referred to the description of the foregoing embodiments, and will not be repeated here.
Step S5150: and (5) a line manuscript model.
Step S5160: and (5) a depth model.
Depth feature information of the depth image can be extracted through a depth model; and extracting the line manuscript characteristic information of the line manuscript image to be processed through a line manuscript model.
Step S5170: and obtaining the effect prompt words.
Besides guiding a specified algorithm to operate on the intermediate image through the depth characteristic information and the line manuscript characteristic information, the intermediate image can be further adjusted by combining effect prompt words. The effect prompt word can be used for adjusting the light and shadow effect of the intermediate image. The specific acquisition manner may be referred to the description in the foregoing embodiments, and will not be repeated here.
Step S5180: noise reduction and light and shadow effect adjustment.
A specified algorithm is called to denoise the noise image and adjust the light and shadow effect. The specified algorithm can adjust the light and shadow effect under the guidance of the effect prompt word, and reduce noise under the guidance of the depth feature information and the line manuscript feature information.
Step S5190: and (3) whether the step i of noise reduction is finished.
Step S5200: and iterating the noise image to obtain an intermediate image.
The noise image is denoised by the specified algorithm, which may iterate for i steps. That is, the noise image may first be iterated to obtain an intermediate image. Denoising is then carried out, and whether the i-step denoising is complete is judged; if not, the process jumps to step S5200 and continues; if the i-step noise reduction has been completed, the process jumps to step S5210.
Alternatively, the adjustment of the shadow effect may be performed after each iteration to obtain the intermediate image.
Step S5210: decoding from the encoder to obtain the image to be processed.
And decoding the image obtained by the specified algorithm. Specifically, the image obtained by the specified algorithm can be decoded by the self-encoder to obtain the image to be processed.
Step S5220: the first color parameter and the second color parameter are acquired.
Step S5230: and re-polishing to obtain a target image.
The first color parameter is used for representing color parameters corresponding to areas except for a target area of a target object in the image to be processed. The second color parameter is used for representing the color parameter corresponding to the initial image. The specified color parameters may be determined based on the first color parameters and the second color parameters such that the target object in the image to be processed is re-illuminated by the specified color parameters. The detailed description will refer to the foregoing embodiments, and will not be repeated here.
A target image corresponding to the to-be-fused line manuscript currently being processed is thus obtained, and the process returns to step S550 for judgment.
Step S5240: a target video is generated.
In the case where it is determined that each of the line patterns to be fused has been traversed, a target video may be generated based on a plurality of target images at this time. The description of the specific generation of the target video may refer to the description of the foregoing embodiments, which is not repeated herein.
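As one possible way to assemble the target video of step S5240, the target images can be written out with OpenCV as sketched below; the codec, frame rate and output file name are assumptions.

```python
import cv2
import numpy as np

def write_video(target_images, out_path="target_video.mp4", fps=30):
    """Encode the per-frame target images into the target video."""
    h, w = target_images[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")          # assumed codec
    writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
    for img in target_images:
        writer.write(img.astype(np.uint8))            # frames must be uint8 BGR, same size
    writer.release()

# write_video(target_images)  # target_images: list of H x W x 3 arrays from the re-lighting step
```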
Step S5250: and (5) ending.
Referring to fig. 14, fig. 14 shows a block diagram of a video generating apparatus according to an embodiment of the present application, where the video generating apparatus 1400 includes: a first acquisition unit 1410, a second acquisition unit 1420, a fusion unit 1430, a color parameter determination unit 1440, a re-lighting unit 1450, and a video generation unit 1460.
The first obtaining unit 1410 is configured to obtain a video to be fused and an initial image, and perform frame decomposition on the video to be fused to obtain a plurality of frame images to be fused.
A second acquisition unit 1420, configured to acquire a mask image of the target object in the initial image.
Optionally, the second obtaining unit 1420 may be further configured to determine an initial mask image of the target object from the initial image based on a pre-obtained subject matting model; and adjusting the resolution of the initial mask image to a specified resolution, so as to obtain the mask image, wherein the specified resolution is the resolution of a potential space corresponding to the initial image.
And a fusion unit 1430, configured to fuse the initial image with each frame image to be fused based on a specified algorithm, so as to obtain a plurality of images to be processed.
Optionally, the fusion unit 1430 may be further configured to obtain a to-be-fused line manuscript corresponding to each to-be-fused frame image; acquiring a depth image of the initial image and an initial line manuscript of the initial image; fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed; and obtaining a to-be-processed image corresponding to each to-be-processed line manuscript image based on a specified algorithm through the depth image, the to-be-processed line manuscript image and the noise image, wherein the noise image is an image obtained by adding noise to the initial image.
Optionally, the fusion unit 1430 may be further configured to encode the initial image based on a self-encoder to obtain a latent feature image of the initial image in a latent space; and adding Gaussian distribution noise to the potential characteristic image to obtain the noise image.
Optionally, the fusion unit 1430 may be further configured to obtain an image to be decoded corresponding to each line manuscript image to be processed through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm; and decode each image to be decoded based on the self-encoder to obtain the image to be processed corresponding to each line manuscript image to be processed.
Optionally, the fusion unit 1430 may be further configured to extract depth feature information of the depth image based on a depth model; extracting line manuscript characteristic information of the line manuscript image to be processed based on a line manuscript model; adding Gaussian noise to a target object in the noise image to obtain an intermediate image; and carrying out noise reduction processing on the intermediate image through the depth characteristic information and the line manuscript characteristic information based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
Optionally, the fusion unit 1430 may be further configured to determine, based on the mask image, a target area corresponding to a target object in the noise image, as the first image area; determining a region of the noise image other than the first image region as a second image region based on the mask image; adding Gaussian noise to the first image area to obtain a noise image area; and fusing the noise image area with the second image area to obtain the intermediate image.
Optionally, the fusion unit 1430 may be further configured to obtain an effect prompt word, where the effect prompt word is used to adjust a light effect of the intermediate image; and performing target operation on the intermediate image through the depth characteristic information, the line manuscript characteristic information and the effect prompt word based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image, wherein the target operation comprises noise reduction processing and light and shadow effect adjustment.
The color parameter determining unit 1440 is configured to determine a first color parameter corresponding to each of the images to be processed based on the mask image, and determine a second color parameter based on the initial image.
Optionally, the color parameter determining unit 1440 may be further configured to determine, based on the mask image, an area other than the target area corresponding to the target object in the image to be processed, as a third image area; acquiring a first red parameter corresponding to a red channel, a first green parameter corresponding to a green channel and a first blue parameter corresponding to a blue channel in each third image area; determining a first color parameter corresponding to each image to be processed based on the first red parameter, the first green parameter and the first blue parameter; acquiring a second red parameter corresponding to a red channel, a second green parameter corresponding to a green channel and a second blue parameter corresponding to a blue channel in an initial image; the second color parameter is determined based on a second red parameter, a second green parameter, and a second blue parameter.
Optionally, the color parameter determining unit 1440 may be further configured to use an average value of the first red parameter, the first green parameter, and the first blue parameter as a first color parameter corresponding to each of the images to be processed; and taking the average value of the second red parameter, the second green parameter and the second blue parameter as the second color parameter.
And a re-lighting unit 1450, configured to re-light each of the images to be processed based on the second color parameter, the mask image, the initial image, and the first color parameter corresponding to each of the images to be processed, so as to obtain a target image.
Optionally, the re-lighting unit 1450 may be further configured to determine, based on the mask image, a target area corresponding to the target object in the initial image, as a fourth image area; and fusing a fourth image area which is re-polished based on a specified color parameter with a third image area to obtain a target image, wherein the specified color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area which is determined to be except for a target area corresponding to a target object in the image to be processed based on the mask image.
A video generating unit 1460 for generating a target video based on the target image.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In several embodiments provided by the present application, the coupling of the elements to each other may be electrical, mechanical, or other. In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Referring to fig. 15, fig. 15 shows a block diagram of an electronic device according to an embodiment of the present application. The electronic device 110 may be a smart phone, desktop computer, on-board computer, server, tablet computer, or the like. The electronic device 110 of the present application may include one or more of the following components: a processor 111, a memory 112 and one or more application programs, wherein the processor 111 is electrically connected to the memory 112, and the one or more application programs are configured to perform the methods described in the foregoing embodiments.
Processor 111 may include one or more processing cores. The processor 111 connects various parts of the overall electronic device 110 using various interfaces and lines, and performs the various functions of the electronic device 110 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 112 and invoking the data stored in the memory 112. Alternatively, the processor 111 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 111 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, computer programs and the like; the GPU is responsible for rendering and drawing display content; the modem is used to handle wireless communication. It will be appreciated that the modem may also not be integrated into the processor 111 and may be implemented by a communication chip alone. The methods described in the foregoing embodiments may specifically be executed by the one or more processors 111.
For some embodiments, memory 112 may include random access memory (Random Access Memory, RAM) or read-only memory (ROM). Memory 112 may be used to store instructions, programs, code sets, or instruction sets. The memory 112 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the various method embodiments described below, and the like. The storage data area may also store data created by the electronic device 110 in use, and the like.
Referring to fig. 16, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. Stored in the computer readable medium 1600 is program code that can be invoked by a processor to perform the methods described in the method embodiments above.
The computer readable storage medium 1600 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 1600 includes a non-transitory computer-readable storage medium. The computer readable storage medium 1600 has storage space for program code 1610 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 1610 may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. A video generation method, comprising:
Acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused;
Acquiring a mask image of a target object in the initial image;
fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed;
Determining a first color parameter corresponding to each image to be processed based on the mask image, and determining a second color parameter based on the initial image;
Performing re-lighting on each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image;
and generating a target video based on the target image.
2. The method according to claim 1, wherein the fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed includes:
acquiring a to-be-fused line manuscript corresponding to each to-be-fused frame image;
Acquiring a depth image of the initial image and an initial line manuscript of the initial image;
fusing the initial line manuscript with each line manuscript to be fused respectively to obtain a plurality of line manuscript images to be processed;
and obtaining a to-be-processed image corresponding to each to-be-processed line manuscript image based on a specified algorithm through the depth image, the to-be-processed line manuscript image and the noise image, wherein the noise image is an image obtained by adding noise to the initial image.
3. The method according to claim 2, wherein before the obtaining the to-be-processed image corresponding to each to-be-processed line manuscript image based on the depth image, the to-be-processed line manuscript image, and the noise image according to the specified algorithm, the method further comprises:
encoding the initial image based on a self-encoder, and acquiring a potential characteristic image of the initial image in a potential space;
and adding Gaussian distribution noise to the potential characteristic image to obtain the noise image.
4. The method of claim 3, wherein the obtaining the to-be-processed image corresponding to each to-be-processed line manuscript image based on the specified algorithm through the depth image, the to-be-processed line manuscript image, and the noise image includes:
obtaining an image to be decoded corresponding to each line manuscript image to be processed through the depth image, the line manuscript image to be processed and the noise image based on a specified algorithm;
and decoding each image to be decoded based on the self-encoder to obtain the image to be processed corresponding to each line manuscript image to be processed.
5. The method according to claim 2, wherein the obtaining the to-be-processed image corresponding to each to-be-processed line manuscript image based on the specified algorithm through the depth image, the to-be-processed line manuscript image, and the noise image includes:
Extracting depth characteristic information of the depth image based on a depth model;
extracting line manuscript characteristic information of the line manuscript image to be processed based on a line manuscript model;
adding Gaussian noise to a target object in the noise image to obtain an intermediate image;
and carrying out noise reduction processing on the intermediate image through the depth characteristic information and the line manuscript characteristic information based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image.
6. The method of claim 5, wherein adding gaussian noise to the target object in the noisy image results in an intermediate image, comprising:
determining a target area corresponding to a target object in the noise image based on the mask image as a first image area;
Determining a region of the noise image other than the first image region as a second image region based on the mask image;
adding Gaussian noise to the first image area to obtain a noise image area;
And fusing the noise image area with the second image area to obtain the intermediate image.
7. The method according to claim 5, wherein the denoising the intermediate image based on the depth feature information and the draft feature information by using a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed draft image includes:
Obtaining an effect prompt word, wherein the effect prompt word is used for adjusting the light and shadow effect of the intermediate image;
And performing target operation on the intermediate image through the depth characteristic information, the line manuscript characteristic information and the effect prompt word based on a specified algorithm to obtain a to-be-processed image corresponding to each to-be-processed line manuscript image, wherein the target operation comprises noise reduction processing and light and shadow effect adjustment.
8. The method of claim 1, wherein the acquiring a mask image of the target object in the initial image comprises:
Determining an initial mask image of a target object from the initial image based on a pre-acquired main body matting model;
And adjusting the resolution of the initial mask image to a specified resolution, so as to obtain the mask image, wherein the specified resolution is the resolution of a potential space corresponding to the initial image.
9. The method of claim 1, wherein determining a first color parameter for each of the images to be processed based on the mask image and determining a second color parameter based on the initial image comprises:
determining an area outside a target area corresponding to a target object in the image to be processed based on the mask image, and taking the area as a third image area;
Acquiring a first red parameter corresponding to a red channel, a first green parameter corresponding to a green channel and a first blue parameter corresponding to a blue channel in each third image area;
determining a first color parameter corresponding to each image to be processed based on the first red parameter, the first green parameter and the first blue parameter;
Acquiring a second red parameter corresponding to a red channel, a second green parameter corresponding to a green channel and a second blue parameter corresponding to a blue channel in an initial image;
the second color parameter is determined based on a second red parameter, a second green parameter, and a second blue parameter.
10. The method of claim 9, wherein determining the first color parameter for each of the images to be processed based on the first red parameter, the first green parameter, and the first blue parameter comprises:
taking the average value of the first red parameter, the first green parameter and the first blue parameter as a first color parameter corresponding to each image to be processed;
The determining the second color parameter based on the second red parameter, the second green parameter, and the second blue parameter includes:
and taking the average value of the second red parameter, the second green parameter and the second blue parameter as the second color parameter.
11. The method according to claim 1, wherein the re-lighting each of the images to be processed based on the second color parameter, the mask image, the initial image, and the first color parameter corresponding to each of the images to be processed to obtain a target image includes:
Determining a target area corresponding to a target object in the initial image based on the mask image, and taking the target area as a fourth image area;
and fusing a fourth image area which is re-polished based on a specified color parameter with a third image area to obtain a target image, wherein the specified color parameter is a parameter determined based on the quotient of the first color parameter and the second color parameter, and the third image area is an area which is determined to be except for a target area corresponding to a target object in the image to be processed based on the mask image.
12. A video generating apparatus, comprising:
The first acquisition unit is used for acquiring a video to be fused and an initial image, and carrying out frame decomposition on the video to be fused to obtain a plurality of frame images to be fused;
A second obtaining unit, configured to obtain a mask image of a target object in the initial image;
The fusion unit is used for respectively fusing the initial image with each frame image to be fused based on a specified algorithm to obtain a plurality of images to be processed;
a color parameter determining unit, configured to determine a first color parameter corresponding to each image to be processed based on the mask image, and determine a second color parameter based on the initial image;
The re-lighting unit is used for re-lighting each image to be processed based on the second color parameters, the mask image, the initial image and the first color parameters corresponding to each image to be processed to obtain a target image;
And the video generation unit is used for generating a target video based on the target image.
13. An electronic device, comprising: one or more processors;
A memory;
One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-11.
14. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, which is callable by a processor for executing the method according to any one of claims 1-11.
PB01 Publication