WO2023230927A1 - Image processing method and device, and readable storage medium - Google Patents

Image processing method and device, and readable storage medium

Info

Publication number: WO2023230927A1
Authority: WIPO (PCT)
Prior art keywords: segmentation, matting, training, image, round
Application number: PCT/CN2022/096483
Other languages: French (fr), Chinese (zh)
Inventors: 陈凌颖, 张亚森, 苏海军, 倪鹏程
Original Assignees: 北京小米移动软件有限公司, 北京小米松果电子有限公司
Application filed by 北京小米移动软件有限公司 and 北京小米松果电子有限公司
Priority to CN202280004202.6A (published as CN117501309A)
Priority to PCT/CN2022/096483 (published as WO2023230927A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/12: Edge-based segmentation

Abstract

An image processing method and device, and a readable storage medium. The method comprises: acquiring a target image; determining, by means of a target matting model, segmentation information for segmenting the target image, wherein the target matting model is obtained by alternately training a basic matting network on the basis of a first sample segmentation image and a sample matting image, the basic matting network is obtained by training an original matting network on the basis of a second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label; and determining a matting target in the target image according to the segmentation information.

Description

Image processing method and device, and readable storage medium
Technical Field
The present disclosure relates to the field of image processing, and in particular to an image processing method, a device, and a readable storage medium.
Background
Related matting algorithms that require no additional input produce segmentation information of relatively poor accuracy.
To improve the accuracy of the segmentation information, related trimap-based or background-image-based matting algorithms require as input not only the target image to be matted but also a trimap or background image corresponding to that target image, and preparing such a trimap or background image costs considerable time and effort.
Summary
To overcome the problems in the related art, the present disclosure provides an image processing method, a device, and a readable storage medium.
According to a first aspect of the embodiments of the present disclosure, an image processing method is provided, including:
acquiring a target image;
determining, by a target matting model, segmentation information for segmenting the target image, where the target matting model is obtained by alternately training a basic matting network on a first sample segmentation image and a sample matting image, the basic matting network is obtained by training an original matting network on a second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label; and
determining a matting target in the target image according to the segmentation information.
According to a second aspect of the embodiments of the present disclosure, an image processing device is provided, including:
a first acquisition module configured to acquire a target image;
a segmentation module configured to determine, by a target matting model, segmentation information for segmenting the target image, where the target matting model is obtained by alternately training a basic matting network on a first sample segmentation image and a sample matting image, the basic matting network is obtained by training an original matting network on a second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label; and
a matting target determination module configured to determine a matting target in the target image according to the segmentation information.
According to a third aspect of the embodiments of the present disclosure, another image processing device is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
where the processor is configured to execute the steps of the image processing method provided by the first aspect of the embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; when the program instructions are executed by a processor, the steps of the image processing method provided by the first aspect of the present disclosure are implemented.
In the technical solutions provided by the embodiments of the present disclosure, only the target image needs to be input into the target matting model to output relatively accurate segmentation information, so that the matting target in the target image can be determined more accurately from that segmentation information. That is, the target matting model takes only the target image as input and requires no additional auxiliary image, such as a trimap or a background image, which saves the considerable time and effort needed to prepare an auxiliary image.
Moreover, in the present disclosure, the original matting network is first trained on the second sample segmentation images to obtain the basic matting network. Since the segmentation granularity of the first segmentation label is greater than that of the second segmentation label, this improves the robustness of the basic matting network's predictions and the accuracy with which it locates the matting target. The basic matting network is then alternately trained on the first sample segmentation images and the sample matting images to obtain the target matting network. In this process, since the segmentation granularity of the second segmentation label is smaller than that of the first segmentation label, training the basic matting network on the sample matting images improves its segmentation precision, and the supervision information from training the basic matting network on the first sample segmentation images assists its training on the sample matting images, which improves both the accuracy of locating the matting target and the precision of the segmentation information for it, while also speeding up model training.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of an image processing method according to an exemplary embodiment.
FIG. 2 is a flowchart of an image background blurring method according to an exemplary embodiment.
FIG. 3 is a schematic diagram of an image background blurring effect according to an exemplary embodiment.
FIG. 4 is a flowchart of an image background replacement method according to an exemplary embodiment.
FIG. 5 is a schematic diagram of an image background replacement effect according to an exemplary embodiment.
FIG. 6 is a schematic structural diagram of an original matting network according to an exemplary embodiment.
FIG. 7 is a schematic flowchart of an image processing method according to an exemplary embodiment.
FIG. 8 is a schematic structural diagram of an original matting network according to an exemplary embodiment.
FIG. 9 is a schematic structural diagram of a basic matting network according to an exemplary embodiment.
FIG. 10 is a flowchart of single-task training according to an exemplary embodiment.
FIG. 11 is a flowchart of dual-task training according to an exemplary embodiment.
FIG. 12 is a flowchart of a method for obtaining the total fine segmentation loss according to an exemplary embodiment.
FIG. 13 is a flowchart of a method for obtaining the semantic segmentation loss according to an exemplary embodiment.
FIG. 14 is a flowchart of a method for obtaining the semantic segmentation loss according to an exemplary embodiment.
FIG. 15 is a flowchart of a method for obtaining the target fine segmentation loss according to an exemplary embodiment.
FIG. 16 is a flowchart of a method for obtaining the target fine segmentation loss according to an exemplary embodiment.
FIG. 17 is a schematic diagram of color transfer applied to a foreground image according to an exemplary embodiment.
FIG. 18 is a schematic diagram of sampling point pairs according to an exemplary embodiment.
FIG. 19 is a structural block diagram of an image processing device according to an exemplary embodiment.
FIG. 20 is a structural block diagram of an image processing device according to an exemplary embodiment.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
It can be understood that in the present disclosure, "a plurality of" means two or more, and other quantifiers are similar. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The singular forms "a", "said", and "the" are also intended to include the plural forms, unless the context clearly indicates otherwise.
It can further be understood that terms such as "first" and "second" are used to describe various kinds of information, but this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another and do not imply a particular order or level of importance. In fact, expressions such as "first" and "second" are fully interchangeable. For example, without departing from the scope of the present disclosure, a first message frame may also be called a second message frame and, similarly, a second message frame may also be called a first message frame.
It can further be understood that, although the operations in the embodiments of the present disclosure are described in a particular order in the drawings, this should not be understood as requiring that the operations be performed in the particular order shown or in serial order, or that all of the operations shown be performed, in order to obtain the desired results. In certain circumstances, multitasking and parallel processing may be advantageous.
In addition, all actions of acquiring signals, information, or data in the present application are performed in compliance with the applicable data protection laws and policies of the relevant country and with the authorization of the owner of the corresponding device.
At present, deep-learning-based matting algorithms mainly fall into three categories: trimap-based matting algorithms, background-image-based matting algorithms, and matting algorithms without additional input. A trimap-based matting algorithm requires a trimap as an additional input to guide the alpha-matte region, so manual annotation or an additional model is needed to provide trimap labels or predictions. A background-image-based matting algorithm requires a foreground-free background image as input, thereby implicitly providing foreground selection information and improving matting accuracy. A matting algorithm without additional input misclassifies frequently: for example, content belonging to the background is wrongly classified as foreground, or content belonging to the foreground is wrongly classified as background, resulting in poor matting accuracy.
To solve the above problems, embodiments of the present disclosure provide an image processing method, a device, and a readable storage medium. The application environment of the embodiments of the present disclosure is introduced first.
The image processing method provided by the embodiments of the present disclosure can be applied to a terminal device, for example, a terminal device with a shooting function such as a mobile phone or a camera, or a terminal device with an image processing function. On a terminal device with a shooting function, the user can obtain the target image and background image by shooting, or obtain them from other devices; on a terminal device with an image processing function, the target image and background image can be obtained from other devices. When the user selects and taps the target image, the output interface of the terminal device displays function controls such as background blurring and background replacement. Tapping the control for background blurring triggers the terminal device to execute the image processing method provided by the embodiments of the present disclosure to obtain the matting target; the terminal device can then run a background blurring algorithm on the target image based on the obtained matting target to produce a background-blurred image. Alternatively, when the user taps the control for background replacement, the phone is triggered to display an option for selecting a background image; after the user finishes selecting a background image, the image processing method of the present disclosure is executed to obtain the matting target, and background replacement processing is performed based on the matting target and the selected background image to produce a background-replaced image.
FIG. 1 is a flowchart of an image processing method according to an exemplary embodiment. As shown in FIG. 1, the image processing method includes:
S101: Acquire a target image.
The target image is the image to be processed, that is, the image on which matting is to be performed in the present disclosure. It may be an RGB image, that is, an optical three-primary-color image, where R stands for Red, G for Green, and B for Blue, and it includes a matting target. The matting target may be a portrait, an animal, or an image of any other object.
S102: Determine, by a target matting model, segmentation information for segmenting the target image, where the target matting model is obtained by alternately training a basic matting network on a first sample segmentation image and a sample matting image, the basic matting network is obtained by training an original matting network on a second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label.
In this embodiment, the segmentation information is obtained by directly inputting the target image into the target matting model. To improve the accuracy of the segmentation information, the target image may consist of two images with the same content but different resolutions. The target matting model may also be a matting model for a specific class of matting targets, which improves the accuracy of the segmentation information; for example, it may be a matting model for portraits or a matting model for cats. The possible matting targets are diverse and are not specifically limited here.
The first sample segmentation image and the second sample segmentation image may be the same or different. The first segmentation label and the second segmentation label are labels of different segmentation granularities, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label, where segmentation granularity is inversely related to the fineness with which the image is segmented. For example, the first segmentation label may be a binary label: the matting target is labeled 1 and the region of the target image outside the matting target is labeled 0. The second segmentation label may be a multi-class label (finer than a binary label): for example, the interior of the matting target is labeled 1, a transition region is set at the edge of the matting target in which the label value gradually transitions from 1 to 0 (for example, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, and 0.1), so that the closer a pixel is to the interior of the matting target, the closer its label value is to 1, and the part of the target image outside the interior of the matting target and the transition region is labeled 0.
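As an illustration of the two label granularities, the following sketch derives both label types from a ground-truth alpha matte; the function and variable names are hypothetical, and the 0.5 threshold for the binary label is an assumption.

```python
import numpy as np

def make_labels(alpha):
    """Derive both label types from a ground-truth alpha matte in [0, 1].

    alpha: float array (H, W); 1 inside the matting target, 0 outside,
    and fractional values (0.9, 0.8, ...) across the edge transition region.
    """
    # First segmentation label: coarse binary label (target vs. everything else).
    first_label = (alpha >= 0.5).astype(np.float32)
    # Second segmentation label: the fine alpha values themselves, which
    # transition gradually from 1 to 0 at the edge of the matting target.
    second_label = alpha.astype(np.float32)
    return first_label, second_label
```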
The original matting network is first trained on the second sample segmentation images to obtain the basic matting network. Since the segmentation granularity of the first segmentation label is greater than that of the second segmentation label, this improves the robustness of the basic matting network's predictions and the accuracy with which it locates the matting target. The basic matting network is then alternately trained on the first sample segmentation images and the sample matting images to obtain the target matting network. In this process, since the segmentation granularity of the second segmentation label is smaller than that of the first segmentation label, training the basic matting network on the sample matting images improves its segmentation precision, and the supervision information from training the basic matting network on the first sample segmentation images assists its training on the sample matting images, which improves both the accuracy of locating the matting target and the precision of the segmentation information for it, while also speeding up model training.
S103: Determine the matting target in the target image according to the segmentation information.
After the segmentation information is obtained, the matting target in the target image can be determined from the predicted values in the segmentation information. For example, in the segmentation information, the predicted value for the matting target is non-zero, that is, 1 or a value between 0 and 1 (excluding 0), while the predicted value for the region outside the matting target is 0; the region represented by non-zero predicted values can therefore be determined as the matting target.
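A minimal sketch of this step, assuming the segmentation information is a per-pixel alpha map in [0, 1]; the small epsilon used to treat near-zero predictions as background is our own choice, not from the description.

```python
import numpy as np

def extract_matting_target(image, alpha, eps=1e-3):
    """Determine the matting target as the region with non-zero predictions.

    image: uint8 array (H, W, 3); alpha: float array (H, W) in [0, 1].
    """
    mask = alpha > eps                 # non-zero predicted values = target
    target = image * mask[..., None]   # keep target pixels, zero out the rest
    return target.astype(np.uint8), mask
```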
With the above method, the original matting network is first trained on the second sample segmentation images to obtain the basic matting network. Since the segmentation granularity of the first segmentation label is greater than that of the second segmentation label, the robustness of the basic matting network's predictions and the accuracy with which it locates the matting target are improved, making the interior of the obtained matting target more complete and free of holes. The basic matting network is then alternately trained on the first sample segmentation images and the sample matting images to obtain the target matting network. In this process, since the segmentation granularity of the second segmentation label is smaller than that of the first segmentation label, training on the sample matting images improves the segmentation precision of the basic matting network, and the supervision information from training on the first sample segmentation images assists the training on the sample matting images, improving both the accuracy of locating the matting target and the precision of the segmentation information while speeding up model training. As a result, more accurate segmentation information can be output with only the target image as input and without any additional auxiliary image, so that a more accurate matting target is obtained.
FIG. 2 is a flowchart of an image background blurring method according to an exemplary embodiment. As shown in FIG. 2, the method includes:
S201: Acquire a target image.
S202: Determine, by a target matting model, segmentation information for segmenting the target image, where the target matting model is obtained by alternately training a basic matting network on a first sample segmentation image and a sample matting image, the basic matting network is obtained by training an original matting network on a second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label.
S203: Determine the matting target in the target image according to the segmentation information.
For explanations of steps S201 to S203, refer to the explanations of steps S101 to S103 above, which are not repeated here.
S204: According to the matting target, blur the part of the target image outside the matting target to obtain a background-blurred image.
After the matting target in the target image is determined, the region of the target image outside the matting target can be blurred. The blurring method may be box filtering, normalized box filtering, Gaussian filtering, or the like.
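A sketch of this step using Gaussian filtering via OpenCV (box or normalized box filtering would slot in the same way); blending the blurred and original images with the predicted alpha is one straightforward realization, and the kernel size is illustrative.

```python
import cv2
import numpy as np

def blur_background(image, alpha, ksize=21):
    """Blur the part of the target image outside the matting target.

    image: BGR uint8 (H, W, 3); alpha: float (H, W) in [0, 1] from the model.
    """
    blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
    a = alpha[..., None]  # (H, W, 1) so it broadcasts over the color channels
    out = a * image.astype(np.float32) + (1.0 - a) * blurred.astype(np.float32)
    return out.astype(np.uint8)
```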
The above method can be used for background blurring in camera shooting scenarios, with a portrait as the matting target, so that in the resulting image the regions other than the portrait are blurred, making the portrait clearer and more prominent and avoiding interference from the background.
FIG. 3 is a schematic diagram of an image background blurring effect according to an exemplary embodiment. As shown in FIG. 3, the terminal device may be a mobile phone. For example, the user takes a photo with the phone camera to obtain the target image. The user can then tap the photo to trigger the phone to display function controls such as background blurring; tapping the control for background blurring triggers the phone to execute the image processing method of the present disclosure to obtain the matting target and to continue with blurring processing based on the obtained matting target, producing a background-blurred image.
FIG. 4 is a flowchart of an image background replacement method according to an exemplary embodiment. As shown in FIG. 4, the method includes:
S401: Acquire a target image.
S402: Determine, by a target matting model, segmentation information for segmenting the target image, where the target matting model is obtained by alternately training a basic matting network on a first sample segmentation image and a sample matting image, the basic matting network is obtained by training an original matting network on a second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label.
S403: Determine the matting target in the target image according to the segmentation information.
For explanations of steps S401 to S403, refer to the explanations of steps S101 to S103 above, which are not repeated here.
S404: Acquire a target background image.
The target background image is the image to be used as the replacement background for the target image, and it may be any image.
S405: Perform matting on the target image according to the matting target to obtain a matting target image.
After the matting target in the target image is determined, matting can be performed on it; for example, the matting target is separated from the target image and saved as a new image, that is, the matting target image, or the region of the target image outside the matting target is directly made transparent to obtain the matting target image.
S406: Composite the matting target image and the target background image to obtain a background-replaced image.
After the matting target image is obtained, it can be composited onto the target background image, where the region of the target background image that coincides with the matting target is covered by the matting target, thereby producing the background-replaced image.
The above method can be used for background replacement of an image: the segmentation information of the input target image is obtained, the matting target is extracted, and with a new target background image the background-replaced image is obtained according to the formula I = αF + (1 - α)B, where α is the segmentation information output by the model, F is the target image, and B is the target background image. In this way, the background of the existing target image can be replaced with a more attractive one, so that the resulting background-replaced image still contains the matting target while looking better.
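The formula I = αF + (1 - α)B maps directly to code; a minimal sketch, assuming α is the model's per-pixel output and the background is resized here to match the target image:

```python
import cv2
import numpy as np

def replace_background(foreground, background, alpha):
    """Composite per I = alpha * F + (1 - alpha) * B.

    foreground: target image F, uint8 (H, W, 3); alpha: float (H, W) in [0, 1];
    background: target background image B, any size (resized to match F).
    """
    h, w = foreground.shape[:2]
    background = cv2.resize(background, (w, h))
    a = alpha[..., None].astype(np.float32)
    out = a * foreground.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
    return out.astype(np.uint8)
```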
FIG. 5 is a schematic diagram of an image background replacement effect according to an exemplary embodiment. As shown in FIG. 5, the terminal device may be a mobile phone. For example, the user takes a photo with the phone camera to obtain the target image. The user can then tap the photo to trigger the phone to display function controls such as background replacement; tapping the control for background replacement triggers the phone to display an option for selecting a background image. After the user finishes selecting a background image, the image processing method of the present disclosure is executed to obtain the matting target, and background replacement processing continues based on the obtained matting target and the background image, producing a background-replaced image.
FIG. 6 is a schematic structural diagram of an original matting network according to an exemplary embodiment. As shown in FIG. 6, an exemplary embodiment of the present disclosure further provides an original matting network used to train the target matting model in the above image processing method. The original matting network includes a feature extraction module, an atrous convolution pooling module, an upsampling module, and a multiple upsampling module.
The overall structure of the original matting network is an encoder-decoder model. The encoder part is the feature extraction module, which extracts features of the target image. To capture more contextual information, an ASPP (Atrous Spatial Pyramid Pooling) module, that is, the atrous convolution pooling module, is added. A multi-level decoder part (the upsampling module) then upsamples step by step, and finally a multiple upsampling convolution module (the multiple upsampling module) obtains high-resolution features and outputs a high-resolution prediction, that is, relatively accurate segmentation information.
Compared with related matting models for video, the above original matting network uses a multiple upsampling module in place of a deep guided filtering module, solving the problem that video matting models in the related art are difficult to apply on mobile terminals: it removes the extra modules specific to video, and the feature extraction module uses a network structure that is more lightweight and better suited to model quantization, thereby addressing the long runtime and high power consumption on mobile devices and making the network more suitable for mobile deployment.
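The following PyTorch sketch shows the overall encoder-ASPP-decoder layout just described. The channel widths, dilation rates, and normalization choices are assumptions for illustration; only the structure (a strided encoder producing 1/2 to 1/16 features, a depthwise-separable ASPP, and a skip-connected decoder) follows the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous convolution pooling sketch with depthwise-separable branches
    at several dilation rates (the rates are illustrative)."""
    def __init__(self, ch, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(  # depthwise atrous conv followed by pointwise conv
                nn.Conv2d(ch, ch, 3, padding=r, dilation=r, groups=ch, bias=False),
                nn.Conv2d(ch, ch, 1, bias=False),
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
            for r in rates])
        self.project = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class MattingNet(nn.Module):
    """Encoder (strided convs) -> ASPP -> decoder with skip connections;
    feature sizes are 1/2, 1/4, 1/8, 1/16, 1/16 of the input."""
    def __init__(self):
        super().__init__()
        widths, strides = [16, 32, 64, 128, 128], [2, 2, 2, 2, 1]
        chans, blocks = 3, []
        for w, s in zip(widths, strides):
            blocks.append(nn.Sequential(
                nn.Conv2d(chans, w, 3, stride=s, padding=1, bias=False),
                nn.BatchNorm2d(w), nn.ReLU(inplace=True)))
            chans = w
        self.encoder = nn.ModuleList(blocks)
        self.aspp = ASPP(widths[-1])
        self.decoder, prev = nn.ModuleList(), widths[-1]
        for skip in reversed(widths[:-1]):  # 128, 64, 32, 16
            self.decoder.append(nn.Sequential(
                nn.Conv2d(prev + skip, skip, 3, padding=1, bias=False),
                nn.BatchNorm2d(skip), nn.ReLU(inplace=True)))
            prev = skip
        self.head = nn.Conv2d(prev, 1, 3, padding=1)  # fine segmentation result

    def forward(self, x):
        feats = []
        for block in self.encoder:
            x = block(x)
            feats.append(x)
        x = self.aspp(feats[-1])
        for block, skip in zip(self.decoder, reversed(feats[:-1])):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = block(torch.cat([x, skip], dim=1))
        return torch.sigmoid(self.head(x))  # at 1/2 of the input resolution

pred = MattingNet()(torch.rand(1, 3, 512, 512))  # shape (1, 1, 256, 256)
```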
FIG. 7 is a schematic flowchart of an image processing method according to an exemplary embodiment. As shown in FIG. 7, based on the target matting model trained from the above original matting network, the image processing method includes:
S701: Acquire a target image.
S702: Perform feature extraction on the target image with the feature extraction module to obtain original feature vectors.
S703: Perform context extraction on the original feature vectors with the atrous convolution pooling module to obtain context feature vectors.
S704: Upsample the context feature vectors and the original feature vectors with the upsampling module to obtain a fine segmentation result.
S705: Upsample the fine segmentation result and the target image with the multiple upsampling module to obtain the segmentation information.
S706: Determine the matting target in the target image according to the segmentation information.
First, the feature extraction module can use a more lightweight network structure suited to model quantization. It includes multiple convolutional layers, for example five, and the stride ratio of the last convolutional layer is modified so that the final output feature size of the feature extraction module is 1/16 of the input, keeping the receptive field consistent with the original network while maintaining the feature map size. The feature sizes extracted in turn by the feature extraction module are 1/2, 1/4, 1/8, 1/16, and 1/16 of the input. Performing feature extraction on the target image with the feature extraction module yields the original feature vectors. To capture more contextual information, an ASPP (Atrous Spatial Pyramid Pooling) module, that is, the atrous convolution pooling module, is added; performing context extraction on the original feature vectors with it yields the context feature vectors. For speed, depthwise separable convolutions can replace ordinary convolutions in the ASPP module. The decoder module (the upsampling module) consists of upsampling convolution modules of successively increasing size; upsampling the context feature vectors and the original feature vectors with the upsampling module yields the fine segmentation result. Specifically, the number of upsampling convolution modules equals the number of convolutional layers in the feature extraction module, and the input of each upsampling convolution module consists of the output of the previous module (the context feature vectors output by the atrous convolution pooling module, or the fine segmentation result output by the preceding upsampling convolution module) and the output of the corresponding part of the feature extraction module (the original feature vectors output by the corresponding convolutional layer).
The target image includes two sub-images of different sizes; for example, one sub-image is 1024-sized and the other is 512-sized, where a 1024-sized image is 1024 pixels in both width and height and a 512-sized image is 512 pixels in both width and height. The smaller sub-image is input into the feature extraction module, which finally outputs the fine segmentation result.
To obtain higher-precision segmentation information, the larger sub-image of the target image and the preliminary prediction (the fine segmentation result) are input into a multiple upsampling module, specifically a 2x upsampling convolution module, to obtain the final segmentation information. The multiple upsampling module consists of convolutional layers and upsampling, which is more convenient for mobile deployment.
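The 2x multiple upsampling module can be sketched in the same vein: it receives the larger sub-image together with the preliminary fine segmentation result and refines the prediction at the higher resolution. The layer sizes below are assumptions; only the conv-plus-upsampling composition follows the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Upsample2x(nn.Module):
    """Multiple (2x) upsampling module sketch: convolution layers plus
    upsampling, refining the preliminary prediction with the larger sub-image."""
    def __init__(self, mid=16):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(3 + 1, mid, 3, padding=1, bias=False),  # RGB + alpha
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 3, padding=1))

    def forward(self, large_image, fine_result):
        # 2x upsampling of the preliminary prediction to the 1024-sized input.
        up = F.interpolate(fine_result, size=large_image.shape[-2:],
                           mode="bilinear", align_corners=False)
        return torch.sigmoid(self.refine(torch.cat([large_image, up], dim=1)))
```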
FIG. 8 is a schematic structural diagram of an original matting network according to an exemplary embodiment. As shown in FIG. 8, for the training stage, the original matting network includes an upsampling module composed of multiple upsampling convolution modules, with each upsampling convolution module connected to a Segmentation prediction head (coarse segmentation prediction head) that outputs the coarse segmentation result upsampled by the corresponding upsampling convolution module.
By connecting a coarse segmentation prediction head to each upsampling convolution module, each upsampling convolution module can output a corresponding coarse segmentation result after each round of training, yielding multiple coarse segmentation results from which a more accurate semantic segmentation loss can be computed, so that the model parameters are better optimized, convergence is accelerated, and training speed is improved.
FIG. 9 is a schematic structural diagram of a basic matting network according to an exemplary embodiment. As shown in FIG. 9, for the training stage, the basic matting network includes an upsampling module composed of multiple upsampling convolution modules, with each upsampling convolution module connected to both a Segmentation prediction head (coarse segmentation prediction head) and an Alpha prediction head (fine segmentation prediction head). Each coarse segmentation prediction head outputs the coarse segmentation result upsampled by the corresponding upsampling convolution module, and each fine segmentation prediction head outputs the corresponding upsampled fine segmentation result.
By connecting a coarse segmentation prediction head and a fine segmentation prediction head to each upsampling convolution module, each upsampling convolution module can output a corresponding coarse segmentation result and fine segmentation result after each round of training, yielding multiple coarse and fine segmentation results from which a more accurate semantic segmentation loss and total fine segmentation loss can be computed, so that the model parameters are better optimized, convergence is accelerated, and training speed is improved.
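A sketch of the per-module head pairing; 1x1 convolutions are assumed for the heads, which the description does not specify.

```python
import torch.nn as nn

class DualHeads(nn.Module):
    """Attach a Segmentation (coarse) head and an Alpha (fine) head to the
    output of one upsampling convolution module."""
    def __init__(self, ch):
        super().__init__()
        self.seg_head = nn.Conv2d(ch, 1, 1)    # coarse segmentation result
        self.alpha_head = nn.Conv2d(ch, 1, 1)  # fine segmentation result

    def forward(self, feat):
        return self.seg_head(feat), self.alpha_head(feat)
```

During training, one such head pair per upsampling convolution module yields the multiple coarse and fine segmentation results consumed by the losses described below.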
An exemplary embodiment of the present disclosure further provides a training method for the target matting model; the trained target matting model is used to implement the image processing method of any of the above embodiments. The training method includes two parts: sample data acquisition and model training.
Sample data acquisition is used to obtain the first sample segmentation images, the second sample segmentation images, and the sample matting images. For example, the sample data include a semantic segmentation dataset and a matting dataset. The first and second sample segmentation images may be images from the semantic segmentation dataset, and the sample matting images are images from the matting dataset; both datasets contain self-collected data and public datasets. Specifically, about 70,000 self-collected semantic segmentation images are used, plus the public Dark Complexion Portrait Segmentation Dataset; the self-collected matting dataset contains about 3,700 high-precision annotations, plus public datasets.
To obtain more diverse sample data, so that the trained target matting model produces more accurate segmentation information, the collected sample data can also undergo data preprocessing, that is, data augmentation. Specifically, the semantic segmentation data input size is 512, and the corresponding preprocessing includes random scaling, horizontal flipping, rotation, and color jittering. The matting data input size is 1024, downsampled to 512 before being input into the original matting network; matting data preprocessing includes affine transformation, rotation, flipping, color jittering, and random noise or sharpening.
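One way to realize this preprocessing, sketched here with the albumentations library; the probabilities and parameter ranges are assumptions, since the description lists only the operation types.

```python
import albumentations as A

# Semantic segmentation preprocessing (input size 512): random scaling,
# horizontal flipping, rotation, and color jittering, applied jointly to
# the image and its first segmentation label.
seg_aug = A.Compose([
    A.RandomScale(scale_limit=0.2, p=0.5),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05, p=0.5),
    A.PadIfNeeded(min_height=512, min_width=512),
    A.RandomCrop(height=512, width=512),
])
# The matting pipeline would add affine transforms and random noise or
# sharpening analogously (e.g. A.Affine, A.GaussNoise, A.Sharpen).

sample = seg_aug(image=image, mask=first_label)  # image: HxWx3, mask: HxW
image_aug, mask_aug = sample["image"], sample["mask"]
```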
Because matting data (sample matting images) require more detailed annotation and greater effort, the amount of matting data is small. To enlarge it, background replacement can be performed on existing matting data to produce new matting data: a new background image is obtained, the background of the original sample matting image is replaced, and the foreground of the matting data is combined with the new background image to obtain a new sample matting image. During background replacement, because the color-space distributions of the background image and the foreground of the original sample matting image usually differ considerably, color transfer can be applied to the foreground image so that the foreground and background blend more naturally and realistically.
The first sample segmentation image and the second sample segmentation image may be the same or different. The first and second sample segmentation images carry the first segmentation label, and the sample matting images carry the second segmentation label; the two labels have different segmentation granularities, with the granularity of the first greater than that of the second. For example, the first segmentation label may be a binary label, with the matting target labeled 1 and the region outside it labeled 0; the second segmentation label may be a multi-class label, with the interior of the matting target labeled 1 and a transition region at its edge in which the label value gradually transitions from 1 to 0 (for example, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, and 0.1), the label value approaching 1 closer to the interior, and the rest of the image labeled 0.
Model training can then be performed based on the first sample segmentation images, second sample segmentation images, and sample matting images obtained above.
Model training is divided into two stages: single-task training and dual-task training. Single-task training is semantic segmentation training, specifically training the original matting network to obtain the basic matting network; dual-task training alternates semantic segmentation training and matting training, specifically training the basic matting network to obtain the target matting model.
FIG. 10 is a flowchart of single-task training according to an exemplary embodiment. As shown in FIG. 10, it includes:
S1001: Perform multiple rounds of iterative segmentation training on the original matting network based on the multiple second sample segmentation images.
S1002: After each round of iterative training, obtain the multiple coarse segmentation results output by that round of segmentation training.
S1003: Obtain the semantic segmentation loss corresponding to that round of segmentation training based on the multiple coarse segmentation results it output and the first segmentation labels carried by the second sample segmentation images used in the round.
S1004: Optimize the original matting network according to the semantic segmentation loss corresponding to that round of segmentation training.
S1005: When the original matting network converges, stop training to obtain the basic matting network.
The multiple second sample segmentation images can be divided into a training set, a test set, and a validation set, and multiple rounds of segmentation training are performed on the original matting network, whose structure is as explained above and is not repeated here. For the training stage, each upsampling convolution module in the upsampling module of the original matting network is connected to a coarse segmentation prediction head that outputs the corresponding upsampled coarse segmentation result, so after each round of iterative training, the multiple coarse segmentation results output by that round can be obtained. The semantic segmentation loss for the round is then computed from these coarse segmentation results and the first segmentation labels carried by the second sample segmentation images used in the round, and adjustment parameters are computed from this loss so that the original matting network can be optimized accordingly. Training and optimization proceed over multiple rounds until the original matting network converges, that is, until the computed semantic segmentation loss no longer changes or its rate of change falls below a preset threshold, for example 0.1; alternatively, a number of iterations for single-task training can be set, and training stops when it is reached, yielding the basic matting network. In single-task training, the initial learning rate may be 0.0001, the RMSprop optimizer may be used, and the learning rate is reduced with a factor of 0.9 every 8 iterations.
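A training-loop sketch for this stage. The model is assumed to return one coarse segmentation output per prediction head; summing the per-head losses, using binary cross-entropy, and stepping the scheduler once per epoch are our interpretations of the description, not prescribed by it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_single_task(model, train_loader, num_epochs):
    """Single-task semantic segmentation training: RMSprop starting at 1e-4,
    with the learning rate reduced by a factor of 0.9 every 8 scheduler steps."""
    seg_loss = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.9)
    for _ in range(num_epochs):
        for images, first_labels in train_loader:  # labels: (N, 1, H, W) floats
            optimizer.zero_grad()
            coarse_results = model(images)          # one logit map per head
            loss = sum(seg_loss(r, F.interpolate(first_labels, size=r.shape[-2:]))
                       for r in coarse_results)     # resize labels to each head
            loss.backward()
            optimizer.step()
        scheduler.step()
```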
Single-task training improves the robustness of image segmentation by the resulting basic matting network, allows the matting target to be located accurately, and avoids holes appearing inside the matting target.

Figure 11 is a flowchart of dual-task training according to an exemplary embodiment. As shown in Figure 11, it includes:

S1101. Perform multiple rounds of alternating iterative segmentation training and matting training on the basic matting network based on the multiple first sample segmentation images and the multiple sample matting images.

S1102. After each round of segmentation training, obtain the semantic segmentation loss corresponding to that round.

S1103. Optimize the basic matting network according to the semantic segmentation loss corresponding to the current round of segmentation training.

S1104. After each round of matting training, obtain the total fine segmentation loss corresponding to that round based on the multiple coarse segmentation results, the multiple fine segmentation results, and the segmentation information output in this round of matting training, together with the second segmentation label carried by the sample matting images used in this round.

S1105. Optimize the optimized basic matting network again according to the total fine segmentation loss corresponding to the current round of matting training.

S1106. When the basic matting network converges, stop training to obtain the target matting model.

In dual-task training, training continues from the basic matting network obtained by single-task training. In one example, the basic matting network undergoes multiple rounds of alternating iterative segmentation training and matting training; that is, in each round, one pass of segmentation training is performed first, followed by one pass of matting training. The segmentation training is as described above and is not repeated here.
After the basic matting network has been optimized through segmentation training, matting training continues on the optimized network. The basic matting network includes an upsampling module composed of multiple upsampling convolution modules; each upsampling convolution module is connected to both a coarse segmentation prediction head and a fine segmentation prediction head. Each coarse segmentation prediction head is used to output the coarse segmentation result upsampled by the corresponding upsampling convolution module, and each fine segmentation prediction head is used to output the fine segmentation result upsampled by the corresponding upsampling convolution module. Therefore, in each pass of matting training, the coarse and fine segmentation prediction heads output multiple coarse segmentation results and multiple fine segmentation results, and the final segmentation information is also output. Specifically, the coarse segmentation prediction head connected to each upsampling convolution module outputs one coarse segmentation result, and the fine segmentation prediction head connected to each upsampling convolution module outputs one fine segmentation result.
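A minimal sketch of one upsampling convolution module with its two prediction heads is given below. The layer composition, channel sizes, and the use of sigmoid outputs are assumptions; the disclosure specifies only that each upsampling convolution module is connected to one coarse and one fine segmentation prediction head.

```python
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """One upsampling convolution module with coarse and fine heads (sketch)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.coarse_head = nn.Conv2d(out_ch, 1, kernel_size=1)  # coarse mask
        self.fine_head = nn.Conv2d(out_ch, 1, kernel_size=1)    # fine alpha

    def forward(self, x):
        feat = self.up(x)
        coarse = torch.sigmoid(self.coarse_head(feat))
        fine = torch.sigmoid(self.fine_head(feat))
        return feat, coarse, fine
```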
Then, the total fine segmentation loss corresponding to this round of matting training can be obtained from the multiple coarse segmentation results, the multiple fine segmentation results, and the segmentation information output in this round of matting training, together with the second segmentation label carried by the sample matting images used in this round. The optimized basic matting network is then optimized again according to this total fine segmentation loss, until the basic matting network converges, that is, until the computed semantic segmentation loss and total fine segmentation loss no longer change or their rate of change is less than a preset threshold, which may be 0.1; alternatively, a number of alternating iterations may be set for dual-task training, and when that number is reached, training stops and the target matting model is obtained. In dual-task training, the initial learning rate may be 0.00001, the optimizer may be the RMSprop optimizer, and the learning rate is decayed with a momentum of 0.9, i.e., multiplied by 0.9, every 4 epochs until it reaches 0.000001, after which it no longer changes.

Using the supervision information provided by the first sample segmentation images during training of the basic matting network to assist the training of the basic matting network on the sample matting images improves the accuracy of locating the matting target while also improving the precision of the segmentation information for the matting target, and it speeds up model training. As a result, more precise segmentation information can be output with only the target image as input and without any additional auxiliary image, so that a more precise matting target is obtained.
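The alternating schedule may be sketched as follows, with the stated dual-task hyperparameters (RMSprop, initial learning rate 0.00001, decay by a factor of 0.9 every 4 epochs, floored at 0.000001). The loader names, the loss-function names, and the model's output signature are illustrative assumptions, and the decay cadence is applied once per alternating round for brevity.

```python
from itertools import cycle
from torch.optim import RMSprop
from torch.optim.lr_scheduler import StepLR

def dual_task_training(model, seg_loader, matting_loader,
                       seg_loss_fn, matting_loss_fn, num_rounds=1000):
    """One segmentation pass, then one matting pass, per round (sketch)."""
    optimizer = RMSprop(model.parameters(), lr=1e-5)
    scheduler = StepLR(optimizer, step_size=4, gamma=0.9)
    seg_iter, mat_iter = cycle(seg_loader), cycle(matting_loader)
    for _ in range(num_rounds):
        # Segmentation pass: coarse heads supervised by the first segmentation label.
        image, first_label = next(seg_iter)
        _, coarse_preds, _ = model(image)
        loss = seg_loss_fn(coarse_preds, first_label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Matting pass: all outputs supervised via the total fine segmentation loss.
        image, alpha_label = next(mat_iter)
        alpha, coarse_preds, fine_preds = model(image)
        loss = matting_loss_fn(alpha, coarse_preds, fine_preds, alpha_label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        scheduler.step()
        for group in optimizer.param_groups:  # keep the learning rate at or above 1e-6
            group["lr"] = max(group["lr"], 1e-6)
    return model
```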
Figure 12 is a flowchart of a method for obtaining the total fine segmentation loss according to an exemplary embodiment. As shown in Figure 12, the method includes:

S1201. Binarize the second segmentation label carried by the sample matting images in the current round of matting training according to a preset segmentation value, to obtain the binarized segmentation label corresponding to those sample matting images.
Matting training uses sample matting images, which carry the more precise second segmentation label, whose values are not only 1 and 0 but also include values between 1 and 0, whereas a coarse segmentation result contains only the values 0 and 1. In this case, to compute the semantic segmentation loss, the second segmentation label is binarized according to a preset segmentation value. For example, with a preset segmentation value of 0.1, every value in the second segmentation label greater than or equal to 0.1 is replaced with 1 and every value less than 0.1 is replaced with 0, yielding a binarized segmentation label containing only 1s and 0s.
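This binarization step can be expressed directly, using the example threshold of 0.1:

```python
import torch

def binarize_alpha_label(alpha_label: torch.Tensor, threshold: float = 0.1) -> torch.Tensor:
    """Values >= threshold become 1, values < threshold become 0."""
    return (alpha_label >= threshold).float()
```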
S1202. Obtain the semantic segmentation loss corresponding to this round of matting training based on the multiple coarse segmentation results output in this round and the binarized segmentation label.

Once the binarized segmentation label is obtained, the semantic segmentation loss corresponding to this round of matting training is computed from the multiple coarse segmentation results output in this round and the binarized segmentation label.

S1203. Obtain the target fine segmentation loss corresponding to this round of matting training based on the multiple fine segmentation results and the segmentation information output in this round, together with the second segmentation label.

The fine segmentation results and the segmentation information contain values ranging from 0 to 1, inclusive, so the second segmentation label can be used directly to compute the target fine segmentation loss.

S1204. Obtain the total fine segmentation loss corresponding to this round of matting training according to the semantic segmentation loss and the target fine segmentation loss corresponding to this round.
The total fine segmentation loss corresponding to this round of matting training can be obtained as a weighted sum of the semantic segmentation loss and the target fine segmentation loss corresponding to this round.
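As a sketch, with the weights left as unspecified assumptions:

```python
def total_fine_segmentation_loss(semantic_loss, target_fine_loss,
                                 w_semantic=1.0, w_fine=1.0):
    """Weighted sum of the two components; the weights are illustrative."""
    return w_semantic * semantic_loss + w_fine * target_fine_loss
```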
In this embodiment, matting training outputs multiple coarse segmentation results as well as multiple fine segmentation results and one piece of segmentation information, so that the computed total fine segmentation loss includes both a semantic segmentation loss and a target fine segmentation loss. This allows the parameters of each upsampling convolution module to be adjusted more precisely, which speeds up model training.

Figure 13 is a flowchart of a method for obtaining a semantic segmentation loss according to an exemplary embodiment; the method is used to compute the segmentation loss corresponding to segmentation training. As shown in Figure 13, it includes:

S1301. Obtain a first semantic segmentation sub-loss based on the distances between the multiple coarse segmentation results and the first segmentation label corresponding to the current round of segmentation training.

For the multiple coarse segmentation results, the distance between each coarse segmentation result output in this round and the first segmentation label corresponding to this round of segmentation training is computed, yielding multiple individual first semantic segmentation sub-losses. Here, the current round of segmentation training may be training of the original matting network with second sample segmentation images, in which case the first segmentation label corresponding to this round is the one carried by the second sample segmentation images used in this round; the current round may also be training of the basic matting network with first sample segmentation images, in which case the first segmentation label corresponding to this round is the one carried by the first sample segmentation images used in this round. That is, each coarse segmentation result, together with the first segmentation label corresponding to this round, yields one individual first semantic segmentation sub-loss, and the final first semantic segmentation sub-loss is computed from the multiple individual first semantic segmentation sub-losses, specifically by weighted summation.
The individual first semantic segmentation sub-loss may be computed as:

p_t = m_p if m_g = 1, and p_t = 1 - m_p otherwise

L_f = -β(1 - p_t)^γ · log(p_t)

where m_g is the true label value corresponding to the first segmentation label, m_p is the value of the output coarse segmentation result, β is a parameter controlling the weight of each sample's influence on the loss, γ is a parameter adjusting the focus on hard-to-classify samples, and L_f is the individual first semantic segmentation sub-loss.
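A sketch of this sub-loss for per-pixel coarse predictions, following the standard focal-loss form above; the default values of β and γ are common choices, not values from the disclosure.

```python
import torch

def focal_loss(m_p, m_g, beta=0.25, gamma=2.0, eps=1e-6):
    """Binary focal loss between a coarse prediction m_p in (0, 1) and a
    binary label m_g; beta weights samples, gamma focuses on hard examples."""
    p_t = torch.where(m_g > 0.5, m_p, 1.0 - m_p)
    return torch.mean(-beta * (1.0 - p_t) ** gamma * torch.log(p_t + eps))
```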
S1302. Obtain multiple pairs of sampling points from the multiple coarse segmentation results, and obtain a second semantic segmentation sub-loss based on the distances between the pairs of sampling points and the first segmentation label corresponding to the current round of segmentation training.

A sampling point pair consists of the predicted values of two points taken from a coarse segmentation result. Obtaining multiple pairs of sampling points from the multiple coarse segmentation results may specifically proceed as follows: in an output coarse segmentation result, the predicted values of two points on the edge of the matting target are collected as a sampling point pair; or the predicted values of two points inside the matting target are collected as a sampling point pair; or the predicted values of two points outside the matting target are collected as a sampling point pair, so that the prediction accuracy at the edge of the matting target can be optimized in a targeted manner. Each sampling point pair, together with the first segmentation label corresponding to this round of segmentation training, yields one individual second semantic segmentation sub-loss; the final second semantic segmentation sub-loss is computed from the multiple individual second semantic segmentation sub-losses, specifically by weighted summation.
其中,单一的第二语义分割子损失的计算公式可为:Among them, the calculation formula of a single second semantic segmentation sub-loss can be:
Figure PCTCN2022096483-appb-000003
Figure PCTCN2022096483-appb-000003
其中
Figure PCTCN2022096483-appb-000004
Figure PCTCN2022096483-appb-000005
代表的是两个采样点分别对应的预测值,s表示预设距离阈值,为了保证计算准确度,当采样点对的值与第一分割标签对应的真实标签值的距离大于预设阈值时,则不将该采样点对加入单一的第二语义分割子损失的计算,L r为单一的第二语义分 割子损失,|| || 1为求绝对值。
in
Figure PCTCN2022096483-appb-000004
and
Figure PCTCN2022096483-appb-000005
represents the predicted value corresponding to the two sampling points, and s represents the preset distance threshold. In order to ensure calculation accuracy, when the distance between the value of the sampling point pair and the real label value corresponding to the first segmentation label is greater than the preset threshold, Then the sampling point pair is not added to the calculation of the single second semantic segmentation sub-loss. L r is the single second semantic segmentation sub-loss, and || || 1 is the absolute value.
S1303. Obtain the semantic segmentation loss corresponding to this round of segmentation training based on the first semantic segmentation sub-loss and the second semantic segmentation sub-loss.

After the first and second semantic segmentation sub-losses are obtained, the first semantic segmentation sub-loss may be taken directly as the semantic segmentation loss corresponding to this round of segmentation training, the second semantic segmentation sub-loss may be taken directly as that loss, or the two sub-losses may be combined by weighted summation to obtain the semantic segmentation loss corresponding to this round of segmentation training.

Figure 14 is a flowchart of a method for obtaining a semantic segmentation loss according to an exemplary embodiment; the method is used to compute the semantic segmentation loss in matting training. As shown in Figure 14, it includes:

S1401. Obtain a third semantic segmentation sub-loss based on the distances between the multiple coarse segmentation results output in this round of matting training and the binarized segmentation label.

S1402. Obtain multiple pairs of sampling points from the multiple coarse segmentation results output in this round of matting training, and obtain a fourth semantic segmentation sub-loss based on the distances between the pairs of sampling points and the binarized segmentation label.

S1403. Obtain the semantic segmentation loss corresponding to this round of matting training based on the third semantic segmentation sub-loss and the fourth semantic segmentation sub-loss.

For the semantic segmentation loss in matting training, sample matting images are used, and these carry the more precise second segmentation label; the semantic segmentation loss must therefore be computed with the binarized segmentation label obtained by binarizing the second segmentation label. The computation is the same as that of the semantic segmentation loss in segmentation training, with the first segmentation label replaced by the binarized segmentation label, and is not repeated here.

Figure 15 is a flowchart of a method for obtaining a target fine segmentation loss according to an exemplary embodiment. As shown in Figure 15, the method includes:

S1501. Compute the mean absolute error between the segmentation information output in this round of matting training, as well as each of the multiple fine segmentation results, and the second segmentation label.

S1502. Determine the mean absolute error as the first fine segmentation sub-loss corresponding to this round of matting training.

In this method, the target fine segmentation loss may include a first fine segmentation sub-loss. One round of training outputs only one piece of segmentation information but may output multiple fine segmentation results. The segmentation information and the second segmentation label yield one mean absolute sub-error; each fine segmentation result and the second segmentation label likewise yield one mean absolute sub-error, so multiple fine segmentation results yield multiple mean absolute sub-errors. The mean absolute sub-error obtained from the segmentation information and those obtained from the multiple fine segmentation results are then combined by weighted summation to obtain the mean absolute error, which is determined as the first fine segmentation sub-loss corresponding to this round of matting training.
The mean absolute sub-error may be computed as:

L_1 = ||α_p - α_g||_1

where α_p is the segmentation information or a fine segmentation result, α_g is the corresponding second segmentation label, L_1 is the mean absolute sub-error, and ||·||_1 denotes the absolute value.
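A sketch of the first fine segmentation sub-loss, assuming equal weights (the text leaves them unspecified) and predictions already upsampled to the label's resolution:

```python
import torch

def first_fine_seg_subloss(alpha_final, fine_preds, alpha_label, weights=None):
    """Weighted sum of mean absolute sub-errors between the segmentation
    information / each fine segmentation result and the second (alpha) label."""
    preds = [alpha_final] + list(fine_preds)
    weights = weights or [1.0] * len(preds)
    sub_errors = [torch.mean(torch.abs(p - alpha_label)) for p in preds]
    return sum(w * e for w, e in zip(weights, sub_errors))
```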
The total fine segmentation loss corresponding to this round of matting training is then obtained as a weighted sum of the semantic segmentation loss and the target fine segmentation loss corresponding to this round.

The target fine segmentation loss may further include at least one of the following: a second fine segmentation sub-loss, a third fine segmentation sub-loss, and a fourth fine segmentation sub-loss. The second fine segmentation sub-loss may be obtained by computing multi-scale Laplacian losses between the segmentation information and each of the multiple fine segmentation results output in this round of matting training, on the one hand, and the second segmentation label, on the other. The third fine segmentation sub-loss may be obtained by computing gradient losses between the segmentation information and each of the multiple fine segmentation results and the second segmentation label. The fourth fine segmentation sub-loss may be obtained by computing composition losses between multiple predicted composite images and a label composite image, the multiple predicted composite images being images in which the matting targets obtained from the multiple fine segmentation results and the segmentation information output in this round of matting training are each composited with a background image, and the label composite image being an image in which the matting target obtained from the second segmentation label is composited with the same background image. When the target fine segmentation loss includes multiple fine segmentation sub-losses, these may be combined by weighted summation to obtain the target fine segmentation loss corresponding to this round of matting training.

Figure 16 is a flowchart of a method for obtaining a target fine segmentation loss according to an exemplary embodiment. As shown in Figure 16, the method includes:

S1601. Compute the mean absolute error between the segmentation information output in this round of matting training, as well as each of the multiple fine segmentation results, and the second segmentation label.

S1602. Determine the mean absolute error as the first fine segmentation sub-loss corresponding to this round of matting training.

S1603. Compute the multi-scale Laplacian losses between the segmentation information output in this round of matting training, as well as each of the multiple fine segmentation results, and the second segmentation label, to obtain the second fine segmentation sub-loss corresponding to this round of matting training.

S1604. Compute the gradient losses between the segmentation information output in this round of matting training, as well as each of the multiple fine segmentation results, and the second segmentation label, to obtain the third fine segmentation sub-loss corresponding to this round of matting training.

S1605. Compute the composition losses between multiple predicted composite images and a label composite image, to obtain the fourth fine segmentation sub-loss corresponding to this round of matting training, the multiple predicted composite images being images in which the matting targets obtained from the multiple fine segmentation results and the segmentation information output in this round are each composited with a background image, and the label composite image being an image in which the matting target obtained from the second segmentation label is composited with the background image.

S1606. Perform a weighted summation of the first fine segmentation sub-loss, the second fine segmentation sub-loss, the third fine segmentation sub-loss, and the fourth fine segmentation sub-loss, to obtain the target fine segmentation loss corresponding to this round of matting training.

The segmentation information and the second segmentation label yield one multi-scale Laplacian loss; each fine segmentation result and the second segmentation label likewise yield one multi-scale Laplacian loss, so multiple fine segmentation results yield multiple such losses. The multi-scale Laplacian loss obtained from the segmentation information and those obtained from the multiple fine segmentation results are then combined by weighted summation to obtain the second fine segmentation sub-loss.
The multi-scale Laplacian loss may be computed as:

L_lap = Σ_s ||f_s(α_p) - f_s(α_g)||_1

where α_p is the segmentation information or a fine segmentation result, α_g is the corresponding second segmentation label, f_s(x) denotes the Laplacian pyramid computation, in which the input is successively downsampled and similarity is computed at different scales, x in f_s(x) may be α_p or α_g, and L_lap is the multi-scale Laplacian loss.
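A sketch of this loss for (N, 1, H, W) alpha maps is given below; the five pyramid levels and the 5x5 binomial blur kernel are common choices and are assumptions here, since the text only specifies successive downsampling with similarity computed at each scale.

```python
import torch
import torch.nn.functional as F

def _blur(x):
    # 5x5 binomial kernel, a common Laplacian-pyramid choice (assumption).
    k = torch.tensor([1.0, 4.0, 6.0, 4.0, 1.0])
    k = torch.outer(k, k)
    k = (k / k.sum()).to(x)[None, None]
    return F.conv2d(x, k, padding=2)

def laplacian_pyramid_loss(alpha_p, alpha_g, levels=5):
    """L1 distance between corresponding Laplacian-pyramid band-pass levels
    of the prediction and the label, summed over scales (sketch)."""
    loss = alpha_p.new_zeros(())
    for _ in range(levels):
        bp, bg = _blur(alpha_p), _blur(alpha_g)
        loss = loss + torch.mean(torch.abs((alpha_p - bp) - (alpha_g - bg)))
        alpha_p, alpha_g = F.avg_pool2d(bp, 2), F.avg_pool2d(bg, 2)
    return loss
```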
The segmentation information and the second segmentation label yield one gradient loss; each fine segmentation result and the second segmentation label likewise yield one gradient loss, so multiple fine segmentation results yield multiple gradient losses. The gradient loss obtained from the segmentation information and those obtained from the multiple fine segmentation results are then combined by weighted summation to obtain the third fine segmentation sub-loss.

The gradient loss may be computed as:

L_g = ||G(α_p) - G(α_g)||_1

where G(x) denotes the Sobel operator, α_p is the segmentation information or a fine segmentation result, α_g is the corresponding second segmentation label, x in G(x) may be α_p or α_g, and L_g is the gradient loss.
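Since the Sobel operator is standard, the gradient loss can be sketched directly:

```python
import torch
import torch.nn.functional as F

def sobel(x):
    """Sobel gradient magnitude of a single-channel map of shape (N, 1, H, W)."""
    kx = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]]).to(x)[None, None]
    ky = kx.transpose(-1, -2)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def gradient_loss(alpha_p, alpha_g):
    # L_g = ||G(alpha_p) - G(alpha_g)||_1, with G the Sobel operator.
    return torch.mean(torch.abs(sobel(alpha_p) - sobel(alpha_g)))
```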
Each matting target obtained from a fine segmentation result can be composited with the background image to give one predicted composite image, so multiple fine segmentation results give multiple predicted composite images, and the segmentation information can likewise be composited with the background image to give one predicted composite image. Each predicted composite image, together with the label composite image, yields one composition loss; multiple predicted composite images yield multiple composition losses, and a weighted summation of these composition losses gives the fourth fine segmentation sub-loss corresponding to this round of matting training.

The composition loss may be computed as:

L_C = ||c_p - c_g||_1

where c_p is a predicted composite image, c_g is the label composite image, and L_C is the composition loss.

To highlight differences at the edges of the matting target, the background image is a randomly selected new background image, different from the original background of the matting target.
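A sketch of the composition loss; compositing by alpha blending over the randomly chosen new background is an assumption, since the text states only that a new random background image is used:

```python
import torch

def composition_loss(alpha_p, alpha_g, foreground, new_background):
    """L1 distance between composites built with the predicted and label alphas."""
    c_p = alpha_p * foreground + (1.0 - alpha_p) * new_background
    c_g = alpha_g * foreground + (1.0 - alpha_g) * new_background
    return torch.mean(torch.abs(c_p - c_g))
```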
With the above methods, multiple loss-function computations are designed for segmentation training and matting training, so that more accurate loss functions can be computed and the segmentation information output by the trained target matting model becomes more precise.

Taking a portrait as the matting target as an example, a training method for a portrait-oriented target matting model is provided below, as follows:

Training of the target matting model may include two parts: sample data acquisition and model training.

Sample data acquisition includes dataset acquisition and data preprocessing.

Dataset acquisition: the datasets used are divided into two parts, a semantic segmentation dataset (including multiple first sample segmentation images and second sample segmentation images carrying the first segmentation label) and a matting dataset (including multiple sample matting images carrying the second segmentation label). They comprise self-collected data and public datasets. The self-collected semantic segmentation data amounts to about 70,000 images, supplemented by the public Dark Complexion Portrait Segmentation Dataset. The self-collected matting dataset contains about 3,700 high-precision annotations, plus public datasets.

Data preprocessing is then performed on the acquired datasets, specifically:
Data preprocessing is divided into two parts. The portrait segmentation data has an input size of 512, and its preprocessing includes random scaling, horizontal flipping, rotation, and color jittering. The portrait matting data has an input size of 1024 and is downsampled to 512 before being fed into the original matting network. Matting data preprocessing includes affine transformation, rotation, flipping, color jittering, and random noise or sharpening. Figure 17 is a schematic diagram of color transfer applied to a foreground image according to an exemplary embodiment. As shown in Figure 17, when augmenting the matting data, because the color-space distributions of the background image and the foreground image (an image in the matting data) usually differ substantially, color transfer is applied to the foreground image with a certain probability so that the foreground blends with the background more naturally and realistically. In the figure, the alpha annotation is the label of the foreground image, the preprocessed image is an image from the matting data after background replacement and color transfer, and the preprocessed alpha is the label corresponding to that image.
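The color transfer method is not specified in the text; one common choice is Reinhard-style mean/standard-deviation matching in LAB space, sketched below under that assumption:

```python
import numpy as np
import cv2

def color_transfer(foreground_bgr, background_bgr):
    """Match the foreground's LAB statistics to the background's (sketch)."""
    fg = cv2.cvtColor(foreground_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    bg = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    fg_mean, fg_std = fg.mean(axis=(0, 1)), fg.std(axis=(0, 1)) + 1e-6
    bg_mean, bg_std = bg.mean(axis=(0, 1)), bg.std(axis=(0, 1))
    out = (fg - fg_mean) / fg_std * bg_std + bg_mean
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)
```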
Model training:

The overall training process is divided into two stages: a semantic segmentation stage and a dual-task training stage. The semantic segmentation stage uses only portrait segmentation data, and the prediction heads at all levels use only the segmentation heads. The initial learning rate is 0.0001, the optimizer is RMSprop (Root Mean Square Prop), and every 8 epochs (an epoch being one pass of the data through the network, completing forward computation and backpropagation) the learning rate is decayed with a momentum of 0.9, i.e., adjusted to 0.9 times its previous value. The second stage trains semantic segmentation and matting together, using portrait segmentation data and portrait matting data, with semantic segmentation training and matting training performed alternately. This alternating training strategy plays a crucial role in maintaining the semantic robustness of the model when the amount of matting data is small and the network has no additional trimap or background image input. When portrait segmentation data is used, only the semantic segmentation loss is computed. To strengthen the semantic supervision in the matting data training stage, the labels of the matting data are binarized with a threshold of 0.1, and the semantic segmentation loss and the matting loss (the total fine segmentation loss) are computed together. In this stage, the initial learning rate is 0.00001, and the learning rate is decayed with a momentum of 0.9 every 4 epochs until it reaches 0.000001, where it remains unchanged.

The loss function consists of two parts, the semantic segmentation loss and the matting loss. The semantic segmentation loss comprises a focal loss and a loss function for portrait segmentation implemented on the basis of a ranking loss. The matting loss comprises an L1 loss, a multi-scale Laplacian loss, a gradient loss, and a composite image loss. The details are as follows:
Semantic segmentation loss: composed of a focal loss and an improved ranking loss. Assume the mask value predicted by the network is m_p and the corresponding true label value is m_g, where the mask value is the value of each pixel in the mask map predicted by the network, and the true label is the label carried by the sample image, for example, by the portrait segmentation data or the portrait matting data. The focal loss L_f is computed as:

p_t = m_p if m_g = 1, and p_t = 1 - m_p otherwise

L_f = -β(1 - p_t)^γ · log(p_t)

where β is a parameter controlling the weight of the influence of positive and negative samples on the loss, and γ adjusts the focus on hard-to-classify samples.
The ranking loss L_r is computed as:

L_r = ||(m_p^i - m_p^j) - (m_g^i - m_g^j)||_1

where m_p^i and m_p^j denote the mask values predicted for the two sampling points of a pair, m_g^i and m_g^j denote the corresponding label values, and s denotes a threshold: when the distance between a negative sample and its label is greater than this threshold, the pair is not added to the loss computation.
Figure 18 is a schematic diagram of sampling point pairs according to an exemplary embodiment. As shown in Figure 18, to optimize the prediction accuracy at portrait edges in a targeted way, two sampling schemes are designed: sampling at the portrait edge, and sampling inside the portrait and the background, so as to obtain a certain number of sampling point pairs for loss computation. In the figure, the original image is the target image, the label is the label corresponding to the target image, the edge sampling points are the multiple points sampled at the portrait edge, and the interior sampling points are the multiple points sampled inside the portrait and the background.
Matting loss: this loss has four components. Three of them are losses computed between the predicted alpha (i.e., the output fine segmentation results or segmentation information) and the label; the last one is a loss computed between composite images generated from the predicted alpha and from the label, respectively. The predicted alpha is a mask map in which each pixel corresponds to a mask value between 0 and 1. Assume the predicted alpha value is α_p and the corresponding label value is α_g. The first loss function is the L1 loss, computed as:

L_1 = ||α_p - α_g||_1
The second loss function is a multi-scale Laplacian loss, which successively downsamples the predicted alpha and computes similarity at different scales. It is computed as:

L_lap = Σ_s ||f_s(α_p) - f_s(α_g)||_1

where f_s(x) denotes the Laplacian pyramid computation.
The third loss is the gradient loss L_g; with G(x) denoting the Sobel operator, it is computed as:

L_g = ||G(α_p) - G(α_g)||_1
The last loss is the composite image loss L_C. To highlight differences at portrait edges, a random new background image is used to generate the composite images. With c_p denoting the image generated from the predicted alpha and c_g denoting the image generated from the label, it is computed as:

L_C = ||c_p - c_g||_1
The total loss is the sum of the above losses weighted in certain proportions.

For the sake of simple description, the above method embodiments are all expressed as a series of action combinations, but those skilled in the art should appreciate that the present disclosure is not limited by the described order of actions. Furthermore, those skilled in the art should also appreciate that the embodiments described above are preferred embodiments, and the steps involved are not necessarily required by the present disclosure.
Figure 19 is a structural block diagram of an image processing apparatus according to an exemplary embodiment. The image processing apparatus 1900 may be implemented in software, hardware, or a combination of software and hardware, and is used to perform the steps of the image processing method provided by the foregoing method embodiments. Referring to Figure 19, the image processing apparatus 1900 includes a first acquisition module 1901, a segmentation module 1902, and a matting target determination module 1903.

The first acquisition module 1901 is configured to acquire a target image;

the segmentation module 1902 is configured to determine, through a target matting model, segmentation information for segmenting the target image, the target matting model being obtained by alternately training a basic matting network based on first sample segmentation images and sample matting images, the basic matting network being obtained by training an original matting network based on second sample segmentation images, the first sample segmentation images and the second sample segmentation images carrying a first segmentation label, the sample matting images carrying a second segmentation label, and the segmentation granularity of the first segmentation label being greater than that of the second segmentation label;

the matting target determination module 1903 is configured to determine the matting target in the target image according to the segmentation information.

Optionally, the original matting network includes an upsampling module composed of multiple upsampling convolution modules, each upsampling convolution module being connected to a coarse segmentation prediction head, and each coarse segmentation prediction head being used to output the coarse segmentation result upsampled by the corresponding upsampling convolution module.
Optionally, the apparatus further includes:

a first training module configured to perform multiple rounds of iterative segmentation training on the original matting network based on the multiple second sample segmentation images;

a second acquisition module configured to acquire, after each round of iterative training, the multiple coarse segmentation results output in that round of segmentation training;

a first obtaining module configured to obtain the semantic segmentation loss corresponding to the current round of segmentation training based on the multiple coarse segmentation results output in this round and the first segmentation label carried by the second sample segmentation images used in this round;

a first optimization module configured to optimize the original matting network according to the semantic segmentation loss corresponding to the current round of segmentation training;

a second obtaining module configured to stop training when the original matting network converges, to obtain the basic matting network.
Optionally, the basic matting network includes an upsampling module composed of multiple upsampling convolution modules, each upsampling convolution module being connected to a coarse segmentation prediction head and a fine segmentation prediction head, each coarse segmentation prediction head being used to output the coarse segmentation result upsampled by the corresponding upsampling convolution module, and each fine segmentation prediction head being used to output the fine segmentation result upsampled by the corresponding upsampling convolution module.

Optionally, the apparatus further includes:

a second training module configured to perform multiple rounds of alternating iterative segmentation training and matting training on the basic matting network based on the multiple first sample segmentation images and the multiple sample matting images;

a third acquisition module configured to acquire, after each round of segmentation training, the semantic segmentation loss corresponding to that round;

a second optimization module configured to optimize the basic matting network according to the semantic segmentation loss corresponding to the current round of segmentation training;

a third obtaining module configured to obtain, after each round of matting training, the total fine segmentation loss corresponding to that round based on the multiple coarse segmentation results, the multiple fine segmentation results, and the segmentation information output in this round of matting training, together with the second segmentation label carried by the sample matting images used in this round;

a third optimization module configured to optimize the optimized basic matting network again according to the total fine segmentation loss corresponding to the current round of matting training;

a fourth obtaining module configured to stop training when the basic matting network converges, to obtain the target matting model.
Optionally, the third obtaining module includes:

a first obtaining submodule configured to binarize the second segmentation label carried by the sample matting images in the current round of matting training according to a preset segmentation value, to obtain the binarized segmentation label corresponding to those sample matting images;

a second obtaining submodule configured to obtain the semantic segmentation loss corresponding to this round of matting training based on the multiple coarse segmentation results output in this round and the binarized segmentation label;

a third obtaining submodule configured to obtain the target fine segmentation loss corresponding to this round of matting training based on the multiple fine segmentation results and the segmentation information output in this round, together with the second segmentation label;

a fourth obtaining submodule configured to obtain the total fine segmentation loss corresponding to this round of matting training according to the semantic segmentation loss and the target fine segmentation loss corresponding to this round.
Optionally, the apparatus further includes a semantic segmentation loss submodule, which includes:

a first obtaining subunit configured to obtain a first semantic segmentation sub-loss based on the distances between the multiple coarse segmentation results and the first segmentation label corresponding to the current round of segmentation training;

a second obtaining subunit configured to obtain multiple pairs of sampling points from the multiple coarse segmentation results and obtain a second semantic segmentation sub-loss based on the distances between the pairs of sampling points and the first segmentation label corresponding to the current round of segmentation training;

a third obtaining subunit configured to obtain the semantic segmentation loss corresponding to the current round of segmentation training based on the first semantic segmentation sub-loss and the second semantic segmentation sub-loss.
Optionally, the second obtaining submodule includes:

a fourth obtaining subunit configured to obtain a third semantic segmentation sub-loss based on the distances between the multiple coarse segmentation results output in the current round of matting training and the binarized segmentation label;

a fifth obtaining subunit configured to obtain multiple pairs of sampling points from the multiple coarse segmentation results output in the current round of matting training and obtain a fourth semantic segmentation sub-loss based on the distances between the pairs of sampling points and the binarized segmentation label;

a sixth obtaining subunit configured to obtain the semantic segmentation loss corresponding to the current round of matting training based on the third semantic segmentation sub-loss and the fourth semantic segmentation sub-loss.
Optionally, the target fine segmentation loss includes a first fine segmentation sub-loss;

the third obtaining submodule includes:

a computing subunit configured to compute the mean absolute error between the segmentation information output in the current round of matting training, as well as each of the multiple fine segmentation results, and the second segmentation label;

a determining subunit configured to determine the mean absolute error as the first fine segmentation sub-loss corresponding to the current round of matting training.
Optionally, the target fine segmentation loss further includes at least one of the following: a second fine segmentation sub-loss, a third fine segmentation sub-loss, and a fourth fine segmentation sub-loss; correspondingly, the third obtaining submodule further includes:

a seventh obtaining subunit configured to compute the multi-scale Laplacian losses between the segmentation information output in the current round of matting training, as well as each of the multiple fine segmentation results, and the second segmentation label, to obtain the second fine segmentation sub-loss corresponding to the current round of matting training; and/or

an eighth obtaining subunit configured to compute the gradient losses between the segmentation information output in the current round of matting training, as well as each of the multiple fine segmentation results, and the second segmentation label, to obtain the third fine segmentation sub-loss corresponding to the current round of matting training; and/or

a ninth obtaining subunit configured to compute the composition losses between multiple predicted composite images and a label composite image, to obtain the fourth fine segmentation sub-loss corresponding to the current round of matting training, the multiple predicted composite images being images in which the matting targets obtained from the multiple fine segmentation results and the segmentation information output in this round are each composited with a background image, and the label composite image being an image in which the matting target obtained from the second segmentation label is composited with the background image.
Optionally, the apparatus further includes:

a background blurring module configured to blur, according to the matting target, the portion of the target image other than the matting target, to obtain a background-blurred image.

Optionally, the apparatus further includes:

a fourth acquisition module configured to acquire a target background image;

a fifth obtaining module configured to perform matting processing on the target image according to the matting target, to obtain a matting target image;

a background replacement module configured to composite the matting target image and the target background image, to obtain a background-replaced image.
Optionally, the original matting network includes a feature extraction module, an atrous convolution pooling module, an upsampling module, and a multiple upsampling module;

the segmentation module 1902 includes:

a feature extraction submodule configured to perform feature extraction on the target image through the feature extraction module, to obtain an original feature vector;

a context extraction submodule configured to perform context extraction on the original feature vector through the atrous convolution pooling module, to obtain a context feature vector;

a first upsampling submodule configured to upsample the context feature vector and the original feature vector through the upsampling module, to obtain a fine segmentation result;

a second upsampling submodule configured to upsample the fine segmentation result and the target image through the multiple upsampling module, to obtain the segmentation information.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the devices in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments related to the method, and will not be described in detail here.
Figure 20 is a structural block diagram of an image processing device according to an exemplary embodiment. For example, the device 2000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Figure 20, the device 2000 may include one or more of the following components: a processing component 2002, a memory 2004, a power supply component 2006, a multimedia component 2008, an audio component 2010, an input/output interface 2012, a sensor component 2014, and a communication component 2016.
The processing component 2002 generally controls the overall operation of the device 2000, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 2002 may include one or more processors 2020 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 2002 may include one or more modules that facilitate interaction between the processing component 2002 and other components; for example, the processing component 2002 may include a multimedia module to facilitate interaction between the multimedia component 2008 and the processing component 2002.
The memory 2004 is configured to store various types of data to support operation at the device 2000. Examples of such data include instructions for any application or method operated on the device 2000, contact data, phonebook data, messages, pictures, videos, and so on. The memory 2004 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power supply component 2006 provides power to the various components of the device 2000. The power supply component 2006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 2000.
The multimedia component 2008 includes a screen that provides an output interface between the device 2000 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe action. In some embodiments, the multimedia component 2008 includes a front camera and/or a rear camera. When the device 2000 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 2010 is configured to output and/or input audio signals. For example, the audio component 2010 includes a microphone (MIC) configured to receive external audio signals when the device 2000 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 2004 or sent via the communication component 2016. In some embodiments, the audio component 2010 further includes a speaker for outputting audio signals.
The input/output interface 2012 provides an interface between the processing component 2002 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 2014 includes one or more sensors for providing status assessments of various aspects of the device 2000. For example, the sensor component 2014 may detect the on/off state of the device 2000 and the relative positioning of components (for example, the display and the keypad of the device 2000), and the sensor component 2014 may also detect a change in position of the device 2000 or of a component of the device 2000, the presence or absence of user contact with the device 2000, the orientation or acceleration/deceleration of the device 2000, and a change in temperature of the device 2000. The sensor component 2014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 2014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 2014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 2016 is configured to facilitate wired or wireless communication between the device 2000 and other devices. The device 2000 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 2016 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 2016 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 2000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 2004 including instructions, is also provided; the instructions are executable by the processor 2020 of the device 2000 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In another exemplary embodiment, a computer program product is also provided; the computer program product includes a computer program executable by a programmable device, and the computer program has a code portion for performing the above image processing method when executed by the programmable device.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

  1. An image processing method, comprising:
    acquiring a target image;
    determining, through a target matting model, segmentation information for segmenting the target image, wherein the target matting model is obtained by alternately training a basic matting network based on first sample segmentation images and sample matting images, the basic matting network is obtained by training an original matting network based on second sample segmentation images, the first sample segmentation images and the second sample segmentation images carry a first segmentation label, the sample matting images carry a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label; and
    determining a matting target in the target image according to the segmentation information.
  2. The image processing method according to claim 1, wherein the original matting network includes an upsampling module composed of a plurality of upsampling convolution modules, each upsampling convolution module is connected to a coarse segmentation prediction head, and each coarse segmentation prediction head is used to output the coarse segmentation result upsampled by the corresponding upsampling convolution module.
  3. The image processing method according to claim 1, wherein the basic matting network is obtained through the following steps:
    performing multiple rounds of iterative segmentation training on the original matting network according to a plurality of the second sample segmentation images;
    after each round of iterative training, obtaining a plurality of coarse segmentation results output by the current round of segmentation training;
    obtaining the semantic segmentation loss for the current round of segmentation training according to the plurality of coarse segmentation results output by the current round of segmentation training and the first segmentation label carried by the second sample segmentation image in the current round of segmentation training;
    optimizing the original matting network according to the semantic segmentation loss for the current round of segmentation training; and
    stopping training when the original matting network converges, obtaining the basic matting network.
  4. The image processing method according to claim 1, wherein the basic matting network includes an upsampling module composed of a plurality of upsampling convolution modules, each upsampling convolution module is connected to a coarse segmentation prediction head and a fine segmentation prediction head respectively, each coarse segmentation prediction head is used to output the coarse segmentation result upsampled by the corresponding upsampling convolution module, and each fine segmentation prediction head is used to output the fine segmentation result upsampled by the corresponding upsampling convolution module.
  5. The image processing method according to claim 1, wherein the target matting model is obtained through the following steps:
    performing multiple rounds of alternating iterative segmentation training and matting training on the basic matting network according to a plurality of the first sample segmentation images and a plurality of the sample matting images;
    after each round of segmentation training, obtaining the semantic segmentation loss for the current round of segmentation training;
    optimizing the basic matting network according to the semantic segmentation loss for the current round of segmentation training;
    after each round of matting training, obtaining the total fine-segmentation loss for the current round of matting training according to the plurality of coarse segmentation results, the plurality of fine segmentation results, and the segmentation information output by the current round of matting training, together with the second segmentation label carried by the sample matting image in the current round of matting training;
    optimizing the optimized basic matting network again according to the total fine-segmentation loss for the current round of matting training; and
    stopping training when the basic matting network converges, obtaining the target matting model.
  6. The image processing method according to claim 5, wherein
    obtaining the total fine-segmentation loss for the current round of matting training according to the plurality of coarse segmentation results, the plurality of fine segmentation results, and the segmentation information output by the current round of matting training, together with the second segmentation label carried by the sample matting image in the current round of matting training, comprises:
    binarizing, according to a preset segmentation value, the second segmentation label carried by the sample matting image in the current round of matting training, obtaining a binarized segmentation label corresponding to that sample matting image;
    obtaining the semantic segmentation loss for the current round of matting training according to the plurality of coarse segmentation results output by the current round of matting training and the binarized segmentation label;
    obtaining the target fine-segmentation loss for the current round of matting training according to the plurality of fine segmentation results, the segmentation information, and the second segmentation label output by the current round of matting training; and
    obtaining the total fine-segmentation loss for the current round of matting training according to the semantic segmentation loss and the target fine-segmentation loss for the current round of matting training.
  7. The image processing method according to claim 3 or 5, wherein obtaining the semantic segmentation loss for a round of segmentation training comprises:
    obtaining a first semantic segmentation sub-loss according to the distances between the plurality of coarse segmentation results and the first segmentation label for the current round of segmentation training;
    obtaining a plurality of sampling point pairs from the plurality of coarse segmentation results, and obtaining a second semantic segmentation sub-loss according to the distances between the sampling point pairs and the first segmentation label for the current round of segmentation training; and
    obtaining the semantic segmentation loss for the current round of segmentation training according to the first semantic segmentation sub-loss and the second semantic segmentation sub-loss.
  8. The image processing method according to claim 6, wherein obtaining the semantic segmentation loss for the current round of matting training according to the plurality of coarse segmentation results output by the current round of matting training and the binarized segmentation label comprises:
    obtaining a third semantic segmentation sub-loss according to the distances between the plurality of coarse segmentation results output by the current round of matting training and the binarized segmentation label;
    obtaining a plurality of sampling point pairs from the plurality of coarse segmentation results output by the current round of matting training, and obtaining a fourth semantic segmentation sub-loss according to the distances between the sampling point pairs and the binarized segmentation label; and
    obtaining the semantic segmentation loss for the current round of matting training according to the third semantic segmentation sub-loss and the fourth semantic segmentation sub-loss.
  9. The image processing method according to claim 6, wherein
    the target fine-segmentation loss includes a first fine-segmentation sub-loss; and
    obtaining the target fine-segmentation loss for the current round of matting training according to the plurality of fine segmentation results, the segmentation information, and the second segmentation label output by the current round of matting training comprises:
    calculating the mean absolute errors between the segmentation information and the plurality of fine segmentation results output by the current round of matting training, respectively, and the second segmentation label; and
    determining the mean absolute error as the first fine-segmentation sub-loss for the current round of matting training.
  10. The image processing method according to claim 9, wherein
    the target fine-segmentation loss further includes at least one of a second fine-segmentation sub-loss, a third fine-segmentation sub-loss, and a fourth fine-segmentation sub-loss, and correspondingly, obtaining the target fine-segmentation loss for the current round of matting training according to the plurality of fine segmentation results, the segmentation information, and the second segmentation label output by the current round of matting training further comprises:
    calculating multi-scale Laplacian losses between the segmentation information and the plurality of fine segmentation results output by the current round of matting training, respectively, and the second segmentation label, obtaining the second fine-segmentation sub-loss for the current round of matting training; and/or
    calculating gradient losses between the segmentation information and the plurality of fine segmentation results output by the current round of matting training, respectively, and the second segmentation label, obtaining the third fine-segmentation sub-loss for the current round of matting training; and/or
    calculating composition losses between a plurality of predicted composite images and a label composite image, respectively, obtaining the fourth fine-segmentation sub-loss for the current round of matting training, wherein the plurality of predicted composite images are images in which the plurality of matting targets, obtained from the plurality of fine segmentation results and the segmentation information output by the current round of matting training, are each composited with a background image, and the label composite image is an image in which the matting target obtained from the second segmentation label is composited with the background image.
  11. The image processing method according to claim 1, further comprising:
    blurring, according to the matting target, the portion of the target image outside the matting target, obtaining a background-blurred image.
  12. The image processing method according to claim 1, further comprising:
    acquiring a target background image;
    performing matting processing on the target image according to the matting target, obtaining a matting target image; and
    compositing the matting target image with the target background image, obtaining a background-replaced image.
  13. The image processing method according to claim 1, wherein
    the original matting network includes a feature extraction module, an atrous convolution pooling module, an upsampling module, and a multiple-upsampling module; and
    determining, through the target matting model, the segmentation information for segmenting the target image comprises:
    performing feature extraction on the target image through the feature extraction module to obtain an original feature vector;
    performing context extraction on the original feature vector through the atrous convolution pooling module to obtain a context feature vector;
    upsampling the context feature vector and the original feature vector through the upsampling module to obtain a fine segmentation result; and
    upsampling the fine segmentation result and the target image through the multiple-upsampling module to obtain the segmentation information.
  14. An image processing device, comprising:
    a first acquisition module, configured to acquire a target image;
    a segmentation module, configured to determine, through a target matting model, segmentation information for segmenting the target image, wherein the target matting model is obtained by alternately training a basic matting network based on first sample segmentation images and sample matting images, the basic matting network is obtained by training based on second sample segmentation images, the first sample segmentation images and the second sample segmentation images carry a first segmentation label, the sample matting images carry a second segmentation label, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label; and
    a matting target determination module, configured to determine a matting target in the target image according to the segmentation information.
  15. An image processing device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to perform the steps of the method according to any one of claims 1-13.
  16. A computer-readable storage medium having computer program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-13.
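The claims above describe the training mathematics in prose; for readers who find code easier to follow, the next few blocks sketch plausible PyTorch realizations. They are illustrative sketches only, not part of the disclosure or the claims, and every function name, signature, and hyperparameter in them is an assumption. First, one plausible reading of the semantic segmentation loss of claims 7 and 8, together with the label binarization of claim 6: a dense per-result term plus a term over randomly sampled point pairs (the disclosure does not fix the sampling strategy, so uniform sampling and a 0.5 preset value are assumed here):

```python
import torch
import torch.nn.functional as F

def binarize_label(alpha_label, preset_value=0.5):
    """Claim 6: binarize the second segmentation label by a preset segmentation value."""
    return (alpha_label > preset_value).float()

def point_pair_loss(pred, label, num_pairs=1024):
    """Second/fourth sub-loss: distances over sampled point pairs (assumed uniform sampling)."""
    flat_p, flat_l = pred.flatten(1), label.flatten(1)      # (N, H*W)
    idx = torch.randint(0, flat_p.shape[1], (2, num_pairs), device=pred.device)
    dp = flat_p[:, idx[0]] - flat_p[:, idx[1]]              # predicted pair differences
    dl = flat_l[:, idx[0]] - flat_l[:, idx[1]]              # label pair differences
    return F.l1_loss(dp, dl)

def semantic_segmentation_loss(coarse_outs, label):
    """First/third sub-loss (dense) plus the point-pair sub-loss, averaged over heads.

    coarse_outs: list of coarse segmentation results in (0, 1), one per prediction head.
    label:       (N, 1, H, W) first segmentation label, or a binarized second label.
    """
    loss = 0.0
    for out in coarse_outs:
        lbl = F.interpolate(label, size=out.shape[-2:], mode="nearest")
        loss = loss + F.binary_cross_entropy(out, lbl)      # dense distance term
        loss = loss + point_pair_loss(out, lbl)             # sampled point-pair term
    return loss / len(coarse_outs)
```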
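Likewise, a hedged sketch of the target fine-segmentation loss of claims 9 and 10: the mean-absolute-error term is mandatory, while the multi-scale Laplacian, gradient, and composition terms of claim 10 are optional add-ons. The pyramid depth, the equal weighting, and the use of the target image as the compositing foreground are assumptions:

```python
import torch.nn.functional as F

def laplacian_loss(pred, gt, levels=4):
    """Second sub-loss: L1 over a simple Laplacian pyramid of the alpha mattes."""
    loss = 0.0
    for _ in range(levels):
        low_p, low_g = F.avg_pool2d(pred, 2), F.avg_pool2d(gt, 2)
        lap_p = pred - F.interpolate(low_p, size=pred.shape[-2:], mode="bilinear", align_corners=False)
        lap_g = gt - F.interpolate(low_g, size=gt.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + F.l1_loss(lap_p, lap_g)
        pred, gt = low_p, low_g
    return loss

def gradient_loss(pred, gt):
    """Third sub-loss: match finite-difference gradients of predicted and label mattes."""
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    return F.l1_loss(dx(pred), dx(gt)) + F.l1_loss(dy(pred), dy(gt))

def target_fine_loss(seg_info, fine_outs, gt_alpha, fg, bg):
    """Claims 9-10 combined, with all optional terms enabled and equal weights."""
    preds = [seg_info] + [
        F.interpolate(o, size=gt_alpha.shape[-2:], mode="bilinear", align_corners=False)
        for o in fine_outs
    ]
    label_comp = gt_alpha * fg + (1.0 - gt_alpha) * bg      # label composite image
    loss = 0.0
    for p in preds:
        loss = loss + F.l1_loss(p, gt_alpha)                # first: mean absolute error
        loss = loss + laplacian_loss(p, gt_alpha)           # second (optional)
        loss = loss + gradient_loss(p, gt_alpha)            # third (optional)
        loss = loss + F.l1_loss(p * fg + (1.0 - p) * bg, label_comp)  # fourth (optional)
    return loss / len(preds)
```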
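For claim 3, a minimal training loop for obtaining the basic matting network, assuming the model returns one coarse segmentation result per upsampling convolution module and reusing the semantic_segmentation_loss helper sketched above; the optimizer, learning rate, and convergence test are all assumptions:

```python
import torch

def train_basic_network(model, seg_loader, max_rounds=100, lr=1e-4, tol=1e-4):
    """Iterative segmentation training on second sample segmentation images."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev_total = float("inf")
    for _ in range(max_rounds):
        total = 0.0
        for images, first_labels in seg_loader:
            coarse_outs = model(images)          # assumed: list of coarse results
            loss = semantic_segmentation_loss(coarse_outs, first_labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if abs(prev_total - total) < tol:        # crude stand-in for "converged"
            break
        prev_total = total
    return model                                 # the basic matting network
```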
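Finally, for claim 5, a sketch of the alternating segmentation/matting training that turns the basic matting network into the target matting model; it assumes the fine-tuned model returns coarse results, fine results, and the segmentation information together, and it reuses the helpers above. Strict one-to-one alternation is just one possible schedule; the claim only requires alternating rounds:

```python
import itertools
import torch

def train_target_model(model, seg_loader, matting_loader, rounds=1000, lr=1e-5):
    """Claim 5: alternate segmentation rounds (first segmentation labels) with
    matting rounds (second segmentation labels, i.e. alpha mattes)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    seg_iter = itertools.cycle(seg_loader)
    mat_iter = itertools.cycle(matting_loader)
    for step in range(rounds):
        if step % 2 == 0:                                   # segmentation round
            images, first_labels = next(seg_iter)
            coarse_outs, _, _ = model(images)
            loss = semantic_segmentation_loss(coarse_outs, first_labels)
        else:                                               # matting round
            images, alpha_labels, fg, bg = next(mat_iter)
            coarse_outs, fine_outs, seg_info = model(images)
            bin_labels = binarize_label(alpha_labels)
            loss = (semantic_segmentation_loss(coarse_outs, bin_labels)
                    + target_fine_loss(seg_info, fine_outs, alpha_labels, fg, bg))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model                                            # the target matting model
```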
PCT/CN2022/096483 2022-05-31 2022-05-31 Image processing method and device, and readable storage medium WO2023230927A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280004202.6A CN117501309A (en) 2022-05-31 2022-05-31 Image processing method, device and readable storage medium
PCT/CN2022/096483 WO2023230927A1 (en) 2022-05-31 2022-05-31 Image processing method and device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023230927A1 2023-12-07

Also Published As

Publication number Publication date
CN117501309A (en) 2024-02-02

Legal Events

Code 121: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22944263; country of ref document: EP; kind code of ref document: A1)
Kind code of ref document: A1