WO2023071810A1

WO2023071810A1 - Image processing

Info

Publication number: WO2023071810A1
Application number: PCT/CN2022/125012
Authority: WO
Inventors: 程俊奇; 四建楼; 钱晨
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-10-29
Filing date: 2022-10-13
Publication date: 2023-05-04
Also published as: CN113920032A

Abstract

Embodiments of the present disclosure provide an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: performing matting processing on an original image to obtain a matting result, the matting result comprising a first image and a transparency map corresponding to the original image; determining a color difference between a reserved area and a target area according to a difference between pixels in the first image and pixels in a material image containing the target area; performing tone adjustment on the pixels in the first image according to the color difference to obtain a second image matching the tone of the target area; and on the basis of the transparency map, performing image fusion on the second image and the material image to obtain a target image.

Description

Image Processing

Cross References to Related Applications

This application claims priority to a Chinese patent application with application number CN2021112739847 filed with the China Patent Office on October 29, 2021, the entire contents of which are incorporated in this disclosure by reference.

technical field

The present disclosure relates to computer vision techniques, and more particularly to image processing.

Background technique

As a basic image editing technology, region replacement is widely used in various image editing software, camera back-end algorithms and other scenarios. Segmentation models are usually used to semantically segment the original image, resulting in a rough mask of the replaced region in the original image. Then, according to the mask result, it is fused with the image containing the target area, so as to replace the replaced area with the target area.

Contents of the invention

In view of this, the embodiments of the present disclosure at least provide an image processing method, device, electronic device, and storage medium.

The present disclosure provides an image processing method, the method comprising: performing matting processing on an original image to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the The first image includes a reserved area in the original image, and the reserved area is the foreground or background in the original image; the numerical value of the pixel in the transparency map indicates the transparency of the pixel; according to the pixel in the first image and the difference between pixels in the material image containing the target area, determine the color difference between the reserved area and the target area; the target area is used to replace the non-retained area in the original image; according to the The color difference is used to adjust the hue of the pixels in the first image to obtain a second image that matches the tone of the target area; based on the transparency map, image fusion is performed on the second image and the material image to obtain target image.

The present disclosure proposes an image processing device, which includes: a map-cutting module, configured to perform map-cutting processing on an original image to obtain a map-cutting result, the map-cutting result including a first image and an image corresponding to the original image The transparency map; the first image includes a reserved area in the original image, and the reserved area is the foreground or background in the original image; the numerical value of the pixel in the transparency map indicates the transparency of the pixel; the determination module, It is used to determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area; the target area is used to replace the target area The non-retained area in the original image; the adjustment module is used to adjust the color tone of the pixels in the first image according to the color difference to obtain a second image that matches the color tone of the target area; the fusion module is used to adjust the color tone based on the color difference. For the transparency map, image fusion is performed on the second image and the material image to obtain a target image.

The present disclosure proposes an electronic device, including: a processor; a memory for storing executable instructions of the processor; wherein, the processor executes the executable instructions to implement the image processing method shown in any of the foregoing embodiments .

The present disclosure provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to cause a processor to execute the image processing method as shown in any one of the foregoing embodiments.

In the technical solutions disclosed in the image processing method, device, electronic equipment, and storage medium provided by the embodiments of the present disclosure, on the one hand, the pixel values of the pixels in the reserved area and the pixel values of the pixels in the target area obtained by matting the original image can be Pixel value, determine the color difference between the reserved area and the target area, and then adjust the color tone of the pixels in the reserved area according to the color difference, and unify the pixels in the reserved area and the target area The tone of the pixel, so that in the process of area replacement, it can ensure that the tone of the target area matches the tone of the non-retained area in the original image, thereby improving the effect of area replacement. On the other hand, the three-part map can be used for matting, so that the detailed information of the junction position between the reserved area and the non-reserved area can be well preserved. When performing area replacement, it is beneficial to improve the connection between the reserved area and the non-reserved area. Effect. On the other hand, the keying network can also be processed by channel compression, etc., and the original image can be scaled, so that the time-consuming and memory consumption of the keying process can be within the processing capability of the mobile terminal, so that there is no need to go through the server. Region replacement ensures data security and privacy.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Description of drawings

In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure or related technologies, the following will briefly introduce the drawings that need to be used in the descriptions of the embodiments or related technologies. Obviously, the accompanying drawings in the following description The drawings are only some embodiments described in one or more embodiments of the present disclosure, and those skilled in the art can obtain other drawings based on these drawings without any creative effort.

FIG. 1 is a schematic flowchart of an image processing method shown in an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart of a tone adjustment method shown in an embodiment of the present disclosure;

FIG. 5 is a schematic flow diagram of a map-cutting method shown in an embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of a method for obtaining a tripartite graph shown in an embodiment of the present disclosure;

Fig. 7a is a schematic diagram of a character image shown in an embodiment of the present disclosure;

Fig. 7b is a schematic diagram of a semantic probability map shown in an embodiment of the present disclosure;

Fig. 7c is a schematic diagram of a tripartite graph shown in an embodiment of the present disclosure;

Fig. 7d is a schematic diagram of a transparency map shown in an embodiment of the present disclosure;

Fig. 7e is a schematic diagram of a foreground image shown in an embodiment of the present disclosure;

Fig. 7f is a schematic diagram of a target image shown in an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of a region replacement process shown in an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of an area replacement process based on FIG. 8;

FIG. 10 is a schematic flowchart of a network training method shown in an embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of an image processing method shown in an embodiment of the present disclosure;

Fig. 12 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.

Detailed ways

Exemplary embodiments will be described in detail below with reference to the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with aspects of the present disclosure as recited in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only, and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a", "the", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if", as used herein, could be interpreted as "at" or "when" or "in response to a determination", depending on the context.

This disclosure relates to the field of augmented reality. By acquiring the image information of the target object in the real environment, and then using various visual correlation algorithms to detect or identify the relevant features, states and attributes of the target object, and thus obtain the image information that matches the specific application. AR (Augmented Reality) effect combining virtual and reality. Exemplarily, the target object may involve faces, limbs, gestures, actions, etc. related to the human body, or markers and markers related to objects, or sand tables, display areas or display items related to venues or places. Vision-related algorithms may involve visual positioning, SLAM (Simultaneous Localization and Mapping), 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc. Specific applications can not only involve interactive scenes such as guided tours, navigation, explanations, reconstructions, virtual effect overlays and display related to real scenes or objects, but also special effects processing related to people, such as makeup beautification, body beautification, special effect display, virtual Interactive scenarios such as model display. The relevant features, states and attributes of the target object can be detected or identified through the convolutional neural network. The aforementioned convolutional neural network is a neural network model obtained through model training based on a deep learning framework.

As a basic image editing technology, region replacement is widely used in various image editing software, camera back-end algorithms and other scenarios. Segmentation models are usually used to semantically segment the original image, resulting in a rough mask of the replaced region in the original image. Then, according to the mask result, it is fused with the image containing the target area, so as to replace the replaced area with the target area. However, in the related art, there is an obvious color tone difference between the target area and the original image, and a direct replacement often results in an obvious inconsistency in picture color tone, resulting in a poor area replacement effect.

In view of this, the present disclosure proposes an image processing method. This method can ensure that the tone of the target area matches the reserved area in the original image during area replacement, thereby improving the effect of area replacement.

Please refer to FIG. 1 , which is a schematic flowchart of an image processing method shown in an embodiment of the present disclosure.

The processing method shown in FIG. 1 can be applied to electronic equipment. Wherein, the electronic device may execute the method by carrying software logic corresponding to the processing method. The type of the electronic device may be a notebook computer, a computer, a mobile phone, a PDA (Personal Digital Assistant, PDA) and the like. The type of the electronic device is not particularly limited in the present disclosure. The electronic device may also be a client device and/or a server device, which is not specifically limited here.

As shown in Fig. 1, the image processing method may include S102-S108. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.

S102. Perform matting processing on the original image to be processed to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the first image includes A reserved area, the reserved area is the foreground or background in the original image; the value of the pixel in the transparency map indicates the transparency of the pixel.

The original image needs to be replaced. The original image may include reserved areas and non-reserved areas. In an area replacement scenario, the non-reserved area is generally used as a replaced area, which is replaced by other materials. The reserved area and the non-reserved area can be distinguished by image processing techniques.

The reserved area refers to an area that is reserved and not replaced during the process of performing area replacement on the image. For example, in a scene where background areas are replaced, foreground areas are preserved. For example, the background area (such as the sky area) in the person image needs to be replaced, and the foreground area containing the person can be used as the reserved area. In scenes where the foreground area is replaced, the background area is the preserved area.

The first image may include a reserved area cut out from the original image. The first image has the same size as the original image. Areas in the first image other than the reserved area may be filled with pixels of preset pixel values. For example, the preset pixel value may be 0, 1 and so on.

The transparency map is used to distinguish reserved areas and non-reserved areas by different values of transparency. The value of a pixel within the transparency map indicates the transparency of the corresponding pixel. In some embodiments, the transparency values of the pixels belonging to the reserved area in the transparency map are the first value, and the transparency values of the pixels belonging to the non-reserved area are the second value.

In different scenarios, the first value and the second value will vary.

For example, in a scene where the area is replaced and the original non-reserved area will not be retained, the first value of the transparency map may be 1, indicating that the pixels in the reserved area are non-transparent, and the first value of the transparency map The second value can be 0, indicating that the pixels in the non-reserved area are transparent. The non-reserved area is replaced by this transparency, and the original non-reserved area is not preserved at all. For another example, in the scene where area replacement needs to blur the original non-reserved area, the first value of the transparency map can be 1, indicating that the pixels in the reserved area are non-transparent, and the first value of the transparency map The second value can be 0.3, indicating that the pixels in the non-reserved area are semi-transparent. By replacing the non-reserved area with this transparency, the original non-reserved area can be blurred. In some implementations, in S102, the three-part map corresponding to the original image to be processed can be obtained; for each pixel in the three-part map, the value corresponding to the pixel indicates that the pixel belongs to the reserved area, non-reserved area, or area to be determined; then, according to the tripartite map, the original image can be matted to obtain the first image and the transparency map.

The trimap has the characteristics of distinguishing the foreground, the background, and the transition area between the foreground and the background in the image. That is, regardless of whether the reserved area is the foreground or the background in the original image, the three-part map is used to distinguish the reserved area in the original image, the non-reserved area, and the area to be determined between the reserved area and the non-reserved area, so as to save the reserved area Details of the location of the handoff with the non-reserved area.

In some implementation manners, a pre-trained matting network may also be used for matting processing. The matting network is trained in a supervised manner through training samples marked with transparency information and reserved area information in advance. The first image and the transparency map can be obtained by inputting the original image into the matting network.

S104. Determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area; the target area in the material image Used to replace non-preserved regions in the original image.

The material images are generally some pre-acquired images, and these images contain replacement materials used to replace non-reserved areas. The areas occupied by these replacement materials in the material image may be referred to as target areas. For example, in the sky replacement scene, the material image may contain some sky materials, and these sky materials may be used to replace the sky in the original image (that is, the non-preserved area in the original image).

The color difference refers to the pixel value difference between the pixels in the reserved area and the pixels in the target area.

A pixel value of a pixel may indicate a color value of the pixel. Exemplarily, the color difference between the reserved area and the target area may be obtained by calculating an average difference between pixel values of pixels in the first image and pixel values of pixels in the material image.

In some implementations, the reserved area and the target area can be sampled by sampling, and the color difference can be determined by the pixel value of the sampling point, thereby reducing the amount of computation for determining the color difference, thereby improving the efficiency of area replacement.

Please refer to FIG. 2 . FIG. 2 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure. The steps shown in FIG. 2 are descriptions of S104. As shown in Fig. 2, the method for determining the color difference may include S202-S204. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.

S202. Sampling the pixels in the first image and the pixels in the material image respectively to obtain a first sampling point and a second sampling point.

In some implementations, the step size (step) can be preset. For example, the result obtained by dividing the short side of the original image by a preset value (for example, 10, 20, 30, etc.) may be determined as the step. For the first image, sampling may be performed in a preset order (for example, from left to right, from top to bottom) and a set step size to obtain some first sampling points; and for the material image, Sampling is performed according to the preset order (for example, from left to right, from top to bottom) and a set step size to obtain some second sampling points.

S204. Based on the difference between the pixel value of the first sampling point and the pixel value of the second sampling point, determine the color difference between the reserved area and the target area.

In some embodiments, the pixel mean value or pixel median value of the first sampling point can be determined according to the pixel value of the first sampling point, and the pixel value of the second sampling point can be determined according to the pixel value of the second sampling point Mean or pixel median value. The color difference is then determined based on the difference between two pixel means or two pixel median values.

Therefore, the pixel value of the sampling point is represented according to the pixel mean value or the pixel median value of the sampling point, which can simplify the operation.

In some implementations, when calculating the pixel mean value of the first sampling point, the transparency of the first sampling point can be combined, so that the determined pixel mean value is more accurate, which helps to accurately determine the color difference between the reserved area and the target area , thereby enhancing the tone adjustment effect.

In this implementation manner, the pixels in the transparency map may be first sampled to obtain the third sampling point.

Specifically, the steps disclosed in the foregoing S202 may be used for sampling to obtain some third sampling points.

Please refer to FIG. 3 . FIG. 3 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure. The steps shown in FIG. 3 are supplementary descriptions of S204. As shown in Fig. 3, the method for determining the color difference may include S302-S306. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.

S302. Determine a first pixel average value of the first sampling point based on the pixel value of the first sampling point and the transparency value of the third sampling point.

In order to distinguish, the disclosure refers to the pixel mean value of the first sampling point determined based on the pixel value of the first sampling point and the transparency value of the third sampling point as the first pixel mean value.

The embodiment of the present disclosure does not limit how to determine the specific formula of the first pixel mean value, and the following is only an example:

Formula (1) above, wherein fg_mean indicates the first pixel mean value. FG1 refers to the pixel value of the first sampling point. Alpha1 refers to the transparency value of the third sampling point. The exact mean value of the first pixel can be obtained by combining the transparency of the sampling point through the formula (1).

S304. Determine a second average pixel value of the second sampling point based on the pixel value of the second sampling point.

In this disclosure, the pixel mean value of the second sampling point is referred to as the second pixel mean value.

The second pixel mean value bg_mean can be obtained by averaging BG1 through the mean number calculation formula. Wherein BG1 is the pixel value of the second sampling point.

S306. Determine the color difference between the reserved area and the target area according to the difference between the first pixel average value and the second pixel average value.

A pixel value of a pixel may indicate color information of the pixel. The difference in color of a pixel can be determined by the difference in pixel value. The embodiment of the present disclosure does not limit the specific formula of how to determine the color difference, and the following is only an example:

diff=bg_mean-fg_mean…………………(2)

Formula (2) above, where diff indicates the color difference. bg_mean indicates to get the second pixel mean. fg_mean indicates the first pixel mean.

Therefore, according to steps S302-S304, on the one hand, the pixel mean value or pixel median value of the sampling point can be used to represent the pixel value of the sampling point, which can simplify the operation. On the other hand, when calculating the pixel mean value of the first sampling point, the transparency of the sampling point is combined, so that the determined pixel mean value is more accurate, which helps to accurately determine the color difference between the reserved area and the target area, thereby improving the color tone Adjust the effect.

S106. Perform tone adjustment on pixels in the first image according to the color difference to obtain a second image that matches the tone of the target area.

Hue refers to the overall tendency of the color of the image. Although the image includes a variety of colors, it generally has a color tendency. For example, the image may be bluish or reddish, warm or cold, and so on. This tendency in color is the hue of the image. That is, the hue of the image can be indicated by the pixel value (color value) of the pixel in the image, and the hue adjustment can be completed by adjusting the pixel value of the image pixel.

The tone adjustment may be to adjust the color values of the pixels in the first image to be closer to the color values of the pixels in the target area.

The tone matching of the two images means that the difference between the color values of the pixels in the two images is smaller than the preset color threshold (empirical threshold), that is, the color values of the pixels in the two images are relatively close, showing roughly the same hue Effect.

In S106, the color difference may be fused with pixel values of pixels in the first image, so as to achieve the effect that the hue of the first image matches the hue of the target area, and complete the hue adjustment.

The embodiment of the present disclosure does not limit the specific formula of color fusion, and the following is only an example:

new_FG＝q*diff+FG……………..(3)

As in the above formula (3), FG refers to the pixel value of the pixel in the first image. new_FG refers to the pixel value of a pixel within the second image. q is the preset adjustment factor. The q is a preset value according to business requirements. diff indicates the color difference between the target area and the reserved area. According to the formula (3), the color tone of the pixels in the first image can be adjusted based on the color difference to obtain a second image that matches the color tone of the target area, thereby facilitating the improvement of the area replacement effect.

In some implementation manners, the color difference between the second image and the reserved area before adjustment can also be fused, and the pixels in the second image can be adjusted again to avoid the pixel value of the pixel in the second image being too large or Too small to enhance the tone adjustment effect.

Please refer to FIG. 4 . FIG. 4 is a schematic flowchart of a method for adjusting hue according to an embodiment of the present disclosure. The steps shown in FIG. 4 are supplementary descriptions of S106. As shown in FIG. 4, the tone adjustment method may include S402-S404. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.

S402. Based on the color difference, preliminarily adjust the pixel values of the pixels in the first image to obtain a third image; the pixels in the third image are fused with the color difference.

In some implementation manners, the third image may be obtained by using the foregoing formula (3).

S404. Based on the difference between the pixel mean value of the pixels in the third image and the pixel mean value of the pixels in the first image, adjust the pixel values of the pixels in the third image to obtain a tone matching the target area. second image.

The difference between the pixel mean value of the pixels in the third image and the pixel mean value of the pixels in the first image may indicate a color difference between the third image and the first image.

In S404, the difference between the pixel mean value of the pixels in the third image and the pixel mean value of the pixels in the first image may be determined first, and then the difference is fused into the pixel values of the pixels in the third image to obtain the second image.

new_FG’=new_FG+(mean(FG)-mean(new_FG))………………(4)

As in the above formula (4), wherein new_FG' on the left side of the equal sign is the pixel value of the pixel in the second image. The new_FG on the right side of the equal sign is the pixel value of the pixel in the third image obtained by the aforementioned formula (3). mean() is the average function. The color difference between the third image and the first image can be obtained by mean(FG)-mean(new_FG). Through S402-S404, formula (3) can be used to initially adjust the hue of the first image, and then formula (4) can be used to adjust the hue of the third image to obtain the second image, so that the hue of the second image is closer to The tone of the target area will not deviate too much from the tone of the first image, reducing the possibility of the color being too bright or too dark caused by the pixel value of the pixel in the second image being too large or too small, and improving the tone adjustment effect, and then Improve area replacement effect.

S108. Based on the transparency map, perform image fusion on the second image and the material image to obtain a target image.

The image fusion may include, but is not limited to, splicing, addition, and multiplication of pixel values of pixels in two images.

In S108, the first result can be obtained based on the fusion of the transparency map and the second image, and the second result can be obtained based on the fusion of the material image and the reverse transparency map corresponding to the transparency map; and then the obtained The first result is fused with the second result to obtain the target image.

The embodiment of the present disclosure does not limit the specific formula of image fusion, and the following is only an example:

new＝new_FG’*Alpha+BG*(1-Alpha)………………………(5)

Equation (5) above, where new indicates the pixel value of a pixel in the target image. new_FG' indicates the pixel value of the pixel in the second image obtained in S106. BG indicates the pixel value of the pixel in the material image corresponding to the target area. Alpha indicates the transparency value of the pixel within the transparency map. Among them, 1-Alpha can be expressed as the reverse transparency map corresponding to the transparency map. The first result obtained by the fusion of Alpha and new_FG' can be expressed as new_FG'*Alpha, the second result obtained by the fusion of BG and reverse transparency can be expressed as BG*(1-Alpha), the fusion of the first result and the second result The result is new obtained by formula (5).

The transparency values of the pixels belonging to the reserved area in the transparency map are the first value, and the transparency values of the pixels belonging to the non-reserved area are the second value. For example, in a scene where the area is replaced and the original non-reserved area is not retained, the first value may be 1, indicating that the pixel is non-transparent, and the value may be 0, indicating that the pixel is transparent. Through new_FG'*Alpha, the pixels belonging to the reserved area in the second image can be adjusted to be non-transparent, and the pixels belonging to the non-reserved area can be adjusted to be transparent. By using BG*(1-Alpha), the pixels belonging to the target area in the material image are adjusted to be non-transparent, and the pixels belonging to the non-target area are adjusted to be transparent. The image fusion can be realized by formula 5.

According to the scheme described in S102-S108, the color difference between the reserved area and the target area can be determined according to the pixel values of the pixels in the reserved area obtained by matting the original image and the pixel values of the pixels in the target area , and then adjust the hue of the pixels in the reserved area according to the color difference, and unify the hues of the pixels in the reserved area and the pixels in the target area, so that when the second image and the target area are combined Image fusion, in the process of region replacement, can ensure that the tone of the target region matches the reserved region in the original image, thereby improving the effect of region replacement.

In related technologies, segmentation models are commonly used to perform semantic segmentation on reserved regions to obtain rough mask results for non-reserved regions. Then, the replacement of the original non-reserved area is realized according to the mask result and the material image. Since the mask result output by the segmentation model is often rough in the boundary area between the reserved area and the non-reserved area, directly using the mask result for area replacement will cause obvious artifacts in the boundary area. For example, in a sky replacement scene, some local details between the sky and the horizon in the original image may be missing.

In some embodiments, a tripartite matting method may be used to solve the foregoing problems. In S102, the original image can be matted by using the trimap corresponding to the original image to obtain the first image and the transparency map, so that the original image can be well distinguished by using the trimap The characteristics of the reserved area, the unreserved area, and the area to be determined between the reserved area and the unreserved area enable the obtained transparency map to preserve the detailed information of the junction position between the reserved area and the unreserved area, and directly use the mask result to perform Compared with the region replacement scheme, the region replacement based on the transparency map obtained by matting the original image by using the tripartite map can help improve the connection effect between the reserved region and the target region.

Please refer to FIG. 5 . FIG. 5 is a schematic flowchart of a method for cutting out images according to an embodiment of the present disclosure. The steps shown in FIG. 5 are supplementary descriptions of S102. As shown in FIG. 5 , the map-cutting method may include S502-S504. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.

S502. Obtain the trimap corresponding to the original image; for each pixel in the trimeg, the value corresponding to the pixel indicates that the pixel belongs to any of the reserved area, non-reserved area, or undetermined area in the original image the probability of a region.

The trimap has the characteristics of distinguishing the foreground, the background, and the transition area between the foreground and the background in the image. That is, regardless of whether the reserved area is the foreground or the background in the original image, the three-part map is used to distinguish the reserved area in the original image, the non-reserved area, and the area to be determined between the reserved area and the non-reserved area, so as to save the reserved area Details of the location of the handoff with the non-reserved area. The trimap in the present disclosure can be represented by a trimap.

In some embodiments, editing software can be used to assist in obtaining the trimap of the original image. Taking the reserved area as the foreground area as an example, the non-reserved area (background area), reserved area (foreground area), and undetermined area can be marked on the original image through image editing software to obtain a tripartite map.

In some implementation manners, the trimap may be obtained by using a trimap extraction network generated based on a neural network. The trimap extraction network can be trained in advance based on training samples marked with trimap information.

In the above two implementations, either manual labeling by software, or prediction of trimap by prediction network. However, the manual labeling method is too complicated and inconvenient to use; the network prediction method requires a large number of trimap labels, which is cumbersome.

In order to simplify the trimap acquisition process, in some implementations, the user does not need to manually label the trimap, nor does it need to pre-train the prediction network for predicting the trimap, but the trimap can be obtained based on the results of semantic segmentation combined with probability conversion. .

Please refer to FIG. 6 . FIG. 6 is a schematic flowchart of a method for obtaining a tripartite graph according to an embodiment of the present disclosure. The steps shown in FIG. 6 are supplementary descriptions of the method for obtaining the trimap in S502. As shown in FIG. 6 , the method for obtaining a trimap may include S602-S604. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.

S602. Perform semantic segmentation processing on the original image to be processed to obtain a semantic probability map corresponding to the original image.

Wherein, the image to be matted may be referred to as an original image. For example, assuming that a non-sky area is to be extracted from a person image, the person image may be called an original image. The non-sky area is the target to be extracted in the matting process, which may be called a reserved area, and the reserved area may be the foreground or background in the original image.

In this embodiment, the semantic segmentation processing may be performed on the original image, for example, the semantic segmentation processing may be performed through a semantic segmentation network. The semantic segmentation network includes but is not limited to commonly used semantic segmentation networks such as SegNet, U-Net, DeepLab, and FCN.

After semantic segmentation processing, a semantic probability map of the original image can be obtained, and the semantic probability map can include: for each pixel in the original image, the first probability that the pixel belongs to the reserved area. Taking the reserved area as the foreground as an example, in the semantic probability map, the probability of a certain pixel in the original image belonging to the foreground may be 0.85, and the probability of another pixel belonging to the foreground may be 0.24.

S604. Perform probability conversion processing based on the semantic probability map to obtain a trimap corresponding to the original image.

In this step, probability conversion processing may be performed based on the semantic segmentation processing result to obtain a tripartite graph. The trimap obtained through probability conversion processing in this embodiment can be represented by soft-trimap.

Wherein, the probability conversion process may be to map the probability corresponding to the pixel obtained in the semantic probability map to the value corresponding to the pixel in the soft-trimap through a mathematical conversion method.

Specifically, take the reserved area as the foreground as an example. The probability in the semantic probability graph can be transformed into the following two parts:

1) Based on the semantic probability map, the first probability is converted to obtain the second probability.

Wherein, the trimap soft-trimap may include three kinds of regions: "reserved region (foreground)", "non-reserved region (background)" and "to-be-determined region". In this embodiment, the probability that the pixel belongs to the region to be determined in the tripartite map may be referred to as the second probability.

When converting the first probability that the pixel in the semantic probability map belongs to the reserved area to the second probability, the following probability conversion principle can be followed: the first probability characterizes the probability that the pixel belongs to the reserved area (foreground) or the non-reserved area (background) The higher the value, the lower the probability that the second probability indicates that the pixel belongs to the region to be determined in the tripartite map. For example, the closer the first probability is to 1 and 0, the closer the second probability is to 0; the closer the first probability is to 0.5, the closer the second probability is to 1. The above conversion principle is that if a pixel in the image has a higher probability of belonging to the reserved area (foreground), or a higher probability of belonging to the non-reserved area (background), the lower the probability of the pixel belonging to the area to be determined; and When the probability that the pixel belongs to the reserved area (foreground) or the non-reserved area (background) is around 0.5, it means that the probability that the pixel belongs to the area to be determined is higher.

Based on the above principle of probability conversion, the first probability can be converted to obtain the second probability. The embodiment of the present disclosure does not limit the specific formula of probability conversion, and the following is only an example:

un=-k4*score^4+k3*score^3–k2*score^2+k1*score.......(6)

As in the above formula (6), where un represents the second probability that the pixel belongs to the region to be determined, and score represents the first probability that the pixel belongs to the reserved region in the semantic probability map. The formula (6) is a way of polynomial fitting, through polynomial fitting, the first probability of the pixel is fitted to obtain the second probability. This embodiment does not limit the specific values of the coefficients "k1/k2/k3/k4" mentioned above.

It can be understood that the actual implementation is not limited to the above-mentioned polynomial fitting, and other functional formulas can also be used, as long as the above-mentioned probability conversion principle is followed. In this embodiment, polynomial fitting is used to convert the first probability into the second probability, which can make the polynomial conversion calculation more efficient, and also more accurately reflect the above conversion principle.

2) Generate the trimap according to the first probability and the second probability corresponding to each pixel in the semantic probability map.

As above, by performing semantic segmentation on the original image, a semantic probability map can be obtained, and the reserved area (foreground) and non-reserved area (background) in the original image can be roughly distinguished through the semantic probability map. For example, if If the first probability of a pixel belonging to the foreground is 0.96, then the probability of belonging to the foreground is very high; if the first probability of a pixel belonging to the foreground is 0.14, it means that the probability of the pixel belonging to the background is very high.

After the second probability is obtained based on the semantic probability map, the second probability that each pixel belongs to the region to be determined can be obtained. For each pixel in the original image, the first probability corresponding to the pixel in the semantic probability map and the second probability that the pixel belongs to the region to be determined can be combined for probability fusion, and the pixel can be obtained in the tripartition map soft- The corresponding numerical value in the trimap, which can represent the probability that the pixel belongs to any area in the determined reserved area (foreground), non-reserved area (background) or undetermined area in the original image.

For example: in soft-trimap, if the value corresponding to a pixel is closer to 1, it means that the pixel is more likely to belong to the reserved area (foreground) in the original image; the closer the value of the pixel in soft-trimap is to 0, it means The pixel is more likely to belong to the non-reserved area (background); the closer the value of the pixel in the soft-trimap is to 0.5, the more likely the pixel belongs to the area to be determined. That is, the probability that the pixel belongs to any one of the reserved area, the non-reserved area, or the area to be determined can be expressed by the value corresponding to the pixel in the soft-trimap.

The following formula (7) exemplifies a way to obtain a tripartite graph according to the first probability and the second probability:

soft_trimap=-k5*un/k6*sign(score-k7)+(sign(score-k7)+k8)/k9....(7)

In the above formula (7), soft_trimap represents the value corresponding to the pixel in soft-trimap, un represents the second probability, score represents the first probability, and sign() represents the sign function. Similarly, this embodiment does not limit the specific values of the aforementioned coefficients "k5/k6/k7/k8".

As described in the above example, after converting the first probability corresponding to the pixel to obtain the second probability, and combining the first probability and the second probability corresponding to the pixel to generate the tripartite map, the probability conversion based on the semantic probability map is realized Process to get the three-point map soft_trimap.

In some embodiments, before performing the above-mentioned probability conversion processing based on the semantic probability map, pooling processing may be performed on the semantic probability map, and the above-mentioned probability conversion processing is performed on the pooled semantic probability map. See equation (8) below:

score_=avgpool2d(score, ks, stride).......(8)

As shown in formula (8), in an example, the average pooling process can be performed on the semantic probability map, and the pooling is performed according to the convolution stride and the convolution kernel size (kernel_size, ks). score_ represents the semantic probability map after pooling, which contains the probability of each pooling.

If pooling is performed on the semantic probability map, the scores in the above formulas (6) and (7) are replaced with the pooled probability, that is, the pooled semantic probability map is used to perform probability conversion.

The size of the kernel used in the above pooling process can be adjusted, and the pooling process is performed before the probability conversion of the semantic probability map, which helps to adjust the width of the area to be determined in the soft_trimap to be generated by adjusting the size of the convolution kernel . For example, the larger the kernel_size is, the wider the width of the area to be determined can be.

In some embodiments, assuming that the semantic segmentation processing of the original image is performed by a semantic segmentation network, before performing the semantic segmentation processing, the image size of the original image can also be preprocessed, and the preprocessing can be based on the semantic segmentation network For the downsampling multiple of the original image, the image size of the original image is processed by an integer multiple of the downsampling multiple, so that the image size after the integer multiple processing can be divided by the above-mentioned downsampling multiple scale_factor, which is the semantic segmentation network for the original image. The downsampling multiple, the specific value is determined by the network structure of the semantic segmentation network.

Through the steps recorded in S602-S604 to obtain the three-part map, the semantic probability map obtained by semantic segmentation based on the original image can be probabilistically converted to obtain the three-part map, which makes the acquisition of the three-part map faster and more convenient, no manual labeling is required, and It is no longer necessary to train the prediction network through trimap annotations, which makes the process of map matting easier to implement; moreover, this method of using probability conversion to obtain a trimap is based on the semantic probability map of semantic segmentation, so that the generated trimap The graph is more accurate.

S504. Perform image matting processing according to the trimap and the original image to obtain the matting result.

In this step, the process of the matting process may include: using the tripartite image and the original image as the input of the matting network, obtaining the residual of the reserved area output by the matting network and the initial transparency of the original image picture.

The residual of the reserved area may be a residual result obtained by the residual processing unit in the matting network. The reserved area residual may indicate a difference between a pixel value of a pixel in the reserved area extracted by the residual processing unit and a pixel value of a corresponding pixel in the original image.

The value of the pixel in the initial transparency map indicates the transparency of the pixel.

Then, the first image can be obtained based on the original image and the reserved area residual (for example, the foreground image can be obtained based on the addition of the foreground residual and the original image, or the background image can be obtained based on the addition of the background residual and the original image), and can According to the trimap soft_trimap, the values of the pixels in the initial transparency map are adjusted to obtain a transparency map corresponding to the original image.

Through the foregoing adjustments, first, the transparency values of pixels in the reserved region of the trimap in the initial transparency map may be adjusted to a first value. Second, the transparency values of pixels in the non-reserved region of the trimap in the initial transparency map may be adjusted to a second value. Thirdly, based on the magnitude of the transparency value of the pixels in the region to be determined in the trimap in the initial transparency map, distinguish the pixels in the region to be determined in the initial transparency map, Distinguish whether the pixels in the area to be determined belong to a reserved area or a non-reserved area, and assign a value to it.

For example, in the scene where the area is replaced and the original non-reserved area is not retained, based on the aforementioned adjustment, the transparency value of the pixel in the reserved area of the tripartite map in the initial transparency map can be adjusted to 1, The transparency value of the pixels in the non-reserved area of the tripartite map in the initial transparency map can be adjusted to 0, and the transparency value of the pixels in the initial transparency map in the region to be determined can be greater than 0.5 The opacity value is adjusted to 1, and the opacity value less than 0.5 is adjusted to 0.

According to the matting scheme recorded in S502-S504, the three-part map can be used for matting, so that the detailed information of the handover position between the reserved area and the non-reserved area can be well preserved, and it is beneficial to improve the reserved area and the target when performing area replacement. Cohesion effect between regions.

In related technologies, due to the lack of computing power or processing capability of the mobile phone, the area replacement software on the mobile phone mainly uploads data to the server for processing, and then transmits the area replacement result back to the mobile phone for local reading. The security and privacy of data in this scheme are difficult to guarantee.

In order to solve this problem, considering the processing capability of the mobile terminal, in some embodiments, the network deployed to the mobile terminal can be miniaturized, and the size of the original image can be scaled, so that the time-consuming and memory consumption of the mobile terminal can be reduced. Within the processing capability of the terminal, there is no need to replace the region through the server to ensure data security and privacy. An example of image keying on the mobile terminal is described as follows.

In this embodiment, when the method described in S502-S504 is used for image matting, a semantic segmentation network and an image matting network may be used. Wherein, the semantic segmentation network may be a network such as SegNet, U-Net, etc., and the matting network may include an encoder (encoder) and a decoder (decoder). The encoder of the image matting network can adopt the structure design of mobv2, and before the image matting network is deployed to the mobile terminal, the channel compression of the image matting network can be performed, and the channel compression can be carried out in the middle of the network of the image matting network The number of channels of the features (that is, the features of the middle layer of the network) is compressed. For example, it is possible to reduce the number of output channels of the convolution kernel in the process of matting network processing. Assuming that the number of output channels of the convolution kernel is originally a, it can be compressed according to 0.35 times the number of channels. After compression, the output of the convolution kernel The number of channels is 0.35*a.

In the following, the replacement of the sky scene in the person image is taken as an example for illustration.

Please refer to FIG. 7a, which is a schematic diagram of a character image shown in an embodiment of the present disclosure.

The sky area in the person image shown in FIG. 7a is used as a background area, which is also a non-reserved area, and needs to be replaced with another sky area (ie, the target area in this disclosure) in the pre-acquired material image. The non-sky area in Figure 7a is the reserved area output by the matting network in this example, that is, the foreground area.

Please refer to FIG. 8 . FIG. 8 is a schematic diagram of a region replacement process shown in an embodiment of the present disclosure. FIG. 9 is a schematic flowchart of the region replacement method based on FIG. 8 . As shown in FIG. 9, the region replacement method may include S901-S909. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.

S901. Perform scaling processing on the original image.

The original image in this embodiment may be the person image shown in Fig. 7a. The person image may be captured by the user through the camera of the mobile terminal, or may be an image stored in the mobile terminal or received from other devices.

The purpose of the matting process in this embodiment may be to extract the non-sky area in the person image. Non-sky regions in the original image can be considered as foreground.

Since the cutout process is performed on the mobile terminal in this embodiment, the original image can be scaled in order to reduce the processing load on the mobile terminal and save the calculation amount of the mobile terminal. Assuming that the size of the original image in Figure 7a is 1080*1920, the image can be scaled to a size of 480*288. For example, scaling can be done by way of bilinear difference. Scaling can be performed with reference to the following formula (9) and formula (10):

scale=max(h/basesize,w/basesize).......(9)

new_h=int(h/scale+k10)new_w=int(w/scale+k11)......(10)

Among them, h and w are the length and width of the original image, basesize is the base size, which is 480 in this example, and int(x) means rounding x. new_h and new_w are the scaled dimensions of the original image respectively, where the specific values of the coefficients in formula (10) are not limited in this embodiment.

In addition, according to formula (11) and formula (12), the image size of the original image can be processed by an integer multiple of the downsampling multiple to control the scaled image size to be able to divide the semantic segmentation network's downsampling multiple scale_factor of the image. It can be understood that other formulas may also be used for the integer multiple processing, and are not limited to the following two formulas.

new_h=int(int(int(new_h–k12+scale_factor–k13)/scale_factor)*scale_factor)...(11)

new_w=int(int(int(new_w–k14+scale_factor–k15)/scale_factor)*scale_factor)...(12)

This embodiment does not limit the specific values of the respective coefficients in the above formula (11) and formula (12). For example, the above values of k12 to k15 may all be set to 1. If the original image before scaling is marked with A, then the original image obtained by normalizing the image after being scaled into a 480*288 image can be marked with B. Referring to FIG. 8 , the original image B is the original image after scaling.

S902. Perform semantic segmentation processing on the scaled original image through the semantic segmentation network to obtain a semantic probability map output by the semantic segmentation network.

As shown in Fig. 8, the semantic segmentation process can be performed on the original image B through the semantic segmentation network 81, and the semantic probability map 82 output by the semantic segmentation network can be obtained. The semantic probability map can be identified by score, and Fig. 7b shows a semantic probability picture. It can be seen that the score of the semantic probability map indicates the probability that the pixel belongs to the non-sky area (foreground), which roughly distinguishes the foreground and background in the image, that is, roughly distinguishes the sky area from the non-sky area.

S903. Perform probability conversion processing based on the semantic probability map to obtain a trimogram.

In this step, the trimap soft-trimap may be generated according to the probability conversion process described in the aforementioned S604. For example, the semantic probability map can be pooled according to formula (8), and then the probability conversion process can be performed on the pooled semantic probability map according to formula (6) and formula (7) to generate a tripartite map. See this tripartite diagram 83 in FIG. 8 .

Please refer to the illustration in Figure 7c. This Figure 7c illustrates a three-part map soft-trimap. It can be seen that the probability value of the pixel in the soft-trimap can represent the probability that the pixel belongs to three types of regions. According to the probability value A "sky area", a "non-sky area" and "a region to be determined between the sky area and the non-sky area" in the image can be distinguished.

S904, using the trimap and the original image as input to the matting network, and obtaining the foreground residual and the initial transparency map output by the matting network.

Please refer to Fig. 8, the three-part image 83 and the original image B can be used as the input of the matting network 84, and the matting network can output a 4-channel result, wherein the result of one channel is the initial transparency map raw_alpha, and in addition The result of the three channels is the foreground residual fg_res. The first result 85 output by the keying network in FIG. 8 may include "raw_alpha+fg_res".

S905. Obtain a foreground image including the foreground region based on the original image and the foreground residual, and obtain a transparency map according to the initial transparency and the trimap.

The foreground is the non-sky area in the person image.

Please continue to combine with Figure 8, the foreground residual fg_res can be enlarged through the bilinear difference, so that it can be restored to the scale before the original image is scaled, and then execute the formula (13):

FG=clip(A+fg_res,s1,s2).......(13)

As shown in FIG. 8 , according to the enlarged foreground residual fg_res and the original image A, a matting result 86 , that is, the foreground image FG in the original image, can be obtained. Among them, clip(x, s1, s2) is to limit the value of x to [s1, s2]. This embodiment does not limit specific values of s1 and s2 in the above formula (13), for example, s1 may be 0, and s2 may be 1.

In addition, the transparency map corresponding to the non-sky region can be calculated according to the following equations (14) and (15):

fs=clip((soft_trimap-s3)/s4,s5,s6)...(14)

Alpha＝clip(fs+un*raw_alpha,s7,s8).....(15)

Among them, Alpha represents the transparency corresponding to the non-sky area. After obtaining the Alpha, the Alpha can be enlarged back to the original size of the original image before scaling through the bilinear difference. Likewise, this embodiment does not limit the specific values of the respective coefficients s3 to s8 in the above formula (14) and formula (15).

Fig. 7d shows a transparency map Alpha, where the non-sky area and the sky area can be clearly distinguished in Alpha. Exemplarily, the first value of a pixel in the sky area is 1, indicating non-transparency. The second value of pixels in the non-sky area is 0, which means completely transparent. Fig. 7e illustrates the extracted non-sky area, ie the foreground image FG.

S906. Sampling the foreground image FG, Alpha, and the material image including the target area BG according to a preset step, to obtain a foreground sampling point FG1, a third sampling point Alpha1, and a second sampling point BG1.

In conjunction with FIG. 8 , in this step, the value obtained after the short side of the person image is divided by 20 can be used as the step, and the matting result 86 (foreground image FG and Alpha) and the material image 87 are sampled.

S907. Determine the color difference between the target area BG and the foreground area FG.

In this step, the color difference can be obtained based on the method shown in FIG. 3 .

Wherein, based on the foreground sampling point FG1 and the third sampling point Alpha1, the first pixel mean value fg_mean of the foreground sampling point can be obtained according to the foregoing formula (1).

Based on the second sampling point BG1, determine a second pixel mean value bg_mean of the second sampling point.

Then, the color difference diff can be obtained according to the aforementioned formula (2).

S908. Based on the color difference, perform tone adjustment on the foreground image FG to obtain an adjusted foreground image (second image) that matches the tone of the target area.

In this step, tone adjustment may be performed according to the tone adjustment method shown in FIG. 4 .

In conjunction with Fig. 8, wherein, the non-sky area (the foreground image FG in the matting result) can be adjusted according to the preset adjustment coefficient q first, according to the aforementioned formula (3), to obtain the preliminary adjusted foreground image (the third image). Then, based on the aforementioned formula (4), the tone correction is performed on the preliminarily adjusted foreground image (third image) to obtain the final adjustment result 88, that is, the final adjusted foreground image new_FG' (second image).

S909. Based on the transparency map, perform image fusion on the adjusted foreground image and the material image to obtain a target image.

In this step, image fusion can be performed according to the foregoing formula (5), and the target image new is obtained after replacing the sky area.

As shown in FIG. 7f, it is the target image after replacing the sky area obtained through S901-S909. In this method, on the one hand, in the process of region replacement, it can be ensured that the tone of the target region matches the non-sky region in the original image, thereby improving the effect of region replacement.

On the other hand, the three-part map can be used for matting, so that the detailed information of the intersection position between the sky area and the non-sky area can be well obtained. When performing area replacement, it is beneficial to improve the connection effect between the sky area and the non-sky area. .

On the other hand, by performing channel compression and other processing on the keying network, and scaling the original image, it can make it more suitable for keying on the mobile terminal. For example, after the user uses his mobile terminal to capture the image, he can Complete the matting process of the non-sky area directly on the mobile terminal, and merge it with the target area to complete the replacement of the sky area, so that these processes can be performed locally on the mobile terminal without uploading to the cloud, which improves data security privacy protection. And it can be seen from FIG. 8 that the method of region replacement is to directly obtain the matting result by using a single original image as input, that is, to provide an original image, and based on the region replacement method provided by the embodiment of the present disclosure, the corresponding region replacement method can be obtained. The prediction of the foreground in the original image requires less input information, which makes the image processing more convenient.

In addition, the semantic segmentation network and the image matting network are used in the image matting process of the embodiment of the present disclosure, and this embodiment does not limit the training methods of these two networks. FIG. 10 is a schematic flowchart of a network training method shown in an embodiment of the present disclosure. This method can be used for joint training of semantic segmentation network and matting network. As shown in Figure 10, the method may include the following processing:

S1002. Acquire a training sample set, where the training sample set includes a plurality of sample data.

In some implementations, each sample data in the training sample set may include a sample image, a first feature label corresponding to the sample image, and a second feature label corresponding to the sample image. Taking a matting scene as an example, the first feature label may be a segmentation label for the sample image, and the second feature label may be a matting label for the sample image.

S1004, for each sample data in the training sample set, process the sample data to obtain a global image including global image information of the sample image, a segmentation label corresponding to the global image, and local image information including the sample image The partial image of and the keying label corresponding to the partial image.

In some embodiments, the first processing can be performed on the sample image of the sample data to obtain a global image including most of the image information of the sample image. It can be considered that the global image includes the global image information of the sample image. The same first processing is performed on the first feature label of , to obtain the segmentation label corresponding to the global image. For example, the sample image can be scaled according to the size requirements of the semantic segmentation network for the input image, but still retain most of the image information of the sample image to obtain the global image, and perform the same scaling process on the first feature label to obtain Split tags.

At the same time, the second processing is performed on the sample image of the sample data to obtain a partial image including partial image information of the sample image, and at the same time, the same second processing is performed on the second feature label corresponding to the sample image to obtain the keying corresponding to the partial image Label. For example, the sample image may be partially cropped to obtain a partial image including partial image information of the sample image, and the same partial cropping may be performed on the second feature label to obtain the matte label.

S1006. Perform semantic segmentation processing on the global image through a semantic segmentation network to obtain a semantic probability map output by the semantic segmentation network.

S1008. Perform probability conversion processing based on the semantic probability map to obtain a trimogram.

For the probability conversion processing in this step, reference may be made to the foregoing embodiments, and no further details are given. The soft-trimap of the embodiment of the present disclosure can be obtained through the probability conversion process.

S1010, through the matting network, perform matting processing based on the trimap and the partial image, to obtain a matting result. The matting result may indicate a matting result for a reserved region in the sample.

S1012. Adjust the network parameters of the semantic segmentation network according to the difference between the semantic probability map and the segmentation label, and adjust the network parameters of the matting network based on the difference between the matting result and the matting label.

As can be seen from the above, in the embodiments of the present disclosure, by processing each sample data, the obtained global image including global image information and the first label are used to train the first sub-network, and the local image including local image information and The second label trains the second sub-network to improve the joint training effect and reduce the risk of network effect degradation.

In addition, in the above training method, the generation of soft-trimap adopts the method of probability conversion processing, which can assist the network training to a certain extent and have a better effect.

Specifically, soft-trimap can be adaptively adjusted during network training. For example, in the process of adjusting the network parameters of the semantic segmentation network according to the difference between the semantic probability map and the segmentation label, and adjusting the network parameters of the matting network based on the difference between the matting result and the matting label, the network parameters of the semantic segmentation network The parameters will be updated, and thus the semantic probability map output by the semantic segmentation network will also be updated.

Furthermore, the soft-trimap is generated based on the semantic probability map. Therefore, the update of the semantic probability map will bring the update of the three-part map soft-trimap, and then the matting result will also be updated. That is, in the network training process, it is usually iterated multiple times, and after each iteration, if the parameters of the semantic segmentation network are updated, even if the input is the same image, the semantic probability map, soft-trimap and matting results will be adaptive. Update, and continue to adjust the network parameters according to the updated results. This method of adaptively adjusting the soft-trimap will help to dynamically optimize the generated soft-trimap and matting results along with the adjustment of the semantic segmentation network, so that the training effect of the final model is better and more accurate. Extract into preserved regions in the target image.

Fig. 11 illustrates an image processing device, which can be applied to implement the image processing method of any embodiment of the present disclosure. As shown in FIG. 11 , the apparatus may include: a map matting module 1110 , a determination module 1120 , an adjustment module 1130 and a fusion module 1140 .

As shown in Figure 11, the device 1100 includes:

The matting module 1110 is configured to perform matting processing on the original image to be processed to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the first image includes the A reserved area in the original image, the reserved area is the foreground or background in the original image; the numerical value of the pixel in the transparency map indicates the transparency of the pixel;

A determination module 1120, configured to determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area; the target area for replacing non-preserved regions in said original image;

An adjustment module 1130, configured to adjust the hue of the pixels in the first image according to the color difference to obtain a second image that matches the hue of the target area;

The fusion module 1140 is configured to perform image fusion on the second image and the material image based on the transparency map to obtain a target image.

In some embodiments, the determining module 1120 is specifically configured to:

Sampling the pixels in the first image and the pixels in the material image respectively to obtain a first sampling point and a second sampling point;

The color difference between the reserved area and the target area is determined based on the difference between the pixel value of the first sampling point and the pixel value of the second sampling point.

In some embodiments, the device 1100 also includes:

A module is used to sample pixels in the transparency map to obtain a third sampling point;

The determining module 1120 is specifically used for:

determining a first pixel mean value of the first sampling point based on the pixel value of the first sampling point and the transparency value of the third sampling point;

determining a second average pixel value of the second sampling point based on the pixel value of the second sampling point;

The color difference between the reserved area and the target area is determined according to the difference between the first pixel mean value and the second pixel mean value.

In some embodiments, the adjustment module 1130 is specifically used to:

Based on the color difference, preliminarily adjust the pixel values of the pixels in the first image to obtain a third image; the pixels in the third image are fused with the color difference;

Based on the difference between the pixel mean value of the pixels in the third image and the pixel mean value of the pixels in the first image, adjust the pixel values of the pixels in the third image to obtain a second color matching the tone of the target area. image.

In some embodiments, the fusion module 1140 is specifically used for:

performing fusion based on the transparency map and the second image to obtain a first result;

performing fusion based on the material image and a reverse transparency map corresponding to the transparency map to obtain a second result;

performing fusion based on the first result and the second result to obtain the target image.

In some embodiments, the map-cutting module 1110 is specifically used for:

Acquiring the three-part map corresponding to the original image; for each pixel in the three-part map, the value corresponding to the pixel indicates that the pixel belongs to any of the reserved area, non-reserved area or undetermined area in the original image the probability of a region;

and performing matting processing according to the trimap and the original image to obtain the matting result.

In some embodiments, the map-cutting module 1110 is specifically used for:

performing semantic segmentation processing on the original image to obtain a semantic probability map of the original image, where the value of a pixel in the semantic probability map indicates a first probability that the pixel belongs to the reserved area in the original image;

Probability conversion processing is performed based on the semantic probability map to obtain a tripartite map corresponding to the original image.

In some embodiments, the map-cutting module 1110 is specifically used for:

performing semantic segmentation processing on the original image through a semantic segmentation network to obtain the semantic probability map output by the semantic segmentation network;

Performing image matting processing according to the trimap and the original image includes: performing matting processing according to the trimap and the original image through a matting network.

In some embodiments, the map-cutting module 1110 is specifically used for:

For each pixel in the semantic probability map, a probability conversion is performed based on the first probability of the pixel to obtain a second probability that the pixel belongs to a region to be determined in the tripartite map;

The trimap is generated according to the first probability and the second probability of each pixel in the semantic probability map.

In some embodiments, the first probability of each pixel in the semantic probability map represents the higher the probability that the pixel belongs to the foreground or the background, and the second probability obtained through probability conversion represents that the pixel belongs to the tripartite map The lower the probability of the area to be determined;

Generating the trimap according to the first probability and the second probability of each pixel in the semantic probability map includes: for each pixel in the original image, according to the corresponding Perform probability fusion of the first probability and the second probability to determine the value corresponding to the pixel in the tripartite map.

In some embodiments, the map-cutting module 1110 is specifically used for:

Carrying out image matting processing according to the trimap and the original image to obtain an initial transparency map of the retained region residual and the original image; the numerical value of the pixel in the initial transparency map indicates the transparency of the pixel;

Obtaining the first image based on the original image and the reserved area residual;

According to the tripartite map, the values of the pixels in the initial transparency map are adjusted to obtain the transparency map corresponding to the original image.

In some embodiments, the device 1100 also includes:

A scaling module, configured to scale the original image;

The map-cutting module 1110 is specifically used for:

Enlarging the residual of the reserved area to the scale of the original image before performing the scaling process;

The first image is obtained according to the enlarged retained area residual and the original image.

In some embodiments, the non-preserved area includes a sky area in the original image; the target area includes a sky area in the material image.

Embodiments of the image processing apparatus shown in the present disclosure can be applied to electronic equipment. Accordingly, the present disclosure discloses an electronic device, which may include: a processor.

Memory used to store processor-executable instructions.

Wherein, the processor is configured to call executable instructions stored in the memory to implement the image processing method shown in any one of the foregoing embodiments.

Please refer to FIG. 12 . FIG. 12 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.

As shown in FIG. 12, the electronic device may include a processor for executing instructions, a network interface for connecting to a network, a memory for storing operation data for the processor, and a memory for storing instructions corresponding to the image processing device. volatile memory.

Wherein, the embodiment of the device may be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, as a device in a logical sense, it is formed by reading the corresponding computer program instructions in the non-volatile memory into the memory for operation by the processor of the electronic device where it is located. From the perspective of hardware, in addition to the processor, memory, network interface, and non-volatile memory shown in Figure 12, the electronic device where the device in the embodiment is usually based on the actual function of the electronic device can also include other Hardware, no more details on this.

It can be understood that, in order to increase the processing speed, the corresponding instructions of the image processing device may also be directly stored in the memory, which is not limited herein.

The present disclosure provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program can be used to make a processor execute the image processing method shown in any one of the foregoing embodiments.

Those skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, system or computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may employ a computer implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, etc.) with computer-usable program code embodied therein. The form of the Program Product.

"And/or" in the present disclosure means at least one of the two, for example, "A and/or B" may include three options: A, B, and "A and B".

Each embodiment in the present disclosure is described in a progressive manner, the same and similar parts of the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the data processing device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.

The specific embodiments of the present disclosure have been described above. Other implementations are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain embodiments.

Embodiments of the subject matter and functional operations described in this disclosure can be implemented in digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware that may include the structures disclosed in this disclosure and their structural equivalents, or their A combination of one or more of them. Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e. one or more of computer program instructions encoded on a tangible, non-transitory program carrier for execution by or to control the operation of data processing apparatus. Multiple modules. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver device for transmission by the data The processing means executes. A computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).

Computers suitable for the execution of a computer program may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory and/or a random access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic or magneto-optical disks, or optical disks, to receive data therefrom or Send data to it, or both. However, a computer is not required to have such a device. In addition, a computer may be embedded in another device such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a device such as a Universal Serial Bus (USB) ) portable storage devices like flash drives, to name a few.

Computer-readable media suitable for storing computer program instructions and data may include all forms of non-volatile memory, media and memory devices and may include, for example, semiconductor memory devices such as EPROM, EEPROM and flash memory devices, magnetic disks such as internal hard drives or removable disks), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.

While this disclosure contains many specific implementation details, these should not be construed as limitations on the scope of any disclosure or of what may be claimed, but rather as primarily describing features of particular disclosed embodiments. Certain features that are described in multiple embodiments within this disclosure can also be implemented in combination in a single embodiment. On the other hand, various features that are described in a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may function in certain combinations as described and even initially claimed as such, one or more features from a claimed combination may in some cases be removed from that combination and the claimed A protected combination can point to a subcombination or a variant of a subcombination.

Similarly, while operations are depicted in the figures in a particular order, this should not be construed as requiring that those operations be performed in the particular order shown, or sequentially, or that all illustrated operations be performed, to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. Furthermore, the separation of the various system modules and components in the described embodiments should not be construed as requiring such separation in all embodiments, and it should be understood that the described program components and systems can often be integrated together in a single software product, or packaged into multiple software products.

Thus, certain embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.

The above are only some examples of one or more embodiments of the present disclosure, and are not intended to limit the one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, etc. shall be included in the protection scope of one or more embodiments of the present disclosure.

Claims

An image processing method, characterized in that the method comprises:

Carrying out matting processing on the original image to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the first image includes a reserved area in the original image, so The reserved area is the foreground or background in the original image; the numerical value of the pixel in the transparency map indicates the transparency of the pixel;

Determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area; the target area in the material image is used for replacing non-preserved regions in said original image;

adjusting the color tone of the pixels in the first image according to the color difference to obtain a second image that matches the color tone of the target area;

Based on the transparency map, image fusion is performed on the second image and the material image to obtain a target image.
The method according to claim 1, wherein the reserved area and the target area are determined according to the difference between the pixels in the first image and the pixels in the material image containing the target area. Said color differences between areas, including:

Sampling the pixels in the first image and the pixels in the material image respectively to obtain a first sampling point and a second sampling point;

The color difference between the reserved area and the target area is determined based on the difference between the pixel value of the first sampling point and the pixel value of the second sampling point.
The method according to claim 2, further comprising:

Sampling pixels in the transparency map to obtain a third sampling point;

Determining the color difference between the reserved area and the target area based on the difference between the pixel value of the first sampling point and the pixel value of the second sampling point includes:

determining a first pixel mean value of the first sampling point based on the pixel value of the first sampling point and the transparency value of the third sampling point;

determining a second average pixel value of the second sampling point based on the pixel value of the second sampling point;

The color difference between the reserved area and the target area is determined according to the difference between the first pixel mean value and the second pixel mean value.
The method according to any one of claims 1-3, wherein the tone adjustment is performed on the pixels in the first image according to the color difference to obtain the second image that matches the tone of the target area, including :

Based on the color difference, preliminarily adjust the pixel values of the pixels in the first image to obtain a third image; the pixels in the third image are fused with the color difference;

Based on the difference between the pixel mean value of the pixels in the third image and the pixel mean value of the pixels in the first image, adjust the pixel values of the pixels in the third image to obtain a color matching the color tone of the target area. second image.
The method according to any one of claims 1-4, wherein, based on the transparency map, performing image fusion of the second image and the material image to obtain the target image includes:

performing fusion based on the transparency map and the second image to obtain a first result;

performing fusion based on the material image and a reverse transparency map corresponding to the transparency map to obtain a second result;

performing fusion based on the first result and the second result to obtain the target image.
The method according to any one of claims 1-5, wherein the matting process is performed on the original image to obtain the matting result, including:

Acquiring the three-part map corresponding to the original image; for each pixel in the three-part map, the value corresponding to the pixel indicates that the pixel belongs to any of the reserved area, non-reserved area or undetermined area in the original image the probability of a region;

and performing matting processing according to the trimap and the original image to obtain the matting result.
The method according to claim 6, wherein obtaining the tripartite map corresponding to the original image comprises:

performing semantic segmentation processing on the original image to obtain a semantic probability map of the original image, where the value of a pixel in the semantic probability map indicates a first probability that the pixel belongs to the reserved area in the original image;

Probability conversion processing is performed based on the semantic probability map to obtain a tripartite map corresponding to the original image.
The method according to claim 7, wherein performing semantic segmentation processing on the original image to obtain the semantic probability map corresponding to the original image comprises: performing semantic segmentation on the original image through a semantic segmentation network Processing to obtain the semantic probability map output by the semantic segmentation network;

Performing image matting processing according to the trimap and the original image includes: performing matting processing according to the trimap and the original image through a matting network.
The method according to claim 7 or 8, wherein, carrying out probability conversion processing based on the semantic probability map to obtain the corresponding three-part map of the original image, including:

For each pixel in the semantic probability map, perform probability conversion based on the first probability of the pixel, to obtain a second probability that the pixel belongs to the region to be determined in the tripartite map;

The trimap is generated according to the first probability and the second probability of each pixel in the semantic probability map.
The method according to claim 9, characterized in that, for each pixel in the semantic probability map, the first probability of the pixel indicates that the higher the probability that the pixel belongs to the foreground or the background, the higher the probability conversion obtained The second probability indicates that the pixel belongs to the lower probability of the region to be determined in the tripartite map;

Generating the trimap according to the first probability and the second probability of each pixel in the semantic probability map includes: for each pixel in the original image, according to the corresponding Perform probability fusion of the first probability and the second probability to determine the value corresponding to the pixel in the tripartite map.
The method according to any one of claims 6-10, wherein the matting process is performed according to the trimap and the original image to obtain the matting result, including:

Carrying out image matting processing according to the trimap and the original image to obtain an initial transparency map of the retained region residual and the original image; the numerical value of the pixel in the initial transparency map indicates the transparency of the pixel;

Obtaining the first image based on the original image and the reserved area residual;

According to the tripartite map, the values of the pixels in the initial transparency map are adjusted to obtain the transparency map corresponding to the original image.
The method according to claim 11, wherein, before performing semantic segmentation processing on the original image, the method further comprises: performing scaling processing on the original image;

Obtaining the first image based on the original image and the reserved area residual, including:

Enlarging the residual of the reserved area to the scale of the original image before performing the scaling process;

The first image is obtained according to the enlarged retained area residual and the original image.
The method according to any one of claims 1-12, wherein the non-reserved area includes a sky area in the original image; and the target area includes a sky area in the material image.
An image processing device, characterized in that the device comprises:

A matting module, configured to perform matting processing on an original image to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the first image includes the original image A reserved area in the original image, the reserved area is the foreground or background in the original image; the numerical value of the pixel in the transparency map indicates the transparency of the pixel;

A determining module, configured to determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area; the target area is used for replacing non-reserved regions in said original image;

An adjustment module, configured to adjust the tone of the pixels in the first image according to the color difference, to obtain a second image that matches the tone of the target area;

A fusion module, configured to perform image fusion on the second image and the material image based on the transparency map to obtain a target image.
An electronic device, characterized in that it comprises:

processor;

memory for storing processor-executable instructions;

Wherein, the processor implements the image processing method according to any one of claims 1-13 by running the executable instructions.
A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program is used to make a processor execute the image processing method according to any one of claims 1-13.