WO2023071810A1 - Image processing


Info

Publication number
WO2023071810A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
pixel
probability
original image
map
Application number
PCT/CN2022/125012
Other languages
French (fr)
Chinese (zh)
Inventor
程俊奇
四建楼
钱晨
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023071810A1

Classifications

    All entries fall under G06T (G: Physics; G06: Computing; Calculating or Counting; G06T: Image data processing or generation, in general):
    • G06T 5/90
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 7/11: Image analysis; Segmentation; Region-based segmentation
    • G06T 7/194: Image analysis; Segmentation involving foreground-background segmentation
    • G06T 7/90: Image analysis; Determination of colour characteristics
    • G06T 2207/10004: Image acquisition modality; Still image; Photographic image
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • G06T 2207/20221: Image combination; Image fusion; Image merging

Definitions

  • the present disclosure relates to computer vision techniques, and more particularly to image processing.
  • region replacement is widely used in various image editing software, camera back-end algorithms and other scenarios.
  • Segmentation models are usually used to semantically segment the original image, producing a rough mask of the replaced region. The original image is then fused with the image containing the target area according to the mask result, so that the target area replaces the replaced area.
  • the embodiments of the present disclosure at least provide an image processing method, device, electronic device, and storage medium.
  • the present disclosure provides an image processing method, the method comprising: performing matting processing on an original image to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image, where the first image includes a reserved area in the original image, the reserved area is the foreground or background in the original image, and the value of a pixel in the transparency map indicates the transparency of that pixel; determining the color difference between the reserved area and a target area according to the difference between pixels in the first image and pixels in a material image containing the target area, where the target area is used to replace the non-reserved area in the original image; adjusting the hue of the pixels in the first image according to the color difference to obtain a second image whose hue matches that of the target area; and performing image fusion on the second image and the material image based on the transparency map to obtain a target image.
  • the present disclosure further proposes an image processing device, which includes: a matting module, configured to perform matting processing on an original image to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image, where the first image includes a reserved area in the original image, the reserved area is the foreground or background in the original image, and the value of a pixel in the transparency map indicates the transparency of that pixel; a determination module, configured to determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area, where the target area is used to replace the non-reserved area in the original image; an adjustment module, configured to adjust the hue of the pixels in the first image according to the color difference to obtain a second image whose hue matches that of the target area; and a fusion module, configured to perform image fusion on the second image and the material image based on the transparency map to obtain a target image.
  • the present disclosure further proposes an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor executes the executable instructions to implement the image processing method shown in any of the foregoing embodiments.
  • the present disclosure provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to cause a processor to execute the image processing method as shown in any one of the foregoing embodiments.
  • in the embodiments of the present disclosure, the color difference between the reserved area and the target area can be determined according to the pixel values of the pixels in the reserved area obtained by matting the original image and the pixel values of the pixels in the target area. The hue of the pixels in the reserved area is then adjusted according to the color difference, unifying it with the hue of the pixels in the target area, so that during area replacement the hue of the target area is guaranteed to match the hue of the reserved area in the original image, thereby improving the effect of area replacement.
  • in addition, a trimap can be used for matting, so that the detailed information at the boundary between the reserved area and the non-reserved area is well preserved.
  • furthermore, the matting network can be compressed (for example, by channel compression) and the original image can be scaled, so that the time and memory consumption of the matting process stay within the processing capability of the mobile terminal. Region replacement then does not need to go through a server, which ensures data security and privacy.
  • FIG. 1 is a schematic flowchart of an image processing method shown in an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure
  • FIG. 4 is a schematic flow chart of a tone adjustment method shown in an embodiment of the present disclosure
  • FIG. 5 is a schematic flow diagram of a matting method shown in an embodiment of the present disclosure.
  • FIG. 6 is a schematic flowchart of a method for obtaining a tripartite graph shown in an embodiment of the present disclosure
  • Fig. 7a is a schematic diagram of a character image shown in an embodiment of the present disclosure.
  • Fig. 7b is a schematic diagram of a semantic probability map shown in an embodiment of the present disclosure.
  • Fig. 7c is a schematic diagram of a tripartite graph shown in an embodiment of the present disclosure.
  • Fig. 7d is a schematic diagram of a transparency map shown in an embodiment of the present disclosure.
  • Fig. 7e is a schematic diagram of a foreground image shown in an embodiment of the present disclosure.
  • Fig. 7f is a schematic diagram of a target image shown in an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a region replacement process shown in an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of an area replacement process based on FIG. 8;
  • FIG. 10 is a schematic flowchart of a network training method shown in an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of an image processing device shown in an embodiment of the present disclosure.
  • Fig. 12 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
  • This disclosure relates to the field of augmented reality (AR).
  • the target object may involve faces, limbs, gestures, actions, etc. related to the human body, or markers and markers related to objects, or sand tables, display areas or display items related to venues or places.
  • Vision-related algorithms may involve visual positioning, SLAM (Simultaneous Localization and Mapping), 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc.
  • Specific applications not only involve interactive scenes such as guided tours, navigation, explanation, reconstruction, virtual effect overlay, and display related to real scenes or objects, but also special effects processing related to people, such as makeup beautification, body beautification, special effect display, and interactive scenarios such as virtual model display.
  • the relevant features, states and attributes of the target object can be detected or identified through the convolutional neural network.
  • the aforementioned convolutional neural network is a neural network model obtained through model training based on a deep learning framework.
  • region replacement is widely used in various image editing software, camera back-end algorithms, and other scenarios. Segmentation models are usually used to semantically segment the original image, producing a rough mask of the replaced region. The original image is then fused with the image containing the target area according to the mask result, so that the target area replaces the replaced area.
  • however, there is often an obvious hue difference between the target area and the original image, and direct replacement then produces an obvious inconsistency in picture hue, resulting in a poor area replacement effect.
  • the present disclosure proposes an image processing method. This method can ensure that the tone of the target area matches the reserved area in the original image during area replacement, thereby improving the effect of area replacement.
  • FIG. 1 is a schematic flowchart of an image processing method shown in an embodiment of the present disclosure.
  • the processing method shown in FIG. 1 can be applied to electronic equipment.
  • the electronic device may execute the method by running software logic corresponding to the processing method.
  • the electronic device may be, for example, a notebook computer, a desktop computer, a mobile phone, a personal digital assistant (PDA), or the like.
  • the type of the electronic device is not particularly limited in the present disclosure.
  • the electronic device may also be a client device and/or a server device, which is not specifically limited here.
  • the image processing method may include S102-S108. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
  • S102: Perform matting processing on the original image to be processed to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the first image includes a reserved area in the original image, and the reserved area is the foreground or background in the original image; the value of a pixel in the transparency map indicates the transparency of that pixel.
  • the original image is the image in which an area needs to be replaced.
  • the original image may include reserved areas and non-reserved areas.
  • the non-reserved area is generally used as a replaced area, which is replaced by other materials.
  • the reserved area and the non-reserved area can be distinguished by image processing techniques.
  • the reserved area refers to an area that is reserved and not replaced during the process of performing area replacement on the image. For example, in a scene where background areas are replaced, foreground areas are preserved. For example, the background area (such as the sky area) in the person image needs to be replaced, and the foreground area containing the person can be used as the reserved area. In scenes where the foreground area is replaced, the background area is the preserved area.
  • the first image may include a reserved area cut out from the original image.
  • the first image has the same size as the original image. Areas in the first image other than the reserved area may be filled with pixels of preset pixel values.
  • the preset pixel value may be 0, 1 and so on.
  • the transparency map is used to distinguish reserved areas and non-reserved areas by different values of transparency.
  • the value of a pixel within the transparency map indicates the transparency of the corresponding pixel.
  • in some embodiments, the transparency values of the pixels belonging to the reserved area in the transparency map are a first value, and the transparency values of the pixels belonging to the non-reserved area are a second value.
  • the first value and the second value can be set differently for different scenarios.
  • for example, the first value of the transparency map may be 1, indicating that the pixels in the reserved area are opaque, and the second value may be 0, indicating that the pixels in the non-reserved area are fully transparent.
  • with this transparency setting, the non-reserved area is completely replaced and none of the original non-reserved area is preserved.
  • as another example, the first value of the transparency map may be 1, indicating that the pixels in the reserved area are opaque, and the second value may be 0.3, indicating that the pixels in the non-reserved area are semi-transparent.
  • in some embodiments, a trimap corresponding to the original image to be processed can be obtained; for each pixel in the trimap, the value corresponding to the pixel indicates whether the pixel belongs to the reserved area, the non-reserved area, or the area to be determined. Then, according to the trimap, the original image can be matted to obtain the first image and the transparency map.
  • a trimap distinguishes the foreground, the background, and the transition area between the foreground and the background in an image. That is, regardless of whether the reserved area is the foreground or the background of the original image, the trimap distinguishes the reserved area, the non-reserved area, and the to-be-determined area between them, thereby preserving the details at the boundary between the reserved area and the non-reserved area.
  • a pre-trained matting network may also be used for matting processing.
  • the matting network is trained in a supervised manner through training samples marked with transparency information and reserved area information in advance.
  • the first image and the transparency map can be obtained by inputting the original image into the matting network.
  • S104: Determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area; the target area in the material image is used to replace the non-reserved region in the original image.
  • the material images are generally some pre-acquired images, and these images contain replacement materials used to replace non-reserved areas.
  • the areas occupied by these replacement materials in the material image may be referred to as target areas.
  • the material image may contain some sky materials, and these sky materials may be used to replace the sky in the original image (that is, the non-preserved area in the original image).
  • the color difference refers to the pixel value difference between the pixels in the reserved area and the pixels in the target area.
  • a pixel value of a pixel may indicate a color value of the pixel.
  • the color difference between the reserved area and the target area may be obtained by calculating an average difference between pixel values of pixels in the first image and pixel values of pixels in the material image.
  • alternatively, the reserved area and the target area can be sampled, and the color difference can be determined from the pixel values of the sampling points, reducing the amount of computation needed to determine the color difference and thereby improving the efficiency of area replacement.
  • FIG. 2 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure.
  • the steps shown in FIG. 2 are descriptions of S104.
  • the method for determining the color difference may include S202-S204. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
  • the step size can be preset. For example, the result of dividing the short side of the original image by a preset value (for example, 10, 20, or 30) may be used as the step size.
  • for the first image, sampling may be performed in a preset order (for example, from left to right and from top to bottom) with a set step size to obtain first sampling points; for the material image, sampling is performed in the same preset order with the set step size to obtain second sampling points.
  • then, the pixel mean or pixel median of the first sampling points can be determined from their pixel values, and the pixel mean or pixel median of the second sampling points can be determined from their pixel values.
  • the color difference is then determined based on the difference between two pixel means or two pixel median values.
  • representing the sampling points by their pixel mean or pixel median can simplify the computation.
  • in some embodiments, the transparency of the first sampling points can also be taken into account, making the determined pixel mean more accurate; this helps accurately determine the color difference between the reserved area and the target area, thereby enhancing the tone adjustment effect.
  • the pixels in the transparency map may be first sampled to obtain the third sampling point.
  • the steps disclosed in the foregoing S202 may be used for sampling to obtain some third sampling points.
  • FIG. 3 is a schematic flowchart of a method for determining color differences shown in an embodiment of the present disclosure.
  • the steps shown in FIG. 3 are supplementary descriptions of S204.
  • the method for determining the color difference may include S302-S306. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
  • S302. Determine a first pixel average value of the first sampling point based on the pixel value of the first sampling point and the transparency value of the third sampling point.
  • the disclosure refers to the pixel mean value of the first sampling point determined based on the pixel value of the first sampling point and the transparency value of the third sampling point as the first pixel mean value.
  • the embodiment of the present disclosure does not limit how to determine the specific formula of the first pixel mean value, and the following is only an example:
  • fg_mean indicates the first pixel mean value.
  • FG1 refers to the pixel value of the first sampling point.
  • Alpha1 refers to the transparency value of the third sampling point. An accurate first pixel mean can be obtained by incorporating the transparency of the sampling points through formula (1).
  • the pixel mean value of the second sampling point is referred to as the second pixel mean value.
  • the second pixel mean bg_mean can be obtained by averaging BG1 with an ordinary mean calculation.
  • BG1 is the pixel value of the second sampling point.
  • S306. Determine the color difference between the reserved area and the target area according to the difference between the first pixel average value and the second pixel average value.
  • a pixel value of a pixel may indicate color information of the pixel.
  • the difference in color of a pixel can be determined by the difference in pixel value.
  • the embodiment of the present disclosure does not limit the specific formula of how to determine the color difference, and the following is only an example:
  • the pixel mean value or pixel median value of the sampling point can be used to represent the pixel value of the sampling point, which can simplify the operation.
  • in addition, the transparency of the sampling points is taken into account, making the determined pixel mean more accurate, which helps accurately determine the color difference between the reserved area and the target area, thereby improving the tone adjustment effect.
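  • as an illustration of the sampling and averaging described above, the following Python/NumPy sketch computes the color difference. Since the text does not reproduce formulas (1) and (2) verbatim, the transparency-weighted mean and the per-channel subtraction below are assumptions consistent with the surrounding description, and all variable names are illustrative:

```python
import numpy as np

def color_difference(fg, bg, alpha, step):
    """Estimate the color difference between the reserved area (in the
    first image `fg`) and the target area (in the material image `bg`)
    from sparsely sampled pixels."""
    # First sampling points FG1 (first image) and third sampling points
    # Alpha1 (transparency map), taken on the same grid.
    fg1 = fg[::step, ::step].reshape(-1, 3)
    alpha1 = alpha[::step, ::step].reshape(-1, 1)
    # Second sampling points BG1 (material image).
    bg1 = bg[::step, ::step].reshape(-1, 3)

    # Assumed form of formula (1): transparency-weighted mean, so that
    # transparent (non-reserved) samples do not bias the estimate.
    fg_mean = (fg1 * alpha1).sum(axis=0) / (alpha1.sum() + 1e-6)
    # Second pixel mean: a plain average of the material samples.
    bg_mean = bg1.mean(axis=0)

    # Assumed form of formula (2): per-channel difference diff.
    return bg_mean - fg_mean
```

  • with, for example, step = min(h, w) // 20 as in the embodiment described later, only a small set of pixels is touched, which keeps the cost of determining the color difference low.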
  • Hue refers to the overall tendency of the color of the image. Although the image includes a variety of colors, it generally has a color tendency. For example, the image may be bluish or reddish, warm or cold, and so on. This tendency in color is the hue of the image. That is, the hue of the image can be indicated by the pixel value (color value) of the pixel in the image, and the hue adjustment can be completed by adjusting the pixel value of the image pixel.
  • the tone adjustment may be to adjust the color values of the pixels in the first image to be closer to the color values of the pixels in the target area.
  • the hues of two images match when the difference between the color values of their pixels is smaller than a preset color threshold (an empirical threshold); that is, the color values of the pixels in the two images are relatively close, showing roughly the same hue effect.
  • the color difference may be fused with pixel values of pixels in the first image, so as to achieve the effect that the hue of the first image matches the hue of the target area, and complete the hue adjustment.
  • FG refers to the pixel value of the pixel in the first image.
  • new_FG refers to the pixel value of a pixel within the second image.
  • q is the preset adjustment factor.
  • q is preset according to business requirements.
  • diff indicates the color difference between the target area and the reserved area. According to the formula (3), the color tone of the pixels in the first image can be adjusted based on the color difference to obtain a second image that matches the color tone of the target area, thereby facilitating the improvement of the area replacement effect.
  • in some embodiments, the color difference between the adjusted image and the reserved area before adjustment can also be fused in, and the pixels adjusted again, to prevent the pixel values in the second image from becoming too large or too small and thereby enhance the tone adjustment effect.
  • FIG. 4 is a schematic flowchart of a method for adjusting hue according to an embodiment of the present disclosure.
  • the steps shown in FIG. 4 are supplementary descriptions of S106.
  • the tone adjustment method may include S402-S404. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
  • the third image may be obtained by using the foregoing formula (3).
  • the difference between the pixel mean value of the pixels in the third image and the pixel mean value of the pixels in the first image may indicate a color difference between the third image and the first image.
  • the difference between the pixel mean value of the pixels in the third image and the pixel mean value of the pixels in the first image may be determined first, and then the difference is fused into the pixel values of the pixels in the third image to obtain the second image.
  • new_FG' = new_FG + (mean(FG) - mean(new_FG))....(4)
  • new_FG' on the left side of the equal sign is the pixel value of the pixel in the second image.
  • the new_FG on the right side of the equal sign is the pixel value of the pixel in the third image obtained by the aforementioned formula (3).
  • mean() is the average function.
  • the color difference between the third image and the first image can be obtained by mean(FG)-mean(new_FG).
  • that is, formula (3) can be used to initially adjust the hue of the first image, and formula (4) can then be used to correct the third image to obtain the second image. The hue of the second image is thus closer to that of the target area without deviating too much from the hue of the first image, which reduces the possibility of over-bright or over-dark colors caused by pixel values in the second image being too large or too small, improves the tone adjustment effect, and in turn improves the area replacement effect.
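  • a minimal sketch of this two-step adjustment is shown below. Formula (3) is not reproduced verbatim in the text, so the additive form FG + q*diff is an assumed reading; the correction step follows formula (4) as given:

```python
import numpy as np

def adjust_tone(fg, diff, q=0.5):
    """Adjust the hue of the first image `fg` toward the target area.
    `q` is the preset adjustment coefficient (0.5 is only a placeholder;
    its value is chosen per business requirements)."""
    # Step 1, assumed form of formula (3): fuse the color difference into
    # the first image to obtain the preliminarily adjusted third image.
    new_fg = fg + q * diff

    # Step 2, formula (4): pull the mean of the third image back toward
    # the mean of the first image so the result does not drift too far.
    new_fg = new_fg + (fg.mean(axis=(0, 1)) - new_fg.mean(axis=(0, 1)))

    # Clipping keeps pixel values from becoming too large or too small.
    return np.clip(new_fg, 0.0, 1.0)  # second image new_FG'
```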
  • the image fusion may include, but is not limited to, splicing, addition, and multiplication of pixel values of pixels in two images.
  • the first result can be obtained based on the fusion of the transparency map and the second image, and the second result can be obtained based on the fusion of the material image and the reverse transparency map corresponding to the transparency map; and then the obtained The first result is fused with the second result to obtain the target image.
  • new = new_FG'*Alpha + BG*(1-Alpha)....(5), where new indicates the pixel value of a pixel in the target image.
  • new_FG' indicates the pixel value of the pixel in the second image obtained in S106.
  • BG indicates the pixel value of the pixel in the material image corresponding to the target area.
  • Alpha indicates the transparency value of the pixel within the transparency map.
  • 1-Alpha can be expressed as the reverse transparency map corresponding to the transparency map.
  • the first result obtained by the fusion of Alpha and new_FG' can be expressed as new_FG'*Alpha
  • the second result obtained by the fusion of BG and reverse transparency can be expressed as BG*(1-Alpha)
  • the fusion of the first result and the second result gives new, as obtained by formula (5).
  • as described above, the transparency values of the pixels belonging to the reserved area in the transparency map are the first value, and the transparency values of the pixels belonging to the non-reserved area are the second value. The first value may be 1, indicating that the pixel is opaque, and the second value may be 0, indicating that the pixel is transparent.
  • through new_FG'*Alpha, the pixels belonging to the reserved area in the second image are kept opaque, and the pixels belonging to the non-reserved area are made transparent.
  • through BG*(1-Alpha), the pixels belonging to the target area in the material image are kept opaque, and the pixels belonging to the non-target area are made transparent.
  • the image fusion can thus be realized by formula (5).
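  • since formula (5) is, per the description above, new = new_FG'*Alpha + BG*(1-Alpha), the fusion reduces to ordinary alpha blending. A minimal sketch (assuming the material image has already been aligned to the size of the original image):

```python
import numpy as np

def fuse(new_fg, bg, alpha):
    """Formula (5): new = new_FG' * Alpha + BG * (1 - Alpha)."""
    a = alpha[..., None]  # broadcast the transparency over color channels
    return new_fg * a + bg * (1.0 - a)
```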
  • in the embodiments of the present disclosure, the color difference between the reserved area and the target area can be determined according to the pixel values of the pixels in the reserved area obtained by matting the original image and the pixel values of the pixels in the target area. The hue of the pixels in the reserved area is then adjusted according to the color difference so that it is unified with the hue of the pixels in the target area. When the second image and the material image are then fused during region replacement, the hue of the target region is guaranteed to match the reserved region in the original image, thereby improving the effect of region replacement.
  • segmentation models are commonly used to perform semantic segmentation on reserved regions to obtain rough mask results for non-reserved regions. Then, the replacement of the original non-reserved area is realized according to the mask result and the material image. Since the mask result output by the segmentation model is often rough in the boundary area between the reserved area and the non-reserved area, directly using the mask result for area replacement will cause obvious artifacts in the boundary area. For example, in a sky replacement scene, some local details between the sky and the horizon in the original image may be missing.
  • a tripartite matting method may be used to solve the foregoing problems.
  • in some embodiments, the original image can be matted using the trimap corresponding to the original image to obtain the first image and the transparency map. Because the trimap distinguishes the reserved area, the non-reserved area, and the to-be-determined area between them, the resulting transparency map preserves the detailed information at the boundary between the reserved area and the non-reserved area. Compared with directly using a mask result, performing region replacement based on a transparency map obtained by trimap matting therefore helps improve the blending between the reserved region and the target region.
  • FIG. 5 is a schematic flowchart of a matting method according to an embodiment of the present disclosure.
  • the steps shown in FIG. 5 are supplementary descriptions of S102.
  • the matting method may include S502-S504. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
  • as described above, the trimap distinguishes the foreground, the background, and the transition area between them; regardless of whether the reserved area is the foreground or the background of the original image, it distinguishes the reserved area, the non-reserved area, and the to-be-determined area between them, thereby preserving the details at the boundary between the reserved area and the non-reserved area.
  • the three-part map is denoted trimap in the present disclosure.
  • editing software can be used to assist in obtaining the trimap of the original image.
  • taking the reserved area as the foreground area as an example, the non-reserved area (background area), the reserved area (foreground area), and the to-be-determined area can be marked on the original image through image editing software to obtain a trimap.
  • the trimap may be obtained by using a trimap extraction network generated based on a neural network.
  • the trimap extraction network can be trained in advance based on training samples marked with trimap information.
  • in some embodiments, the user does not need to manually label the trimap, nor is a pre-trained prediction network for predicting the trimap required; instead, the trimap can be obtained from the result of semantic segmentation combined with probability conversion.
  • FIG. 6 is a schematic flowchart of a method for obtaining a trimap according to an embodiment of the present disclosure.
  • the steps shown in FIG. 6 are supplementary descriptions of the method for obtaining the trimap in S502.
  • the method for obtaining a trimap may include S602-S604. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
  • S602. Perform semantic segmentation processing on the original image to be processed to obtain a semantic probability map corresponding to the original image.
  • the image to be matted may be referred to as an original image.
  • the person image may be called an original image.
  • the non-sky area is the target to be extracted in the matting process, which may be called a reserved area, and the reserved area may be the foreground or background in the original image.
  • the semantic segmentation processing may be performed on the original image, for example, the semantic segmentation processing may be performed through a semantic segmentation network.
  • the semantic segmentation network includes but is not limited to commonly used semantic segmentation networks such as SegNet, U-Net, DeepLab, and FCN.
  • a semantic probability map of the original image can be obtained, and the semantic probability map can include: for each pixel in the original image, the first probability that the pixel belongs to the reserved area. Taking the reserved area as the foreground as an example, in the semantic probability map, the probability of a certain pixel in the original image belonging to the foreground may be 0.85, and the probability of another pixel belonging to the foreground may be 0.24.
  • in some embodiments, probability conversion processing may be performed on the result of the semantic segmentation processing to obtain a trimap.
  • the trimap obtained through probability conversion processing in this embodiment is denoted soft-trimap.
  • the probability conversion process may be to map the probability corresponding to the pixel obtained in the semantic probability map to the value corresponding to the pixel in the soft-trimap through a mathematical conversion method.
  • the probabilities in the semantic probability map can be converted in the following two parts:
  • first, the first probability is converted to obtain the second probability.
  • the trimap soft-trimap may include three kinds of regions: "reserved region (foreground)", “non-reserved region (background)” and "to-be-determined region".
  • the probability that the pixel belongs to the region to be determined in the tripartite map may be referred to as the second probability.
  • the first probability characterizes the probability that the pixel belongs to the reserved area (foreground) or the non-reserved area (background); the more certain this is, the lower the second probability that the pixel belongs to the to-be-determined area in the trimap. For example, the closer the first probability is to 1 or 0, the closer the second probability is to 0; the closer the first probability is to 0.5, the closer the second probability is to 1.
  • the conversion principle is as follows: if a pixel has a high probability of belonging to the reserved area (foreground), or a high probability of belonging to the non-reserved area (background), then its probability of belonging to the to-be-determined area is low; if the probability that the pixel belongs to the reserved area (foreground) or the non-reserved area (background) is around 0.5, then its probability of belonging to the to-be-determined area is high.
  • the first probability can be converted to obtain the second probability.
  • the embodiment of the present disclosure does not limit the specific formula of probability conversion, and the following is only an example:
  • using polynomial fitting to convert the first probability into the second probability keeps the conversion computationally efficient while accurately reflecting the above conversion principle.
  • a semantic probability map can be obtained, and the reserved area (foreground) and non-reserved area (background) in the original image can be roughly distinguished through the semantic probability map. For example, if the first probability of a pixel belonging to the foreground is 0.96, then the probability of belonging to the foreground is very high; if the first probability of a pixel belonging to the foreground is 0.14, it means that the probability of the pixel belonging to the background is very high.
  • the second probability that each pixel belongs to the region to be determined can be obtained.
  • then, the first probability corresponding to a pixel in the semantic probability map and the second probability that the pixel belongs to the to-be-determined area can be fused, giving the pixel's corresponding value in the trimap soft-trimap, which represents the probability that the pixel belongs to the reserved area (foreground), the non-reserved area (background), or the to-be-determined area of the original image.
  • the closer the value corresponding to a pixel in the soft-trimap is to 1, the more likely the pixel belongs to the reserved area (foreground) in the original image; the closer the value is to 0, the more likely the pixel belongs to the non-reserved area (background); and the closer the value is to 0.5, the more likely the pixel belongs to the to-be-determined area. That is, the value corresponding to a pixel in the soft-trimap expresses the probability that the pixel belongs to the reserved area, the non-reserved area, or the to-be-determined area.
  • soft_trimap = -k5*un/k6*sign(score-k7) + (sign(score-k7)+k8)/k9....(7)
  • soft_trimap represents the value corresponding to the pixel in soft-trimap
  • un represents the second probability
  • score represents the first probability
  • sign() represents the sign function.
  • this embodiment does not limit the specific values of the aforementioned coefficients k5 to k9.
  • through formulas (6) and (7), the probability conversion based on the semantic probability map is realized, yielding the trimap soft_trimap.
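  • the sketch below instantiates this conversion. The patent does not fix formula (6), so the quadratic used for the second probability is only one polynomial satisfying the stated conversion principle, and the coefficients of formula (7) are set to the assumed values k5=1, k6=2, k7=0.5, k8=1, k9=2, which place the output at 1 (reserved), 0 (non-reserved), and 0.5 (to be determined):

```python
import numpy as np

def soft_trimap_from_score(score):
    """Convert a semantic probability map `score` (first probability,
    values in [0, 1]) into a soft-trimap."""
    # Illustrative polynomial for formula (6): the second probability
    # `un` approaches 1 as score approaches 0.5 and approaches 0 as
    # score approaches 0 or 1.
    un = 1.0 - (2.0 * score - 1.0) ** 2

    # Formula (7) with assumed coefficients: near 1 in the reserved
    # area, near 0 in the non-reserved area, near 0.5 in between.
    s = np.sign(score - 0.5)
    return -un / 2.0 * s + (s + 1.0) / 2.0
```

  • for example, score = 0.9 gives un = 0.36 and a trimap value of 0.82, leaning toward the reserved area, while score = 0.5 gives a trimap value of 0.5, i.e., the to-be-determined area.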
  • pooling processing may be performed on the semantic probability map, and the above-mentioned probability conversion processing is performed on the pooled semantic probability map. See equation (8) below:
  • the average pooling process can be performed on the semantic probability map, and the pooling is performed according to the convolution stride and the convolution kernel size (kernel_size, ks).
  • score_ represents the pooled semantic probability map, which contains the pooled probabilities.
  • the scores in the above formulas (6) and (7) are replaced with the pooled probability, that is, the pooled semantic probability map is used to perform probability conversion.
  • the kernel size used in the above pooling can be adjusted. Because the pooling is performed before the probability conversion of the semantic probability map, adjusting the convolution kernel size makes it possible to control the width of the to-be-determined area in the soft_trimap to be generated.
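  • formula (8) itself is not reproduced in the text; the sketch below shows a straightforward stride/kernel-size average pooling of the kind described, where a larger kernel widens the to-be-determined band in the resulting soft_trimap:

```python
import numpy as np

def avg_pool(score, ks=3, stride=1):
    """Average-pool the semantic probability map before the probability
    conversion; `score_` replaces `score` in formulas (6) and (7)."""
    h, w = score.shape
    pad = ks // 2
    padded = np.pad(score, pad, mode="edge")
    out_h = (h + 2 * pad - ks) // stride + 1
    out_w = (w + 2 * pad - ks) // stride + 1
    score_ = np.empty((out_h, out_w), dtype=score.dtype)
    for i in range(out_h):
        for j in range(out_w):
            win = padded[i * stride:i * stride + ks,
                         j * stride:j * stride + ks]
            score_[i, j] = win.mean()  # pooled probability
    return score_
```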
  • in some embodiments, the image size of the original image can also be preprocessed based on the semantic segmentation network: the size is adjusted to an integer multiple of the downsampling factor, so that the processed image size is divisible by the downsampling factor scale_factor, which is the factor by which the semantic segmentation network downsamples the original image. Its specific value is determined by the network structure of the semantic segmentation network.
  • in the embodiments of the present disclosure, the semantic probability map obtained by semantic segmentation of the original image can be probability-converted to obtain the trimap. This makes obtaining the trimap faster and more convenient: no manual labeling is required, and there is no longer any need to train a prediction network on trimap annotations, so the matting process is easier to implement. Moreover, because this probability-conversion approach is based on the semantic probability map from semantic segmentation, the generated trimap is more accurate.
  • in some embodiments, the matting process may include: using the trimap and the original image as the input of the matting network, and obtaining the reserved-area residual and the initial transparency map of the original image output by the matting network.
  • the residual of the reserved area may be a residual result obtained by the residual processing unit in the matting network.
  • the reserved area residual may indicate a difference between a pixel value of a pixel in the reserved area extracted by the residual processing unit and a pixel value of a corresponding pixel in the original image.
  • the value of the pixel in the initial transparency map indicates the transparency of the pixel.
  • the first image can be obtained based on the original image and the reserved-area residual (for example, the foreground image can be obtained by adding the foreground residual to the original image, or the background image can be obtained by adding the background residual to the original image), and the values of the pixels in the initial transparency map can be adjusted according to the trimap soft_trimap to obtain the transparency map corresponding to the original image.
  • the transparency values of pixels in the reserved region of the trimap in the initial transparency map may be adjusted to a first value.
  • the transparency values of pixels in the non-reserved region of the trimap in the initial transparency map may be adjusted to a second value.
  • for example, the transparency values of the pixels falling in the reserved area of the trimap can be adjusted to 1, and the transparency values of the pixels falling in the non-reserved area can be adjusted to 0; for the pixels in the to-be-determined area, transparency values in the initial transparency map greater than 0.5 can be adjusted to 1, and values less than 0.5 adjusted to 0.
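  • the following sketch assembles the first image and the transparency map from the matting network outputs as described above. The thresholds used to read the reserved, non-reserved, and to-be-determined regions out of the soft trimap are assumptions; the binarization of the to-be-determined region at 0.5 follows the embodiment just described:

```python
import numpy as np

def assemble_matting_result(original, fg_res, raw_alpha, soft_trimap):
    """Build the first image and the transparency map from the matting
    network outputs (reserved-area residual + initial transparency)."""
    # First image: original image plus the reserved-area residual.
    fg = np.clip(original + fg_res, 0.0, 1.0)

    alpha = raw_alpha.copy()
    reserved = soft_trimap > 0.9       # assumed threshold
    non_reserved = soft_trimap < 0.1   # assumed threshold
    undetermined = ~(reserved | non_reserved)

    alpha[reserved] = 1.0              # first value: opaque
    alpha[non_reserved] = 0.0          # second value: transparent
    # Per this embodiment, to-be-determined transparencies above 0.5 are
    # set to 1 and those below 0.5 are set to 0.
    alpha[undetermined] = (alpha[undetermined] > 0.5).astype(alpha.dtype)
    return fg, alpha
```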
  • in the embodiments of the present disclosure, the trimap can be used for matting so that the detailed information at the boundary between the reserved area and the non-reserved area is well preserved, which helps improve the blending between the reserved area and the target area when performing area replacement.
  • in related schemes, area replacement software on a mobile phone mainly uploads data to a server for processing and then transmits the area replacement result back to the phone for local reading.
  • the security and privacy of data in this scheme are difficult to guarantee.
  • in the embodiments of the present disclosure, the network deployed to the mobile terminal can be miniaturized and the original image scaled down, so that the time and memory consumption stay within the processing capability of the mobile terminal. Region replacement then does not need to go through a server, which ensures data security and privacy.
  • An example of image matting on the mobile terminal is described as follows.
  • a semantic segmentation network and an image matting network may be used.
  • the semantic segmentation network may be a network such as SegNet, U-Net, etc.
  • the matting network may include an encoder and a decoder.
  • the encoder of the matting network can adopt the structural design of MobileNetV2 (mobv2). Before the matting network is deployed to the mobile terminal, channel compression can be performed on it: the number of channels of the network's intermediate features (that is, the features of the middle layers) is compressed.
  • for example, the number of output channels of a convolution kernel in the matting network can be compressed by a factor of 0.35: if the number of output channels is originally a, it is 0.35*a after compression.
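  • channel compression of this kind is simply a width multiplier on the channel counts. A small sketch (the rounding policy and the example channel widths are assumptions; the text only specifies the 0.35 factor):

```python
def compress_channels(channel_counts, width_mult=0.35):
    """Compress the output channel counts of the matting network's
    convolution kernels: `a` output channels become 0.35*a."""
    return [max(1, int(round(a * width_mult))) for a in channel_counts]

# Hypothetical mobv2-style encoder widths before and after compression:
print(compress_channels([32, 64, 128, 256]))  # -> [11, 22, 45, 90]
```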
  • FIG. 7a is a schematic diagram of a character image shown in an embodiment of the present disclosure.
  • the sky area in the person image shown in FIG. 7a is used as a background area, which is also a non-reserved area, and needs to be replaced with another sky area (i.e., the target area in this disclosure) in the pre-acquired material image.
  • the non-sky area in Figure 7a is the reserved area output by the matting network in this example, that is, the foreground area.
  • FIG. 8 is a schematic diagram of a region replacement process shown in an embodiment of the present disclosure.
  • FIG. 9 is a schematic flowchart of the region replacement method based on FIG. 8 .
  • the region replacement method may include S901-S909. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
  • the original image in this embodiment may be the person image shown in Fig. 7a.
  • the person image may be captured by the user through the camera of the mobile terminal, or may be an image stored in the mobile terminal or received from other devices.
  • the purpose of the matting process in this embodiment may be to extract the non-sky area in the person image.
  • Non-sky regions in the original image can be considered as foreground.
  • the original image can be scaled in order to reduce the processing load on the mobile terminal and save computation. Assuming that the size of the original image in Figure 7a is 1080*1920, the image can be scaled to a size of 480*288. For example, scaling can be done by bilinear interpolation, with reference to the following formula (9) and formula (10):
  • h and w are the length and width of the original image
  • basesize is the base size, which is 480 in this example
  • int(x) means rounding x
  • new_h and new_w are the scaled dimensions of the original image respectively, where the specific values of the coefficients in formula (10) are not limited in this embodiment.
  • the image size of the original image can be processed to an integer multiple of the downsampling factor, so that the scaled image size is divisible by the semantic segmentation network's downsampling factor scale_factor. It can be understood that other formulas may also be used for this integer-multiple processing; it is not limited to the following two formulas.
  • This embodiment does not limit the specific values of the respective coefficients in the above formula (11) and formula (12).
  • for example, the above values of k12 to k15 may all be set to 1. If the original image before scaling is denoted A, then the image obtained by scaling it to 480*288 and normalizing it can be denoted B. Referring to FIG. 8, the original image B is the scaled original image.
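  • formulas (9) through (12) are not reproduced in the text; the sketch below makes two assumptions that reproduce the 1080*1920 -> 288*480 example: the longer side is scaled to basesize, and both sides are then rounded up to an integer multiple of an assumed downsampling factor scale_factor = 32:

```python
import math

def scaled_size(h, w, basesize=480, scale_factor=32):
    """Compute the scaled input size for the mobile pipeline."""
    ratio = basesize / max(h, w)          # scale the longer side to basesize
    new_h, new_w = int(h * ratio), int(w * ratio)
    # Integer-multiple processing: make each side divisible by scale_factor.
    new_h = math.ceil(new_h / scale_factor) * scale_factor
    new_w = math.ceil(new_w / scale_factor) * scale_factor
    return new_h, new_w

print(scaled_size(1080, 1920))  # -> (288, 480)
```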
  • the semantic segmentation process can be performed on the original image B through the semantic segmentation network 81, and the semantic probability map 82 output by the semantic segmentation network can be obtained.
  • the semantic probability map can be denoted score, and Fig. 7b shows a semantic probability map. It can be seen that the score indicates the probability that each pixel belongs to the non-sky area (foreground), roughly distinguishing the foreground from the background in the image, that is, the sky area from the non-sky area.
  • the trimap soft-trimap may be generated according to the probability conversion process described in the aforementioned S604.
  • the semantic probability map can be pooled according to formula (8), and then the probability conversion process can be performed on the pooled semantic probability map according to formula (6) and formula (7) to generate a tripartite map. See this tripartite diagram 83 in FIG. 8 .
  • FIG. 7c illustrates a trimap soft-trimap. It can be seen that the probability value of each pixel in the soft-trimap represents the probability that the pixel belongs to one of three types of regions; according to the probability values, the "sky area", the "non-sky area", and the "to-be-determined area between the sky area and the non-sky area" in the image can be distinguished.
  • the trimap 83 and the original image B can be used as the input of the matting network 84. The matting network can output a 4-channel result, in which one channel is the initial transparency map raw_alpha and the other three channels are the foreground residual fg_res.
  • the first result 85 output by the matting network in FIG. 8 may include "raw_alpha+fg_res".
  • the foreground is the non-sky area in the person image.
  • the foreground residual fg_res can be enlarged by bilinear interpolation to restore it to the scale of the original image before scaling, and formula (13) is then executed:
  • a matting result 86, that is, the foreground image FG of the original image, can be obtained.
  • clip(x, s1, s2) is to limit the value of x to [s1, s2].
  • This embodiment does not limit specific values of s1 and s2 in the above formula (13), for example, s1 may be 0, and s2 may be 1.
  • Alpha represents the transparency corresponding to the non-sky area. After the Alpha is obtained, it can be enlarged back to the original size of the original image before scaling through bilinear interpolation.
  • this embodiment does not limit the specific values of the respective coefficients s3 to s8 in the above formula (14) and formula (15).
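  • a sketch of the restoration step, combining the bilinear upsampling with formula (13) under the illustrative bounds s1 = 0 and s2 = 1 mentioned above. Formulas (14) and (15) for the Alpha post-processing are not reproduced in the text, so the Alpha here is only upsampled and clipped:

```python
import cv2
import numpy as np

def restore_full_resolution(original_a, fg_res_small, alpha_small):
    """Upsample the network outputs back to the unscaled original image A
    and apply formula (13): FG = clip(A + fg_res, s1, s2)."""
    h, w = original_a.shape[:2]
    fg_res = cv2.resize(fg_res_small, (w, h), interpolation=cv2.INTER_LINEAR)
    alpha = cv2.resize(alpha_small, (w, h), interpolation=cv2.INTER_LINEAR)

    fg = np.clip(original_a + fg_res, 0.0, 1.0)   # formula (13), s1=0, s2=1
    return fg, np.clip(alpha, 0.0, 1.0)
```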
  • Fig. 7d shows a transparency map Alpha, where the non-sky area and the sky area can be clearly distinguished in Alpha.
  • the first value, 1, of the pixels in the non-sky area indicates opacity.
  • the second value, 0, of the pixels in the sky area indicates complete transparency.
  • Fig. 7e illustrates the extracted non-sky area, i.e., the foreground image FG.
  • the value obtained by dividing the short side of the person image by 20 can be used as the step size, and the matting result 86 (foreground image FG and Alpha) and the material image 87 are sampled.
  • the color difference can be obtained based on the method shown in FIG. 3 .
  • the first pixel mean value fg_mean of the foreground sampling point can be obtained according to the foregoing formula (1).
  • the color difference diff can be obtained according to the aforementioned formula (2).
  • tone adjustment may be performed according to the tone adjustment method shown in FIG. 4 .
  • first, the non-sky area (the foreground image FG in the matting result) can be adjusted according to the preset adjustment coefficient q using the aforementioned formula (3), obtaining the preliminarily adjusted foreground image (the third image). Then, based on the aforementioned formula (4), tone correction is performed on the preliminarily adjusted foreground image (third image) to obtain the final adjustment result 88, that is, the finally adjusted foreground image new_FG' (the second image).
  • image fusion can be performed according to the foregoing formula (5), and the target image new is obtained after replacing the sky area.
  • FIG. 7f shows the target image, obtained through S901-S909, after the sky area has been replaced.
  • with this method, on the one hand, the hue of the target region can be guaranteed to match the non-sky region of the original image during region replacement, thereby improving the effect of region replacement.
  • on the other hand, the trimap can be used for matting, so that the detailed information at the boundary between the sky area and the non-sky area is well preserved, which helps improve the blending between the sky area and the non-sky area during area replacement.
  • in addition, this region replacement method obtains the matting result directly from a single original image as input; that is, given one original image, the corresponding region replacement result can be obtained based on the region replacement method provided by the embodiments of the present disclosure.
  • the prediction of the foreground in the original image requires less input information, which makes the image processing more convenient.
  • FIG. 10 is a schematic flowchart of a network training method shown in an embodiment of the present disclosure. This method can be used for joint training of semantic segmentation network and matting network. As shown in Figure 10, the method may include the following processing:
  • each sample data in the training sample set may include a sample image, a first feature label corresponding to the sample image, and a second feature label corresponding to the sample image.
  • the first feature label may be a segmentation label for the sample image
  • the second feature label may be a matting label for the sample image.
  • S1004: for each sample data in the training sample set, process the sample data to obtain a global image including the global image information of the sample image, a segmentation label corresponding to the global image, a partial image including local image information of the sample image, and a matting label corresponding to the partial image.
  • specifically, first processing can be performed on the sample image of the sample data to obtain a global image containing most of the image information of the sample image; the global image can be considered to include the global image information of the sample image. The same first processing is performed on the first feature label of the sample image to obtain the segmentation label corresponding to the global image.
  • for example, the sample image can be scaled according to the input size requirements of the semantic segmentation network while still retaining most of its image information to obtain the global image, and the same scaling is performed on the first feature label to obtain the segmentation label.
  • second processing is performed on the sample image of the sample data to obtain a partial image including local image information of the sample image; at the same time, the same second processing is performed on the second feature label corresponding to the sample image to obtain the matting label corresponding to the partial image.
  • the sample image may be partially cropped to obtain a partial image including partial image information of the sample image, and the same partial cropping may be performed on the second feature label to obtain the matte label.
  • the matting network performs matting processing based on the trimap and the partial image to obtain a matting result, which may indicate a matting result for the reserved region in the sample image.
  • the obtained global image including global image information and the first label are used to train the first sub-network, and the local image including local image information and The second label trains the second sub-network to improve the joint training effect and reduce the risk of network effect degradation.
  • The soft-trimap is generated by probability conversion processing, which can assist the network training to a certain extent and yield a better result.
  • The soft-trimap can be adaptively adjusted during network training. For example, as the network parameters of the semantic segmentation network are adjusted according to the difference between the semantic probability map and the segmentation label, and the network parameters of the matting network are adjusted based on the difference between the matting result and the matting label, the parameters of the semantic segmentation network are updated, and the semantic probability map output by the semantic segmentation network is therefore updated as well.
  • Since the soft-trimap is generated from the semantic probability map, an update of the semantic probability map brings an update of the soft-trimap, which in turn updates the matting result. That is, network training usually iterates many times; after each iteration, if the parameters of the semantic segmentation network have been updated, then even for the same input image the semantic probability map, the soft-trimap and the matting result are adaptively updated, and the network parameters continue to be adjusted according to the updated results. Adaptively adjusting the soft-trimap in this way helps dynamically optimize the generated soft-trimap and matting results along with the adjustment of the semantic segmentation network, so that the finally trained model performs better and extracts the reserved region in the target image more accurately, as illustrated in the sketch below.
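A minimal sketch of one joint training step is given below, assuming a PyTorch-style setup; seg_net, matting_net, the two loss functions and to_soft_trimap are placeholders for the networks and probability conversion described above, and the alignment between the global soft-trimap and the local crop is omitted for brevity:

```python
import torch

def joint_training_step(seg_net, matting_net, optimizer, batch,
                        seg_loss_fn, matting_loss_fn, to_soft_trimap):
    global_image, seg_label, partial_image, matting_label = batch
    # The semantic probability map is recomputed every iteration, so the
    # soft-trimap derived from it adapts as seg_net's parameters change.
    prob_map = seg_net(global_image)
    soft_trimap = to_soft_trimap(prob_map)
    # The matting network consumes the partial image together with the
    # (suitably cropped/aligned) soft-trimap; alignment is omitted here.
    matting_out = matting_net(partial_image, soft_trimap)
    # Both differences drive the update: probability map vs. segmentation
    # label, and matting result vs. matting label.
    loss = (seg_loss_fn(prob_map, seg_label)
            + matting_loss_fn(matting_out, matting_label))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```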
  • FIG. 11 illustrates an image processing apparatus, which can be applied to implement the image processing method of any embodiment of the present disclosure.
  • The apparatus may include a matting module 1110, a determination module 1120, an adjustment module 1130 and a fusion module 1140.
  • The apparatus 1100 includes:
  • the matting module 1110, configured to perform matting processing on the original image to be processed to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the first image includes a reserved area in the original image, the reserved area being the foreground or the background in the original image; the value of a pixel in the transparency map indicates the transparency of the pixel;
  • the determination module 1120, configured to determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area; the target area is used to replace the non-reserved area in the original image;
  • the adjustment module 1130, configured to perform tone adjustment on the pixels in the first image according to the color difference to obtain a second image matching the tone of the target area;
  • the fusion module 1140, configured to perform image fusion on the second image and the material image based on the transparency map to obtain a target image.
  • The determination module 1120 is specifically configured to:
  • sample the pixels in the first image and the pixels in the material image respectively to obtain first sampling points and second sampling points, and determine the color difference between the reserved area and the target area based on the difference between the pixel values of the first sampling points and the pixel values of the second sampling points.
  • The apparatus 1100 also includes:
  • a sampling module, configured to sample the pixels in the transparency map to obtain third sampling points.
  • The determination module 1120 is specifically configured to:
  • determine a first pixel mean of the first sampling points based on their pixel values and the transparency values of the third sampling points, determine a second pixel mean of the second sampling points based on their pixel values, and determine the color difference between the reserved area and the target area according to the difference between the first pixel mean and the second pixel mean.
  • The adjustment module 1130 is specifically configured to: perform a preliminary adjustment on the pixel values of the pixels in the first image based on the color difference to obtain a third image; and adjust the pixel values of the pixels in the third image based on the difference between the pixel mean of the third image and the pixel mean of the first image, to obtain the second image matching the tone of the target area.
  • The fusion module 1140 is specifically configured to: fuse the transparency map with the second image to obtain a first result; fuse the material image with the inverse transparency map corresponding to the transparency map to obtain a second result; and fuse the first result with the second result to obtain the target image.
  • The matting module 1110 is specifically configured to:
  • acquire a trimap corresponding to the original image, where for each pixel in the trimap, the value corresponding to the pixel indicates the probability that the pixel belongs to any one of the reserved area, the non-reserved area, or the undetermined area in the original image, and perform matting processing on the original image according to the trimap to obtain the first image and the transparency map.
  • The matting module 1110 is specifically configured to:
  • perform semantic segmentation on the original image to obtain a semantic probability map, and perform probability conversion processing based on the semantic probability map to obtain the trimap corresponding to the original image.
  • The matting module 1110 is specifically configured to:
  • perform matting processing according to the trimap and the original image through a matting network.
  • The matting module 1110 is specifically configured to:
  • for each pixel in the semantic probability map, perform a probability conversion based on the first probability of the pixel to obtain a second probability that the pixel belongs to the undetermined area in the trimap; and
  • generate the trimap according to the first probability and the second probability of each pixel in the semantic probability map.
  • The higher the first probability of a pixel in the semantic probability map, i.e. the probability that the pixel belongs to the foreground or the background, the lower the second probability obtained through probability conversion, i.e. the probability that the pixel belongs to the undetermined area in the trimap.
  • Generating the trimap according to the first probability and the second probability of each pixel in the semantic probability map includes: for each pixel in the original image, performing probability fusion on the first probability and the second probability corresponding to the pixel to determine the value corresponding to the pixel in the trimap.
  • The matting module 1110 is specifically configured to:
  • adjust the values of the pixels in the initial transparency map to obtain the transparency map corresponding to the original image.
  • The apparatus 1100 also includes:
  • a scaling module, configured to scale the original image.
  • The matting module 1110 is specifically configured to:
  • obtain the first image according to the enlarged reserved-area residual and the original image.
  • The non-reserved area includes a sky area in the original image; the target area includes a sky area in the material image.
  • The image processing apparatus shown in the embodiments of the present disclosure can be applied to an electronic device. Accordingly, the present disclosure provides an electronic device, which may include: a processor; and
  • a memory used to store instructions executable by the processor;
  • wherein the processor is configured to call the executable instructions stored in the memory to implement the image processing method shown in any one of the foregoing embodiments.
  • FIG. 12 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
  • The electronic device may include a processor for executing instructions, a network interface for connecting to a network, an internal memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the image processing apparatus.
  • The apparatus embodiment may be implemented by software, by hardware, or by a combination of software and hardware.
  • Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into the internal memory and running them.
  • In terms of hardware, besides the processor, network interface, internal memory and non-volatile memory, the electronic device where the apparatus of the embodiment is located may also include other hardware according to the actual function of the electronic device, which will not be detailed here.
  • The instructions corresponding to the image processing apparatus may also be stored directly in the internal memory, which is not limited herein.
  • Correspondingly, the present disclosure provides a computer-readable storage medium storing a computer program, the computer program being used to cause a processor to execute the image processing method shown in any one of the foregoing embodiments.
  • One or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and functional operations described in this disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in this disclosure and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus.
  • a computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • Computers suitable for the execution of a computer program may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • a central processing unit will receive instructions and data from a read only memory and/or a random access memory.
  • the basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, in order to receive data from them, transfer data to them, or both.
  • However, a computer is not required to have such devices.
  • Moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Abstract

Embodiments of the present disclosure provide an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: performing matting processing on an original image to obtain a matting result, the matting result comprising a first image and a transparency map corresponding to the original image; determining a color difference between a reserved area and a target area according to a difference between pixels in the first image and pixels in a material image containing the target area; performing tone adjustment on the pixels in the first image according to the color difference to obtain a second image matching the tone of the target area; and on the basis of the transparency map, performing image fusion on the second image and the material image to obtain a target image.

Description

Image Processing
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. CN2021112739847, filed with the China Patent Office on October 29, 2021, the entire contents of which are incorporated into this disclosure by reference.
Technical Field
The present disclosure relates to computer vision technology, and more particularly to image processing.
Background
As a basic image editing technique, region replacement is widely used in various image editing software, camera back-end algorithms and other scenarios. A segmentation model is usually used to semantically segment the original image, producing a rough mask of the replaced region in the original image. The mask result is then fused with an image containing the target region, so as to replace the replaced region with the target region.
Summary
In view of this, embodiments of the present disclosure provide at least an image processing method, an image processing apparatus, an electronic device, and a storage medium.
The present disclosure provides an image processing method, the method comprising: performing matting processing on an original image to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image, wherein the first image includes a reserved area in the original image, the reserved area is the foreground or the background in the original image, and the value of each pixel in the transparency map indicates the transparency of the pixel; determining a color difference between the reserved area and a target area according to the difference between pixels in the first image and pixels in a material image containing the target area, the target area being used to replace a non-reserved area in the original image; performing tone adjustment on the pixels in the first image according to the color difference to obtain a second image matching the tone of the target area; and performing image fusion on the second image and the material image based on the transparency map to obtain a target image.
The present disclosure provides an image processing apparatus, the apparatus comprising: a matting module configured to perform matting processing on an original image to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image, wherein the first image includes a reserved area in the original image, the reserved area is the foreground or the background in the original image, and the value of each pixel in the transparency map indicates the transparency of the pixel; a determination module configured to determine a color difference between the reserved area and a target area according to the difference between pixels in the first image and pixels in a material image containing the target area, the target area being used to replace a non-reserved area in the original image; an adjustment module configured to perform tone adjustment on the pixels in the first image according to the color difference to obtain a second image matching the tone of the target area; and a fusion module configured to perform image fusion on the second image and the material image based on the transparency map to obtain a target image.
The present disclosure provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor executes the executable instructions to implement the image processing method shown in any one of the foregoing embodiments.
The present disclosure provides a computer-readable storage medium storing a computer program, the computer program being used to cause a processor to execute the image processing method shown in any one of the foregoing embodiments.
In the technical solutions of the image processing method, apparatus, electronic device and storage medium provided by the embodiments of the present disclosure, on the one hand, the color difference between the reserved area and the target area can be determined according to the pixel values of the pixels in the reserved area obtained by matting the original image and the pixel values of the pixels in the target area; the pixels in the reserved area can then be tone-adjusted according to the color difference, unifying the tone of the pixels in the reserved area with that of the pixels in the target area. Thus, during region replacement, the tone of the target area is guaranteed to match that of the reserved area in the original image, improving the region replacement effect. On the other hand, a trimap can be used for matting, so that the detailed information at the junction between the reserved area and the non-reserved area is well preserved, which helps improve the transition between the reserved area and the non-reserved area during region replacement. On yet another hand, the matting network can be channel-compressed and the original image scaled, so that the runtime and memory consumption of the matting process stay within the processing capability of a mobile terminal; region replacement therefore does not need to go through a server, which ensures data security and privacy.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in one or more embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a method for determining a color difference according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a method for determining a color difference according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a tone adjustment method according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of a matting method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a method for obtaining a trimap according to an embodiment of the present disclosure;
FIG. 7a is a schematic diagram of a person image according to an embodiment of the present disclosure;
FIG. 7b is a schematic diagram of a semantic probability map according to an embodiment of the present disclosure;
FIG. 7c is a schematic diagram of a trimap according to an embodiment of the present disclosure;
FIG. 7d is a schematic diagram of a transparency map according to an embodiment of the present disclosure;
FIG. 7e is a schematic diagram of a foreground image according to an embodiment of the present disclosure;
FIG. 7f is a schematic diagram of a target image according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a region replacement process according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of the region replacement flow based on FIG. 8;
FIG. 10 is a schematic flowchart of a network training method according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments will be described in detail below with reference to the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as recited in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if" as used herein may, depending on the context, be interpreted as "when", "upon" or "in response to determining".
The present disclosure relates to the field of augmented reality. By acquiring image information of a target object in a real environment and then applying various vision-related algorithms to detect or recognize relevant features, states and attributes of the target object, an AR (Augmented Reality) effect combining the virtual and the real that matches a specific application can be obtained. Exemplarily, the target object may involve faces, limbs, gestures and actions related to the human body, markers and landmarks related to objects, or sand tables, display areas or display items related to venues or places. Vision-related algorithms may involve visual positioning, SLAM (Simultaneous Localization and Mapping), 3D reconstruction, image registration, background segmentation, key-point extraction and tracking of objects, pose or depth detection of objects, and so on. Specific applications may involve not only interactive scenarios such as guided tours, navigation, explanation, reconstruction and virtual-effect overlay display related to real scenes or objects, but also special-effect processing related to people, such as makeup beautification, body beautification, special-effect display and virtual model display. The relevant features, states and attributes of the target object can be detected or recognized through a convolutional neural network, which is a neural network model obtained through model training based on a deep learning framework.
As a basic image editing technique, region replacement is widely used in various image editing software, camera back-end algorithms and other scenarios. A segmentation model is usually used to semantically segment the original image, producing a rough mask of the replaced region in the original image. The mask result is then fused with an image containing the target region, so as to replace the replaced region with the target region. However, in the related art, there is often an obvious tone difference between the target region and the original image, and direct replacement tends to produce a noticeably inconsistent picture tone, resulting in a poor region replacement effect.
In view of this, the present disclosure proposes an image processing method. During region replacement, this method can ensure that the tone of the target area matches the reserved area in the original image, thereby improving the region replacement effect.
Please refer to FIG. 1, which is a schematic flowchart of an image processing method according to an embodiment of the present disclosure.
The processing method shown in FIG. 1 can be applied to an electronic device, which can execute the method by running software logic corresponding to the processing method. The electronic device may be a notebook computer, a desktop computer, a mobile phone, a personal digital assistant (PDA), and the like; the type of the electronic device is not particularly limited in the present disclosure. The electronic device may also be a client device and/or a server device, which is not specifically limited here.
As shown in FIG. 1, the image processing method may include S102-S108. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
S102: perform matting processing on the original image to be processed to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image; the first image includes a reserved area in the original image, the reserved area being the foreground or the background in the original image; the value of a pixel in the transparency map indicates the transparency of the pixel.
The original image is the image on which region replacement is to be performed. The original image may include a reserved area and a non-reserved area. In a region replacement scenario, the non-reserved area generally serves as the replaced area and is replaced with other material. The reserved area and the non-reserved area can be distinguished by image processing techniques.
The reserved area refers to the area that is kept and not replaced during region replacement. For example, in a scenario where the background area is replaced, the foreground area is the reserved area. For instance, when the background area (such as a sky area) in a person image needs to be replaced, the foreground area containing the person can serve as the reserved area. In a scenario where the foreground area is replaced, the background area is the reserved area.
The first image may include the reserved area cut out from the original image. The first image has the same size as the original image. Areas of the first image other than the reserved area may be filled with pixels of a preset pixel value, for example 0 or 1.
The transparency map distinguishes the reserved area from the non-reserved area by different transparency values. The value of a pixel in the transparency map indicates the transparency of the corresponding pixel. In some embodiments, the transparency values of the pixels belonging to the reserved area in the transparency map take a first value, and the transparency values of the pixels belonging to the non-reserved area take a second value.
In different scenarios, the first value and the second value may vary.
For example, in a scenario where the area is replaced and the original non-reserved area is not retained at all, the first value of the transparency map may be 1, indicating that the pixels in the reserved area are opaque, and the second value may be 0, indicating that the pixels in the non-reserved area are fully transparent. Replacing the non-reserved area with this transparency leaves nothing of the original non-reserved area. For another example, in a scenario where the original non-reserved area should be blurred, the first value of the transparency map may be 1, indicating that the pixels in the reserved area are opaque, and the second value may be 0.3, indicating that the pixels in the non-reserved area are semi-transparent. Replacing the non-reserved area with this transparency blurs the original non-reserved area. In some implementations, in S102, a trimap corresponding to the original image to be processed can be acquired; for each pixel in the trimap, the value corresponding to the pixel indicates the probability that the pixel belongs to any one of the reserved area, the non-reserved area, or the undetermined area in the original image; matting processing can then be performed on the original image according to the trimap to obtain the first image and the transparency map.
The trimap has the property of distinguishing the foreground, the background, and the transition area between the foreground and the background in an image. That is, regardless of whether the reserved area is the foreground or the background of the original image, the trimap is used to distinguish the reserved area, the non-reserved area, and the undetermined area between them, so as to preserve the detailed information at the junction between the reserved area and the non-reserved area.
In some implementations, a pre-trained matting network may also be used for matting processing. The matting network is trained in advance in a supervised manner on training samples annotated with transparency information and reserved-area information. Inputting the original image into the matting network yields the first image and the transparency map.
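For illustration only, the matting step of S102 might be wrapped as in the following sketch; matting_net is a stand-in for the pre-trained matting network described above, and the assumption that it returns a per-pixel alpha map in [0, 1] and that non-reserved pixels are filled with 0 is illustrative:

```python
import numpy as np

def matting(original_image, trimap, matting_net):
    """Run matting to obtain the first image and the transparency map (S102).

    matting_net is assumed to return a per-pixel transparency (alpha) map
    in [0, 1]; the actual network interface may differ.
    """
    alpha = matting_net(original_image, trimap)   # transparency map
    # The first image keeps the reserved area; other pixels are filled with
    # a preset pixel value (0 here, as the text suggests).
    first_image = original_image * alpha[..., None]
    return first_image, alpha
```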
S104: determine the color difference between the reserved area and the target area according to the difference between pixels in the first image and pixels in the material image containing the target area; the target area in the material image is used to replace the non-reserved area in the original image.
The material images are generally images acquired in advance that contain replacement material for the non-reserved area. The areas occupied by this replacement material in the material image may be referred to as target areas. For example, in a sky replacement scenario, the material image may contain sky material used to replace the sky in the original image (that is, the non-reserved area of the original image).
The color difference refers to the difference between the pixel values of the pixels in the reserved area and those of the pixels in the target area.
The pixel value of a pixel may indicate its color value. Exemplarily, the color difference between the reserved area and the target area may be obtained by computing the mean difference between the pixel values of the pixels in the first image and the pixel values of the pixels in the material image.
In some implementations, the reserved area and the target area can be sampled, and the color difference determined from the pixel values of the sampling points, thereby reducing the amount of computation for determining the color difference and improving the efficiency of region replacement.
Please refer to FIG. 2, which is a schematic flowchart of a method for determining a color difference according to an embodiment of the present disclosure. The steps shown in FIG. 2 elaborate on S104. As shown in FIG. 2, the method for determining the color difference may include S202-S204. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
S202: sample the pixels in the first image and the pixels in the material image respectively to obtain first sampling points and second sampling points.
In some implementations, a sampling stride (step) can be preset. For example, the short side of the original image integer-divided by a preset value (for example, 10, 20 or 30) may be used as the step. For the first image, sampling may be performed in a preset order (for example, from left to right and from top to bottom) with the set stride to obtain the first sampling points; for the material image, sampling is performed in the same preset order with the set stride to obtain the second sampling points.
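A minimal sketch of this sampling, assuming NumPy arrays and an illustrative divisor of 20 (the text allows 10, 20, 30, etc.):

```python
import numpy as np

def sample_points(image, divisor=20):
    """Uniformly sample pixels with a stride derived from the short side (S202)."""
    h, w = image.shape[:2]
    step = max(min(h, w) // divisor, 1)
    # Traverse from left to right and top to bottom with the given stride.
    return image[::step, ::step]
```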
S204: determine the color difference between the reserved area and the target area based on the difference between the pixel values of the first sampling points and the pixel values of the second sampling points.
In some implementations, the pixel mean or pixel median of the first sampling points can be determined from their pixel values, and the pixel mean or pixel median of the second sampling points from theirs. The color difference is then determined based on the difference between the two pixel means or the two pixel medians.
Representing the pixel values of the sampling points by their pixel mean or pixel median simplifies the computation.
In some implementations, the transparency of the first sampling points can be incorporated when computing their pixel mean, making the determined pixel mean more accurate, which helps determine the color difference between the reserved area and the target area accurately and thus improves the tone adjustment effect.
In this implementation, the pixels in the transparency map can first be sampled to obtain third sampling points.
Specifically, the step disclosed in S202 above can be used for sampling to obtain the third sampling points.
Please refer to FIG. 3, which is a schematic flowchart of a method for determining a color difference according to an embodiment of the present disclosure. The steps shown in FIG. 3 supplement S204. As shown in FIG. 3, the method for determining the color difference may include S302-S306. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
S302: determine a first pixel mean of the first sampling points based on the pixel values of the first sampling points and the transparency values of the third sampling points.
For distinction, the present disclosure refers to the pixel mean of the first sampling points determined based on the pixel values of the first sampling points and the transparency values of the third sampling points as the first pixel mean.
The embodiment of the present disclosure does not limit the specific formula for determining the first pixel mean; the following is only an example:
fg_mean=sum(FG1*Alpha1)/sum(Alpha1)……………………(1)
In formula (1) above, fg_mean indicates the first pixel mean, FG1 indicates the pixel values of the first sampling points, and Alpha1 indicates the transparency values of the third sampling points. Formula (1) incorporates the transparency of the sampling points to obtain an accurate first pixel mean.
S304: determine a second pixel mean of the second sampling points based on the pixel values of the second sampling points.
The present disclosure refers to the pixel mean of the second sampling points as the second pixel mean.
The second pixel mean bg_mean can be obtained by averaging BG1 with the standard mean formula, where BG1 denotes the pixel values of the second sampling points.
S306: determine the color difference between the reserved area and the target area according to the difference between the first pixel mean and the second pixel mean.
The pixel value of a pixel indicates its color information, so the color difference between pixels can be determined from the difference in their pixel values. The embodiment of the present disclosure does not limit the specific formula for determining the color difference; the following is only an example:
diff=bg_mean-fg_mean……………………(2)
In formula (2) above, diff indicates the color difference, bg_mean indicates the second pixel mean, and fg_mean indicates the first pixel mean.
Thus, according to steps S302-S304, on the one hand, the pixel mean or pixel median of the sampling points can represent their pixel values, which simplifies the computation; on the other hand, the transparency of the sampling points is incorporated when computing the pixel mean of the first sampling points, making the determined pixel mean more accurate, which helps determine the color difference between the reserved area and the target area accurately and thus improves the tone adjustment effect.
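Putting formulas (1) and (2) together, a minimal sketch might look as follows; the array shapes and the small epsilon guarding division by zero are illustrative assumptions:

```python
import numpy as np

def color_difference(fg_samples, alpha_samples, bg_samples):
    """Compute the color difference diff per formulas (1) and (2).

    fg_samples:    pixel values of the first sampling points (first image)
    alpha_samples: transparency values of the third sampling points
    bg_samples:    pixel values of the second sampling points (material image)
    Arrays are assumed float, with a trailing color-channel axis on the
    pixel-value arrays.
    """
    # Formula (1): transparency-weighted mean of the foreground samples.
    weighted = fg_samples * alpha_samples[..., None]
    fg_mean = (weighted.reshape(-1, weighted.shape[-1]).sum(axis=0)
               / (alpha_samples.sum() + 1e-8))
    # Second pixel mean: plain average of the background samples.
    bg_mean = bg_samples.reshape(-1, bg_samples.shape[-1]).mean(axis=0)
    # Formula (2): per-channel color difference.
    return bg_mean - fg_mean
```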
S106: perform tone adjustment on the pixels in the first image according to the color difference to obtain a second image matching the tone of the target area.
Tone refers to the overall color tendency of an image. Although an image contains many colors, it generally has one overall color tendency; for example, an image may be bluish or reddish, warm or cold, and so on. This color tendency is the tone of the image. That is, the pixel values (color values) of the pixels in an image indicate its tone, and tone adjustment is accomplished by adjusting the pixel values of the image.
The tone adjustment may adjust the color values of the pixels in the first image so that they are closer to the color values of the pixels in the target area.
Two images match in tone when the difference between the color values of their pixels is smaller than a preset color threshold (an empirical threshold); that is, the color values of the pixels in the two images are relatively close, giving the two images roughly the same tone.
In S106, the color difference can be fused into the pixel values of the pixels in the first image, so that the tone of the first image matches the tone of the target area, completing the tone adjustment.
The embodiment of the present disclosure does not limit the specific formula for the color fusion; the following is only an example:
new_FG=q*diff+FG……………..(3)
In formula (3) above, FG indicates the pixel values of the pixels in the first image, new_FG indicates the pixel values of the pixels in the second image, q is a preset adjustment coefficient set according to business requirements, and diff indicates the color difference between the target area and the reserved area. Through formula (3), tone adjustment can be performed on the pixels in the first image based on the color difference to obtain a second image matching the tone of the target area, which facilitates improving the region replacement effect.
In some implementations, the color difference between the second image and the reserved area before adjustment can also be fused in, and the pixels in the second image adjusted again, to prevent the pixel values in the second image from becoming too large or too small and thereby improve the tone adjustment effect.
Please refer to FIG. 4, which is a schematic flowchart of a tone adjustment method according to an embodiment of the present disclosure. The steps shown in FIG. 4 supplement S106. As shown in FIG. 4, the tone adjustment method may include S402-S404. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
S402: based on the color difference, perform a preliminary adjustment on the pixel values of the pixels in the first image to obtain a third image; the pixels of the third image have the color difference fused in.
In some implementations, the third image can be obtained using formula (3) above.
S404: based on the difference between the pixel mean of the pixels in the third image and the pixel mean of the pixels in the first image, adjust the pixel values of the pixels in the third image to obtain the second image matching the tone of the target area.
The difference between the pixel mean of the pixels in the third image and the pixel mean of the pixels in the first image indicates the color difference between the third image and the first image.
In S404, the difference between the pixel mean of the third image and the pixel mean of the first image can be determined first, and this difference then fused into the pixel values of the third image to obtain the second image.
The embodiment of the present disclosure does not limit the specific formula for the color fusion; the following is only an example:
new_FG'=new_FG+(mean(FG)-mean(new_FG))……………………(4)
In formula (4) above, new_FG' on the left of the equals sign is the pixel values of the pixels in the second image, new_FG on the right is the pixel values of the pixels in the third image obtained by formula (3), and mean() is the averaging function; mean(FG)-mean(new_FG) gives the color difference between the third image and the first image. Through S402-S404, formula (3) first performs a preliminary tone adjustment on the first image, and formula (4) then pulls the tone of the third image back to obtain the second image, so that the tone of the second image is closer to the tone of the target area without deviating excessively from the tone of the first image. This reduces the possibility of colors becoming too bright or too dark due to overly large or small pixel values in the second image, improving the tone adjustment effect and in turn the region replacement effect.
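A minimal sketch combining formulas (3) and (4); the value q=0.5 and the per-channel mean are illustrative assumptions, since the disclosure leaves the coefficient and the averaging axis to the implementation:

```python
import numpy as np

def tone_adjust(first_image, diff, q=0.5):
    """Apply the tone adjustment of formulas (3) and (4) to float image arrays."""
    # Formula (3): preliminary adjustment fusing in the color difference.
    new_fg = q * diff + first_image
    # Formula (4): pull the mean back toward the original first image so the
    # result is neither too bright nor too dark.
    return new_fg + (first_image.mean(axis=(0, 1)) - new_fg.mean(axis=(0, 1)))
```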
S108: based on the transparency map, perform image fusion on the second image and the material image to obtain the target image.
The image fusion may include, but is not limited to, concatenating, adding or multiplying the pixel values of the pixels in the two images.
In S108, a first result can be obtained by fusing the transparency map with the second image, and a second result obtained by fusing the material image with the inverse transparency map corresponding to the transparency map; the first result and the second result are then fused to obtain the target image.
The embodiment of the present disclosure does not limit the specific formula for the image fusion; the following is only an example:
new=new_FG'*Alpha+BG*(1-Alpha)……………………..(5)
In formula (5) above, new indicates the pixel values of the pixels in the target image, new_FG' indicates the pixel values of the pixels in the second image obtained in S106, BG indicates the pixel values of the pixels in the material image corresponding to the target area, and Alpha indicates the transparency values of the pixels in the transparency map, with 1-Alpha representing the inverse transparency map corresponding to the transparency map. The first result obtained by fusing Alpha with new_FG' is new_FG'*Alpha, the second result obtained by fusing BG with the inverse transparency map is BG*(1-Alpha), and fusing the first result with the second result yields new in formula (5).
The transparency values of the pixels belonging to the reserved area in the transparency map take the first value, and those of the pixels belonging to the non-reserved area take the second value. For example, in a scenario where the area is replaced and the original non-reserved area is not retained, the first value may be 1, indicating that the pixel is opaque, and the second value may be 0, indicating that the pixel is transparent. Through new_FG'*Alpha, the pixels of the second image belonging to the reserved area remain visible while those belonging to the non-reserved area become transparent. Through BG*(1-Alpha), the pixels of the material image belonging to the target area remain visible while those belonging to the non-target area become transparent. The image fusion can thus be realized by formula (5).
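A minimal sketch of formula (5); the assumption that the transparency map is a single-channel float array broadcast over the color channels is illustrative:

```python
import numpy as np

def fuse(second_image, material_image, alpha):
    """Image fusion per formula (5): new = new_FG' * Alpha + BG * (1 - Alpha)."""
    a = alpha[..., None]   # broadcast the transparency over the color channels
    return second_image * a + material_image * (1.0 - a)
```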
According to the scheme described in S102-S108, the color difference between the reserved area and the target area can be determined from the pixel values of the pixels in the reserved area obtained by matting the original image and the pixel values of the pixels in the target area; the pixels in the reserved area can then be tone-adjusted according to the color difference, unifying the tone of the pixels in the reserved area with that of the pixels in the target area. Hence, when the second image and the target area are fused for region replacement, the tone of the target area is guaranteed to match the reserved area in the original image, improving the region replacement effect.
In related technologies, a segmentation model is commonly used to semantically segment the reserved area to obtain a rough mask of the non-reserved area, and the original non-reserved area is then replaced according to the mask result and the material image. Since the mask output by the segmentation model is often rough in the boundary region between the reserved and non-reserved areas, directly using the mask for region replacement produces obvious artifacts in that boundary region. For example, in a sky replacement scenario, some local details between the sky and the horizon in the original image may be lost.
In some embodiments, trimap-based matting can be used to solve the foregoing problem. In S102, the original image can be matted using the trimap corresponding to the original image to obtain the first image and the transparency map. Because the trimap distinguishes the reserved area, the non-reserved area, and the undetermined area between them, the resulting transparency map preserves the detailed information at the junction between the reserved and non-reserved areas. Compared with directly using the mask result for region replacement, performing region replacement based on the transparency map obtained by trimap matting helps improve the transition between the reserved area and the target area.
Please refer to FIG. 5, which is a schematic flowchart of a matting method according to an embodiment of the present disclosure. The steps shown in FIG. 5 supplement the description of S102. As shown in FIG. 5, the matting method may include S502-S504. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
S502, obtaining a trimap corresponding to the original image, where for each pixel in the trimap, the value corresponding to the pixel indicates the probability that the pixel belongs to any one of the reserved area, the non-reserved area, and the to-be-determined area in the original image.
The trimap has the property of distinguishing the foreground, the background, and the transition area between the foreground and the background in an image. That is, regardless of whether the reserved area is the foreground or the background of the original image, the trimap is used to distinguish the reserved area, the non-reserved area, and the to-be-determined area between them, so as to preserve the detail at the junction of the reserved area and the non-reserved area. The trimap in the present disclosure can be denoted trimap.
In some implementations, editing software can be used to assist in obtaining the trimap of the original image. Taking the case where the reserved area is the foreground as an example, the non-reserved area (background), the reserved area (foreground), and the to-be-determined area can be annotated on the original image through image editing software to obtain the trimap.
In some implementations, the trimap may be obtained using a trimap extraction network built on a neural network. The trimap extraction network can be trained in advance on training samples annotated with trimap information.
In the above two implementations, the trimap is either annotated manually through software or predicted by a prediction network. Manual annotation is too complicated and inconvenient to use, while network prediction requires a large number of trimap annotations, which is cumbersome.
To simplify the trimap acquisition process, in some implementations neither manual trimap annotation by the user nor pre-training a prediction network for predicting the trimap is required; instead, the trimap can be obtained from the semantic segmentation result combined with probability conversion.
Please refer to FIG. 6, which is a schematic flowchart of a method for obtaining a trimap according to an embodiment of the present disclosure. The steps shown in FIG. 6 supplement the trimap acquisition in S502. As shown in FIG. 6, the method for obtaining the trimap may include S602-S604. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
S602, performing semantic segmentation processing on the original image to be processed to obtain a semantic probability map corresponding to the original image.
The image to be matted may be referred to as the original image. For example, if a non-sky area is to be extracted from a person image, the person image may be called the original image. The non-sky area is the target to be extracted by the matting process and may be called the reserved area; the reserved area may be the foreground or the background of the original image.
In this embodiment, semantic segmentation processing may be performed on the original image, for example through a semantic segmentation network, including but not limited to commonly used networks such as SegNet, U-Net, DeepLab, and FCN.
After the semantic segmentation processing, a semantic probability map of the original image can be obtained. The semantic probability map may include, for each pixel in the original image, a first probability that the pixel belongs to the reserved area. Taking the case where the reserved area is the foreground as an example, in the semantic probability map a certain pixel of the original image may belong to the foreground with probability 0.85, while another pixel may belong to the foreground with probability 0.24.
S604, performing probability conversion processing based on the semantic probability map to obtain the trimap corresponding to the original image.
In this step, probability conversion processing can be performed on the result of the semantic segmentation processing to obtain the trimap. The trimap obtained through probability conversion in this embodiment can be denoted soft-trimap.
The probability conversion processing may map, through a mathematical transformation, the probability corresponding to each pixel in the semantic probability map to the value corresponding to that pixel in the soft-trimap.
Specifically, taking the case where the reserved area is the foreground as an example, the probabilities in the semantic probability map can be converted in the following two parts:
1) Based on the semantic probability map, converting the first probability into a second probability.
The trimap soft-trimap may include three kinds of regions: the "reserved area (foreground)", the "non-reserved area (background)", and the "to-be-determined area". In this embodiment, the probability that a pixel belongs to the to-be-determined area of the trimap is called the second probability.
When converting the first probability that a pixel in the semantic probability map belongs to the reserved area into the second probability, the following probability conversion principle can be followed: the more strongly the first probability indicates that the pixel belongs to the reserved area (foreground) or to the non-reserved area (background), the lower the second probability that the pixel belongs to the to-be-determined area of the trimap. For example, the closer the first probability is to 1 or to 0, the closer the second probability is to 0; the closer the first probability is to 0.5, the closer the second probability is to 1. In other words, if a pixel of the image is very likely to belong to the reserved area (foreground) or very likely to belong to the non-reserved area (background), the probability that it belongs to the to-be-determined area is low; when the probability that the pixel belongs to the reserved area (foreground) or the non-reserved area (background) is near 0.5, the probability that the pixel belongs to the to-be-determined area is high.
Based on the above probability conversion principle, the first probability can be converted into the second probability. The embodiments of the present disclosure do not limit the specific formula of the probability conversion; the following is only one example:
un = -k4*score^4 + k3*score^3 - k2*score^2 + k1*score .......(6)
In formula (6), un denotes the second probability that the pixel belongs to the to-be-determined area, and score denotes the first probability that the pixel belongs to the reserved area in the semantic probability map. Formula (6) is a polynomial fit: through polynomial fitting, the first probability of the pixel is mapped to the second probability. This embodiment does not limit the specific values of the coefficients k1/k2/k3/k4.
It can be understood that practical implementations are not limited to the above polynomial fit; other functional forms can also be used, as long as the above probability conversion principle is followed. Using a polynomial fit to convert the first probability into the second probability makes the conversion computationally efficient while reflecting the above conversion principle fairly accurately.
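As an illustrative sketch, formula (6) can be evaluated directly on the probability map. The coefficient values below are assumptions (the source does not fix them), chosen so that the polynomial satisfies the stated principle: un equals 0 at score 0 and 1, and equals 1 at score 0.5:

```python
import numpy as np

# Illustrative coefficients: with k1=8, k2=24, k3=32, k4=16 the polynomial
# factors as 8*s*(1-s)*(2*s^2-2*s+1), so un(0) = un(1) = 0, un(0.5) = 1,
# and un stays within [0, 1] for s in [0, 1].
K1, K2, K3, K4 = 8.0, 24.0, 32.0, 16.0

def second_probability(score: np.ndarray) -> np.ndarray:
    """Formula (6): map the first probability `score` (pixel belongs to the
    reserved area) to the second probability `un` (pixel belongs to the
    to-be-determined area)."""
    return -K4 * score**4 + K3 * score**3 - K2 * score**2 + K1 * score
```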
2) Generating the trimap according to the first probability and the second probability corresponding to each pixel in the semantic probability map.
As above, the semantic probability map obtained by performing semantic segmentation on the original image can roughly distinguish the reserved area (foreground) from the non-reserved area (background) of the original image. For example, if the first probability that a pixel belongs to the foreground is 0.96, the pixel is very likely to belong to the foreground; if the first probability that a pixel belongs to the foreground is 0.14, the pixel is very likely to belong to the background.
After the second probability is obtained based on the semantic probability map, the second probability that each pixel belongs to the to-be-determined area is available. For each pixel of the original image, the first probability corresponding to the pixel in the semantic probability map and the second probability that the pixel belongs to the to-be-determined area can be fused to obtain the value corresponding to the pixel in the soft-trimap. This value can represent the probability that the pixel belongs to any one of the reserved area (foreground), the non-reserved area (background), and the to-be-determined area in the original image.
For example, in the soft-trimap, the closer the value corresponding to a pixel is to 1, the more likely the pixel belongs to the reserved area (foreground) of the original image; the closer the value is to 0, the more likely the pixel belongs to the non-reserved area (background); and the closer the value is to 0.5, the more likely the pixel belongs to the to-be-determined area. That is, the value corresponding to a pixel in the soft-trimap can express the probability that the pixel belongs to any one of the reserved area, the non-reserved area, and the to-be-determined area.
Formula (7) below exemplifies one way to obtain the trimap from the first probability and the second probability:
soft_trimap = -k5*un/k6*sign(score-k7) + (sign(score-k7)+k8)/k9 .......(7)
In formula (7), soft_trimap denotes the value corresponding to a pixel in the soft-trimap, un denotes the second probability, score denotes the first probability, and sign() denotes the sign function. Likewise, this embodiment does not limit the specific values of the coefficients k5 to k9.
As described in the above example, after converting the first probability corresponding to each pixel into the second probability and generating the trimap by combining the first probability and the second probability corresponding to each pixel, the probability conversion processing based on the semantic probability map yields the trimap soft_trimap.
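Continuing the sketch, formula (7) can be implemented as below. The coefficient values are again assumptions, chosen so that the output is 1 for a confident reserved-area pixel, 0 for a confident non-reserved pixel, and 0.5 where the second probability is high, matching the interpretation given above:

```python
import numpy as np

# Illustrative coefficients: k5=1, k6=2, k7=0.5, k8=1, k9=2 give
# soft_trimap = 1 for confident foreground (score > 0.5, un ~ 0),
# 0 for confident background (score < 0.5, un ~ 0), and 0.5 where un ~ 1.
K5, K6, K7, K8, K9 = 1.0, 2.0, 0.5, 1.0, 2.0

def soft_trimap(score: np.ndarray, un: np.ndarray) -> np.ndarray:
    """Formula (7): fuse the first probability `score` and the second
    probability `un` into a per-pixel soft-trimap value."""
    s = np.sign(score - K7)
    return -K5 * un / K6 * s + (s + K8) / K9
```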
In some embodiments, before the above probability conversion processing based on the semantic probability map, the semantic probability map may first be pooled, and the above probability conversion processing is then performed on the pooled semantic probability map. See formula (8) below:
score_ = avgpool2d(score, ks, stride) .......(8)
As shown in formula (8), in one example, average pooling can be applied to the semantic probability map, with the pooling determined by the stride and the kernel size (kernel_size, ks). score_ denotes the pooled semantic probability map, which contains the pooled probabilities.
If the semantic probability map has been pooled, score in formulas (6) and (7) above is replaced with the pooled probability; that is, the pooled semantic probability map is used to perform the probability conversion.
The kernel size used in the pooling can be adjusted, and pooling the semantic probability map before the probability conversion makes it possible to adjust the width of the to-be-determined area in the soft_trimap to be generated by adjusting the kernel size. For example, the larger kernel_size is, the wider the to-be-determined area can be.
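A minimal sketch of formula (8) using PyTorch's avg_pool2d; the kernel size, the choice stride=1, and the padding that keeps the map at its original resolution are all assumptions, since the source fixes none of these values:

```python
import torch
import torch.nn.functional as F

def pool_score(score: torch.Tensor, ks: int = 7, stride: int = 1) -> torch.Tensor:
    """Formula (8): average-pool the semantic probability map before the
    probability conversion. A larger kernel blurs the foreground/background
    boundary, which widens the to-be-determined band in the soft-trimap.

    score: H x W probability map. stride=1 with padding ks//2 keeps the
    original resolution (an assumption for this sketch).
    """
    x = score[None, None]  # to N x C x H x W as expected by avg_pool2d
    x = F.avg_pool2d(x, kernel_size=ks, stride=stride, padding=ks // 2)
    return x[0, 0]
```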
In some embodiments, if the semantic segmentation processing of the original image is performed by a semantic segmentation network, the image size of the original image can be preprocessed before the semantic segmentation processing. The preprocessing can, based on the downsampling factor of the semantic segmentation network for the original image, round the image size of the original image to an integer multiple of that factor, so that the processed image size is divisible by the downsampling factor scale_factor. Here scale_factor is the downsampling factor of the semantic segmentation network for the original image, and its specific value is determined by the network structure of the semantic segmentation network.
By obtaining the trimap through the steps described in S602-S604, the semantic probability map obtained by semantically segmenting the original image can be converted into the trimap through probability conversion. This makes obtaining the trimap faster and more convenient: manual annotation is no longer needed, and training a prediction network with trimap annotations is no longer needed either, so the matting process is simpler to implement. Moreover, because this way of obtaining the trimap through probability conversion relies on the semantic probability map of the semantic segmentation, the generated trimap is relatively accurate.
S504, performing matting processing according to the trimap and the original image to obtain the matting result.
In this step, the matting process can include: taking the trimap and the original image as inputs of a matting network, and obtaining the reserved-area residual and the initial transparency map of the original image output by the matting network.
The reserved-area residual can be the residual result obtained by a residual processing unit in the matting network. The reserved-area residual can indicate the difference between the pixel values of the pixels in the reserved area extracted by the residual processing unit and the pixel values of the corresponding pixels in the original image.
The value of a pixel in the initial transparency map indicates the transparency of the pixel.
Then, the first image can be obtained based on the original image and the reserved-area residual (for example, a foreground image is obtained by adding the foreground residual to the original image, or a background image is obtained by adding the background residual to the original image), and the values of the pixels in the initial transparency map can be adjusted according to the trimap soft_trimap to obtain the transparency map corresponding to the original image.
Through the foregoing adjustment: first, in the initial transparency map, the transparency values of the pixels inside the reserved area of the trimap can be adjusted to the first value; second, the transparency values of the pixels inside the non-reserved area of the trimap can be adjusted to the second value; third, based on the magnitudes of the transparency values of the pixels of the initial transparency map inside the to-be-determined area of the trimap, the pixels in the to-be-determined area can be distinguished as belonging to the reserved area or to the non-reserved area and assigned values accordingly.
For example, in a scenario where the region is replaced and the original non-reserved area is not retained, based on the foregoing adjustment, the transparency values of the pixels of the initial transparency map inside the reserved area of the trimap can be adjusted to 1, the transparency values of the pixels inside the non-reserved area can be adjusted to 0, and among the pixels of the initial transparency map inside the to-be-determined area, transparency values greater than 0.5 can be adjusted to 1 and transparency values less than 0.5 adjusted to 0.
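A sketch of this adjustment, assuming the three trimap regions are identified by their soft-trimap values (near 1, near 0, and near 0.5 respectively); the band limits lo and hi are illustrative thresholds not given in the source:

```python
import numpy as np

def adjust_alpha(raw_alpha: np.ndarray, trimap: np.ndarray,
                 lo: float = 0.4, hi: float = 0.6) -> np.ndarray:
    """Adjust the initial transparency map using the soft-trimap regions.

    Pixels in the trimap's reserved region are forced to 1, pixels in the
    non-reserved region to 0; inside the to-be-determined band the
    network's raw_alpha is thresholded at 0.5, as described above.
    """
    alpha = np.where(raw_alpha > 0.5, 1.0, 0.0)  # to-be-determined: threshold
    alpha = np.where(trimap >= hi, 1.0, alpha)   # reserved region -> opaque
    alpha = np.where(trimap <= lo, 0.0, alpha)   # non-reserved -> transparent
    return alpha
```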
According to the matting scheme described in S502-S504, the trimap can be used for matting, so the detail at the junction of the reserved area and the non-reserved area is well preserved; when performing region replacement, this helps improve the transition between the reserved area and the target area.
In the related art, due to the limited computing or processing capability of mobile phones, region replacement software on mobile phones mainly uploads the data to a server for processing and then transmits the region replacement result back to the phone to be read locally. The security and privacy of the data in this scheme are difficult to guarantee.
To solve this problem, considering the processing capability of the mobile terminal, in some embodiments the network deployed to the mobile terminal can be given a miniaturized design, and the original image can be scaled, so that the running time and memory consumption stay within the processing capability of the mobile terminal. Region replacement then no longer needs to go through a server, ensuring the security and privacy of the data. An example of matting on the mobile terminal is described below.
When performing matting with the method described in S502-S504, this embodiment can use a semantic segmentation network and a matting network. The semantic segmentation network can be a network such as SegNet or U-Net, and the matting network can include an encoder and a decoder. The encoder of the matting network can adopt the structural design of mobv2, and before the matting network is deployed to the mobile terminal, channel compression can be applied to it. The channel compression can compress the number of channels of the network's intermediate features (that is, the features of the network's middle layers). For example, the number of output channels of the convolution kernels used in the matting network can be reduced: if the number of output channels of a convolution kernel is originally a, it can be compressed to 0.35 times that number, so that after compression the number of output channels is 0.35*a.
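As a sketch of the channel compression idea, a convolution block can be built with a width multiplier of 0.35 applied to its output channels. The block layout below (conv + batch norm + ReLU6) and the minimum channel count are assumptions for illustration; only the 0.35x compression factor comes from the text:

```python
import torch.nn as nn

WIDTH_MULT = 0.35  # compression factor from the example above

def conv_bn_relu(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """A conv block whose output channel count is compressed by WIDTH_MULT.

    Only a sketch of the idea: every intermediate feature keeps 0.35x of its
    original channel count; the exact encoder layout is not specified here.
    """
    out_ch = max(8, int(out_ch * WIDTH_MULT))  # compressed channel count
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU6(inplace=True),
    )
```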
The following takes replacing the sky scene in a person image as an example.
Please refer to FIG. 7a, which is a schematic diagram of a person image according to an embodiment of the present disclosure.
The sky area in the person image shown in FIG. 7a serves as the background area and is also the non-reserved area; it needs to be replaced with another sky area (that is, the target area of the present disclosure) in a pre-acquired material image. The non-sky area in FIG. 7a is the reserved area output by the matting network in this example, that is, the foreground area.
Please refer to FIG. 8, which is a schematic diagram of a region replacement process according to an embodiment of the present disclosure. FIG. 9 is a schematic flowchart of the region replacement method based on FIG. 8. As shown in FIG. 9, the region replacement method may include S901-S909. Unless otherwise specified, the present disclosure does not specifically limit the execution order of these steps.
S901, performing scaling processing on the original image.
The original image of this embodiment can be the person image shown in FIG. 7a. The person image can be captured by the user through the camera of a mobile terminal, or can be an image stored on the mobile terminal or received from another device.
The purpose of the matting processing in this embodiment can be to extract the non-sky area of the person image. The non-sky area of the original image can be taken as the foreground.
Since this embodiment performs the matting processing on the mobile terminal, the original image can be scaled to lighten the processing burden on the mobile terminal and save computation. Assuming the size of the original image in FIG. 7a is 1080*1920, the image can be scaled to a size of 480*288, for example by bilinear interpolation. The scaling can be performed with reference to formulas (9) and (10) below:
scale = max(h/basesize, w/basesize) .......(9)
new_h = int(h/scale + k10); new_w = int(w/scale + k11) .......(10)
Here h and w are the length and width of the original image, basesize is the base size (480 in this example), and int(x) denotes taking the integer part of x. new_h and new_w are the dimensions of the original image after scaling. This embodiment does not limit the specific values of the coefficients in formula (10).
In addition, according to formulas (11) and (12), the image size of the original image can further be processed to an integer multiple of the downsampling factor, so as to ensure that the scaled image size is divisible by the downsampling factor scale_factor of the semantic segmentation network. It can be understood that other formulas can also be used for this integer-multiple processing; it is not limited to the two formulas below.
new_h = int(int(int(new_h - k12 + scale_factor - k13)/scale_factor)*scale_factor) .......(11)
new_w = int(int(int(new_w - k14 + scale_factor - k15)/scale_factor)*scale_factor) .......(12)
This embodiment does not limit the specific values of the coefficients in formulas (11) and (12) above; for example, the values of k12 to k15 can all be set to 1. If the original image before scaling is denoted A, then the original image obtained by scaling it to a 480*288 image and normalizing it can be denoted B. As shown in FIG. 8, the original image B is the original image after the scaling processing.
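A sketch of the size computation of formulas (9) to (12), taking k10 to k15 as 1 (the text suggests 1 for k12 to k15; 1 is assumed here for k10/k11 as well) and assuming a downsampling factor of 32 for illustration. With these choices, a 1920*1080 input maps to 480*288, matching the example above:

```python
def scaled_size(h: int, w: int, basesize: int = 480,
                scale_factor: int = 32) -> tuple[int, int]:
    """Compute the scaled, downsampling-aligned image size.

    scale_factor = 32 is an assumption; the actual value depends on the
    structure of the semantic segmentation network.
    """
    scale = max(h / basesize, w / basesize)  # formula (9)
    new_h = int(h / scale + 1)               # formula (10), k10 = 1
    new_w = int(w / scale + 1)               # formula (10), k11 = 1
    # Formulas (11)-(12) with k12..k15 = 1: round down to a multiple of
    # scale_factor so the network's downsampling divides the size evenly.
    new_h = (new_h - 1 + scale_factor - 1) // scale_factor * scale_factor
    new_w = (new_w - 1 + scale_factor - 1) // scale_factor * scale_factor
    return new_h, new_w

# scaled_size(1920, 1080) -> (480, 288) under these assumptions.
```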
S902, performing semantic segmentation processing on the scaled original image through the semantic segmentation network to obtain the semantic probability map output by the semantic segmentation network.
As shown in FIG. 8, semantic segmentation processing can be performed on the original image B through the semantic segmentation network 81 to obtain the semantic probability map 82 output by the network. The semantic probability map can be denoted score, and FIG. 7b illustrates such a semantic probability map. It can be seen that the score of the semantic probability map indicates the probability that a pixel belongs to the non-sky area (foreground), roughly distinguishing the foreground from the background of the image, that is, roughly distinguishing the sky area from the non-sky area.
S903, performing probability conversion processing based on the semantic probability map to obtain the trimap.
In this step, the trimap soft-trimap can be generated according to the probability conversion processing described in S604 above. For example, the semantic probability map can first be pooled according to formula (8), and the pooled semantic probability map can then be probability-converted according to formulas (6) and (7) to generate the trimap. See the trimap 83 in FIG. 8.
Please refer to FIG. 7c, which illustrates a soft-trimap. It can be seen that the probability value of a pixel in the soft-trimap can represent the probability that the pixel belongs to the three kinds of regions; according to this probability value, the "sky area", the "non-sky area", and the "to-be-determined area between the sky area and the non-sky area" of the image can be distinguished.
S904, taking the trimap and the original image as inputs of the matting network to obtain the foreground residual and the initial transparency map output by the matting network.
As shown in FIG. 8, both the trimap 83 and the original image B can be used as inputs of the matting network 84. The matting network can output a 4-channel result, where one channel is the initial transparency map raw_alpha and the other three channels are the foreground residual fg_res. The first result 85 output by the matting network in FIG. 8 can include "raw_alpha + fg_res".
S905, obtaining the foreground image containing the foreground area based on the original image and the foreground residual, and obtaining the transparency map according to the initial transparency map and the trimap.
The foreground is the non-sky area of the person image.
Continuing with FIG. 8, the foreground residual fg_res can be upscaled by bilinear interpolation so as to restore it to the scale of the original image before the scaling processing, and formula (13) is then applied:
FG = clip(A + fg_res, s1, s2) .......(13)
As shown in FIG. 8, the matting result 86, that is, the foreground image FG of the original image, can be obtained from the upscaled foreground residual fg_res and the original image A. Here clip(x, s1, s2) limits the value of x to [s1, s2]. This embodiment does not limit the specific values of s1 and s2 in formula (13); for example, s1 can be 0 and s2 can be 1.
In addition, the transparency map corresponding to the non-sky area can be computed according to formulas (14) and (15) below:
fs = clip((soft_trimap - s3)/s4, s5, s6) .......(14)
Alpha = clip(fs + un*raw_alpha, s7, s8) .......(15)
Here Alpha denotes the transparency corresponding to the non-sky area. After Alpha is obtained, it can be upscaled by bilinear interpolation back to the original size of the original image before scaling. Likewise, this embodiment does not limit the specific values of the coefficients s3 to s8 in formulas (14) and (15).
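A sketch of formulas (13) to (15). s1=0 and s2=1 follow the example in the text; the remaining coefficients (s3=0.4, s4=0.2, s5=0, s6=1, s7=0, s8=1) are assumptions chosen so that fs saturates to 1 inside the reserved region and to 0 inside the non-reserved region:

```python
import numpy as np

def reconstruct(A: np.ndarray, fg_res: np.ndarray, soft_trimap: np.ndarray,
                un: np.ndarray, raw_alpha: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Recover the foreground image and transparency map.

    All inputs are assumed already upscaled to the resolution of the
    original image A (the bilinear upscaling itself is omitted here).
    """
    FG = np.clip(A + fg_res, 0.0, 1.0)                 # formula (13)
    fs = np.clip((soft_trimap - 0.4) / 0.2, 0.0, 1.0)  # formula (14)
    Alpha = np.clip(fs + un * raw_alpha, 0.0, 1.0)     # formula (15)
    return FG, Alpha
```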
FIG. 7d illustrates the transparency map Alpha, in which the non-sky area and the sky area can be clearly distinguished. Exemplarily, the first value of the pixels in the non-sky area is 1, indicating opacity, and the second value of the pixels in the sky area is 0, indicating full transparency. FIG. 7e illustrates the extracted non-sky area, that is, the foreground image FG.
S906, sampling the foreground image FG, Alpha, and the material image containing the target area BG according to a preset step to obtain the foreground sampling points FG1, the third sampling points Alpha1, and the second sampling points BG1.
With reference to FIG. 8, in this step the value obtained by integer-dividing the short side of the person image by 20 can be used as the step, and the matting result 86 (the foreground image FG and Alpha) and the material image 87 are sampled accordingly.
S907, determining the color difference between the target area BG and the foreground area FG.
In this step, the color difference can be obtained based on the method shown in FIG. 3.
The first pixel mean fg_mean of the foreground sampling points can be obtained from the foreground sampling points FG1 and the third sampling points Alpha1 according to the aforementioned formula (1).
Based on the second sampling points BG1, the second pixel mean bg_mean of the second sampling points is determined.
The color difference diff can then be obtained according to the aforementioned formula (2).
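Formulas (1) and (2) are defined earlier in this document and are not reproduced here; as a hedged sketch, this step can be read as an Alpha-weighted mean of the foreground samples, a plain mean of the material-image samples, and their per-channel difference (this reading is an assumption, not the source's exact definition):

```python
import numpy as np

def color_difference(FG1: np.ndarray, Alpha1: np.ndarray, BG1: np.ndarray) -> np.ndarray:
    """Per-channel color difference between target and foreground samples.

    FG1: N x 3 sampled foreground pixels; Alpha1: N sampled transparencies;
    BG1: M x 3 sampled material-image pixels.
    """
    w = Alpha1[:, None]
    fg_mean = (FG1 * w).sum(axis=0) / (w.sum() + 1e-8)  # assumed formula (1)
    bg_mean = BG1.mean(axis=0)
    return bg_mean - fg_mean                            # assumed formula (2): diff
```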
S908, performing tone adjustment on the foreground image FG based on the color difference to obtain an adjusted foreground image (the second image) whose tone matches that of the target area.
In this step, the tone adjustment can be performed according to the tone adjustment method shown in FIG. 4.
With reference to FIG. 8, the non-sky area (the foreground image FG in the matting result) can first be adjusted according to a preset adjustment coefficient q and the aforementioned formula (3) to obtain a preliminarily adjusted foreground image (the third image). Tone correction is then applied to the preliminarily adjusted foreground image (the third image) based on the aforementioned formula (4) to obtain the final adjustment result 88, that is, the finally adjusted foreground image new_FG' (the second image).
S909, performing image fusion on the adjusted foreground image and the material image based on the transparency map to obtain the target image.
In this step, image fusion can be performed according to the aforementioned formula (5) to obtain the target image new after the sky area is replaced.
FIG. 7f shows the target image with the sky area replaced, obtained through S901-S909. In this method, in one aspect, it can be ensured during region replacement that the tone of the target area matches that of the non-sky area of the original image, thereby improving the region replacement effect.
In another aspect, the trimap can be used for matting, so the detail at the junction of the sky area and the non-sky area is well preserved; when performing region replacement, this helps improve the transition between the sky area and the non-sky area.
In yet another aspect, by applying processing such as channel compression to the matting network and scaling the original image, the matting becomes better suited to the mobile terminal. For example, after capturing an image with his or her mobile terminal, the user can complete the matting of the non-sky area directly on the mobile terminal, fuse it with the target area, and complete the replacement of the sky area. All of this processing can thus be performed locally on the mobile terminal without uploading to the cloud, improving the security and privacy protection of the data. Moreover, as can be seen from FIG. 8, the region replacement method takes a single original image as input and directly obtains the matting result; that is, given one original image, the region replacement method provided by the embodiments of the present disclosure can obtain a prediction of the foreground of that original image. Less input information is required, which makes the image processing more convenient.
In addition, a semantic segmentation network and a matting network are used in the matting flow of the embodiments of the present disclosure; this embodiment does not limit the training methods of these two networks. FIG. 10 is a schematic flowchart of a network training method according to an embodiment of the present disclosure. The method can be used to jointly train the semantic segmentation network and the matting network. As shown in FIG. 10, the method can include the following processing:
S1002, acquiring a training sample set, the training sample set including a plurality of sample data.
In some implementations, each sample data in the training sample set can include a sample image, a first feature label corresponding to the sample image, and a second feature label corresponding to the sample image. Taking a matting scenario as an example, the first feature label can be a segmentation label for the sample image, and the second feature label can be a matting label for the sample image.
S1004, for each sample data in the training sample set, processing the sample data to obtain a global image containing the global image information of the sample image and the segmentation label corresponding to the global image, as well as a local image containing local image information of the sample image and the matting label corresponding to the local image.
In some implementations, first processing can be applied to the sample image of the sample data to obtain a global image that includes most of the image information of the sample image; this global image can be considered to include the global image information of the sample image. The same first processing is applied to the first feature label corresponding to the sample image to obtain the segmentation label corresponding to the global image. For example, the sample image can be scaled according to the input-size requirements of the semantic segmentation network while still retaining most of its image information to obtain the global image, and the first feature label can undergo the same scaling to obtain the segmentation label.
At the same time, second processing can be applied to the sample image of the sample data to obtain a local image including local image information of the sample image, and the same second processing is applied to the second feature label corresponding to the sample image to obtain the matting label corresponding to the local image. For example, the sample image can be locally cropped to obtain a local image including local image information of the sample image, and the second feature label can undergo the same local cropping to obtain the matting label.
S1006, performing semantic segmentation processing on the global image through the semantic segmentation network to obtain the semantic probability map output by the semantic segmentation network.
S1008, performing probability conversion processing based on the semantic probability map to obtain the trimap.
The probability conversion processing of this step can refer to the foregoing embodiments and is not detailed again. The soft-trimap of the embodiments of the present disclosure can be obtained through the probability conversion processing.
S1010, performing matting processing based on the trimap and the local image through the matting network to obtain the matting result. The matting result can indicate the matting result for the reserved area in the sample image.
S1012, adjusting the network parameters of the semantic segmentation network according to the difference between the semantic probability map and the segmentation label, and adjusting the network parameters of the matting network based on the difference between the matting result and the matting label.
From the above, in the embodiments of the present disclosure, each sample data is processed, and the first sub-network is trained with the obtained global image, which includes the global image information, together with the first label, while the second sub-network is trained with the local image, which includes the local image information, together with the second label. This improves the joint training effect and reduces the risk of network performance degradation.
In addition, in the above training method, the soft-trimap is generated through probability conversion processing, which to a certain extent helps the networks train to a better effect.
Specifically, the soft-trimap can be adjusted adaptively during network training. For example, in the process of adjusting the network parameters of the semantic segmentation network according to the difference between the semantic probability map and the segmentation label, and adjusting the network parameters of the matting network based on the difference between the matting result and the matting label, the network parameters of the semantic segmentation network are updated, and accordingly the semantic probability map output by the semantic segmentation network is also updated.
Further, the soft-trimap is generated based on the semantic probability map, so an update of the semantic probability map brings an update of the trimap soft-trimap, and the matting result is updated in turn. That is, network training usually iterates many times, and after each iteration, if the parameters of the semantic segmentation network have been updated, then even for the same input image the semantic probability map, the soft-trimap, and the matting result all update adaptively, and the network parameters continue to be adjusted according to the updated results. This adaptive adjustment of the soft-trimap helps both the generated soft-trimap and the matting result to be dynamically optimized along with the adjustment of the semantic segmentation network, so that the finally trained model performs better and can more accurately extract the reserved area of the target image.
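A sketch of one such training iteration. The loss functions (binary cross-entropy for the segmentation probability, L1 for the matte), the network interfaces, and the bilinear alignment of the trimap to the local crop are all assumptions; what the sketch illustrates is that the soft-trimap is recomputed from the current segmentation output at every step, so it adapts as the segmentation parameters update:

```python
import torch
import torch.nn.functional as F

def train_step(seg_net, mat_net, global_img, seg_label, local_img, mat_label, opt):
    """One joint training iteration over the two networks (a sketch).

    seg_net is assumed to output an N x 1 x H x W probability map.
    """
    score = seg_net(global_img)                                 # semantic probability map
    # Formulas (6) and (7) with the example coefficients used above.
    un = -16 * score**4 + 32 * score**3 - 24 * score**2 + 8 * score
    s = torch.sign(score - 0.5)
    trimap = -un / 2 * s + (s + 1) / 2                          # soft-trimap, recomputed each step
    # Align the trimap with the local crop (true crop bookkeeping omitted).
    trimap = F.interpolate(trimap, size=local_img.shape[-2:], mode="bilinear")
    pred = mat_net(torch.cat([local_img, trimap], dim=1))       # matting result

    loss = F.binary_cross_entropy(score, seg_label) + F.l1_loss(pred, mat_label)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```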
FIG. 11 illustrates an image processing apparatus, which can be applied to implement the image processing method of any embodiment of the present disclosure. As shown in FIG. 11, the apparatus can include: a matting module 1110, a determination module 1120, an adjustment module 1130, and a fusion module 1140.
As shown in FIG. 11, the apparatus 1100 includes:
a matting module 1110, configured to perform matting processing on the original image to be processed to obtain a matting result, the matting result including a first image and a transparency map corresponding to the original image, where the first image includes the reserved area of the original image, the reserved area is the foreground or the background of the original image, and the value of a pixel in the transparency map indicates the transparency of the pixel;
a determination module 1120, configured to determine the color difference between the reserved area and the target area according to the difference between the pixels in the first image and the pixels in the material image containing the target area, the target area being used to replace the non-reserved area of the original image;
an adjustment module 1130, configured to perform tone adjustment on the pixels in the first image according to the color difference to obtain a second image whose tone matches that of the target area;
a fusion module 1140, configured to perform image fusion on the second image and the material image based on the transparency map to obtain the target image.
In some embodiments, the determination module 1120 is specifically configured to:
sample the pixels in the first image and the pixels in the material image respectively to obtain first sampling points and second sampling points;
determine the color difference between the reserved area and the target area based on the difference between the pixel values of the first sampling points and the pixel values of the second sampling points.
In some embodiments, the apparatus 1100 further includes:
a sampling module, configured to sample the pixels in the transparency map to obtain third sampling points;
the determination module 1120 is specifically configured to:
determine the first pixel mean of the first sampling points based on the pixel values of the first sampling points and the transparency values of the third sampling points;
determine the second pixel mean of the second sampling points based on the pixel values of the second sampling points;
determine the color difference between the reserved area and the target area according to the difference between the first pixel mean and the second pixel mean.
In some embodiments, the adjustment module 1130 is specifically configured to:
preliminarily adjust the pixel values of the pixels in the first image based on the color difference to obtain a third image, the pixels of the third image incorporating the color difference;
adjust the pixel values of the pixels in the third image based on the difference between the pixel mean of the pixels in the third image and the pixel mean of the pixels in the first image, to obtain the second image whose tone matches that of the target area.
In some embodiments, the fusion module 1140 is specifically configured to:
perform fusion based on the transparency map and the second image to obtain a first result;
perform fusion based on the material image and the inverse transparency map corresponding to the transparency map to obtain a second result;
perform fusion based on the first result and the second result to obtain the target image.
In some embodiments, the matting module 1110 is specifically configured to:
obtain the trimap corresponding to the original image, where for each pixel in the trimap, the value corresponding to the pixel indicates the probability that the pixel belongs to any one of the reserved area, the non-reserved area, and the to-be-determined area of the original image;
perform matting processing according to the trimap and the original image to obtain the matting result.
In some embodiments, the matting module 1110 is specifically configured to:
perform semantic segmentation processing on the original image to obtain the semantic probability map of the original image, where the value of a pixel in the semantic probability map indicates the first probability that the pixel belongs to the reserved area of the original image;
perform probability conversion processing based on the semantic probability map to obtain the trimap corresponding to the original image.
In some embodiments, the matting module 1110 is specifically configured to:
perform semantic segmentation processing on the original image through a semantic segmentation network to obtain the semantic probability map output by the semantic segmentation network;
performing matting processing according to the trimap and the original image includes: performing matting processing according to the trimap and the original image through a matting network.
In some embodiments, the matting module 1110 is specifically configured to:
for each pixel in the semantic probability map, perform probability conversion based on the first probability of the pixel to obtain the second probability that the pixel belongs to the to-be-determined area of the trimap;
generate the trimap according to the first probability and the second probability of each pixel in the semantic probability map.
In some embodiments, the more strongly the first probability of a pixel in the semantic probability map indicates that the pixel belongs to the foreground or the background, the lower the second probability, obtained through the probability conversion, that the pixel belongs to the to-be-determined area of the trimap;
generating the trimap according to the first probability and the second probability of each pixel in the semantic probability map includes: for each pixel of the original image, performing probability fusion on the first probability and the second probability corresponding to the pixel to determine the value corresponding to the pixel in the trimap.
In some embodiments, the matting module 1110 is specifically configured to:
perform matting processing according to the trimap and the original image to obtain the reserved-area residual and the initial transparency map of the original image, where the value of a pixel in the initial transparency map indicates the transparency of the pixel;
obtain the first image based on the original image and the reserved-area residual;
adjust the values of the pixels in the initial transparency map according to the trimap to obtain the transparency map corresponding to the original image.
In some embodiments, the apparatus 1100 further includes:
a scaling module, configured to perform scaling processing on the original image;
the matting module 1110 is specifically configured to:
upscale the reserved-area residual to the scale of the original image before the scaling processing;
obtain the first image according to the upscaled reserved-area residual and the original image.
In some embodiments, the non-reserved area includes the sky area of the original image, and the target area includes the sky area of the material image.
The embodiments of the image processing apparatus shown in the present disclosure can be applied to an electronic device. Accordingly, the present disclosure discloses an electronic device, which can include: a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to invoke the executable instructions stored in the memory to implement the image processing method shown in any of the foregoing embodiments.
Please refer to FIG. 12, which is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
As shown in FIG. 12, the electronic device may include a processor for executing instructions, a network interface for network connection, a memory for storing operating data for the processor, and a non-volatile storage for storing instructions corresponding to the image processing apparatus.
The embodiments of the apparatus may be implemented by software, by hardware, or by a combination of software and hardware. Taking software implementation as an example, the apparatus, as a logical entity, is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile storage into the memory and executing them. At the hardware level, in addition to the processor, memory, network interface, and non-volatile storage shown in FIG. 12, the electronic device in which the apparatus of an embodiment is located may further include other hardware according to the actual functions of the electronic device, which will not be detailed here.
It can be understood that, in order to increase the processing speed, the instructions corresponding to the image processing apparatus may also be stored directly in the memory, which is not limited herein.
The present disclosure provides a computer-readable storage medium storing a computer program, where the computer program can be used to cause a processor to execute the image processing method shown in any one of the foregoing embodiments.
Those skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
"And/or" in the present disclosure means at least one of the two; for example, "A and/or B" may include three options: A, B, and "A and B".
The embodiments in the present disclosure are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the data processing device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
Specific embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in the present disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware that may include the structures disclosed in the present disclosure and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in the present disclosure can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver apparatus for execution by the data processing apparatus. A computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in the present disclosure can be performed by one or more programmable computers executing one or more computer programs, performing the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can also be implemented as special-purpose logic circuitry.
Computers suitable for the execution of a computer program may include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data from them, transmit data to them, or both. However, a computer does not necessarily have such devices. Furthermore, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data may include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
While the present disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as primarily describing the features of specific embodiments of a particular disclosure. Certain features that are described in multiple embodiments within the present disclosure can also be implemented in combination in a single embodiment. Conversely, various features that are described in a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may function in certain combinations as described and may even be initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, while operations are depicted in the figures in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in a sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the described embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above are only some examples of one or more embodiments of the present disclosure and are not intended to limit the one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the one or more embodiments of the present disclosure shall be included in the protection scope of the one or more embodiments of the present disclosure.

Claims (16)

1. An image processing method, characterized in that the method comprises:
    performing matting on an original image to obtain a matting result, the matting result comprising a first image and a transparency map corresponding to the original image, wherein the first image comprises a reserved area in the original image, the reserved area is the foreground or the background in the original image, and the value of a pixel in the transparency map indicates the transparency of the pixel;
    determining a color difference between the reserved area and a target area according to differences between pixels in the first image and pixels in a material image containing the target area, wherein the target area in the material image is used to replace a non-reserved area in the original image;
    performing tone adjustment on pixels in the first image according to the color difference to obtain a second image whose tone matches that of the target area; and
    performing, based on the transparency map, image fusion on the second image and the material image to obtain a target image.
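For illustration only (not part of the claimed subject matter), the four steps of claim 1 can be strung together as follows; matting_fn, sample_points, color_difference, tone_adjust, and alpha_fuse are hypothetical helpers sketched after claims 2, 3, 4, and 5 below. Reusing the first-image sampling coordinates for the transparency map is an assumption.

```python
def replace_region(original, material, matting_fn):
    # Hypothetical end-to-end driver mirroring the four steps of claim 1.
    first_image, alpha = matting_fn(original)          # matting result
    first_pts, (ys, xs) = sample_points(first_image)   # claim 2 sampling
    material_pts, _ = sample_points(material)
    diff = color_difference(first_pts, material_pts,
                            alpha[ys, xs])             # claim 3 color diff
    second_image = tone_adjust(first_image, diff)      # claim 4 tone match
    return alpha_fuse(second_image, material, alpha)   # claim 5 fusion
```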
2. The method according to claim 1, wherein determining the color difference between the reserved area and the target area according to the differences between the pixels in the first image and the pixels in the material image containing the target area comprises:
    sampling the pixels in the first image and the pixels in the material image respectively to obtain first sampling points and second sampling points; and
    determining the color difference between the reserved area and the target area based on differences between the pixel values of the first sampling points and the pixel values of the second sampling points.
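A minimal sketch of the sampling step of claim 2, assuming uniform random sampling over pixel coordinates; the claim does not prescribe a sampling strategy, count, or seed.

```python
import numpy as np

def sample_points(image, n=1024, rng=None):
    # Uniform random sampling of pixel coordinates; the strategy and the
    # sample count are illustrative assumptions, not claimed values.
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    ys = rng.integers(0, h, n)
    xs = rng.integers(0, w, n)
    return image[ys, xs].astype(np.float32), (ys, xs)
```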
3. The method according to claim 2, wherein the method further comprises:
    sampling pixels in the transparency map to obtain third sampling points;
    and determining the color difference between the reserved area and the target area based on the differences between the pixel values of the first sampling points and the pixel values of the second sampling points comprises:
    determining a first pixel mean of the first sampling points based on the pixel values of the first sampling points and the transparency values of the third sampling points;
    determining a second pixel mean of the second sampling points based on the pixel values of the second sampling points; and
    determining the color difference between the reserved area and the target area according to a difference between the first pixel mean and the second pixel mean.
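One plausible reading of claim 3 in code: the first pixel mean is a transparency-weighted average of the first-image samples, the second pixel mean is a plain average of the material-image samples, and the color difference is their per-channel difference. The weighting scheme and the sign convention are assumptions.

```python
import numpy as np

def color_difference(first_pts, material_pts, alpha_pts, eps=1e-6):
    # First pixel mean: weight each first-image sample by the transparency
    # sampled at the same location, so mostly transparent pixels contribute
    # little to the reserved area's color estimate (assumed scheme).
    w = alpha_pts.reshape(-1, 1).astype(np.float32)
    first_mean = (first_pts.astype(np.float32) * w).sum(axis=0) / (w.sum() + eps)

    # Second pixel mean: plain average over the material-image samples.
    second_mean = material_pts.astype(np.float32).mean(axis=0)

    # Per-channel color difference between the target and reserved areas.
    return second_mean - first_mean
```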
4. The method according to any one of claims 1 to 3, wherein performing the tone adjustment on the pixels in the first image according to the color difference to obtain the second image whose tone matches that of the target area comprises:
    preliminarily adjusting the pixel values of the pixels in the first image based on the color difference to obtain a third image, wherein the color difference is fused into the pixels in the third image; and
    adjusting the pixel values of the pixels in the third image based on a difference between the pixel mean of the pixels in the third image and the pixel mean of the pixels in the first image, to obtain the second image whose tone matches that of the target area.
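The two-step tone adjustment of claim 4 might be realized as below; additive fusion in step 1 and a global brightness re-centering in step 2 are assumptions consistent with, but not mandated by, the claim language.

```python
import numpy as np

def tone_adjust(first_image, color_diff, strength=1.0):
    img = first_image.astype(np.float32)

    # Step 1: preliminary adjustment fusing the per-channel color
    # difference into every pixel (additive fusion is an assumption).
    third = img + strength * np.asarray(color_diff, dtype=np.float32)

    # Step 2: use the difference between the third image's pixel mean and
    # the first image's pixel mean to pull overall brightness back, so the
    # tone shifts toward the target area without the image becoming
    # globally brighter or darker (one plausible reading of claim 4).
    third -= third.mean() - img.mean()
    return np.clip(third, 0.0, 255.0)
```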
5. The method according to any one of claims 1 to 4, wherein performing, based on the transparency map, the image fusion on the second image and the material image to obtain the target image comprises:
    performing fusion based on the transparency map and the second image to obtain a first result;
    performing fusion based on the material image and an inverse transparency map corresponding to the transparency map to obtain a second result; and
    performing fusion based on the first result and the second result to obtain the target image.
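The three fusions of claim 5 correspond to standard alpha compositing; a sketch:

```python
import numpy as np

def alpha_fuse(second_image, material_image, alpha):
    # first result  = alpha       * second image   (reserved content)
    # second result = (1 - alpha) * material image (inverse transparency map)
    # target image  = first result + second result
    a = alpha[..., None].astype(np.float32)  # broadcast over color channels
    first_result = a * second_image.astype(np.float32)
    second_result = (1.0 - a) * material_image.astype(np.float32)
    return np.clip(first_result + second_result, 0.0, 255.0).astype(np.uint8)
```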
6. The method according to any one of claims 1 to 5, wherein performing the matting on the original image to obtain the matting result comprises:
    acquiring a trimap corresponding to the original image, wherein, for each pixel in the trimap, the value corresponding to the pixel indicates the probability that the pixel belongs to any one of a reserved area, a non-reserved area, or a to-be-determined area in the original image; and
    performing matting according to the trimap and the original image to obtain the matting result.
7. The method according to claim 6, wherein acquiring the trimap corresponding to the original image comprises:
    performing semantic segmentation on the original image to obtain a semantic probability map of the original image, wherein the value of a pixel in the semantic probability map indicates a first probability that the pixel belongs to the reserved area in the original image; and
    performing probability conversion based on the semantic probability map to obtain the trimap corresponding to the original image.
8. The method according to claim 7, wherein performing the semantic segmentation on the original image to obtain the semantic probability map corresponding to the original image comprises: performing semantic segmentation on the original image through a semantic segmentation network to obtain the semantic probability map output by the semantic segmentation network;
    and performing the matting according to the trimap and the original image comprises: performing matting according to the trimap and the original image through a matting network.
9. The method according to claim 7 or 8, wherein performing the probability conversion based on the semantic probability map to obtain the trimap corresponding to the original image comprises:
    for each pixel in the semantic probability map, performing probability conversion based on the first probability of the pixel to obtain a second probability that the pixel belongs to the to-be-determined area in the trimap; and
    generating the trimap according to the first probability and the second probability of each pixel in the semantic probability map.
10. The method according to claim 9, wherein, for each pixel in the semantic probability map, the higher the probability that the pixel belongs to the foreground or the background as represented by the first probability of the pixel, the lower the probability that the pixel belongs to the to-be-determined area in the trimap as represented by the second probability obtained through probability conversion;
    and generating the trimap according to the first probability and the second probability of each pixel in the semantic probability map comprises: for each pixel in the original image, performing probability fusion on the first probability and the second probability corresponding to the pixel to determine the value corresponding to the pixel in the trimap.
11. The method according to any one of claims 6 to 10, wherein performing the matting according to the trimap and the original image to obtain the matting result comprises:
    performing matting according to the trimap and the original image to obtain a reserved-area residual and an initial transparency map of the original image, wherein the value of a pixel in the initial transparency map indicates the transparency of the pixel;
    obtaining the first image based on the original image and the reserved-area residual; and
    adjusting the values of pixels in the initial transparency map according to the trimap to obtain the transparency map corresponding to the original image.
12. The method according to claim 11, wherein, before the semantic segmentation is performed on the original image, the method further comprises: scaling the original image;
    and obtaining the first image based on the original image and the reserved-area residual comprises:
    enlarging the reserved-area residual to the scale of the original image before the scaling; and
    obtaining the first image according to the enlarged reserved-area residual and the original image.
13. The method according to any one of claims 1 to 12, wherein the non-reserved area comprises a sky area in the original image, and the target area comprises a sky area in the material image.
14. An image processing apparatus, characterized in that the apparatus comprises:
    a matting module, configured to perform matting on an original image to obtain a matting result, the matting result comprising a first image and a transparency map corresponding to the original image, wherein the first image comprises a reserved area in the original image, the reserved area is the foreground or the background in the original image, and the value of a pixel in the transparency map indicates the transparency of the pixel;
    a determining module, configured to determine a color difference between the reserved area and a target area according to differences between pixels in the first image and pixels in a material image containing the target area, wherein the target area is used to replace a non-reserved area in the original image;
    an adjustment module, configured to perform tone adjustment on pixels in the first image according to the color difference to obtain a second image whose tone matches that of the target area; and
    a fusion module, configured to perform, based on the transparency map, image fusion on the second image and the material image to obtain a target image.
15. An electronic device, characterized by comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor implements the image processing method according to any one of claims 1 to 13 by executing the executable instructions.
16. A computer-readable storage medium, characterized in that the storage medium stores a computer program, the computer program being used to cause a processor to execute the image processing method according to any one of claims 1 to 13.
PCT/CN2022/125012 2021-10-29 2022-10-13 Image processing WO2023071810A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111273984.7 2021-10-29
CN202111273984.7A CN113920032A (en) 2021-10-29 2021-10-29 Image processing method, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023071810A1 true WO2023071810A1 (en) 2023-05-04

Family

ID=79243957

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125012 WO2023071810A1 (en) 2021-10-29 2022-10-13 Image processing

Country Status (2)

Country Link
CN (1) CN113920032A (en)
WO (1) WO2023071810A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920032A (en) * 2021-10-29 2022-01-11 上海商汤智能科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114615443A (en) * 2022-03-15 2022-06-10 维沃移动通信有限公司 Image processing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170294000A1 (en) * 2016-04-08 2017-10-12 Adobe Systems Incorporated Sky editing based on image composition
CN110335277A (en) * 2019-05-07 2019-10-15 腾讯科技(深圳)有限公司 Image processing method, device, computer readable storage medium and computer equipment
CN111179282A (en) * 2019-12-27 2020-05-19 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, storage medium, and electronic device
CN111275729A (en) * 2020-01-17 2020-06-12 新华智云科技有限公司 Method and system for precisely dividing sky area and method and system for changing sky of image
CN113920032A (en) * 2021-10-29 2022-01-11 上海商汤智能科技有限公司 Image processing method, image processing device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117522717A (en) * 2024-01-03 2024-02-06 支付宝(杭州)信息技术有限公司 Image synthesis method, device and equipment
CN117522717B (en) * 2024-01-03 2024-04-19 支付宝(杭州)信息技术有限公司 Image synthesis method, device and equipment

Also Published As

Publication number Publication date
CN113920032A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
WO2023071810A1 (en) Image processing
CN108388882B (en) Gesture recognition method based on global-local RGB-D multi-mode
US10839585B2 (en) 4D hologram: real-time remote avatar creation and animation control
KR102135478B1 (en) Method and system for virtually dying hair
Yang et al. Semantic portrait color transfer with internet images
WO2023066099A1 (en) Matting processing
EP2556660A1 (en) A method of real-time cropping of a real entity recorded in a video sequence
CN107766803B (en) Video character decorating method and device based on scene segmentation and computing equipment
CN114445562A (en) Three-dimensional reconstruction method and device, electronic device and storage medium
KR102181144B1 (en) Method and system for recognizing gender based on image deep learning
CN113822798B (en) Method and device for training generation countermeasure network, electronic equipment and storage medium
Wang et al. Where2stand: A human position recommendation system for souvenir photography
CN116917938A (en) Visual effect of whole body
CN117136381A (en) whole body segmentation
CN110689546A (en) Method, device and equipment for generating personalized head portrait and storage medium
CN108171716B (en) Video character decorating method and device based on self-adaptive tracking frame segmentation
CN113689372A (en) Image processing method, apparatus, storage medium, and program product
CN111402118B (en) Image replacement method and device, computer equipment and storage medium
CN108010038B (en) Live-broadcast dress decorating method and device based on self-adaptive threshold segmentation
CN114466133B (en) Photographing method and device
CN113920023A (en) Image processing method and device, computer readable medium and electronic device
CN111382647A (en) Picture processing method, device, equipment and storage medium
CN114677620A (en) Focusing method, electronic device and computer readable medium
CN114445427A (en) Image processing method, image processing device, electronic equipment and storage medium
CN107945201B (en) Video landscape processing method and device based on self-adaptive threshold segmentation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885689

Country of ref document: EP

Kind code of ref document: A1