WO2022127454A1 - Training method for a matting model, matting method, apparatus, device, and storage medium - Google Patents

Publication number
WO2022127454A1
Authority
WO
WIPO (PCT)
Prior art keywords
trimap
sample
image
initial
pixel
Application number
PCT/CN2021/129913
Other languages
English (en)
French (fr)
Inventor
刘钰安
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Publication of WO2022127454A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20036 Morphological image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Definitions

  • The embodiments of the present application relate to the field of computer vision, and in particular to a method for training a matting model, a matting method, an apparatus, a device, and a storage medium.
  • Matting refers to the operation of separating part of a picture or image from the original into a separate layer. It can be used in fields such as portrait blurring, background replacement, and image synthesis.
  • Current image processing software uses a neural network to perform matting without manual operation, which can improve both the efficiency of image processing and the accuracy of matting.
  • The two-stage portrait matting algorithm is a relatively common matting approach.
  • Specifically, erosion and dilation are first applied to the Mask map generated by an image segmentation model to obtain a three-layer segmentation (Trimap) map; alternatively, a Trimap segmentation model is used to directly obtain a Trimap containing a foreground area, a background area, and an undetermined area. The Trimap is then input into the matting model to generate a transparency channel (Alpha) map, which is used to process the original image.
  • Trimap: three-layer segmentation map
  • Alpha: transparency channel map
  • The embodiments of the present application provide a method for training a matting model, a matting method, an apparatus, a device, and a storage medium.
  • The technical solution is as follows:
  • An embodiment of the present application provides a method for training a matting model, the method comprising:
  • the target sample Trimap and the sample image are input into the matting model to obtain a sample Alpha map corresponding to the sample image, where the sample Alpha map includes the predicted transparency channel value corresponding to each pixel;
  • the matting model is trained based on the sample Alpha map and the labeled Alpha map corresponding to the sample image, where the labeled Alpha map is annotated with the sample transparency channel value of each pixel.
  • An embodiment of the present application provides a matting method, the method comprising:
  • the initial Trimap is divided into a foreground area, a background area, and an undetermined area;
  • an Alpha map corresponding to the target image is obtained, where the Alpha map includes the transparency channel value corresponding to each pixel.
  • An embodiment of the present application provides an apparatus for training a matting model, the apparatus comprising:
  • a first input module, configured to input the sample image into the first image segmentation model and the second image segmentation model to obtain, respectively, an initial sample Mask map and an initial sample three-layer segmentation (Trimap) map corresponding to the sample image, where the initial sample Mask map is divided into a foreground area and a background area, and the initial sample Trimap is divided into a foreground area, a background area, and an undetermined area;
  • a first optimization module, configured to optimize the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap;
  • a second input module, configured to input the target sample Trimap and the sample image into the matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, where the sample Alpha map contains the predicted transparency channel value corresponding to each pixel;
  • a training module, configured to train the matting model based on the sample Alpha map and the labeled Alpha map corresponding to the sample image, where the labeled Alpha map is annotated with the sample transparency channel value of each pixel.
  • An embodiment of the present application provides a matting apparatus, the apparatus comprising:
  • a third input module, configured to input the target image into the first image segmentation model and the second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, where the initial Mask map is divided into a foreground area and a background area, and the initial Trimap is divided into a foreground area, a background area, and an undetermined area;
  • a second optimization module, configured to optimize the initial Trimap using the initial Mask map to obtain a target Trimap;
  • a fourth input module, configured to input the target Trimap and the target image into the matting model to obtain an Alpha map corresponding to the target image, where the Alpha map includes the transparency channel value corresponding to each pixel.
  • An embodiment of the present application provides a computer device comprising a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method for training a matting model described in the above aspect, or to implement the matting method described in the above aspect.
  • An embodiment of the present application provides a computer-readable storage medium storing at least one piece of program code, where the program code is loaded and executed by a processor to implement the method for training a matting model described in the above aspect, or to implement the matting method described in the above aspect.
  • A computer program product or computer program, comprising computer instructions stored in a computer-readable storage medium.
  • The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device implements the method for training a matting model or the matting method provided in the various optional implementations of the above aspects.
  • FIG. 1 is a flowchart of a matting method in the related art;
  • FIG. 2 is a flowchart of a method for training a matting model provided by an exemplary embodiment of the present application;
  • FIG. 3 is a schematic diagram of generating an initial sample Mask map and an initial sample Trimap from a sample image provided by an exemplary embodiment of the present application;
  • FIG. 4 is a flowchart of a method for training a matting model provided by another exemplary embodiment of the present application;
  • FIG. 5 is a flowchart of a matting method provided by an exemplary embodiment of the present application;
  • FIG. 6 is an initial Mask map provided by an exemplary embodiment of the present application;
  • FIG. 8 is a flowchart of a matting method provided by another exemplary embodiment of the present application;
  • FIG. 9 is a schematic diagram of eroding an initial Mask map provided by an exemplary embodiment of the present application;
  • FIG. 10 is a schematic diagram of optimizing the foreground area in the candidate Trimap provided by an exemplary embodiment of the present application;
  • FIG. 11 is an Alpha map generated by a matting model provided by an exemplary embodiment of the present application;
  • FIG. 12 is a flowchart of a matting method provided by another exemplary embodiment of the present application;
  • FIG. 13 is a flowchart of generating a target Trimap provided by an exemplary embodiment of the present application;
  • FIG. 14 is a structural block diagram of an apparatus for training a matting model provided by an exemplary embodiment of the present application;
  • FIG. 15 is a structural block diagram of a matting apparatus provided by an exemplary embodiment of the present application;
  • FIG. 16 is a structural block diagram of a computer device provided by an exemplary embodiment of the present application.
  • "Plural" refers to two or more.
  • "And/or" describes an association between associated objects and covers three cases. For example, "A and/or B" can mean that A exists alone, that A and B exist at the same time, or that B exists alone.
  • The character "/" generally indicates an "or" relationship between the associated objects.
  • The related art usually uses a two-stage portrait matting algorithm for matting.
  • One method first generates a Mask map with an image segmentation model and then applies erosion and dilation to it to obtain a Trimap that includes a foreground area, a background area, and an undetermined area.
  • The process includes: step 101, obtaining a target image; step 102, inputting the target image into the image segmentation model to generate a Mask map; step 103, performing dilation and erosion on the Mask map to generate a Trimap; and step 104, inputting the Trimap and the target image into the matting model to generate an Alpha map.
  • Another method uses a Trimap segmentation model to directly obtain a Trimap containing a foreground area, a background area, and an undetermined area.
  • In this case, steps 102 to 103 above are replaced with step 105: inputting the target image into the Trimap segmentation model to generate a Trimap.
  • Both methods finally input the Trimap into the matting model, which refines the rough division between the content to be matted and the other content in the Trimap and generates an Alpha map that is then used to process the original image.
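  • As an illustration of the first related-art route (steps 102 to 103), the following is a minimal sketch of deriving a Trimap from a binary Mask map by erosion and dilation. It uses NumPy only. The function names, the square k×k structuring element, the edge padding, and the 255/128/0 pixel values are assumptions made for this sketch, not details given for the related art.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed pixel values: foreground, undetermined, background.
FG, UNKNOWN, BG = 255, 128, 0


def _morph(mask: np.ndarray, k: int, reduce_fn) -> np.ndarray:
    """Apply a k x k square min (erosion) or max (dilation) filter; k must be odd."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return reduce_fn(windows, axis=(-2, -1))


def trimap_from_mask(mask: np.ndarray, k: int = 5) -> np.ndarray:
    """Build a three-level Trimap from a 0/255 Mask map."""
    eroded = _morph(mask, k, np.min)    # pixels that are confidently foreground
    dilated = _morph(mask, k, np.max)   # pixels that might be foreground
    trimap = np.full(mask.shape, BG, dtype=np.uint8)
    trimap[dilated == FG] = UNKNOWN     # band around the mask boundary
    trimap[eroded == FG] = FG           # shrunken core stays foreground
    return trimap
```

  • The width of the undetermined band is controlled by k in this sketch; a larger k leaves more pixels for the matting model to resolve.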
  • The matting accuracy of the two-stage portrait matting algorithm depends mainly on the accuracy of the Trimap generated in the first stage.
  • In the related art, however, the Trimap is not corrected or optimized, so the final matting result is inaccurate.
  • The embodiments of the present application provide a method for training a matting model and a matting method. After the first image segmentation model and the second image segmentation model are used to obtain an initial Mask map and an initial Trimap respectively, the Mask map is used to optimize each area in the initial Trimap to obtain the target Trimap, thereby improving the matting accuracy of the matting model.
  • the initial sample Mask map is divided into a foreground area and a background area, and the initial sample Trimap is divided into a foreground area, a background area, and an undetermined area;
  • the matting model is trained, where the labeled Alpha map is annotated with the sample transparency channel value of each pixel.
  • Optimizing the initial sample Trimap using the initial sample Mask map to obtain the target sample Trimap includes:
  • optimizing the foreground area in the candidate sample Trimap using the eroded sample Mask map to obtain the target sample Trimap,
  • where the eroded sample Mask map is obtained by eroding the initial sample Mask map.
  • the pixels in the foreground area correspond to the first pixel value
  • the pixels in the undetermined area correspond to the second pixel value
  • the pixels in the background area correspond to the third pixel value
  • the first pixel value is greater than the second pixel value
  • the second pixel value is greater than the third pixel value
  • Optimizing the undetermined area in the initial sample Trimap using the initial sample Mask map to obtain the candidate sample Trimap includes:
  • updating the pixel value of the pixel in the initial sample Trimap to the second pixel value to obtain the candidate sample Trimap.
  • Training the matting model includes:
  • training the matting model by back-propagation using a gradient descent algorithm.
  • Before the foreground area in the candidate sample Trimap is optimized using the eroded sample Mask map to obtain the target sample Trimap, the method includes:
  • eroding the initial sample Mask map to obtain the eroded sample Mask map,
  • where the foreground area in the eroded sample Mask map is smaller than the foreground area in the initial sample Mask map.
  • the initial Mask map is divided into a foreground area and a background area,
  • and the initial Trimap is divided into a foreground area, a background area, and an undetermined area;
  • Optimizing the initial Trimap using the initial Mask map to obtain the target Trimap includes:
  • optimizing the foreground area in the candidate Trimap using the eroded Mask map to obtain the target Trimap,
  • where the eroded Mask map is obtained by eroding the initial Mask map.
  • the pixels in the foreground area correspond to the first pixel value
  • the pixels in the undetermined area correspond to the second pixel value
  • the pixels in the background area correspond to the third pixel value
  • the first pixel value is greater than the second pixel value
  • the second pixel value is greater than the third pixel value
  • Optimizing the undetermined area in the initial Trimap using the initial Mask map to obtain the candidate Trimap includes:
  • in response to the pixel value of a pixel in the initial Trimap being the third pixel value and the pixel value of the corresponding pixel in the initial Mask map being the first pixel value, updating the pixel value of that pixel in the initial Trimap to the second pixel value to obtain the candidate Trimap.
  • the pixel value of the pixel in the superimposed image is updated to the first pixel value to obtain the target Trimap.
  • Inputting the target Trimap and the target image into the matting model to obtain the Alpha map corresponding to the target image includes:
  • using the matting model to segment the undetermined area in the target Trimap and to process the transparency of the target Trimap, obtaining the Alpha map.
  • FIG. 2 shows a flowchart of a method for training a matting model provided by an embodiment of the present application.
  • This embodiment is described by taking the method being applied to a computer device as an example. The method includes:
  • Step 201: input the sample image into the first image segmentation model and the second image segmentation model to obtain, respectively, an initial sample Mask map and an initial sample Trimap corresponding to the sample image. The initial sample Mask map is divided into a foreground area and a background area, and the initial sample Trimap is divided into a foreground area, a background area, and an undetermined area.
  • The Mask map is a bitmap obtained when an image segmentation model directly separates the target content from the other content in the image. It includes a foreground area and a background area.
  • The foreground area is the area that the image segmentation model determines to correspond to the target content, and the background area is the area that it determines to correspond to other content.
  • For example, in portrait matting, the foreground area is the portrait in the image and the background area is the part of the image other than the portrait.
  • The Trimap is a bitmap obtained either by applying dilation and erosion to the Mask map or by having an image segmentation model segment the image directly. It includes a foreground area, a background area, and an undetermined area.
  • The undetermined area usually includes the boundary of the target content and needs to be accurately segmented by the matting model in the subsequent matting stage. For example, in portrait matting, areas that are difficult to identify accurately, such as hair tips, gaps between hair strands, and gaps between fingers, may be determined as undetermined areas.
  • The computer device inputs the sample image into the first image segmentation model to obtain the initial sample Mask map, and inputs the sample image into the second image segmentation model to obtain the initial sample Trimap.
  • FIG. 3 shows an initial sample Mask map and an initial sample Trimap corresponding to a sample image.
  • The computer device inputs the sample image 301 into the first image segmentation model and the second image segmentation model respectively, which segment the portrait and the background in the sample image 301 to obtain an initial sample Mask map 302 and an initial sample Trimap 303. The initial sample Mask map 302 includes a foreground area 302a and a background area 302b, and the initial sample Trimap 303 includes a foreground area 303a, a background area 303b, and an undetermined area 303c.
  • In the model training stage, the computer device first trains the first image segmentation model and the second image segmentation model separately, uses the trained models to segment the sample image, and then performs the training steps of the matting model.
  • The computer device trains the first image segmentation model, the second image segmentation model, and the matting model separately, then cascades the three models, fixes the model parameters of the first and second image segmentation models, and uses the sample image to fine-tune the matting model at a lower learning rate. The embodiments of the present application mainly describe this fine-tuning part.
  • Step 202: optimize the initial sample Trimap using the initial sample Mask map to obtain the target sample Trimap.
  • The Mask map is obtained by the image segmentation model directly distinguishing the target content from other content, so uncertain areas such as the junction between the target content and other content may be classified directly as foreground. If the Mask map were used directly for matting, the accuracy of the subsequent matting model could be reduced, that is, the extracted portrait would contain other content. The Trimap, by contrast, contains both a foreground area and an undetermined area: only the part with a high probability of belonging to the target content is assigned to the foreground area, and content that is hard to classify, such as the boundary of the target content, is assigned to the undetermined area. Because the undetermined area contains part of the target content, the target content may be incomplete after subsequent matting.
  • The embodiment of the present application therefore optimizes the initial sample Trimap using the initial sample Mask map to obtain the target sample Trimap, and uses the target sample Trimap to train the matting model.
  • Step 203: input the target sample Trimap and the sample image into the matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, where the sample Alpha map includes the predicted transparency channel value corresponding to each pixel.
  • The computer device uses the matting model to extract features from the sample image based on the foreground area and background area in the target sample Trimap, further segments the undetermined area based on the features of each area, and sets a transparency channel value for each pixel in the target sample Trimap. For example, some pixels in the foreground area can be set directly to opaque and some pixels in the background area directly to fully transparent, while pixels in the undetermined area are given different transparency channel values according to the fine segmentation result, so that the edge of the target content is handled in more detail.
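  • To make the role of the transparency channel values concrete, the sketch below blends an image over a new background using an Alpha map, as in background replacement. The 0 to 255 alpha range, the function name `composite`, and the linear blend are assumptions for illustration; the text does not specify how the Alpha map is applied to the original image.

```python
import numpy as np


def composite(image: np.ndarray, background: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend an H x W x 3 image over a background using an H x W 8-bit alpha map.

    alpha == 255 keeps the image pixel (opaque foreground),
    alpha == 0 keeps the background pixel (fully transparent foreground),
    and values in between mix the two, which matters along hair and edges.
    """
    a = alpha.astype(np.float32)[..., None] / 255.0   # H x W x 1, in [0, 1]
    blended = a * image.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
    return np.rint(blended).astype(np.uint8)
```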
  • Step 204: train the matting model based on the sample Alpha map and the labeled Alpha map corresponding to the sample image, where the labeled Alpha map is annotated with the sample transparency channel value of each pixel.
  • The labeled Alpha map is an Alpha map pre-annotated with the sample transparency channel value corresponding to each pixel.
  • The computer device adjusts the parameters of the matting model based on the sample Alpha map and the labeled Alpha map and iterates until the output of the matting model (the sample Alpha map) is close to the labeled Alpha map, so that the model parameters of the matting model are continuously optimized.
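  • The text does not name the loss used to compare the sample Alpha map with the labeled Alpha map. As one hedged possibility, the sketch below computes a mean absolute error over all pixels, assuming both maps hold values in [0, 1]. The function name and the choice of an L1 error are assumptions of this sketch, not the method of the application.

```python
import numpy as np


def alpha_l1_loss(pred_alpha: np.ndarray, label_alpha: np.ndarray) -> float:
    """Mean absolute difference between predicted and labeled alpha values."""
    pred = pred_alpha.astype(np.float32)
    label = label_alpha.astype(np.float32)
    return float(np.mean(np.abs(pred - label)))


# A prediction closer to the label yields a smaller loss, which is the
# quantity an optimizer would drive down during training.
label = np.array([[0.5, 1.0]])
far = alpha_l1_loss(np.array([[0.0, 1.0]]), label)
near = alpha_l1_loss(np.array([[0.4, 1.0]]), label)
```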
  • In the embodiments of the present application, the initial Mask map generated by the first image segmentation model is used to optimize the initial Trimap generated by the second image segmentation model, so that the division into foreground area, background area, and undetermined area better matches the sample image. This improves the segmentation accuracy of the Trimap, so that during training based on the target Trimap the matting model adjusts its parameters on a more accurate segmentation basis, which in turn improves the matting accuracy of the matting model.
  • FIG. 4 shows a flowchart of a training method for a matting model provided by another embodiment of the present application.
  • This embodiment is described by taking the method being applied to a computer device as an example. The method includes:
  • Step 401: input the sample image into the first image segmentation model and the second image segmentation model to obtain, respectively, an initial sample Mask map and an initial sample Trimap corresponding to the sample image. The initial sample Mask map is divided into a foreground area and a background area, and the initial sample Trimap is divided into a foreground area, a background area, and an undetermined area.
  • For the specific implementation of step 401, refer to step 201 above; details are not repeated here.
  • Step 402: optimize the undetermined area in the initial sample Trimap using the initial sample Mask map to obtain a candidate sample Trimap.
  • The image segmentation models generate the initial sample Mask map and the initial sample Trimap in different ways, so the two differ in how they segment areas such as the boundary of the target content.
  • The foreground area of the initial sample Mask map is too large, and the foreground area of the initial sample Trimap is too small.
  • The computer device uses the foreground area in the initial sample Mask map to optimize the undetermined area in the initial sample Trimap: the part that belongs to the foreground area in the initial sample Mask map but to the background area in the initial sample Trimap is re-determined as undetermined area, so that the undetermined area and the foreground area in the candidate sample Trimap together cover the area corresponding to the target content as completely as possible.
  • Different areas correspond to different pixel values in the initial sample Mask map and the initial sample Trimap.
  • The pixels in the foreground area correspond to the first pixel value, the pixels in the undetermined area correspond to the second pixel value, and the pixels in the background area correspond to the third pixel value. The first pixel value is greater than the second pixel value, and the second pixel value is greater than the third pixel value.
  • For example, the pixel value of the foreground area is 255, the pixel value of the undetermined area is 128, and the pixel value of the background area is 0.
  • Step 402 includes the following steps:
  • Step 402a: determine the correspondence between pixels in the initial sample Trimap and the initial sample Mask map.
  • The initial sample Trimap and the initial sample Mask map are obtained from the same sample image, so their pixels are in one-to-one correspondence.
  • The computer device treats pixels with the same relative position in the initial sample Trimap and the initial sample Mask map as corresponding pixels. For example, the computer device determines the correspondence based on the coordinates of each pixel in the two maps: two pixels with the same coordinates form a pair, and the pixel value in the initial sample Trimap is corrected based on the difference between the pixel values of each pair, thereby optimizing the initial sample Trimap.
  • Step 402b: for any pixel in the initial sample Trimap, in response to its pixel value in the initial sample Trimap being the third pixel value and the pixel value of the corresponding pixel in the initial sample Mask map being the first pixel value, update the pixel value of that pixel in the initial sample Trimap to the second pixel value to obtain the candidate sample Trimap.
  • The computer device assigns pixels whose value is the third pixel value in the initial sample Trimap and the first pixel value in the initial sample Mask map to the undetermined area of the candidate sample Trimap. In other words, the parts on which the two models disagree, which the first image segmentation model assigns to the foreground area and the second image segmentation model assigns to the background area, are determined as undetermined area. This expands the undetermined area of the initial sample Trimap so that the foreground area and the undetermined area cover the target content as much as possible, which avoids part of the target content in the target sample Trimap being assigned to the background area and the target content obtained by subsequent matting being incomplete.
  • The computer device uses array operations in the open-source library NumPy (Numerical Python) to generate the candidate sample Trimap.
  • The computer device either copies the initial sample Trimap and updates pixel values in the copy, or optimizes the initial sample Trimap directly to obtain the candidate sample Trimap. This embodiment of the present application does not limit this.
  • The computer device traverses the pixels in the initial sample Trimap and the initial sample Mask map in a preset order. When it reaches pixel A and determines that the pixel value of pixel A in the initial sample Trimap is 255, the pixel value remains unchanged. When it reaches pixel B and determines that the pixel value of pixel B in the initial sample Trimap is 128, the pixel value remains unchanged. When it reaches pixel C and determines that the pixel value of pixel C is 0 in the initial sample Trimap and 255 in the initial sample Mask map, the pixel value of that pixel in the initial sample Trimap (or its copy) is updated to 128.
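  • Step 402b can be sketched with NumPy boolean indexing instead of an explicit per-pixel loop. The function name `expand_undetermined` and the constants are hypothetical names for this sketch; the values 255, 128, and 0 follow the example above, and the sketch works on a copy (one of the two options the text allows).

```python
import numpy as np

# Example pixel values from the text: foreground, undetermined, background.
FG, UNKNOWN, BG = 255, 128, 0


def expand_undetermined(initial_trimap: np.ndarray, initial_mask: np.ndarray) -> np.ndarray:
    """Mark pixels that are background in the Trimap but foreground in the Mask as undetermined."""
    candidate = initial_trimap.copy()  # leave the initial Trimap untouched
    disputed = (initial_trimap == BG) & (initial_mask == FG)
    candidate[disputed] = UNKNOWN
    return candidate


# Pixels A, B, C from the example, plus one pixel both models call background.
trimap = np.array([[255, 128, 0, 0]], dtype=np.uint8)
mask = np.array([[255, 255, 255, 0]], dtype=np.uint8)
candidate = expand_undetermined(trimap, mask)
```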
  • Step 403: use the image erosion function in OpenCV to erode the initial sample Mask map to obtain an eroded sample Mask map, where the foreground area in the eroded sample Mask map is smaller than the foreground area in the initial sample Mask map.
  • Erosion is a basic morphological operation that takes local minima in the image, shrinking and thinning the highlighted (white) parts of the image, so that the highlighted area of the resulting image (that is, the area corresponding to the target content) is smaller than that of the original image.
  • Through erosion, the computer device obtains an eroded sample Mask map whose foreground area is smaller than the foreground area in the initial sample Mask map.
  • The computer device uses the image erosion function cv2.erode() in OpenCV to erode the initial sample Mask map.
  • The computer device uses a 15×15 kernel to perform one erosion iteration on the initial sample Mask map to obtain the eroded sample Mask map.
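  • The following sketch shows what a single erosion pass with a 15×15 square kernel does to a Mask map. To stay dependency-light it uses a NumPy sliding-window minimum as a stand-in for the OpenCV route named in the text; the function name `erode_square` and the border handling are assumptions of this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode_square(mask: np.ndarray, k: int = 15) -> np.ndarray:
    """Erode a single-channel mask once with a k x k square kernel (k odd)."""
    pad = k // 2
    # Pad with 255 so the image border itself never removes foreground
    # (a choice made for this sketch).
    padded = np.pad(mask, pad, mode="constant", constant_values=255)
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    # A pixel stays foreground only if every pixel under the kernel is foreground.
    return windows.min(axis=(-2, -1)).astype(mask.dtype)


# With OpenCV installed, the route named in the text would be roughly:
#   cv2.erode(mask, np.ones((15, 15), np.uint8), iterations=1)

mask = np.zeros((40, 40), dtype=np.uint8)
mask[10:30, 10:30] = 255               # 20 x 20 foreground square
eroded = erode_square(mask, k=15)      # foreground shrinks by 7 pixels on each side
```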
  • Step 404 optimize the foreground area in the candidate sample Trimap map by using the eroded sample Mask map to obtain the target sample Trimap map, where the eroded sample Mask map is obtained by eroding the initial sample Mask map.
  • The foreground area in the initial sample Trimap map is usually too small to completely contain the area corresponding to the target content, while the foreground area in the eroded sample Mask map is more accurate than that in the initial sample Trimap map. The computer device therefore uses the foreground area in the eroded sample Mask map to refine the foreground area in the candidate sample Trimap map.
  • In a possible implementation, step 404 includes the following steps:
  • Step 404a performing pixel-level superposition on the eroded sample Mask map and the candidate sample Trimap map to obtain a sample superimposed image.
  • The computer device superimposes the eroded sample Mask map and the candidate sample Trimap map pixel by pixel, either generating the sample superimposed image as a separate image or converting the candidate sample Trimap map into the sample superimposed image, which is not limited in this embodiment of the present application.
  • For example, the computer device traverses the pixels in the candidate sample Trimap map and the eroded sample Mask map in a preset order. For a pair of corresponding pixels, such as pixel A in the candidate sample Trimap map and pixel a in the eroded sample Mask map, if the pixel value of pixel A is 255 and the pixel value of pixel a is 255, the pixel value of the corresponding pixel in the sample superimposed image is determined to be 510.
  • Step 404b for any pixel in the sample superimposed image, in response to its pixel value being greater than the first pixel value, update the pixel value of the pixel in the sample superimposed image to the first pixel value to obtain the target sample Trimap map.
  • The computer device updates every pixel in the sample superimposed image whose pixel value is greater than the first pixel value to the first pixel value to obtain the target sample Trimap map. That is, pixels that belong to the undetermined area in the candidate sample Trimap map but to the foreground area in the eroded sample Mask map are determined to belong to the foreground area in the target sample Trimap map, which improves the completeness and accuracy of the foreground area.
  • For example, for a pixel whose pixel value in the sample superimposed image is 510, the computer device updates its pixel value to 255.
  • The above optimization process can be implemented with array operations and functions from open source libraries, without building a neural network model, so it runs quickly.
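  • The superposition and update of steps 404a and 404b can be sketched as a pair of NumPy array operations. The function name `superimpose_and_clip` and the constant name are hypothetical; the only point beyond the text is that the sum must be computed in a wider integer type, because 8-bit arithmetic cannot represent 510.

```python
import numpy as np

FOREGROUND = 255  # the "first pixel value" in the text

def superimpose_and_clip(candidate_trimap: np.ndarray, eroded_mask: np.ndarray) -> np.ndarray:
    """Add the two maps pixel by pixel, then cap every value above 255 at 255."""
    # Widen first: in uint8 arithmetic 255 + 255 would wrap around to 254.
    overlay = candidate_trimap.astype(np.uint16) + eroded_mask.astype(np.uint16)
    overlay[overlay > FOREGROUND] = FOREGROUND
    return overlay.astype(np.uint8)
```

  • An undetermined pixel (128) that the eroded Mask map marks as foreground (255) sums to 383 and is capped at 255, which is how it joins the foreground area; pixels where the eroded Mask map is 0 keep their Trimap value.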
  • Step 405 Input the target sample Trimap map and the sample image into the matting model to obtain the sample transparent channel (Alpha) map corresponding to the sample image, where the sample Alpha map includes the predicted transparent channel value of each pixel.
  • After optimizing the foreground area and the undetermined area in the initial sample Trimap map, the computer device inputs the resulting target sample Trimap map and the sample image into the matting model.
  • The matting model uses the foreground area and background area in the target sample Trimap map to extract features from the corresponding areas of the sample image, learns the features of the target content and how they differ from other content, finely segments the undetermined area in the target sample Trimap map based on the extracted features, and generates the predicted transparent channel value of each pixel.
  • Step 406 Calculate the Euclidean distance between the predicted transparent channel value and the sample transparent channel value of each pixel in the sample Alpha map.
  • The computer device uses the Euclidean distance to calculate the matting loss. It first calculates, for each pixel in the sample Alpha map, the Euclidean distance between the predicted transparent channel value and the sample transparent channel value. In this calculation, $\alpha_{Alpha}$ is the predicted transparent channel value of the pixel, $\alpha_{Label}$ is the sample transparent channel value of the pixel, and a constant is used to correct the calculation result.
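  • A per-pixel distance consistent with the definitions above can be written as follows. This is a sketch under assumptions, not the formula of the application: the symbol $\varepsilon$ for the correction constant, the name $d$, and the placement of $\varepsilon^{2}$ under the square root are all assumed.

```latex
d = \sqrt{\left(\alpha_{Alpha} - \alpha_{Label}\right)^{2} + \varepsilon^{2}}
```

  • With a small positive $\varepsilon$, $d$ remains differentiable where the predicted value equals the sample value, which is one plausible reading of a constant that corrects the calculation result.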
  • Step 407 Determine the matting loss of the matting model based on the Euclidean distance, where the matting loss is the sum of the Euclidean distances of all pixels.
  • The computer device takes the sum of the Euclidean distances of all pixels as the matting loss of the matting model. In this calculation, N is the number of pixels in the sample Alpha map, and the two terms for the i-th pixel are its predicted transparent channel value and its sample transparent channel value.
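  • Summing the per-pixel distances over all N pixels can be sketched in NumPy as below. The function name `matting_loss`, the parameter `eps`, and the form of the per-pixel term (a square root of the squared difference plus `eps` squared) are assumptions made for illustration; the text only states that the loss is the sum of per-pixel Euclidean distances with a correcting constant.

```python
import numpy as np

def matting_loss(pred_alpha, label_alpha, eps: float = 1e-6) -> float:
    """Sum of per-pixel distances between predicted and labelled alpha values."""
    diff = np.asarray(pred_alpha, dtype=np.float64) - np.asarray(label_alpha, dtype=np.float64)
    # Assumed per-pixel term: sqrt(diff^2 + eps^2); eps keeps the gradient finite at diff == 0.
    per_pixel = np.sqrt(diff ** 2 + eps ** 2)
    return float(per_pixel.sum())
```

  • With `eps` set to 0 the value reduces to the sum of absolute differences, which makes the sketch easy to check by hand on a small array.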
  • Step 408 based on the matting loss, use a gradient descent algorithm to perform back-propagation training on the matting model.
  • Based on the calculated matting loss, the computer device calculates the gradient of each model parameter, thereby determining the direction in which the loss function converges, and corrects the model parameters. After multiple training iterations, the accuracy of the matting model improves.
  • When the number of training iterations reaches a threshold, or when the matting loss is less than a preset value, the matting loss is determined to have converged and model training is complete.
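  • The two stopping conditions can be illustrated with a toy gradient descent loop. This is not the matting model: the "model" here is just one trainable value per pixel, and the function name `train_toy`, the learning rate, both thresholds, and the loss form are hypothetical choices made only to show the control flow.

```python
import numpy as np

def train_toy(label_alpha, lr=0.05, max_iters=200, loss_threshold=0.2, eps=1e-3):
    """Gradient descent on a toy per-pixel parameter array with two stop rules."""
    label = np.asarray(label_alpha, dtype=np.float64)
    params = np.full_like(label, 0.5)  # toy stand-in for model parameters
    loss = float("inf")
    for step in range(1, max_iters + 1):
        diff = params - label
        dist = np.sqrt(diff ** 2 + eps ** 2)  # assumed per-pixel distance
        loss = float(dist.sum())
        if loss < loss_threshold:             # stop rule 1: loss below preset value
            return params, loss, step
        grad = diff / dist                    # derivative of dist w.r.t. params
        params = params - lr * grad           # gradient descent update
    # Stop rule 2: iteration threshold reached; loss is from the last iteration's start.
    return params, loss, max_iters
```

  • In a real training run the gradient would be obtained by back-propagation through the network rather than by the closed-form derivative used here.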
  • In this embodiment of the present application, the undetermined area in the initial sample Trimap map is optimized by using the foreground area in the initial sample Mask map, and the foreground area in the candidate sample Trimap map is optimized by using the foreground area in the eroded sample Mask map, so that the undetermined area and foreground area in the target sample Trimap map cover the area corresponding to the target content as fully as possible, which improves both the matting accuracy and the matting efficiency of the matting model. The optimization uses array operations and the erosion function from open source libraries, so no neural network model needs to be built and the operation is fast. The matting loss is calculated from the Euclidean distance between the sample transparent channel value and the predicted transparent channel value of each pixel, and the matting model is trained by back-propagation based on the matting loss, which improves the accuracy and matting quality of the matting model.
  • FIG. 5 shows a flow chart of a matting method according to an exemplary embodiment of the present application.
  • the embodiment of the present application is described by taking the method applied to a computing device as an example, and the method includes:
  • Step 501 input the target image into the first image segmentation model and the second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap map corresponding to the target image, where the initial Mask map is divided into a foreground area and a background area, and the initial Trimap map is divided into a foreground area, a background area and an undetermined area.
  • the computer device inputs the target image into the first image segmentation model to obtain an initial Mask map, and inputs the target image into the second image segmentation model to obtain an initial Trimap map.
  • For example, FIG. 6 shows the initial Mask map obtained after a target image is processed by the first image segmentation model, which includes a foreground area 601 and a background area 602; FIG. 7 shows the initial Trimap map obtained after the target image is processed by the second image segmentation model, which includes a foreground area 701, a background area 702 and an undetermined area 703.
  • Optionally, the computer device first inputs the target image into the first image segmentation model to obtain the initial Mask map and then inputs it into the second image segmentation model to obtain the initial Trimap map; or it first inputs the target image into the second image segmentation model to obtain the initial Trimap map and then inputs it into the first image segmentation model to obtain the initial Mask map; or it copies the target image and inputs the two copies into the first image segmentation model and the second image segmentation model at the same time.
  • This embodiment of the present application does not limit this.
  • Step 502 using the initial Mask map to optimize the initial Trimap map to obtain the target Trimap map.
  • Because the first image segmentation model divides every part that may be target content (that is, the image content to be extracted) into the foreground area, the foreground area of the initial Mask map contains relatively complete target content together with part of the background content. The initial Trimap map, by contrast, includes an undetermined area in addition to the foreground area and background area: the second image segmentation model divides only the parts most likely to be target content into the foreground area and assigns uncertain parts to the undetermined area. As a result, the foreground area in the initial Trimap map usually cannot contain the complete target content, the undetermined area contains both target content and background content, and the foreground area and undetermined area together may still fail to fully cover the target content. Therefore, in a possible implementation, the computer device uses the initial Mask map to optimize the initial Trimap map to obtain the target Trimap map.
  • Step 503 Input the target Trimap map and the target image into the matting model to obtain an Alpha map corresponding to the target image, and the Alpha map contains the transparent channel values corresponding to each pixel point.
  • The computer device uses the matting model to perform feature extraction on the target image based on the foreground area and background area in the target Trimap map, further segments the undetermined area based on the features of each area, and at the same time determines the transparent channel value of each pixel in the target Trimap map.
  • In this embodiment of the present application, the initial Trimap map is optimized by using the initial Mask map to obtain the target Trimap map, so that the matting model only needs to finely segment the undetermined area based on the foreground area and background area in the target Trimap map. This reduces the amount of computation of the matting model, provides it in advance with a highly accurate segmentation basis, prevents it from matting based on wrong segmentation results, and improves its matting efficiency and matting accuracy.
  • FIG. 8 shows a flowchart of a matting method according to another exemplary embodiment of the present application.
  • the embodiment of the present application is described by taking the method applied to a computing device as an example, and the method includes:
  • Step 801 input the target image into the first image segmentation model and the second image segmentation model to obtain, respectively, the initial Mask map and the initial Trimap map corresponding to the target image, where the initial Mask map is divided into a foreground area and a background area, and the initial Trimap map is divided into a foreground area, a background area and an undetermined area.
  • For the specific implementation of step 801, reference may be made to the foregoing step 501, and details are not described herein again in this embodiment of the present application.
  • Step 802 using the initial Mask map to optimize the undetermined area in the initial Trimap map to obtain a candidate Trimap map.
  • In the initial Trimap map and the initial Mask map, different areas have different pixel values: pixels in the foreground area have the first pixel value, pixels in the undetermined area have the second pixel value, and pixels in the background area have the third pixel value, where the first pixel value is greater than the second pixel value and the second pixel value is greater than the third pixel value.
  • For example, the pixel value of the foreground area is 255, the pixel value of the background area is 0, and the pixel value of the undetermined area is 128.
  • step 802 includes the following steps:
  • Step 802a Determine the correspondence between the pixel points in the initial Trimap map and the initial Mask map.
  • The initial Trimap map and the initial Mask map are obtained from the same target image, so their pixels are in one-to-one correspondence.
  • The computer device determines pixels at the same relative position in the initial Trimap map and the initial Mask map as corresponding pixels. For example, it determines the correspondence based on pixel coordinates: two pixels with the same coordinates in the initial Trimap map and the initial Mask map form a pair, and the pixel values in the initial Trimap map are corrected based on the difference between the pixel values of each pair, thereby optimizing the initial Trimap map.
  • Step 802b for any pixel in the initial Trimap map, in response to the pixel value of the pixel in the initial Trimap map being the third pixel value and the pixel value of the corresponding pixel in the initial Mask map being the first pixel value, update the pixel value of the pixel in the initial Trimap map to the second pixel value to obtain the candidate Trimap map.
  • The computer device assigns pixels that have the third pixel value in the initial Trimap map and the first pixel value in the initial Mask map to the undetermined area in the candidate Trimap map. That is, the parts that the first image segmentation model assigns to the foreground area and the second image segmentation model assigns to the background area are determined to be undetermined area. This enlarges the undetermined area of the initial Trimap map so that the foreground area and undetermined area cover the target content as fully as possible, which prevents part of the target content from being assigned to the background area in the target Trimap map and the subsequent matting from producing incomplete target content.
  • For example, the computer device uses array operations in NumPy to generate the candidate Trimap map.
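  • The update rule of step 802b maps directly onto a NumPy boolean mask. The sketch below assumes the example pixel values 255, 128 and 0 given above; the function name `make_candidate_trimap` and the constant names are hypothetical.

```python
import numpy as np

FOREGROUND, UNDETERMINED, BACKGROUND = 255, 128, 0  # first, second, third pixel values

def make_candidate_trimap(initial_trimap: np.ndarray, initial_mask: np.ndarray) -> np.ndarray:
    """Turn Trimap background pixels that the Mask calls foreground into undetermined."""
    if initial_trimap.shape != initial_mask.shape:
        raise ValueError("Trimap and Mask must have the same shape")
    # Pixels are paired by coordinates; this mask is True where the two maps disagree
    # in the specific way described in step 802b.
    promote = (initial_trimap == BACKGROUND) & (initial_mask == FOREGROUND)
    # np.where returns a new array, so the initial Trimap map is left unchanged.
    return np.where(promote, UNDETERMINED, initial_trimap).astype(initial_trimap.dtype)
```

  • Foreground and undetermined pixels of the initial Trimap map pass through untouched, which matches the traversal example in which only pixels with value 0 in the Trimap map and 255 in the Mask map are changed.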
  • Step 803 optimize the foreground area in the candidate Trimap map by using the eroded Mask map to obtain the target Trimap map, where the eroded Mask map is obtained by eroding the initial Mask map.
  • The computer device uses the image erosion function in OpenCV to perform erosion processing on the initial Mask map to obtain the eroded Mask map, and the foreground area in the eroded Mask map is smaller than the foreground area in the initial Mask map.
  • For example, FIG. 9 shows the process of generating the eroded Mask map from the initial Mask map. After the initial Mask map 901 is eroded, the eroded Mask map 902 is obtained. FIG. 9 clearly shows that the foreground area in the eroded Mask map 902 is smaller than that in the initial Mask map 901.
  • Step 803a performing pixel-level superposition on the eroded Mask map and the candidate Trimap map to obtain a superimposed image.
  • For example, the computer device traverses the pixels in the candidate Trimap map and the eroded Mask map in a preset order. If the pixel value of pixel A in the candidate Trimap map is 255 and the pixel value of the corresponding pixel a in the eroded Mask map is 255, the pixel value of this pair of pixels in the superimposed image is determined to be 510.
  • Step 803b for any pixel in the superimposed image, in response to the pixel value being greater than the first pixel value, update the pixel value of the pixel in the superimposed image to the first pixel value to obtain the target Trimap.
  • The computer device updates every pixel in the superimposed image whose pixel value is greater than the first pixel value to the first pixel value to obtain the target Trimap map. That is, pixels that belong to the undetermined area in the candidate Trimap map but to the foreground area in the eroded Mask map are determined to belong to the foreground area in the target Trimap map, which improves the completeness of the foreground area.
  • For example, FIG. 10 shows the process of optimizing the foreground area in the candidate Trimap map by using the eroded Mask map to obtain the target Trimap map.
  • The computer device superimposes the eroded Mask map 902 and the candidate Trimap map 1001 at the pixel level and updates the pixel values to obtain the target Trimap map 1002. FIG. 10 clearly shows that the foreground area of the target Trimap map 1002 is more complete than that of the candidate Trimap map 1001 and closer to the actual portrait area.
  • Step 804 input the target Trimap and the target image into the matting model.
  • After optimizing the foreground area and the undetermined area in the initial Trimap map, the computer device inputs the resulting target Trimap map and the target image into the matting model, which performs fine matting based on the target Trimap map.
  • Step 805 using the matting model to perform feature extraction on parts of the target image corresponding to the foreground region and the background region of the target Trimap to obtain image features.
  • The computer device uses the matting model to learn the image features of the foreground area and background area in the target Trimap map, and further segments the undetermined area based on the learned image features.
  • Step 806 based on the image features, use the matting model to perform image segmentation on the undetermined area in the target Trimap, and perform transparency processing on the target Trimap to obtain an Alpha image.
  • The matting model uses the foreground area and background area in the target Trimap map to extract features from the corresponding areas of the target image, learns the image features of the target content and how they differ from other content, finely segments the undetermined area in the target Trimap map based on the extracted image features, and at the same time generates the transparent channel value of each pixel.
  • For example, FIG. 11 shows the Alpha map generated by the matting model for a target Trimap map.
  • The target Trimap map 1002 includes a foreground area, a background area and an undetermined area, whereas the corresponding Alpha map 1101 includes only a foreground area and a background area. FIG. 11 shows that the edge area of the portrait in the target Trimap map 1002 is relatively smooth, while the edge area of the portrait in the Alpha map 1101 is finely segmented.
  • In this embodiment of the present application, the undetermined area in the initial Trimap map is optimized by using the foreground area in the initial Mask map, and the foreground area in the candidate Trimap map is optimized by using the foreground area in the eroded Mask map, so that the undetermined area and foreground area in the target Trimap map cover the area corresponding to the target content as fully as possible, which improves the matting accuracy and matting efficiency of the matting model.
  • Step 1201 acquiring a target image.
  • Step 1202 input the target image into the first image segmentation model.
  • Step 1203 generating an initial Mask map.
  • Step 1204 input the target image into the second image segmentation model.
  • Step 1205 generate an initial Trimap map.
  • Step 1206 using the initial Mask map to optimize the initial Trimap map to generate a target Trimap map.
  • Step 1207 Input the target Trimap map and the target image into the matting model to generate an Alpha map.
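  • Steps 1201 to 1207 amount to a short data flow, sketched below with plain callables. Everything named here is hypothetical: the function `run_matting`, the `Image` placeholder type, and the four callables standing in for the first image segmentation model, the second image segmentation model, the Trimap optimization, and the matting model.

```python
from typing import Any, Callable

Image = Any  # placeholder: any image representation the callables agree on

def run_matting(
    target_image: Image,                                   # step 1201
    first_segmentation_model: Callable[[Image], Image],
    second_segmentation_model: Callable[[Image], Image],
    optimize_trimap: Callable[[Image, Image], Image],
    matting_model: Callable[[Image, Image], Image],
) -> Image:
    """Chain the stages: two segmentations, Trimap optimization, then matting."""
    initial_mask = first_segmentation_model(target_image)          # steps 1202-1203
    initial_trimap = second_segmentation_model(target_image)       # steps 1204-1205
    target_trimap = optimize_trimap(initial_mask, initial_trimap)  # step 1206
    return matting_model(target_trimap, target_image)              # step 1207
```

  • The two segmentation calls do not depend on each other, so they could equally run in the opposite order or in parallel, as the text allows.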
  • the process of generating the target Trimap is as shown in FIG. 13 .
  • Step 1301 Copy the initial Trimap map to obtain a copied Trimap map.
  • Step 1302 using the initial Mask map to optimize the undetermined area in the copied Trimap map to obtain a candidate Trimap map.
  • Step 1303 performing erosion processing on the initial Mask map to obtain an eroded Mask map.
  • Step 1304 superimposing the eroded Mask map and the candidate Trimap map to obtain the target Trimap map.
  • FIG. 14 shows a structural block diagram of an apparatus for training a matting model provided by an exemplary embodiment of the present application.
  • the apparatus can be implemented as all or a part of the terminal through software, hardware or a combination of the two.
  • the device includes:
  • the first input module 1401 is used to input the sample image into the first image segmentation model and the second image segmentation model to obtain, respectively, the initial sample Mask map and the initial sample three-layer segmentation (Trimap) map corresponding to the sample image, where the initial sample Mask map is divided into a foreground area and a background area, and the initial sample Trimap map is divided into a foreground area, a background area and an undetermined area;
  • a first optimization module 1402 configured to optimize the initial sample Trimap by using the initial sample Mask to obtain a target sample Trimap
  • the second input module 1403 is configured to input the target sample Trimap and the sample image into a matting model to obtain a sample transparent channel Alpha map corresponding to the sample image, and the sample Alpha map includes the corresponding pixel points in the sample Alpha map. predict transparent channel value;
  • the training module 1404 is configured to train the matting model based on the sample Alpha map and the labeled Alpha map corresponding to the sample image, where the sample transparent channel value of each pixel is labeled in the labeled Alpha map.
  • the first optimization module 1402 includes:
  • the first optimization unit is used to optimize the undetermined area in the initial sample Trimap by using the initial sample Mask to obtain a candidate sample Trimap;
  • the second optimization unit is configured to optimize the foreground area in the candidate sample Trimap map by using the eroded sample Mask map to obtain the target sample Trimap map, where the eroded sample Mask map is obtained by eroding the initial sample Mask map.
  • the pixels in the foreground area correspond to the first pixel value, the pixels in the undetermined area correspond to the second pixel value, the pixels in the background area correspond to the third pixel value, the first pixel value is greater than the second pixel value, and the second pixel value is greater than the third pixel value;
  • the first optimization unit is also used for:
  • for any pixel in the initial sample Trimap map, in response to the pixel value of the pixel in the initial sample Trimap map being the third pixel value and the corresponding pixel value in the initial sample Mask map being the first pixel value, updating the pixel value of the pixel in the initial sample Trimap map to the second pixel value to obtain the candidate sample Trimap map.
  • the second optimization unit is also used for:
  • the training module 1404 includes:
  • a computing unit used to calculate the Euclidean distance between the predicted transparent channel value corresponding to each pixel in the sample Alpha map and the sample transparent channel value;
  • a determining unit configured to determine a matting loss of the matting model based on the Euclidean distance, where the matting loss is the sum of the Euclidean distances corresponding to each of the pixel points;
  • a training unit configured to perform back-propagation training on the matting model by using a gradient descent algorithm based on the matting loss.
  • the device further includes:
  • the first image processing unit is used to perform erosion processing on the initial sample Mask map by using the image erosion function in the open source computer vision library OpenCV to obtain the eroded sample Mask map, where the foreground area in the eroded sample Mask map is smaller than the foreground area in the initial sample Mask map.
  • FIG. 15 shows a structural block diagram of a matting apparatus provided by an exemplary embodiment of the present application.
  • the apparatus can be implemented as all or a part of the terminal through software, hardware or a combination of the two.
  • the device includes:
  • the third input module 1501 is configured to input the target image into the first image segmentation model and the second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap map corresponding to the target image, where the initial Mask map is divided into a foreground area and a background area, and the initial Trimap map is divided into a foreground area, a background area and an undetermined area;
  • the second optimization module 1502 is used to optimize the initial Trimap by utilizing the initial Mask to obtain a target Trimap;
  • the fourth input module 1503 is configured to input the target Trimap map and the target image into a matting model to obtain an Alpha map corresponding to the target image, and the Alpha map includes transparency channel values corresponding to each pixel point.
  • the second optimization module 1502 includes:
  • the third optimization unit is used to optimize the undetermined area in the initial Trimap by utilizing the initial Mask to obtain a candidate Trimap;
  • the fourth optimization unit is configured to optimize the foreground area in the candidate Trimap map by using an eroded Mask map to obtain the target Trimap map, where the eroded Mask map is obtained by eroding the initial Mask map.
  • the pixels in the foreground area correspond to the first pixel value, the pixels in the undetermined area correspond to the second pixel value, the pixels in the background area correspond to the third pixel value, the first pixel value is greater than the second pixel value, and the second pixel value is greater than the third pixel value;
  • the third optimization unit is also used for:
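  • for any pixel in the initial Trimap map, in response to the pixel value of the pixel in the initial Trimap map being the third pixel value and the corresponding pixel value in the initial Mask map being the first pixel value, updating the pixel value of the pixel in the initial Trimap map to the second pixel value to obtain the candidate Trimap map.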
  • the fourth optimization unit is also used for:
  • performing pixel-level superposition on the eroded Mask map and the candidate Trimap map to obtain a superimposed image;
  • the fourth input module 1503 includes:
  • a feature extraction unit, used to perform feature extraction, by using the matting model, on the parts of the target image corresponding to the foreground area and background area of the target Trimap map to obtain image features;
  • the second image processing unit is configured to, based on the image features, use the matting model to perform image segmentation on the undetermined area in the target Trimap map and perform transparency processing on the target Trimap map to obtain the Alpha map.
  • In this embodiment of the present application, in the model training stage, the initial sample Trimap map generated by the second image segmentation model is optimized by using the initial sample Mask map generated by the first image segmentation model, so that the division of the foreground area, background area and undetermined area in the target sample Trimap map is closer to the sample image. This improves the segmentation accuracy of the Trimap map, so that during training the matting model adjusts its parameters on a more accurate segmentation basis, which in turn improves its matting accuracy. In the model application stage, the initial Trimap map is optimized by using the initial Mask map to obtain the target Trimap map, so that the matting model only needs to finely segment the undetermined area based on the foreground area and background area in the target Trimap map. This reduces the amount of computation of the matting model, provides it in advance with a highly accurate segmentation basis, prevents it from matting based on wrong segmentation results, and improves its matting efficiency and matting accuracy.
  • FIG. 16 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • Computer device 1600 may also be a single computer device or a cluster of computer devices.
  • The computer device 1600 includes a central processing unit (CPU) 1601, a system memory 1604 including a random access memory (RAM) 1602 and a read-only memory (ROM) 1603, and a system bus 1605 connecting the system memory 1604 and the central processing unit 1601.
  • The computer device 1600 also includes a basic input/output system (I/O system) 1606 that helps transfer information between the devices within the computer device 1600, and a mass storage device 1607 for storing an operating system 1613, application programs 1614 and other program modules 1615.
  • The basic input/output system 1606 includes a display 1608 for displaying information and an input device 1609, such as a mouse or keyboard, for the user to input information. Both the display 1608 and the input device 1609 are connected to the central processing unit 1601 through an input/output controller 1610 connected to the system bus 1605.
  • The basic input/output system 1606 may also include the input/output controller 1610 for receiving and processing input from a number of other devices such as a keyboard, mouse, or electronic stylus.
  • The input/output controller 1610 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 1607 is connected to the central processing unit 1601 through a mass storage controller (not shown) connected to the system bus 1605 .
  • the mass storage device 1607 and its associated computer-readable media provide non-volatile storage for the computer device 1600. That is, the mass storage device 1607 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM).
  • Computer-readable media can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, cassettes, tapes, disk storage or other magnetic storage devices.
  • The memory stores one or more programs configured to be executed by the one or more central processing units 1601. The one or more programs contain instructions for implementing the above-mentioned compiling method of the application installation package, and the central processing unit 1601 executes the one or more programs to implement the methods provided by the foregoing method embodiments.
  • The computer device 1600 may also operate through a remote computer connected to a network such as the Internet. That is, the computer device 1600 can be connected to the network 1612 through the network interface unit 1611 connected to the system bus 1605, or can use the network interface unit 1611 to connect to other types of networks or remote computer systems (not shown).
  • The memory further includes one or more programs stored in the memory, and the one or more programs include instructions for performing the steps performed by the computer device in the method provided by the embodiments of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium storing at least one instruction, where the at least one instruction is loaded and executed by a processor to implement the training method of the matting model or the matting method described in the above embodiments.
  • A computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium.
  • The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the training method of the matting model or the matting method provided in the various optional implementations of the above aspects.
  • Computer-readable storage media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

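A matting model training method, a matting method, an apparatus, a device, and a storage medium, belonging to the field of computer vision. The method includes: inputting a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample Mask map and an initial sample Trimap corresponding to the sample image; optimizing the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap; inputting the target sample Trimap and the sample image into a matting model to obtain a sample Alpha map corresponding to the sample image; and training the matting model based on the sample Alpha map and an annotated Alpha map corresponding to the sample image. This can improve the segmentation accuracy of the Trimap and thereby the matting precision and accuracy of the matting model.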

Description

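Matting model training method, matting method, apparatus, device, and storage medium
This application claims priority to Chinese Patent Application No. 202011504662.4, filed on December 18, 2020 and entitled "Matting model training method, matting method, apparatus, device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the field of computer vision, and in particular to a matting model training method, a matting method, an apparatus, a device, and a storage medium.
Background
Matting is the operation of separating part of a picture or video from the original picture or video into a standalone layer. It can be applied to portrait background blurring, background replacement, image compositing, and similar fields. Current image processing software uses neural networks for image processing, which requires no manual operation and can improve both processing efficiency and matting precision.
In the related art, the two-stage portrait matting algorithm is a common matting tool. Specifically, either the mask (Mask) map generated by an image segmentation model is eroded and dilated to obtain a three-level segmentation (Trimap) map, or a Trimap segmentation model directly produces a Trimap containing foreground, background, and undetermined regions. The Trimap is then input into a matting model to generate a transparency channel (Alpha) map, which is used to process the original image.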
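Summary
Embodiments of this application provide a matting model training method, a matting method, an apparatus, a device, and a storage medium. The technical solutions are as follows:
In one aspect, an embodiment of this application provides a matting model training method, the method including:
inputting a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample mask (Mask) map and an initial sample three-level segmentation (Trimap) map corresponding to the sample image, where the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region;
optimizing the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap;
inputting the target sample Trimap and the sample image into a matting model to obtain a sample Alpha map corresponding to the sample image, where the sample Alpha map contains a predicted transparency channel value for each pixel; and
training the matting model based on the sample Alpha map and an annotated Alpha map corresponding to the sample image, where the annotated Alpha map is annotated with a sample transparency channel value for each pixel.
In another aspect, an embodiment of this application provides a matting method, the method including:
inputting a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, where the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region;
optimizing the initial Trimap using the initial Mask map to obtain a target Trimap; and
inputting the target Trimap and the target image into a matting model to obtain an Alpha map corresponding to the target image, where the Alpha map contains a transparency channel value for each pixel.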
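In another aspect, an embodiment of this application provides a matting model training apparatus, the apparatus including:
a first input module, configured to input a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample mask (Mask) map and an initial sample three-level segmentation (Trimap) map corresponding to the sample image, where the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region;
a first optimization module, configured to optimize the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap;
a second input module, configured to input the target sample Trimap and the sample image into a matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, where the sample Alpha map contains a predicted transparency channel value for each pixel; and
a training module, configured to train the matting model based on the sample Alpha map and an annotated Alpha map corresponding to the sample image, where the annotated Alpha map is annotated with a sample transparency channel value for each pixel.
In another aspect, an embodiment of this application provides a matting apparatus, the apparatus including:
a third input module, configured to input a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, where the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region;
a second optimization module, configured to optimize the initial Trimap using the initial Mask map to obtain a target Trimap; and
a fourth input module, configured to input the target Trimap and the target image into a matting model to obtain an Alpha map corresponding to the target image, where the Alpha map contains a transparency channel value for each pixel.
In another aspect, an embodiment of this application provides a computer device including a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the matting model training method described in the above aspect or the matting method described in the above aspect.
In another aspect, an embodiment of this application provides a computer-readable storage medium storing at least one piece of program code, which is loaded and executed by a processor to implement the matting model training method described in the above aspect or the matting method described in the above aspect.
According to one aspect of this application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device implements the matting model training method or the matting method provided in the various optional implementations of the above aspects.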
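Brief Description of the Drawings
FIG. 1 is a flowchart of a matting method in the related art;
FIG. 2 is a flowchart of a matting model training method provided by an exemplary embodiment of this application;
FIG. 3 is a schematic diagram of generating an initial sample Mask map and an initial sample Trimap from a sample image, provided by an exemplary embodiment of this application;
FIG. 4 is a flowchart of a matting model training method provided by another exemplary embodiment of this application;
FIG. 5 is a flowchart of a matting method provided by an exemplary embodiment of this application;
FIG. 6 is an initial Mask map provided by an exemplary embodiment of this application;
FIG. 7 is an initial Trimap provided by an exemplary embodiment of this application;
FIG. 8 is a flowchart of a matting method provided by another exemplary embodiment of this application;
FIG. 9 is a schematic diagram of eroding an initial Mask map, provided by an exemplary embodiment of this application;
FIG. 10 is a schematic diagram of optimizing the foreground region of a candidate Trimap, provided by an exemplary embodiment of this application;
FIG. 11 is an Alpha map generated by the matting model, provided by an exemplary embodiment of this application;
FIG. 12 is a flowchart of a matting method provided by another exemplary embodiment of this application;
FIG. 13 is a flowchart of generating a target Trimap, provided by an exemplary embodiment of this application;
FIG. 14 is a structural block diagram of a matting model training apparatus provided by an exemplary embodiment of this application;
FIG. 15 is a structural block diagram of a matting apparatus provided by an exemplary embodiment of this application;
FIG. 16 is a structural block diagram of a computer device provided by an exemplary embodiment of this application.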
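Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
"A plurality of" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The related art usually performs matting with a two-stage portrait matting algorithm. One approach first uses an image segmentation model to generate a Mask map and then erodes and dilates it to obtain a Trimap containing a foreground region, a background region, and an undetermined region. As shown in FIG. 1, the flow includes: step 101, obtain a target image; step 102, input the target image into an image segmentation model to generate a Mask map; step 103, dilate and erode the Mask map to generate a Trimap; step 104, input the Trimap and the target image into a matting model to generate an Alpha map. Another approach uses a Trimap segmentation model to directly obtain a Trimap containing a foreground region, a background region, and an undetermined region; its flow replaces steps 102 to 103 with step 105, input the target image into a Trimap segmentation model to generate a Trimap. In both approaches the Trimap is finally input into the matting model, which refines the Trimap's rough division between the content to be extracted and other content and generates an Alpha map, which is then used to process the original image.
Clearly, the matting precision of the two-stage portrait matting algorithm depends mainly on the precision of the Trimap generated in the first stage. However, the related art does not correct or optimize the Trimap; if the Trimap is imprecise, the final matting result will be inaccurate.
To solve the above technical problem, embodiments of this application provide a matting model training method and a matting method. After a first image segmentation model and a second image segmentation model are used to obtain an initial Mask map and an initial Trimap, respectively, the initial Mask map is used to optimize each region of the initial Trimap to obtain a target Trimap, thereby improving the matting precision and accuracy of the matting model.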
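The matting model training method provided by the embodiments of this application includes:
inputting a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample Mask map and an initial sample Trimap corresponding to the sample image, where the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region;
optimizing the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap;
inputting the target sample Trimap and the sample image into a matting model to obtain a sample Alpha map corresponding to the sample image, where the sample Alpha map contains a predicted transparency channel value for each pixel;
training the matting model based on the sample Alpha map and an annotated Alpha map corresponding to the sample image, where the annotated Alpha map is annotated with a sample transparency channel value for each pixel.
Optionally, optimizing the initial sample Trimap using the initial sample Mask map to obtain the target sample Trimap includes:
optimizing the undetermined region of the initial sample Trimap using the initial sample Mask map to obtain a candidate sample Trimap;
optimizing the foreground region of the candidate sample Trimap using an eroded sample Mask map to obtain the target sample Trimap, where the eroded sample Mask map is obtained by eroding the initial sample Mask map.
Optionally, pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, where the first pixel value is greater than the second pixel value and the second pixel value is greater than the third pixel value;
optimizing the undetermined region of the initial sample Trimap using the initial sample Mask map to obtain the candidate sample Trimap includes:
determining the correspondence between pixels in the initial sample Trimap and pixels in the initial sample Mask map;
for any pixel in the initial sample Trimap, in response to the pixel value of the pixel in the initial sample Trimap being the third pixel value and the pixel value of the pixel in the initial sample Mask map being the first pixel value, updating the pixel value of the pixel in the initial sample Trimap to the second pixel value to obtain the candidate sample Trimap.
Optionally, optimizing the foreground region of the candidate sample Trimap using the eroded sample Mask map to obtain the target sample Trimap includes:
performing pixel-wise superposition of the eroded sample Mask map and the candidate sample Trimap to obtain a sample overlay image;
for any pixel in the sample overlay image, in response to its pixel value being greater than the first pixel value, updating the pixel value of the pixel in the sample overlay image to the first pixel value to obtain the target sample Trimap.
Optionally, training the matting model based on the sample Alpha map and the annotated Alpha map corresponding to the sample image includes:
computing, for each pixel in the sample Alpha map, the Euclidean distance between the predicted transparency channel value and the sample transparency channel value;
determining the matting loss of the matting model based on the Euclidean distances, where the matting loss is the sum of the Euclidean distances of all pixels;
training the matting model by backpropagation with a gradient descent algorithm, based on the matting loss.
Optionally, before optimizing the foreground region of the candidate sample Trimap using the eroded sample Mask map to obtain the target sample Trimap, the method includes:
eroding the initial sample Mask map using the image erosion function in the Open Source Computer Vision Library (OpenCV) to obtain the eroded sample Mask map, where the foreground region of the eroded sample Mask map is smaller than the foreground region of the initial sample Mask map.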
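The matting method provided by the embodiments of this application includes:
inputting a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, where the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region;
optimizing the initial Trimap using the initial Mask map to obtain a target Trimap;
inputting the target Trimap and the target image into a matting model to obtain an Alpha map corresponding to the target image, where the Alpha map contains a transparency channel value for each pixel.
Optionally, optimizing the initial Trimap using the initial Mask map to obtain the target Trimap includes:
optimizing the undetermined region of the initial Trimap using the initial Mask map to obtain a candidate Trimap;
optimizing the foreground region of the candidate Trimap using an eroded Mask map to obtain the target Trimap, where the eroded Mask map is obtained by eroding the initial Mask map.
Optionally, pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, where the first pixel value is greater than the second pixel value and the second pixel value is greater than the third pixel value;
optimizing the undetermined region of the initial Trimap using the initial Mask map to obtain the candidate Trimap includes:
determining the correspondence between pixels in the initial Trimap and pixels in the initial Mask map;
for any pixel in the initial Trimap, in response to the pixel value of the pixel in the initial Trimap being the third pixel value and the pixel value of the pixel in the initial Mask map being the first pixel value, updating the pixel value of the pixel in the initial Trimap to the second pixel value to obtain the candidate Trimap.
Optionally, optimizing the foreground region of the candidate Trimap using the eroded Mask map to obtain the target Trimap includes:
performing pixel-wise superposition of the eroded Mask map and the candidate Trimap to obtain an overlay image;
for any pixel in the overlay image, in response to its pixel value being greater than the first pixel value, updating the pixel value of the pixel in the overlay image to the first pixel value to obtain the target Trimap.
Optionally, inputting the target Trimap and the target image into the matting model to obtain the Alpha map corresponding to the target image includes:
inputting the target Trimap and the target image into the matting model;
using the matting model to extract features from the parts of the target image that correspond to the foreground region and background region of the target Trimap, to obtain image features;
based on the image features, using the matting model to perform image segmentation on the undetermined region of the target Trimap and to perform transparency processing on the target Trimap, to obtain the Alpha map.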
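FIG. 2 shows a flowchart of a matting model training method provided by an embodiment of this application. This embodiment is described using an example in which the method is applied to a computer device. The method includes:
Step 201: input a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample Mask map and an initial sample Trimap corresponding to the sample image, where the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region.
A Mask map is a bitmap obtained when an image segmentation model directly separates the target content of an image from other content. It contains a foreground region and a background region: the foreground region is the region that the image segmentation model determines to correspond to the target content, and the background region is the region that it determines to correspond to other content. In portrait matting, the foreground region is the portrait in the image, and the background region is everything in the image other than the portrait.
A Trimap is a bitmap obtained, for example, by dilating and eroding a Mask map or by having an image segmentation model segment the image directly. It contains a foreground region, a background region, and an undetermined region. The undetermined region is the region for which the image segmentation model cannot determine whether it is target content; it usually contains the boundary of the target content and must be segmented precisely by the matting model in the subsequent matting stage. In portrait matting, for example, regions that are hard to identify precisely, such as the area around hair tips, the gaps between strands of hair, and the gaps between fingers, may be classified as the undetermined region.
In a possible implementation, the computer device inputs the sample image into the first image segmentation model to obtain the initial sample Mask map, and inputs the sample image into the second image segmentation model to obtain the initial sample Trimap.
Illustratively, FIG. 3 shows an initial sample Mask map and an initial sample Trimap corresponding to a sample image. The computer device inputs the sample image 301 into the first image segmentation model and the second image segmentation model, which separate the portrait from the background in the sample image 301 and yield the initial sample Mask map 302 and the initial sample Trimap 303. The initial sample Mask map 302 contains a foreground region 302a and a background region 302b, and the initial sample Trimap 303 contains a foreground region 303a, a background region 303b, and an undetermined region 303c.
In a possible implementation, during the model training phase, the computer device first trains the first image segmentation model and the second image segmentation model separately, uses the trained first and second image segmentation models to segment the sample images, and then performs the training steps for the matting model. The computer device trains the first image segmentation model, the second image segmentation model, and the matting model individually, then cascades the three models, fixes the model parameters of the first and second image segmentation models, and uses the sample images to fine-tune the matting model with a lower learning rate. The embodiments of this application mainly describe the fine-tuning part.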
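Step 202: optimize the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap.
Usually, a Mask map is obtained by an image segmentation model directly distinguishing target content from other content, so uncertain regions such as the boundary between target content and other content may be assigned directly to the foreground region. If the Mask map is used directly for matting, the matting accuracy of the subsequent matting model may be low, that is, the extracted portrait may contain other content. A Trimap, by contrast, contains both a foreground region and an undetermined region: the computer device assigns only the parts with a high probability of being target content to the foreground region and assigns content that is hard to segment, such as the boundary of the target content, to the undetermined region. The undetermined region therefore contains part of the target content, so the target content may be incomplete after subsequent matting.
In a possible implementation, to improve the accuracy of the matting model, the embodiments of this application obtain the target sample Trimap by optimizing the initial sample Trimap with the initial sample Mask map, and then use the target sample Trimap to train the matting model.
Step 203: input the target sample Trimap and the sample image into the matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, where the sample Alpha map contains a predicted transparency channel value for each pixel.
Using the matting model, the computer device extracts features from the sample image based on the foreground region and background region of the target sample Trimap, further segments the undetermined region based on the features of each region, and at the same time sets a transparency channel value for each pixel of the target sample Trimap. For example, some pixels in the foreground region can be set directly to opaque, and some pixels in the background region can be set directly to fully transparent; for the undetermined region, the matting model sets different transparency channel values for different pixels based on the fine segmentation result, so that the edge region of the target content is processed in finer detail.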
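The way the Trimap constrains the transparency values can be sketched with NumPy. This is only an illustration under assumptions: the function name `constrain_alpha`, the 255/128/0 region values (the example values used elsewhere in this description), and the `predicted` array standing in for the network output are all hypothetical, and the sketch fixes every foreground and background pixel, whereas the text only says that some of them may be set directly.

```python
import numpy as np

FG, UNKNOWN, BG = 255, 128, 0  # example region values; an assumption of this sketch

def constrain_alpha(trimap: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Keep the predicted alpha only inside the undetermined region."""
    alpha = predicted.astype(np.float32).copy()
    alpha[trimap == FG] = 1.0   # foreground: opaque
    alpha[trimap == BG] = 0.0   # background: fully transparent
    return alpha                # undetermined pixels keep the predicted value

trimap = np.array([[255, 128, 0]], dtype=np.uint8)
predicted = np.array([[0.9, 0.4, 0.2]], dtype=np.float32)  # hypothetical model output
alpha = constrain_alpha(trimap, predicted)
```

Only the middle pixel, which lies in the undetermined region, keeps the value produced by the (hypothetical) model.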
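Step 204: train the matting model based on the sample Alpha map and the annotated Alpha map corresponding to the sample image, where the annotated Alpha map is annotated with a sample transparency channel value for each pixel.
The annotated Alpha map is an Alpha map that has been annotated in advance with the sample transparency channel value of each pixel. Its sample transparency channel values are highly precise and serve as the reference against which the matting model compares its matting results and adjusts its parameters. The computer device adjusts the parameters of the matting model based on the sample Alpha map and the annotated Alpha map and iterates until the output of the matting model (the sample Alpha map) is close to the annotated Alpha map, so that the model parameters are continually optimized.
In summary, in the embodiments of this application, the initial Mask map generated by the first image segmentation model is used to optimize the initial Trimap generated by the second image segmentation model, so that the division into foreground, background, and undetermined regions in the resulting target Trimap is closer to the sample image and the segmentation accuracy of the Trimap improves. When the matting model is trained on the target Trimap, it can therefore adjust its parameters against a more accurate segmentation reference, which improves its matting precision and accuracy.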
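FIG. 4 shows a flowchart of a matting model training method provided by another embodiment of this application. This embodiment is described using an example in which the method is applied to a computer device. The method includes:
Step 401: input a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample Mask map and an initial sample Trimap corresponding to the sample image, where the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region.
For the specific implementation of step 401, refer to step 201 above; details are not repeated here.
Step 402: optimize the undetermined region of the initial sample Trimap using the initial sample Mask map to obtain a candidate sample Trimap.
The image segmentation models generate the initial sample Mask map and the initial sample Trimap in different ways, so the two differ in how they segment regions such as the boundary of the target content; usually the foreground region of the initial sample Mask map is too large and the foreground region of the initial sample Trimap is too small. In a possible implementation, the computer device uses the foreground region of the initial sample Mask map to optimize the undetermined region of the initial sample Trimap: the part that belongs to the foreground region in the initial sample Mask map but to the background region in the initial sample Trimap is re-designated as undetermined region, so that the undetermined region and foreground region of the candidate sample Trimap together cover the region corresponding to the target content as completely as possible.
To make it easier for the matting model to distinguish and learn the regions corresponding to different image content, and for the computer device to optimize the initial sample Trimap, different regions of the initial sample Mask map and the initial sample Trimap have different pixel values. Pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, where the first pixel value is greater than the second pixel value and the second pixel value is greater than the third pixel value.
Illustratively, the pixel value of the foreground region is 255, the pixel value of the background region is 0, and the pixel value of the undetermined region is 128.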
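In a possible implementation, the computer device performs the optimization based on the pixel values of the pixels in the initial sample Mask map and the initial sample Trimap. Step 402 includes the following steps:
Step 402a: determine the correspondence between pixels in the initial sample Trimap and pixels in the initial sample Mask map.
The initial sample Trimap and the initial sample Mask map are obtained from the same sample image, so their pixels correspond one to one. The computer device treats pixels at the same relative position in the initial sample Trimap and the initial sample Mask map as corresponding pixels. For example, the computer device determines the correspondence based on the coordinates of each pixel in the two maps: two pixels with the same coordinates form a pair, and the pixel values in the initial sample Trimap are corrected based on the difference between the pixel values within each pair, thereby optimizing the initial sample Trimap.
Step 402b: for any pixel in the initial sample Trimap, in response to the pixel value of the pixel in the initial sample Trimap being the third pixel value and the pixel value of the pixel in the initial sample Mask map being the first pixel value, update the pixel value of the pixel in the initial sample Trimap to the second pixel value to obtain the candidate sample Trimap.
In a possible implementation, the computer device assigns pixels whose value is the third pixel value in the initial sample Trimap but the first pixel value in the initial sample Mask map to the undetermined region of the candidate sample Trimap. In other words, the disputed part, which the first image segmentation model assigns to the foreground region but the second image segmentation model assigns to the background region, is designated as undetermined region. This enlarges the undetermined region of the initial sample Trimap so that the foreground region and undetermined region cover the target content as far as possible, which prevents part of the target content in the target sample Trimap from being assigned to the background region and leaving the extracted target content incomplete.
Specifically, the computer device uses array operations from the open-source numerical computing library NumPy (Numerical Python) to generate the candidate sample Trimap.
Optionally, the computer device copies the initial sample Trimap and updates the pixel values in the copy; alternatively, the computer device optimizes the initial sample Trimap in place to obtain the candidate sample Trimap. This is not limited in the embodiments of this application.
Illustratively, the computer device traverses the pixels of the initial sample Trimap and the initial sample Mask map in a preset order. On reaching pixel A, whose value in the initial sample Trimap is 255, it leaves the value unchanged. On reaching pixel B, whose value in the initial sample Trimap is 128, it leaves the value unchanged. On reaching pixel C, whose value is 0 in the initial sample Trimap and 255 in the initial sample Mask map, it updates the value of that pixel in the initial sample Trimap (or its copy) to 128.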
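The traversal in step 402b can be written as a single NumPy array operation instead of an explicit loop. The sketch below is a minimal illustration, assuming the 255/128/0 example values; the function name `expand_unknown` and the sample arrays are hypothetical and work on a copy of the Trimap, which is one of the two options the text allows.

```python
import numpy as np

FG, UNKNOWN, BG = 255, 128, 0  # example values from the description

def expand_unknown(trimap: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mark pixels that are background in the Trimap but foreground in the Mask as undetermined."""
    disputed = (trimap == BG) & (mask == FG)  # the two models disagree here
    candidate = trimap.copy()                 # leave the initial Trimap untouched
    candidate[disputed] = UNKNOWN
    return candidate

# Pixels A, B, C from the walkthrough, plus one pixel both models call background.
trimap = np.array([[255, 128, 0, 0]], dtype=np.uint8)
mask = np.array([[255, 255, 255, 0]], dtype=np.uint8)
candidate = expand_unknown(trimap, mask)
```

Pixels A and B keep their values, pixel C moves from 0 to 128, and the last pixel stays background because the Mask map also labels it background.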
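Step 403: erode the initial sample Mask map using the image erosion function in OpenCV to obtain an eroded sample Mask map, where the foreground region of the eroded sample Mask map is smaller than the foreground region of the initial sample Mask map.
Erosion is a basic morphological operation that takes local minima in an image: it shrinks and thins the highlighted or white parts of the image so that the highlighted region in the result (that is, the region corresponding to the target content) is smaller than in the original. Through erosion, the computer device obtains an eroded sample Mask map whose foreground region is smaller than that of the initial sample Mask map.
Specifically, the computer device uses the OpenCV image erosion function cv2.erode() to erode the initial sample Mask map. Illustratively, the computer device performs one erosion iteration on the initial sample Mask map with a 15×15 kernel to obtain the eroded sample Mask map.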
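To show what one erosion pass with a 15×15 square kernel does, here is a NumPy-only sketch that avoids an OpenCV dependency. The helper `erode` and the test mask are hypothetical. With OpenCV installed, `cv2.erode(mask, np.ones((15, 15), np.uint8), iterations=1)` should give a comparable result; padding the border with the foreground value below is an assumption made to mimic OpenCV's default border handling.

```python
import numpy as np

def erode(mask: np.ndarray, ksize: int = 15) -> np.ndarray:
    """Binary erosion with a ksize x ksize square structuring element (ksize odd)."""
    r = ksize // 2
    # Pad with the foreground value so the image border itself does not erode.
    padded = np.pad(mask, r, mode="constant", constant_values=255)
    h, w = mask.shape
    out = np.full_like(mask, 255)
    # A pixel stays foreground only if every pixel in its window is foreground,
    # i.e. the output is the minimum over all shifted copies of the input.
    for dy in range(ksize):
        for dx in range(ksize):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out

mask = np.zeros((40, 40), dtype=np.uint8)
mask[5:35, 5:35] = 255            # a 30 x 30 foreground square
eroded = erode(mask, ksize=15)    # foreground shrinks by 7 pixels on every side
```

The 30×30 square shrinks to 16×16, which matches the statement that the eroded foreground region is smaller than the initial one.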
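Step 404: optimize the foreground region of the candidate sample Trimap using the eroded sample Mask map to obtain the target sample Trimap, where the eroded sample Mask map is obtained by eroding the initial sample Mask map.
As with the undetermined region, the foreground region of the initial sample Trimap is usually too small to fully contain the region corresponding to the target content, whereas the foreground region of the eroded sample Mask map is more accurate than that of the initial sample Trimap. The computer device therefore uses the foreground region of the eroded sample Mask map to refine the foreground region of the candidate sample Trimap.
In a possible implementation, given the pixel value distribution described in the steps above, the computer device performs the optimization based on the pixel values of the pixels in the eroded sample Mask map and the candidate sample Trimap. Step 404 includes the following steps:
Step 404a: perform pixel-wise superposition of the eroded sample Mask map and the candidate sample Trimap to obtain a sample overlay image.
Optionally, the computer device superimposes the eroded sample Mask map and the candidate sample Trimap pixel by pixel and generates the sample overlay image as a separate image; alternatively, the computer device adds the pixel values of the eroded sample Mask map onto the candidate sample Trimap, turning the candidate sample Trimap into the sample overlay image. This is not limited in the embodiments of this application.
Illustratively, the computer device traverses the pixels of the candidate sample Trimap and the eroded sample Mask map in a preset order. For a pair consisting of pixel A in the candidate sample Trimap and pixel a in the eroded sample Mask map, if the value of pixel A in the candidate sample Trimap is 255 and the value of pixel a in the eroded sample Mask map is 255, the value of the corresponding pixel in the sample overlay image is determined to be 510.
Step 404b: for any pixel in the sample overlay image, in response to its pixel value being greater than the first pixel value, update the pixel value of the pixel in the sample overlay image to the first pixel value to obtain the target sample Trimap.
In a possible implementation, for each pixel whose value in the sample overlay image is greater than the first pixel value, the computer device updates the value to the first pixel value to obtain the target sample Trimap. That is, a pixel that belongs to the undetermined region in the candidate sample Trimap but to the foreground region in the eroded sample Mask map is assigned to the foreground region of the target sample Trimap, which improves the completeness and accuracy of the foreground region.
Illustratively, in the eroded sample Mask map and the candidate sample Trimap, the pixel value of the foreground region is 255, that of the undetermined region is 128, and that of the background region is 0; for a pixel whose value in the sample overlay image is 383, the computer device updates the value to 255.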
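Steps 404a and 404b amount to an addition followed by a clamp. The NumPy sketch below assumes the 255/128/0 example values; the function name `merge_foreground` and the sample arrays are hypothetical. One practical detail that the description does not mention: 8-bit images must be widened before the addition, otherwise 255 + 255 wraps around to 254 instead of producing 510.

```python
import numpy as np

FG = 255  # first pixel value (foreground) in the example

def merge_foreground(candidate: np.ndarray, eroded_mask: np.ndarray) -> np.ndarray:
    """Pixel-wise addition, then clamp everything above the foreground value."""
    # Widen to 16 bits first: uint8 + uint8 would overflow (255 + 255 -> 254).
    overlay = candidate.astype(np.uint16) + eroded_mask.astype(np.uint16)
    overlay[overlay > FG] = FG     # 510 and 383 both become 255
    return overlay.astype(np.uint8)

candidate = np.array([[255, 128, 128, 0]], dtype=np.uint8)
eroded_mask = np.array([[255, 255, 0, 0]], dtype=np.uint8)
target = merge_foreground(candidate, eroded_mask)
```

The second pixel (128 + 255 = 383) is promoted from undetermined to foreground, while the third pixel stays undetermined because the eroded Mask map does not cover it.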
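The optimization above can be implemented with array operations and functions from open-source libraries; it requires no neural network model and runs fast.
Step 405: input the target sample Trimap and the sample image into the matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, where the sample Alpha map contains a predicted transparency channel value for each pixel.
After optimizing the foreground region and undetermined region of the initial sample Trimap, the computer device inputs the resulting target sample Trimap and the sample image into the matting model. The matting model uses the foreground region and background region of the target sample Trimap to extract features from the corresponding regions of the sample image, learns the features of and differences between target content and other content, finely segments the undetermined region of the target sample Trimap based on the extracted features, and at the same time generates a predicted transparency channel value for each pixel.
Step 406: compute, for each pixel in the sample Alpha map, the Euclidean distance between the predicted transparency channel value and the sample transparency channel value.
In a possible implementation, the computer device uses the Euclidean distance to compute the matting loss. It first computes, for each pixel in the sample Alpha map, the Euclidean distance between the predicted transparency channel value and the sample transparency channel value, with the following formula: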
Figure PCTCN2021129913-appb-000001
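where α_Alpha is the predicted transparency channel value of a pixel, α_Label is the sample transparency channel value of the pixel, and ε is a constant used to correct the computed result.
Step 407: determine the matting loss of the matting model based on the Euclidean distances, where the matting loss is the sum of the Euclidean distances of all pixels.
In a possible implementation, the computer device takes the sum of the Euclidean distances of all pixels as the matting loss of the matting model. The formula is as follows: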
Figure PCTCN2021129913-appb-000002
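where N is the number of pixels in the sample Alpha map, α^i_Alpha (Figure PCTCN2021129913-appb-000003) is the predicted transparency channel value of the i-th pixel, and α^i_Label (Figure PCTCN2021129913-appb-000004) is the sample transparency channel value of the i-th pixel.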
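The two formula figures referenced above (appb-000001 and appb-000002) did not survive extraction. Going only by the surrounding definitions (a predicted value, a sample value, a correcting constant ε, and a sum over N pixels), one plausible reconstruction is the ε-smoothed distance commonly used for alpha prediction losses. The exact placement of ε and the symbol names d_i and L_matting are assumptions of this sketch, not a reading of the original figures:

```latex
d_i = \sqrt{\left(\alpha^{i}_{Alpha} - \alpha^{i}_{Label}\right)^{2} + \epsilon^{2}},
\qquad
L_{matting} = \sum_{i=1}^{N} d_i
```

With a small ε, d_i is close to the absolute difference between the two values while remaining differentiable where they coincide.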
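Step 408: train the matting model by backpropagation with a gradient descent algorithm, based on the matting loss.
Based on the computed matting loss, the computer device calculates its gradient with respect to each model parameter to determine the direction in which the loss function converges, and corrects the model parameters accordingly; after multiple training iterations, the accuracy of the matting model increases.
Optionally, when the number of training iterations reaches a count threshold, or when the matting loss is less than a preset value, the matting loss is determined to have converged and model training is complete.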
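The loss, its gradient, and the two stopping conditions can be illustrated with a toy NumPy loop. This is a sketch under clear assumptions: the per-pixel distance uses an ε-smoothed form that the extracted text does not confirm, the names `matting_loss` and `loss_grad` are hypothetical, and gradient descent is applied directly to the predicted values, standing in for the update of real network parameters by backpropagation.

```python
import numpy as np

def matting_loss(pred: np.ndarray, label: np.ndarray, eps: float = 1e-6) -> float:
    """Sum over pixels of the eps-smoothed distance (assumed form of the loss)."""
    return float(np.sqrt((pred - label) ** 2 + eps ** 2).sum())

def loss_grad(pred: np.ndarray, label: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Gradient of matting_loss with respect to each predicted value."""
    diff = pred - label
    return diff / np.sqrt(diff ** 2 + eps ** 2)

label = np.array([1.0, 0.5, 0.0])   # annotated transparency values
pred = np.array([0.6, 0.9, 0.3])    # stand-in for the model output
lr, max_steps, loss_threshold = 0.05, 100, 0.2

initial_loss = matting_loss(pred, label)
for step in range(max_steps):                     # stop at the iteration threshold ...
    if matting_loss(pred, label) < loss_threshold:
        break                                     # ... or once the loss is small enough
    pred = pred - lr * loss_grad(pred, label)     # gradient descent update
final_loss = matting_loss(pred, label)
```

In a real setup the gradient would flow through the matting network to its weights; the loop structure and the two stopping conditions are the part that mirrors the text.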
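In this embodiment, the foreground region of the initial sample Mask map is used to optimize the undetermined region of the initial sample Trimap, and the foreground region of the eroded sample Mask map is used to optimize the foreground region of the candidate sample Trimap, so that the undetermined region and foreground region of the target sample Trimap cover the region corresponding to the target content as far as possible, which improves the matting precision and efficiency of the matting model. The optimization uses array operations and an erosion function from open-source libraries, so no neural network model needs to be built and the computation is fast. The matting loss is computed from the Euclidean distance between the sample transparency channel value and the predicted transparency channel value of each pixel, and the matting model is trained by backpropagation based on that loss, which improves its accuracy and matting quality.
The embodiments above describe the training process of the matting model. After training is complete, the computer device uses the matting model to perform matting on a target image and obtain the corresponding Alpha map. FIG. 5 shows a flowchart of a matting method provided by an exemplary embodiment of this application. This embodiment is described using an example in which the method is applied to a computer device. The method includes:
Step 501: input a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, where the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region.
The computer device inputs the target image into the first image segmentation model to obtain the initial Mask map, and inputs the target image into the second image segmentation model to obtain the initial Trimap. Illustratively, FIG. 6 shows the initial Mask map obtained after a target image is processed by the first image segmentation model, which contains a foreground region 601 and a background region 602; FIG. 7 shows the initial Trimap obtained after a target image is processed by the second image segmentation model, which contains a foreground region 701, a background region 702, and an undetermined region 703.
Optionally, the computer device first inputs the target image into the first image segmentation model to obtain the initial Mask map and then inputs it into the second image segmentation model to obtain the initial Trimap; or it first inputs the target image into the second image segmentation model to obtain the initial Trimap and then into the first image segmentation model to obtain the initial Mask map; or the computer device copies the target image and inputs the two copies into the first and second image segmentation models simultaneously. This is not limited in the embodiments of this application.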
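Step 502: optimize the initial Trimap using the initial Mask map to obtain a target Trimap.
As FIG. 6 and FIG. 7 clearly show, the initial Mask map contains only a foreground region and a background region, with no undetermined region. The first image segmentation model assigns every part that might be target content (that is, the image content to be extracted) to the foreground region, so the foreground region of the initial Mask map contains fairly complete target content plus some background content. The initial Trimap contains an undetermined region in addition to the foreground and background regions. The second image segmentation model assigns only the parts that are very likely target content to the foreground region and assigns uncertain parts to the undetermined region, so the foreground region of the initial Trimap usually cannot contain the complete target content, the undetermined region contains both target content and background content, and the foreground and undetermined regions together may still not fully cover the target content. Therefore, in a possible implementation, the computer device obtains the target Trimap by optimizing the initial Trimap with the initial Mask map.
Step 503: input the target Trimap and the target image into the matting model to obtain an Alpha map corresponding to the target image, where the Alpha map contains a transparency channel value for each pixel.
Using the matting model, the computer device extracts features from the target image based on the foreground region and background region of the target Trimap, further segments the undetermined region based on the features of each region, and at the same time sets a transparency channel value for each pixel of the target Trimap, so that the edge region of the target content is processed in finer detail.
In summary, in this embodiment, the initial Mask map is used to optimize the initial Trimap to obtain the target Trimap, so that the matting model only needs to finely segment the undetermined region based on the foreground region and background region of the target Trimap. This reduces the computation required of the matting model, provides it in advance with a highly accurate segmentation reference, prevents it from matting on the basis of a wrong segmentation result, and improves its matting efficiency and precision.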
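FIG. 8 shows a flowchart of a matting method provided by another exemplary embodiment of this application. This embodiment is described using an example in which the method is applied to a computer device. The method includes:
Step 801: input a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, where the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region.
For the specific implementation of step 801, refer to step 501 above; details are not repeated here.
Step 802: optimize the undetermined region of the initial Trimap using the initial Mask map to obtain a candidate Trimap.
To make it easier for the matting model to distinguish and learn the different regions, and for the computer device to optimize the initial Trimap, different regions of the initial Mask map and the initial Trimap have different pixel values. Pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, where the first pixel value is greater than the second pixel value and the second pixel value is greater than the third pixel value.
Illustratively, the pixel value of the foreground region is 255, the pixel value of the background region is 0, and the pixel value of the undetermined region is 128.
In a possible implementation, the computer device performs the optimization based on the pixel values of the pixels in the initial Mask map and the initial Trimap. Step 802 includes the following steps:
Step 802a: determine the correspondence between pixels in the initial Trimap and pixels in the initial Mask map.
The initial Trimap and the initial Mask map are obtained from the same target image, so their pixels correspond one to one. The computer device treats pixels at the same relative position in the initial Trimap and the initial Mask map as corresponding pixels. For example, the computer device determines the correspondence based on the coordinates of each pixel in the two maps: two pixels with the same coordinates form a pair, and the pixel values in the initial Trimap are corrected based on the difference between the pixel values within each pair, thereby optimizing the initial Trimap.
Step 802b: for any pixel in the initial Trimap, in response to the pixel value of the pixel in the initial Trimap being the third pixel value and the pixel value of the pixel in the initial Mask map being the first pixel value, update the pixel value of the pixel in the initial Trimap to the second pixel value to obtain the candidate Trimap.
In a possible implementation, the computer device assigns pixels whose value is the third pixel value in the initial Trimap but the first pixel value in the initial Mask map to the undetermined region of the candidate Trimap. In other words, the part that the first image segmentation model assigns to the foreground region but the second image segmentation model assigns to the background region is designated as undetermined region. This enlarges the undetermined region of the initial Trimap so that the foreground region and undetermined region cover the target content as far as possible, which prevents part of the target content in the target Trimap from being assigned to the background region and leaving the extracted target content incomplete.
Specifically, the computer device uses NumPy array operations to generate the candidate Trimap.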
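Step 803: optimize the foreground region of the candidate Trimap using an eroded Mask map to obtain the target Trimap, where the eroded Mask map is obtained by eroding the initial Mask map.
In a possible implementation, the computer device erodes the initial Mask map using the image erosion function in OpenCV to obtain the eroded Mask map, whose foreground region is smaller than the foreground region of the initial Mask map.
Illustratively, FIG. 9 shows the process of generating an eroded Mask map from an initial Mask map. The initial Mask map 901 is eroded to obtain the eroded Mask map 902; as FIG. 9 clearly shows, the foreground region of the eroded Mask map 902 is smaller than that of the initial Mask map 901.
Step 803a: perform pixel-wise superposition of the eroded Mask map and the candidate Trimap to obtain an overlay image.
Illustratively, the computer device traverses the pixels of the candidate Trimap and the eroded Mask map in a preset order. For a pair consisting of pixel A in the candidate Trimap and pixel a in the eroded Mask map, if the value of pixel A in the candidate Trimap is 255 and the value of pixel a in the eroded Mask map is 255, the value of the corresponding pixel in the overlay image is determined to be 510.
Step 803b: for any pixel in the overlay image, in response to its pixel value being greater than the first pixel value, update the pixel value of the pixel in the overlay image to the first pixel value to obtain the target Trimap.
For each pixel whose value in the overlay image is greater than the first pixel value, the computer device updates the value to the first pixel value to obtain the target Trimap. That is, a pixel that belongs to the undetermined region in the candidate Trimap but to the foreground region in the eroded Mask map is assigned to the foreground region of the target Trimap, which improves the completeness of the foreground region.
Illustratively, FIG. 10 shows the process of optimizing the foreground region of a candidate Trimap with an eroded Mask map to obtain the target Trimap. The computer device superimposes the eroded Mask map 902 and the candidate Trimap 1001 pixel by pixel and updates the pixel values to obtain the target Trimap 1002. As FIG. 10 clearly shows, the foreground region of the target Trimap 1002 is more complete than that of the candidate Trimap 1001 and closer to the actual portrait region.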
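Step 804: input the target Trimap and the target image into the matting model.
After optimizing the foreground region and undetermined region of the initial Trimap, the computer device inputs the resulting target Trimap and the target image into the matting model and uses the matting model to perform fine matting on the target Trimap.
Step 805: use the matting model to extract features from the parts of the target image that correspond to the foreground region and background region of the target Trimap, to obtain image features.
In a possible implementation, the computer device uses the matting model to learn the image features of the foreground region and background region of the target Trimap and further segments the undetermined region based on the learned image features.
Step 806: based on the image features, use the matting model to perform image segmentation on the undetermined region of the target Trimap and to perform transparency processing on the target Trimap, to obtain the Alpha map.
The matting model uses the foreground region and background region of the target Trimap to extract features from the corresponding regions of the target image, learns the image features of and differences between target content and other content, finely segments the undetermined region of the target Trimap based on the extracted image features, and at the same time generates a transparency channel value for each pixel.
Illustratively, FIG. 11 shows the Alpha map that the matting model generates from a target Trimap. The target Trimap 1002 contains a foreground region, a background region, and an undetermined region, whereas the corresponding Alpha map 1101 contains only a foreground region and a background region. As FIG. 11 shows, the edge regions of the target Trimap 1002 are relatively smooth, while the portrait edges in the Alpha map 1101 are finely segmented.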
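Once an Alpha map is available, a typical use is background replacement, one of the applications named in the Background section. The source does not spell out how the Alpha map is applied, so the NumPy sketch below assumes the standard compositing rule, output = α · image + (1 − α) · background; the function name `replace_background` and the tiny test arrays are hypothetical.

```python
import numpy as np

def replace_background(image: np.ndarray, alpha: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Blend an H x W x 3 image over a new background using an H x W alpha map in [0, 1]."""
    a = alpha.astype(np.float32)[..., None]   # H x W x 1, broadcasts over the color channels
    out = a * image.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

image = np.full((1, 3, 3), 200, dtype=np.uint8)        # 1 x 3 RGB image, all channels 200
background = np.full((1, 3, 3), 100, dtype=np.uint8)   # replacement background, all channels 100
alpha = np.array([[1.0, 0.5, 0.0]], dtype=np.float32)  # opaque, half transparent, transparent
composite = replace_background(image, alpha, background)
```

The half-transparent pixel ends up midway between the two sources, which is how finely segmented edges such as hair blend into a new background.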
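In this embodiment, the foreground region of the initial Mask map is used to optimize the undetermined region of the initial Trimap, and the foreground region of the eroded Mask map is used to optimize the foreground region of the candidate Trimap, so that the undetermined region and foreground region of the target Trimap cover the region corresponding to the target content as far as possible, which improves the matting precision and efficiency of the matting model.
Combining the embodiments above, in an illustrative example, the flow of the matting method is shown in FIG. 12.
Step 1201: obtain a target image.
Step 1202: input the target image into the first image segmentation model.
Step 1203: generate the initial Mask map.
Step 1204: input the target image into the second image segmentation model.
Step 1205: generate the initial Trimap.
Step 1206: optimize the initial Trimap using the initial Mask map to generate the target Trimap.
Step 1207: input the target Trimap and the target image into the matting model to generate the Alpha map.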
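The data flow of steps 1201 to 1207 can be sketched as a small orchestration function. Everything here is a hypothetical placeholder: the function `run_matting`, its parameter names, and the string-returning stand-ins for the three models and the refinement step exist only to show which output feeds which input.

```python
from typing import Any, Callable

def run_matting(image: Any,
                mask_model: Callable[[Any], Any],
                trimap_model: Callable[[Any], Any],
                refine_trimap: Callable[[Any, Any], Any],
                matting_model: Callable[[Any, Any], Any]) -> Any:
    """Wire the stages together in the order of FIG. 12."""
    initial_mask = mask_model(image)                              # steps 1202-1203
    initial_trimap = trimap_model(image)                          # steps 1204-1205
    target_trimap = refine_trimap(initial_mask, initial_trimap)   # step 1206
    return matting_model(target_trimap, image)                    # step 1207

# Stand-in callables that just record how they were called.
result = run_matting(
    "img",
    mask_model=lambda im: f"mask({im})",
    trimap_model=lambda im: f"trimap({im})",
    refine_trimap=lambda m, t: f"refine({m},{t})",
    matting_model=lambda t, im: f"alpha({t},{im})",
)
```

Because the two segmentation calls depend only on the input image, they could run in either order or in parallel, as the description of step 501 allows.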
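Combining the embodiments above, in an illustrative example, the flow of generating the target Trimap is shown in FIG. 13.
Step 1301: copy the initial Trimap to obtain a copied Trimap.
Step 1302: optimize the undetermined region of the copied Trimap using the initial Mask map to obtain the candidate Trimap.
Step 1303: erode the initial Mask map to obtain the eroded Mask map.
Step 1304: superimpose the eroded Mask map on the candidate Trimap to obtain the target Trimap.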
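FIG. 14 shows a structural block diagram of a matting model training apparatus provided by an exemplary embodiment of this application. The apparatus can be implemented as all or part of a terminal through software, hardware, or a combination of the two. The apparatus includes:
a first input module 1401, configured to input a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample mask (Mask) map and an initial sample three-level segmentation (Trimap) map corresponding to the sample image, where the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region;
a first optimization module 1402, configured to optimize the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap;
a second input module 1403, configured to input the target sample Trimap and the sample image into a matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, where the sample Alpha map contains a predicted transparency channel value for each pixel;
a training module 1404, configured to train the matting model based on the sample Alpha map and an annotated Alpha map corresponding to the sample image, where the annotated Alpha map is annotated with a sample transparency channel value for each pixel.
Optionally, the first optimization module 1402 includes:
a first optimization unit, configured to optimize the undetermined region of the initial sample Trimap using the initial sample Mask map to obtain a candidate sample Trimap;
a second optimization unit, configured to optimize the foreground region of the candidate sample Trimap using an eroded sample Mask map to obtain the target sample Trimap, where the eroded sample Mask map is obtained by eroding the initial sample Mask map.
Optionally, pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, where the first pixel value is greater than the second pixel value and the second pixel value is greater than the third pixel value;
the first optimization unit is further configured to:
determine the correspondence between pixels in the initial sample Trimap and pixels in the initial sample Mask map;
for any pixel in the initial sample Trimap, in response to the pixel value of the pixel in the initial sample Trimap being the third pixel value and the pixel value of the pixel in the initial sample Mask map being the first pixel value, update the pixel value of the pixel in the initial sample Trimap to the second pixel value to obtain the candidate sample Trimap.
Optionally, the second optimization unit is further configured to:
perform pixel-wise superposition of the eroded sample Mask map and the candidate sample Trimap to obtain a sample overlay image;
for any pixel in the sample overlay image, in response to its pixel value being greater than the first pixel value, update the pixel value of the pixel in the sample overlay image to the first pixel value to obtain the target sample Trimap.
Optionally, the training module 1404 includes:
a computing unit, configured to compute, for each pixel in the sample Alpha map, the Euclidean distance between the predicted transparency channel value and the sample transparency channel value;
a determining unit, configured to determine the matting loss of the matting model based on the Euclidean distances, where the matting loss is the sum of the Euclidean distances of all pixels;
a training unit, configured to train the matting model by backpropagation with a gradient descent algorithm, based on the matting loss.
Optionally, the apparatus further includes:
a first image processing unit, configured to erode the initial sample Mask map using the image erosion function in the open-source computer vision library OpenCV to obtain the eroded sample Mask map, where the foreground region of the eroded sample Mask map is smaller than the foreground region of the initial sample Mask map.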
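FIG. 15 shows a structural block diagram of a matting apparatus provided by an exemplary embodiment of this application. The apparatus can be implemented as all or part of a terminal through software, hardware, or a combination of the two. The apparatus includes:
a third input module 1501, configured to input a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, where the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region;
a second optimization module 1502, configured to optimize the initial Trimap using the initial Mask map to obtain a target Trimap;
a fourth input module 1503, configured to input the target Trimap and the target image into a matting model to obtain an Alpha map corresponding to the target image, where the Alpha map contains a transparency channel value for each pixel.
Optionally, the second optimization module 1502 includes:
a third optimization unit, configured to optimize the undetermined region of the initial Trimap using the initial Mask map to obtain a candidate Trimap;
a fourth optimization unit, configured to optimize the foreground region of the candidate Trimap using an eroded Mask map to obtain the target Trimap, where the eroded Mask map is obtained by eroding the initial Mask map.
Optionally, pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, where the first pixel value is greater than the second pixel value and the second pixel value is greater than the third pixel value;
the third optimization unit is further configured to:
determine the correspondence between pixels in the initial Trimap and pixels in the initial Mask map;
for any pixel in the initial Trimap, in response to the pixel value of the pixel in the initial Trimap being the third pixel value and the pixel value of the pixel in the initial Mask map being the first pixel value, update the pixel value of the pixel in the initial Trimap to the second pixel value to obtain the candidate Trimap.
Optionally, the fourth optimization unit is further configured to:
perform pixel-wise superposition of the eroded Mask map and the candidate Trimap to obtain an overlay image;
for any pixel in the overlay image, in response to its pixel value being greater than the first pixel value, update the pixel value of the pixel in the overlay image to the first pixel value to obtain the target Trimap.
Optionally, the fourth input module 1503 includes:
a feature extraction unit, configured to use the matting model to extract features from the parts of the target image that correspond to the foreground region and background region of the target Trimap, to obtain image features;
a second image processing unit, configured to, based on the image features, use the matting model to perform image segmentation on the undetermined region of the target Trimap and to perform transparency processing on the target Trimap, to obtain the Alpha map.
In summary, in the embodiments of this application, the initial sample Mask map generated by the first image segmentation model is used to optimize the initial sample Trimap generated by the second image segmentation model, so that the division into foreground, background, and undetermined regions in the resulting target sample Trimap is closer to the sample image and the segmentation accuracy of the Trimap improves. When the matting model is trained on the target sample Trimap, it can therefore adjust its parameters against a more accurate segmentation reference, which improves its matting precision and accuracy. In the model application phase, the initial Mask map is used to optimize the initial Trimap to obtain the target Trimap, so that the matting model only needs to finely segment the undetermined region based on the foreground region and background region of the target Trimap. This reduces the computation required of the matting model, provides it in advance with a highly accurate segmentation reference, prevents it from matting on the basis of a wrong segmentation result, and improves its matting efficiency and precision.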
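FIG. 16 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of this application. The computer device 1600 may be a single computer device or a cluster of computer devices. Specifically, the computer device 1600 includes a central processing unit (CPU) 1601, a system memory 1604 including a random access memory (RAM) 1602 and a read-only memory (ROM) 1603, and a system bus 1605 connecting the system memory 1604 and the central processing unit 1601. The computer device 1600 also includes a basic input/output (I/O) system 1606, which helps transfer information between components within the computer device 1600, and a mass storage device 1607 for storing an operating system 1613, application programs 1614, and other program modules 1615.
The basic input/output system 1606 includes a display 1608 for displaying information and an input device 1609, such as a mouse or keyboard, for the user to input information. The display 1608 and the input device 1609 are both connected to the central processing unit 1601 through an input/output controller 1610 connected to the system bus 1605. The basic input/output system 1606 may also include the input/output controller 1610 for receiving and processing input from a number of other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1610 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1607 is connected to the central processing unit 1601 through a mass storage controller (not shown) connected to the system bus 1605. The mass storage device 1607 and its associated computer-readable media provide non-volatile storage for the computer device 1600. That is, the mass storage device 1607 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM).
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, cassettes, magnetic tape, disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that computer storage media are not limited to the above. The system memory 1604 and the mass storage device 1607 described above may be collectively referred to as memory.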
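The memory stores one or more programs configured to be executed by the one or more central processing units 1601. The one or more programs contain instructions for implementing the matting model training method or matting method described above, and the central processing unit 1601 executes the one or more programs to implement the methods provided by the method embodiments above.
According to various embodiments of this application, the computer device 1600 may also operate while connected, through a network such as the Internet, to a remote computer on the network. That is, the computer device 1600 can be connected to the network 1612 through the network interface unit 1611 connected to the system bus 1605, or the network interface unit 1611 can be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs stored in the memory, and the one or more programs contain instructions for performing the steps executed by the computer device in the methods provided by the embodiments of this application.
An embodiment of this application further provides a computer-readable storage medium storing at least one instruction, which is loaded and executed by a processor to implement the matting model training method or the matting method described in the embodiments above.
According to one aspect of this application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the matting model training method or the matting method provided in the various optional implementations of the above aspects.
Those skilled in the art should appreciate that, in one or more of the examples above, the functions described in the embodiments of this application can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions can be stored in a computer-readable storage medium or transmitted as one or more instructions or code on a computer-readable storage medium. Computer-readable storage media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium can be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above are merely optional embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.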

Claims (20)

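  1. A matting model training method, the method comprising:
    inputting a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample mask (Mask) map and an initial sample three-level segmentation (Trimap) map corresponding to the sample image, wherein the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region;
    optimizing the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap;
    inputting the target sample Trimap and the sample image into a matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, wherein the sample Alpha map contains a predicted transparency channel value for each pixel; and
    training the matting model based on the sample Alpha map and an annotated Alpha map corresponding to the sample image, wherein the annotated Alpha map is annotated with a sample transparency channel value for each pixel.
  2. The method according to claim 1, wherein optimizing the initial sample Trimap using the initial sample Mask map to obtain the target sample Trimap comprises:
    optimizing the undetermined region of the initial sample Trimap using the initial sample Mask map to obtain a candidate sample Trimap; and
    optimizing the foreground region of the candidate sample Trimap using an eroded sample Mask map to obtain the target sample Trimap, wherein the eroded sample Mask map is obtained by eroding the initial sample Mask map.
  3. The method according to claim 2, wherein pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, the first pixel value being greater than the second pixel value and the second pixel value being greater than the third pixel value;
    optimizing the undetermined region of the initial sample Trimap using the initial sample Mask map to obtain the candidate sample Trimap comprises:
    determining the correspondence between pixels in the initial sample Trimap and pixels in the initial sample Mask map; and
    for any pixel in the initial sample Trimap, in response to the pixel value of the pixel in the initial sample Trimap being the third pixel value and the pixel value of the pixel in the initial sample Mask map being the first pixel value, updating the pixel value of the pixel in the initial sample Trimap to the second pixel value to obtain the candidate sample Trimap.
  4. The method according to claim 3, wherein optimizing the foreground region of the candidate sample Trimap using the eroded sample Mask map to obtain the target sample Trimap comprises:
    performing pixel-wise superposition of the eroded sample Mask map and the candidate sample Trimap to obtain a sample overlay image; and
    for any pixel in the sample overlay image, in response to its pixel value being greater than the first pixel value, updating the pixel value of the pixel in the sample overlay image to the first pixel value to obtain the target sample Trimap.
  5. The method according to any one of claims 1 to 4, wherein training the matting model based on the sample Alpha map and the annotated Alpha map corresponding to the sample image comprises:
    computing, for each pixel in the sample Alpha map, the Euclidean distance between the predicted transparency channel value and the sample transparency channel value;
    determining the matting loss of the matting model based on the Euclidean distances, wherein the matting loss is the sum of the Euclidean distances of all the pixels; and
    training the image processing model by backpropagation with a gradient descent algorithm, based on the matting loss.
  6. The method according to any one of claims 2 to 4, wherein, before optimizing the foreground region of the candidate sample Trimap using the eroded sample Mask map to obtain the target sample Trimap, the method comprises:
    eroding the initial sample Mask map using the image erosion function in the open-source computer vision library OpenCV to obtain the eroded sample Mask map, wherein the foreground region of the eroded sample Mask map is smaller than the foreground region of the initial sample Mask map.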
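  7. A matting method, the method comprising:
    inputting a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, wherein the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region;
    optimizing the initial Trimap using the initial Mask map to obtain a target Trimap; and
    inputting the target Trimap and the target image into a matting model to obtain an Alpha map corresponding to the target image, wherein the Alpha map contains a transparency channel value for each pixel.
  8. The method according to claim 7, wherein optimizing the initial Trimap using the initial Mask map to obtain the target Trimap comprises:
    optimizing the undetermined region of the initial Trimap using the initial Mask map to obtain a candidate Trimap; and
    optimizing the foreground region of the candidate Trimap using an eroded Mask map to obtain the target Trimap, wherein the eroded Mask map is obtained by eroding the initial Mask map.
  9. The method according to claim 8, wherein pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, the first pixel value being greater than the second pixel value and the second pixel value being greater than the third pixel value;
    optimizing the undetermined region of the initial Trimap using the initial Mask map to obtain the candidate Trimap comprises:
    determining the correspondence between pixels in the initial Trimap and pixels in the initial Mask map; and
    for any pixel in the initial Trimap, in response to the pixel value of the pixel in the initial Trimap being the third pixel value and the pixel value of the pixel in the initial Mask map being the first pixel value, updating the pixel value of the pixel in the initial Trimap to the second pixel value to obtain the candidate Trimap.
  10. The method according to claim 9, wherein optimizing the foreground region of the candidate Trimap using the eroded Mask map to obtain the target Trimap comprises:
    performing pixel-wise superposition of the eroded Mask map and the candidate Trimap to obtain an overlay image; and
    for any pixel in the overlay image, in response to its pixel value being greater than the first pixel value, updating the pixel value of the pixel in the overlay image to the first pixel value to obtain the target Trimap.
  11. The method according to any one of claims 7 to 10, wherein inputting the target Trimap and the target image into the matting model to obtain the Alpha map corresponding to the target image comprises:
    inputting the target Trimap and the target image into the matting model;
    using the matting model to extract features from the parts of the target image that correspond to the foreground region and background region of the target Trimap, to obtain image features; and
    based on the image features, using the matting model to perform image segmentation on the undetermined region of the target Trimap and to perform transparency processing on the target Trimap, to obtain the Alpha map.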
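  12. A matting model training apparatus, the apparatus comprising:
    a first input module, configured to input a sample image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial sample mask (Mask) map and an initial sample three-level segmentation (Trimap) map corresponding to the sample image, wherein the initial sample Mask map is divided into a foreground region and a background region, and the initial sample Trimap is divided into a foreground region, a background region, and an undetermined region;
    a first optimization module, configured to optimize the initial sample Trimap using the initial sample Mask map to obtain a target sample Trimap;
    a second input module, configured to input the target sample Trimap and the sample image into a matting model to obtain a sample transparency channel (Alpha) map corresponding to the sample image, wherein the sample Alpha map contains a predicted transparency channel value for each pixel; and
    a training module, configured to train the image processing model based on the sample Alpha map and an annotated Alpha map corresponding to the sample image, wherein the annotated Alpha map is annotated with a sample transparency channel value for each pixel.
  13. The apparatus according to claim 12, wherein the first optimization module comprises:
    a first optimization unit, configured to optimize the undetermined region of the initial sample Trimap using the initial sample Mask map to obtain a candidate sample Trimap; and
    a second optimization unit, configured to optimize the foreground region of the candidate sample Trimap using an eroded sample Mask map to obtain the target sample Trimap, wherein the eroded sample Mask map is obtained by eroding the initial sample Mask map.
  14. The apparatus according to claim 13, wherein pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, the first pixel value being greater than the second pixel value and the second pixel value being greater than the third pixel value;
    the first optimization unit is further configured to:
    determine the correspondence between pixels in the initial sample Trimap and pixels in the initial sample Mask map; and
    for any pixel in the initial sample Trimap, in response to the pixel value of the pixel in the initial sample Trimap being the third pixel value and the pixel value of the pixel in the initial sample Mask map being the first pixel value, update the pixel value of the pixel in the initial sample Trimap to the second pixel value to obtain the candidate sample Trimap.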
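  15. A matting apparatus, wherein the apparatus comprises:
    a third input module, configured to input a target image into a first image segmentation model and a second image segmentation model to obtain, respectively, an initial Mask map and an initial Trimap corresponding to the target image, wherein the initial Mask map is divided into a foreground region and a background region, and the initial Trimap is divided into a foreground region, a background region, and an undetermined region;
    a second optimization module, configured to optimize the initial Trimap using the initial Mask map to obtain a target Trimap; and
    a fourth input module, configured to input the target Trimap and the target image into a matting model to obtain an Alpha map corresponding to the target image, wherein the Alpha map contains a transparency channel value for each pixel.
  16. The apparatus according to claim 15, wherein the second optimization module comprises:
    a third optimization unit, configured to optimize the undetermined region of the initial Trimap using the initial Mask map to obtain a candidate Trimap; and
    a fourth optimization unit, configured to optimize the foreground region of the candidate Trimap using an eroded Mask map to obtain the target Trimap, wherein the eroded Mask map is obtained by eroding the initial Mask map.
  17. The apparatus according to claim 16, wherein pixels in the foreground region correspond to a first pixel value, pixels in the undetermined region correspond to a second pixel value, and pixels in the background region correspond to a third pixel value, the first pixel value being greater than the second pixel value and the second pixel value being greater than the third pixel value;
    the third optimization unit is further configured to:
    determine the correspondence between pixels in the initial Trimap and pixels in the initial Mask map; and
    for any pixel in the initial Trimap, in response to the pixel value of the pixel in the initial Trimap being the third pixel value and the pixel value of the pixel in the initial Mask map being the first pixel value, update the pixel value of the pixel in the initial Trimap to the second pixel value to obtain the candidate Trimap.
  18. A computer device, wherein the computer device comprises a processor and a memory; the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the matting model training method according to any one of claims 1 to 6 or the matting method according to any one of claims 7 to 11.
  19. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one piece of program code, which is loaded and executed by a processor to implement the matting model training method according to any one of claims 1 to 6 or the matting method according to any one of claims 7 to 11.
  20. A computer program product or computer program, wherein the computer program product or computer program comprises computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the matting model training method according to any one of claims 1 to 6 or implements the matting method according to any one of claims 7 to 11.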
PCT/CN2021/129913 2020-12-18 2021-11-10 Matting model training method, matting method, apparatus, device, and storage medium WO2022127454A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011504662.4 2020-12-18
CN202011504662.4A CN112541927A (zh) 2020-12-18 2020-12-18 Matting model training method, matting method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022127454A1 true WO2022127454A1 (zh) 2022-06-23

Family

ID=75019079

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129913 WO2022127454A1 (zh) 2020-12-18 2021-11-10 Matting model training method, matting method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN112541927A (zh)
WO (1) WO2022127454A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
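CN112541927A (zh) * 2020-12-18 2021-03-23 Oppo广东移动通信有限公司 Matting model training method, matting method, apparatus, device, and storage medium
CN113592074B (zh) * 2021-07-28 2023-12-12 北京世纪好未来教育科技有限公司 Training method, generation method and apparatus, and electronic device
CN114299075A (zh) * 2021-09-08 2022-04-08 上海哔哩哔哩科技有限公司 Image segmentation and pixel-pair selection method, apparatus, device, and storage medium
CN113657403B (zh) * 2021-10-18 2022-02-25 北京市商汤科技开发有限公司 Image processing method and training method for image processing network
CN114155260A (zh) * 2021-12-08 2022-03-08 广州绿怡信息科技有限公司 Appearance image matting model training method and matting method for recycling inspection
CN114399454B (zh) * 2022-01-18 2024-10-18 平安科技(深圳)有限公司 Image processing method and apparatus, electronic device, and storage medium
CN114298947A (zh) * 2022-01-22 2022-04-08 奥比中光科技集团股份有限公司 Method for generating a trimap and method for constructing a neural network for generating a trimap
CN116563304B (zh) * 2022-01-28 2025-03-11 腾讯科技(深圳)有限公司 Image processing method and apparatus, and training method and apparatus for image processing model
CN115082724B (zh) * 2022-03-30 2024-11-12 Oppo广东移动通信有限公司 Model processing method and apparatus, storage medium, and electronic device
CN114820666B (zh) * 2022-04-29 2024-07-23 深圳万兴软件有限公司 Method and apparatus for increasing matting precision, computer device, and storage medium
CN114926491B (zh) * 2022-05-11 2024-07-09 北京字节跳动网络技术有限公司 Matting method and apparatus, electronic device, and storage medium
WO2023230936A1 (zh) * 2022-05-31 2023-12-07 北京小米移动软件有限公司 Training method for image segmentation model, and image segmentation method and apparatus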

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
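CN109461167A (zh) * 2018-11-02 2019-03-12 Oppo广东移动通信有限公司 Training method for image processing model, matting method, apparatus, medium, and terminal
CN111784726A (zh) * 2019-09-25 2020-10-16 北京沃东天骏信息技术有限公司 Portrait matting method and apparatus
CN111179285A (zh) * 2019-12-31 2020-05-19 珠海方图智能科技有限公司 Image processing method, system, and storage medium
CN111275729A (zh) * 2020-01-17 2020-06-12 新华智云科技有限公司 Method and system for fine segmentation of sky regions, and method and system for sky replacement in images
CN112541927A (zh) * 2020-12-18 2021-03-23 Oppo广东移动通信有限公司 Matting model training method, matting method, apparatus, device, and storage medium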

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
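CN115496776A (zh) * 2022-09-13 2022-12-20 北京百度网讯科技有限公司 Matting method, matting model training method and apparatus, device, and medium
CN115223016A (zh) * 2022-09-19 2022-10-21 苏州万店掌网络科技有限公司 Sample annotation method, apparatus, device, and medium
CN116167922A (zh) * 2023-04-24 2023-05-26 广州趣丸网络科技有限公司 Matting method, apparatus, storage medium, and computer device
CN116433696A (zh) * 2023-06-14 2023-07-14 荣耀终端有限公司 Matting method, electronic device, and computer-readable storage medium
CN116433696B (zh) * 2023-06-14 2023-10-20 荣耀终端有限公司 Matting method, electronic device, and computer-readable storage medium
CN118524258A (zh) * 2024-07-25 2024-08-20 浙江嗨皮网络科技有限公司 Offline video background processing method, system, and readable storage medium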

Also Published As

Publication number Publication date
CN112541927A (zh) 2021-03-23

Similar Documents

Publication Publication Date Title
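WO2022127454A1 (zh) Matting model training method, matting method, apparatus, device, and storage medium
CN112069874B (zh) Method, system, device, and storage medium for identifying cells in embryo light-microscope images
CN112639396B (zh) Dimension measurement apparatus, dimension measurement method, and semiconductor manufacturing system
WO2018108129A1 (zh) Method and apparatus for identifying object category, and electronic device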
US20080136820A1 (en) Progressive cut: interactive object segmentation
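CN113763340A (zh) Automatic grading method for ankylosing spondylitis based on multi-task deep learning
JP2006053919A (ja) Image data separation system and method
CN113312973B (zh) Method and system for extracting key-point features for gesture recognition
TWI701608B (zh) Neural network system, method, and apparatus for image matching and localization
KR102352942B1 (ko) Method and apparatus for inputting annotations of object boundary information
CN114897738A (zh) Blind image inpainting method based on semantic inconsistency detection
CN111931581A (zh) Agricultural pest identification method based on convolutional neural network, terminal, and readable storage medium
CN111179284A (zh) Interactive image segmentation method, system, and terminal
CN110310305A (zh) Target tracking method and apparatus based on BSSD detection and Kalman filtering
CN114155406B (zh) Pose estimation method based on region-level feature fusion
CN112734778B (zh) Neural-network-based vehicle matting method, system, device, and storage medium
CN116935418A (zh) Method, device, and system for automatic recombination of three-dimensional graphic-text templates
CN113361530A (zh) Accurate image semantic segmentation and optimization method using interactive means
CN114241202A (zh) Training method and apparatus for clothing classification model, and clothing classification method and apparatus
CN113780040B (zh) Method and apparatus for locating lip key points, storage medium, and electronic device
CN114494693A (zh) Method and apparatus for semantic segmentation of images
CN118429863A (zh) Video instance segmentation method with prior-distance-guided similarity memory matching
CN113869320B (zh) Template-based key-value pair extraction method and system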
Song et al. Bi-directional seed attention network for interactive image segmentation
CN114241481B (zh) Text detection method, apparatus, and computer device based on text skeleton

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21905383

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21905383

Country of ref document: EP

Kind code of ref document: A1