WO2023230927A1 - Image processing method and device, and readable storage medium - Google Patents

Image processing method and device, and readable storage medium

Info

Publication number
WO2023230927A1
Authority
WO
WIPO (PCT)
Prior art keywords
segmentation
matting
training
image
round
Prior art date
Application number
PCT/CN2022/096483
Other languages
English (en)
Chinese (zh)
Inventor
陈凌颖
张亚森
苏海军
倪鹏程
Original Assignee
北京小米移动软件有限公司
北京小米松果电子有限公司
Priority date
Filing date
Publication date
Application filed by 北京小米移动软件有限公司, 北京小米松果电子有限公司
Priority to CN202280004202.6A (publication CN117501309A)
Priority to PCT/CN2022/096483
Publication of WO2023230927A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation

Definitions

  • the present disclosure relates to the field of image processing, and in particular, to an image processing method, device and readable storage medium.
  • the present disclosure provides an image processing method, device and readable storage medium.
  • an image processing method, including:
  • the segmentation information used to segment the target image is determined through a target matting model.
  • the target matting model is obtained by alternately training a basic matting network based on the first sample segmentation image and the sample matting image.
  • the basic matting network is obtained by training the original matting network based on the second sample segmentation image; the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label, and the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label;
  • according to the segmentation information, the cutout target in the target image is determined.
  • an image processing device including:
  • the first acquisition module is configured to acquire the target image
  • the segmentation module is configured to determine segmentation information for segmenting the target image through a target matting model.
  • the target matting model is obtained by alternately training a basic matting network based on the first sample segmentation image and the sample matting image.
  • the basic matting network is obtained by training the original matting network based on the second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, and the sample matting image carries a second segmentation label, the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label;
  • the cutout target determination module is configured to determine the cutout target in the target image according to the segmentation information.
  • another image processing device including:
  • a memory used to store instructions executable by the processor;
  • the processor is configured to execute the steps of the image processing method provided by the first aspect of the embodiment of the present disclosure.
  • a computer-readable storage medium on which computer program instructions are stored.
  • when the program instructions are executed by a processor, the steps of the image processing method provided by the first aspect of the present disclosure are implemented.
  • more accurate segmentation information can be output simply by inputting the target image into the target matting model, so that the matting target in the target image can be determined more accurately based on the segmentation information; that is,
  • the target matting model only takes the target image as input, without the need to input additional auxiliary images such as trimaps or background images, thus saving a lot of time and effort in preparing auxiliary images.
  • the original matting network is first trained with the second sample segmentation images to obtain the basic matting network. Since the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label, the robustness of the basic matting network's predictions can be improved, as can the accuracy of the basic matting network in locating the matting target. Then, the basic matting network is alternately trained with the first sample segmentation images and the sample matting images to obtain the target matting model.
  • training the basic matting network on the sample matting images improves the fineness of its segmentation, and using the supervision information from training the basic matting network on the first sample segmentation images to assist the training on the sample matting images improves the accuracy of locating the matting target while improving the accuracy of the segmentation information for the matting target, and also speeds up model training.
  • FIG. 1 is a flowchart of an image processing method according to an exemplary embodiment.
  • FIG. 2 is a flow chart of an image background blurring method according to an exemplary embodiment.
  • FIG. 3 is a schematic diagram illustrating the effect of image background blur according to an exemplary embodiment.
  • Figure 4 is a flow chart of an image background replacement method according to an exemplary embodiment.
  • Figure 5 is a schematic diagram illustrating the effect of image background replacement according to an exemplary embodiment.
  • Figure 6 is a schematic structural diagram of an original matting network according to an exemplary embodiment.
  • FIG. 7 is a schematic flowchart of an image processing method according to an exemplary embodiment.
  • Figure 8 is a schematic structural diagram of an original matting network according to an exemplary embodiment.
  • Figure 9 is a schematic structural diagram of a basic matting network according to an exemplary embodiment.
  • Figure 10 is a flow chart of single-task training according to an exemplary embodiment.
  • Figure 11 is a flow chart of dual-task training according to an exemplary embodiment.
  • Figure 12 is a flowchart of a method for obtaining the total loss of fine segmentation according to an exemplary embodiment.
  • Figure 13 is a flowchart of a method for obtaining semantic segmentation loss according to an exemplary embodiment.
  • Figure 14 is a flowchart of a method for obtaining semantic segmentation loss according to an exemplary embodiment.
  • Figure 15 is a flowchart of a method for obtaining target fine segmentation loss according to an exemplary embodiment.
  • Figure 16 is a flowchart of a method for obtaining target fine segmentation loss according to an exemplary embodiment.
  • FIG. 17 is a schematic diagram illustrating color migration of a foreground image according to an exemplary embodiment.
  • Figure 18 is a schematic diagram of a sampling point pair according to an exemplary embodiment.
  • Figure 19 is a structural block diagram of an image processing device according to an exemplary embodiment.
  • Figure 20 is a structural block diagram of an image processing device according to an exemplary embodiment.
  • first, second, etc. are used to describe various information, but such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other and do not imply a specific order or importance. In fact, expressions such as “first” and “second” can be used interchangeably.
  • for example, a first message frame may also be called a second message frame, and similarly, a second message frame may also be called a first message frame.
  • there are three main types of matting algorithms based on deep learning: matting algorithms based on trimaps, matting algorithms based on background images, and matting algorithms without additional input.
  • the matting algorithm based on a trimap requires a trimap as an additional input to guide the mask matting area; therefore, manual work or additional models are needed to provide trimap annotations or predictions.
  • the background image-based cutout algorithm needs to provide a background image input without foreground, thus implicitly providing foreground selection information and improving the accuracy of cutout.
  • for matting algorithms without additional input, there are many misclassifications during matting; for example, content belonging to the background is mistakenly classified as foreground, or content belonging to the foreground is mistakenly classified as background, resulting in poor matting accuracy.
  • embodiments of the present disclosure provide an image processing method, device and readable storage medium. The following first introduces the application environment of the embodiment of the present disclosure.
  • An image processing method provided by an embodiment of the present disclosure can be applied to terminal devices, such as mobile phones or cameras and other terminal devices with a shooting function, or terminal devices with image processing functions.
  • the user can obtain the target image and background image by shooting, or from other devices.
  • the user selects the target image and clicks it, and the output interface of the terminal device can display functional controls such as background blur or background replacement.
  • when the user clicks the functional control corresponding to background blur, the terminal device is triggered to execute the image processing method provided by the embodiment of the present disclosure to obtain the cutout target, and then the terminal device can continue to run the background blur algorithm on the target image based on the obtained cutout target to obtain a background-blurred image. Or, when the user clicks the functional control corresponding to background replacement, the mobile phone is triggered to display the option of selecting a background image; after the user selects the background image, the image processing method in the present disclosure is executed, the cutout target is obtained, and, based on the obtained cutout target combined with the background image, the background replacement algorithm is run to obtain the image after background replacement.
  • Figure 1 is a flow chart of an image processing method according to an exemplary embodiment. As shown in Figure 1, the image processing method includes:
  • the target image is an image to be processed, that is, the image that needs to be cut out in this disclosure.
  • the image can be an RGB image, that is, an optical three-primary-color image, where R represents red, G represents green, and B represents blue; the target image includes a cutout target.
  • the cutout target can be a portrait, an animal image, or an image of any other object.
  • S102 Determine the segmentation information for segmenting the target image through a target matting model.
  • the target matting model is obtained by alternately training a basic matting network based on the first sample segmentation image and the sample matting image.
  • the basic matting network is obtained by training the original matting network based on the second sample segmentation image.
  • the first sample segmentation image and the second sample segmentation image carry a first segmentation label
  • the sample cutout image carries a second segmentation label.
  • the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label.
  • the segmentation information can be obtained by directly inputting the target image into the target matting model.
  • the target images can be two images with the same content but different resolutions.
  • the target matting model can be a matting model for a specific type of matting target to improve the accuracy of segmentation information.
  • the target matting model can be a matting model for portraits or a matting model for cats, etc.
  • the target cutouts are diverse and are not specifically limited here.
  • the first sample segmentation image and the second sample segmentation image may be the same or different; the first segmentation label and the second segmentation label are labels with different segmentation granularities, and the segmentation granularity of the first segmentation label is greater than that of the second segmentation label.
  • the segmentation granularity of a segmentation label is inversely related to the fineness with which the image is segmented: the larger the granularity, the coarser the segmentation.
  • the first segmentation label may be a two-category label, and the value of the cutout target may be marked as 1, and the value of the area outside the cutout target in the target image may be marked as 0.
  • the second segmentation label can be a multi-category label (more refined than the two-category label).
  • the value inside the cutout target can be marked as 1, a transition area can be set at the edge of the cutout target, and the values in the transition area gradually transition from 1 to 0; the label values can be 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, etc.
  • the original matting network is first trained with the second sample segmentation images to obtain the basic matting network. Since the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label, the robustness of the basic matting network's predictions can be improved, as can the accuracy of the basic matting network in locating the matting target. Then, the basic matting network is alternately trained with the first sample segmentation images and the sample matting images to obtain the target matting model.
  • training the basic matting network on the sample matting images improves the fineness of its segmentation, and using the supervision information from training the basic matting network on the first sample segmentation images to assist the training on the sample matting images improves the accuracy of locating the matting target while improving the accuracy of the segmentation information for the matting target, and also speeds up model training.
  • the cutout target in the target image can be determined based on the predicted value in the segmentation information.
  • for the cutout target, the predicted value is non-zero, that is, 1 or a value between 0 and 1 (excluding 0); for areas outside the cutout target, the predicted value is 0, so the area whose predicted values are non-zero can be determined as the cutout target.
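  • For illustration only, a minimal sketch of this selection step is shown below, assuming the model outputs a single-channel alpha map in [0, 1]; the function and variable names are not from the patent.

```python
import numpy as np

def extract_cutout_target(segmentation_info: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Keep only the pixels whose predicted value is non-zero (the cutout target).

    segmentation_info: HxW float array in [0, 1] output by the target matting model.
    image:             HxWx3 source image.
    """
    alpha = segmentation_info[..., None]            # HxWx1, broadcast over the channels
    mask = (alpha > 0).astype(image.dtype)          # non-zero predictions belong to the target
    return image * mask                             # areas with predicted value 0 are zeroed out
```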
  • the original matting network is first trained with the second sample segmentation images to obtain the basic matting network. Since the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label, the prediction robustness of the basic matting network can be improved, as can its accuracy in locating the matting target, so that the interior of the resulting matting target is more complete and free of holes. Then, the basic matting network is alternately trained with the first sample segmentation images and the sample matting images to obtain the target matting model.
  • training the basic matting network on the sample matting images improves the fineness of its segmentation, and using the supervision information from training the basic matting network on the first sample segmentation images to assist the training on the sample matting images improves the accuracy of locating the matting target while improving the accuracy of the segmentation information for the matting target, and also speeds up model training.
  • more accurate segmentation information can be output using only the target image as input, without inputting additional auxiliary images, so as to obtain a more accurate matting target.
  • Figure 2 is a flow chart of an image background blurring method according to an exemplary embodiment. As shown in Figure 2, the method includes:
  • S202 Determine the segmentation information used to segment the target image through a target matting model.
  • the target matting model is obtained by alternately training a basic matting network based on the first sample segmentation image and the sample matting image.
  • the basic matting network is obtained by training the original matting network based on the second sample segmentation image.
  • the first sample segmentation image and the second sample segmentation image carry a first segmentation label
  • the sample cutout image carries a second segmentation label.
  • the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label.
  • steps S201 to S203 For explanations of steps S201 to S203, reference may be made to the explanations of steps S101 to S103 above, which will not be described again here.
  • the area outside the cutout target in the target image can be blurred.
  • the blur processing method can be box filtering, normalized box filtering, Gaussian filtering, etc.
  • the above method can be used to blur the background in camera shooting scenes, with portraits as the cutout target, so that the areas other than the portrait in the resulting image are blurred; this makes the portrait clearer and more prominent and avoids interference from the background.
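  • As an illustration of the blur-and-composite step described above (not the patent's exact implementation), the following sketch uses OpenCV Gaussian filtering and the segmentation information as a blending mask; the names and the kernel size are assumptions.

```python
import cv2
import numpy as np

def blur_background(image: np.ndarray, alpha: np.ndarray, ksize: int = 31) -> np.ndarray:
    """Blur everything outside the cutout target.

    image: HxWx3 uint8 target image.
    alpha: HxW float segmentation information in [0, 1] (1 = cutout target).
    """
    blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)   # box filtering would also work
    a = alpha[..., None].astype(np.float32)
    # Keep the target sharp and replace the rest with the blurred version.
    out = a * image.astype(np.float32) + (1.0 - a) * blurred.astype(np.float32)
    return out.astype(np.uint8)
```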
  • Figure 3 is a schematic diagram of an image background blurring effect according to an exemplary embodiment.
  • the terminal device can be a mobile phone.
  • the user uses the mobile phone camera to take a photo, that is, the target image; the user can then click on the photo, triggering the mobile phone to display functional controls such as background blur.
  • when the user clicks the background blur control, the image processing method is executed to obtain the cutout target, and blurring algorithm processing then continues on the target image to obtain an image with a blurred background.
  • Figure 4 is a flow chart of an image background replacement method according to an exemplary embodiment. As shown in Figure 4, the method includes:
  • S402. Determine the segmentation information for segmenting the target image through a target matting model.
  • the target matting model is obtained by alternately training a basic matting network based on the first sample segmentation image and the sample matting image.
  • the basic matting network is obtained by training the original matting network based on the second sample segmentation image.
  • the first sample segmentation image and the second sample segmentation image carry a first segmentation label
  • the sample cutout image carries a second segmentation label.
  • the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label.
  • steps S401 to S403 For explanations of steps S401 to S403, reference may be made to the explanations of steps S101 to S103 above, which will not be described again here.
  • the target background image is an image that requires background replacement of the target image, and the target background image can be any image.
  • the cutout target in the target image can be cutout.
  • the cutout target in the target image is separated and saved as a new image, that is, the cutout target image; alternatively, the area outside the cutout target in the target image is directly made transparent to obtain the cutout target image.
  • S406 Perform synthesis processing on the cutout target image and the target background image to obtain a background replacement image.
  • the cutout target image can be synthesized onto the target background image, wherein the area on the target background image that coincides with the cutout target is covered by the cutout target, thereby obtaining a background replacement image.
  • the synthesis can follow the standard alpha compositing formula C = α·F + (1 - α)·B, where α is the segmentation information output by the model, F is the target image, and B is the target background image.
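  • A minimal sketch of this synthesis, assuming standard alpha compositing with the segmentation information as alpha; the background resizing and all names are illustrative, not taken from the patent.

```python
import cv2
import numpy as np

def replace_background(alpha: np.ndarray, target: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Composite the cutout target onto a new background: C = alpha*F + (1 - alpha)*B."""
    background = cv2.resize(background, (target.shape[1], target.shape[0]))
    a = alpha[..., None].astype(np.float32)
    composite = a * target.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
    return composite.astype(np.uint8)
```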
  • Figure 5 is a schematic diagram of the effect of image background replacement according to an exemplary embodiment.
  • the terminal device can be a mobile phone.
  • the user uses the mobile phone camera to take a photo, that is, the target image; the user can then click on the photo, triggering the mobile phone to display functional controls such as background replacement. The user clicks the functional control corresponding to background replacement, triggering the mobile phone to display the option of selecting a background image.
  • after the user selects the background image, the image processing method in this disclosure is triggered, the cutout target is obtained, and, based on the obtained cutout target combined with the background image, the background replacement algorithm processing continues, thereby obtaining the image after background replacement.
  • Figure 6 is a schematic structural diagram of an original matting network according to an exemplary embodiment. As shown in Figure 6, an exemplary embodiment of the present disclosure also provides an original matting network, which is used for The target matting model in the above image processing method is trained.
  • the original matting network includes a feature extraction module, a dilated convolution pooling module, an upsampling module and a multiple upsampling module.
  • the overall network structure of the original cutout network belongs to the Encoder-Decoder model, and the Encoder part is a feature extraction module.
  • the feature extraction module is used to extract the features of the target image.
  • the dilated convolution pooling module can be an ASPP (Atrous Spatial Pyramid Pooling) module;
  • the output of the dilated convolution pooling module is then gradually upsampled by a multi-level Decoder part (the upsampling module), and finally a multiple upsampling module is used to obtain high-resolution features and output a high-resolution prediction result, that is, more accurate segmentation information.
  • the above original matting network uses a multiple upsampling module in place of the deep guided filtering module, which solves the problem in the related art that video-oriented matting models are difficult to apply to mobile terminals; it removes the additional modules intended for video, and the feature extraction module uses a network structure that is more lightweight and better suited to quantized models, thus addressing the problems of long runtime and high power consumption on the mobile terminal and making the network more suitable for mobile deployment.
  • Figure 7 is a schematic flowchart of an image processing method according to an exemplary embodiment. As shown in Figure 7, based on the target matting model trained by the original matting network, the image processing method includes:
  • S702 Extract features from the target image through the feature extraction module to obtain original feature vectors.
  • S705 Use a multiple upsampling module to upsample the fine segmentation result and the target image to obtain the segmentation information.
  • S706 Determine the cutout target in the target image according to the segmentation information.
  • the feature extraction module can use a more lightweight network structure suitable for quantitative models.
  • the feature extraction module includes multiple convolutional layers, for example 5 convolutional layers, and the downsampling ratio of the last convolutional layer is modified so that the final output feature size of the feature extraction module is 1/16 of the input, which maintains the size of the feature map while keeping the receptive field consistent with the original network.
  • the feature extraction module sequentially extracts feature sizes of 1/2, 1/4, 1/8, 1/16 and 1/16 of the input.
  • the original feature vector can be obtained by extracting features from the target image through the feature extraction module.
  • the dilated convolution pooling module can be an ASPP (Atrous Spatial Pyramid Pooling) module;
  • context extraction is performed on the original feature vector through the dilated convolution pooling module to obtain a context feature vector.
  • depthwise separable convolutions can be used instead of ordinary convolutions in the ASPP module.
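  • As a sketch of what such a replacement can look like in PyTorch (the channel sizes and normalization choices are assumptions, not the patent's exact configuration):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.

    Can replace an ordinary 3x3 convolution inside the ASPP branches to cut
    parameters and computation; the dilation is applied to the depthwise part.
    """
    def __init__(self, in_ch: int, out_ch: int, dilation: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```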
  • the context feature vector is then upsampled by the Decoder module, i.e., the upsampling module;
  • the upsampling module consists of a sequence of upsampling convolution modules of successively increasing resolution;
  • the number of upsampling convolution modules is the same as the number of convolutional layers in the feature extraction module, and the input of each upsampling convolution module is formed from the output of the previous module (i.e., the context feature vector output by the dilated convolution pooling module, or the fine segmentation result output by the previous module) together with the corresponding output of the feature extraction module (i.e., the original feature vector output by the corresponding convolutional layer in the feature extraction module).
  • the target image includes two sub-images of different sizes, for example, one sub-image is an image of size 1024, and the other sub-image is an image of size 512.
  • the smaller sub-image is input into the feature extraction module, and finally the fine segmentation result is output.
  • the 1024 size means that the width and height of the image are both 1024 pixels
  • the 512 size means that the width and height of the image are both 512 pixels.
  • a larger sub-image of the target image and the preliminary prediction result (i.e., the fine segmentation result) are input into the multiple upsampling module, specifically a 2x upsampling convolution module, to obtain the final segmentation information.
  • the multiple upsampling module consists of convolutional layers and upsampling, making it more convenient for mobile terminal deployment.
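  • An illustrative PyTorch sketch of such a multiple upsampling module, which fuses the larger sub-image with the preliminary fine segmentation result; the channel widths and the use of bilinear interpolation are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultipleUpsamplingModule(nn.Module):
    """Illustrative 2x upsampling head: fuses the full-resolution sub-image with the
    low-resolution fine segmentation result and outputs the final segmentation information."""
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + 1, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, large_image: torch.Tensor, fine_result: torch.Tensor) -> torch.Tensor:
        # Bring the preliminary prediction up to the resolution of the larger sub-image (2x).
        up = F.interpolate(fine_result, size=large_image.shape[-2:],
                           mode="bilinear", align_corners=False)
        x = torch.cat([large_image, up], dim=1)
        return torch.sigmoid(self.fuse(x))        # segmentation information in [0, 1]
```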
  • FIG 8 is a schematic structural diagram of an original matting network according to an exemplary embodiment.
  • the original matting network includes an upsampling module composed of multiple upsampling convolution modules.
  • Each upsampling convolution module is connected to a Segmentation prediction head (coarse segmentation prediction head), and each coarse segmentation prediction head is used to output the coarse segmentation result after upsampling by the corresponding upsampling convolution module.
  • each upsampling convolution module can output a corresponding coarse segmentation result, so that multiple coarse segmentation results can be obtained and a more accurate semantic segmentation loss can be calculated, which better optimizes the model parameters, accelerates model convergence, and improves model training speed.
  • FIG 9 is a schematic structural diagram of a basic matting network according to an exemplary embodiment.
  • the basic matting network includes an upsampling module composed of multiple upsampling convolution modules.
  • each upsampling convolution module is connected to a Segmentation prediction head (coarse segmentation prediction head) and an Alpha prediction head (fine segmentation prediction head).
  • each coarse segmentation prediction head is used to output the coarse segmentation result after upsampling by the corresponding upsampling convolution module;
  • each fine segmentation prediction head is used to output the upsampled fine segmentation result of the corresponding upsampling convolution module.
  • each upsampling convolution module can output a corresponding coarse segmentation result and fine segmentation result in each round of training, so that multiple coarse segmentation results and multiple fine segmentation results are obtained and more accurate semantic segmentation loss and total fine segmentation loss can be calculated, thereby better optimizing model parameters, accelerating model convergence, and improving model training speed.
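  • An illustrative sketch of one decoder stage with the two prediction heads described above; the exact layer configuration is not specified in the patent, so the channel sizes and activations here are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsamplingBlockWithHeads(nn.Module):
    """One decoder stage with a coarse segmentation head and a fine (alpha) head,
    as described for the basic matting network; channel sizes are illustrative."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.coarse_head = nn.Conv2d(out_ch, 1, kernel_size=1)  # coarse segmentation prediction head
        self.alpha_head = nn.Conv2d(out_ch, 1, kernel_size=1)   # fine segmentation (alpha) prediction head

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        feat = self.conv(torch.cat([x, skip], dim=1))
        coarse = torch.sigmoid(self.coarse_head(feat))
        alpha = torch.sigmoid(self.alpha_head(feat))
        return feat, coarse, alpha
```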
  • An exemplary embodiment of the present disclosure also provides a training method for a target matting model.
  • the trained target matting model is used to implement the image processing method in any of the above embodiments.
  • the training method for the target matting model may include two parts: sample data acquisition and model training.
  • the sample data acquisition is used to obtain the first sample segmentation image, the second sample segmentation image and the sample matting image.
  • the sample data includes a semantic segmentation data set and a Matting (matting) data set.
  • the first sample segmentation image and the second sample segmentation image can be images in the semantic segmentation data set
  • the sample matting image is an image in the Matting data set.
  • both the semantic segmentation data set and the Matting data set include self-collected data and public data sets.
  • the self-collected semantic segmentation data comprises about 70,000 images, plus the public Dark Complexion Portrait Segmentation Dataset.
  • the self-collected Matting data set contains about 3,700 high-precision annotations, plus public data sets.
  • data preprocessing, that is, data augmentation, can also be performed on the collected sample data.
  • the input size of the semantic segmentation data is 512; data preprocessing for the semantic segmentation data includes random scaling, horizontal flipping, rotation, and color dithering.
  • the input size of the Matting data is 1024, and it is downsampled to 512 before being input to the original matting network; Matting data preprocessing includes affine transformation, rotation, flipping, color dithering, and random noise or sharpening.
  • for the Matting data, i.e., the sample matting images, background replacement can also be performed on the existing Matting data:
  • the background of the original sample matting image can be replaced, and the foreground image in the Matting data is combined with a new background image to obtain a new sample matting image.
  • the foreground image can be color migrated to make the fusion of the foreground image and the background image more natural and realistic.
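  • The patent does not specify the color migration method; one common choice consistent with this description is Reinhard-style mean/standard-deviation matching in LAB color space, sketched below:

```python
import cv2
import numpy as np

def transfer_color(foreground: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Shift the foreground's color statistics toward the new background (Reinhard-style
    mean/std matching in LAB space) so the composited image looks more natural."""
    fg = cv2.cvtColor(foreground, cv2.COLOR_BGR2LAB).astype(np.float32)
    bg = cv2.cvtColor(background, cv2.COLOR_BGR2LAB).astype(np.float32)
    fg_mean, fg_std = fg.mean(axis=(0, 1)), fg.std(axis=(0, 1)) + 1e-6
    bg_mean, bg_std = bg.mean(axis=(0, 1)), bg.std(axis=(0, 1))
    out = (fg - fg_mean) / fg_std * bg_std + bg_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```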
  • the first sample segmented image and the second sample segmented image may be the same or different.
  • the first sample segmentation image and the second sample segmentation image carry the first segmentation label
  • the sample cutout image carries the second segmentation label
  • the first segmentation label and the second segmentation label are labels with different segmentation granularities
  • the first segmentation label The segmentation granularity is greater than the segmentation granularity of the second segmentation label.
  • the first segmentation label can be a two-category label
  • the value of the cutout target can be marked as 1
  • the value of the area outside the cutout target in the target image can be marked as 0.
  • the second segmentation label can be a multi-category label.
  • the value inside the cutout target can be marked as 1, a transition area is set at the edge of the cutout target, and the marked value in the transition area gradually transitions from 1 to 0.
  • the marked values can be 0.9, 0.8, 0.7, 0.6, 0.5, 0.4 , 0.3, 0.2 and 0.1, etc., for example, the closer to the interior of the cutout target, the closer its label value is to 1, and the label value of the part of the target image except the interior area and transition area of the cutout target is 0.
  • model training can be performed.
  • Single-task training is semantic segmentation training, specifically training the original matting network to obtain a basic matting network
  • dual-task training is semantic segmentation training alternating with matting training, specifically training the basic matting network to obtain the target matting model.
  • Figure 10 is a flow chart of single-task training according to an exemplary embodiment. As shown in Figure 10, it includes:
  • multiple second sample segmented images can be divided into training sets, test sets and verification sets, and multiple rounds of segmentation training are performed on the original matting network.
  • the specific structure of the original matting network can be referred to the above explanation, and will not be repeated here.
  • each upsampling convolution module included in the upsampling module is connected to a coarse segmentation prediction head, and each coarse segmentation prediction head is used to output the coarse segmentation result after upsampling by the corresponding upsampling convolution module.
  • the semantic segmentation loss corresponding to this round of segmentation training is calculated.
  • the preset threshold of the change rate here can be 0.1, or the number of iterations of single-task training can also be set.
  • the training can be stopped and the basic matting network is obtained.
  • the initial learning rate can be 0.0001
  • the optimizer can be the RMSprop optimizer, and the learning rate is reduced by a factor of 0.9 every 8 epochs.
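  • A sketch of this single-task schedule in PyTorch; the model interface, the data loader, and the use of a plain BCE loss per coarse head (rather than the focal plus ranking loss described later) are assumptions made only to keep the sketch short:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed to exist elsewhere: original_matting_net (returns one coarse result per head),
# seg_loader (second sample segmentation images with binary first segmentation labels), num_epochs.
optimizer = torch.optim.RMSprop(original_matting_net.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.9)  # lr *= 0.9 every 8 epochs
bce = nn.BCELoss()

for epoch in range(num_epochs):
    for image, label in seg_loader:                     # label: N x 1 x H x W binary label
        coarse_results = original_matting_net(image)    # list of coarse segmentation results
        loss = 0.0
        for coarse in coarse_results:
            target = F.interpolate(label, size=coarse.shape[-2:], mode="nearest")
            loss = loss + bce(coarse, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```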
  • the robustness of the image segmentation of the obtained basic matting network can be improved, the matting target can be accurately located, and holes inside the matting target can be avoided.
  • Figure 11 is a flow chart of dual-task training according to an exemplary embodiment. As shown in Figure 11, it includes:
  • the basic matting network obtained from single-task training continues to be trained.
  • the basic matting network is subjected to multiple rounds of alternating iterative segmentation training and matting training; that is, in each round of training, segmentation training is performed first, and then matting training is performed.
  • segmentation training please refer to the above content and will not be repeated here.
  • the basic matting network includes an upsampling module composed of multiple upsampling convolution modules.
  • Each upsampling convolution module is connected to a coarse segmentation prediction head and a fine segmentation prediction head.
  • each coarse segmentation prediction head is used to output the coarse segmentation result after upsampling by the corresponding upsampling convolution module;
  • each fine segmentation prediction head is used to output the fine segmentation result after upsampling by the corresponding upsampling convolution module. Therefore, during each cutout training, multiple coarse segmentation results, multiple fine segmentation results will be output through the coarse segmentation prediction head and the fine segmentation prediction head, and the final segmentation information will be output.
  • a coarse segmentation prediction head connected to each upsampling convolution module outputs a coarse segmentation result
  • a fine segmentation prediction head connected to each upsampling convolution module outputs a fine segmentation result.
  • the total fine segmentation loss corresponding to the current round of matting training can be obtained based on the multiple coarse segmentation results, the multiple fine segmentation results, and the segmentation information output by this round of matting training, together with the second segmentation labels carried by the sample matting images used in this round of matting training;
  • the basic matting network is then optimized based on the total fine segmentation loss corresponding to this round of matting training.
  • the optimized basic matting network continues to be trained in this way until it converges; that is, when the calculated semantic segmentation loss and total fine segmentation loss no longer change, or their rate of change is less than a preset threshold, the training can be stopped.
  • the preset threshold of the change rate here can be 0.1, or the number of alternating iterations of dual-task training can also be set.
  • the training can be stopped and the target matting model can be obtained.
  • the initial learning rate can be 0.00001
  • the optimizer can be the RMSprop optimizer, and the learning rate is reduced by a factor of 0.9 every 4 epochs until it reaches 0.000001, after which it no longer changes.
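  • A sketch of the alternating dual-task schedule; the model interface, the loaders, and the loss helper functions are assumed placeholders, not the patent's code:

```python
import torch

# Assumed to exist elsewhere: basic_matting_net (returns coarse results, fine results, segmentation
# information), seg_loader, matting_loader, num_epochs, semantic_segmentation_loss,
# target_fine_segmentation_loss.
optimizer = torch.optim.RMSprop(basic_matting_net.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.9)  # toward ~1e-6

for epoch in range(num_epochs):
    for (seg_img, seg_label), (mat_img, mat_label) in zip(seg_loader, matting_loader):
        # 1) segmentation training step: semantic segmentation loss only
        coarse_results, _, _ = basic_matting_net(seg_img)
        loss_seg = semantic_segmentation_loss(coarse_results, seg_label)
        optimizer.zero_grad()
        loss_seg.backward()
        optimizer.step()

        # 2) matting training step: semantic segmentation loss (on the binarized label)
        #    plus the target fine segmentation loss
        coarse_results, fine_results, seg_info = basic_matting_net(mat_img)
        binary_label = (mat_label >= 0.1).float()        # binarized second segmentation label
        loss_mat = (semantic_segmentation_loss(coarse_results, binary_label)
                    + target_fine_segmentation_loss(fine_results, seg_info, mat_label))
        optimizer.zero_grad()
        loss_mat.backward()
        optimizer.step()
    scheduler.step()
```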
  • using the supervision information from training the basic matting network on the first sample segmentation images to assist the training on the sample matting images can improve the accuracy of locating the matting target while improving the accuracy of the segmentation information for the matting target, and can also speed up model training.
  • more accurate segmentation information can be output using only the target image as input, without inputting additional auxiliary images, so as to obtain a more accurate matting target.
  • Figure 12 is a flow chart of a method for obtaining the total loss of fine segmentation according to an exemplary embodiment. As shown in Figure 12, the method includes:
  • the second segmentation label can be binarized according to the preset segmentation value.
  • the preset segmentation value can be 0.1; that is, all values greater than or equal to 0.1 in the second segmentation label are replaced with 1, and all values less than 0.1 are replaced with 0, thereby obtaining a binary segmentation label containing only 1 and 0.
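  • This binarization is a one-liner; a sketch assuming the label is a floating-point array in [0, 1]:

```python
import numpy as np

def binarize_second_label(alpha_label: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Turn the multi-valued second segmentation label into a binary segmentation label:
    values >= threshold become 1, values < threshold become 0."""
    return (alpha_label >= threshold).astype(np.float32)
```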
  • the semantic segmentation loss corresponding to this round of matting training can be calculated based on the multiple rough segmentation results and the binary segmentation labels output by this round of matting training.
  • the fine segmentation results and segmentation information contain values between 0 and 1, and include 0 and 1. Therefore, the second segmentation label can be directly used to calculate the target fine segmentation loss.
  • the total fine segmentation loss corresponding to this round of matting training can be obtained by weighting the sum of the semantic segmentation loss corresponding to this round of matting training and the target fine segmentation loss corresponding to this round of matting training.
  • the matting training outputs not only multiple coarse segmentation results but also multiple fine segmentation results and one piece of segmentation information, so the calculated total fine segmentation loss includes both the semantic segmentation loss and the target fine segmentation loss; this allows the parameters of each upsampling convolution module to be adjusted more precisely and speeds up model training.
  • Figure 13 is a flow chart of a method for obtaining semantic segmentation loss according to an exemplary embodiment. The method is used to calculate the segmentation loss corresponding to segmentation training. As shown in Figure 13, it includes:
  • This round of segmentation training may be the training of the original matting network using the second sample segmentation image.
  • in that case, the first segmentation label corresponding to this round of segmentation training may be the first segmentation label corresponding to the second sample segmentation image used in this round; alternatively, this round of segmentation training can be the training of the basic matting network using the first sample segmentation image, in which case the first segmentation label corresponding to this round of segmentation training is the first segmentation label corresponding to the first sample segmentation image used in this round.
  • each coarse segmentation result and the first segmentation label corresponding to this round of segmentation training can be used to calculate a single first semantic segmentation sub-loss, and the final first semantic segmentation sub-loss can be calculated from the multiple single first semantic segmentation sub-losses;
  • specifically, the multiple single first semantic segmentation sub-losses can be weighted and summed to obtain the final first semantic segmentation sub-loss.
  • the single first semantic segmentation sub-loss can be the focal loss; a standard focal-loss form consistent with the definitions below is:
  • L_f = -α·(1 - m_p)^γ·m_g·log(m_p) - (1 - α)·m_p^γ·(1 - m_g)·log(1 - m_p)
  • where m_g is the real label value corresponding to the first segmentation label, m_p is the corresponding value in the output coarse segmentation result, α is a parameter used to control the weight that samples have on the loss, γ is a parameter used to adjust the attention paid to samples that are difficult to classify, and L_f is the single first semantic segmentation sub-loss.
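  • A PyTorch sketch of this focal loss; the default values of α and γ are the commonly used ones and are not specified by the patent:

```python
import torch

def focal_loss(m_p: torch.Tensor, m_g: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-6) -> torch.Tensor:
    """Binary focal loss between a coarse segmentation result m_p and the binary label m_g.

    alpha weights positive vs. negative samples, gamma focuses training on hard samples."""
    m_p = m_p.clamp(eps, 1.0 - eps)
    pos = -alpha * (1.0 - m_p) ** gamma * m_g * torch.log(m_p)
    neg = -(1.0 - alpha) * m_p ** gamma * (1.0 - m_g) * torch.log(1.0 - m_p)
    return (pos + neg).mean()
```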
  • the sampling point pair is the predicted value corresponding to the two points obtained from the coarse segmentation result, and multiple pairs of sampling point pairs are obtained from multiple coarse segmentation results.
  • the pair can be obtained as follows: the predicted values of two points at the edge of the cutout target are collected from the output coarse segmentation result as a sampling point pair, or the predicted values of two points inside the cutout target are collected as a sampling point pair, or the predicted values of two points outside the cutout target are collected as a sampling point pair, so that the prediction accuracy at the edge of the matting target can be optimized in a targeted manner.
  • a single second semantic segmentation sub-loss can be calculated for each pair of sampling points and the first segmentation label corresponding to this round of segmentation training.
  • the final second semantic segmentation sub-loss can be calculated based on the multiple single second semantic segmentation sub-losses; specifically, the multiple single second semantic segmentation sub-losses can be weighted and summed to obtain the final second semantic segmentation sub-loss.
  • the single second semantic segmentation sub-loss can be calculated using a ranking loss over the sampling point pair.
  • the first semantic segmentation sub-loss can be directly determined as the semantic segmentation loss corresponding to this round of segmentation training, or the second semantic segmentation sub-loss can be directly determined as the semantic segmentation loss corresponding to this round of segmentation training, or the first semantic segmentation sub-loss and the second semantic segmentation sub-loss can be weighted and summed to obtain the semantic segmentation loss corresponding to this round of segmentation training.
  • Figure 14 is a flow chart of a method for obtaining semantic segmentation loss according to an exemplary embodiment. The method is used to calculate the semantic segmentation loss in matting training. As shown in Figure 14, it includes:
  • Figure 15 is a flow chart of a method for obtaining target fine segmentation loss according to an exemplary embodiment. As shown in Figure 15, the method includes:
  • the above-mentioned target fine segmentation loss may include a first fine segmentation sub-loss; one round of training outputs only one piece of segmentation information but may output multiple fine segmentation results.
  • the segmentation information and the second segmentation label can be used to calculate an average absolute sub-error.
  • a fine segmentation result can be calculated with the second segmentation label to obtain an average absolute sub error.
  • Multiple fine segmentation results can correspond to multiple average absolute sub errors.
  • the average absolute sub-error obtained from the segmentation information and the multiple average absolute sub-errors corresponding to the multiple fine segmentation results are weighted and summed to obtain the average absolute error, and the average absolute error is then determined as the first fine segmentation sub-loss corresponding to the current round of matting training.
  • the calculation formula of the average absolute sub-error can be:
  • E = |α_p - α_g|, averaged over all pixels
  • where α_p is the segmentation information or the fine segmentation result, α_g is the corresponding second segmentation label, and |·| denotes the absolute value.
  • the total fine segmentation loss corresponding to this round of matting training can be obtained.
  • the above target fine segmentation loss also includes at least one of the following: a second fine segmentation sub-loss, a third fine segmentation sub-loss, and a fourth fine segmentation sub-loss.
  • the second fine segmentation sub-loss can be obtained by calculating the multi-scale Laplacian loss between the segmentation information and the multiple fine segmentation results output by this round of matting training on the one hand and the second segmentation label on the other;
  • the third fine segmentation sub-loss can be obtained by calculating the gradient loss between the segmentation information and the multiple fine segmentation results output by this round of matting training on the one hand and the second segmentation label on the other;
  • the fourth fine segmentation sub-loss can be obtained by calculating the synthesis loss between multiple predicted synthetic images and the label synthetic image, where the multiple predicted synthetic images are images synthesized with the background image based on the multiple fine segmentation results and the segmentation information obtained from the current round of matting training, and the label synthetic image is a composite of the cutout target obtained according to the second segmentation label and the background image.
  • the target fine segmentation loss includes multiple fine segmentation sub-losses
  • the multiple fine segmentation sub-losses can be weighted and summed to obtain the target fine segmentation loss corresponding to this round of matting training.
  • Figure 16 is a flow chart of a method for obtaining target fine segmentation loss according to an exemplary embodiment. As shown in Figure 16, the method includes:
  • S1605. Calculate the synthesis loss between multiple predicted synthetic images and the label synthetic image respectively, and obtain the fourth fine segmentation sub-loss corresponding to this round of matting training.
  • the multiple predicted synthetic images are images synthesized with the background image based on the multiple fine segmentation results and the segmentation information output by this round of matting training.
  • the label composite image is an image synthesized from the cutout targets obtained according to the second segmentation label and the background image.
  • one segmentation information and the second segmentation label can be calculated to obtain a multi-scale Laplacian loss, and a fine segmentation result can be calculated with the second segmentation label to obtain a multi-scale Laplacian loss.
  • that is, multiple multi-scale Laplacian losses can be obtained corresponding to the multiple fine segmentation results.
  • the multi-scale Laplacian loss obtained from the segmentation information and the multi-scale Laplacian losses corresponding to the multiple fine segmentation results are weighted and summed to obtain the second fine segmentation sub-loss.
  • the calculation formula of the multi-scale Laplacian loss can take the commonly used form:
  • L_lap = Σ_{s=1..5} 2^(s-1) · ||f_s(α_p) - f_s(α_g)||_1
  • where α_p is the segmentation information or the fine segmentation result, α_g is the corresponding second segmentation label, and f_s(x) represents the s-th level of the Laplacian pyramid computed on x (x can be α_p or α_g); the segmentation information is sequentially downsampled and similarity is calculated at different scales, and L_lap is the multi-scale Laplacian loss.
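  • A sketch of the multi-scale Laplacian loss using OpenCV pyramid operations; the number of levels and the 2^(s-1) scale weights are the commonly used choices, not values stated in the patent:

```python
import cv2
import numpy as np

def laplacian_pyramid(alpha: np.ndarray, levels: int = 5):
    """Laplacian pyramid of a single-channel alpha map (f_s in the formula above)."""
    pyramid, current = [], alpha.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)          # band-pass detail at this scale
        current = down
    return pyramid

def multiscale_laplacian_loss(alpha_p: np.ndarray, alpha_g: np.ndarray, levels: int = 5) -> float:
    """Scale-weighted L1 distance between the Laplacian pyramid levels of prediction and label."""
    loss = 0.0
    for s, (fp, fg) in enumerate(zip(laplacian_pyramid(alpha_p, levels),
                                     laplacian_pyramid(alpha_g, levels))):
        loss += (2 ** s) * np.abs(fp - fg).mean()   # weight 2^(s-1) with s starting at 1
    return loss
```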
  • one segmentation information and the second segmentation label can be calculated to obtain a gradient loss
  • a fine segmentation result can be calculated with the second segmentation label to obtain a gradient loss
  • multiple fine segmentation results can correspond to multiple gradient losses.
  • the third fine segmentation sub-loss can be obtained by weighting the sum of the gradient loss obtained from the segmentation information and the gradient losses corresponding to multiple fine segmentation results.
  • the calculation formula of the gradient loss can be:
  • L_g = ||G(α_p) - G(α_g)||_1
  • where G(x) represents the Sobel operator applied to x (x can be α_p or α_g), α_p is the segmentation information or the fine segmentation result, α_g is the corresponding second segmentation label, and L_g is the gradient loss.
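  • A sketch of the gradient loss using the OpenCV Sobel operator; the kernel size and the use of the gradient magnitude are assumptions:

```python
import cv2
import numpy as np

def sobel_gradient(alpha: np.ndarray) -> np.ndarray:
    """Gradient magnitude of an alpha map computed with the Sobel operator (G(x) above)."""
    gx = cv2.Sobel(alpha.astype(np.float32), cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(alpha.astype(np.float32), cv2.CV_32F, 0, 1, ksize=3)
    return np.sqrt(gx ** 2 + gy ** 2)

def gradient_loss(alpha_p: np.ndarray, alpha_g: np.ndarray) -> float:
    """L1 distance between the Sobel gradients of prediction and label."""
    return float(np.abs(sobel_gradient(alpha_p) - sobel_gradient(alpha_g)).mean())
```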
  • the cutout target and the background image obtained from a fine segmentation result can be synthesized to obtain a predicted composite image
  • multiple predicted composite images can be obtained from each fine segmentation result
  • a segmentation information and the background image can be synthesized to obtain a predicted composite image.
  • a predicted synthetic image can be combined with a label synthetic image to obtain a synthetic loss.
  • multiple predicted synthetic images correspond to multiple synthesis losses, and the multiple synthesis losses can be weighted and summed to obtain the fourth fine segmentation sub-loss corresponding to this round of matting training.
  • the calculation formula of the synthesis loss can be:
  • L_C = ||c_p - c_g||_1
  • where c_p is the predicted synthetic image, c_g is the label synthetic image, and L_C is the synthesis loss.
  • the background image is a new background image randomly selected, and the new background image is different from the original background image of the cutout target.
  • multiple loss function calculation methods are designed for segmentation training and matting training, so that a more accurate loss function can be calculated, so that the segmentation information output by the trained target matting model can be more accurate.
  • the following takes the cutout target as a portrait as an example to provide a training method for a target cutout model for portraits, as follows:
  • the training of the target matting model can include two parts: sample data acquisition and model training.
  • sample data acquisition includes the acquisition of data sets and data preprocessing, where,
  • the data set is obtained as follows: the data set used is divided into two parts, the semantic segmentation data set (including multiple first sample segmentation images and second sample segmentation images carrying the first segmentation label) and the Matting data set (including multiple sample cutout image carrying the second segmentation label). It contains self-collected data and public datasets.
  • the self-collected semantic segmentation data comprises about 70,000 images, plus the public Dark Complexion Portrait Segmentation Dataset.
  • the self-collected Matting data set contains about 3,700 high-precision annotations, plus public data sets.
  • Data preprocessing is divided into two parts.
  • the input size of the portrait segmentation data is 512.
  • Data preprocessing includes random scaling, horizontal flipping, rotation and color dithering.
  • the input size of the portrait matting data is 1024, and it is downsampled to 512 before being input to the original matting network.
  • Matting data preprocessing includes affine transformation, rotation, flipping, color dithering and random noise or sharpening.
  • Figure 17 is a schematic diagram of color migration of a foreground image according to an exemplary embodiment. As shown in Figure 17, when augmenting the Matting data, the difference in color-space distribution between the background image and the foreground image (the image in the Matting data) is usually large.
  • therefore, the foreground image is color-migrated with a certain probability to make the fusion of the foreground image and the background image more natural and realistic.
  • the Alpha annotation in the image is the label of the foreground image
  • the preprocessed image is Matting data.
  • the image in the Matting data is the image after background replacement and color migration.
  • Alpha is the label corresponding to the image in the Matting data after background replacement and color migration.
  • the entire training process is divided into two stages, the semantic segmentation stage and the dual-task training stage.
  • the semantic segmentation stage only uses portrait segmentation data, and the prediction heads at all levels only use segmentation heads.
  • the initial learning rate is 0.0001
  • the optimizer is RMSprop (Root Mean Square Propagation), and every 8 epochs (an epoch: the data is input into the network and one forward pass and backpropagation are completed) the learning rate is reduced by a factor of 0.9, that is, the learning rate is adjusted to 0.9 times its original value.
  • the second stage is the joint training stage of semantic segmentation and matting training, using portrait segmentation data and portrait matting data, and semantic segmentation training and matting training are performed alternately.
  • this alternating training strategy plays a crucial role in maintaining the semantically robust performance of the model when the amount of matting data is small and the network has no additional trimap or background image input.
  • Only the semantic segmentation loss is calculated when using portrait segmentation data.
  • the labels of the matting data are binarized according to the threshold of 0.1, and the semantic segmentation loss and matting loss (total fine segmentation loss) are calculated at the same time.
  • the initial learning rate at this stage is 0.00001, and the learning rate is reduced with a momentum of 0.9 every 4 epochs until it remains unchanged at 0.000001.
  • the loss function consists of two parts, semantic segmentation loss and matting loss.
  • the semantic segmentation loss includes a focal loss (focus loss) and a loss function for portrait segmentation based on ranking loss (ranking loss).
  • Matting losses include L1 loss, multi-scale Laplacian loss, gradient loss and synthetic graph loss. The specific description is as follows:
  • Semantic segmentation loss: composed of a focal loss and an improved ranking loss.
  • the mask value predicted by the network is m_p and the corresponding value of the real label is m_g; here the mask value is the value corresponding to each pixel in the mask map predicted by the network,
  • and the real label is the label carried by the sample image, for example, the label carried by the portrait segmentation data or the portrait matting data.
  • the focal loss is L_f; a standard form of its calculation formula is as follows:
  • L_f = -α·(1 - m_p)^γ·m_g·log(m_p) - (1 - α)·m_p^γ·(1 - m_g)·log(1 - m_p)
  • where α is a parameter used to control the weight of positive and negative samples on the loss, and γ is used to adjust the attention paid to difficult-to-classify samples.
  • the ranking loss is L_r, and it is calculated over the sampling point pairs described below.
  • Figure 18 is a schematic diagram of a pair of sampling points according to an exemplary embodiment.
  • two sampling methods are designed: sampling is performed at the edge of the portrait and, respectively, inside the portrait and the background, to obtain a certain number of sampling points for the loss calculation.
  • the original image in the picture is the target image
  • the label is the label corresponding to the target image
  • the edge sampling points are multiple points sampled at the edge of the portrait.
  • Internal sampling points are multiple points sampled inside the portrait and the background.
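  • The patent does not reproduce the exact ranking-loss formula here; purely as an illustration of how sampled point pairs can drive a loss, a generic margin-based ranking loss over such pairs might look like the sketch below (the margin value and the pair-selection interface are assumptions):

```python
import torch

def pairwise_ranking_loss(pred: torch.Tensor,
                          idx_hi: torch.Tensor, idx_lo: torch.Tensor,
                          margin: float = 0.1) -> torch.Tensor:
    """Generic margin ranking loss over sampled point pairs (illustrative only).

    pred: flattened mask prediction.
    idx_hi, idx_lo: indices of point pairs selected so that the label at idx_hi is greater
    than the label at idx_lo (e.g. one point on the portrait side of an edge and one on
    the background side)."""
    gap_pred = pred[idx_hi] - pred[idx_lo]
    # Penalize pairs whose predicted gap between the "more foreground" point and the
    # "more background" point falls below the margin.
    return torch.clamp(margin - gap_pred, min=0.0).mean()
```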
  • Matting loss: this part of the loss consists of four components; three are loss calculations between the predicted alpha (i.e., the output fine segmentation result or segmentation information) and the label, and the last is a loss calculation between the synthetic images generated from the predicted alpha and from the label, respectively. The predicted alpha is a mask image in which each pixel corresponds to a mask value from 0 to 1. Assume that the predicted alpha value is α_p and the corresponding label value is α_g; the first loss function is the L1 loss, calculated as follows:
  • L_1 = ||α_p - α_g||_1
  • the second loss function is the multi-scale Laplacian loss L_lap, which successively downsamples the predicted alpha and measures similarity at different scales: L_lap = Σ_s ‖f_s(α_p) − f_s(α_g)‖_1, where f_s(x) represents the s-th level of the Laplacian pyramid computation.
  • the third loss is the gradient loss L_g; using G(x) to represent the Sobel operator, it is calculated as L_g = ‖G(α_p) − G(α_g)‖_1.
  • the last loss is the composite image loss L_C, where c_p represents the image generated with the predicted alpha and c_g represents the image generated with the label; it is calculated as L_C = ‖c_p − c_g‖_1.
  • the total loss is a weighted sum of the above losses; a combined sketch of the matting loss terms is given below.
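  • The sketch below gathers the four matting loss terms described above into one function (PyTorch-style). The pyramid construction, the Sobel kernels, the compositing of c_p and c_g with a single background image, and the unit weights are illustrative assumptions; alpha tensors are assumed to have shape (N, 1, H, W):

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(x, levels=5):
    """Simple Laplacian pyramid using average-pool downsampling."""
    pyramid, current = [], x
    for _ in range(levels):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyramid.append(current - up)
        current = down
    pyramid.append(current)
    return pyramid

def sobel_gradients(x):
    """Horizontal and vertical Sobel responses (the operator G in the text)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    return F.conv2d(x, kx, padding=1), F.conv2d(x, ky, padding=1)

def matting_loss(alpha_p, alpha_g, image, background, weights=(1., 1., 1., 1.)):
    """Weighted sum of L1, multi-scale Laplacian, gradient and composite losses."""
    # 1) L1 loss between predicted alpha and label.
    l1 = (alpha_p - alpha_g).abs().mean()

    # 2) Multi-scale Laplacian loss across pyramid levels.
    lap = sum((p - g).abs().mean() for p, g in
              zip(laplacian_pyramid(alpha_p), laplacian_pyramid(alpha_g)))

    # 3) Gradient loss with the Sobel operator.
    gxp, gyp = sobel_gradients(alpha_p)
    gxg, gyg = sobel_gradients(alpha_g)
    grad = (gxp - gxg).abs().mean() + (gyp - gyg).abs().mean()

    # 4) Composite image loss between images generated with the predicted
    #    alpha and with the label, using the same foreground and background.
    c_p = alpha_p * image + (1.0 - alpha_p) * background
    c_g = alpha_g * image + (1.0 - alpha_g) * background
    comp = (c_p - c_g).abs().mean()

    w1, w2, w3, w4 = weights
    return w1 * l1 + w2 * lap + w3 * grad + w4 * comp
```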
  • FIG. 19 is a structural block diagram of an image processing device according to an exemplary embodiment.
  • the image processing device 1900 may be implemented by software, hardware, or a combination of software and hardware, and is used to execute the steps of the image processing method provided by the foregoing method embodiments.
  • the image processing device 1900 includes a first acquisition module 1901 , a segmentation module 1902 and a cutout target determination module 1903 .
  • the first acquisition module 1901 is configured to acquire the target image
  • the segmentation module 1902 is configured to determine segmentation information for segmenting the target image through a target matting model.
  • the target matting model is obtained by alternately training a basic matting network based on the first sample segmentation image and the sample matting image.
  • the basic matting network is obtained by training the original matting network based on the second sample segmentation image, the first sample segmentation image and the second sample segmentation image carry a first segmentation label, the sample matting image carries a second segmentation label,
  • and the segmentation granularity of the first segmentation label is greater than the segmentation granularity of the second segmentation label;
  • the cutout target determination module 1903 is configured to determine the cutout target in the target image according to the segmentation information.
  • the original matting network includes an upsampling module composed of multiple upsampling convolution modules, each upsampling convolution module is connected to a coarse segmentation prediction head, and each coarse segmentation prediction head is used to output the coarse segmentation result after upsampling by the corresponding upsampling convolution module.
  • the device also includes:
  • a first training module configured to perform multiple rounds of iterative segmentation training on the original matting network according to a plurality of the second sample segmentation images;
  • the second acquisition module is configured to acquire, after each round of iterative training, the multiple coarse segmentation results output by this round of segmentation training;
  • the first acquisition module is configured to obtain the semantic segmentation loss corresponding to this round of segmentation training based on the multiple coarse segmentation results output by this round of segmentation training and the first segmentation label carried by the second sample segmentation image in this round of segmentation training;
  • the first optimization module is configured to optimize the original matting network according to the semantic segmentation loss corresponding to the current round of segmentation training;
  • the second acquisition module is configured to stop training when the original matting network converges to obtain the basic matting network.
  • the basic matting network includes an upsampling module composed of multiple upsampling convolution modules, each upsampling convolution module is connected to a coarse segmentation prediction head and a fine segmentation prediction head, each coarse segmentation prediction head is used to output the coarse segmentation result after upsampling by the corresponding upsampling convolution module, and each fine segmentation prediction head is used to output the fine segmentation result after upsampling by the corresponding upsampling convolution module; a sketch of such a block follows.
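  • A minimal sketch of one such upsampling convolution module with its two prediction heads is shown below; the channel sizes and layer choices are assumptions, since only the head arrangement is specified here:

```python
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Upsampling convolution module with a coarse and a fine prediction head."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.coarse_head = nn.Conv2d(out_ch, 1, kernel_size=1)  # coarse result
        self.fine_head = nn.Conv2d(out_ch, 1, kernel_size=1)    # fine result

    def forward(self, x):
        feat = self.up(x)
        coarse = torch.sigmoid(self.coarse_head(feat))
        fine = torch.sigmoid(self.fine_head(feat))
        return feat, coarse, fine
```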
  • the device also includes:
  • a second training module configured to perform multiple rounds of alternating iterations of segmentation training and matting training on the basic matting network based on a plurality of the first sample segmentation images and a plurality of the sample matting images;
  • the third acquisition module is configured to obtain the semantic segmentation loss corresponding to this round of segmentation training after each round of segmentation training;
  • the second optimization module is configured to optimize the basic matting network according to the semantic segmentation loss corresponding to the current round of segmentation training;
  • the third acquisition module is configured to, after each round of matting training, obtain the total fine segmentation loss corresponding to this round of matting training according to the multiple coarse segmentation results, the multiple fine segmentation results and the segmentation information output by the current round of matting training, and the second segmentation label carried by the sample matting image in this round of matting training;
  • the third optimization module is configured to optimize the optimized basic matting network again based on the total fine segmentation loss corresponding to the current round of matting training;
  • the fourth acquisition module is configured to stop training when the basic matting network converges to obtain the target matting model.
  • the third acquisition module includes:
  • the first acquisition sub-module is configured to perform binarization processing, according to a preset segmentation value, on the second segmentation label carried by the sample matting image in the current round of matting training, to obtain the binary segmentation label corresponding to the sample matting image in the current round of matting training;
  • the second acquisition sub-module is configured to obtain the semantic segmentation loss corresponding to this round of matting training based on the multiple coarse segmentation results output by this round of matting training and the binary segmentation labels;
  • the third acquisition submodule is configured to obtain the target fine segmentation loss corresponding to this round of matting training based on the multiple fine segmentation results and segmentation information output by this round of matting training and the second segmentation label;
  • the fourth acquisition sub-module is configured to obtain the total fine segmentation loss corresponding to the current round of matting training based on the semantic segmentation loss corresponding to the current round of matting training and the target fine segmentation loss corresponding to the current round of matting training.
  • the device further includes a semantic segmentation loss sub-module, which includes:
  • the first acquisition sub-unit is configured to obtain the first semantic segmentation sub-loss based on the distance between the multiple coarse segmentation results and the first segmentation label corresponding to the current round of segmentation training;
  • the second acquisition subunit is configured to acquire multiple pairs of sampling points from the plurality of coarse segmentation results, and obtain the second semantic segmentation sub-loss based on the distances between the multiple pairs of sampling points and the first segmentation label corresponding to the current round of segmentation training;
  • the third obtaining sub-unit is configured to obtain the semantic segmentation loss corresponding to the current round of segmentation training based on the first semantic segmentation sub-loss and the second semantic segmentation sub-loss.
  • the second acquisition sub-module includes:
  • the fourth acquisition sub-unit is configured to obtain the third semantic segmentation sub-loss based on the distance between the multiple coarse segmentation results output by the current round of matting training and the binary segmentation labels;
  • the fifth acquisition subunit is configured to obtain multiple pairs of sampling points from the multiple coarse segmentation results output by the current round of matting training, and obtain the fourth semantic segmentation sub-loss according to the distances between the multiple pairs of sampling points and the binary segmentation labels;
  • the sixth obtaining sub-unit is configured to obtain the semantic segmentation loss corresponding to the current round of matting training based on the third semantic segmentation sub-loss and the fourth semantic segmentation sub-loss.
  • the target fine segmentation loss includes a first fine segmentation sub-loss
  • the third acquisition sub-module includes:
  • a calculation subunit, configured to calculate the mean absolute error between the second segmentation label and each of the segmentation information and the multiple fine segmentation results output by the current round of matting training;
  • the determination subunit is configured to determine the mean absolute error as the first fine segmentation sub-loss corresponding to the current round of matting training.
  • the target fine segmentation loss also includes at least one of the following: a second fine segmentation sub-loss, a third fine segmentation sub-loss, and a fourth fine segmentation sub-loss.
  • the third acquisition sub-module also includes:
  • the seventh acquisition subunit is configured to calculate the multi-scale Laplacian loss between the second segmentation label and each of the segmentation information and the multiple fine segmentation results output by the current round of matting training, to obtain the second fine segmentation sub-loss corresponding to the current round of matting training; and/or,
  • the eighth acquisition subunit is configured to calculate the gradient loss between the second segmentation label and each of the segmentation information and the multiple fine segmentation results output by the current round of matting training, to obtain the third fine segmentation sub-loss corresponding to the current round of matting training; and/or,
  • the ninth acquisition sub-unit is configured to calculate the synthesis loss between multiple predicted synthetic images and the label synthetic image respectively, and obtain the fourth fine segmentation sub-loss corresponding to the current round of matting training.
  • the multiple predicted synthetic images are obtained by respectively compositing, with a background image, the multiple matting targets obtained from the multiple fine segmentation results and the segmentation information output by this round of matting training;
  • the label synthetic image is obtained by compositing, with the background image, the matting target obtained according to the second segmentation label and the image.
  • the device also includes:
  • the background blur module is configured to blur the portion of the target image other than the cutout target according to the cutout target to obtain a background blurred image.
  • the device also includes:
  • the fourth acquisition module is configured to acquire the target background image
  • the fifth acquisition module is configured to perform cutout processing on the target image according to the cutout target to obtain the cutout target image
  • the background replacement module is configured to synthesize the cutout target image and the target background image to obtain a background replacement image.
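  • Both applications above reduce to simple alpha compositing with the predicted matte. The sketch below assumes a torchvision-style Gaussian blur and image/matte tensors with values in [0, 1]; the kernel size and sigma are illustrative:

```python
import torchvision.transforms.functional as TF

def blur_background(image, alpha, sigma=9.0):
    """Keep the cutout target sharp and blur everything else.

    image: (3, H, W) tensor; alpha: (1, H, W) matte predicted for the target.
    """
    blurred = TF.gaussian_blur(image, kernel_size=31, sigma=sigma)
    return alpha * image + (1.0 - alpha) * blurred

def replace_background(image, alpha, new_background):
    """Composite the cutout target onto a target background image."""
    return alpha * image + (1.0 - alpha) * new_background
```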
  • the original matting network includes a feature extraction module, a dilated convolution pooling module, an upsampling module and a multiple upsampling module;
  • the segmentation module 1902 includes:
  • a feature extraction submodule configured to perform feature extraction on the target image through the feature extraction module to obtain an original feature vector
  • the context extraction sub-module is configured to perform context extraction on the original feature vector through the atrous convolution pooling module to obtain a context feature vector;
  • the first upsampling sub-module is configured to upsample the context feature vector and the original feature vector through the upsampling module to obtain a fine segmentation result
  • the second upsampling sub-module is configured to upsample the fine segmentation result and the target image through a multiple upsampling module to obtain the segmentation information.
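  • A sketch of this forward path is given below; each sub-module is a constructor-injected placeholder standing in for the feature extraction, atrous convolution pooling, upsampling and multiple-upsampling modules, whose internals are not restated here:

```python
import torch.nn as nn
import torch.nn.functional as F

class MattingPipeline(nn.Module):
    """Forward path: features -> context -> fine result -> segmentation info."""

    def __init__(self, backbone, aspp, decoder, refiner):
        super().__init__()
        self.backbone = backbone   # feature extraction module
        self.aspp = aspp           # dilated (atrous) convolution pooling module
        self.decoder = decoder     # upsampling module
        self.refiner = refiner     # multiple upsampling module

    def forward(self, image):
        features = self.backbone(image)              # original feature vector
        context = self.aspp(features)                # context feature vector
        fine = self.decoder(context, features)       # fine segmentation result
        fine_up = F.interpolate(fine, size=image.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.refiner(fine_up, image)          # segmentation information
```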
  • Figure 20 is a structural block diagram of an image processing device according to an exemplary embodiment.
  • the device 2000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • device 2000 may include one or more of the following components: processing component 2002, memory 2004, power supply component 2006, multimedia component 2008, audio component 2010, input/output interface 2012, sensor component 2014, and communication component 2016.
  • Processing component 2002 generally controls the overall operations of device 2000, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 2002 may include one or more processors 2020 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 2002 may include one or more modules that facilitate interaction between processing component 2002 and other components. For example, processing component 2002 may include a multimedia module to facilitate interaction between multimedia component 2008 and processing component 2002.
  • Memory 2004 is configured to store various types of data to support operations at device 2000. Examples of such data include instructions for any application or method operating on device 2000, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory 2004 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • Power supply component 2006 provides power to various components of device 2000.
  • Power supply components 2006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 2000.
  • Multimedia component 2008 includes a screen that provides an output interface between the device 2000 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 2008 includes a front-facing camera and/or a rear-facing camera. When the device 2000 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data.
  • Each of the front-facing camera and the rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 2010 is configured to output and/or input audio signals.
  • audio component 2010 includes a microphone (MIC) configured to receive external audio signals when device 2000 is in operating modes, such as call mode, recording mode, and speech recognition mode. The received audio signals may be further stored in memory 2004 or sent via communications component 2016 .
  • audio component 2010 also includes a speaker for outputting audio signals.
  • the input/output interface 2012 provides an interface between the processing component 2002 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 2014 includes one or more sensors that provide various aspects of status assessment for device 2000 .
  • the sensor component 2014 can detect the open/closed state of the device 2000 and the relative positioning of components, such as the display and keypad of the device 2000; the sensor component 2014 can also detect a change in position of the device 2000 or a component of the device 2000, the presence or absence of user contact with the device 2000, the orientation or acceleration/deceleration of the device 2000, and temperature changes of the device 2000.
  • Sensor assembly 2014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 2014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 2014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 2016 is configured to facilitate wired or wireless communication between apparatus 2000 and other devices.
  • Device 2000 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 2016 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communications component 2016 also includes a near field communications (NFC) module to facilitate short-range communications.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the apparatus 2000 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for executing the above method.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 2004 including instructions executable by the processor 2020 of the device 2000 to perform the above method, is also provided.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • In an exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable device, the computer program having code portions for performing the above image processing method when executed by the programmable device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method and device, and a readable storage medium are provided. The method comprises: acquiring a target image; determining, by means of a target matting model, segmentation information for segmenting the target image, the target matting model being obtained by alternately training a basic matting network on the basis of a first sample segmentation image and a sample matting image, the basic matting network being obtained by training an original matting network on the basis of a second sample segmentation image, the first sample segmentation image and the second sample segmentation image carrying a first segmentation label, the sample matting image carrying a second segmentation label, and the segmentation granularity of the first segmentation label being greater than that of the second segmentation label; and determining a cutout target in the target image according to the segmentation information.
PCT/CN2022/096483 2022-05-31 2022-05-31 Procédé et dispositif de traitement d'image, et support lisible de stockage WO2023230927A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280004202.6A CN117501309A (zh) 2022-05-31 2022-05-31 图像处理方法、装置及可读存储介质
PCT/CN2022/096483 WO2023230927A1 (fr) 2022-05-31 2022-05-31 Procédé et dispositif de traitement d'image, et support lisible de stockage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/096483 WO2023230927A1 (fr) 2022-05-31 2022-05-31 Procédé et dispositif de traitement d'image, et support lisible de stockage

Publications (1)

Publication Number Publication Date
WO2023230927A1 true WO2023230927A1 (fr) 2023-12-07

Family

ID=89026715

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096483 WO2023230927A1 (fr) 2022-05-31 2022-05-31 Procédé et dispositif de traitement d'image, et support lisible de stockage

Country Status (2)

Country Link
CN (1) CN117501309A (fr)
WO (1) WO2023230927A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986101A (zh) * 2018-05-31 2018-12-11 浙江大学 基于循环“抠图-分割”优化的人体图像分割方法
CN110517278A (zh) * 2019-08-07 2019-11-29 北京旷视科技有限公司 图像分割和图像分割网络的训练方法、装置和计算机设备
US20200175700A1 (en) * 2018-11-29 2020-06-04 Adobe Inc. Joint Training Technique for Depth Map Generation
CN112489063A (zh) * 2020-12-10 2021-03-12 北京金山云网络技术有限公司 图像分割方法、图像分割模型的训练方法和装置
CN114120068A (zh) * 2021-11-04 2022-03-01 腾讯科技(深圳)有限公司 图像处理方法、装置、电子设备、存储介质及计算机产品
CN114445625A (zh) * 2022-02-09 2022-05-06 携程旅游信息技术(上海)有限公司 图片天空提取方法、系统、设备及存储介质

Also Published As

Publication number Publication date
CN117501309A (zh) 2024-02-02

Similar Documents

Publication Publication Date Title
WO2020224457A1 (fr) Appareil et procédé de traitement d'image, dispositif électronique et support d'informations
CN111310616B (zh) 图像处理方法及装置、电子设备和存储介质
CN108629354B (zh) 目标检测方法及装置
CN109658401B (zh) 图像处理方法及装置、电子设备和存储介质
WO2020088280A1 (fr) Procédé et système de transfert de style d'image
WO2020134556A1 (fr) Procédé de transfert de style d'image, dispositif, appareil électronique et support de stockage
US9153031B2 (en) Modifying video regions using mobile device input
WO2020134866A1 (fr) Procédé et appareil de détection de point-clé, dispositif électronique, et support de stockage
CN110889851B (zh) 针对深度和视差估计的语义分割的稳健用途
CN112767329B (zh) 图像处理方法及装置、电子设备
US20210319538A1 (en) Image processing method and device, electronic equipment and storage medium
US20170053156A1 (en) Human face recognition method, apparatus and terminal
WO2021208667A1 (fr) Procédé et appareil de traitement d'images, dispositif électronique et support de stockage
JP2022522551A (ja) 画像処理方法及び装置、電子機器並びに記憶媒体
US20220392202A1 (en) Imaging processing method and apparatus, electronic device, and storage medium
CN114820584B (zh) 肺部病灶定位装置
WO2022110969A1 (fr) Procédé de segmentation d'images non supervisé, dispositif électronique et support de stockage
CN109784164B (zh) 前景识别方法、装置、电子设备及存储介质
CN113409342A (zh) 图像风格迁移模型的训练方法、装置及电子设备
CN116824533A (zh) 一种基于注意力机制的远小目标点云数据特征增强方法
CN109784327B (zh) 边界框确定方法、装置、电子设备及存储介质
CN109509195B (zh) 前景处理方法、装置、电子设备及存储介质
CN114677517A (zh) 一种无人机用语义分割网络模型及图像分割识别方法
CN107992894B (zh) 图像识别方法、装置及计算机可读存储介质
WO2023230927A1 (fr) Procédé et dispositif de traitement d'image, et support lisible de stockage

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202280004202.6

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22944263

Country of ref document: EP

Kind code of ref document: A1