WO2022017025A1 - Image processing method and apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
WO2022017025A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
model
image
segmentation
training
Prior art date
Application number
PCT/CN2021/098905
Other languages
French (fr)
Chinese (zh)
Inventor
刘钰安
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2022017025A1 publication Critical patent/WO2022017025A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/143 Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present application belongs to the field of image technology, and in particular, relates to an image processing method, device, storage medium and electronic device.
  • Image segmentation is a fundamental topic in the field of computer vision. Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and proposing objects of interest. It is a key step from image processing to image analysis.
  • Embodiments of the present application provide an image processing method, apparatus, storage medium, and electronic device, which can improve the accuracy of image segmentation by the electronic device.
  • an embodiment of the present application provides an image processing method, the method comprising:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
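  • The Conv-BN-ReLU structure of each convolutional network block can be illustrated with a minimal numpy sketch. The kernel size, channel counts, and inference-style normalization (gamma=1, beta=0, per-channel statistics) below are illustrative assumptions; the patent does not specify them:

```python
import numpy as np

def conv2d(x, w, pad=1):
    """Naive 3x3 'same' convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_out, c_in, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wdt = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wdt))
    for o in range(c_out):
        for i in range(c_in):
            for r in range(kh):
                for c in range(kw):
                    out[o] += w[o, i, r, c] * xp[i, r:r + h, c:c + wdt]
    return out

def batch_norm(x, eps=1e-5):
    """Per-channel normalization over spatial dims (gamma=1, beta=0 assumed)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, w):
    """One convolutional network block: convolution -> batch norm -> ReLU."""
    return relu(batch_norm(conv2d(x, w)))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))      # hypothetical 3-channel input patch
w = rng.standard_normal((16, 3, 3, 3))  # hypothetical 16-output-channel kernel
y = conv_block(x, w)
print(y.shape)  # (16, 8, 8)
```

The segmentation module would chain several such blocks and finish with a plain convolutional layer; this sketch only shows one block's forward pass.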
  • an embodiment of the present application provides an image processing apparatus, and the apparatus includes:
  • a first acquisition module configured to acquire a first image
  • the second acquisition module is used to acquire a pre-trained image segmentation model
  • the pre-trained image segmentation model is used to output a segmentation mask of the image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer.
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
  • a processing module configured to input the first image into the pre-trained image segmentation model, and output a segmentation mask corresponding to the first image from the pre-trained image segmentation model;
  • a segmentation module configured to segment a second image from the first image according to a segmentation mask corresponding to the first image.
  • embodiments of the present application provide a storage medium on which a computer program is stored, and the computer program, when executed on a computer, causes the computer to execute the image processing method provided by the embodiments of the present application.
  • an embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is configured to execute the image processing method provided by the embodiment of the present application by invoking a computer program stored in the memory.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a training model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a model including a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of each network block provided by an embodiment of the present application.
  • FIG. 6 is another schematic structural diagram of a training model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 9 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the execution body of the embodiments of the present application may be an electronic device such as a smartphone or a tablet computer.
  • the application provides an image processing method, the method includes:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  • the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module; the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  • the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, and the deep feature supervision module is connected to the feature pyramid module
  • the deep feature supervision module is used to supervise deep features at multiple scales;
  • the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged; the model is then saved, and the parameters of the model are left unfrozen.
  • the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the deep feature supervision module in the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is removed, and a segmentation module and an edge gradient module are added.
  • Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following processes:
  • the second preliminary segmentation mask and the input image are spliced in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the model obtained after removing the edge gradient module in the model obtained after training is determined as the pre-trained image segmentation model.
  • the method further includes:
  • the preset processing including random cropping and/or normalization processing
  • the inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image is subjected to the preset processing into the pre-trained image segmentation model.
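  • The preset processing (random cropping and/or normalization) might look like the following numpy sketch. The crop size and the ImageNet-style normalization constants are illustrative assumptions, not values from the patent:

```python
import numpy as np

def random_crop(img, size, rng):
    """Randomly crop an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def normalize(img, mean, std):
    """Scale pixel values to [0, 1], then standardize per channel."""
    return (img.astype(np.float32) / 255.0 - mean) / std

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
crop = random_crop(image, 224, rng)
x = normalize(crop, mean=np.array([0.485, 0.456, 0.406]),
              std=np.array([0.229, 0.224, 0.225]))
print(crop.shape)  # (224, 224, 3)
```

The normalized crop `x` is what would be fed into the pre-trained image segmentation model.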
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The process may include:
  • Image segmentation is a fundamental topic in the field of computer vision.
  • Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and proposing objects of interest. It is a key step from image processing to image analysis.
  • the accuracy of image segmentation by the electronic device is low.
  • the electronic device may acquire the first image first.
  • the first image is an image that needs to be processed by image segmentation.
  • Obtain a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image, the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the electronic device can also acquire a pre-trained image segmentation model that has been trained in advance, where the pre-trained image segmentation model can be used to output a segmentation mask of the image.
  • the pre-trained image segmentation model may include at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer (i.e., BN layer), and a nonlinear activation layer (i.e., ReLU layer).
  • the user can pre-train the model according to the requirements, so that the pre-trained image segmentation model can output the segmentation mask required by the user.
  • for example, if the user needs to use the trained model to segment portraits, the pre-trained image segmentation model obtained after training should be a model that can output portrait segmentation masks.
  • if the user needs to use the trained model to segment a specific object (such as a car or a potted plant), then the pre-trained image segmentation model obtained after training should be a model that can output the segmentation mask of that specific object.
  • the electronic device can input the first image into the pre-trained image segmentation model, and the pre-trained image segmentation model outputs the segmentation mask corresponding to the first image.
  • the electronic device may segment the first image to obtain a corresponding image, that is, the second image, according to the segmentation mask corresponding to the first image.
  • the electronic device can segment the corresponding portrait from the first image according to the portrait segmentation mask.
  • the electronic device can acquire the first image and a pre-trained image segmentation model, the pre-trained image segmentation model is used to output a segmentation mask of the image, and the pre-trained image segmentation model at least includes a segmentation module.
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the electronic device may input the first image to the pre-trained image segmentation model, and the pre-trained image segmentation model outputs a segmentation mask corresponding to the first image.
  • the electronic device may obtain a second image by segmenting the first image according to the segmentation mask corresponding to the first image. Since the pre-trained image segmentation model includes a segmentation module, the segmentation module includes multiple convolutional network blocks and at least one convolutional layer, the multiple convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolution layer, a BN layer, and a ReLU layer, the electronic device can use the segmentation mask output by the pre-trained image segmentation model to segment the corresponding image from the first image more accurately. That is, the embodiments of the present application can improve the accuracy of image segmentation by the electronic device.
  • FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the training model may include a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, a deep feature supervision module, a segmentation module, and an edge gradient module.
  • the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  • the deep feature supervision module is connected with the feature pyramid module, which is used to supervise the deep features from multiple scales.
  • the edge gradient module is connected with the segmentation module. The edge gradient module is used to provide the edge gradient loss function as one of the loss functions during model training.
  • the multi-scale decoder in the training model outputs the first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, where N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged; the model is then saved, and the parameters of the model are left unfrozen.
  • a model including a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module may be trained first, and the training may be referred to as the first One stage of training.
  • FIG. 4 is a schematic structural diagram of a model, provided by this embodiment of the application, that includes four modules: a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module.
  • the input image is input by the multi-scale encoder module, processed by the multi-scale encoder module and transmitted to the feature pyramid module, and then processed by the feature pyramid module and then transmitted to the deep feature supervision module and the multi-scale decoder module respectively.
  • four up-sampling masks can be obtained, such as Mask32, Mask16, Mask8, and Mask4, respectively.
  • the multi-scale decoder module may output a first preliminary segmentation mask (Mask).
  • the backbone network in the multi-scale encoder can be the MobileNetV2 network, which has strong feature extraction ability while remaining relatively lightweight; feature maps of different scales are then extracted from it to form a feature pyramid.
  • the numbers 320, 64, 32, and 24 on the feature pyramid image in the feature pyramid module represent the number of channels, and the numbers 1/4, 1/8, 1/16, and 1/32 represent the down-sampling factors of the resolution relative to the original image.
  • conv represents the convolution processing performed by a convolutional layer
  • up2x represents bilinear-interpolation 2x upsampling processing
  • 4x represents bilinear-interpolation 4x upsampling processing.
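  • The up2x operation (bilinear-interpolation 2x upsampling) can be sketched in numpy as follows. This minimal version uses align-corners-style sampling, which is an assumption for illustration; the patent does not specify the corner-alignment convention:

```python
import numpy as np

def upsample_bilinear(x, factor):
    """Bilinear upsampling of an (H, W) map by an integer factor."""
    h, w = x.shape
    out_h, out_w = h * factor, w * factor
    rows = np.linspace(0, h - 1, out_h)   # sample positions in the source grid
    cols = np.linspace(0, w - 1, out_w)
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None]             # fractional offsets
    fc = (cols - c0)[None, :]
    top = x[np.ix_(r0, c0)] * (1 - fc) + x[np.ix_(r0, c1)] * fc
    bot = x[np.ix_(r1, c0)] * (1 - fc) + x[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

x = np.array([[0.0, 1.0],
              [2.0, 3.0]])
y = upsample_bilinear(x, 2)   # the "up2x" operation
print(y.shape)  # (4, 4)
```

The same routine with `factor=4` corresponds to the 4x upsampling used elsewhere in the model.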
  • cgr2x represents a first network block consisting of a convolutional layer, a Group Normalization layer, a ReLU layer, and a bilinear interpolation 2x upsampling layer in sequence.
  • sgr2x represents a second network block consisting of a convolutional layer with the same number of input and output channels, a Group Normalization layer, a ReLU layer, and a bilinear-interpolation 2x upsampling layer in sequence.
  • sgr represents a third network block, which is sgr2x with the bilinear-interpolation 2x upsampling layer removed.
  • FIG. 5 (a) is a schematic structural diagram of the first network block cgr2x, (b) is a structural schematic diagram of the second network block sgr2x, and (c) is a structural schematic diagram of the third network block sgr.
  • the training process of the first stage is described below by taking the training of a portrait segmentation mask as an example.
  • the model used for the first stage training includes a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module and a deep feature supervision module.
  • the electronic device can obtain training samples, and divide the training samples into a test set and a training set in a ratio of 2:8.
  • the electronic device can perform data enhancement processing on the samples in the training set, including random rotation, random left-right flip, random cropping, and Gamma transformation. It should be noted that performing data enhancement processing on the samples in the training set can not only increase the sample data in the training set, but also improve the robustness of the model obtained by training.
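  • Among the data enhancement operations listed, random left-right flipping and the Gamma transformation can be sketched as follows; the flip probability and gamma range are illustrative assumptions, and note that the segmentation label must receive the same geometric transform as the image:

```python
import numpy as np

def augment(img, mask, rng, gamma_range=(0.7, 1.5)):
    """Random left-right flip plus gamma transform; the mask gets the same flip."""
    if rng.random() < 0.5:
        img = img[:, ::-1]      # flip image along the width axis
        mask = mask[:, ::-1]    # keep the label aligned with the image
    gamma = rng.uniform(*gamma_range)
    img = np.clip(img.astype(np.float32) / 255.0, 0, 1) ** gamma * 255.0
    return img.astype(np.uint8), mask

rng = np.random.default_rng(7)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
mask = (rng.random((64, 64)) > 0.5).astype(np.uint8)
aug_img, aug_mask = augment(image, mask, rng)
print(aug_img.shape, aug_mask.shape)  # (64, 64, 3) (64, 64)
```

Photometric operations such as the gamma transform are applied only to the image, since they do not change which pixels belong to the foreground.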
  • the electronic device may acquire an image in a training sample for input to the multi-scale encoder module, such as a third image, and perform preset processing on the third image, wherein the preset processing Random cropping and/or normalization processing may be included.
  • the electronic device can input the image obtained by the preset processing of the third image into the multi-scale encoder, and obtain feature maps whose resolutions are 1/4, 1/8, 1/16, and 1/32 of the third image, respectively, after processing by the multi-scale encoder.
  • the multi-scale encoder includes 5 layers, namely the receiving layer (Layer0), the first layer (Layer1), the second layer (Layer2), the third layer (Layer3), and the fourth layer (Layer4).
  • the receiving layer is used to receive the input third image
  • the first layer, the second layer, the third layer and the fourth layer are respectively used to extract the feature maps of the input image at different scales.
  • the first layer extracts feature maps at one-fourth the resolution of the input image
  • the second layer extracts feature maps at one-eighth the resolution of the input image
  • the third layer extracts feature maps at one-sixteenth the resolution of the input image
  • the fourth layer extracts feature maps at one-thirty-second the resolution of the input image.
  • the electronic device can transmit these feature maps to the feature pyramid module, so as to obtain corresponding feature pyramid images, which are denoted as the third feature pyramid, for example.
  • the resolution of the third feature pyramid increases sequentially from top to bottom; that is, the resolution of the first layer feature map of the third feature pyramid is 1/32 of the third image
  • the resolution of the second layer feature map is 1/16 of the third image
  • the resolution of the third layer feature map is 1/8 of the third image
  • the resolution of the fourth layer feature map is 1/4 of the third image.
  • the number of channels of the first layer feature map to the fourth layer feature map of the third feature pyramid are 320, 64, 32, and 24 in sequence.
  • the electronic device may record the first-level feature map, second-level feature map, third-level feature map, and fourth-level feature map of the third feature pyramid as a1, b1, c1, and d1 in sequence.
  • the electronic device can call the feature pyramid module to process each image in the third feature pyramid into an image with the same number of channels, so as to obtain a fourth feature pyramid composed of images with the same number of channels.
  • specifically, the feature pyramid module can use convolution processing and bilinear-interpolation 2x upsampling processing to process the images in the third feature pyramid into images with a consistent number of channels, resulting in the fourth feature pyramid.
  • that is, the feature pyramid module performs bilinear-interpolation 2x upsampling on the lower-resolution feature maps and fuses them with the higher-resolution feature maps in the third feature pyramid that have the same resolution as the 2x-upsampled lower-resolution maps, thereby processing the number of channels of both images to 128.
  • for example, the feature pyramid module can first perform convolution processing and bilinear-interpolation 2x upsampling processing on feature map a1 to obtain feature map a11, and perform convolution processing on feature map b1 to obtain feature map b11; the feature pyramid module can then add feature map a11 and feature map b11, and perform convolution processing on the result of the addition to obtain feature map b2, whose number of channels is 128.
  • the electronic device can perform convolution processing on the feature map a1 to obtain the feature map a2, and the number of channels of the feature map a2 is 128.
  • similarly, the feature pyramid module can first add feature map a11 and feature map b11 to obtain feature map b12, perform bilinear-interpolation 2x upsampling on feature map b12 to obtain feature map b13, perform convolution processing on feature map c1 to obtain feature map c11, add feature map b13 and feature map c11, and then perform convolution processing on the result to obtain feature map c2, whose number of channels is 128.
  • likewise, the feature pyramid module can first perform convolution processing on feature map d1 to obtain feature map d11, add feature map b13 and feature map c11 to obtain feature map c12, perform bilinear-interpolation 2x upsampling on feature map c12 to obtain feature map c13, add feature map c13 and feature map d11, and then perform convolution processing on the result to obtain feature map d2, whose number of channels is 128.
  • the fourth feature pyramid is composed of feature maps a2, b2, c2, and d2, and their number of channels is 128.
  • the resolutions of feature maps a2, b2, c2, and d2 are 1/32, 1/16, 1/8, and 1/4 of the third image, respectively.
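  • The top-down fusion described above can be sketched roughly in numpy. This is a simplification under stated assumptions: the 1x1 projection weights are random placeholders, nearest-neighbour upsampling stands in for the bilinear interpolation the patent uses, and the convolution applied after each addition is omitted so that only the shapes and the fusion pattern are shown:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as a channel projection. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def up2x(x):
    """2x upsampling (nearest neighbour here for brevity; the patent uses bilinear)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
# Third-feature-pyramid maps a1..d1 with 320/64/32/24 channels at 1/32..1/4 resolution
a1 = rng.standard_normal((320, 4, 4))
b1 = rng.standard_normal((64, 8, 8))
c1 = rng.standard_normal((32, 16, 16))
d1 = rng.standard_normal((24, 32, 32))
w = {c: rng.standard_normal((128, c)) * 0.05 for c in (320, 64, 32, 24)}

a2 = conv1x1(a1, w[320])              # (128, 4, 4)
b2 = conv1x1(b1, w[64]) + up2x(a2)    # fuse with the upsampled coarser level
c2 = conv1x1(c1, w[32]) + up2x(b2)
d2 = conv1x1(d1, w[24]) + up2x(c2)
print(d2.shape)  # (128, 32, 32)
```

Every level of the resulting fourth feature pyramid ends up with 128 channels, matching the description above.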
  • the electronic device can call the deep feature supervision module and pass the feature maps of each layer of the fourth feature pyramid, from top to bottom, through the four upsampling layers of the deep feature supervision module, upsampling them by factors of 32, 16, 8, and 4 respectively, so as to obtain mask images with the same size as the third image (i.e., deep supervision prediction masks); the four deep supervision prediction masks can be denoted, for example, as Mask32, Mask16, Mask8, and Mask4.
  • the electronic device can call the multi-scale decoder module to perform certain processing on each image in the fourth feature pyramid, so that the resolution of the feature map of each layer in the fourth feature pyramid is equal to 1/4 of that of the third image.
  • the feature map a2 of the first layer of the fourth feature pyramid can be processed by two first network blocks cgr2x and one second network block sgr2x in turn to obtain an image with a resolution of 1/4 of the third image.
  • the feature map b2 of the second layer of the fourth feature pyramid can be sequentially calculated by a first network block cgr2x and a second network block sgr2x to obtain an image with a resolution of 1/4 of the third image.
  • the third layer feature map c2 of the fourth feature pyramid can be calculated by a second network block sgr2x to obtain an image with a resolution of 1/4 of the third image.
  • the fourth layer feature map d2 of the fourth feature pyramid can be calculated by a third network block sgr to obtain an image with a resolution of 1/4 of the third image.
  • the multi-scale decoder module can sequentially perform addition processing, convolution processing, and 4x upsampling processing on the four images with a resolution of 1/4 of the third image obtained in the above-mentioned manner, thereby obtaining a preliminary segmentation mask, which is denoted as the first preliminary segmentation mask.
  • the model can obtain the first label segmentation mask used as the label (Label) of the training model in the training sample. It can be understood that the first labeled segmentation mask is the accurate portrait segmentation mask corresponding to the third image in this training sample.
  • the model (including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module) can respectively calculate the cross-entropy losses between each of the four deep supervision prediction masks Mask32, Mask16, Mask8, and Mask4 output by the deep feature supervision module and the first annotation segmentation mask, as well as the cross-entropy loss between the first preliminary segmentation mask and the first annotation segmentation mask.
  • the model can perform a backpropagation algorithm on the model according to the above-mentioned five cross-entropy losses calculated, and update the parameters of the model.
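  • The five cross-entropy losses can be sketched in numpy. The binary (foreground-probability) formulation below, the mask shapes, and the plain unweighted sum are illustrative assumptions; the patent only states that five cross-entropy losses drive backpropagation:

```python
import numpy as np

def pixel_cross_entropy(probs, label, eps=1e-7):
    """Mean per-pixel binary cross-entropy.
    probs: (H, W) predicted foreground probability, label: (H, W) in {0, 1}."""
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(-(label * np.log(probs)
                   + (1 - label) * np.log(1 - probs)).mean())

rng = np.random.default_rng(1)
label = (rng.random((32, 32)) > 0.5).astype(np.float64)  # annotation mask
# Mask32, Mask16, Mask8, Mask4 and the first preliminary segmentation mask,
# simulated here as noisy versions of the label
masks = [np.clip(label + rng.normal(0, 0.1, label.shape), 0, 1) for _ in range(5)]
total_loss = sum(pixel_cross_entropy(m, label) for m in masks)
print(total_loss > 0)  # True
```

The total loss over all five predictions is what the backpropagation algorithm would minimize when updating the model parameters.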
  • the electronic device can repeatedly perform the above-mentioned process of training the model (including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module) using the training samples until the loss function of the model is fully converged; the model is then saved, and the parameters of the model are left unfrozen.
  • after training the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, the electronic device can perform the second stage of training.
  • the deep feature supervision module in the model obtained after the first stage of training can be removed, and a segmentation module and an edge gradient module can be added.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the model training (that is, the second stage of training) is continued based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
  • the second preliminary segmentation mask and the input image are spliced in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
  • the electronic device can obtain the image in the training sample to be input to the multi-scale encoder module (denoted, for example, as the fourth image), and first perform preset processing on the fourth image, where the preset processing may include random cropping and/or normalization processing.
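A minimal sketch of this preset processing might look as follows; the 224×224 crop size and the per-channel zero-mean, unit-variance normalization are illustrative assumptions, since the application does not fix them.

```python
import numpy as np

def preset_process(image, crop_hw=(224, 224), rng=None):
    """Randomly crop an H x W x C uint8 image, then normalize each channel to zero mean, unit variance."""
    rng = rng or np.random.default_rng()
    h, w, _ = image.shape
    ch, cw = crop_hw
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw].astype(np.float32) / 255.0
    mean = crop.mean(axis=(0, 1), keepdims=True)
    std = crop.std(axis=(0, 1), keepdims=True) + 1e-6
    return (crop - mean) / std

img = np.random.default_rng(1).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
out = preset_process(img)  # shape (224, 224, 3), per-channel normalized
```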
  • the electronic device can input the image obtained by the preset processing of the fourth image into the training model, where it is processed sequentially by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a preliminary segmentation mask, denoted for example as the second preliminary segmentation mask.
  • the electronic device can call the training model to perform concat processing on the image obtained after the preset processing of the fourth image and the second preliminary segmentation mask, and transmit the concat-processed image to the segmentation module, where the concat processing stitches the two maps together in the channel dimension.
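The concat processing above is a standard channel-wise concatenation. Assuming a channels-first (CHW) layout and a single-channel preliminary mask (both layout choices are assumptions for illustration), it can be sketched as:

```python
import numpy as np

def concat_channels(image_chw, mask_chw):
    """Stitch an image and a segmentation mask together along the channel dimension (CHW layout)."""
    assert image_chw.shape[1:] == mask_chw.shape[1:], "spatial sizes must match"
    return np.concatenate([image_chw, mask_chw], axis=0)

image = np.zeros((3, 128, 128), dtype=np.float32)  # RGB input image
mask = np.zeros((1, 128, 128), dtype=np.float32)   # single-channel preliminary mask
stacked = concat_channels(image, mask)             # shape (4, 128, 128), fed to the segmentation module
```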
  • each convolutional network block is a network block composed of a convolutional layer, a BN layer and a ReLU layer in sequence.
  • the fine segmentation mask is a two-channel probability prediction map without argmax operation.
  • the electronic device may acquire a second label segmentation mask used as a label (Label) of the training model in the training sample.
  • the second labeled segmentation mask is the precise portrait segmentation mask corresponding to the fourth image in this training sample.
  • the electronic device may input the second preliminary segmentation mask, the second annotation segmentation mask, and the fourth image to the edge gradient module.
  • the fourth image is transmitted to the Sobel operator module of the edge gradient module, and the gradient map of the fourth image is obtained after being processed by the Sobel operator of the Sobel operator module.
  • the edge gradient module calls the dilation-and-erosion module it includes to perform dilation and erosion processing on the second label segmentation mask to obtain an edge mask, which marks the edge of the portrait (that is, an edge mask composed of 0s and 1s).
  • the edge gradient module can multiply the gradient map of the fourth image by the edge mask to obtain the edge gradient map of the fourth image, and multiply the fine segmentation mask by the edge mask to obtain the edge probability prediction map of the fine segmentation mask.
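Putting the last few steps together (Sobel gradient map, dilation-and-erosion edge mask, and the two multiplications), a minimal NumPy sketch of the edge gradient module might look as follows; the 3×3 Sobel kernels and 3×3 structuring element are common defaults, assumed here rather than taken from the application.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """3x3 cross-correlation with zero padding (adequate for a sketch)."""
    p = np.pad(img, 1)
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def sobel_gradient(gray):
    """Gradient magnitude of a single-channel image."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def dilate(mask):
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.max([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)

def erode(mask):
    p = np.pad(mask, 1, constant_values=1.0)
    h, w = mask.shape
    return np.min([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)

def edge_mask(label_mask):
    """Dilation minus erosion leaves a thin 0/1 band around the mask boundary."""
    return dilate(label_mask) - erode(label_mask)

# Toy example: a filled square stands in for the label portrait mask.
label = np.zeros((16, 16), dtype=np.float32)
label[4:12, 4:12] = 1.0
band = edge_mask(label)       # 0/1 edge mask around the portrait boundary
grad = sobel_gradient(label)  # stand-in for the input image's gradient map
edge_grad = grad * band       # edge gradient map
# A fine segmentation mask would likewise be multiplied by `band`
# to obtain its edge probability prediction map.
```

Because both products are masked by the same thin band, the edge gradient loss compares the prediction to the image only along the portrait boundary, which is where the extra supervision is wanted.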
  • the edge gradient module can calculate the cross-entropy loss and structural similarity loss (SSIM Loss) between the fine segmentation mask and the second annotation segmentation mask, and calculate the edge gradient loss (Edge Gradient Loss) between the edge gradient map of the fourth image and the edge probability prediction map of the fine segmentation mask.
  • the cross-entropy loss, the structural similarity loss, and the edge gradient loss are then summed.
  • the edge gradient module can perform backpropagation on the training model based on the sum of the calculated losses to update the model parameters.
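The second-stage loss just described sums three terms: cross-entropy, structural similarity, and edge gradient. A simplified numerical sketch is below; the single-window global SSIM (the usual formulation slides a local window) and the L1 form of the edge gradient term are illustrative assumptions, since the application treats these as prior-art computations.

```python
import numpy as np

def bce(pred, label, eps=1e-7):
    """Mean pixel-wise binary cross-entropy."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred)))

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM, computed over the whole image as a single window (a simplification)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return float(1.0 - ssim)

def edge_gradient_loss(edge_grad_map, edge_prob_map):
    """L1 distance between the two edge maps (an assumed form of the loss)."""
    return float(np.mean(np.abs(edge_grad_map - edge_prob_map)))

def second_stage_loss(fine_mask, label_mask, edge_grad_map, edge_prob_map):
    return (bce(fine_mask, label_mask)
            + ssim_loss(fine_mask, label_mask)
            + edge_gradient_loss(edge_grad_map, edge_prob_map))

rng = np.random.default_rng(2)
label = (rng.random((32, 32)) > 0.5).astype(np.float32)
fine = np.clip(label + 0.1 * rng.standard_normal((32, 32)), 0.0, 1.0)
total = second_stage_loss(fine, label, rng.random((32, 32)), rng.random((32, 32)))
```

In the real training loop this scalar would be backpropagated through the whole network; the SSIM term is what ties the prediction to the label's image structure, while the edge gradient term supplies extra gradient signal only at the boundary.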
  • the above second-stage training process can be repeated over multiple training cycles until the loss function of the training model fully converges; the model and its parameters are then saved, the edge gradient module is removed from the model obtained after the second-stage training, and the resulting model is determined as the pre-trained image segmentation model.
  • the pre-trained image segmentation model can be obtained by training in the above manner.
  • the third image and the fourth image may also be used without preset processing; in that case, when the pre-trained image segmentation model is applied to output the segmentation mask of an image, the image does not need to be preset-processed before being input to the pre-trained image segmentation model.
  • the calculation methods of the cross-entropy loss, the structural similarity loss, and the edge gradient loss are all calculation methods in the prior art, and thus are not repeated in the embodiments of the present application.
  • this application applies a structural similarity loss to portrait segmentation, so that the segmentation mask is consistent with the label mask in image structure, and additional gradients are provided to train the model and reduce false positive predictions.
  • This application designs an edge gradient module, which encourages the segmentation mask to be consistent with the input image in edge gradient, provides additional gradients for edge features, improves the refined segmentation effect at edges, and reduces false positive predictions.
  • This application adopts a lightweight design and utilizes a lightweight basic network in the multi-scale encoder, achieving a small amount of computation, so the model can be deployed on mobile terminals such as mobile phones.
  • the present application provides a lightweight portrait segmentation model that combines structural similarity and edge gradient.
  • the refinement module (i.e., the segmentation module) and the edge gradient module are applied to the portrait segmentation model simultaneously.
  • the two modules are combined and work together to improve the segmentation effect at edges and improve the accuracy of portrait segmentation.
  • this application adds structural similarity loss and edge gradient loss during model training to provide additional gradients.
  • the structural similarity loss and edge gradient loss can make the model pay more attention to the segmentation effect on the edge, and the additional gradient on the edge can promote the improvement of the edge segmentation effect.
  • This application designs a segmentation module consisting of three convolutional network blocks and one convolutional layer, which improves the segmentation effect while only introducing a small amount of computation.
  • the edge gradient module and deep feature supervision module in this application are removed in the final deployment, so no additional computing resource requirements are added.
  • the segmentation module can be designed to be more complex, for example, various neural networks can be used to implement it, and it only needs to output a segmentation mask at the end.
  • for example, ResNet blocks can be used in the segmentation module.
  • the number of layers of the feature pyramid can be adjusted flexibly depending on the specific data set.
  • the maximum downsampling factor can be 64 times, 32 times, 16 times, etc.; a larger factor captures more high-level feature information.
  • the multi-scale encoder can be implemented using various lightweight basic networks, such as ShuffleNet, MobileNetV3, etc.
  • the flow of the image processing method shown in FIG. 2 may include:
  • the electronic device acquires a first image.
  • the electronic device can use the pre-trained image segmentation model to segment images. For example, the electronic device may acquire the first image first.
  • the electronic device performs preset processing on the first image, where the preset processing includes random cropping and/or normalization processing.
  • the electronic device may perform preset processing on the first image.
  • the preset processing may include random cropping and/or normalization processing.
  • the electronic device obtains a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a portrait segmentation mask of an image, and the pre-trained image segmentation model includes a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a segmentation module; the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and segmentation module are connected in sequence; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the electronic device can also acquire a pre-trained image segmentation model, and the pre-trained image segmentation model can be used to output a portrait segmentation mask of an image.
  • the pretrained image segmentation model may include a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a segmentation module.
  • the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the user can pre-train the model according to the requirements of portrait segmentation, so that the pre-trained image segmentation model can output the portrait segmentation mask required by the user.
  • the electronic device inputs the image obtained by the preset processing of the first image into a pre-trained image segmentation model, and the pre-trained image segmentation model outputs a portrait segmentation mask corresponding to the first image.
  • the electronic device can input the image obtained by the preset processing of the first image into the pre-trained image segmentation model, which outputs a portrait segmentation mask corresponding to the first image.
  • the electronic device segments the portrait from the first image according to the portrait segmentation mask corresponding to the first image.
  • the electronic device may segment the corresponding portrait from the first image according to the portrait segmentation mask.
  • after segmenting the portrait from the first image, the electronic device performs background blur processing, background replacement processing, or portrait beautification processing on the first image according to the segmented portrait.
  • the electronic device may perform various processing on the first image according to the segmented portrait. For example, the electronic device may perform background blurring processing on the first image according to the segmented portrait, or the electronic device may perform background replacement processing on the first image according to the segmented portrait, or the electronic device may perform portrait beautification processing on the first image according to the segmented portrait.
  • since the portrait segmented by the electronic device from the first image has high precision, the background blurring processing, background replacement processing, portrait beautification processing, and other processing that the electronic device performs on the first image according to the segmented portrait will have a better effect, resulting in better image quality.
  • the image processing apparatus 300 may include: a first acquisition module 301 , a second acquisition module 302 , a processing module 303 , and a segmentation module 304 .
  • a first acquisition module 301 configured to acquire a first image
  • the second acquisition module 302 is configured to acquire a pre-trained image segmentation model, the pre-trained image segmentation model is used to output a segmentation mask of an image, the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each of the convolutional network blocks includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
  • a processing module 303 configured to input the first image into the pre-trained image segmentation model, and output a segmentation mask corresponding to the first image from the pre-trained image segmentation model;
  • a segmentation module 304 configured to segment a second image from the first image according to a segmentation mask corresponding to the first image.
  • the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module; the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  • the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, the deep feature supervision module is connected to the feature pyramid module, and the deep feature supervision module is used to supervise deep features at multiple scales;
  • the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, where N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges; the model is then saved, and the parameters of the model are not frozen.
  • the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the deep feature supervision module in the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is removed, and a segmentation module and an edge gradient module are added;
  • Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following processes:
  • the second preliminary segmentation mask and the input image are concatenated in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
  • the segmentation module 304 may also be used to:
  • the preset processing including random cropping and/or normalization processing
  • the inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image is subjected to the preset processing into the pre-trained image segmentation model.
  • An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed on a computer, causes the computer to execute the process in the image processing method provided in this embodiment.
  • An embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is configured to execute the process in the image processing method provided by the present embodiment by invoking a computer program stored in the memory.
  • the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 400 may include components such as a display screen 401, a memory 402, a processor 403, and the like.
  • FIG. 8 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the display screen 401 may be used to display information such as text, images, and the like.
  • Memory 402 may be used to store applications and data.
  • the application program stored in the memory 402 contains executable code.
  • Applications can be composed of various functional modules.
  • the processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
  • the processor 403 is the control center of the electronic device; it uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole.
  • the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 executes the application programs stored in the memory 402, thereby performing:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  • the electronic device 400 may include a display screen 401 , a memory 402 , a processor 403 , a battery 404 , a camera module 405 , a speaker 406 , a microphone 407 and other components.
  • the display screen 401 may be used to display information such as images, text, and the like.
  • Memory 402 may be used to store applications and data.
  • the application program stored in the memory 402 contains executable code.
  • Applications can be composed of various functional modules.
  • the processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
  • the processor 403 is the control center of the electronic device; it uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole.
  • the battery 404 can be used to provide power support for various modules and components of the electronic device, thereby ensuring the normal operation of the electronic device.
  • the camera module 405 can be used to capture images.
  • Speaker 406 may be used to play sound signals.
  • the microphone 407 may be used to collect sound signals in the surrounding environment.
  • the microphone 407 may be used to capture the user's voice commands.
  • the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 executes the application programs stored in the memory 402, thereby performing:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, wherein the multiple convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  • the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module; the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and segmentation module are connected in sequence.
  • the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, the deep feature supervision module is connected to the feature pyramid module, and the deep feature supervision module is used to supervise deep features at multiple scales;
  • the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges; the model is then saved, and the parameters of the model are not frozen.
  • the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the deep feature supervision module in the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is removed, and a segmentation module and an edge gradient module are added;
  • Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
  • the second preliminary segmentation mask and the input image are concatenated in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the cross-entropy loss and structural similarity loss between the fine segmentation mask and the second annotation segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map, are summed;
  • the model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
  • the processor 403 may also execute:
  • the preset processing including random cropping and/or normalization processing
  • the inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the preset processing of the first image into the pre-trained image segmentation model.
  • the image processing apparatus provided in the embodiments of the present application and the image processing method in the above embodiments belong to the same concept; any method provided in the image processing method embodiments can be executed on the image processing apparatus, and for the implementation process, reference may be made to the embodiments of the image processing method, which will not be repeated here.
  • the computer program can be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and the execution process can include the flow of the embodiments of the image processing method.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and the like.
  • each functional module may be integrated into one processing chip, or each module may exist physically alone, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software functional modules. If an integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.


Abstract

An image processing method and apparatus, a storage medium, and an electronic device. In the method, a pre-trained image segmentation model comprises at least a segmentation module; multiple convolutional network blocks comprised in the segmentation module are connected in sequence and then connected to at least one convolutional layer comprised in the segmentation module, each convolutional network block comprising a convolutional layer, a batch normalization layer, and a nonlinear activation layer. A first image is input into the pre-trained image segmentation model, which outputs a segmentation mask; a second image is then segmented from the first image according to the segmentation mask.

Description

图像处理方法、装置、存储介质以及电子设备Image processing method, device, storage medium, and electronic device
本申请要求于2020年7月23日提交中国专利局、申请号为202010718338.6、申请名称为“图像处理方法、装置、存储介质以及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed on July 23, 2020 with the application number 202010718338.6 and the application name "image processing method, device, storage medium and electronic device", the entire contents of which are incorporated by reference in this application.
技术领域technical field
本申请属于图像技术领域,尤其涉及一种图像处理方法、装置、存储介质及电子设备。The present application belongs to the field of image technology, and in particular, relates to an image processing method, device, storage medium and electronic device.
背景技术Background technique
图像分割是计算机视觉领域的一个基础课题。图像分割就是把图像分成若干个特定的、具有独特性质的区域并提出感兴趣目标的技术和过程。它是由图像处理到图像分析的关键步骤。Image segmentation is a fundamental topic in the field of computer vision. Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and proposing objects of interest. It is a key step from image processing to image analysis.
发明内容SUMMARY OF THE INVENTION
本申请实施例提供一种图像处理方法、装置、存储介质及电子设备,可以提高电子设备对图像进行分割的精度。Embodiments of the present application provide an image processing method, apparatus, storage medium, and electronic device, which can improve the accuracy of image segmentation by the electronic device.
第一方面,本申请实施例提供一种图像处理方法,所述方法包括:In a first aspect, an embodiment of the present application provides an image processing method, the method comprising:
获取第一图像;get the first image;
获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;Acquire a pre-trained image segmentation model, the pre-trained image segmentation model is used to output a segmentation mask of an image, the pre-trained image segmentation model includes at least a segmentation module, and the segmentation module includes a plurality of convolutional network blocks and at least one volume Layering, the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, the apparatus including:
a first acquisition module, configured to acquire a first image;
a second acquisition module, configured to acquire a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
a processing module, configured to input the first image into the pre-trained image segmentation model, the pre-trained image segmentation model outputting a segmentation mask corresponding to the first image;
a segmentation module, configured to segment a second image from the first image according to the segmentation mask corresponding to the first image.
In a third aspect, an embodiment of the present application provides a storage medium storing a computer program which, when executed on a computer, causes the computer to perform the image processing method provided by the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device including a memory and a processor, where the processor is configured to perform the image processing method provided by the embodiments of the present application by invoking a computer program stored in the memory.
Description of Drawings
The technical solutions of the present application and their beneficial effects will become apparent from the following detailed description of specific embodiments of the present application taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
FIG. 3 is a schematic structural diagram of a training model provided by an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a model including a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of each network block provided by an embodiment of the present application.
FIG. 6 is another schematic structural diagram of a training model provided by an embodiment of the present application.
FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
FIG. 9 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
Reference is made to the drawings, in which like reference numerals denote like components. The principles of the present application are illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be construed as limiting other specific embodiments not detailed herein.
It can be understood that the execution subject of the embodiments of the present application may be an electronic device such as a smartphone or a tablet computer.
The present application provides an image processing method, the method including:
acquiring a first image;
acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
inputting the first image into the pre-trained image segmentation model, and outputting, by the pre-trained image segmentation model, a segmentation mask corresponding to the first image;
segmenting a second image from the first image according to the segmentation mask corresponding to the first image.
In one embodiment, the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, where the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
In one embodiment, during model training, the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being used to supervise deep features at multiple scales.
During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training sample, where N is the number of layers of the feature pyramid;
acquiring a first annotated segmentation mask used as a label in the training sample;
calculating the cross-entropy loss between each of the N deep supervision prediction masks and the first annotated segmentation mask, and calculating the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
performing a back-propagation algorithm on the training model according to the plurality of calculated cross-entropy losses to update the model parameters;
repeating the model training process over multiple training epochs until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module has fully converged, and saving the model without freezing its parameters.
In one embodiment, the training model further includes an edge gradient module, the edge gradient module being used to provide an edge gradient loss function as one of the loss functions during model training.
In one embodiment, the edge gradient module is used to calculate an edge gradient map corresponding to an input image in a training sample;
inputting the input image in the training sample into the training model, and obtaining a second preliminary segmentation mask after sequential processing by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module;
inputting the second preliminary segmentation mask and the input image into the segmentation module, the segmentation module outputting a fine segmentation mask;
the edge gradient module is used to calculate an edge probability prediction map corresponding to the fine segmentation mask;
the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
In one embodiment, the deep feature supervision module is removed from the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
model training then continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module, and the edge gradient module, including the following process:
inputting the input image in a training sample into the training model, and obtaining a second preliminary segmentation mask after sequential processing by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module;
concatenating the second preliminary segmentation mask and the input image along the channel dimension and inputting the result into the segmentation module, the segmentation module outputting a fine segmentation mask;
inputting the input image into the edge gradient module, the edge gradient module invoking the Sobel operator it includes to perform the corresponding calculation on the input image to obtain a gradient map of the input image;
acquiring a second annotated segmentation mask used as a label in the training sample;
the edge gradient module invoking the dilation-erosion module it includes to perform dilation and erosion processing on the second annotated segmentation mask to obtain an edge mask;
multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
calculating the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask together with the edge gradient loss between the edge gradient map and the edge probability prediction map;
performing a back-propagation algorithm on the training model according to the sum of the calculated losses to update the model parameters;
repeating the model training process over multiple training epochs until the loss function of the model has fully converged, and saving the model and its parameters;
determining the model obtained by removing the edge gradient module from the trained model as the pre-trained image segmentation model.
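The edge-mask pipeline above (Sobel gradient map, dilation/erosion of the label mask, and the two edge-restricted maps) can be sketched in NumPy as follows. This is an illustrative toy implementation: the 3x3 structuring element, the zero-padded "same" convolution, and the use of a simple L1 distance as a stand-in for the patent's edge gradient loss are all assumptions for demonstration.

```python
import numpy as np

def conv2same(img, k):
    """3x3 'same' correlation with zero padding (enough for Sobel on small maps)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (p[i:i + 3, j:j + 3] * k).sum()
    return out

def sobel_magnitude(img):
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = conv2same(img, kx)       # horizontal Sobel response
    gy = conv2same(img, kx.T)     # vertical Sobel response
    return np.hypot(gx, gy)

def dilate(m):
    return (conv2same(m, np.ones((3, 3))) > 0).astype(float)

def erode(m):
    return (conv2same(m, np.ones((3, 3))) == 9).astype(float)

# label mask: a filled square; dilation minus erosion leaves a band around its contour
label = np.zeros((8, 8))
label[2:6, 2:6] = 1.0
edge_mask = dilate(label) - erode(label)

img = label.copy()                 # toy input whose edges coincide with the label
pred = label * 0.9                 # toy "fine segmentation mask" from the model

edge_gradient_map = sobel_magnitude(img) * edge_mask   # target edge gradients
edge_prob_map = pred * edge_mask                       # predicted edge probabilities

# assumed edge gradient loss: L1 between the (normalized) maps on the edge band
edge_grad_loss = np.abs(edge_gradient_map / (edge_gradient_map.max() + 1e-8)
                        - edge_prob_map).mean()
print(edge_mask.sum() > 0)   # True: the band around the square boundary is non-empty
```

Restricting both maps to the dilate-minus-erode band is what makes the loss focus on boundary quality rather than on the mask interior.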
In one embodiment, the method further includes:
performing preset processing on the first image, the preset processing including random cropping and/or normalization;
the inputting of the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image has undergone the preset processing into the pre-trained image segmentation model.
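The preset processing (random cropping followed by normalization) can be sketched as below. The crop size and the per-channel mean/std values (ImageNet statistics) are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def preset_process(img, crop_hw, mean, std, rng=None):
    """Random-crop an HWC uint8 image to crop_hw, then normalize each channel."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    ch, cw = crop_hw
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw].astype(np.float32) / 255.0
    return (crop - mean) / std            # per-channel standardization

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed ImageNet stats
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
out = preset_process(image, (224, 224), mean, std, rng)
print(out.shape)   # (224, 224, 3)
```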
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The process may include the following steps.
101. Acquire a first image.
Image segmentation is a fundamental topic in the field of computer vision. It refers to the techniques and processes of dividing an image into a number of specific regions with distinctive properties and extracting the objects of interest, and it is a key step from image processing to image analysis. In the related art, however, the accuracy with which electronic devices segment images is low.
In this embodiment of the present application, for example, the electronic device may first acquire a first image. It can be understood that the first image is the image on which image segmentation is to be performed.
102. Acquire a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
For example, the electronic device may also acquire a pre-trained image segmentation model that has been trained in advance, where the pre-trained image segmentation model can be used to output a segmentation mask of an image. The pre-trained image segmentation model may include at least a segmentation module; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block including a convolutional layer, a batch normalization layer (BN layer), and a nonlinear activation layer (ReLU layer).
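The convolution → batch normalization → ReLU structure of one convolutional network block can be sketched in NumPy as follows. This is an illustrative toy implementation: the 3x3 kernel, stride 1, padding 1, and the random weights are assumptions, not values taken from the patent.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W); normalize each channel over the batch and spatial dims."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, weight, gamma, beta):
    """One 'convolutional network block': 3x3 conv (stride 1, pad 1) -> BN -> ReLU."""
    n, c_in, h, w = x.shape
    c_out = weight.shape[0]
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros((n, c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = xp[:, :, i:i + 3, j:j + 3]          # (N, C_in, 3, 3)
            out[:, :, i, j] = np.tensordot(patch, weight,
                                           axes=([1, 2, 3], [1, 2, 3]))
    return relu(batch_norm(out, gamma, beta))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3)) * 0.1
y = conv_block(x, w, gamma=np.ones((1, 4, 1, 1)), beta=np.zeros((1, 4, 1, 1)))
print(y.shape)         # (2, 4, 8, 8): spatial size preserved, channels -> 4
print((y >= 0).all())  # True: ReLU output is non-negative
```

Several such blocks connected in sequence, followed by a final convolutional layer, form the segmentation module described above.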
It should be noted that the user can train the model in advance according to requirements, so that the pre-trained image segmentation model outputs the segmentation mask the user needs. For example, if the user needs the trained model for portrait segmentation, the pre-trained image segmentation model obtained after training should be a model that can output portrait segmentation masks. As another example, if the user needs the trained model to segment a specific object (such as a car or a potted plant), the pre-trained image segmentation model obtained after training should be a model that can output segmentation masks of that specific object, and so on.
103. Input the first image into the pre-trained image segmentation model, and output, by the pre-trained image segmentation model, a segmentation mask corresponding to the first image.
For example, after acquiring the first image and the pre-trained image segmentation model, the electronic device may input the first image into the pre-trained image segmentation model, and the pre-trained image segmentation model outputs the segmentation mask corresponding to the first image.
104. Segment a second image from the first image according to the segmentation mask corresponding to the first image.
For example, after obtaining the segmentation mask corresponding to the first image, the electronic device may segment the first image according to that segmentation mask to obtain the corresponding image, that is, the second image.
For example, after obtaining the portrait segmentation mask corresponding to the first image, the electronic device may segment the corresponding portrait from the first image according to the portrait segmentation mask.
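Applying a mask to cut the second image out of the first can be sketched as a simple element-wise multiplication. The threshold of 0.5 for binarizing a probability mask is an assumption for illustration.

```python
import numpy as np

def apply_mask(image, mask, threshold=0.5):
    """Keep pixels where the mask fires; pixels outside the mask become 0."""
    binary = (mask >= threshold).astype(image.dtype)
    return image * binary[..., None]     # broadcast over the channel axis

img = np.arange(2 * 2 * 3).reshape(2, 2, 3)   # tiny 2x2 RGB image
mask = np.array([[0.9, 0.1],
                 [0.8, 0.2]])                 # e.g. probabilities from the model
second = apply_mask(img, mask)
print(second[0, 0].tolist())  # [0, 1, 2] — foreground pixel kept
print(second[0, 1].tolist())  # [0, 0, 0] — background pixel removed
```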
It can be understood that, in this embodiment of the present application, the electronic device can acquire a first image and a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image. The pre-trained image segmentation model includes at least a segmentation module; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block including a convolutional layer, a batch normalization layer, and a nonlinear activation layer. The electronic device may then input the first image into the pre-trained image segmentation model, which outputs the segmentation mask corresponding to the first image, and may segment a second image from the first image according to that mask. Because the pre-trained image segmentation model includes a segmentation module in which a plurality of convolutional network blocks (each comprising a convolutional layer, a BN layer, and a ReLU layer) are connected in sequence and then connected to at least one convolutional layer, the electronic device can use the segmentation mask output by the model to segment the corresponding image from the first image more accurately. That is, the embodiments of the present application can improve the accuracy with which an electronic device segments images.
Referring to FIG. 2, FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
The training process of the pre-trained image segmentation model is described first.
In this embodiment of the present application, as shown in FIG. 3, the training model may include a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, a deep feature supervision module, a segmentation module, and an edge gradient module. The multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence. The deep feature supervision module is connected to the feature pyramid module and is used to supervise deep features at multiple scales. The edge gradient module is connected to the segmentation module and is used to provide an edge gradient loss function as one of the loss functions during model training.
During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training sample, where N is the number of layers of the feature pyramid;
a first annotated segmentation mask used as a label in the training sample is acquired;
the cross-entropy loss between each of the N deep supervision prediction masks and the first annotated segmentation mask is calculated, as is the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
a back-propagation algorithm is performed on the training model according to the plurality of calculated cross-entropy losses to update the model parameters;
the model training process is repeated over multiple training epochs until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module has fully converged, and the model is saved without freezing its parameters.
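The multiple cross-entropy losses of this first training stage (one per deep supervision mask, plus the decoder's own loss) can be sketched as below. This is a NumPy illustration with random stand-in masks; N = 4 matches the four pyramid levels described later, and binary cross-entropy is used as the per-pixel cross-entropy.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy between a predicted mask and the label."""
    p = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=(16, 16)).astype(np.float64)  # first annotated mask
decoder_mask = rng.random((16, 16))                           # first preliminary mask
deep_masks = [rng.random((16, 16)) for _ in range(4)]         # N = 4 pyramid levels

# one loss per deep-supervision mask, plus the decoder's loss; their sum drives backprop
losses = [bce(m, label) for m in deep_masks] + [bce(decoder_mask, label)]
total = sum(losses)
print(len(losses))   # 5
print(total > 0)     # True
```

In an actual training loop, `total` would be the scalar fed to back-propagation to update the parameters of all four modules.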
For example, in this embodiment, the model including the four modules (the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module) may be trained first; this training may be referred to as the first-stage training.
For the first-stage training, refer to FIG. 4, which is a schematic structural diagram of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module provided by this embodiment of the present application.
The input image is fed into the multi-scale encoder module; after processing by the multi-scale encoder module it is passed to the feature pyramid module, and after processing by the feature pyramid module the result is passed to the deep feature supervision module and the multi-scale decoder module, respectively. After the data corresponding to the image is processed by the deep feature supervision module, four upsampled masks at different scales can be obtained, denoted for example Mask32, Mask16, Mask8, and Mask4. The multi-scale decoder module can output the first preliminary segmentation mask. The backbone of the multi-scale encoder can be the MobileNetV2 network, which has strong feature extraction capability while remaining lightweight; feature maps at different scales are then extracted to form the feature pyramid. In the feature pyramid module, the numbers 320, 64, 32, and 24 on the feature pyramid diagram denote the numbers of channels, and 1/4, 1/8, 1/16, and 1/32 denote the downsampled resolutions relative to the original image. "conv" denotes the convolution performed by a convolutional layer, "up2x" denotes bilinear-interpolation 2x upsampling, and "4x" denotes bilinear-interpolation 4x upsampling. "cgr2x" denotes a first network block consisting, in sequence, of a convolutional layer, a Group Normalization layer, a ReLU layer, and a bilinear-interpolation 2x upsampling layer. "sgr2x" denotes a second network block consisting, in sequence, of a convolutional layer with the same number of input and output channels, a Group Normalization layer, a ReLU layer, and a bilinear-interpolation 2x upsampling layer. "sgr" is a third network block, namely sgr2x with the bilinear-interpolation 2x upsampling layer removed. Refer also to FIG. 5, in which (a) is a schematic structural diagram of the first network block cgr2x, (b) of the second network block sgr2x, and (c) of the third network block sgr.
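The bilinear-interpolation 2x upsampling (up2x) used by these blocks can be sketched in NumPy as follows. The align_corners-style sampling grid is an assumption for illustration; deep learning frameworks offer several conventions.

```python
import numpy as np

def up2x_bilinear(x):
    """Bilinear 2x upsampling of a (H, W) map (align_corners=True convention)."""
    h, w = x.shape
    oh, ow = 2 * h, 2 * w
    ys = np.linspace(0, h - 1, oh)            # source row coordinates
    xs = np.linspace(0, w - 1, ow)            # source column coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * x[np.ix_(y0, x0)]
            + (1 - wy) * wx * x[np.ix_(y0, x1)]
            + wy * (1 - wx) * x[np.ix_(y1, x0)]
            + wy * wx * x[np.ix_(y1, x1)])

m = np.array([[0.0, 1.0],
              [2.0, 3.0]])
u = up2x_bilinear(m)
print(u.shape)              # (4, 4)
print(u[0, 0], u[-1, -1])   # 0.0 3.0 — corner values are preserved
```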
The first-stage training process is described below taking the training of a portrait segmentation mask as an example. The model used for the first-stage training includes the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module.
First, the electronic device can obtain training samples and divide them into a test set and a training set in a ratio of 2:8. The electronic device can apply data augmentation to the samples in the training set, including random rotation, random horizontal flipping, random cropping, and gamma transformation. It should be noted that applying data augmentation to the training set both increases the amount of training data and improves the robustness of the trained model.
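The 2:8 split and two of the augmentations (random horizontal flip and gamma transform) can be sketched as below; the gamma range [0.7, 1.5] and the flip probability of 0.5 are assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples = 100
indices = rng.permutation(n_samples)
test_idx = indices[:n_samples // 5]     # 20% test
train_idx = indices[n_samples // 5:]    # 80% train — the 2:8 split

def augment(img, rng):
    """Light augmentation: random horizontal flip plus a random gamma transform."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                # random left-right flip
    gamma = rng.uniform(0.7, 1.5)         # assumed gamma range
    return np.clip(img, 0.0, 1.0) ** gamma

sample = rng.random((4, 4))               # toy grayscale image in [0, 1]
aug = augment(sample, rng)
print(len(train_idx), len(test_idx))  # 80 20
print(aug.shape)                      # (4, 4)
```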
During training, for example, the electronic device may take the image in a training sample that is to be input to the multi-scale encoder module, for example a third image, and first apply preset processing to the third image, where the preset processing may include random cropping and/or normalization.
Afterwards, the electronic device can input the image obtained by applying the preset processing to the third image into the multi-scale encoder; after processing by the multi-scale encoder, feature maps with resolutions of 1/4, 1/8, 1/16, and 1/32 of the third image are obtained. For example, as shown in FIG. 4, the multi-scale encoder contains five layers: a receiving layer (Layer0), a first layer (Layer1), a second layer (Layer2), a third layer (Layer3), and a fourth layer (Layer4). The receiving layer receives the input third image, and the first through fourth layers extract feature maps of the input image at different scales. For example, the first layer extracts a feature map at one quarter of the input image's resolution, the second layer at one eighth, the third layer at one sixteenth, and the fourth layer at one thirty-second. The electronic device can pass these feature maps to the feature pyramid module to obtain the corresponding feature pyramid, denoted for example the third feature pyramid. As shown in FIG. 4, the resolution of the third feature pyramid increases from top to bottom: the first-layer feature map of the third feature pyramid has 1/32 the resolution of the third image, the second-layer feature map 1/16, the third-layer feature map 1/8, and the fourth-layer feature map 1/4. Moreover, as shown in FIG. 4, the numbers of channels of the first-layer through fourth-layer feature maps of the third feature pyramid are 320, 64, 32, and 24 in sequence. In this embodiment, for example, the electronic device may denote the first-layer, second-layer, third-layer, and fourth-layer feature maps of the third feature pyramid as a1, b1, c1, and d1, respectively.
Afterwards, the electronic device can invoke the feature pyramid module to process the images in the third feature pyramid into images with a consistent number of channels, thereby obtaining a fourth feature pyramid composed of such images. For each pair of vertically adjacent images in the third feature pyramid, the feature pyramid module can use convolution and bilinear-interpolation 2x upsampling: the lower-resolution feature map, after 2x bilinear upsampling, is merged with the higher-resolution feature map in the third feature pyramid whose resolution matches the upsampled map, so that the channel counts of both images are processed to 128.
For example, for the first-layer feature map a1 and the second-layer feature map b1 in the third feature pyramid, the feature pyramid module may first apply convolution and bilinear-interpolation 2x upsampling to a1, obtaining, say, feature map a11, and apply convolution to b1, obtaining, say, feature map b11; the feature pyramid module may then add a11 and b11 and apply convolution to the sum, obtaining feature map b2 with 128 channels.
In addition, the electronic device can apply convolution to feature map a1 to obtain feature map a2, which has 128 channels.
For the second-layer feature map b1 and the third-layer feature map c1 in the third feature pyramid, the feature pyramid module can first take feature maps a11 and b11 and add them to obtain b12, then apply bilinear-interpolation 2x upsampling to b12 to obtain feature map b13, then apply convolution to c1 to obtain feature map c11, and finally add b13 and c11 and apply convolution to the sum to obtain feature map c2, which has 128 channels.
For the third-layer feature map c1 and the fourth-layer feature map d1 in the third feature pyramid, the feature pyramid module can first apply convolution to d1 to obtain feature map d11, then take the feature map c12 obtained by adding b13 and c11, apply bilinear-interpolation 2x upsampling to c12 to obtain feature map c13, and finally add c13 and d11 and apply convolution to the sum to obtain feature map d2, which has 128 channels.
即,第四特征金字塔有特征图a2、b2、c2、d2构成,它们的通道数均为128,其中,特征图a2、b2、c2、d2的分辨率依次为第三图像的1/32、1/16、1/8、1/4。That is, the fourth feature pyramid is composed of feature maps a2, b2, c2, and d2, and their number of channels is 128. The resolutions of feature maps a2, b2, c2, and d2 are 1/32 of the third image, 1/16, 1/8, 1/4.
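The merge described above can be sketched at the shape level. The NumPy sketch below is an illustration, not the patent's implementation: it assumes a 256×256 third image, uses a random 1×1 channel projection as a stand-in for the convolutions, and uses nearest-neighbour repetition as a stand-in for bilinear 2× upsampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """Project a (C, H, W) map to out_ch channels with a random 1x1 kernel."""
    w = rng.standard_normal((out_ch, x.shape[0]))
    return np.einsum('oc,chw->ohw', w, x)

def upsample2x(x):
    """2x spatial upsampling (nearest-neighbour stand-in for bilinear)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Third feature pyramid for a 256x256 input: resolutions 1/32 ... 1/4.
a1 = rng.standard_normal((512, 8, 8))     # top level, lowest resolution
b1 = rng.standard_normal((256, 16, 16))
c1 = rng.standard_normal((128, 32, 32))
d1 = rng.standard_normal((64, 64, 64))

a11 = upsample2x(conv1x1(a1, 128))        # conv + 2x upsample of a1
b11 = conv1x1(b1, 128)                    # conv of b1
b2 = conv1x1(a11 + b11, 128)              # add, then conv -> 128 channels

a2 = conv1x1(a1, 128)                     # top level projected directly

b13 = upsample2x(a11 + b11)               # b12 = a11 + b11, then 2x up
c11 = conv1x1(c1, 128)
c2 = conv1x1(b13 + c11, 128)

c13 = upsample2x(b13 + c11)               # c12 = b13 + c11, then 2x up
d11 = conv1x1(d1, 128)
d2 = conv1x1(c13 + d11, 128)

# Fourth pyramid: 128 channels at 1/32, 1/16, 1/8, 1/4 of the input.
for m, frac in [(a2, 32), (b2, 16), (c2, 8), (d2, 4)]:
    assert m.shape == (128, 256 // frac, 256 // frac)
```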
The electronic device may then invoke the deep feature supervision module and pass the feature maps of the fourth feature pyramid, from top to bottom, through the module's four upsampling layers, which upsample by 32×, 16×, 8×, and 4× respectively, obtaining mask images of the same size as the third image (i.e., deeply supervised prediction masks); these four masks may be denoted Mask32, Mask16, Mask8, and Mask4.
In addition, the electronic device may invoke the multi-scale decoder module to process the images in the fourth feature pyramid so that the feature map at every level has a resolution of 1/4 of the third image. For example, as shown in Figure 4, the first-level feature map a2 of the fourth feature pyramid may pass through two first network blocks cgr2x and one second network block sgr2x in turn to yield an image at 1/4 the resolution of the third image. The second-level feature map b2 may pass through one first network block cgr2x and one second network block sgr2x in turn to yield an image at 1/4 the resolution of the third image. The third-level feature map c2 may pass through one second network block sgr2x to yield an image at 1/4 the resolution of the third image. The fourth-level feature map d2 may pass through one third network block sgr to yield an image at 1/4 the resolution of the third image. The multi-scale decoder module may then apply addition, convolution, and 4× upsampling in turn to the four resulting images, each at 1/4 the resolution of the third image, to obtain a preliminary segmentation mask, denoted here the first preliminary segmentation mask.
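As a sanity check on the block counts above: a level sitting at 1/f of the input resolution needs log2(f/4) successive 2× upsampling blocks to reach 1/4 resolution, which matches two cgr2x plus one sgr2x for a2 (1/32), one cgr2x plus one sgr2x for b2 (1/16), one sgr2x for c2 (1/8), and the non-upsampling sgr block for d2 (1/4).

```python
import math

# Number of 2x-upsampling blocks needed per pyramid level (1/f -> 1/4).
doubling_blocks = {f: int(math.log2(f / 4)) for f in (32, 16, 8, 4)}
assert doubling_blocks == {32: 3, 16: 2, 8: 1, 4: 0}
```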
The model may then obtain the first annotated segmentation mask used as the label of the training sample. It can be understood that the first annotated segmentation mask is the accurate portrait segmentation mask corresponding to the third image in this training sample.
The model (comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module) may then compute the cross-entropy loss between each of the four deeply supervised prediction masks Mask32, Mask16, Mask8, and Mask4 output by the deep feature supervision module and the first annotated segmentation mask, as well as the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask. According to these five computed cross-entropy losses, the backpropagation algorithm is then executed on the model and its parameters are updated.
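The five-term first-stage loss can be illustrated with a generic pixel-wise binary cross-entropy, summed over the four deeply supervised masks and the first preliminary mask. This is a sketch: the patent does not spell out the exact formulation, and the random arrays below stand in for model outputs and the label.

```python
import numpy as np

rng = np.random.default_rng(1)

def bce(pred, label, eps=1e-7):
    """Pixel-wise binary cross-entropy between a probability map and a 0/1 label."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred)))

label = (rng.random((64, 64)) > 0.5).astype(float)   # first annotated mask
# Mask32, Mask16, Mask8, Mask4 plus the first preliminary mask: 5 predictions.
preds = [rng.random((64, 64)) for _ in range(5)]

total_loss = sum(bce(p, label) for p in preds)        # drives backpropagation
assert total_loss > 0.0
```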
Over multiple training epochs, the electronic device may repeat the above process of training the model (comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module) with the training samples until the model's loss function has fully converged, then save the model without freezing its parameters.
After obtaining the trained model comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module, the electronic device may perform the second stage of training. For the second stage, the deep feature supervision module may be removed from the model obtained in the first stage, and a segmentation module and an edge gradient module may be added.
In this embodiment, the edge gradient module is configured to compute the edge gradient map corresponding to the input image of a training sample;
the input image of the training sample is fed into the training model and processed in turn by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
the second preliminary segmentation mask and the input image are fed into the segmentation module, which outputs a fine segmentation mask;
the edge gradient module is configured to compute the edge probability prediction map corresponding to the fine segmentation mask;
the edge gradient loss function provided by the edge gradient module is used to compute the edge gradient loss between the edge gradient map and the edge probability prediction map.
In this embodiment, model training continues (i.e., the second stage of training) on the model comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, segmentation module, and edge gradient module, and includes the following flow:
the input image of the training sample is fed into the training model and processed in turn by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
the second preliminary segmentation mask and the input image are concatenated along the channel dimension and fed into the segmentation module, which outputs a fine segmentation mask;
the input image is fed into the edge gradient module, which invokes the Sobel operator it includes to perform the corresponding computation on the input image, obtaining the gradient map of the input image;
the second annotated segmentation mask used as the label of the training sample is obtained;
the edge gradient module invokes the dilation-erosion module it includes to apply dilation and erosion to the second annotated segmentation mask, obtaining an edge mask;
the gradient map of the input image is multiplied by the edge mask to obtain the edge gradient map corresponding to the input image;
the fine segmentation mask is multiplied by the edge mask to obtain the edge probability prediction map corresponding to the fine segmentation mask;
the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask are computed;
the edge gradient loss between the edge gradient map and the edge probability prediction map is computed;
the cross-entropy loss and structural similarity loss between the fine segmentation mask and the second annotated segmentation mask and the edge gradient loss between the edge gradient map and the edge probability prediction map are summed;
the backpropagation algorithm is executed on the training model according to the computed sum of losses, and the model parameters are updated;
the model training process is repeated over multiple training epochs until the model's loss function has fully converged, and the model and its parameters are saved;
the model obtained by removing the edge gradient module from the model obtained after training is determined to be the pre-trained image segmentation model.
For example, as shown in Figure 6, during the second stage of training the electronic device may obtain the image in the training sample that is to be input to the multi-scale encoder module, denoted here the fourth image, and first apply preset processing to it, where the preset processing may include random cropping and/or normalization.
The electronic device may then feed the preprocessed fourth image into the training model, where it is processed in turn by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module, yielding a preliminary segmentation mask, denoted here the second preliminary segmentation mask.
The electronic device may then invoke the training model to apply concat processing to the preprocessed fourth image and the second preliminary segmentation mask and pass the result to the segmentation module, where concat processing means concatenating the two maps along the channel dimension.
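A minimal illustration of the concat step, assuming a CHW layout in which a 3-channel image and a 1-channel mask join into a 4-channel input for the segmentation module:

```python
import numpy as np

image = np.zeros((3, 128, 128))          # preprocessed fourth image (C, H, W)
prelim_mask = np.zeros((1, 128, 128))    # second preliminary segmentation mask

# Concatenate along the channel dimension (axis 0 in CHW layout).
x = np.concatenate([image, prelim_mask], axis=0)
assert x.shape == (4, 128, 128)
```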
The image input to the segmentation module is processed in turn by three convolutional network blocks and one convolutional layer Conv, after which the fine segmentation mask is output. Each convolutional network block consists of a convolutional layer, a BN layer, and a ReLU layer in sequence. Note that the fine segmentation mask is a two-channel probability prediction map on which no argmax operation has been performed.
The electronic device may then obtain the second annotated segmentation mask used as the label of the training sample. It can be understood that the second annotated segmentation mask is the accurate portrait segmentation mask corresponding to the fourth image in this training sample. The electronic device may input the second preliminary segmentation mask, the second annotated segmentation mask, and the fourth image to the edge gradient module. The fourth image is passed to the Sobel operator submodule of the edge gradient module and, after processing by the Sobel operator, the gradient map of the fourth image is obtained.
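The Sobel step can be sketched as follows. The sketch uses the standard 3×3 Sobel kernels on a grayscale image and combines the two responses into a gradient magnitude map; this is one common convention, not necessarily the patent's exact computation.

```python
import numpy as np

def sobel_gradient(img):
    """Gradient magnitude of a 2D image via 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(img, 1, mode='edge')      # replicate borders
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):                   # explicit 3x3 correlation
        for j in range(3):
            win = p[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                          # vertical step edge
grad = sobel_gradient(img)
assert grad.shape == img.shape
assert grad[:, 3:5].max() > 0             # strong response at the edge
assert grad[:, 0].max() == 0              # flat region gives zero gradient
```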
The edge gradient module then invokes the dilation-erosion module it includes to apply dilation and erosion to the second annotated segmentation mask, obtaining an edge mask, which represents the edge of the portrait (i.e., an edge mask composed of 0s and 1s).
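One common way to realize the dilation-erosion step is to subtract the eroded label mask from the dilated one, leaving a band of 1s along the portrait boundary. The sketch below implements 3×3 morphology directly in NumPy; a real pipeline might use OpenCV's dilate/erode instead, and the square "portrait" is a toy label.

```python
import numpy as np

def shift_stack(m):
    """Stack the 9 shifted views of m under a 3x3 structuring element."""
    p = np.pad(m, 1, mode='edge')
    h, w = m.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def dilate(m):
    return shift_stack(m).max(axis=0)

def erode(m):
    return shift_stack(m).min(axis=0)

label = np.zeros((10, 10))
label[3:7, 3:7] = 1.0                     # toy binary "portrait" label

edge_mask = dilate(label) - erode(label)  # 1s only in a boundary band

assert set(np.unique(edge_mask)) <= {0.0, 1.0}
assert edge_mask[4, 4] == 0.0             # interior pixel: not edge
assert edge_mask[3, 3] == 1.0             # boundary pixel: edge
```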
The edge gradient module may then multiply the gradient map of the fourth image by the edge mask to obtain the edge gradient map of the fourth image, and multiply the fine segmentation mask by the edge mask to obtain the edge probability prediction map of the fine segmentation mask.
The edge gradient module may then compute the cross-entropy loss and the structural similarity loss (SSIM loss) between the fine segmentation mask and the second annotated segmentation mask, and compute the edge gradient loss between the edge gradient map of the fourth image and the edge probability prediction map of the fine segmentation mask. It may then sum these losses: the cross-entropy and structural similarity losses between the fine segmentation mask and the second annotated segmentation mask, plus the edge gradient loss between the edge gradient map of the fourth image and the edge probability prediction map of the fine segmentation mask. According to the computed sum of losses, the edge gradient module may execute the backpropagation algorithm on the training model and update the model parameters.
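The summed second-stage loss can be sketched as below. Since the text defers to standard formulations, a global single-window SSIM and an L1 edge term are used here as plausible stand-ins, and the inputs are random placeholders rather than real model outputs.

```python
import numpy as np

rng = np.random.default_rng(2)

def bce(pred, label, eps=1e-7):
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred)))

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM over the whole image (single window, a simplification)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - float(ssim)

def edge_gradient_loss(edge_grad, edge_prob):
    """L1 distance between edge gradient map and edge probability map (stand-in)."""
    return float(np.mean(np.abs(edge_grad - edge_prob)))

fine_mask = rng.random((32, 32))                      # fine segmentation mask
label = (rng.random((32, 32)) > 0.5).astype(float)    # second annotated mask
edge_grad_map = rng.random((32, 32))                  # image gradient * edge mask
edge_prob_map = rng.random((32, 32))                  # fine mask * edge mask

total = (bce(fine_mask, label)
         + ssim_loss(fine_mask, label)
         + edge_gradient_loss(edge_grad_map, edge_prob_map))
assert total > 0.0
```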
The second-stage training process above may be repeated over multiple training epochs until the loss function of the training model has fully converged; the model and its parameters are saved, and the model obtained by removing the edge gradient module from the model produced by the second stage of training is determined to be the pre-trained image segmentation model.
The pre-trained image segmentation model can thus be obtained through the training procedure above.
Note that in other embodiments the preset processing may be omitted for the third and fourth images; in that case, when the pre-trained image segmentation model is applied to output the segmentation mask of an image, the image need not undergo the preset processing before being input to the pre-trained image segmentation model.
In the embodiments of this application, the cross-entropy loss, the structural similarity loss, and the edge gradient loss are all computed in ways known in the prior art, so the details are not repeated here.
Note that this application applies a structural similarity loss to portrait segmentation, which keeps the segmentation mask consistent with the label mask in terms of image structure, provides additional gradients for training the model, and reduces false-positive predictions.
This application designs an edge gradient module that encourages the segmentation mask to remain consistent with the input image in terms of edge gradients, provides additional gradients for edge features, improves the refinement of segmentation at edges, and reduces false-positive predictions.
This application adopts a lightweight design, using a lightweight backbone network in the multi-scale encoder to achieve a small computational cost, and can therefore be deployed on mobile terminals such as mobile phones.
This application provides a lightweight portrait segmentation model that combines structural similarity and edge gradients: the refinement module (i.e., the segmentation module) and the edge gradient module are applied to the portrait segmentation model together, jointly improving segmentation at edges and increasing portrait segmentation accuracy.
In addition to the cross-entropy loss, this application adds a structural similarity loss and an edge gradient loss during model training to provide additional gradients. Compared with the cross-entropy loss, the structural similarity loss and the edge gradient loss better drive the model to focus on segmentation quality at edges, and the additional gradients at edges promote improved edge segmentation.
This application designs a segmentation module consisting of three convolutional network blocks and one convolutional layer, which improves segmentation quality while introducing only a small computational cost. The edge gradient module and the deep feature supervision module of this application are removed at final deployment, so they add no extra computational resource requirements.
In addition, in this application the segmentation module may be designed to be more complex, for example implemented with various neural networks, as long as it ultimately outputs a segmentation mask; for instance, ResNet blocks may be used in the segmentation module.
In this application, the number of levels of the feature pyramid can be adjusted flexibly according to the specific dataset, and the maximum downsampling factor may be 64×, 32×, 16×, and so on; a larger downsampling factor incurs more computation but provides more high-level feature information. The multi-scale encoder may be implemented with various lightweight backbone networks, such as ShuffleNet or MobileNetV3.
The flow of the image processing method shown in Figure 2 may include:
201. The electronic device acquires a first image.
For example, after obtaining the pre-trained image segmentation model through the training procedure described above, the electronic device may use the model to segment images. To do so, the electronic device may first acquire a first image.
202. The electronic device applies preset processing to the first image, where the preset processing includes random cropping and/or normalization.
For example, after acquiring the first image, the electronic device may apply the preset processing, which may include random cropping and/or normalization, to it.
203. The electronic device obtains a pre-trained image segmentation model used to output a portrait segmentation mask of an image. The pre-trained image segmentation model includes a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a segmentation module, connected in sequence. The segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer; the convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
For example, the electronic device may also obtain a pre-trained image segmentation model that can be used to output a portrait segmentation mask of an image, structured as described above. Note that the user may train the model in advance according to portrait segmentation requirements, so that the pre-trained image segmentation model outputs the portrait segmentation mask the user needs.
204. The electronic device inputs the image obtained by applying the preset processing to the first image into the pre-trained image segmentation model, which outputs the portrait segmentation mask corresponding to the first image.
For example, after obtaining the first image and the pre-trained image segmentation model, the electronic device may feed the preprocessed first image into the pre-trained image segmentation model, which outputs the portrait segmentation mask corresponding to the first image.
205. The electronic device segments the portrait from the first image according to the portrait segmentation mask corresponding to the first image.
For example, after obtaining the portrait segmentation mask corresponding to the first image, the electronic device may segment the corresponding portrait from the first image according to that mask.
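Step 205 amounts to masking the image with the predicted portrait mask. A minimal sketch, assuming an HWC layout and a binary mask of the same height and width as the image:

```python
import numpy as np

image = np.ones((64, 64, 3)) * 0.5            # the first image (H, W, C)
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0                      # portrait segmentation mask

# Broadcast the mask over the colour channels; background pixels become 0.
portrait = image * mask[:, :, None]
assert portrait[32, 32].tolist() == [0.5, 0.5, 0.5]   # inside the portrait
assert portrait[0, 0].tolist() == [0.0, 0.0, 0.0]     # background removed
```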
206. After segmenting the portrait from the first image, the electronic device applies background blurring, background replacement, or portrait beautification to the first image according to the segmented portrait.
For example, after segmenting the portrait from the first image, the electronic device may apply various kinds of processing to the first image according to the segmented portrait, such as background blurring, background replacement, or portrait beautification.
It is easy to understand that, because the portrait segmented from the first image has high accuracy, processing such as background blurring, background replacement, or portrait beautification applied by the electronic device to the first image according to the segmented portrait will be more effective, yielding an image of better quality.
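Background blurring in step 206 can be sketched as blurring the full image and compositing the sharp portrait back in with the mask, i.e. out = mask·image + (1−mask)·blurred. A 3×3 box blur stands in here for a real bokeh kernel, and the inputs are placeholders.

```python
import numpy as np

def box_blur3(img):
    """Simple 3x3 box blur of an (H, W, C) image with replicated borders."""
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode='edge')
    h, w, _ = img.shape
    acc = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            acc += p[i:i + h, j:j + w]
    return acc / 9.0

image = np.random.default_rng(3).random((32, 32, 3))
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0                        # portrait region

m = mask[:, :, None]
out = m * image + (1 - m) * box_blur3(image)  # sharp portrait, blurred background
assert out.shape == image.shape
assert np.allclose(out[16, 16], image[16, 16])   # portrait stays sharp
```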
Please refer to Figure 7, which is a schematic structural diagram of the image processing apparatus provided by an embodiment of this application. The image processing apparatus 300 may include: a first acquisition module 301, a second acquisition module 302, a processing module 303, and a segmentation module 304.
The first acquisition module 301 is configured to acquire a first image;
the second acquisition module 302 is configured to obtain a pre-trained image segmentation model used to output a segmentation mask of an image, where the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
the processing module 303 is configured to input the first image into the pre-trained image segmentation model, which outputs the segmentation mask corresponding to the first image;
the segmentation module 304 is configured to segment a second image from the first image according to the segmentation mask corresponding to the first image.
In one embodiment, the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, and the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and segmentation module are connected in sequence.
In one embodiment, during model training, the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being configured to supervise deep features at multiple scales;
during model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deeply supervised prediction masks corresponding to the training sample, where N is the number of levels of the feature pyramid;
the first annotated segmentation mask used as the label of the training sample is obtained;
the cross-entropy loss between each of the N deeply supervised prediction masks and the first annotated segmentation mask is computed, as is the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
the backpropagation algorithm is executed on the training model according to the computed cross-entropy losses, and the model parameters are updated;
the model training process is repeated over multiple training epochs until the loss function of the model comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module has fully converged, and the model is saved without freezing its parameters.
在一种实施方式中,训练模型还包括边缘梯度模块,所述边缘梯度模块用于提供边缘梯度损失函数作为模型训练时的其中一个损失函数。In one embodiment, the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
在一种实施方式中,所述边缘梯度模块用于计算训练样本中的输入图像对应的边缘梯度图;In one embodiment, the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample;
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像输入至分割模块,由分割模块输出精细分割掩模;inputting the second preliminary segmentation mask and the input image to a segmentation module, and the segmentation module outputs a fine segmentation mask;
所述边缘梯度模块用于计算所述精细分割掩模对应的边缘概率预测图;The edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask;
所述边缘梯度模块提供的边缘梯度损失函数用于计算所述边缘梯度图和所述边缘概率预测图之间的边缘梯度损失。The edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
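The edge gradient loss function of this embodiment compares the input image's edge gradient map with the edge probability prediction map. The exact distance is not fixed by the text, so the sketch below assumes an L1 (mean absolute difference) form:

```python
import numpy as np

def edge_gradient_loss(edge_gradient_map, edge_prob_map):
    """Mean absolute difference between the input image's edge gradient map and
    the edge probability prediction map derived from the fine segmentation mask
    (the L1 form is an assumption, not specified by the text)."""
    return float(np.mean(np.abs(edge_gradient_map - edge_prob_map)))

# Hypothetical 2x2 maps: identical maps give zero loss.
a = np.array([[0.0, 1.0], [0.5, 0.0]])
b = np.array([[0.0, 0.5], [0.5, 0.0]])
loss_same = edge_gradient_loss(a, a)
loss_diff = edge_gradient_loss(a, b)
```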
在一种实施方式中,将保存的包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型中的所述深层特征监督模块移除,并加入分割模块和边缘梯度模块;In one embodiment, the deep feature supervision module is removed from the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
基于包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块和边缘梯度模块的模型继续进行模型训练,包括如下流程:Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following processes:
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像在通道维度进行拼接后输入至分割模块,由分割模块输出精细分割掩模;the second preliminary segmentation mask and the input image are concatenated along the channel dimension and then input to the segmentation module, which outputs the fine segmentation mask;
将所述输入图像输入至所述边缘梯度模块,由所述边缘梯度模块调用其所包括的索贝尔算子对所述输入图像进行相应的计算,得到所述输入图像的梯度图;Inputting the input image to the edge gradient module, and the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
获取训练样本中用作标注的第二标注分割掩模;Obtain the second annotation segmentation mask used as annotation in the training sample;
由所述边缘梯度模块调用其所包括的膨胀腐蚀模块对所述第二标注分割掩模进行膨胀腐蚀处理,得到边缘掩模;the edge gradient module invokes its dilation-erosion module to perform dilation and erosion on the second annotated segmentation mask to obtain an edge mask;
将所述输入图像的梯度图与所述边缘掩模相乘,得到所述输入图像对应的边缘梯度图;Multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
将所述精细分割掩模与所述边缘掩模相乘,得到所述精细分割掩模对应的边缘概率预测图;multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
计算所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失;calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
计算所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失;calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
对所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失以及所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失进行损失求和处理;summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map;
根据计算得到的损失之和对训练模型执行反向传播算法,更新模型参数;Perform back-propagation algorithm on the training model according to the sum of the calculated losses, and update the model parameters;
在多个训练周期内重复模型训练的过程直至模型的损失函数完全收敛,保存模型及其模型参数;Repeat the process of model training in multiple training cycles until the loss function of the model is completely converged, and save the model and its model parameters;
将训练完成后得到的模型中的边缘梯度模块去除后得到的模型确定为预训练图像分割模型。The model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
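The edge-gradient branch of the training flow above can be sketched end to end in NumPy. This is an illustrative reconstruction, not the patent's implementation: the 3x3 structuring element, the grayscale input, the L1 form of the edge gradient loss, and the single-window structural similarity are all assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv3x3(img, kernel):
    """Naive 3x3 'same' convolution with zero padding."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def sobel_gradient_map(img):
    """Gradient magnitude of the input image (the Sobel step of the flow)."""
    return np.sqrt(conv3x3(img, SOBEL_X) ** 2 + conv3x3(img, SOBEL_Y) ** 2)

def dilate(mask):
    """Binary dilation with a 3x3 structuring element (zero padding)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.max([p[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)

def erode(mask):
    """Binary erosion with a 3x3 structuring element (zero padding)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.min([p[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)

def edge_mask(annotated_mask):
    """Dilation minus erosion of the annotated mask: a band around the boundary."""
    return dilate(annotated_mask) - erode(annotated_mask)

def bce(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM over a single global window (a simplification of patch-based SSIM)."""
    mp, mt = pred.mean(), target.mean()
    cov = ((pred - mp) * (target - mt)).mean()
    ssim = ((2 * mp * mt + c1) * (2 * cov + c2)) / (
        (mp ** 2 + mt ** 2 + c1) * (pred.var() + target.var() + c2))
    return float(1.0 - ssim)

# Toy data: a 7x7 image containing a bright 3x3 square, with a matching label.
image = np.zeros((7, 7))
image[2:5, 2:5] = 1.0
label_mask = image.copy()   # second annotated segmentation mask
fine_mask = image.copy()    # pretend the segmentation module is perfect

band = edge_mask(label_mask)                            # edge mask
edge_gradient_map = sobel_gradient_map(image) * band    # input-image edge gradient map
edge_prob_map = fine_mask * band                        # edge probability prediction map

# Loss summation: cross-entropy + structural similarity + edge gradient (L1 assumed).
edge_grad_loss = float(np.mean(np.abs(edge_gradient_map - edge_prob_map)))
total = bce(fine_mask, label_mask) + ssim_loss(fine_mask, label_mask) + edge_grad_loss
```

In training, this summed scalar would drive backpropagation; the sketch only shows how the three maps and the three loss terms fit together.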
在一种实施方式中,所述分割模块304还可以用于:In one embodiment, the segmentation module 304 may also be used to:
对所述第一图像进行预设处理,所述预设处理包括随机裁剪和/或归一化处理;performing preset processing on the first image, the preset processing including random cropping and/or normalization processing;
所述将所述第一图像输入至所述预训练图像分割模型,包括:将所述第一图像经过所述预设处理后得到的图像输入至所述预训练图像分割模型。The inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image is subjected to the preset processing into the pre-trained image segmentation model.
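The preset preprocessing above (random cropping and/or normalization) can be sketched as follows; the crop size and the per-channel mean/std scheme are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng):
    """Crop a random (crop_h, crop_w) window out of an (H, W, C) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def normalize(img, mean, std):
    """Per-channel normalization: (x - mean) / std."""
    return (img - np.asarray(mean)) / np.asarray(std)

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3))          # hypothetical H x W x C first image
cropped = random_crop(image, 24, 24, rng)
normalized = normalize(cropped, mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25])
```

The preprocessed array `normalized` is what would then be fed to the pre-trained image segmentation model.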
本申请实施例提供一种计算机可读的存储介质,其上存储有计算机程序,当所述计算机程序在计算机上执行时,使得所述计算机执行如本实施例提供的图像处理方法中的流程。An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed on a computer, causes the computer to execute the process in the image processing method provided in this embodiment.
本申请实施例还提供一种电子设备,包括存储器,处理器,所述处理器通过调用所述存储器中存储的计算机程序,用于执行本实施例提供的图像处理方法中的流程。An embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is configured to execute the process in the image processing method provided by the present embodiment by invoking a computer program stored in the memory.
例如,上述电子设备可以是诸如平板电脑或者智能手机等移动终端。请参阅图8,图8为本申请实施例提供的电子设备的结构示意图。For example, the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone. Please refer to FIG. 8 , which is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
该电子设备400可以包括显示屏401、存储器402、处理器403等部件。本领域技术人员可以理解,图8中示出的电子设备结构并不构成对电子设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。The electronic device 400 may include components such as a display screen 401, a memory 402, a processor 403, and the like. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 8 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
显示屏401可以用于显示诸如文字、图像等信息。The display screen 401 may be used to display information such as text, images, and the like.
存储器402可用于存储应用程序和数据。存储器402存储的应用程序中包含有可执行代码。应用程序可以组成各种功能模块。处理器403通过运行存储在存储器402的应用程序,从而执行各种功能应用以及数据处理。 Memory 402 may be used to store applications and data. The application program stored in the memory 402 contains executable code. Applications can be composed of various functional modules. The processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
处理器403是电子设备的控制中心,利用各种接口和线路连接整个电子设备的各个部分,通过运行或执行存储在存储器402内的应用程序,以及调用存储在存储器402内的数据,执行电子设备的各种功能和处理数据,从而对电子设备进行整体监控。The processor 403 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and invoking the data stored in the memory 402, thereby monitoring the electronic device as a whole.
在本实施例中,电子设备中的处理器403会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行代码加载到存储器402中,并由处理器403来运行存储在存储器402中的应用程序,从而执行:In this embodiment, the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby executing:
获取第一图像;get the first image;
获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
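The segmentation module structure described above (several convolutional network blocks, each a convolution, batch normalization, and nonlinear activation, chained and followed by at least one convolution layer) can be sketched as follows. This is a minimal single-channel NumPy illustration; kernel sizes, channel counts, and the inference-style batch normalization without affine parameters are assumptions:

```python
import numpy as np

def conv3x3(x, kernel):
    """Naive 3x3 'same' convolution (zero padding) on a single-channel map."""
    p = np.pad(x, 1)
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def batch_norm(x, eps=1e-5):
    """Normalize a feature map to zero mean, unit variance (no learned affine)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    """Nonlinear activation layer."""
    return np.maximum(x, 0.0)

def conv_block(x, kernel):
    """One convolutional network block: convolution -> batch norm -> activation."""
    return relu(batch_norm(conv3x3(x, kernel)))

def segmentation_module(x, block_kernels, final_kernel):
    """Several conv blocks connected in sequence, then a final convolution layer."""
    for k in block_kernels:
        x = conv_block(x, k)
    return conv3x3(x, final_kernel)

rng = np.random.default_rng(1)
feature = rng.standard_normal((8, 8))
kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in range(3)]
out = segmentation_module(feature, kernels, rng.standard_normal((3, 3)) * 0.1)
```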
请参阅图9,电子设备400可以包括显示屏401、存储器402、处理器403、电池404、摄像模组405、扬声器406、麦克风407等部件。Referring to FIG. 9 , the electronic device 400 may include a display screen 401 , a memory 402 , a processor 403 , a battery 404 , a camera module 405 , a speaker 406 , a microphone 407 and other components.
显示屏401可以用于显示诸如图像、文字等信息。The display screen 401 may be used to display information such as images, text, and the like.
存储器402可用于存储应用程序和数据。存储器402存储的应用程序中包含有可执行代码。应用程序可以组成各种功能模块。处理器403通过运行存储在存储器402的应用程序,从而执行各种功能应用以及数据处理。 Memory 402 may be used to store applications and data. The application program stored in the memory 402 contains executable code. Applications can be composed of various functional modules. The processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
处理器403是电子设备的控制中心,利用各种接口和线路连接整个电子设备的各个部分,通过运行或执行存储在存储器402内的应用程序,以及调用存储在存储器402内的数据,执行电子设备的各种功能和处理数据,从而对电子设备进行整体监控。The processor 403 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and invoking the data stored in the memory 402, thereby monitoring the electronic device as a whole.
电池404可以用于为电子设备的各个模块和部件提供电力支持,从而保证电子设备的正常运行。The battery 404 can be used to provide power support for various modules and components of the electronic device, thereby ensuring the normal operation of the electronic device.
摄像模组405可以用于采集图像。The camera module 405 can be used to capture images.
扬声器406可以用于播放声音信号。 Speaker 406 may be used to play sound signals.
麦克风407可以用于采集周围环境中的声音信号。例如,麦克风407可以用于采集用户的语音指令。The microphone 407 may be used to collect sound signals in the surrounding environment. For example, the microphone 407 may be used to capture the user's voice commands.
在本实施例中,电子设备中的处理器403会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行代码加载到存储器402中,并由处理器403来运行存储在存储器402中的应用程序,从而执行:In this embodiment, the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby executing:
获取第一图像;get the first image;
获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
在一种实施方式中,所述预训练图像分割模型还包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块,所述多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块依次连接。In one embodiment, the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, the multi-scale encoder module, feature pyramid module, multi-scale decoder module, The split modules are connected in sequence.
在一种实施方式中,在进行模型训练时,用于得到所述预训练图像分割模型的训练模型还包括深层特征监督模块,所述深层特征监督模块与所述特征金字塔模块连接,所述深层特征监督模块用于从多个尺度对深层特征进行监督;In one embodiment, during model training, the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, the deep feature supervision module is connected to the feature pyramid module, and the deep feature supervision module is connected to the feature pyramid module. The feature supervision module is used to supervise deep features from multiple scales;
在模型训练时,训练模型中的多尺度解码器输出与训练样本对应的第一初步分割掩模,训练模型中的深层特征监督模块输出与所述训练样本对应的N个深监督预测掩模,N为特征金字塔的层数;During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
获取所述训练样本中用作标注的第一标注分割掩模;obtaining the first label segmentation mask used as label in the training sample;
分别计算所述N个深监督预测掩模中的每一个掩模与所述第一标注分割掩模的交叉熵损失,以及计算所述第一初步分割掩模与所述第一标注分割掩模的交叉熵损失;separately calculating the cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask, and calculating the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
根据计算得到的多个交叉熵损失,对训练模型执行反向传播算法,更新模型参数;According to the calculated multiple cross-entropy losses, the back-propagation algorithm is performed on the training model to update the model parameters;
在多个训练周期内重复模型训练过程直至包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型的损失函数完全收敛,保存模型且不冻结模型的参数。The model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged, and the model is saved and the parameters of the model are not frozen.
在一种实施方式中,训练模型还包括边缘梯度模块,所述边缘梯度模块用于提供边缘梯度损失函数作为模型训练时的其中一个损失函数。In one embodiment, the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
在一种实施方式中,所述边缘梯度模块用于计算训练样本中的输入图像对应的边缘梯度图;In one embodiment, the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample;
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像输入至分割模块,由分割模块输出精细分割掩模;inputting the second preliminary segmentation mask and the input image to a segmentation module, and the segmentation module outputs a fine segmentation mask;
所述边缘梯度模块用于计算所述精细分割掩模对应的边缘概率预测图;The edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask;
所述边缘梯度模块提供的边缘梯度损失函数用于计算所述边缘梯度图和所述边缘概率预测图之间的边缘梯度损失。The edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
在一种实施方式中,将保存的包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型中的所述深层特征监督模块移除,并加入分割模块和边缘梯度模块;In one embodiment, the deep feature supervision module is removed from the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
基于包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块和边缘梯度模块的模型继续进行模型训练,包括如下流程:Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像在通道维度进行拼接后输入至分割模块,由分割模块输出精细分割掩模;the second preliminary segmentation mask and the input image are concatenated along the channel dimension and then input to the segmentation module, which outputs the fine segmentation mask;
将所述输入图像输入至所述边缘梯度模块,由所述边缘梯度模块调用其所包括的索贝尔算子对所述输入图像进行相应的计算,得到所述输入图像的梯度图;Inputting the input image to the edge gradient module, and the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
获取训练样本中用作标注的第二标注分割掩模;Obtain the second annotation segmentation mask used as annotation in the training sample;
由所述边缘梯度模块调用其所包括的膨胀腐蚀模块对所述第二标注分割掩模进行膨胀腐蚀处理,得到边缘掩模;the edge gradient module invokes its dilation-erosion module to perform dilation and erosion on the second annotated segmentation mask to obtain an edge mask;
将所述输入图像的梯度图与所述边缘掩模相乘,得到所述输入图像对应的边缘梯度图;Multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
将所述精细分割掩模与所述边缘掩模相乘,得到所述精细分割掩模对应的边缘概率预测图;multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
计算所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失;calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
计算所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失;calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
对所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失以及所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失进行损失求和处理;summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map;
根据计算得到的损失之和对训练模型执行反向传播算法,更新模型参数;Perform back-propagation algorithm on the training model according to the sum of the calculated losses, and update the model parameters;
在多个训练周期内重复模型训练的过程直至模型的损失函数完全收敛,保存模型及其模型参数;Repeat the process of model training in multiple training cycles until the loss function of the model is completely converged, and save the model and its model parameters;
将训练完成后得到的模型中的边缘梯度模块去除后得到的模型确定为预训练图像分割模型。The model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
在一种实施方式中,所述处理器403还可以执行:In one embodiment, the processor 403 may also execute:
对所述第一图像进行预设处理,所述预设处理包括随机裁剪和/或归一化处理;performing preset processing on the first image, the preset processing including random cropping and/or normalization processing;
所述将所述第一图像输入至所述预训练图像分割模型,包括:将所述第一图像经过所述预设处理后得到的图像输入至所述预训练图像分割模型。The inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the preset processing of the first image into the pre-trained image segmentation model.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见上文针对图像处理方法的详细描述,此处不再赘述。In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to the detailed description of the image processing method above, and details are not repeated here.
本申请实施例提供的所述图像处理装置与上文实施例中的图像处理方法属于同一构思,在所述图像处理装置上可以运行所述图像处理方法实施例中提供的任一方法,其具体实现过程详见所述图像处理方法实施例,此处不再赘述。The image processing apparatus provided in the embodiments of the present application and the image processing method in the above embodiments belong to the same concept; any method provided in the image processing method embodiments can be run on the image processing apparatus, and for its specific implementation process, refer to the embodiments of the image processing method, which will not be repeated here.
需要说明的是,对本申请实施例所述图像处理方法而言,本领域普通技术人员可以理解实现本申请实施例所述图像处理方法的全部或部分流程,是可以通过计算机程序来控制相关的硬件来完成,所述计算机程序可存储于一计算机可读取存储介质中,如存储在存储器中,并被至少一个处理器执行,在执行过程中可包括如所述图像处理方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)等。It should be noted that, for the image processing method described in the embodiments of the present application, those of ordinary skill in the art can understand that all or part of the processes for implementing the image processing method described in the embodiments of the present application can be completed by a computer program controlling the relevant hardware. The computer program can be stored in a computer-readable storage medium, for example in a memory, and executed by at least one processor, and the execution process may include the flow of the embodiments of the image processing method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
对本申请实施例的所述图像处理装置而言,其各功能模块可以集成在一个处理芯片中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中,所述存储介质譬如为只读存储器,磁盘或光盘等。For the image processing apparatus of the embodiments of the present application, its functional modules may be integrated into one processing chip, each module may exist physically alone, or two or more modules may be integrated into one module. The above integrated modules can be implemented in the form of hardware or in the form of software functional modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
以上对本申请实施例所提供的一种图像处理方法、装置、存储介质以及电子设备进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。The image processing method, apparatus, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application; the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementations and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (20)

  1. 一种图像处理方法,其中,所述方法包括:An image processing method, wherein the method comprises:
    获取第一图像;get the first image;
    获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
    将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
    根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  2. 根据权利要求1所述的图像处理方法,其中,所述预训练图像分割模型还包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块,所述多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块依次连接。The image processing method according to claim 1, wherein the pre-trained image segmentation model further comprises a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, the multi-scale encoder module, the feature pyramid module, The multi-scale decoder module and the segmentation module are connected in sequence.
  3. 根据权利要求2所述的图像处理方法,其中,在进行模型训练时,用于得到所述预训练图像分割模型的训练模型还包括深层特征监督模块,所述深层特征监督模块与所述特征金字塔模块连接,所述深层特征监督模块用于从多个尺度对深层特征进行监督;The image processing method according to claim 2, wherein, during model training, the training model used to obtain the pre-trained image segmentation model further comprises a deep feature supervision module, the deep feature supervision module and the feature pyramid module connection, the deep feature supervision module is used to supervise the deep features from multiple scales;
    在模型训练时,训练模型中的多尺度解码器输出与训练样本对应的第一初步分割掩模,训练模型中的深层特征监督模块输出与所述训练样本对应的N个深监督预测掩模,N为特征金字塔的层数;During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
    获取所述训练样本中用作标注的第一标注分割掩模;obtaining the first label segmentation mask used as label in the training sample;
    分别计算所述N个深监督预测掩模中的每一个掩模与所述第一标注分割掩模的交叉熵损失,以及计算所述第一初步分割掩模与所述第一标注分割掩模的交叉熵损失;separately calculating the cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask, and calculating the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
    根据计算得到的多个交叉熵损失,对训练模型执行反向传播算法,更新模型参数;According to the calculated multiple cross-entropy losses, the back-propagation algorithm is performed on the training model to update the model parameters;
    在多个训练周期内重复模型训练过程直至包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型的损失函数完全收敛,保存模型且不冻结模型的参数。The model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged, the model is saved and the parameters of the model are not frozen.
  4. 根据权利要求3所述的图像处理方法,其中,训练模型还包括边缘梯度模块,所述边缘梯度模块用于提供边缘梯度损失函数作为模型训练时的其中一个损失函数。The image processing method according to claim 3, wherein the training model further comprises an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  5. 根据权利要求4所述的图像处理方法,其中,所述边缘梯度模块用于计算训练样本中的输入图像对应的边缘梯度图;The image processing method according to claim 4, wherein the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample;
    将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
    将所述第二初步分割掩模和所述输入图像输入至分割模块,由分割模块输出精细分割掩模;inputting the second preliminary segmentation mask and the input image to a segmentation module, and the segmentation module outputs a fine segmentation mask;
    所述边缘梯度模块用于计算所述精细分割掩模对应的边缘概率预测图;The edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask;
    所述边缘梯度模块提供的边缘梯度损失函数用于计算所述边缘梯度图和所述边缘概率预测图之间的边缘梯度损失。The edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  6. 根据权利要求5所述的图像处理方法,其中,将保存的包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型中的所述深层特征监督模块移除,并加入分割模块和边缘梯度模块;The image processing method according to claim 5, wherein the deep feature supervision module is removed from the saved model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
    基于包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块和边缘梯度模块的模型继续进行模型训练,包括如下流程:Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
    将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
    将所述第二初步分割掩模和所述输入图像在通道维度进行拼接后输入至分割模块,由分割模块输出精细分割掩模;the second preliminary segmentation mask and the input image are concatenated along the channel dimension and then input to the segmentation module, which outputs the fine segmentation mask;
    将所述输入图像输入至所述边缘梯度模块,由所述边缘梯度模块调用其所包括的索贝尔算子对所述输入图像进行相应的计算,得到所述输入图像的梯度图;inputting the input image to the edge gradient module, where the edge gradient module invokes the Sobel operator it includes to perform the corresponding calculation on the input image to obtain a gradient map of the input image;
    获取训练样本中用作标注的第二标注分割掩模;Obtain the second annotation segmentation mask used as annotation in the training sample;
    由所述边缘梯度模块调用其所包括的膨胀腐蚀模块对所述第二标注分割掩模进行膨胀腐蚀处理,得到边缘掩模;the edge gradient module invokes its dilation-erosion module to perform dilation and erosion on the second annotated segmentation mask to obtain an edge mask;
    将所述输入图像的梯度图与所述边缘掩模相乘,得到所述输入图像对应的边缘梯度图;Multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
    将所述精细分割掩模与所述边缘掩模相乘,得到所述精细分割掩模对应的边缘概率预测图;multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
    计算所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失;calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
    计算所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失;calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
    对所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失以及所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失进行损失求和处理;summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map;
    根据计算得到的损失之和对训练模型执行反向传播算法,更新模型参数;Perform back-propagation algorithm on the training model according to the sum of the calculated losses, and update the model parameters;
    在多个训练周期内重复模型训练的过程直至模型的损失函数完全收敛,保存模型及其模型参数;Repeat the process of model training in multiple training cycles until the loss function of the model is completely converged, and save the model and its model parameters;
    将训练完成后得到的模型中的边缘梯度模块去除后得到的模型确定为预训练图像分割模型。The model obtained after removing the edge gradient module in the model obtained after training is determined as the pre-trained image segmentation model.
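The edge-gradient supervision described above (a Sobel gradient map, an edge mask from dilating and eroding the annotated mask, and their products) can be sketched with NumPy/SciPy stand-ins. This is an illustrative reading, not the patent's implementation: the band width and the exact loss formula (mean squared difference over edge pixels is one plausible choice) are assumptions, since the claims do not fix them.

```python
import numpy as np
from scipy import ndimage


def edge_mask(gt_mask, width=2):
    """Dilate and erode the annotated mask; their XOR is a band
    around the object boundary (the 'edge mask')."""
    dil = ndimage.binary_dilation(gt_mask, iterations=width)
    ero = ndimage.binary_erosion(gt_mask, iterations=width)
    return (dil ^ ero).astype(np.float32)


def gradient_map(image):
    """Sobel gradient magnitude of the input image."""
    gx = ndimage.sobel(image, axis=1)
    gy = ndimage.sobel(image, axis=0)
    return np.hypot(gx, gy)


def edge_gradient_loss(image, pred_mask, gt_mask):
    m = edge_mask(gt_mask)
    edge_grad = gradient_map(image) * m  # edge gradient map
    edge_prob = pred_mask * m            # edge probability prediction map
    # Mean squared difference over the edge band (assumed formula).
    denom = m.sum() + 1e-6
    return float((((edge_grad - edge_prob) ** 2) * m).sum() / denom)
```

Only pixels inside the boundary band contribute, which matches the claim's intent of penalizing the prediction specifically where the object edge lies.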
  7. The image processing method according to claim 1, wherein the method further comprises:
    performing preset processing on the first image, the preset processing comprising random cropping and/or normalization;
    wherein inputting the first image into the pre-trained image segmentation model comprises: inputting the image obtained after the first image undergoes the preset processing into the pre-trained image segmentation model.
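The preset processing in claim 7 (random cropping and/or normalization) could look like the following minimal sketch; the crop size and the zero-mean, unit-variance normalization scheme are illustrative assumptions, not values specified by the patent.

```python
import numpy as np


def preset_process(image, crop=224, rng=None):
    """Randomly crop an H x W x C image, then normalize each
    channel to zero mean and unit variance (assumed scheme)."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    patch = image[top:top + crop, left:left + crop].astype(np.float32)
    mean = patch.mean(axis=(0, 1), keepdims=True)
    std = patch.std(axis=(0, 1), keepdims=True) + 1e-6
    return (patch - mean) / std
```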
  8. An image processing apparatus, wherein the apparatus comprises:
    a first acquisition module configured to acquire a first image;
    a second acquisition module configured to acquire a pre-trained image segmentation model, the pre-trained image segmentation model being configured to output a segmentation mask of an image and comprising at least a segmentation module, the segmentation module comprising a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block comprising a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
    a processing module configured to input the first image into the pre-trained image segmentation model, the pre-trained image segmentation model outputting a segmentation mask corresponding to the first image; and
    a segmentation module configured to segment a second image from the first image according to the segmentation mask corresponding to the first image.
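One plausible PyTorch rendering of the segmentation module's structure (conv blocks of convolution + batch normalization + nonlinear activation, connected in sequence and followed by at least one plain convolutional layer). The channel counts, block count, and the four-channel input (RGB image concatenated with a preliminary mask) are assumptions for illustration.

```python
import torch
import torch.nn as nn


def conv_block(cin, cout):
    # conv + batch norm + nonlinear activation, as in the claim
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )


class SegmentationModule(nn.Module):
    def __init__(self, in_ch=4, mid_ch=32):
        super().__init__()
        # conv blocks connected in sequence ...
        self.blocks = nn.Sequential(
            conv_block(in_ch, mid_ch),
            conv_block(mid_ch, mid_ch),
        )
        # ... then at least one convolutional layer producing the mask
        self.head = nn.Conv2d(mid_ch, 1, kernel_size=1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.blocks(x)))
```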
  9. The image processing apparatus according to claim 8, wherein the pre-trained image segmentation model further comprises a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, and the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  10. The image processing apparatus according to claim 9, wherein, during model training, the training model used to obtain the pre-trained image segmentation model further comprises a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being configured to supervise deep features at multiple scales;
    during model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep-supervision prediction masks corresponding to the training sample, where N is the number of levels of the feature pyramid;
    a first annotated segmentation mask used as the annotation in the training sample is obtained;
    a cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask is calculated, and a cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask is calculated;
    a back-propagation algorithm is performed on the training model according to the calculated cross-entropy losses to update the model parameters; and
    the model training process is repeated over multiple training epochs until the loss function of the model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges, and the model is saved without freezing its parameters.
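The deep-supervision signal described in claim 10 (a cross-entropy loss for the preliminary mask and for each of the N deep-supervision prediction masks, all against the same annotation) can be sketched as follows. The binary cross-entropy form and the plain sum are assumptions; the patent does not state weights for the individual terms.

```python
import numpy as np


def bce(pred, target, eps=1e-6):
    """Binary cross-entropy between a predicted probability mask
    and a binary annotation mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred)
                   + (1 - target) * np.log(1 - pred)).mean())


def deep_supervision_loss(preliminary, deep_preds, gt):
    """Sum the loss of the preliminary mask and of each of the
    N deep-supervision prediction masks against the annotation."""
    losses = [bce(preliminary, gt)] + [bce(p, gt) for p in deep_preds]
    return sum(losses)
```

The summed scalar is what back-propagation would then be run on to update the model parameters.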
  11. The image processing apparatus according to claim 10, wherein the training model further comprises an edge gradient module configured to provide an edge gradient loss function as one of the loss functions used during model training.
  12. The image processing apparatus according to claim 11, wherein the edge gradient module is configured to calculate an edge gradient map corresponding to an input image in a training sample;
    the input image in the training sample is input into the training model and processed in sequence by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
    the second preliminary segmentation mask and the input image are input into the segmentation module, and the segmentation module outputs a fine segmentation mask;
    the edge gradient module is configured to calculate an edge probability prediction map corresponding to the fine segmentation mask; and
    the edge gradient loss function provided by the edge gradient module is used to calculate an edge gradient loss between the edge gradient map and the edge probability prediction map.
  13. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed on a computer, the computer is caused to perform the method according to claim 1.
  14. An electronic device, comprising a memory and a processor, wherein the processor calls a computer program stored in the memory to perform: acquiring a first image;
    acquiring a pre-trained image segmentation model, the pre-trained image segmentation model being configured to output a segmentation mask of an image and comprising at least a segmentation module, the segmentation module comprising a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block comprising a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
    inputting the first image into the pre-trained image segmentation model, the pre-trained image segmentation model outputting a segmentation mask corresponding to the first image; and
    segmenting a second image from the first image according to the segmentation mask corresponding to the first image.
  15. The electronic device according to claim 14, wherein the pre-trained image segmentation model further comprises a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, and the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  16. The electronic device according to claim 15, wherein, during model training, the training model used to obtain the pre-trained image segmentation model further comprises a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being configured to supervise deep features at multiple scales;
    during model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep-supervision prediction masks corresponding to the training sample, where N is the number of levels of the feature pyramid;
    a first annotated segmentation mask used as the annotation in the training sample is obtained;
    a cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask is calculated, and a cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask is calculated;
    a back-propagation algorithm is performed on the training model according to the calculated cross-entropy losses to update the model parameters; and
    the model training process is repeated over multiple training epochs until the loss function of the model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges, and the model is saved without freezing its parameters.
  17. The electronic device according to claim 16, wherein the training model further comprises an edge gradient module configured to provide an edge gradient loss function as one of the loss functions used during model training.
  18. The electronic device according to claim 17, wherein the edge gradient module is configured to calculate an edge gradient map corresponding to an input image in a training sample;
    the input image in the training sample is input into the training model and processed in sequence by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
    the second preliminary segmentation mask and the input image are input into the segmentation module, and the segmentation module outputs a fine segmentation mask;
    the edge gradient module is configured to calculate an edge probability prediction map corresponding to the fine segmentation mask; and
    the edge gradient loss function provided by the edge gradient module is used to calculate an edge gradient loss between the edge gradient map and the edge probability prediction map.
  19. The electronic device according to claim 18, wherein the deep feature supervision module is removed from the saved model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and the segmentation module and the edge gradient module are added;
    model training continues on the basis of the model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module, and the edge gradient module, and comprises the following flow:
    inputting an input image in a training sample into the training model, the input image being processed in sequence by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
    concatenating the second preliminary segmentation mask and the input image along the channel dimension and inputting the result into the segmentation module, the segmentation module outputting a fine segmentation mask;
    inputting the input image into the edge gradient module, the edge gradient module invoking the Sobel operator it comprises to perform the corresponding calculation on the input image to obtain a gradient map of the input image;
    obtaining a second annotated segmentation mask used as the annotation in the training sample;
    invoking, by the edge gradient module, the dilation-erosion module it comprises to perform dilation and erosion processing on the second annotated segmentation mask to obtain an edge mask;
    multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
    multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
    calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
    calculating an edge gradient loss between the edge gradient map and the edge probability prediction map;
    summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask and the edge gradient loss between the edge gradient map and the edge probability prediction map;
    performing a back-propagation algorithm on the training model according to the calculated sum of losses to update the model parameters;
    repeating the model training process over multiple training epochs until the loss function of the model fully converges, and saving the model and its parameters; and
    determining the model obtained by removing the edge gradient module from the trained model as the pre-trained image segmentation model.
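The structural similarity loss that the claims pair with cross-entropy is commonly computed as 1 − SSIM. Below is a global (non-windowed) sketch; windowed SSIM is equally common, and the constants are the usual SSIM defaults, assumed here rather than taken from the patent.

```python
import numpy as np


def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM between two masks, computed globally over the
    whole array (one simple variant of the SSIM loss)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return float(1.0 - ssim)
```

For identical masks the loss is 0; it grows as luminance, contrast, or structure diverge, which is why it complements per-pixel cross-entropy in the summed training objective.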
  20. The electronic device according to claim 14, wherein the processor is further configured to perform:
    performing preset processing on the first image, the preset processing comprising random cropping and/or normalization;
    wherein inputting the first image into the pre-trained image segmentation model comprises: inputting the image obtained after the first image undergoes the preset processing into the pre-trained image segmentation model.
PCT/CN2021/098905 2020-07-23 2021-06-08 Image processing method and apparatus, storage medium, and electronic device WO2022017025A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010718338.6A CN111862127A (en) 2020-07-23 2020-07-23 Image processing method, image processing device, storage medium and electronic equipment
CN202010718338.6 2020-07-23

Publications (1)

Publication Number Publication Date
WO2022017025A1 true WO2022017025A1 (en) 2022-01-27

Family

ID=72950390

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098905 WO2022017025A1 (en) 2020-07-23 2021-06-08 Image processing method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111862127A (en)
WO (1) WO2022017025A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114331918A (en) * 2022-03-08 2022-04-12 荣耀终端有限公司 Training method of image enhancement model, image enhancement method and electronic equipment
CN114565526A (en) * 2022-02-23 2022-05-31 杭州电子科技大学 Deep learning image restoration method based on gradient direction and edge guide
CN116206059A (en) * 2023-02-13 2023-06-02 北京医智影科技有限公司 Loss function calculation method and model training method
CN116671919A (en) * 2023-08-02 2023-09-01 电子科技大学 Emotion detection reminding method based on wearable equipment

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402258A (en) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111862127A (en) * 2020-07-23 2020-10-30 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN112614101B (en) * 2020-12-17 2024-02-20 广东道氏技术股份有限公司 Polished tile flaw detection method based on multilayer feature extraction and related equipment
CN112580567B (en) * 2020-12-25 2024-04-16 深圳市优必选科技股份有限公司 Model acquisition method, model acquisition device and intelligent equipment
CN112785575B (en) * 2021-01-25 2022-11-18 清华大学 Image processing method, device and storage medium
CN113916897B (en) * 2021-12-15 2022-03-15 武汉三力国创机械设备工程有限公司 Filter element quality detection method based on image processing
CN117710969A (en) * 2024-02-05 2024-03-15 安徽大学 Cell nucleus segmentation and classification method based on deep neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780512A (en) * 2016-11-30 2017-05-31 厦门美图之家科技有限公司 Method of segmenting an image, application, and computing device
US20200058126A1 (en) * 2018-08-17 2020-02-20 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN111402258A (en) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111862127A (en) * 2020-07-23 2020-10-30 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886971A (en) * 2019-01-24 2019-06-14 西安交通大学 A kind of image partition method and system based on convolutional neural networks

Also Published As

Publication number Publication date
CN111862127A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
WO2022017025A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN109949255B (en) Image reconstruction method and device
CN110084274B (en) Real-time image semantic segmentation method and system, readable storage medium and terminal
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN110852383B (en) Target detection method and device based on attention mechanism deep learning network
CN110490082B (en) Road scene semantic segmentation method capable of effectively fusing neural network features
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN111091130A (en) Real-time image semantic segmentation method and system based on lightweight convolutional neural network
CN110569851B (en) Real-time semantic segmentation method for gated multi-layer fusion
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
WO2022213395A1 (en) Light-weighted target detection method and device, and storage medium
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
CN116563550A (en) Landslide interpretation semantic segmentation method, system, device and medium based on mixed attention
CN117496990A (en) Speech denoising method, device, computer equipment and storage medium
CN115471718A (en) Construction and detection method of lightweight significance target detection model based on multi-scale learning
CN116129119A (en) Rapid semantic segmentation network and semantic segmentation method integrating local and global features
CN112529064B (en) Efficient real-time semantic segmentation method
CN114627293A (en) Image matting method based on multi-task learning
CN114119627A (en) High-temperature alloy microstructure image segmentation method and device based on deep learning
CN113496228A (en) Human body semantic segmentation method based on Res2Net, TransUNet and cooperative attention
CN111798385A (en) Image processing method and device, computer readable medium and electronic device
CN114663774B (en) Lightweight salient object detection system and method
CN112132253A (en) 3D motion recognition method and device, computer readable storage medium and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21847183

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21847183

Country of ref document: EP

Kind code of ref document: A1