WO2023070495A1 - Image processing method, electronic device and non-transitory computer-readable medium - Google Patents

Image processing method, electronic device and non-transitory computer-readable medium

Info

Publication number
WO2023070495A1
WO2023070495A1 (PCT/CN2021/127282)
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
mask
sub
resolution
Prior art date
Application number
PCT/CN2021/127282
Other languages
English (en)
French (fr)
Other versions
WO2023070495A9 (zh)
Inventor
刘瀚文
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to CN202180003148.9A (publication CN116368512A)
Priority to PCT/CN2021/127282 (publication WO2023070495A1)
Priority to US18/273,026 (publication US20240087085A1)
Publication of WO2023070495A1
Publication of WO2023070495A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/174 Segmentation; Edge detection involving the use of two or more images
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular to an image processing method, an electronic device, and a non-transitory computer-readable medium.
  • existing neural-network-based target detection and extraction algorithms have achieved good results and are widely used in processing many classes of images, such as natural images and medical images.
  • however, current neural network models process with limited precision; when extracting and matting targets from larger images, problems such as jagged edges often appear after processing.
  • the present disclosure aims to solve at least one of the technical problems existing in the prior art, and proposes an image processing method, an electronic device, and a non-transitory computer-readable medium.
  • an image processing method including:
  • the second target object mask is fused with the original image to obtain a target object image.
  • the mask super-resolution model includes a first sub-model and a second sub-model
  • Using the mask super-resolution model to perform super-resolution processing on the first object mask to obtain a second object mask includes:
  • the first sub-model includes sequentially connected P-level first computing modules, each level of first computing modules includes a first computing unit and a second computing unit, where P is a positive integer greater than 1;
  • the extraction of image features corresponding to the downsampled image by using the first sub-model includes:
  • for the n-th level first computing module, n being a positive integer not greater than P: its first computing unit performs image feature extraction on the downsampled image (for n = 1) or on the first feature map output by the previous-level first computing module, generating a second feature map that is output to the second sub-model; its second computing unit then enlarges the second feature map and outputs the enlarged second feature map to the next-level first computing module;
  • Using the second sub-model to perform super-resolution processing on the first object mask in combination with the image features extracted by the first sub-model to obtain the second object mask includes:
  • the first operation unit includes sequentially connected convolutional layers, batch normalization layers, and excitation layers
  • the second operation unit includes transposed convolutional layers
  • the second sub-model includes sequentially connected P-level second computing modules, and each level of second computing module includes a splicing layer, a third computing unit, and a fourth computing unit, wherein the first computing unit and the third computing unit of the same level are connected through a splicing layer;
  • for the m-th level second computing module, m being a positive integer not greater than P: its splicing layer splices the second feature map with the first object mask (for m = 1), or with the third feature map output by the previous-level second computing module, generating a fourth feature map; its third computing unit performs image feature extraction on the fourth feature map to generate a fifth feature map; its fourth computing unit enlarges the fifth feature map and outputs the enlarged fifth feature map to the next-level second computing module;
  • for the last-level second computing module, the enlarged fifth feature map is taken as the second object mask and output.
  • the third computing unit includes sequentially connected convolutional layers, batch normalization layers, and excitation layers
  • the fourth computing unit includes transposed convolutional layers
  • extracting the object region in the downsampled image to obtain the first object mask includes:
  • the target object extraction model is a UNet network model
  • the first sub-model includes sequentially connected three-level first operation modules
  • the second sub-model includes sequentially connected three-level second operation modules.
  • the mask super-resolution model is trained through the following steps:
  • the mask super-resolution model to be trained is trained in an iterative manner on the downsampled image samples and the object mask samples: the first sub-model to be trained extracts the image features corresponding to the downsampled image samples and performs super-resolution processing on them, and the image features it extracts are input into the second sub-model to be trained;
  • the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, performs super-resolution processing on the object mask samples;
  • in response to a preset convergence condition being satisfied, the training is ended and the mask super-resolution model is obtained.
  • the preset convergence conditions include at least one of the following:
  • the first loss value and the second loss value satisfy a preset loss-value condition, wherein the first loss value is calculated from the original image sample corresponding to the downsampled image sample and the super-resolved downsampled image sample, and the second loss value is calculated from the original image sample and the super-resolved object mask sample.
  • in some embodiments, the first loss value is calculated from the original image sample and the downsampled image sample after super-resolution processing.
  • edge matching is performed between the first edge map and the second edge map, and the second loss value is determined from the edge-matching result.
  • the target area is a portrait area
  • the target image is a portrait image
  • an embodiment of the present disclosure further provides an electronic device, including:
  • one or more processors;
  • memory for storing one or more programs
  • when executed by the one or more processors, the one or more programs cause the one or more processors to implement the image processing method described in any one of the above embodiments.
  • an embodiment of the present disclosure also provides a non-transitory computer-readable medium on which a computer program is stored, wherein, when the program is executed, the image processing method described in any one of the above embodiments is implemented.
  • FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure
  • FIG. 2 is a flow chart of a specific implementation method of step S3 in an embodiment of the present disclosure
  • FIG. 3 is a flow chart of another specific implementation method of step S3 in the embodiment of the present disclosure.
  • FIG. 4 is a flowchart of a training method for a mask super-resolution model provided by an embodiment of the present disclosure
  • FIG. 5 is a flowchart of a specific implementation method of step S02 in an embodiment of the present disclosure
  • FIG. 6 is a flow chart of a specific implementation method of step S2 in an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a mask super-resolution model provided by an embodiment of the present disclosure.
  • FIG. 8 is a composition block diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 9 is a composition block diagram of a non-transitory computer-readable medium provided by an embodiment of the present disclosure.
  • FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure. As shown in Figure 1, the method includes:
  • Step S1 downsampling the original image according to a preset resolution to generate a downsampled image.
  • the preset resolution is smaller than the resolution of the original image. Downsampling the original image corresponds to scaling it down: it generates a lower-resolution downsampled image whose resolution is the preset resolution. The downsampled image has fewer pixels than the original image, so the time spent on the corresponding operations and processing is reduced accordingly; the reduction ratio is approximately the ratio between the preset resolution and the resolution of the original image.
  • Step S2 extracting the object area in the down-sampled image to obtain a first object mask.
  • a mask is a single-channel image that can be used during image processing to block all or part of the image to be processed, so as to control the region and progress of the processing;
  • the first object mask is the mask corresponding to the extracted object region, wherein the object region serves as the foreground of the downsampled image and the other parts serve as its background. Applying the first object mask to the downsampled image yields a downsampled image in which only the foreground region is retained. In some embodiments, the mask is a binary image composed of 0s and 1s; in some embodiments, the mask may also be a multi-valued image.
  • the target object area is a portrait area
  • the target object image is a portrait image
  • step S2 corresponds to the process of portrait matting.
  • a portrait is only one specific implementation provided by the embodiments of the present disclosure and does not limit the technical solution of the present disclosure.
  • other target types are equally applicable to the technical solution of the present disclosure, such as animals and plants, vehicles and other means of transport, license plates, and so on; specifically, a target of the corresponding type should satisfy at least one of the following conditions: it has a specific shape; it has a relatively clear outline; the location of its region in the image can be determined with a corresponding detection algorithm.
  • Step S3 Input the downsampled image and the first object mask into the pre-trained mask super-resolution model, and use the mask super-resolution model to perform super-resolution processing on the first object mask to obtain the second object mask.
  • the resolution of the second object mask is higher than the resolution of the first object mask. In step S3, the mask super-resolution model, combined with the downsampled image, performs super-resolution processing on the first object mask; super-resolution (SR) processing corresponds to reconstructing a high-resolution image from a low-resolution image.
  • the mask super-resolution model is pre-trained based on original image samples, downsampled image samples and object mask samples.
  • Step S4 Fuse the second object mask with the original image to obtain an object image.
  • the target object image is the final matting result of the target object in the original image.
  • in some embodiments, the second object mask is a binary image, and the second object mask and the original image are fused by multiplication;
  • or, in some embodiments, as mentioned above, since the second object mask is a single-channel image, it can be fused with the original image by channel fusion; or, in some embodiments, the second object mask is fused with the original image by Poisson fusion.
  • an embodiment of the present disclosure provides an image processing method, which can be used to extract the object region in the downsampled image of an original image to obtain a first object mask; input the downsampled image and the first object mask into a mask super-resolution model; use the mask super-resolution model to super-resolve the first object mask into a second object mask; and fuse the second object mask with the original image to obtain the object image. By raising the resolution of the mask corresponding to the target, the overall fineness of the target extraction process is improved, which effectively avoids jagged edges when extracting and matting targets from large, high-resolution images.
  • FIG. 2 is a flowchart of a specific implementation method of step S3 in an embodiment of the present disclosure.
  • the mask super-resolution model includes a first sub-model and a second sub-model; as shown in FIG. 2, in step S3, the step of using the mask super-resolution model to perform super-resolution processing on the first object mask to obtain the second object mask includes step S301 and step S302.
  • Step S301 using the first sub-model to extract image features corresponding to the downsampled image.
  • Step S302 Input the image features extracted by the first sub-model into the second sub-model, and use the second sub-model to perform super-resolution processing on the first object mask in combination with the image features extracted by the first sub-model, to obtain Second object mask.
  • in the process of super-resolving the first object mask with the mask super-resolution model, the downsampled image and the first object mask are input into the first sub-model and the second sub-model, respectively;
  • the first sub-model extracts the image features of the downsampled image and outputs them to the second sub-model.
  • in some embodiments, the first sub-model can additionally output the super-resolved downsampled image; this output can be used to check the model's input and output and to assess the super-resolution effect. The second sub-model combines the image features extracted by the first sub-model to super-resolve the first object mask and finally outputs the second object mask. In some embodiments, feature maps are concatenated by a splicing layer (Concat); in some embodiments, feature fusion is realized with 1*1 convolutions and pooling along the channel dimension, thereby combining the image features extracted by the first sub-model into the super-resolution of the first object mask.
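Both fusion options named above are short in code; a sketch assuming two feature maps of matching spatial size:

```python
import torch
import torch.nn as nn

feat_img = torch.randn(1, 32, 64, 64)    # features from the first sub-model
feat_mask = torch.randn(1, 32, 64, 64)   # features from the mask branch

# Option 1: concatenate along the channel dimension (splicing/Concat layer).
fused_cat = torch.cat([feat_img, feat_mask], dim=1)   # shape (1, 64, 64, 64)

# Option 2: follow the concatenation with a 1x1 convolution that projects
# back to the original channel count, realizing learned feature fusion.
proj = nn.Conv2d(64, 32, kernel_size=1)
fused_1x1 = proj(fused_cat)                           # shape (1, 32, 64, 64)
```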
  • FIG. 3 is a flow chart of another specific implementation method of step S3 in the embodiment of the present disclosure.
  • this method is a concrete optional implementation of the method shown in FIG. 2. On that basis, the first sub-model includes sequentially connected P-level first computing modules, each including a first computing unit and a second computing unit, where P is a positive integer greater than 1. As shown in FIG. 3, when step S301 (extracting image features corresponding to the downsampled image with the first sub-model) is performed, the processing of the n-th level first computing module (n being a positive integer not greater than P) includes step S3011 and step S3012.
  • Step S3011 Use the first computing unit of this level's first computing module to perform image feature extraction on the downsampled image or on the first feature map output by the previous-level first computing module, generate a second feature map, and output the second feature map to the second sub-model.
  • the above image features corresponding to the downsampled image include: the image features of the downsampled image and the image features of its feature map.
  • Step S3012 Use the second computing unit of this level's first computing module to enlarge the second feature map, and output the enlarged second feature map to the next-level first computing module.
  • the second feature map output by the first computing module of this stage is the first feature map received by the first computing module of the next stage.
  • in some embodiments, for the last-level first computing module, the enlarged feature map it outputs is the super-resolved downsampled image.
  • the first computing unit includes a sequentially connected convolutional layer, batch normalization layer, and excitation layer
  • the second computing unit includes a transposed convolutional layer, also known as a deconvolution layer or an inverse convolution layer.
  • then in step S3012, the step of using the second computing unit of this level's first computing module to enlarge the feature map corresponding to the extracted image features specifically includes: using that second computing unit to apply transposed convolution to the feature map, so as to enlarge its size.
  • the parameter settings of the above-mentioned convolutional layer, batch normalization layer, excitation layer and transposed convolutional layer may be different.
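A first computing module of this shape, with illustrative (assumed) channel counts, might be sketched in PyTorch as follows; kernel 4 / stride 2 / padding 1 in the transposed convolution gives an exact two-fold enlargement:

```python
import torch.nn as nn

class FirstOperationModule(nn.Module):
    """One level of the first sub-model: a Conv+BN+ReLU first computing
    unit followed by a transposed-convolution second computing unit."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.cbr = nn.Sequential(                    # first computing unit
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=4,
                                     stride=2, padding=1)  # second unit

    def forward(self, x):
        feat = self.cbr(x)  # second feature map, also sent to the second sub-model
        return feat, self.up(feat)  # enlarged map goes to the next level
```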
  • the term "convolution kernel” refers to a two-dimensional matrix used in the convolution process.
  • each of the plurality of entries in the two-dimensional matrix has a specific value.
  • in the embodiments of the present disclosure, the term "convolution" refers to the process of processing an image with a convolution kernel, as follows.
  • Each pixel of the input image has a value, and the convolution kernel starts at one pixel of the input image and moves sequentially over each pixel in the input image.
  • the kernel overlaps several pixels on the image based on the scale of the kernel.
  • the value of one of the several overlapping pixels is multiplied by a corresponding value of the convolution kernel to obtain a multiplied value of one of the several overlapping pixels.
  • all multiplied values of the overlapping pixels are summed to obtain the sum corresponding to the position of the convolution kernel on the input image; by moving the kernel over every pixel of the input image and collecting the sums for all kernel positions, the output image is formed.
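The sliding-window procedure just described can be written out directly; a plain NumPy sketch of single-channel "valid" convolution (strictly cross-correlation, as deep-learning frameworks implement it):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """At each kernel position, multiply the overlapping pixel values by the
    corresponding kernel values and sum them, as described above."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge_kernel = np.array([[1, 0, -1]] * 3)  # a simple horizontal-gradient kernel
feature_map = conv2d_valid(np.random.rand(8, 8), edge_kernel)  # shape (6, 6)
```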
  • the convolution may use different convolution kernels to extract different features of the input image.
  • the convolution process may use different convolution kernels to add more features to the input image.
  • the convolution layer is used to perform convolution on the input image to obtain the output image.
  • different convolutions are performed on the same input image using different kernels.
  • different convolution kernels are used to perform convolution on different input images, for example, a plurality of images are input in a convolution layer, and corresponding convolution kernels are used to perform convolution on images in the plurality of images.
  • different convolution kernels are used according to different conditions of the input image.
  • the excitation layer may perform non-linear mapping on the output signal output from the convolutional layer.
  • Various functions can be used in the excitation layer. Examples of functions suitable for use in the excitation layer include, but are not limited to: rectified linear unit (ReLU) functions, sigmoid functions, and hyperbolic tangent functions (eg, tanh functions).
  • in some embodiments, the excitation layer and the batch normalization layer are incorporated into the convolutional layer.
  • the batch normalization (BN) layer standardizes the output of each layer of the network model over a mini-batch of data; standardization transforms the data toward the standard normal distribution with mean 0 and standard deviation 1, which alleviates the vanishing-gradient problem in neural network models.
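Concretely, for a mini-batch with mean μ_B and variance σ_B², batch normalization computes (with the learnable scale γ and shift β of the standard formulation):

```latex
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma \hat{x}_i + \beta
```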
  • based on step S3011, step S302, using the second sub-model and combining the image features extracted by the first sub-model to perform super-resolution processing on the first object mask to obtain the second object mask, includes: using the second sub-model, combined with the second feature maps extracted by the first computing modules at every level, to perform super-resolution processing on the first object mask to obtain the second object mask.
  • in some embodiments, the second sub-model includes P-level second computing modules connected in sequence, and each level of second computing module includes a splicing layer, a third computing unit, and a fourth computing unit, where P is a positive integer greater than 1 and the first and third computing units of the same level are connected through a splicing layer; thus, in some embodiments, as shown in FIG. 3, when the second feature maps extracted by the first computing modules at every level are combined to super-resolve the first object mask into the second object mask, the processing of the m-th level second computing module (m being a positive integer not greater than P) includes steps S3021 to S3023.
  • Step S3021 using the splicing layer of the second operation module of this level to splice the second feature map and the first object mask, or splicing the second feature map and the third feature map output by the second operation module of the upper level to generate The fourth feature map.
  • the number of levels of the multi-level first operation module is the same as the number of levels of the multi-level second operation module.
  • in some embodiments, the splicing layers concatenate along the channel dimension.
  • Step S3022 Use the third computing unit of this level's second computing module to extract image features from the fourth feature map and generate a fifth feature map.
  • Step S3023 Use the fourth computing unit of this level's second computing module to enlarge the fifth feature map, and output the enlarged fifth feature map to the next-level second computing module.
  • the fifth feature map output by this level's second computing module is the third feature map received by the next-level second computing module; for the last-level second computing module, the enlarged fifth feature map is taken as the second object mask and output.
  • the third computing unit includes sequentially connected convolutional layers, batch normalization layers, and excitation layers
  • the fourth computing unit includes transposed convolutional layers
  • the parameter settings of the above-mentioned convolutional layer, batch normalization layer, excitation layer and transposed convolutional layer may be different.
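Mirroring the first sub-model, one level of the second sub-model (splice, then Conv+BN+ReLU, then transposed convolution) could be sketched as follows; channel counts are again illustrative assumptions:

```python
import torch
import torch.nn as nn

class SecondOperationModule(nn.Module):
    """One level of the second sub-model: the splicing layer concatenates
    the image-branch feature map with the mask-branch input, the third
    computing unit (CBR) extracts features, and the fourth computing unit
    (transposed convolution) enlarges the result."""
    def __init__(self, img_ch, mask_ch, out_ch):
        super().__init__()
        self.cbr = nn.Sequential(                    # third computing unit
            nn.Conv2d(img_ch + mask_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=4,
                                     stride=2, padding=1)  # fourth unit

    def forward(self, img_feat, mask_feat):
        fused = torch.cat([img_feat, mask_feat], dim=1)  # fourth feature map
        fifth = self.cbr(fused)                          # fifth feature map
        return self.up(fifth)                            # enlarged fifth map
```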
  • the embodiment of the present disclosure provides an image processing method, which can be used to perform super-resolution processing on the mask corresponding to the target object in combination with the image features of the downsampled image, increase its feature dimension, and improve the fineness of target object extraction.
  • FIG. 4 is a flowchart of a method for training a mask super-resolution model provided by an embodiment of the present disclosure.
  • the mask super-resolution model is the mask super-resolution model corresponding to FIG. 2, which includes a first sub-model and a second sub-model; as shown in FIG. 4, the mask super-resolution model is trained through the following steps:
  • Step S01 input the downsampled image samples and their corresponding object mask samples into the mask super-resolution model to be trained.
  • the downsampled image sample is obtained by downsampling the corresponding original image sample
  • the object mask sample is obtained by extracting the object from the downsampled image sample.
  • Step S02 in an iterative manner, train the mask super-resolution model to be trained based on the downsampled image samples and the target mask samples.
  • FIG. 5 is a flowchart of a specific implementation method of step S02 in an embodiment of the present disclosure. As shown in FIG. 5, step S02 includes: step S021 and step S022.
  • Step S021 using the first sub-model to be trained to extract image features corresponding to the downsampled image samples, and perform super-resolution processing on the downsampled image samples.
  • Step S022 input the image features extracted by the first sub-model to be trained into the second sub-model to be trained, and use the second sub-model to be trained in combination with the image features extracted by the first sub-model to be trained to target Super-resolution processing of object mask samples.
  • similarly to the image features corresponding to the downsampled image, the image features corresponding to a downsampled image sample include the image features of the sample itself and the image features of its feature maps; the above process of training the first and second sub-models corresponds to the actual inference process of the first and second sub-models.
  • Step S03 in response to the satisfaction of the preset convergence condition, end the training, and obtain the mask super-resolution model.
  • the preset convergence condition includes at least one of the following: a preset number of iterations has been trained; the first loss value and the second loss value meet the preset loss value condition.
  • the first loss value is calculated based on the original image sample corresponding to the downsampled image sample and the super-resolved downsampled image sample;
  • the second loss value is calculated based on the original image sample and the super-resolved object mask sample.
  • in some embodiments, after step S021, the step of using the first sub-model to be trained to extract the image features corresponding to the downsampled image samples and to perform super-resolution processing on the downsampled image samples, the method further includes: calculating the first loss value from the original image sample and the super-resolved downsampled image sample based on the mean square error (MSE) function.
  • in some embodiments, after step S022, using the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, to perform super-resolution processing on the object mask sample, the method further includes: obtaining the first edge map corresponding to the original image sample; performing edge detection on the super-resolved object mask sample to obtain the second edge map; and performing edge matching between the first and second edge maps, determining the second loss value from the edge-matching result.
  • in some embodiments, edge detection is performed on the original image sample to obtain the first edge map, or a precomputed first edge map is read from a storage area.
  • in some embodiments, the edge maps are 8-bit grayscale images, with EM1(x_i, y_i) > 127 and EM2(x_i, y_i) > 127.
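The exact matching formula appears only as an image in the source and is not reproduced here; the sketch below is one plausible instantiation under the stated constraints (8-bit edge maps thresholded at 127), with Sobel gradient magnitude as an assumed edge detector and an IoU-style agreement score as an assumed matching rule, alongside the stated MSE first loss:

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3).contiguous()

def edge_map(gray):
    """Edge magnitude of an (N,1,H,W) grayscale tensor scaled to [0, 255]."""
    gx = F.conv2d(gray, SOBEL_X, padding=1)
    gy = F.conv2d(gray, SOBEL_Y, padding=1)
    return (gx.pow(2) + gy.pow(2)).sqrt().clamp(0, 255)

def second_loss(original_gray, sr_mask):
    """Assumed edge-matching loss: penalize disagreement between the pixels
    whose edge responses exceed 127 in the two edge maps."""
    em1 = (edge_map(original_gray) > 127).float()      # first edge map
    em2 = (edge_map(sr_mask * 255.0) > 127).float()    # mask assumed in [0, 1]
    inter = (em1 * em2).sum()
    union = em1.sum() + em2.sum() - inter + 1e-6
    return 1.0 - inter / union                         # low when edges match

def first_loss(original, sr_downsampled):
    return F.mse_loss(sr_downsampled, original)        # the stated MSE loss
```

Note that the hard threshold blocks gradients, so gradient-based training would need a soft (non-thresholded) variant of this comparison.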
  • FIG. 6 is a flowchart of a specific implementation method of step S2 in an embodiment of the present disclosure. As shown in FIG. 6, step S2, the step of extracting the object area in the down-sampled image, and obtaining the first object mask includes: step S201.
  • Step S201 input the downsampled image into the pre-trained object extraction model, use the object extraction model to extract the object area in the downsampled image, and obtain the first object mask.
  • the target object extraction model adopts the UNet network model, and the resolutions of the input image and the output image are both 512*512; correspondingly, in step S1, the preset resolution is 512*512.
  • the first target object mask obtained by using the target object extraction model is input into the mask super-resolution model.
  • the mask super-resolution model includes a first sub-model and a second sub-model; the first sub-model includes three sequentially connected first computing modules, the second sub-model includes three sequentially connected second computing modules, and the specific layers in these modules are set as described for FIG. 3, so the three-level first computing modules can extract image features from the downsampled image three times and the three-level second computing modules can combine the features from each extraction to super-resolve the first object mask.
  • in some embodiments, each level's second computing module uses its fourth computing unit to enlarge its fifth feature map; based on the corresponding model parameter settings, the size of the feature map can be doubled each time, so the resolution of the finally output second object mask can be 4096*4096, which makes the method applicable to 4K scenes. Specifically, the two-fold enlargement can be achieved by setting the padding of the transposed convolutional layer, e.g. by setting that parameter to "same".
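PyTorch has no literal padding="same" for transposed convolutions (that keyword is TensorFlow/Keras-style); the exact two-fold enlargement follows from the transposed-convolution size formula out = (in - 1) * stride - 2 * padding + kernel, e.g. with kernel 4, stride 2, padding 1:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 1, 512, 512)
print(up(x).shape)  # torch.Size([1, 1, 1024, 1024]) -- exactly doubled
# Three such stages take a 512*512 mask to 4096*4096, matching the 4K case.
```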
  • the image processing method provided by the embodiments of the present disclosure will be described in detail below in combination with practical applications. Specifically, taking the application to portrait matting as an example, the target object area in the downsampled image is the portrait area, and the finally obtained target object image is the portrait image.
  • the original image is down-sampled according to a preset resolution to generate a down-sampled image; wherein, the original image is a 4K image including a portrait, and the preset resolution is 512*512.
  • the downsampled image is input into the pre-trained object extraction model, which extracts the portrait region in the downsampled image to obtain the first object mask; this extraction model is specialized for portrait matting and adopts a UNet network model.
  • the downsampled image and the first object mask are input into the pre-trained mask super-resolution model, which includes a first sub-model and a second sub-model; the first sub-model includes three sequentially connected first computing modules (i.e. P = 3), each including a first computing unit and a second computing unit, and the downsampled image is input into the first sub-model; the second sub-model includes three sequentially connected second computing modules, each including a splicing layer, a third computing unit, and a fourth computing unit, where the first and third computing units of the same level are connected through a splicing layer, and the first object mask is input into the second sub-model.
  • specifically, for the n-th level first computing module (n being a positive integer not greater than 3): its first computing unit performs image feature extraction on the downsampled image or on the first feature map output by the previous-level first computing module, generating a second feature map; its second computing unit enlarges the second feature map and outputs the enlarged map to the next-level first computing module. For the first-level module, the first computing unit extracts features directly from the downsampled image; for the second- and third-level modules, the first computing units operate on the feature maps output by the first and second levels, respectively; the third-level module directly outputs its enlarged second feature map, which is the super-resolved downsampled image.
  • in some embodiments, the first computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the second computing unit includes a transposed convolutional layer.
  • specifically, for the m-th level second computing module (m being a positive integer not greater than 3): its splicing layer splices the second feature map output by the same-level first computing module with the first object mask, or splices the second feature map with the third feature map output by the previous-level second computing module, generating a fourth feature map; its third computing unit extracts image features from the fourth feature map to generate a fifth feature map; its fourth computing unit enlarges the fifth feature map and outputs it to the next-level second computing module. For the first-level second computing module, the splicing layer splices the feature map output by the first-level first computing module with the first object mask; for the second and third levels, the splicing layers respectively splice the outputs of the first-level second computing module with those of the second-level first computing module, and the outputs of the second-level second computing module with those of the third-level first computing module; the third-level module directly outputs its enlarged fifth feature map, which is the second object mask.
  • in some embodiments, the third computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the fourth computing unit includes a transposed convolutional layer.
  • the second object mask is fused with the original image to obtain a portrait image.
  • FIG. 7 is a schematic structural diagram of a mask super-resolution model provided by an embodiment of the present disclosure.
  • as shown in FIG. 7, the arrows indicate the direction of data transfer; the mask super-resolution model includes a first sub-model and a second sub-model. The first sub-model includes three sequentially connected first computing modules 301, each including a first computing unit CBR1 and a second computing unit T_conv1; the downsampled image LR is input into the first sub-model, and the first sub-model outputs the super-resolved downsampled image HR.
  • the second sub-model includes three sequentially connected second computing modules 401, each including a splicing layer (not shown in the figure), a third computing unit CBR2, and a fourth computing unit T_conv2, where the first computing unit CBR1 and the third computing unit CBR2 of the same level are connected through a splicing layer; the first object mask MASK_LR is input into the second sub-model, and the second sub-model outputs the second object mask MASK_HR. CBR1 and CBR2 have similar internal structures: as shown in FIG. 7, each includes a convolutional layer Conv, a batch normalization layer Batch_norm, and an excitation layer ReLu.
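Putting the pieces together, a three-level assembly in the spirit of FIG. 7, reusing the FirstOperationModule and SecondOperationModule sketches above (channel widths, the small demo size, and the 1x1 output heads are illustrative assumptions, not details from the disclosure):

```python
import torch
import torch.nn as nn

class MaskSRModel(nn.Module):
    """Sketch of the FIG. 7 topology: at each level, the CBR1 feature map
    feeds both T_conv1 (image branch) and, via the splicing layer, the
    CBR2 + T_conv2 mask branch."""
    def __init__(self, ch=32, levels=3):
        super().__init__()
        self.first = nn.ModuleList(
            [FirstOperationModule(3 if i == 0 else ch, ch) for i in range(levels)])
        self.second = nn.ModuleList(
            [SecondOperationModule(ch, 1 if i == 0 else ch, ch) for i in range(levels)])
        self.to_hr = nn.Conv2d(ch, 3, kernel_size=1)    # assumed head for HR image
        self.to_mask = nn.Conv2d(ch, 1, kernel_size=1)  # assumed head for MASK_HR

    def forward(self, lr, mask_lr):
        x, m = lr, mask_lr
        for f, s in zip(self.first, self.second):
            feat, x = f(x)   # second feature map + enlarged image-branch map
            m = s(feat, m)   # spliced, refined, enlarged mask-branch map
        return self.to_hr(x), self.to_mask(m)

lr, mask_lr = torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64)
hr, mask_hr = MaskSRModel()(lr, mask_lr)
print(hr.shape, mask_hr.shape)  # 512x512 here; 512 -> 4096 over three 2x stages
```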
  • FIG. 8 is a composition block diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 8, the electronic device includes:
  • one or more processors 101;
  • a memory 102 on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors 101 implement the image processing method of any one of the above embodiments;
  • One or more I/O interfaces 103 are connected between the processor and the memory, and are configured to realize information exchange between the processor and the memory.
  • the processor 101 is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O (read-write) interface 103 is connected between the processor 101 and the memory 102, enables information exchange between them, and includes but is not limited to a data bus (Bus).
  • the processor 101 , the memory 102 and the I/O interface 103 are connected to each other through the bus 104 , and are further connected to other components of the computing device.
  • the plurality of processors 101 includes a plurality of graphics processing units (GPUs), the combined arrangement of which constitutes a graphics processor array.
  • FIG. 9 is a composition block diagram of a non-transitory computer-readable medium provided by an embodiment of the present disclosure.
  • a computer program is stored on the computer-readable medium, wherein, when the computer program is executed by a processor, the image processing method as in any one of the above-mentioned embodiments is realized.
  • Non-transitory computer readable media may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • as known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method, including: downsampling an original image according to a preset resolution to generate a downsampled image; extracting an object region in the downsampled image to obtain a first object mask; inputting the downsampled image and the first object mask into a pre-trained mask super-resolution model; using the mask super-resolution model to perform super-resolution processing on the first object mask to obtain a second object mask; and fusing the second object mask with the original image to obtain an object image. The present disclosure further provides an electronic device and a non-transitory computer-readable medium.

Description

Image processing method, electronic device and non-transitory computer-readable medium
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to an image processing method, an electronic device, and a non-transitory computer-readable medium.
Background
Existing neural-network-based target detection and extraction algorithms have achieved good results and are widely used in processing many classes of images, such as natural images and medical images. However, current neural network models process images with limited fineness: when targets are actually extracted and matted from larger images, problems such as jagged edges often appear after processing.
Summary
The present disclosure aims to solve at least one of the technical problems existing in the prior art, and proposes an image processing method, an electronic device, and a non-transitory computer-readable medium.
To achieve the above object, in a first aspect, an embodiment of the present disclosure provides an image processing method, including:
downsampling an original image according to a preset resolution to generate a downsampled image;
extracting an object region in the downsampled image to obtain a first object mask;
inputting the downsampled image and the first object mask into a pre-trained mask super-resolution model, and using the mask super-resolution model to perform super-resolution processing on the first object mask to obtain a second object mask, where the resolution of the second object mask is higher than the resolution of the first object mask;
fusing the second object mask with the original image to obtain an object image.
In some embodiments, the mask super-resolution model includes a first sub-model and a second sub-model;
the using the mask super-resolution model to perform super-resolution processing on the first object mask to obtain a second object mask includes:
extracting, with the first sub-model, image features corresponding to the downsampled image;
inputting the image features extracted by the first sub-model into the second sub-model, and using the second sub-model, combined with the image features extracted by the first sub-model, to perform super-resolution processing on the first object mask to obtain the second object mask.
In some embodiments, the first sub-model includes P sequentially connected levels of first computing modules, each level including a first computing unit and a second computing unit, where P is a positive integer greater than 1;
the extracting, with the first sub-model, image features corresponding to the downsampled image includes:
for the n-th level first computing module, n being a positive integer not greater than P: using its first computing unit to perform image feature extraction on the downsampled image or on the first feature map output by the previous-level first computing module, generating a second feature map, and outputting the second feature map to the second sub-model; and using its second computing unit to enlarge the second feature map and output the enlarged second feature map to the next-level first computing module;
the using the second sub-model, combined with the image features extracted by the first sub-model, to perform super-resolution processing on the first object mask to obtain the second object mask includes:
using the second sub-model, combined with the second feature maps extracted by the first computing modules at every level, to perform super-resolution processing on the first object mask to obtain the second object mask.
In some embodiments, the first computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the second computing unit includes a transposed convolutional layer.
In some embodiments, the second sub-model includes P sequentially connected levels of second computing modules, each level including a splicing layer, a third computing unit, and a fourth computing unit, where the first computing unit and the third computing unit of the same level are connected through a splicing layer;
the using the second sub-model, combined with the second feature maps extracted by the first computing modules at every level, to perform super-resolution processing on the first object mask to obtain the second object mask includes:
for the m-th level second computing module, m being a positive integer not greater than P: using its splicing layer to splice the second feature map with the first object mask, or to splice the second feature map with the third feature map output by the previous-level second computing module, generating a fourth feature map; using its third computing unit to perform image feature extraction on the fourth feature map, generating a fifth feature map; and using its fourth computing unit to enlarge the fifth feature map and output the enlarged fifth feature map to the next-level second computing module;
where, for the last-level second computing module, the enlarged fifth feature map is taken as the second object mask and output.
In some embodiments, the third computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the fourth computing unit includes a transposed convolutional layer.
In some embodiments, the extracting an object region in the downsampled image to obtain a first object mask includes:
inputting the downsampled image into a pre-trained object extraction model, and using the object extraction model to extract the object region in the downsampled image to obtain the first object mask;
where the object extraction model is a UNet network model, the first sub-model includes three sequentially connected levels of first computing modules, and the second sub-model includes three sequentially connected levels of second computing modules.
In some embodiments, the mask super-resolution model is trained through the following steps:
inputting downsampled image samples and their corresponding object mask samples into the mask super-resolution model to be trained;
training the mask super-resolution model to be trained in an iterative manner, based on the downsampled image samples and the object mask samples, where the first sub-model to be trained extracts image features corresponding to the downsampled image samples and performs super-resolution processing on the downsampled image samples; the image features extracted by the first sub-model to be trained are input into the second sub-model to be trained; and the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, performs super-resolution processing on the object mask samples;
in response to a preset convergence condition being satisfied, ending the training to obtain the mask super-resolution model.
In some embodiments, the preset convergence condition includes at least one of the following:
a preset number of iterations has been trained;
a first loss value and a second loss value satisfy a preset loss-value condition, where the first loss value is calculated based on the original image sample corresponding to the downsampled image sample and the super-resolved downsampled image sample, and the second loss value is calculated based on the original image sample and the super-resolved object mask sample.
In some embodiments, after the extracting, with the first sub-model to be trained, image features corresponding to the downsampled image samples and performing super-resolution processing on the downsampled image samples, the method further includes:
calculating the first loss value from the original image sample and the super-resolved downsampled image sample based on a mean square error function.
In some embodiments, after the using the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, to perform super-resolution processing on the object mask samples, the method further includes:
obtaining a first edge map corresponding to the original image sample;
performing edge detection on the super-resolved object mask sample to obtain a second edge map;
performing edge matching between the first edge map and the second edge map, and determining the second loss value according to the edge-matching result.
In some embodiments, the object region is a portrait region and the object image is a portrait image.
In a second aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method of any one of the above embodiments.
In a third aspect, an embodiment of the present disclosure further provides a non-transitory computer-readable medium on which a computer program is stored, where the program, when executed, implements the image processing method of any one of the above embodiments.
Brief Description of the Drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification; together with the embodiments, they serve to explain the present disclosure and do not limit it. The above and other features and advantages will become more apparent to those skilled in the art from the detailed description of example embodiments with reference to the drawings, in which:
FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of one specific implementation of step S3 in an embodiment of the present disclosure;
FIG. 3 is a flowchart of another specific implementation of step S3 in an embodiment of the present disclosure;
FIG. 4 is a flowchart of a training method for a mask super-resolution model provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a specific implementation of step S02 in an embodiment of the present disclosure;
FIG. 6 is a flowchart of one specific implementation of step S2 in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a mask super-resolution model provided by an embodiment of the present disclosure;
FIG. 8 is a composition block diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 9 is a composition block diagram of a non-transitory computer-readable medium provided by an embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the technical solutions of the present disclosure, the image processing method, electronic device, and non-transitory computer-readable medium provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully below with reference to the drawings, but the example embodiments may be embodied in different forms and should not be construed as limited to those set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey its scope to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the terms "comprise" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms; these terms are only used to distinguish one element from another. Thus, a first element, first component, or first module discussed below could be termed a second element, second component, or second module without departing from the teachings of the present disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will further be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure. As shown in FIG. 1, the method includes:
Step S1: downsample the original image according to a preset resolution to generate a downsampled image.
The preset resolution is smaller than the resolution of the original image. Downsampling the original image corresponds to scaling it down: it generates a lower-resolution downsampled image whose resolution is the preset resolution. The downsampled image has fewer pixels than the original image, so the time spent on the corresponding operations and processing is reduced accordingly; the reduction ratio is approximately the ratio between the preset resolution and the resolution of the original image.
Step S2: extract the object region in the downsampled image to obtain a first object mask.
A mask is a single-channel image that can be used during image processing to block all or part of the image to be processed, so as to control the region and progress of the processing. In this embodiment, the first object mask is the mask corresponding to the extracted object region: the object region serves as the foreground of the downsampled image, and the remaining parts serve as its background. Applying the first object mask to the downsampled image yields a downsampled image in which only the foreground region is retained. In some embodiments, the mask is a binary image composed of 0s and 1s; in some embodiments, the mask may also be a multi-valued image.
In some embodiments, the object region is a portrait region and the object image is a portrait image, in which case step S2 corresponds to the process of portrait matting. It should be noted that using a portrait as the target is only one specific implementation provided by the embodiments of the present disclosure and does not limit the technical solution of the present disclosure; other target types are equally applicable, such as animals and plants, vehicles and other means of transport, license plates, and so on. Specifically, a target of the corresponding type should satisfy at least one of the following conditions: it has a specific shape; it has a relatively clear outline; the location of its region in the image can be determined with a corresponding detection algorithm.
Step S3: input the downsampled image and the first object mask into a pre-trained mask super-resolution model, and use the mask super-resolution model to perform super-resolution processing on the first object mask to obtain a second object mask.
The resolution of the second object mask is higher than that of the first object mask. In step S3, the mask super-resolution model, combined with the downsampled image, performs super-resolution processing on the first object mask; super-resolution (SR) processing corresponds to reconstructing a high-resolution image from a low-resolution image.
In some embodiments, the mask super-resolution model is pre-trained on original image samples, downsampled image samples, and object mask samples.
Step S4: fuse the second object mask with the original image to obtain an object image.
The object image is the final matting result for the target in the original image. In some embodiments, the second object mask is a binary image and is fused with the original image by multiplication; or, in some embodiments, as mentioned above, since the second object mask is a single-channel image, it can be fused with the original image by channel fusion; or, in some embodiments, the second object mask is fused with the original image by Poisson fusion.
An embodiment of the present disclosure provides an image processing method, which can be used to extract the object region in the downsampled image of an original image to obtain a first object mask; input the downsampled image and the first object mask into a mask super-resolution model and use it to super-resolve the first object mask into a second object mask; and fuse the second object mask with the original image to obtain the object image. By raising the resolution of the mask corresponding to the target, the overall fineness of the target extraction process is improved, which effectively avoids jagged edges when extracting and matting targets from large, high-resolution images.
FIG. 2 is a flowchart of one specific implementation of step S3 in an embodiment of the present disclosure. Specifically, the mask super-resolution model includes a first sub-model and a second sub-model; as shown in FIG. 2, in step S3, the step of using the mask super-resolution model to super-resolve the first object mask into the second object mask includes step S301 and step S302.
Step S301: use the first sub-model to extract image features corresponding to the downsampled image.
Step S302: input the image features extracted by the first sub-model into the second sub-model, and use the second sub-model, combined with the image features extracted by the first sub-model, to perform super-resolution processing on the first object mask to obtain the second object mask.
In the process of super-resolving the first object mask with the mask super-resolution model, the downsampled image and the first object mask are input into the first sub-model and the second sub-model, respectively. The first sub-model extracts the image features of the downsampled image and outputs them to the second sub-model; additionally, in some embodiments, the first sub-model can finally output the super-resolved downsampled image, which can be used to check the model's input and output and to assess the super-resolution effect. The second sub-model combines the image features extracted by the first sub-model to super-resolve the first object mask and finally outputs the second object mask. In some embodiments, feature maps are concatenated by a splicing layer (Concat); in some embodiments, feature fusion is realized with 1*1 convolutions and pooling along the channel dimension, thereby combining the image features extracted by the first sub-model into the super-resolution of the first object mask.
FIG. 3 is a flowchart of another specific implementation of step S3 in an embodiment of the present disclosure. Specifically, this method is a concrete optional implementation of the method shown in FIG. 2. On that basis, the first sub-model includes P sequentially connected levels of first computing modules, each including a first computing unit and a second computing unit, where P is a positive integer greater than 1. As shown in FIG. 3, when step S301 (extracting image features corresponding to the downsampled image with the first sub-model) is performed, the processing of the n-th level first computing module (n being a positive integer not greater than P) includes step S3011 and step S3012.
Step S3011: use the first computing unit of this level's first computing module to perform image feature extraction on the downsampled image or on the first feature map output by the previous-level first computing module, generate a second feature map, and output the second feature map to the second sub-model.
The multi-level first computing modules extract image features from the downsampled image multiple times, level by level. Specifically, when n = 1, i.e. for the first-level first computing module, its first computing unit extracts features directly from the downsampled image; when n > 1, this level's first computing unit extracts features from the first feature map output by the previous-level first computing module. Based on this multi-pass extraction, the image features corresponding to the downsampled image include the image features of the downsampled image itself and the image features of its feature maps.
Step S3012: use the second computing unit of this level's first computing module to enlarge the second feature map, and output the enlarged second feature map to the next-level first computing module.
The second feature map output by this level's first computing module is the first feature map received by the next-level first computing module. In some embodiments, for the last-level first computing module, the enlarged feature map it outputs is the super-resolved downsampled image.
In some embodiments, the first computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the second computing unit includes a transposed convolutional layer (also called a deconvolution or inverse convolution layer). In step S3012, the step of using this level's second computing unit to enlarge the feature map corresponding to the extracted image features then specifically includes: applying transposed convolution to that feature map, so as to enlarge its size.
In some embodiments, the parameter settings of the above convolutional, batch normalization, excitation, and transposed convolutional layers may differ between first computing modules at different levels.
Specifically, corresponding to the operations of the above layers, in the embodiments of the present disclosure the term "convolution kernel" refers to a two-dimensional matrix used in the convolution process; optionally, each of the entries in the two-dimensional matrix has a specific value.
In the embodiments of the present disclosure, the term "convolution" refers to the process of processing an image with a convolution kernel. Each pixel of the input image has a value; the kernel starts at one pixel of the input image and moves sequentially over each pixel. At each position, the kernel overlaps several pixels on the image according to its scale; the value of each overlapping pixel is multiplied by the corresponding value of the kernel, and the products are summed to obtain the sum corresponding to that kernel position. By moving the kernel over every pixel of the input image, the sums for all kernel positions are collected and output to form the output image. In some embodiments, convolution can use different kernels to extract different features of the input image; in some embodiments, the convolution process can use different kernels to add more features to the input image.
A convolutional layer performs convolution on an input image to obtain an output image. Optionally, different kernels are used to perform different convolutions on the same input image; optionally, different kernels are used on different parts of the same input image; optionally, different kernels are used for different input images, for example when multiple images are input into a convolutional layer, corresponding kernels convolve each of them; optionally, different kernels are used depending on the situation of the input image.
The excitation layer can apply a non-linear mapping to the signal output from the convolutional layer. Various functions can be used in the excitation layer; examples of functions suitable for the excitation layer include, but are not limited to, the rectified linear unit (ReLU) function, the sigmoid function, and the hyperbolic tangent (e.g. tanh) function. In some embodiments, the excitation layer and the batch normalization layer are incorporated into the convolutional layer.
The batch normalization (BN) layer standardizes the output of each layer of the network model over a mini-batch of data. Standardization transforms the data toward the standard normal distribution with mean 0 and standard deviation 1, and can alleviate the vanishing-gradient problem in neural network models.
Specifically, based on step S3011, step S302 (using the second sub-model, combined with the image features extracted by the first sub-model, to super-resolve the first object mask into the second object mask) includes: using the second sub-model, combined with the second feature maps extracted by the first computing modules at every level, to perform super-resolution processing on the first object mask to obtain the second object mask.
In some embodiments, the second sub-model includes P sequentially connected levels of second computing modules, each including a splicing layer, a third computing unit, and a fourth computing unit, where P is a positive integer greater than 1 and the first and third computing units of the same level are connected through a splicing layer. Thus, in some embodiments, as shown in FIG. 3, when the above step (using the second sub-model, combined with the second feature maps extracted at every level, to super-resolve the first object mask) is performed, the processing of the m-th level second computing module (m being a positive integer not greater than P) includes steps S3021 to S3023.
Step S3021: use the splicing layer of this level's second computing module to splice the second feature map with the first object mask, or to splice the second feature map with the third feature map output by the previous-level second computing module, generating a fourth feature map.
When m = 1, i.e. for the first-level second computing module, its splicing layer splices the second feature map with the first object mask; when m > 1, this level's splicing layer splices the second feature map with the third feature map output by the previous-level second computing module.
In some embodiments, the number of levels of the multi-level first computing modules is the same as that of the multi-level second computing modules.
In some embodiments, the splicing layers concatenate along the channel dimension.
Step S3022: use the third computing unit of this level's second computing module to extract image features from the fourth feature map and generate a fifth feature map.
Step S3023: use the fourth computing unit of this level's second computing module to enlarge the fifth feature map, and output the enlarged fifth feature map to the next-level second computing module.
The fifth feature map output by this level's second computing module is the third feature map received by the next-level second computing module; for the last-level second computing module, the enlarged fifth feature map is taken as the second object mask and output.
In some embodiments, similarly to the first computing module, the third computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the fourth computing unit includes a transposed convolutional layer.
In some embodiments, the parameter settings of the above convolutional, batch normalization, excitation, and transposed convolutional layers may differ between second computing modules at different levels.
The embodiments of the present disclosure provide an image processing method that can super-resolve the mask corresponding to the target in combination with the image features of the downsampled image, increasing its feature dimension and improving the fineness of target extraction.
FIG. 4 is a flowchart of a training method for a mask super-resolution model provided by an embodiment of the present disclosure. Specifically, the mask super-resolution model is the one corresponding to FIG. 2, which includes a first sub-model and a second sub-model; as shown in FIG. 4, the mask super-resolution model is trained through the following steps:
Step S01: input downsampled image samples and their corresponding object mask samples into the mask super-resolution model to be trained.
A downsampled image sample is obtained by downsampling its corresponding original image sample, and an object mask sample is obtained by object extraction from the downsampled image sample.
Step S02: train the mask super-resolution model to be trained in an iterative manner, based on the downsampled image samples and the object mask samples.
FIG. 5 is a flowchart of a specific implementation of step S02 in an embodiment of the present disclosure. As shown in FIG. 5, step S02 includes step S021 and step S022.
Step S021: use the first sub-model to be trained to extract image features corresponding to the downsampled image samples, and perform super-resolution processing on the downsampled image samples.
Step S022: input the image features extracted by the first sub-model to be trained into the second sub-model to be trained, and use the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, to perform super-resolution processing on the object mask samples.
Similarly to the image features corresponding to the downsampled image, the image features corresponding to a downsampled image sample include the image features of the sample itself and the image features of its feature maps; the above process of training the first and second sub-models corresponds to their actual inference process.
Step S03: in response to a preset convergence condition being satisfied, end the training to obtain the mask super-resolution model.
In some embodiments, the preset convergence condition includes at least one of the following: a preset number of iterations has been trained; the first loss value and the second loss value satisfy a preset loss-value condition.
The first loss value is calculated based on the original image sample corresponding to the downsampled image sample and the super-resolved downsampled image sample; the second loss value is calculated based on the original image sample and the super-resolved object mask sample.
In some embodiments, after step S021, the step of using the first sub-model to be trained to extract the image features corresponding to the downsampled image samples and to super-resolve the samples, the method further includes: calculating the first loss value from the original image sample and the super-resolved downsampled image sample based on the mean square error (MSE) function.
In some embodiments, after step S022, the step of using the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, to super-resolve the object mask samples, the method further includes: obtaining the first edge map corresponding to the original image sample; performing edge detection on the super-resolved object mask sample to obtain the second edge map; and performing edge matching between the first and second edge maps, determining the second loss value from the edge-matching result. In some embodiments, edge detection is performed on the original image sample to obtain the first edge map, or a precomputed first edge map is read from a storage area.
In some embodiments, the second loss value L_EM is calculated with a formula that is reproduced in the original filing only as an image and is not shown here, where EM1(x_i, y_i) denotes the pixel value at coordinates (x_i, y_i) in the first edge map, EM2(x_i, y_i) denotes the pixel value at coordinates (x_i, y_i) in the second edge map, and i ∈ [1, n]; in some embodiments, the edge maps are 8-bit grayscale images, with EM1(x_i, y_i) > 127 and EM2(x_i, y_i) > 127.
FIG. 6 is a flowchart of one specific implementation of step S2 in an embodiment of the present disclosure. As shown in FIG. 6, step S2, the step of extracting the object region in the downsampled image to obtain the first object mask, includes step S201.
Step S201: input the downsampled image into a pre-trained object extraction model, and use the object extraction model to extract the object region in the downsampled image to obtain the first object mask.
The object extraction model adopts a UNet network model whose input and output images both have a resolution of 512*512; correspondingly, in step S1, the preset resolution is 512*512. Specifically, the first object mask obtained with the object extraction model is input into the mask super-resolution model, which includes a first sub-model and a second sub-model; the first sub-model includes three sequentially connected levels of first computing modules and the second sub-model includes three sequentially connected levels of second computing modules, with the specific layers in these modules set as described for FIG. 3. The three-level first computing modules can thus extract image features from the downsampled image three times, and the three-level second computing modules combine the features from each extraction to super-resolve the first object mask.
In some embodiments, each level's second computing module uses its fourth computing unit to enlarge its fifth feature map; based on the corresponding model parameter settings, the size of the feature map can be doubled each time, so the resolution of the finally output second object mask can be 4096*4096, which makes the method applicable to 4K scenes. Specifically, the two-fold enlargement can be achieved by setting the padding of the transposed convolutional layer, for example by setting that parameter to "same".
The image processing method provided by the embodiments of the present disclosure is described in detail below in combination with a practical application. Specifically, taking portrait matting as an example, the object region in the downsampled image is a portrait region, and the finally obtained object image is a portrait image.
In a specific implementation, the original image is first downsampled according to a preset resolution to generate a downsampled image; the original image is a 4K image including a portrait, and the preset resolution is 512*512.
The downsampled image is input into a pre-trained object extraction model, which extracts the portrait region in the downsampled image to obtain the first object mask; this extraction model is specialized for portrait matting and adopts a UNet network model.
The downsampled image and the first object mask are input into the pre-trained mask super-resolution model, which includes a first sub-model and a second sub-model. The first sub-model includes three sequentially connected levels of first computing modules (i.e. P = 3), each including a first computing unit and a second computing unit; the downsampled image is input into the first sub-model. The second sub-model includes three sequentially connected levels of second computing modules, each including a splicing layer, a third computing unit, and a fourth computing unit, where the first and third computing units of the same level are connected through a splicing layer; the first object mask is input into the second sub-model. Specifically, for the n-th level first computing module (n being a positive integer not greater than 3), its first computing unit performs image feature extraction on the downsampled image or on the first feature map output by the previous-level first computing module, generating a second feature map; its second computing unit enlarges the second feature map and outputs the enlarged map to the next-level first computing module. For the first-level first computing module, the first computing unit extracts features directly from the downsampled image, while for the second- and third-level modules the first computing units operate on the feature maps output by the first and second levels, respectively; the third-level module directly outputs its enlarged second feature map, which is the super-resolved downsampled image. In some embodiments, the first computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the second computing unit includes a transposed convolutional layer. Specifically, for the m-th level second computing module (m being a positive integer not greater than 3), its splicing layer splices the second feature map output by the same-level first computing module with the first object mask, or splices the second feature map with the third feature map output by the previous-level second computing module, generating a fourth feature map; its third computing unit extracts image features from the fourth feature map to generate a fifth feature map; its fourth computing unit enlarges the fifth feature map and outputs it to the next-level second computing module. For the first-level second computing module, the splicing layer splices the feature map output by the first-level first computing module with the first object mask; for the second and third levels, the splicing layers respectively splice the outputs of the first-level second computing module with those of the second-level first computing module, and the outputs of the second-level second computing module with those of the third-level first computing module. The third-level module directly outputs its enlarged fifth feature map, which is the second object mask. In some embodiments, the third computing unit includes sequentially connected convolutional, batch normalization, and excitation layers, and the fourth computing unit includes a transposed convolutional layer.
Finally, the second object mask is fused with the original image to obtain the portrait image.
FIG. 7 is a schematic structural diagram of a mask super-resolution model provided by an embodiment of the present disclosure. As shown in FIG. 7, the arrows indicate the direction of data transfer. The mask super-resolution model includes a first sub-model and a second sub-model. The first sub-model includes three sequentially connected first computing modules 301, each including a first computing unit CBR1 and a second computing unit T_conv1; the downsampled image LR is input into the first sub-model, and the first sub-model outputs the super-resolved downsampled image HR. The second sub-model includes three sequentially connected second computing modules 401, each including a splicing layer (not shown in the figure), a third computing unit CBR2, and a fourth computing unit T_conv2, where the first computing unit CBR1 and the third computing unit CBR2 of the same level are connected through a splicing layer; the first object mask MASK_LR is input into the second sub-model, and the second sub-model outputs the second object mask MASK_HR. CBR1 and CBR2 have similar internal structures: as shown in FIG. 7, each includes a convolutional layer Conv, a batch normalization layer Batch_norm, and an excitation layer ReLu.
FIG. 8 is a composition block diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 8, the electronic device includes:
one or more processors 101;
a memory 102 on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors 101 implement the image processing method of any one of the above embodiments;
one or more I/O interfaces 103, connected between the processors and the memory, configured to enable information exchange between the processors and the memory.
The processor 101 is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O (read-write) interface 103 is connected between the processor 101 and the memory 102, enables information exchange between them, and includes but is not limited to a data bus (Bus).
In some embodiments, the processor 101, the memory 102, and the I/O interface 103 are connected to each other through a bus 104, and in turn to the other components of the computing device.
In some embodiments, the plurality of processors 101 includes a plurality of graphics processing units (GPUs), arranged in combination to form a GPU array.
FIG. 9 is a composition block diagram of a non-transitory computer-readable medium provided by an embodiment of the present disclosure. A computer program is stored on the computer-readable medium; when executed by a processor, the computer program implements the image processing method of any one of the above embodiments.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units in the apparatus, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on non-transitory computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as would be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly indicated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (14)

  1. An image processing method, comprising:
    downsampling an original image according to a preset resolution to generate a downsampled image;
    extracting an object region in the downsampled image to obtain a first object mask;
    inputting the downsampled image and the first object mask into a pre-trained mask super-resolution model, and using the mask super-resolution model to perform super-resolution processing on the first object mask to obtain a second object mask, wherein a resolution of the second object mask is higher than a resolution of the first object mask; and
    fusing the second object mask with the original image to obtain an object image.
  2. The image processing method according to claim 1, wherein the mask super-resolution model comprises a first sub-model and a second sub-model;
    the using the mask super-resolution model to perform super-resolution processing on the first object mask to obtain a second object mask comprises:
    extracting, with the first sub-model, image features corresponding to the downsampled image; and
    inputting the image features extracted by the first sub-model into the second sub-model, and using the second sub-model, combined with the image features extracted by the first sub-model, to perform super-resolution processing on the first object mask to obtain the second object mask.
  3. The image processing method according to claim 2, wherein the first sub-model comprises P sequentially connected levels of first computing modules, each level of first computing module comprising a first computing unit and a second computing unit, wherein P is a positive integer greater than 1;
    the extracting, with the first sub-model, image features corresponding to the downsampled image comprises:
    for the n-th level first computing module, n being a positive integer not greater than P: using its first computing unit to perform image feature extraction on the downsampled image or on the first feature map output by the previous-level first computing module, generating a second feature map, and outputting the second feature map to the second sub-model; and using its second computing unit to enlarge the second feature map and output the enlarged second feature map to the next-level first computing module;
    the using the second sub-model, combined with the image features extracted by the first sub-model, to perform super-resolution processing on the first object mask to obtain the second object mask comprises:
    using the second sub-model, combined with the second feature maps extracted by the first computing modules at every level, to perform super-resolution processing on the first object mask to obtain the second object mask.
  4. The image processing method according to claim 3, wherein the first computing unit comprises sequentially connected convolutional, batch normalization, and excitation layers, and the second computing unit comprises a transposed convolutional layer.
  5. The image processing method according to claim 3, wherein the second sub-model comprises P sequentially connected levels of second computing modules, each level of second computing module comprising a splicing layer, a third computing unit, and a fourth computing unit, wherein the first computing unit and the third computing unit of the same level are connected through a splicing layer;
    the using the second sub-model, combined with the second feature maps extracted by the first computing modules at every level, to perform super-resolution processing on the first object mask to obtain the second object mask comprises:
    for the m-th level second computing module, m being a positive integer not greater than P: using its splicing layer to splice the second feature map with the first object mask, or to splice the second feature map with the third feature map output by the previous-level second computing module, generating a fourth feature map; using its third computing unit to perform image feature extraction on the fourth feature map, generating a fifth feature map; and using its fourth computing unit to enlarge the fifth feature map and output the enlarged fifth feature map to the next-level second computing module;
    wherein, for the last-level second computing module, the enlarged fifth feature map is taken as the second object mask and output.
  6. The image processing method according to claim 5, wherein the third computing unit comprises sequentially connected convolutional, batch normalization, and excitation layers, and the fourth computing unit comprises a transposed convolutional layer.
  7. The image processing method according to claim 5, wherein the extracting an object region in the downsampled image to obtain a first object mask comprises:
    inputting the downsampled image into a pre-trained object extraction model, and using the object extraction model to extract the object region in the downsampled image to obtain the first object mask;
    wherein the object extraction model is a UNet network model, the first sub-model comprises three sequentially connected levels of first computing modules, and the second sub-model comprises three sequentially connected levels of second computing modules.
  8. The image processing method according to claim 2, wherein the mask super-resolution model is trained through the following steps:
    inputting downsampled image samples and their corresponding object mask samples into the mask super-resolution model to be trained;
    training the mask super-resolution model to be trained in an iterative manner, based on the downsampled image samples and the object mask samples, wherein the first sub-model to be trained extracts image features corresponding to the downsampled image samples and performs super-resolution processing on the downsampled image samples; the image features extracted by the first sub-model to be trained are input into the second sub-model to be trained; and the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, performs super-resolution processing on the object mask samples; and
    in response to a preset convergence condition being satisfied, ending the training to obtain the mask super-resolution model.
  9. The image processing method according to claim 8, wherein the preset convergence condition comprises at least one of the following:
    a preset number of iterations has been trained;
    a first loss value and a second loss value satisfy a preset loss-value condition, wherein the first loss value is calculated based on the original image sample corresponding to the downsampled image sample and the super-resolved downsampled image sample, and the second loss value is calculated based on the original image sample and the super-resolved object mask sample.
  10. The image processing method according to claim 9, wherein, after the extracting, with the first sub-model to be trained, image features corresponding to the downsampled image samples and performing super-resolution processing on the downsampled image samples, the method further comprises:
    calculating the first loss value from the original image sample and the super-resolved downsampled image sample based on a mean square error function.
  11. The image processing method according to claim 9, wherein, after the using the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, to perform super-resolution processing on the object mask samples, the method further comprises:
    obtaining a first edge map corresponding to the original image sample;
    performing edge detection on the super-resolved object mask sample to obtain a second edge map; and
    performing edge matching between the first edge map and the second edge map, and determining the second loss value according to the edge-matching result.
  12. The image processing method according to any one of claims 1-11, wherein
    the object region is a portrait region, and the object image is a portrait image.
  13. An electronic device, comprising:
    one or more processors; and
    a memory for storing one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method according to any one of claims 1-12.
  14. A non-transitory computer-readable medium on which a computer program is stored, wherein the program, when executed, implements the image processing method according to any one of claims 1-12.
PCT/CN2021/127282 2021-10-29 2021-10-29 Image processing method, electronic device and non-transitory computer-readable medium WO2023070495A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202180003148.9A CN116368512A (zh) 2021-10-29 2021-10-29 图像处理方法、电子设备和非瞬态计算机可读介质
PCT/CN2021/127282 WO2023070495A1 (zh) 2021-10-29 2021-10-29 图像处理方法、电子设备和非瞬态计算机可读介质
US18/273,026 US20240087085A1 (en) 2021-10-29 2021-10-29 Image processing method, electronic device, and non-transitory computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/127282 WO2023070495A1 (zh) 2021-10-29 2021-10-29 图像处理方法、电子设备和非瞬态计算机可读介质

Publications (2)

Publication Number Publication Date
WO2023070495A1 (zh) 2023-05-04
WO2023070495A9 WO2023070495A9 (zh) 2024-01-11

Family

ID=86160396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127282 WO2023070495A1 Image processing method, electronic device and non-transitory computer-readable medium

Country Status (3)

Country Link
US (1) US20240087085A1 (zh)
CN (1) CN116368512A (zh)
WO (1) WO2023070495A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664718B1 (en) * 2017-09-11 2020-05-26 Apple Inc. Real-time adjustment of hybrid DNN style transfer networks
CN112819720A (zh) * 2021-02-02 2021-05-18 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备及存储介质
CN113449735A (zh) * 2021-07-15 2021-09-28 北京科技大学 一种超像素分割的语义分割方法及装置
CN113763249A (zh) * 2021-09-10 2021-12-07 平安科技(深圳)有限公司 文本图像超分辨率重建方法及其相关设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664718B1 (en) * 2017-09-11 2020-05-26 Apple Inc. Real-time adjustment of hybrid DNN style transfer networks
CN112819720A (zh) * 2021-02-02 2021-05-18 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备及存储介质
CN113449735A (zh) * 2021-07-15 2021-09-28 北京科技大学 一种超像素分割的语义分割方法及装置
CN113763249A (zh) * 2021-09-10 2021-12-07 平安科技(深圳)有限公司 文本图像超分辨率重建方法及其相关设备

Also Published As

Publication number Publication date
CN116368512A (zh) 2023-06-30
WO2023070495A9 (zh) 2024-01-11
US20240087085A1 (en) 2024-03-14

Similar Documents

Publication Publication Date Title
Kim et al. Deformable kernel networks for joint image filtering
CN106780512B Method for segmenting images, application and computing device
EP3923233A1 (en) Image denoising method and apparatus
US20210216878A1 (en) Deep learning-based coregistration
CN111161269B Image segmentation method, computer device and readable storage medium
CN110598714A Cartilage image segmentation method and apparatus, readable storage medium and terminal device
WO2023077809A1 Neural network training method, electronic device and computer storage medium
CN111310758A Text detection method and apparatus, computer device and storage medium
Zhang et al. Deformable and residual convolutional network for image super-resolution
CN113298032A Deep-learning-based vehicle target detection method for UAV-perspective images
CN113421276A Image processing method and apparatus, and storage medium
CN112613541A Target detection method and apparatus, storage medium and electronic device
Gendy et al. Balanced spatial feature distillation and pyramid attention network for lightweight image super-resolution
Lu et al. Non-convex joint bilateral guided depth upsampling
Liu et al. Aerial image super-resolution based on deep recursive dense network for disaster area surveillance
Wang et al. Joint depth map super-resolution method via deep hybrid-cross guidance filter
Shao et al. Nonparametric blind super-resolution using adaptive heavy-tailed priors
WO2023070495A1 Image processing method, electronic device and non-transitory computer-readable medium
CN116071239B CT image super-resolution method and apparatus based on a hybrid attention model
Singh et al. Single image super-resolution using adaptive domain transformation
Xiong et al. Single image super-resolution via image quality assessment-guided deep learning network
CN116934591A Image stitching method, apparatus and device with multi-scale feature extraction, and storage medium
WO2022247394A1 Image stitching method and apparatus, storage medium and electronic device
Wang et al. Deep residual network for single image super-resolution
Xie et al. Bidirectionally aligned sparse representation for single image super-resolution

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21961857

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 18273026

Country of ref document: US