WO2023070495A9 - Image processing method, electronic device and non-transitory computer-readable medium - Google Patents
Image processing method, electronic device and non-transitory computer-readable medium
- Publication number
- WO2023070495A9 (PCT/CN2021/127282)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- model
- mask
- sub
- target
- Prior art date
Links
- 238000003672 processing method Methods 0.000 title claims abstract description 28
- 238000012545 processing Methods 0.000 claims abstract description 67
- 238000000034 method Methods 0.000 claims abstract description 36
- 238000005070 sampling Methods 0.000 claims abstract description 4
- 238000000605 extraction Methods 0.000 claims description 26
- 230000005284 excitation Effects 0.000 claims description 15
- 238000010606 normalization Methods 0.000 claims description 14
- 230000006870 function Effects 0.000 claims description 11
- 238000012549 training Methods 0.000 claims description 7
- 238000004590 computer program Methods 0.000 claims description 4
- 238000003708 edge detection Methods 0.000 claims description 4
- 239000000284 extract Substances 0.000 claims description 4
- 230000004044 response Effects 0.000 claims description 3
- 230000008569 process Effects 0.000 description 13
- 238000010586 diagram Methods 0.000 description 6
- 230000004927 fusion Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 238000003062 neural network model Methods 0.000 description 3
- 238000004891 communication Methods 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000007723 transport mechanism Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/174—Segmentation; Edge detection involving the use of two or more images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Definitions
- the present disclosure relates to the field of image processing technology, and in particular to an image processing method, an electronic device, and a non-transitory computer-readable medium.
- the present disclosure aims to solve at least one of the technical problems existing in the prior art, and proposes an image processing method, an electronic device and a non-transitory computer-readable medium.
- an embodiment of the present disclosure provides an image processing method, including:
- the down-sampled image and the first target mask are input into a pre-trained mask super-resolution model; the mask super-resolution model is used to perform super-resolution processing on the first target mask to obtain a second target mask, wherein the resolution of the second target mask is higher than the resolution of the first target mask;
- the second target mask and the original image are fused to obtain a target image.
- the mask super-resolution model includes a first sub-model and a second sub-model
- the method of using the mask super-resolution model to perform super-resolution processing on the first target mask to obtain a second target mask includes:
- the first sub-model includes P levels of first operation modules connected in sequence, and each level of the first operation module includes a first operation unit and a second operation unit, where P is a positive integer greater than 1;
- Using the first sub-model to extract image features corresponding to the downsampled image includes:
- for the n-th level first operation module, where n is a positive integer not greater than P: its first operation unit is used to perform image feature extraction based on the down-sampled image or the first feature map output by the upper-level first operation module, generate a second feature map, and output the second feature map to the second sub-model; its second operation unit is used to enlarge the size of the second feature map and output the enlarged second feature map to the first operation module of the next level;
- the method of using the second sub-model and combining the image features extracted by the first sub-model to perform super-resolution processing on the first target mask to obtain the second target mask includes:
- the first operation unit includes a convolution layer, a batch normalization layer and an excitation layer connected in sequence
- the second operation unit includes a transposed convolution layer
- the second sub-model includes P-level second operation modules connected in sequence.
- each level of the second operation module includes a splicing layer, a third operation unit and a fourth operation unit, wherein the first operation unit and the third operation unit at the same level are connected through a splicing layer;
- for the m-th level second operation module, where m is a positive integer not greater than P: its splicing layer is used to splice the second feature map and the first target mask, or to splice the second feature map and the third feature map output by the second operation module of the upper level, to generate a fourth feature map; its third operation unit is used to extract image features according to the fourth feature map to generate a fifth feature map; its fourth operation unit is used to enlarge the size of the fifth feature map and output the enlarged fifth feature map to the second operation module of the next level;
- for the last-level second operation module, the enlarged fifth feature map is used as the second target object mask and output.
- the third operation unit includes a convolution layer, a batch normalization layer and an excitation layer connected in sequence
- the fourth operation unit includes a transposed convolution layer
- extracting the target area in the down-sampled image and obtaining the first target mask includes:
- the target extraction model is a UNet network model
- the first sub-model includes three-level first operation modules connected in sequence
- the second sub-model includes three-level second operation modules connected in series.
- the mask super-resolution model is trained through the following steps:
- the mask super-resolution model to be trained is trained based on the down-sampled image sample and the target mask sample; wherein, the first sub-model to be trained is used to extract the image features corresponding to the down-sampled image sample and to perform super-resolution processing on the down-sampled image sample; the image features extracted by the first sub-model to be trained are input into the second sub-model to be trained; and the second sub-model to be trained is used, in combination with the image features extracted by the first sub-model to be trained, to perform super-resolution processing on the target mask sample;
- the training ends and the mask super-resolution model is obtained.
- the preset convergence conditions include at least one of the following:
- the first loss value and the second loss value satisfy the preset loss value condition, wherein the first loss value is calculated based on the original image sample corresponding to the down-sampled image sample and the down-sampled image sample after super-resolution processing,
- the second loss value is calculated based on the original image sample and the target mask sample after super-resolution processing.
- the method further includes:
- the first loss value is calculated according to the original image sample and the down-sampled image sample after super-resolution processing.
- Edge matching is performed on the first edge map and the second edge map, and the second loss value is determined according to the edge matching result.
- the target area is a portrait area
- the target image is a portrait image
- an embodiment of the present disclosure also provides an electronic device, including:
- one or more processors;
- a memory configured to store one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the image processing method as described in any of the above embodiments.
- embodiments of the present disclosure also provide a non-transitory computer-readable medium on which a computer program is stored, wherein, when the program is executed, the image processing method as described in any of the above embodiments is implemented.
- Figure 1 is a flow chart of an image processing method provided by an embodiment of the present disclosure
- Figure 2 is a flow chart of a specific implementation method of step S3 according to the embodiment of the present disclosure
- FIG. 3 is a flow chart of another specific implementation method of step S3 according to the embodiment of the present disclosure.
- Figure 4 is a flow chart of a training method for a mask super-resolution model provided by an embodiment of the present disclosure
- Figure 5 is a flow chart of a specific implementation method of step S02 according to the embodiment of the present disclosure.
- Figure 6 is a flow chart of a specific implementation method of step S2 according to the embodiment of the present disclosure.
- Figure 7 is a schematic structural diagram of a mask super-resolution model provided by an embodiment of the present disclosure.
- Figure 8 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
- Figure 9 is a block diagram of a non-transitory computer-readable medium provided by an embodiment of the present disclosure.
- Figure 1 is a flow chart of an image processing method provided by an embodiment of the present disclosure. As shown in Figure 1, the method includes:
- Step S1 Downsample the original image according to the preset resolution to generate a downsampled image.
- the preset resolution is smaller than the resolution of the original image; down-sampling the original image corresponds to scaling the original image, which generates a down-sampled image with a lower resolution whose resolution is the preset resolution; the number of pixels in the down-sampled image is reduced compared to the number of pixels in the original image, and the time required to operate on and process the down-sampled image is reduced accordingly.
- the reduction ratio can be approximated as the ratio between the preset resolution and the resolution of the original image.
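- For illustration only, a minimal Python sketch of this down-sampling step is shown below; it is not part of the patent text, OpenCV and area interpolation are assumptions, and the 512*512 preset resolution is taken from the example given later in the description.

```python
import cv2

def downsample(original_bgr, preset_size=(512, 512)):
    """Down-sample the original image to the preset resolution.

    The 512*512 preset size mirrors the example later in the description;
    INTER_AREA is a common interpolation choice when shrinking images.
    """
    return cv2.resize(original_bgr, preset_size, interpolation=cv2.INTER_AREA)
```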
- Step S2 Extract the target area in the downsampled image to obtain the first target mask.
- the mask is a single-channel image, which can be used to block all or part of the image to be processed during image processing, so as to control the region and process of the image processing; in this embodiment, the first target mask is the mask corresponding to the extracted target area, in which the target area serves as the foreground area of the down-sampled image and the other parts serve as the background area; if the first target mask is applied to the down-sampled image, a down-sampled image that retains only the foreground area is obtained; in some embodiments, the mask is a binary image composed of 0s and 1s, or, in some embodiments, the mask can also be a multi-valued image.
- the target object area is a portrait area and the target object image is a portrait image
- step S2 corresponds to the process of portrait cutout.
- a human figure is only a specific implementation provided by the embodiments of the present disclosure, and it does not limit the technical solution of the present disclosure.
- other types of target objects are also applicable to the technical solution of the present disclosure, such as animals and plants, vehicles and other means of transportation, license plates, etc.;
- the target object under the corresponding target object type should meet at least one of the following conditions: it has a specific shape; it has a clear outline; the location of its area in the image can be determined using a corresponding detection algorithm.
- Step S3 Input the down-sampled image and the first target mask into the pre-trained mask super-resolution model, and use the mask super-resolution model to perform super-resolution processing on the first target mask to obtain the second target mask.
- the resolution of the second target mask is higher than the resolution of the first target mask; in step S3, the mask super-resolution model is used to super-resolve the first target mask in combination with the down-sampled image.
- super-resolution (SR), also known as super-resolution processing, corresponds to the process of reconstructing a high-resolution image from a low-resolution image.
- the mask super-resolution model is trained in advance based on original image samples, downsampled image samples, and target mask samples.
- Step S4 Fusion of the second target mask and the original image to obtain a target image.
- the target image is the final cutout result of the target in the original image.
- in some embodiments, the second target mask is a binary image, and the second target mask and the original image are fused by multiplication; or, in some embodiments, as mentioned above, the second target mask is a single-channel image, and the second target mask and the original image can be fused through channel fusion; or, in some embodiments, the second target mask can be fused with the original image through Poisson fusion.
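- A minimal sketch of the multiplication-based fusion for a binary, single-channel mask is shown below; the `fuse` helper and its argument layout are illustrative assumptions, not the patent's notation.

```python
import numpy as np

def fuse(original, mask_hr):
    """Fuse the second target mask with the original image by multiplication.

    Assumes mask_hr is a single-channel 0/1 mask with the same height and
    width as the original image; multiplication keeps only the target
    (foreground) region and blanks out the background.
    """
    if mask_hr.ndim == 2:
        mask_hr = mask_hr[..., None]  # broadcast the mask over the color channels
    return (original.astype(np.float32) * mask_hr).astype(original.dtype)
```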
- embodiments of the present disclosure provide an image processing method: the target object area in a down-sampled image of an original image is extracted to obtain a first target object mask; the down-sampled image and the first target object mask are input into the mask super-resolution model, which performs super-resolution processing on the first target object mask to obtain the second target object mask; and the second target object mask and the original image are fused to obtain the target object image. By improving the resolution of the mask corresponding to the target object, the overall precision of the target object extraction process is improved, which effectively avoids jagged edges when extracting and cutting out targets from large, high-resolution images.
- FIG. 2 is a flow chart of a specific implementation method of step S3 according to the embodiment of the present disclosure.
- the mask super-resolution model includes a first sub-model and a second sub-model; as shown in Figure 2, in step S3, the mask super-resolution model is used to perform super-resolution processing on the first target mask,
- the step of obtaining the second target object mask includes: step S301 and step S302.
- Step S301 Use the first sub-model to extract image features corresponding to the downsampled image.
- Step S302 Input the image features extracted by the first sub-model into the second sub-model, use the second sub-model to perform super-resolution processing on the first target mask in combination with the image features extracted by the first sub-model, and obtain Second target mask.
- the down-sampled image and the first target mask are input to the first sub-model and the second sub-model respectively;
- the first sub-model is used to extract the image features of the down-sampled image and output them to the second sub-model.
- the first sub-model can also finally output the down-sampled image after super-resolution processing.
- this result can be used to calibrate the model input and output and to check the super-resolution effect;
- the second sub-model is used to perform super-resolution processing on the first target object mask based on the image features extracted by the first sub-model, and finally outputs the second target object mask.
- in some embodiments, the feature maps can be spliced through a concatenation layer (Concat), and in some embodiments, feature fusion can also be achieved through 1*1 convolution and pooling along the channel dimension.
- Super-resolution processing is performed on the first target object mask in combination with the image features extracted by the first sub-model.
- FIG. 3 is a flow chart of another specific implementation method of step S3 according to the embodiment of the present disclosure.
- this method is a specific optional implementation based on the method shown in Figure 2; in it, the first sub-model includes P levels of first operation modules connected in sequence, and each level of the first operation module includes a first operation unit and a second operation unit, where P is a positive integer greater than 1; as shown in Figure 3, when performing step S301, using the first sub-model to extract the image features corresponding to the down-sampled image, for the n-th level first operation module (n is a positive integer not greater than P), the processing includes: step S3011 and step S3012.
- Step S3011 Use the first operation unit of the first operation module at this level to extract image features based on the down-sampled image or the first feature map output by the first operation module at the upper level, generate a second feature map, and output the second feature map to the second sub-model.
- Step S3012 Use the second computing unit of the first computing module at this level to enlarge the size of the second feature map, and output the enlarged second feature map to the first computing module at the next level.
- the second feature map output by the first operation module at this level is the first feature map received by the first operation module at the next level.
- for the first operation module at the last level, the enlarged feature map it outputs is the down-sampled image after super-resolution processing.
- the first computing unit includes a convolution layer, a batch normalization layer, and an excitation layer connected in sequence
- the second operation unit includes a transposed convolution layer, also known as a deconvolution layer.
- step S3012, that is, the step of using the second operation unit of the first operation module at this level to enlarge the size of the feature map corresponding to the extracted image features, specifically includes: using the second operation unit of the first operation module at this level to perform transposed convolution on the feature map corresponding to the extracted image features, so as to enlarge the size of the feature map.
- the parameter settings of the above-mentioned convolution layer, batch normalization layer, excitation layer and transposed convolution layer may be different.
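- As a sketch only, one level of the first sub-model could be written as follows in PyTorch; the channel counts, kernel sizes and the stride-2 transposed convolution are illustrative assumptions, since the patent does not fix these parameters.

```python
import torch.nn as nn

class FirstOpModule(nn.Module):
    """One level of the first sub-model: a CBR block (convolution +
    batch normalization + ReLU excitation) followed by a transposed
    convolution that doubles the spatial size of the feature map."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.cbr = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # stride-2 transposed convolution enlarges the feature map by 2x
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        feat = self.cbr(x)           # second feature map, sent to the second sub-model
        return feat, self.up(feat)   # enlarged map goes to the next-level module
```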
- the term "convolution kernel” refers to the two-dimensional matrix used in the convolution process.
- each of the multiple entries in the two-dimensional matrix has a specific value.
- the term “convolution” refers to the process of processing an image with a convolution kernel.
- a convolution kernel is used to perform the convolution.
- Each pixel of the input image has a value, and the convolution kernel starts at one pixel of the input image and moves sequentially over each pixel in the input image.
- the convolution kernel overlaps several pixels on the image based on the scale of the convolution kernel.
- the value of each of the overlapping pixels is multiplied by the corresponding value of the convolution kernel to obtain a product for that pixel.
- all of these products are added together to obtain a sum corresponding to the current position of the convolution kernel on the input image.
- by moving the convolution kernel over each pixel of the input image, the sums corresponding to all positions of the convolution kernel are collected to form the output image.
- convolution can use different convolution kernels to extract different features of the input image.
- the convolution process can use different convolution kernels to add more features to the input image.
- the convolutional layer is used to perform convolution on the input image to obtain the output image.
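- The sliding-window multiply-and-sum just described can be sketched directly as below; this is a naive "valid" convolution in the deep-learning convention (no kernel flipping) and is illustrative only.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; at each position multiply the
    overlapping pixels by the kernel values and sum them to form one
    output pixel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```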
- the excitation layer can perform non-linear mapping on the output signal output from the convolution layer.
- Various functions can be used in the excitation layer. Examples of functions suitable for use in the excitation layer include, but are not limited to: rectified linear unit (ReLU) functions, sigmoid functions, and hyperbolic tangent functions (eg, tanh functions).
- the excitation layer and the batch normalization layer are included in the convolutional layer.
- the batch normalization layer (Batch Normalization, BN for short) can standardize the output of each layer of the network model over a small batch of data. Standardization is the process of making the data conform to a standard normal distribution with a mean of 0 and a standard deviation of 1, and it can alleviate the problem of vanishing gradients in neural network models.
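- For reference, the standard per-mini-batch formulation of batch normalization is given below; it is not quoted from the patent, and ε, γ and β are the usual numerical-stability constant and learned scale/shift parameters.

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma \hat{x}_i + \beta
```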
- in some embodiments, the step of using the second sub-model, in combination with the image features extracted by the first sub-model, to perform super-resolution processing on the first target mask to obtain the second target mask includes: using the second sub-model, combined with the second feature maps extracted by the first operation modules at each level, to perform super-resolution processing on the first target mask to obtain the second target mask.
- the second sub-model includes P levels of second operation modules connected in sequence, and each level of the second operation module includes a splicing layer, a third operation unit and a fourth operation unit, where P is a positive integer greater than 1.
- the first operation unit and the third operation unit of the same level are connected through the splicing layer; thus, in some embodiments, as shown in Figure 3, when using the second sub-model, combined with the second feature maps extracted by the first operation modules at each level, to perform super-resolution processing on the first target object mask to obtain the second target object mask, for the m-th level second operation module (m is a positive integer not greater than P), the processing includes: step S3021 to step S3023.
- Step S3021 Use the splicing layer of the second operation module at this level to splice the second feature map and the first target object mask, or to splice the second feature map and the third feature map output by the second operation module at the upper level, to generate the fourth feature map.
- for the first-level second operation module, the splicing layer splices the second feature map and the first target object mask; for subsequent levels, the splicing layer splices the second feature map and the third feature map output by the second operation module of the upper level.
- the number of levels of the multi-level first operation module is the same as the number of levels of the multi-level second operation module.
- in some embodiments, the splicing layer is used for concatenation along the channel dimension.
- Step S3022 Use the third computing unit of the second computing module at this level to extract image features based on the fourth feature map and generate a fifth feature map.
- Step S3023 Use the fourth computing unit of the second computing module at this level to enlarge the size of the fifth feature map, and output the enlarged fifth feature map to the second computing module at the next level.
- the fifth feature map output by the second operation module at this level is the third feature map received by the second operation module at the next level; for the second operation module at the last level, the enlarged fifth feature map is used as the second target object mask and output.
- the third operation unit includes a convolution layer, a batch normalization layer and an excitation layer connected in sequence
- the fourth operation unit includes a transposed convolution layer
- the parameter settings of the above-mentioned convolution layer, batch normalization layer, excitation layer and transposed convolution layer may be different.
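- Continuing the sketch above, one level of the second sub-model could look like this; the splicing layer is realized here with torch.cat along the channel dimension, and the channel counts remain illustrative assumptions.

```python
import torch
import torch.nn as nn

class SecondOpModule(nn.Module):
    """One level of the second sub-model: splice (concatenate) the incoming
    mask or third feature map with the second feature map from the first
    sub-model, apply a CBR block, then double the spatial size with a
    transposed convolution."""

    def __init__(self, mask_ch, feat_ch, out_ch):
        super().__init__()
        self.cbr = nn.Sequential(
            nn.Conv2d(mask_ch + feat_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, mask_or_prev, feat_from_first):
        fused = torch.cat([mask_or_prev, feat_from_first], dim=1)  # fourth feature map
        fifth = self.cbr(fused)                                    # fifth feature map
        return self.up(fifth)                                      # enlarged, to next level
```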
- Embodiments of the present disclosure provide an image processing method that can be used to perform super-resolution processing on a mask corresponding to a target object based on the image features of the downsampled image, thereby increasing its feature dimension and improving the precision of target object extraction.
- Figure 4 is a flow chart of a training method for a mask super-resolution model provided by an embodiment of the present disclosure.
- the mask super-resolution model is the mask super-resolution model corresponding to Figure 2, which includes a first sub-model and a second sub-model; as shown in Figure 4, the mask super-resolution model is trained through the following steps:
- Step S01 Input the downsampled image sample and its corresponding target mask sample into the mask super-resolution model to be trained.
- the down-sampled image sample is obtained by down-sampling its corresponding original image sample
- the target mask sample is obtained by extracting the target object from the down-sampled image sample.
- Step S02 Train the mask super-resolution model to be trained based on the down-sampled image samples and the target mask samples in an iterative manner.
- FIG. 5 is a flowchart of a specific implementation method of step S02 according to the embodiment of the present disclosure. As shown in Figure 5, step S02 includes: step S021 and step S022.
- Step S021 Use the first sub-model to be trained to extract image features corresponding to the down-sampled image samples, and perform super-resolution processing on the down-sampled image samples.
- Step S022 Input the image features extracted by the first sub-model to be trained into the second sub-model to be trained, and use the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, to perform super-resolution processing on the target object mask samples.
- the image features corresponding to the down-sampled image sample include the image features of the down-sampled image sample and the image features of its feature maps; the above process of training the first sub-model and the second sub-model corresponds to the actual inference process of the first sub-model and the second sub-model.
- Step S03 In response to the preset convergence conditions being met, the training ends and the mask super-resolution model is obtained.
- the preset convergence condition includes at least one of the following: the preset number of iterations has been trained; the first loss value and the second loss value satisfy the preset loss value condition.
- the first loss value is calculated based on the original image sample corresponding to the down-sampled image sample and the down-sampled image sample after super-resolution processing.
- the second loss value is calculated based on the original image sample and the target mask sample after super-resolution processing.
- in some embodiments, after step S021 of using the first sub-model to be trained to extract the image features corresponding to the down-sampled image samples and performing super-resolution processing on the down-sampled image samples, the method further includes: calculating the first loss value based on a mean square error (MSE) function according to the original image samples and the down-sampled image samples after super-resolution processing.
- in some embodiments, after step S022 of using the second sub-model to be trained, combined with the image features extracted by the first sub-model to be trained, to perform super-resolution processing on the target mask samples, the method further includes: obtaining the first edge map corresponding to the original image sample; performing edge detection on the target mask sample after super-resolution processing to obtain the second edge map; and performing edge matching on the first edge map and the second edge map, and determining the second loss value according to the edge matching result.
- edge detection is performed on the original image sample to obtain the first edge map, or the pre-calculated first edge map is read from the storage area.
- in some embodiments, the edge maps are 8-bit grayscale images, and edge pixels satisfy EM1(x_i, y_i) > 127 and EM2(x_i, y_i) > 127.
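- A minimal sketch of the two loss terms is shown below; the Canny thresholds and the grayscale input assumption are illustrative, since the patent only specifies an MSE-based first loss and an edge-matching-based second loss with the >127 edge test.

```python
import cv2
import numpy as np

def first_loss(original, sr_downsampled):
    """Mean square error between the original image sample and the
    super-resolved down-sampled image sample (same shape assumed)."""
    diff = original.astype(np.float32) - sr_downsampled.astype(np.float32)
    return float(np.mean(diff ** 2))

def second_loss(original_gray, sr_mask, threshold=127):
    """Edge-matching loss sketch: detect edges in both images, keep pixels
    above the threshold, and penalize edge points that do not match."""
    em1 = cv2.Canny(original_gray, 100, 200)   # first edge map (8-bit)
    em2 = cv2.Canny(sr_mask, 100, 200)         # second edge map (8-bit)
    e1, e2 = em1 > threshold, em2 > threshold
    matched = np.logical_and(e1, e2).sum()
    total = max(int(e1.sum()), 1)
    return 1.0 - matched / total               # fewer matched edges -> larger loss
```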
- FIG. 6 is a flow chart of a specific implementation method of step S2 according to the embodiment of the present disclosure. As shown in Figure 6, step S2, the step of extracting the target object area in the down-sampled image and obtaining the first target mask includes: step S201.
- Step S201 Input the down-sampled image into a pre-trained target extraction model, use the target extraction model to extract the target area in the down-sampled image, and obtain a first target mask.
- the target extraction model adopts the UNet network model, and the resolutions of its input image and output image are both 512*512; accordingly, in step S1, the preset resolution is 512*512.
- the first target mask obtained by using the target extraction model is input into the mask super-resolution model.
- the mask super-resolution model includes a first sub-model and a second sub-model.
- the first sub-model includes sequentially connected three-level first operation modules, and the second sub-model includes sequentially connected three-level second operation modules.
- the second operation module of each level uses its fourth operation unit to enlarge the size of its fifth feature map.
- the size of the feature map can be doubled each time, and the fifth feature map finally output, i.e. the second target object mask, can have a resolution of 4096*4096, which can be applied to 4K scenes; specifically, double enlargement can be achieved by setting the padding parameter (padding) of the transposed convolution layer, for example setting this parameter to "same".
- the image processing method provided by the embodiment of the present disclosure will be described in detail below in conjunction with practical applications. Specifically, taking the application to portrait cutout as an example, the target area in the downsampled image is the portrait area, and the final target image is the portrait image.
- the original image is first down-sampled according to a preset resolution to generate a down-sampled image; where the original image is a 4K image including portraits, and the preset resolution is 512*512.
- the down-sampled image is input into the pre-trained target extraction model, and the target extraction model is used to extract the portrait area in the down-sampled image to obtain the first target mask; wherein the target extraction model is specifically used for portrait matting and uses the UNet network model.
- the mask super-resolution model includes a first sub-model and a second sub-model;
- the first sub-model includes sequentially connected three-level first operation modules, and the second sub-model includes sequentially connected three-level second operation modules; each level of the second operation module includes a splicing layer, a third operation unit and a fourth operation unit, wherein the first operation unit and the third operation unit at the same level are connected through a splicing layer.
- the down-sampled image is input into the first sub-model, and the first target object mask is input into the second sub-model.
- for each level of first operation module, its first operation unit is used to perform image feature extraction based on the down-sampled image or the first feature map output by its upper-level first operation module to generate a second feature map; and its second operation unit is used to enlarge the size of the second feature map and output the enlarged second feature map to the next-level first operation module; wherein, for the first-level first operation module, its first operation unit directly extracts image features from the down-sampled image, while for the second-level and third-level first operation modules, their first operation units perform image feature extraction on the feature maps output by the first-level and second-level first operation modules, respectively, and the third-level first operation module directly outputs the enlarged second feature map.
- this enlarged second feature map is the down-sampled image after super-resolution processing; in some embodiments, the first operation unit includes a convolution layer, a batch normalization layer and an excitation layer connected in sequence, and the second operation unit includes a transposed convolution layer.
- for the m-th level second operation module (m is a positive integer not greater than 3), its splicing layer is used to splice the second feature map output by the first operation module of the same level with the first target object mask, or to splice that second feature map with the third feature map output by the upper-level second operation module, to generate a fourth feature map; its third operation unit is used to extract image features based on the fourth feature map to generate a fifth feature map; and its fourth operation unit is used to enlarge the size of the fifth feature map and output the enlarged fifth feature map to the next-level second operation module; wherein, for the first-level second operation module, its splicing layer is used to splice the feature map output by the first-level first operation module with the first target object mask.
- for the second-level second operation module, the splicing layer splices the feature maps output by the first-level second operation module and the second-level first operation module; for the third-level second operation module, the splicing layer splices the feature maps output by the second-level second operation module and the third-level first operation module.
- the third-level second operation module directly outputs the enlarged fifth feature map.
- this fifth feature map is the second target object mask; in some embodiments, the third operation unit includes a convolution layer, a batch normalization layer and an excitation layer connected in sequence, and the fourth operation unit includes a transposed convolution layer.
- FIG. 7 is a schematic structural diagram of a mask super-resolution model provided by an embodiment of the present disclosure.
- the mask super-resolution model includes a first sub-model and a second sub-model;
- the first sub-model includes three-level first operation modules 301 connected in sequence, and each level of first operation module 301 includes a first operation unit CBR1 and a second operation unit T_conv1.
- the down-sampled image LR is input to the first sub-model, and the first sub-model outputs the down-sampled image HR after super-resolution processing; the second sub-model includes three levels of second operation modules 401 connected in sequence.
- each level of second operation module 401 includes a splicing layer (not shown in the figure), a third operation unit CBR2 and a fourth operation unit T_conv2, wherein the first operation unit CBR1 and the third operation unit CBR2 at the same level are connected through a splicing layer.
- the first target mask MASK_LR is input to the second sub-model, and the second sub-model outputs the second target mask MASK_HR; the internal layer arrangements of CBR1 and CBR2 are similar, and as shown in Figure 7, each includes a convolution layer Conv, a batch normalization layer Batch_norm and an excitation layer ReLu.
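- Putting the pieces together, a minimal sketch of the overall model in Figure 7 could reuse the FirstOpModule and SecondOpModule sketches above. The three-level layout and the 2x-per-level enlargement follow the description; the channel width and the final 1*1 projection heads are assumptions added so that the outputs have image/mask channel counts.

```python
import torch.nn as nn

class MaskSRModel(nn.Module):
    """Two-branch mask super-resolution model: the first branch super-resolves
    the down-sampled image LR into HR, and the second branch super-resolves the
    mask MASK_LR into MASK_HR using the per-level feature maps of the first."""

    def __init__(self, img_ch=3, mask_ch=1, width=32, levels=3):
        super().__init__()
        self.first = nn.ModuleList(
            FirstOpModule(img_ch if i == 0 else width, width) for i in range(levels)
        )
        self.second = nn.ModuleList(
            SecondOpModule(mask_ch if i == 0 else width, width, width) for i in range(levels)
        )
        self.to_img = nn.Conv2d(width, img_ch, kernel_size=1)    # assumed projection head
        self.to_mask = nn.Conv2d(width, mask_ch, kernel_size=1)  # assumed projection head

    def forward(self, lr_img, lr_mask):
        x, y = lr_img, lr_mask
        for f, s in zip(self.first, self.second):
            feat, x = f(x)   # feat: second feature map; x: enlarged, to the next level
            y = s(y, feat)   # splice with feat, apply CBR, enlarge
        return self.to_img(x), self.to_mask(y)  # (HR image, second target mask MASK_HR)
```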
- FIG. 8 is a block diagram of an electronic device provided by an embodiment of the present disclosure. As shown in Figure 8, the electronic device includes:
- one or more processors 101;
- a memory 102 having one or more programs stored thereon;
- when the one or more programs are executed by the one or more processors 101, the one or more processors 101 implement the image processing method as in any of the above embodiments;
- One or more I/O interfaces 103 are connected between the processor and the memory, and are configured to implement information exchange between the processor and the memory.
- the processor 101 is a device with data processing capabilities, including but not limited to a central processing unit (CPU), etc.
- the memory 102 is a device with data storage capabilities, including but not limited to random access memory (RAM, such as SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH);
- the I/O interface (read-write interface) 103 is connected between the processor 101 and the memory 102 and can realize information interaction between the processor 101 and the memory 102; it includes but is not limited to a data bus (Bus), etc.
- the processor 101, the memory 102, and the I/O interface 103 are connected to each other and, in turn, to other components of the computing device via a bus 104.
- the plurality of processors 101 includes a plurality of graphics processors (GPUs), which are combined to form a graphics processor array.
- Figure 9 is a block diagram of a non-transitory computer-readable medium provided by an embodiment of the present disclosure.
- a computer program is stored on the computer-readable medium, wherein the computer program implements the image processing method in any of the above embodiments when executed by the processor.
- Non-transitory computer-readable media may include computer storage media (or non-transitory media) and communication media (or transitory media).
- computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules or other data.
- computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
- communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
- Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (14)
- An image processing method, comprising: down-sampling an original image according to a preset resolution to generate a down-sampled image; extracting a target object region in the down-sampled image to obtain a first target object mask; inputting the down-sampled image and the first target object mask into a pre-trained mask super-resolution model; performing super-resolution processing on the first target object mask by using the mask super-resolution model to obtain a second target object mask, wherein a resolution of the second target object mask is higher than a resolution of the first target object mask; and fusing the second target object mask with the original image to obtain a target object image.
- The image processing method according to claim 1, wherein the mask super-resolution model comprises a first sub-model and a second sub-model; and performing super-resolution processing on the first target object mask by using the mask super-resolution model to obtain the second target object mask comprises: extracting image features corresponding to the down-sampled image by using the first sub-model; inputting the image features extracted by the first sub-model into the second sub-model; and performing super-resolution processing on the first target object mask by using the second sub-model in combination with the image features extracted by the first sub-model, to obtain the second target object mask.
- The image processing method according to claim 2, wherein the first sub-model comprises P levels of first operation modules connected in sequence, each level of first operation module comprises a first operation unit and a second operation unit, and P is a positive integer greater than 1; extracting the image features corresponding to the down-sampled image by using the first sub-model comprises: for the n-th level first operation module, where n is a positive integer not greater than P: performing, by using its first operation unit, image feature extraction according to the down-sampled image or a first feature map output by the first operation module of the preceding level, generating a second feature map, and outputting the second feature map to the second sub-model; and enlarging a size of the second feature map by using its second operation unit, and outputting the enlarged second feature map to the first operation module of the next level; and performing super-resolution processing on the first target object mask by using the second sub-model in combination with the image features extracted by the first sub-model, to obtain the second target object mask, comprises: performing super-resolution processing on the first target object mask by using the second sub-model in combination with the second feature maps extracted by the first operation modules of all levels, to obtain the second target object mask.
- The image processing method according to claim 3, wherein the first operation unit comprises a convolution layer, a batch normalization layer and an excitation layer connected in sequence, and the second operation unit comprises a transposed convolution layer.
- The image processing method according to claim 3, wherein the second sub-model comprises P levels of second operation modules connected in sequence, each level of second operation module comprises a splicing layer, a third operation unit and a fourth operation unit, and the first operation unit and the third operation unit of the same level are connected through the splicing layer; and performing super-resolution processing on the first target object mask by using the second sub-model in combination with the second feature maps extracted by the first operation modules of all levels, to obtain the second target object mask, comprises: for the m-th level second operation module, where m is a positive integer not greater than P: splicing, by using its splicing layer, the second feature map and the first target object mask, or splicing the second feature map and a third feature map output by the second operation module of the preceding level, to generate a fourth feature map; performing, by using its third operation unit, image feature extraction according to the fourth feature map to generate a fifth feature map; and enlarging a size of the fifth feature map by using its fourth operation unit, and outputting the enlarged fifth feature map to the second operation module of the next level; wherein, for the last-level second operation module, the enlarged fifth feature map is taken as the second target object mask and output.
- The image processing method according to claim 5, wherein the third operation unit comprises a convolution layer, a batch normalization layer and an excitation layer connected in sequence, and the fourth operation unit comprises a transposed convolution layer.
- The image processing method according to claim 5, wherein extracting the target object region in the down-sampled image to obtain the first target object mask comprises: inputting the down-sampled image into a pre-trained target object extraction model, and extracting the target object region in the down-sampled image by using the target object extraction model to obtain the first target object mask; wherein the target object extraction model is a UNet network model, the first sub-model comprises three levels of first operation modules connected in sequence, and the second sub-model comprises three levels of second operation modules connected in sequence.
- The image processing method according to claim 2, wherein the mask super-resolution model is trained through the following steps: inputting a down-sampled image sample and a target object mask sample corresponding thereto into the mask super-resolution model to be trained; training the mask super-resolution model to be trained in an iterative manner based on the down-sampled image sample and the target object mask sample, wherein image features corresponding to the down-sampled image sample are extracted by using the first sub-model to be trained, and super-resolution processing is performed on the down-sampled image sample; the image features extracted by the first sub-model to be trained are input into the second sub-model to be trained; and super-resolution processing is performed on the target object mask sample by using the second sub-model to be trained in combination with the image features extracted by the first sub-model to be trained; and in response to a preset convergence condition being met, ending the training to obtain the mask super-resolution model.
- The image processing method according to claim 8, wherein the preset convergence condition comprises at least one of the following: a preset number of iterations has been trained; and a first loss value and a second loss value satisfy a preset loss value condition, wherein the first loss value is calculated based on an original image sample corresponding to the down-sampled image sample and the down-sampled image sample after super-resolution processing, and the second loss value is calculated based on the original image sample and the target object mask sample after super-resolution processing.
- The image processing method according to claim 9, wherein after extracting the image features corresponding to the down-sampled image sample by using the first sub-model to be trained and performing super-resolution processing on the down-sampled image sample, the method further comprises: calculating the first loss value based on a mean square error function according to the original image sample and the down-sampled image sample after super-resolution processing.
- The image processing method according to claim 9, wherein after performing super-resolution processing on the target object mask sample by using the second sub-model to be trained in combination with the image features extracted by the first sub-model to be trained, the method further comprises: acquiring a first edge map corresponding to the original image sample; performing edge detection on the target object mask sample after super-resolution processing to obtain a second edge map; and performing edge matching on the first edge map and the second edge map, and determining the second loss value according to an edge matching result.
- The image processing method according to any one of claims 1 to 11, wherein the target object region is a portrait region, and the target object image is a portrait image.
- An electronic device, comprising: one or more processors; and a memory configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method according to any one of claims 1 to 12.
- A non-transitory computer-readable medium having a computer program stored thereon, wherein the program, when executed, implements the image processing method according to any one of claims 1 to 12.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/273,026 US20240087085A1 (en) | 2021-10-29 | 2021-10-29 | Image processing method, electronic device, and non-transitory computer readable medium |
CN202180003148.9A CN116368512A (zh) | 2021-10-29 | 2021-10-29 | 图像处理方法、电子设备和非瞬态计算机可读介质 |
DE112021008413.5T DE112021008413T5 (de) | 2021-10-29 | 2021-10-29 | Bildverarbeitungsverfahren, elektronisches gerät und nichtflüchtiges computerlesbares medium |
PCT/CN2021/127282 WO2023070495A1 (zh) | 2021-10-29 | 2021-10-29 | 图像处理方法、电子设备和非瞬态计算机可读介质 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/127282 WO2023070495A1 (zh) | 2021-10-29 | 2021-10-29 | 图像处理方法、电子设备和非瞬态计算机可读介质 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2023070495A1 WO2023070495A1 (zh) | 2023-05-04 |
WO2023070495A9 true WO2023070495A9 (zh) | 2024-01-11 |
Family
ID=86160396
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/127282 WO2023070495A1 (zh) | 2021-10-29 | 2021-10-29 | 图像处理方法、电子设备和非瞬态计算机可读介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240087085A1 (zh) |
CN (1) | CN116368512A (zh) |
DE (1) | DE112021008413T5 (zh) |
WO (1) | WO2023070495A1 (zh) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10909657B1 (en) * | 2017-09-11 | 2021-02-02 | Apple Inc. | Flexible resolution support for image and video style transfer |
US11521299B2 (en) * | 2020-10-16 | 2022-12-06 | Adobe Inc. | Retouching digital images utilizing separate deep-learning neural networks |
CN112819720B (zh) * | 2021-02-02 | 2023-10-03 | Oppo广东移动通信有限公司 | 图像处理方法、装置、电子设备及存储介质 |
CN113449735B (zh) * | 2021-07-15 | 2023-10-31 | 北京科技大学 | 一种超像素分割的语义分割方法及装置 |
CN113763249B (zh) * | 2021-09-10 | 2025-03-07 | 平安科技(深圳)有限公司 | 文本图像超分辨率重建方法及其相关设备 |
US12020400B2 (en) * | 2021-10-23 | 2024-06-25 | Adobe Inc. | Upsampling and refining segmentation masks |
-
2021
- 2021-10-29 CN CN202180003148.9A patent/CN116368512A/zh active Pending
- 2021-10-29 DE DE112021008413.5T patent/DE112021008413T5/de active Pending
- 2021-10-29 WO PCT/CN2021/127282 patent/WO2023070495A1/zh active Application Filing
- 2021-10-29 US US18/273,026 patent/US20240087085A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
DE112021008413T5 (de) | 2024-08-22 |
US20240087085A1 (en) | 2024-03-14 |
CN116368512A (zh) | 2023-06-30 |
WO2023070495A1 (zh) | 2023-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kim et al. | Deformable kernel networks for joint image filtering | |
CN111311629B (zh) | 图像处理方法、图像处理装置及设备 | |
US20210365710A1 (en) | Image processing method, apparatus, equipment, and storage medium | |
CN110598714A (zh) | 一种软骨图像分割方法、装置、可读存储介质及终端设备 | |
CN110599528A (zh) | 一种基于神经网络的无监督三维医学图像配准方法及系统 | |
CN111080660A (zh) | 一种图像分割方法、装置、终端设备及存储介质 | |
CN107154023A (zh) | 基于生成对抗网络和亚像素卷积的人脸超分辨率重建方法 | |
CN113744136B (zh) | 基于通道约束多特征融合的图像超分辨率重建方法和系统 | |
CN113421276B (zh) | 一种图像处理方法、装置及存储介质 | |
CN110223300A (zh) | Ct图像腹部多器官分割方法及装置 | |
CN110827335B (zh) | 乳腺影像配准方法和装置 | |
CN105590304A (zh) | 超分辨率图像重建方法和装置 | |
CN111444923A (zh) | 自然场景下图像语义分割方法和装置 | |
CN111611968B (zh) | 一种遥感图像的处理方法以及遥感图像处理模型 | |
CN110097503A (zh) | 基于邻域回归的超分辨率方法 | |
Xiong et al. | Single image super-resolution via image quality assessment-guided deep learning network | |
CN119130863A (zh) | 一种基于多重注意力机制的图像恢复方法及系统 | |
CN114078149A (zh) | 一种图像估计方法、电子设备及存储介质 | |
WO2023070495A9 (zh) | 图像处理方法、电子设备和非瞬态计算机可读介质 | |
CN112132753A (zh) | 多尺度结构引导图像的红外图像超分辨率方法及系统 | |
US20230401670A1 (en) | Multi-scale autoencoder generation method, electronic device and readable storage medium | |
US20230343438A1 (en) | Systems and methods for automatic image annotation | |
CN115375547A (zh) | 图像重建方法、装置、计算机设备和存储介质 | |
Yu et al. | Sub-pixel convolution and edge detection for multi-view stereo | |
CN114445436B (zh) | 一种目标检测的方法、装置以及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21961857 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18273026 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202317084318 Country of ref document: IN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 112021008413 Country of ref document: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30/07/2024) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21961857 Country of ref document: EP Kind code of ref document: A1 |