CN113313646B - Image processing method and device, electronic equipment and computer readable storage medium - Google Patents

Image processing method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN113313646B
CN113313646B (application CN202110586467.9A; published as CN113313646A, granted as CN113313646B)
Authority
CN
China
Prior art keywords
image
decoder
result
blurring
image processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110586467.9A
Other languages
Chinese (zh)
Other versions
CN113313646A (en)
Inventor
王顺飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110586467.9A
Publication of CN113313646A
Application granted granted Critical
Publication of CN113313646B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/50 Depth or shape recovery
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an image processing method, which includes the following steps: extracting image features of a first image through a pre-trained image processing model, and obtaining an image processing result according to the image features of the first image, the image processing result including at least two of a depth estimation result, a portrait segmentation result and a hair matting result; the image processing model includes an encoder and at least two of a first decoder, a second decoder and a third decoder, the encoder being used to extract the image features of the first image, the first decoder being used to obtain the depth estimation result from the image features, the second decoder being used to obtain the portrait segmentation result from the image features, and the third decoder being used to obtain the hair matting result from the image features; and blurring the first image according to at least two of the depth estimation result, the portrait segmentation result and the hair matting result. The method can improve the blurring effect of the image and reduce the computational load of the electronic device.

Description

Image processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image technology, and in particular, to an image processing method and apparatus, an electronic device, and a computer readable storage medium.
Background
With the continuous development of the shooting capabilities of electronic devices, a user can now process a captured image through functions such as background blurring provided by an application installed on the electronic device, so as to meet simple image processing needs.
In practice, it has been found that implementing the above image processing functions usually requires feature extraction on the captured image, such as depth estimation and portrait segmentation. However, existing image feature extraction algorithms are computationally redundant and insufficiently accurate, which limits the blurring effect that can be achieved.
Disclosure of Invention
The embodiment of the application discloses an image processing method and device, an electronic device and a computer readable storage medium, which can improve the blurring effect of an image and reduce the computational load of the electronic device for image processing.
An embodiment of the present application in a first aspect discloses an image processing method, including:
extracting image features of a first image through a pre-trained image processing model, and obtaining an image processing result according to the image features of the first image; the image processing result includes at least two of a depth estimation result, a portrait segmentation result and a hair matting result, wherein the depth estimation result is used to describe depth information of the first image, the portrait segmentation result is used to describe a portrait region of the first image, and the hair matting result is used to describe a hair region of the first image; the image processing model includes an encoder and at least two of a first decoder, a second decoder and a third decoder, the encoder being used to extract the image features of the first image, the first decoder being used to obtain the depth estimation result according to the image features extracted by the encoder, the second decoder being used to obtain the portrait segmentation result according to the image features extracted by the encoder, and the third decoder being used to obtain the hair matting result according to the image features extracted by the encoder;
and blurring the first image according to at least two of the depth estimation result, the portrait segmentation result and the hair matting result.
A second aspect of an embodiment of the present application discloses an image processing apparatus, including:
an extracting unit, configured to extract image features of a first image through a pre-trained image processing model and obtain an image processing result according to the image features of the first image, where the image processing result includes at least two of a depth estimation result, a portrait segmentation result and a hair matting result, the depth estimation result is used to describe depth information of the first image, the portrait segmentation result is used to describe a portrait region of the first image, and the hair matting result is used to describe a hair region of the first image; the image processing model includes an encoder and at least two of a first decoder, a second decoder and a third decoder, the encoder is used to extract the image features of the first image, the first decoder is used to obtain the depth estimation result according to the image features extracted by the encoder, the second decoder is used to obtain the portrait segmentation result according to the image features extracted by the encoder, and the third decoder is used to obtain the hair matting result according to the image features extracted by the encoder;
and a blurring unit, configured to blur the first image according to at least two of the depth estimation result, the portrait segmentation result and the hair matting result.
A third aspect of an embodiment of the present application discloses an electronic device, including:
a memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to execute the image processing method disclosed in the first aspect of the embodiment of the present application.
A fourth aspect of the embodiments of the present application discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the image processing method disclosed in the first aspect of the embodiments of the present application.
Compared with the related art, the embodiment of the application has the following beneficial effects:
in the embodiment of the present application, the image features of the first image may be extracted through a pre-trained image processing model, and an image processing result may be obtained according to the image features of the first image, where the obtained image processing result may include at least two of a depth estimation result, a portrait segmentation result and a hair matting result. The electronic device can then take two or more different image processing results as guidance when blurring the first image, thereby improving the image blurring effect. In addition, the pre-trained image processing model disclosed in the embodiment of the application can adopt an architecture in which one encoder is connected to a plurality of decoders at the same time, that is, the plurality of decoders share one encoder for feature extraction, and feature extraction does not need to be performed separately for each decoder, so that the computational load of the image processing model can be reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of an image processing model disclosed in an embodiment of the present application;
FIG. 2 is a schematic flow chart of an image processing method disclosed in an embodiment of the present application;
FIG. 3 is a flow chart of another image processing method disclosed in an embodiment of the present application;
fig. 4A is an exemplary diagram of a rotation operation performed on a first image according to an embodiment of the present disclosure;
fig. 4B is an example diagram of another rotation operation performed on a first image disclosed in an embodiment of the present application;
FIG. 4C is a schematic diagram illustrating the workflow of an image processing model according to an embodiment of the disclosure;
FIG. 4D is a schematic diagram of a network layer exchange for use between decoders disclosed in an embodiment of the present application;
FIG. 5 is a flow chart of yet another image processing method disclosed in an embodiment of the present application;
FIG. 6 is a schematic diagram for explaining the blurring process disclosed in the embodiment of the present application;
fig. 7 is a schematic structural view of an image processing apparatus disclosed in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be noted that the terms "first," "second," "third," and "fourth," etc. in the description and claims of the present application are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application discloses an image processing method and device, an electronic device and a computer readable storage medium, which can improve the blurring effect of an image and reduce the computational load of the electronic device.
The technical scheme of the present application will be described in detail with reference to specific embodiments.
In order to more clearly describe an image processing method and apparatus, an electronic device, and a computer readable storage medium disclosed in the embodiments of the present application, an application scenario suitable for the image processing method is first introduced. Alternatively, the image processing method may be applied to an electronic device (e.g., a cell phone, a tablet computer, a wearable device, etc., without limitation herein), the electronic device may include a pre-trained image processing model, as shown in fig. 1, the image processing model 100 may include an encoder 1001, and a first decoder 1002, a second decoder 1003, and a third decoder 1004 (in an alternative embodiment, the decoders included in the image processing model 100 may be only any two of the first decoder 1002, the second decoder 1003, and the third decoder 1004; in another alternative embodiment, the number of decoders included in the image processing model 100 may be more than three, e.g., four, five, etc., without limitation herein).
The encoder 1001 may be built from any backbone neural network, such as one of the MobileNet series, the ShuffleNet series, or the ResNet series, and is not limited herein. The encoder 1001 may be configured to extract image features of a first image (e.g., an image captured by the electronic device, or an image downloaded by the electronic device from the internet or another electronic device, which is not limited herein).
The first decoder 1002 may be composed of one or more network layers such as a convolution layer, a deconvolution layer, an up-sampling layer, a batch normalization (Batch Normalization, BN) layer, and a linear rectification (Rectified Linear Unit, ReLU) layer. The first decoder 1002 is configured to obtain a depth estimation result from the image features extracted by the encoder 1001, where the depth estimation result is used to describe depth information of the first image.
The second decoder 1003 may be composed of one or more network layers such as a convolution layer, a deconvolution layer, an up-sampling layer, a BN layer, and a ReLU layer. The second decoder 1003 is configured to obtain a portrait segmentation result from the image features extracted by the encoder 1001, where the portrait segmentation result is used to describe a portrait region of the first image.
The third decoder 1004 may be composed of one or more network layers such as a convolution layer, a deconvolution layer, an up-sampling layer, a BN layer, and a ReLU layer. The third decoder 1004 is configured to obtain a hair matting result from the image features extracted by the encoder 1001, where the hair matting result is used to describe a hair region of the first image.
Alternatively, the encoder 1001 may be connected to the first decoder 1002, the second decoder 1003 and the third decoder 1004 through skip connections (it should be noted that the encoder may establish skip connections with a plurality of decoders; fig. 1 only illustrates the skip connection between the encoder and the first decoder as an example, and this should not limit the embodiments of the present application), so that the image features extracted by the encoder 1001 can be fused with the information feature maps obtained by the decoders, thereby improving the generalization capability of the image processing model 100. In addition, the architecture in which one encoder is connected to a plurality of decoders simultaneously can also reduce the computational load of the image processing model 100.
It should be further noted that the encoder 1001, the first decoder 1002, the second decoder 1003, and the third decoder 1004 in the image processing model 100 may also be connected by other architectures, and fig. 1 is only an exemplary architecture that may be implemented, and should not be limited to the embodiments of the present application.
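To make the architecture of fig. 1 more concrete, the following is a minimal PyTorch sketch of one encoder shared by three decoder heads. The layer counts, channel widths and class names are illustrative assumptions for this example only, not the network defined by the patent.

    import torch
    import torch.nn as nn

    class SharedEncoder(nn.Module):
        """Backbone that extracts multi-scale image features shared by all decoders."""
        def __init__(self, in_ch=3, widths=(16, 32, 64, 128)):
            super().__init__()
            self.stages = nn.ModuleList()
            prev = in_ch
            for w in widths:
                self.stages.append(nn.Sequential(
                    nn.Conv2d(prev, w, 3, stride=2, padding=1),
                    nn.BatchNorm2d(w),
                    nn.ReLU(inplace=True)))
                prev = w

        def forward(self, x):
            feats = []                      # keep per-stage features for skip connections
            for stage in self.stages:
                x = stage(x)
                feats.append(x)
            return feats                    # shallow -> deep

    class SimpleDecoder(nn.Module):
        """Up-sampling head that turns the shared features into one dense prediction map."""
        def __init__(self, widths=(128, 64, 32, 16), out_ch=1):
            super().__init__()
            self.ups = nn.ModuleList(
                nn.Sequential(
                    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                    nn.Conv2d(widths[i], widths[i + 1], 3, padding=1),
                    nn.BatchNorm2d(widths[i + 1]),
                    nn.ReLU(inplace=True))
                for i in range(len(widths) - 1))
            self.head = nn.Conv2d(widths[-1], out_ch, 1)

        def forward(self, feats):
            x = feats[-1]                   # deepest shared feature
            for i, up in enumerate(self.ups):
                x = up(x)
                x = x + feats[-(i + 2)]     # skip connection: encoder feature of same resolution
            return self.head(x)

    class ImageProcessingModel(nn.Module):
        """One encoder shared by the depth, portrait-segmentation and hair-matting decoders."""
        def __init__(self):
            super().__init__()
            self.encoder = SharedEncoder()
            self.depth_decoder = SimpleDecoder(out_ch=1)     # depth estimation result
            self.portrait_decoder = SimpleDecoder(out_ch=1)  # portrait segmentation result
            self.hair_decoder = SimpleDecoder(out_ch=1)      # hair matting result

        def forward(self, image):
            feats = self.encoder(image)     # features extracted once, reused by every decoder
            return (self.depth_decoder(feats),
                    self.portrait_decoder(feats),
                    self.hair_decoder(feats))

    if __name__ == "__main__":
        depth, portrait, hair = ImageProcessingModel()(torch.randn(1, 3, 256, 256))

Because the encoder runs only once per image and its feature list is reused by every decoder head, feature extraction does not have to be repeated for each task, which is the property the embodiment relies on to reduce computation.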
Referring to fig. 2, fig. 2 is a flowchart of an image processing method disclosed in an embodiment of the present application, where the image processing method may be applied to the electronic device, the electronic device may include an image processing model, and the image processing method may include the following steps:
202. Extract image features of the first image through the pre-trained image processing model, and obtain an image processing result according to the image features of the first image.
In an embodiment of the present application, the pre-trained image processing model may include one encoder, and at least two of the first decoder, the second decoder, and the third decoder. And image features of the first image may be extracted by the encoder. Optionally, the encoder may include a plurality of network layers (e.g., a downsampling layer, a convolution layer, etc., not limited thereto), wherein the first network layer may receive the input first image, and sequentially process the first image by downsampling, convolution, etc. through the plurality of network layers arranged behind the first network layer to extract the image characteristics of the first image.
Alternatively, the first image may be an image captured by an imaging device of the electronic device, or an image transmitted to the electronic device through wired or wireless communication, which is not limited herein. In one embodiment, the first image may include several different kinds of image data, such as color data based on the YUV color space, color data based on the RGB color space, and texture data, which are not limited herein. Image features may include, but are not limited to, color features, texture features, edge features, and the like.
Further, the encoder may input the extracted image features of the first image into each decoder included in the image processing model, and process the image features through each decoder, so as to obtain corresponding image processing results of each decoder. For example, the image processing model includes a first decoder, a second decoder and a third decoder, and the image features of the first image can be processed by the first decoder, the second decoder and the third decoder, so as to obtain a plurality of image processing results. The image processing result is obtained by processing the image features extracted by the encoder in an up-sampling mode, a convolution mode and the like by the decoder.
The image processing results obtained by different decoders may be different, and each image processing result may carry image information of the first image, for example, the depth estimation result, the portrait segmentation result or the hair matting result of the first image, which is not limited herein. The depth estimation result may be used to describe depth information of the first image and may include the depth information corresponding to each pixel point in the first image, where the depth information may represent the distance between the photographed object corresponding to the pixel point and the photographing device; the portrait segmentation result may be used to describe the portrait region of the first image and may include image position information of the portrait region of the first image; and the hair matting result is used to describe the hair region of the first image and may include image position information of the hair region of the person in the first image.
In one embodiment, the image processing model outputs different image processing results depending on the number and kind of decoders included in the image processing model. The number and the types of the decoders included in the image processing model can be adjusted by a developer or a user according to the use requirement so as to obtain different image processing results, and the use flexibility of the image processing model is improved. Alternatively, the image processing model may include a first decoder, a second decoder, and a third decoder at the same time; a first decoder and a second decoder may also be included; or a second decoder and a third decoder; or alternatively includes a first decoder and a third decoder, not limited herein.
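As a hedged illustration of this configurability, the hypothetical wrapper below (reusing the SharedEncoder and SimpleDecoder classes from the sketch above) instantiates only the decoder heads that are requested; the class name and the heads argument are assumptions.

    import torch.nn as nn

    class ConfigurableModel(nn.Module):
        """One shared encoder plus any chosen subset of the three decoder heads."""
        def __init__(self, heads=("depth", "portrait")):
            super().__init__()
            self.encoder = SharedEncoder()                          # from the sketch above
            self.decoders = nn.ModuleDict({name: SimpleDecoder() for name in heads})

        def forward(self, image):
            feats = self.encoder(image)
            # Each requested decoder produces its own image processing result.
            return {name: dec(feats) for name, dec in self.decoders.items()}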
204. Blur the first image according to at least two of the depth estimation result, the portrait segmentation result and the hair matting result.
In this embodiment of the present application, after obtaining an image processing result of the first image through the image processing model, the electronic device may perform blurring processing on the first image according to the image processing result. The blurring process refers to background blurring, i.e. the process of blurring the background region of the first image to highlight the foreground region of the first image.
Unlike the related art, in which blurring is generally performed on an image according to only a single image processing result, in the embodiment of the present application the electronic device may blur the first image according to at least two of the depth estimation result, the portrait segmentation result and the hair matting result, thereby improving the blurring effect on the image.
For example, the first image may be blurred according to the depth estimation result and the portrait segmentation result. Since the portrait region described by the portrait segmentation result is more accurate, the electronic device may correct the portrait region in the depth estimation result according to the portrait segmentation result, and then blur the first image according to the corrected depth estimation result. In this way, the background region outside the portrait region of the first image can be blurred more accurately, thereby improving the accuracy of background blurring.
By implementing the method disclosed in the above embodiments, the image features of the first image can be extracted through the pre-trained image processing model, and an image processing result can be obtained according to the image features of the first image, where the obtained image processing result may include at least two of a depth estimation result, a portrait segmentation result and a hair matting result, so that the electronic device can subsequently take more image processing results as guidance when blurring the first image, thereby improving the blurring effect of the image. In addition, the pre-trained image processing model disclosed in the embodiments of the present application may adopt a structure in which one encoder is connected to a plurality of decoders at the same time, that is, the plurality of decoders share one encoder. Compared with the "one encoder connected to one decoder" structure in the related art, redundant encoders can be eliminated, thereby reducing the computational load of the image processing model.
Referring to fig. 3, fig. 3 is a flowchart of another image processing method disclosed in an embodiment of the present application, where the image processing method may be applied to the electronic device, and the electronic device may include an image processing model, and the image processing method may include the following steps:
302. Extract image features of the first image through the pre-trained image processing model.
As an alternative embodiment, before extracting the image features of the first image by the pre-trained image processing model, the electronic device may perform preprocessing on the first image so that the image specification of the preprocessed first image is consistent with the image specification of the input image of the image processing model; the operation of preprocessing at least comprises: one or more of a rotation operation, a scaling operation, a normalization operation.
The rotation operation may refer to rotating the original image by a certain angle around a certain pixel point. The electronic device may determine the shooting orientation of the original image according to its width and height: for example, when the width is greater than the height, the original image was shot in landscape orientation; when the height is greater than the width, the original image was shot in portrait orientation. Alternatively, the shooting orientation of the original image may be determined from the orientation value recorded by the device that captured it. The shooting orientation may be landscape or portrait. When the shooting orientation of the original image is inconsistent with that of the input image of the image processing model, the electronic device may rotate the original image so that the rotated image matches the orientation of the input image. The rotation direction is not limited and may be clockwise or counterclockwise.
For example, referring to fig. 4A, fig. 4A is an exemplary diagram of performing a rotation operation on a first image according to an embodiment of the present application. Assuming that the shooting direction of an input image of the image processing model is vertical shooting; if the shooting direction of the first image 410 before preprocessing is detected as a horizontal shot, the first image 410 may be rotated by 90 ° in the counterclockwise direction, and a vertically shot first image 420 may be obtained.
For example, referring to fig. 4B, fig. 4B is an exemplary diagram of another rotation operation performed on a first image as disclosed in an embodiment of the present application. Assuming that the shooting direction of an input image of the image processing model is a horizontal shooting; if the photographing direction of the first image 430 obtained before the preprocessing is detected as a portrait, the first image may be rotated by 90 ° in a counterclockwise direction, resulting in a horizontally photographed first image 440.
The scaling operation may refer to reducing or enlarging the image size of the first image. When the image size of the first image is smaller than that of the input image of the image processing model, the electronic device may enlarge the first image; when the image size of the first image is larger than that of the input image, the electronic device may reduce the first image. For example, if the image size of the input image of the image processing model is 640×480, the image size of the first image needs to be reduced or enlarged to 640×480.
Normalization may refer to mapping the image data value of each pixel point in the first image to a fixed numerical range, such as [0, 1] or [-1, 1]. The normalization operation may include subtracting a mean value from the values of the three RGB channels of each pixel point in the first image and then dividing by a variance. For example, assuming that the mean value is 127.5, for the value X of an RGB channel of any pixel point in the first image, the operation may be expressed as (X - 127.5)/127.5. Alternatively, the normalization operation may include directly dividing the RGB three-channel values of each pixel point in the first image by 255, which may be expressed as X/255.
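The rotation, scaling and normalization steps described above might be sketched as follows; the target size, mean and standard deviation are illustrative assumptions, and OpenCV is used only for convenience.

    import cv2
    import numpy as np

    def preprocess(image_bgr, target_size=(640, 480), expect_portrait=False,
                   mean=127.5, std=127.5):
        """Rotate, scale and normalize an image to match an assumed model input spec."""
        h, w = image_bgr.shape[:2]
        # Rotation: infer the shooting orientation from width vs. height and align it
        # with the orientation expected by the model input.
        if (h > w) != expect_portrait:
            image_bgr = cv2.rotate(image_bgr, cv2.ROTATE_90_COUNTERCLOCKWISE)
        # Scaling: shrink or enlarge to the input image size (cv2 uses width, height order).
        image_bgr = cv2.resize(image_bgr, target_size, interpolation=cv2.INTER_LINEAR)
        # Normalization: (X - mean) / std per RGB channel, here giving values in about [-1, 1].
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).astype(np.float32)
        return (image_rgb - mean) / std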
304. Process the image features extracted by the encoder through a target decoder in the image processing model to obtain one or more first information feature maps corresponding to the target decoder, fuse the one or more first information feature maps with one or more second information feature maps output by other decoders, and obtain the image processing result corresponding to the target decoder according to the fused information feature maps.
The target decoder may be any decoder included in the image processing model, and the other decoders may be any decoder other than the target decoder in the image processing model, which is not limited herein.
In this embodiment of the present application, the target decoder may be composed of one or more network layers such as a convolution layer, a deconvolution layer, an up-sampling layer, a batch normalization (Batch Normalization, BN) layer, and a linear rectification (Rectified Linear Unit, ReLU) layer. After the encoder inputs the extracted image features of the first image to the target decoder, the target decoder may sequentially perform up-sampling, convolution and other processing on the image features through its network layers, outputting a first information feature map at each layer, thereby obtaining one or more first information feature maps. An information feature map is an image used to describe a certain feature of the first image; for example, a depth feature map may be used to describe the depth information of the first image, and a portrait segmentation feature map may be used to describe the portrait region of the first image, which is not limited herein.
Further, decoders other than the target decoder may also process the image features of the first image extracted by the encoder, thereby obtaining one or more second information feature maps. Generally, the first information feature map and the second information feature map describe different features; for example, the first information feature map may be a depth feature map and the second information feature map may be a portrait segmentation feature map. The electronic device may fuse the one or more first information feature maps with the one or more second information feature maps output by the other decoders, and obtain the image processing result corresponding to the target decoder according to the fused information feature maps.
In the embodiment of the present application, the electronic device may fuse the information feature maps output by at least two decoders, so that the information feature maps output by different decoders may be mutually guided and supervised, thereby improving the robustness of the image processing result determined later.
In one embodiment, the image processing model may include the first decoder and the second decoder described above. Alternatively, the encoder may input the extracted image features of the first image to the first decoder and the second decoder respectively; the first decoder may perform depth estimation according to the image features extracted by the encoder to generate one or more depth feature maps, then fuse the one or more depth feature maps with one or more portrait segmentation feature maps output by the second decoder, and obtain the depth estimation result according to the fused feature maps.
For example, referring to fig. 4C, since the portrait segmentation feature map segments the portrait region of the first image more accurately, the portrait segmentation feature map output by the second decoder may be fused into the depth feature map output by the first decoder to correct the portrait region of the depth feature map, thereby obtaining a depth estimation result with more accurate portrait edges.
In another embodiment, the second decoder may identify a portrait region of the first image according to the image features extracted by the encoder, generate one or more portrait segmentation feature maps according to the identified portrait region, then fuse the one or more portrait segmentation feature maps with the one or more depth feature maps output by the first decoder, and obtain a portrait segmentation result according to the fused feature maps.
Referring to fig. 4C, because there is inconsistency between the semantics of the hand-held object, the wearing object, the accessory and the portrait, the related art easily segments the hand-held object, the wearing object, and the accessory on the person as the background in the process of segmenting the portrait of the first image, so that the segmentation of the portrait area is inaccurate. However, in the depth feature map, the hand-held object, the wearing object and the accessory have similar depth with the portrait, so that the depth feature map output by the first decoder can be fused into the portrait segmentation feature map output by the second decoder, thereby avoiding the hand-held object, the wearing object and the accessory on the person from being segmented as the background, and further obtaining the portrait segmentation result with higher portrait segmentation precision.
As another alternative embodiment, the image processing model may include the second decoder and the third decoder described above. Optionally, after extracting the image features of the first image, the encoder may input them to the second decoder and the third decoder respectively; the third decoder may identify the hair region in the first image according to the image features extracted by the encoder, generate one or more hair feature maps according to the identified hair region, then fuse the one or more hair feature maps with one or more portrait segmentation feature maps output by the second decoder, and obtain the hair matting result according to the fused feature maps.
Referring to fig. 4C, since the portrait segmentation feature map already contains a relatively accurate segmentation of the hair region, the portrait segmentation feature map output by the second decoder can be fused into the hair feature map output by the third decoder, so that a hair matting result with a more accurate hair region can be obtained.
In yet another alternative embodiment, the image processing model may include a first decoder, a second decoder, and a third decoder, and similarly, the generated information feature maps between the first decoder, the second decoder, and the third decoder may be fused with each other in the above manner, so as to improve accuracy of the image processing results subsequently output by the respective decoders.
Optionally, the image processing model may include a fourth decoder, where the fourth decoder may be configured to obtain a hair segmentation result of the first image according to the image feature of the first image output by the encoder, and the hair segmentation result may be configured to instruct the third decoder to determine a hair region of the first image, so as to obtain a more accurate hair matting result.
In one embodiment, the target decoder may include M up-sampling layers, where M is a positive integer greater than or equal to 2. After the encoder extracts the image features of the first image, it may input them to the first up-sampling layer in the target decoder, which performs up-sampling on the image features extracted by the encoder to obtain the first information feature map output by the first up-sampling layer. The first up-sampling layer may then input this first information feature map to the second up-sampling layer for further up-sampling, and so on, so that the image features of the first image are up-sampled sequentially through the M up-sampling layers.
Optionally, for the N-th up-sampling layer, where N is a positive integer greater than or equal to 2 and less than or equal to M, the first information feature map output by the (N-1)-th up-sampling layer in the target decoder may be fused with the second information feature map output by the up-sampling layer with the same layer number in another decoder, and the fused information feature map may be input to the N-th up-sampling layer in the target decoder; the N-th up-sampling layer then up-samples the fused information feature map to obtain the first information feature map output by the N-th up-sampling layer. That is, for every up-sampling layer after the first one (e.g., the second up-sampling layer, the third up-sampling layer, and so on), the first information feature map output by the previous layer is fused with the second information feature map output by the layer with the same number in another decoder (for example, if the previous layer is the third up-sampling layer in the target decoder, the second information feature map output by the third up-sampling layer in the other decoder is fused with the first information feature map output by the third up-sampling layer in the target decoder). In this way, the information feature maps output by different decoders can guide and supervise each other, improving the robustness of the image processing result determined subsequently.
Finally, according to the first information feature map output by the up-sampling layer of the M layer (i.e. the up-sampling layer of the last layer), an image processing result corresponding to the target decoder can be generated.
Optionally, the target decoder may further include one or more other network layers, such as a convolution layer, a deconvolution layer, a batch normalization (BN) layer, and a linear rectification (ReLU) layer, so that the first information feature map can be further processed by these layers to generate the image processing result corresponding to the target decoder.
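A hedged sketch of this M-layer flow: before the N-th up-sampling layer runs, the output of the (N-1)-th layer is fused with the same-numbered layer output of another decoder. The channel widths and the simple additive fusion are assumptions made for the example.

    import torch.nn as nn

    class FusingDecoder(nn.Module):
        """Target decoder with M up-sampling layers that fuses a peer decoder's maps."""
        def __init__(self, widths=(128, 64, 32, 16), out_ch=1):
            super().__init__()
            self.ups = nn.ModuleList(
                nn.Sequential(
                    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                    nn.Conv2d(widths[i], widths[i + 1], 3, padding=1),
                    nn.ReLU(inplace=True))
                for i in range(len(widths) - 1))
            self.head = nn.Conv2d(widths[-1], out_ch, 1)

        def forward(self, encoder_feat, peer_maps):
            """peer_maps[n] is the second information feature map from the other decoder's
            (n+1)-th up-sampling layer, assumed here to already match in shape."""
            x = encoder_feat
            first_maps = []                        # first information feature maps, layer by layer
            for n, up in enumerate(self.ups):
                if n > 0:                          # for N >= 2: fuse before the N-th layer runs
                    x = x + peer_maps[n - 1]
                x = up(x)
                first_maps.append(x)
            # The last layer's map yields the image processing result of the target decoder.
            return self.head(first_maps[-1]), first_maps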
By implementing the above method, unlike the related art in which the image processing result is usually determined only from one or more information feature maps output by a single decoder, the information feature maps output by at least two decoders can be fused, so that the information feature maps output by different decoders can guide and supervise each other, thereby improving the robustness of the image processing result determined subsequently.
In the embodiment of the application, the electronic device may fuse the first information feature map and the second information feature map by adding (add) or splicing (concat).
That is, the electronic device may add the first feature of the first dimension corresponding to the first information feature map output by the N-1 up-sampling layer in the target decoder to the second feature of the first dimension corresponding to the second information feature map output by the same up-sampling layer in the other decoders, so as to fuse the first information feature map and the second information feature map.
In another embodiment, the electronic device may concatenate the first feature corresponding to the first information feature map output by the (N-1)-th up-sampling layer in the target decoder with the second feature corresponding to the second information feature map output by the up-sampling layer with the same layer number in another decoder, so as to fuse the first information feature map and the second information feature map.
In practice it has been found that the feature dimensions of the output first information feature map and the second information feature map may also be different, since the number of channels of the same network layer may be different for different decoders. Alternatively, before adding the first feature of the first dimension corresponding to the first information feature map and the second feature of the first dimension corresponding to the second information feature map, convolution operation may be performed on the first information feature map and/or the second information feature map to change the feature dimensions of the first information feature map and/or the second information feature map, and when the feature dimensions of the first information feature map and the second information feature map are equal, the first feature map and the second feature map are added.
In another embodiment, if the feature dimensions corresponding to the first information feature map and the second information feature map are different, bilinear interpolation processing may be performed on the first information feature map and/or the second information feature map (where bilinear interpolation is a linear interpolation expansion of an interpolation function of two variables, and the core idea is to perform linear interpolation once in two directions respectively), so that the feature dimensions of the first information feature map and the feature dimensions of the second information feature map are equal, and then add the two.
By implementing the method, when the feature dimensions of the first information feature map and the second information feature map are unequal, the feature dimensions of the first information feature map and the second information feature map are equal in a convolution or bilinear interpolation mode, so that the first information feature map and the second information feature map can be fused in a feature addition mode, the information feature maps output by different decoders can be mutually guided and supervised, and the robustness of the image processing result determined later is improved.
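The add/concat fusion with dimension matching might look like the sketch below; the 1x1 projection and the bilinear resize correspond to the two matching strategies just described. Creating the projection inline is only for illustration, in a real model it would be a trained layer.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def fuse_feature_maps(first_map, second_map, mode="add"):
        """Fuse a first and a second information feature map (NCHW tensors)."""
        if second_map.shape[2:] != first_map.shape[2:]:
            # Bilinear interpolation: linear interpolation in both spatial directions.
            second_map = F.interpolate(second_map, size=first_map.shape[2:],
                                       mode="bilinear", align_corners=False)
        if mode == "concat":
            return torch.cat([first_map, second_map], dim=1)   # splice along the channel axis
        if second_map.shape[1] != first_map.shape[1]:
            # Match channel counts with a 1x1 convolution before element-wise addition.
            proj = nn.Conv2d(second_map.shape[1], first_map.shape[1], kernel_size=1)
            second_map = proj(second_map)
        return first_map + second_map                          # "add" fusion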
Referring to fig. 4C, the encoder may establish skip connections with the first decoder, the second decoder and the third decoder, so that the image features extracted by the encoder can be fused with the information feature maps restored by the decoders, thereby improving the generalization capability of the image processing model. Alternatively, the first information feature map output by the (N-1)-th up-sampling layer in the target decoder, the second information feature map output by the up-sampling layer with the same layer number in another decoder, and the target image feature may be fused, and the fused result is input to the N-th up-sampling layer in the target decoder to obtain the first information feature map output by the N-th up-sampling layer. The target image feature is the image feature output by the encoder that has the same resolution as the first information feature map output by the (N-1)-th up-sampling layer.
By implementing the above method, the image features extracted by the encoder can be fused with the information feature maps restored by the decoders, which improves the generalization capability of the image processing model and thereby improves the accuracy of the image processing results subsequently obtained by the decoders.
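Building on the helper above, the skip-connection fusion with the encoder feature of matching resolution could be expressed as one more additive input; this is again only an illustrative sketch.

    def fuse_with_skip(first_map, second_map, encoder_feat):
        """Fuse the (N-1)-th layer output of the target decoder, the same-numbered layer
        output of another decoder, and the encoder feature of the same resolution, before
        feeding the result to the N-th up-sampling layer."""
        fused = fuse_feature_maps(first_map, second_map, mode="add")
        return fuse_feature_maps(fused, encoder_feat, mode="add")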
As an alternative embodiment, the electronic device may also exchange part of the downsampling layers in the trained target decoder with downsampling layers belonging to the same layer number in other decoders to obtain a new target decoder. The target decoder is any decoder included in the image processing model, and the other decoders are any decoder except the target decoder in the image processing model.
Alternatively, the electronic device may exchange several consecutively arranged downsampling layers with the downsampling layers of the same layer numbers in other decoders; it may also exchange downsampling layers arranged at intervals (one or more layers, such as two or three layers, which is not limited herein) with the downsampling layers of the same layer numbers in other decoders, which is likewise not limited herein.
In connection with fig. 4D, for example, the electronic device may exchange the downsampling layers (e.g., layer 2, layer 4 downsampling layers in the figure) that are partially spaced apart in the first decoder with downsampling layers (also layer 2, layer 4 downsampling layers) belonging to the same number of layers in the second decoder to obtain a new first decoder and a new second decoder. Furthermore, the electronic device can process the image features extracted by the encoder through the new target decoder to obtain an image processing result corresponding to the target decoder.
By implementing the above method, some downsampling layers in the new target decoder have been exchanged, that is, the new target decoder obtains part of the downsampling capability of other decoders. The electronic device can therefore directly process the image features extracted by the encoder through the new target decoder, so that different decoders can still guide and supervise each other, improving the robustness of the image processing result determined subsequently. Moreover, because no feature fusion operation needs to be executed, the computational load of the electronic device can be reduced, which improves the battery endurance of the electronic device.
In another embodiment, the image features extracted by the encoder may be processed by a new target decoder to obtain one or more first information feature maps corresponding to the new target decoder, and the one or more second information feature maps output by one or more first information feature maps and other decoders (which may not be a decoder performing downsampling layer exchange, or may be a decoder performing downsampling layer exchange with the target decoder or other decoders, which is not limited herein) are fused, and an image processing result corresponding to the new target decoder is obtained according to the fused information feature maps.
In yet another embodiment, one or more first information feature maps output by the downsampling layer that is not exchanged in the new decoder may be fused with one or more second information feature maps output by the other decoders, and an image processing result corresponding to the target decoder may be obtained according to the fused information feature maps. The other decoder may be a decoder that exchanges the downsampling layer with the target decoder, may be a decoder that exchanges the downsampling layer with the other decoder, or may be a decoder that does not exchange the downsampling layer, and is not limited herein. Alternatively, one or more second information feature maps output by other decoders may be output by a downsampled layer that is not being swapped among the decoders that swap the downsampled layer with the target decoder.
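A hedged sketch of exchanging same-numbered layers between two trained decoders. It assumes each decoder keeps its layers in an nn.ModuleList attribute named layers; that attribute name is an assumption made only for this example.

    import copy

    def swap_layers(decoder_a, decoder_b, layer_indices=(1, 3)):
        """Exchange the layers with the given (zero-based) indices, e.g. the 2nd and 4th
        layers as in fig. 4D, between two trained decoders, returning two new decoders."""
        new_a, new_b = copy.deepcopy(decoder_a), copy.deepcopy(decoder_b)
        for i in layer_indices:
            new_a.layers[i], new_b.layers[i] = new_b.layers[i], new_a.layers[i]
        return new_a, new_b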
By implementing the method disclosed by the embodiments, the information feature maps output by different decoders can be fused, so that the information feature maps output by different decoders can be mutually guided and supervised, and the robustness of the image processing result determined later is improved.
306. Blur the first image according to at least two of the depth estimation result, the portrait segmentation result and the hair matting result.
In one embodiment, when the image processing model includes the first decoder and the third decoder, the electronic device may determine the background area of the first image and the depth information corresponding to each background sub-area according to the depth estimation result output by the first decoder, determine a first blurring parameter for each background sub-area according to its depth information, and then blur the first image according to the first blurring parameters to obtain a second blurred image (i.e., an overall blurred image).
The electronic device may then determine second blurring parameters for the pixel points in the hair region of the first image according to the hair matting result output by the third decoder, and blur these pixel points according to their second blurring parameters to obtain a first blurred image (i.e., a blurred image of the hair region). Finally, the second blurred image and the first blurred image are fused to obtain the second image.
In another embodiment, when the image processing model includes the second decoder and the third decoder, the electronic device may determine the background area of the first image according to the portrait segmentation result output by the second decoder, and then directly blur the background area of the first image (the blurring parameters may be set by a developer according to actual application requirements and are not limited herein) to obtain a third blurred image. The blurred image of the hair region, obtained according to the hair matting result as described above, is then fused with the third blurred image to obtain the second image.
By implementing the method disclosed in the above embodiments, the electronic device can take more image processing results as guidance when blurring the first image, thereby improving the image blurring effect; the information feature maps output by at least two decoders can be fused, so that the information feature maps output by different decoders can guide and supervise each other, improving the robustness of the image processing result determined subsequently; and the image features extracted by the encoder can be fused with the information feature maps restored by the decoders, which improves the generalization capability of the image processing model and thereby improves the accuracy of the image processing results subsequently obtained by the decoders.
Referring to fig. 5, fig. 5 is a flowchart of another image processing method disclosed in an embodiment of the present application, where the image processing method may be applied to the electronic device, and the electronic device may include an image processing model, and the image processing method may include the following steps:
502. Extract image features of the first image through the pre-trained image processing model, and obtain an image processing result according to the image features of the first image.
As an alternative embodiment, the first decoder may be obtained by training a first decoder to be trained according to a first sample set, where the first sample set includes first sample portrait images and depth images corresponding to the first sample portrait images. The second decoder may be obtained by training a second decoder to be trained according to a second sample set, where the second sample set includes second sample portrait images and portrait masks corresponding to the second sample portrait images. The third decoder may be obtained by training a third decoder to be trained according to a third sample set, where the third sample set includes third sample portrait images and hair masks corresponding to the third sample portrait images.
The first decoder, the second decoder and the third decoder may be separately trained or may be trained with the encoder. In addition, in the training process, the information feature maps output by the first decoder, the second decoder and the third decoder may be fused, or may not be fused, which is not limited herein.
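A minimal joint-training sketch under the three sample sets described above; the loss functions, optimizer usage and batch keys are illustrative assumptions rather than the training procedure fixed by the patent.

    import torch
    import torch.nn.functional as F

    def training_step(model, optimizer, batch):
        """One training step; batch carries a sample portrait image plus whichever ground
        truths (depth image / portrait mask / hair mask) its sample set provides."""
        optimizer.zero_grad()
        depth, portrait, hair = model(batch["image"])
        loss = 0.0
        if "depth_gt" in batch:                     # first sample set: depth images
            loss = loss + F.l1_loss(depth, batch["depth_gt"])
        if "portrait_mask" in batch:                # second sample set: portrait masks
            loss = loss + F.binary_cross_entropy_with_logits(portrait, batch["portrait_mask"])
        if "hair_mask" in batch:                    # third sample set: hair masks
            loss = loss + F.binary_cross_entropy_with_logits(hair, batch["hair_mask"])
        if not torch.is_tensor(loss):               # batch contained no ground truth
            return 0.0
        loss.backward()
        optimizer.step()
        return float(loss)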
504. Determine the background area of the first image and the depth information corresponding to the background area according to the depth estimation result and the portrait segmentation result.
In the embodiment of the application, because the portrait region of the first image described in the portrait segmentation result is more accurate, the electronic device can correct the portrait region in the depth estimation result by using the portrait segmentation result, so as to determine a more accurate portrait region. In the embodiment of the present application, the portrait area is generally determined as the foreground area of the first image, so that the background area (other area than the foreground area in the first image) of the first image may be determined according to the portrait area. And because a plurality of scenes with different depths of field may exist in the background area, the electronic device may further determine the depth information corresponding to the background area according to the depth estimation result.
In this regard, optionally, the electronic device may correct the portrait area in the depth estimation result according to the portrait segmentation result, determine the background area in the first image according to the corrected portrait area, and further obtain the depth information of the background area according to the depth estimation result.
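This correction step could be sketched as follows; the probability threshold and the median-filling rule inside the portrait are assumptions used only to illustrate the idea.

    import numpy as np

    def correct_depth_with_portrait(depth_map, portrait_mask, threshold=0.5):
        """Correct the portrait region of the depth estimation result with the (more
        accurate) portrait segmentation, then return the corrected depth map and the
        background mask whose depth values will drive the blurring."""
        person = portrait_mask > threshold           # corrected portrait (foreground) region
        corrected = depth_map.copy()
        if person.any():
            corrected[person] = np.median(depth_map[person])   # flatten depth inside the portrait
        background = ~person                          # everything outside the portrait
        return corrected, background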
It should be noted that the portrait segmentation result may also be replaced by a segmentation result of whatever subject is to be set as the foreground region; for example, if the foreground region in the first image is a flower, the segmentation result may be a flower contour segmentation result, and if the foreground region is a vehicle, the segmentation result may be a vehicle contour segmentation result, which is not limited herein.
506. Blur the background area according to the depth information to obtain a second image.
In the embodiment of the application, the electronic device may leave the foreground area unblurred. For the background area, the electronic device may determine the blurring parameters of different sub-areas according to the scene information of the background area, and then apply blurring of the corresponding strength to the different sub-areas according to the determined blurring parameters, so as to obtain a second image with a blurred background.
Optionally, the electronic device may divide the background area according to the depth information of the background area, determine the first blurring parameter corresponding to each background sub-area according to the depth information corresponding to each divided background sub-area, and then perform blurring processing on each background sub-area according to its first blurring parameter to obtain the second image.
For example, in connection with fig. 6, suppose the background area is divided into a first sub-area and a second sub-area, where the depth of field of the first sub-area is greater than that of the second sub-area; the blurring strength of the first sub-area may then be greater than that of the second sub-area. Of course, the blurring strength of the first sub-area may instead be less than or equal to that of the second sub-area, which is not limited herein.
By implementing this method, the electronic device can blur different sub-areas to different degrees according to their depth of field, which increases the layering of the first image and further improves the blurring effect of the image.
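An illustrative sketch of this depth-banded blurring, assuming OpenCV and NumPy are available; the band edges and kernel sizes are invented for the example:

import cv2
import numpy as np

def blur_background_by_depth(image, bg_mask, bg_depth,
                             bands=((0.0, 0.5, 5), (0.5, 1.0, 15))):
    # bands: (low, high, kernel_size) triples; a larger kernel means stronger blurring,
    # so the kernel size plays the role of the first blurring parameter
    result = image.copy()
    for low, high, ksize in bands:
        sub_area = bg_mask & (bg_depth >= low) & (bg_depth < high)  # one background sub-area
        blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
        result[sub_area] = blurred[sub_area]
    return result

image = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)   # stand-in first image
bg_depth = np.random.rand(256, 256).astype(np.float32)         # stand-in background depth
bg_mask = np.ones((256, 256), dtype=bool)                      # stand-in background mask
second_image = blur_background_by_depth(image, bg_mask, bg_depth)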
It should be noted that the hair matting result includes probability information indicating, for each pixel point within the hair region of the first image, how likely that pixel is to belong to hair; because hair strands are fine, the background showing through between them is separated out during matting. Optionally, the electronic device may determine, according to this probability information, a second blurring parameter corresponding to each pixel point in the hair region of the first image, and then perform blurring processing on each pixel point according to its second blurring parameter to obtain a first blurring image (that is, a blurring image of the hair region in the first image).
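One assumed way to realize this per-pixel blurring is to treat the complement of the hair probability as the second blurring parameter and blend a blurred copy with the original, so that pixels that are almost certainly hair stay sharp; this blending rule is a sketch, not the patent's actual rule:

import cv2
import numpy as np

def blur_hair_region(image, hair_prob, ksize=11):
    # image: HxWx3 uint8; hair_prob: HxW probabilities in [0, 1] from the hair matting result
    blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
    alpha = hair_prob[..., None]                          # per-pixel weight: keep hair, blur the gaps
    first_blurring_image = alpha * image + (1.0 - alpha) * blurred
    return first_blurring_image.astype(np.uint8)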
Because the blurring processing used to obtain the first blurring image is performed according to the hair matting result, the background within the hair area can be blurred without affecting the hair itself, so the blurring effect corresponding to the hair area is better. Optionally, the electronic device may perform blurring processing on each background sub-area according to its first blurring parameter to obtain a second blurring image (that is, an overall blurring image), and then fuse the second blurring image with the first blurring image (the local blurring image) to obtain the second image.
Alternatively, the electronic device may replace the hair region in the second blurring image with the corresponding region of the first blurring image to obtain the second image.
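A short sketch of this fusion by region replacement; the way the hair-region mask is derived from the matting probabilities is an assumption:

import numpy as np

def fuse_blurring_images(second_blurring, first_blurring, hair_prob, threshold=0.05):
    hair_region = hair_prob > threshold                   # pixels treated as part of the hair area
    fused = second_blurring.copy()
    fused[hair_region] = first_blurring[hair_region]      # replace the hair region with the local result
    return fused                                          # the second image, with the background blurred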
By implementing this method, the electronic device can first perform local blurring processing on the hair region of the first image according to the hair matting result to obtain the first blurring image of the hair region, then blur the first image as a whole according to the depth estimation result and the portrait segmentation result to obtain the second blurring image, and finally replace the hair region in the second blurring image with the first blurring image, so that the blurring effect of the hair region in the second image is better.
By implementing the method disclosed in the above embodiments, the electronic device can use more image processing results as guidance when performing blurring processing on the first image, thereby improving the image blurring effect. Different sub-areas of the background can be blurred to different degrees according to their depth of field, which increases the layering of the first image and further improves the blurring effect. In addition, the hair region of the first image can first be locally blurred according to the hair matting result to obtain the first blurring image of the hair region, the first image can then be blurred as a whole according to the depth estimation result and the portrait segmentation result to obtain the second blurring image, and the hair region in the second blurring image can be replaced with the first blurring image, so that the blurring effect of the hair region in the second image is better.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus may be applied to the above-mentioned electronic device, the electronic device may include an image processing model, and the image processing apparatus may include an extraction unit 701 and a blurring unit 702, wherein:
An extracting unit 701, configured to extract image features of the first image through a pre-trained image processing model, and obtain an image processing result according to the image features of the first image; the image processing result comprises at least two results of a depth estimation result, a portrait segmentation result and a hairline matting result, wherein the depth estimation result is used for describing depth information of a first image, the portrait segmentation result is used for describing a portrait area of the first image, the hairline matting result is used for describing a hair area of the first image, the image processing model comprises an encoder and at least two of a first decoder, a second decoder and a third decoder, the encoder is used for extracting image features of the first image, the first decoder is used for obtaining the depth estimation result according to the image features extracted by the encoder, the second decoder is used for obtaining the portrait segmentation result according to the image features extracted by the encoder, and the third decoder is used for obtaining the hairline matting result according to the image features extracted by the encoder;
a blurring unit 702, configured to perform blurring processing on the first image according to at least two results of the depth estimation result, the portrait segmentation result, and the hairline matting result.
By implementing the image processing device, more image processing results can be used as guidance to perform blurring processing on the first image, so that the image blurring effect is improved; in addition, the amount of computation of the image processing model can be reduced.
As an optional implementation manner, the manner in which the extracting unit 701 is configured to obtain the image processing result according to the image feature of the first image may specifically be:
the extracting unit 701 is configured to process, by using a target decoder in the image processing model, the image feature extracted by the encoder to obtain one or more first information feature images corresponding to the target decoder, fuse one or more first information feature images with one or more second information feature images output by other decoders, and obtain an image processing result corresponding to the target decoder according to the fused information feature images; the target decoder is any decoder included in the image processing model, and the other decoders are any decoder except the target decoder in the image processing model.
By implementing the image processing device, the information feature graphs output by at least two decoders can be fused, so that the information feature graphs output by different decoders can be mutually guided and supervised, and the robustness of the image processing result determined later is improved.
As an alternative embodiment, the image processing model includes a first decoder and a second decoder; and, the extracting unit 701 processes the image features extracted by the encoder through the target decoder in the image processing model, so as to obtain one or more first information feature images corresponding to the target decoder, fuse one or more first information feature images with one or more second information feature images output by other decoders, and obtain an image processing result corresponding to the target decoder according to the fused information feature images, where the method may specifically be that:
the extracting unit 701 is configured to perform depth estimation according to the image features extracted by the encoder through the first decoder, generate one or more depth feature maps, fuse the one or more depth feature maps with one or more portrait segmentation feature maps output by the second decoder, and obtain a depth estimation result according to the fused feature maps; and/or identify the portrait region of the first image according to the image features extracted by the encoder through the second decoder, generate one or more portrait segmentation feature maps according to the portrait region, fuse the one or more portrait segmentation feature maps with the one or more depth feature maps output by the first decoder, and obtain a portrait segmentation result according to the fused feature maps.
By implementing the image processing device, the information feature graphs output by the first decoder and the second decoder can be fused, so that the information feature graphs output by different decoders can be mutually guided and supervised, and the robustness of the image processing result determined later is improved.
As an alternative embodiment, the image processing model includes a second decoder and a third decoder; and, the extracting unit 701 processes the image features extracted by the encoder through the target decoder in the image processing model, so as to obtain one or more first information feature images corresponding to the target decoder, fuse one or more first information feature images with one or more second information feature images output by other decoders, and obtain an image processing result corresponding to the target decoder according to the fused information feature images, where the method may specifically be that:
the extracting unit 701 is configured to identify, through the third decoder, the hair region in the first image according to the image features extracted by the encoder, generate one or more hair feature maps according to the hair region, fuse the one or more hair feature maps with the one or more portrait segmentation feature maps output by the second decoder, and obtain a hair matting result according to the fused feature maps.
By implementing the image processing device, the information feature graphs output by the second decoder and the third decoder can be fused, so that the information feature graphs output by different decoders can be mutually guided and supervised, and the robustness of the image processing result determined later is improved.
As an alternative embodiment, the target decoder includes an M-layer upsampling layer; m is a positive integer greater than or equal to 2; and, the extracting unit 701 processes the image features extracted by the encoder through the target decoder in the image processing model, so as to obtain one or more first information feature images corresponding to the target decoder, fuse one or more first information feature images with one or more second information feature images output by other decoders, and obtain an image processing result corresponding to the target decoder according to the fused information feature images, where the method may specifically be that:
an extracting unit 701, configured to perform upsampling processing on the image feature extracted by the encoder through a first upsampling layer in the target decoder, so as to obtain a first information feature map output by the first upsampling layer; the first information feature map output by the up-sampling layer of the N-1 layer in the target decoder is fused with the second information feature map output by other decoders in the up-sampling layer of the same layer number, the fused information feature map is input to the up-sampling layer of the N layer in the target decoder, up-sampling processing is carried out through the up-sampling layer of the N layer, and the first information feature map output by the up-sampling layer of the N layer is obtained, wherein N is a positive integer which is more than or equal to 2 and less than or equal to M; and generating an image processing result corresponding to the target decoder according to the first information feature map output by the M-layer up-sampling layer.
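As a rough PyTorch-style sketch of this layer-by-layer scheme; the channel sizes, the use of element-wise addition for fusion, and the class name FusingDecoder are assumptions:

import torch
import torch.nn as nn

class FusingDecoder(nn.Module):
    def __init__(self, channels=(64, 32, 16), out_channels=1):
        super().__init__()
        self.up_layers = nn.ModuleList([
            nn.ConvTranspose2d(channels[0], channels[1], 4, stride=2, padding=1),
            nn.ConvTranspose2d(channels[1], channels[2], 4, stride=2, padding=1),
        ])
        self.head = nn.Conv2d(channels[2], out_channels, 1)

    def forward(self, encoder_feat, peer_feats):
        # peer_feats[n-1] is the other decoder's map from its upsampling layer of the same depth
        x = self.up_layers[0](encoder_feat)              # first information feature map, layer 1
        for n in range(1, len(self.up_layers)):
            x = x + peer_feats[n - 1]                    # fuse the (N-1)-th maps before layer N
            x = self.up_layers[n](x)                     # N-th upsampling layer
        return self.head(x)                              # image processing result of this decoder

encoder_feat = torch.rand(1, 64, 32, 32)                 # features extracted by the encoder
peer_feats = [torch.rand(1, 32, 64, 64)]                 # second information feature map from the peer decoder
result = FusingDecoder()(encoder_feat, peer_feats)       # e.g. a depth estimation result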
Compared with the related art, in which an image processing result is determined from the one or more information feature maps output by a single decoder, implementing the image processing device allows the information feature maps output by at least two decoders to be fused, so that the information feature maps output by different decoders can guide and supervise one another, improving the robustness of the image processing result determined subsequently.
As an optional implementation manner, the way for the extracting unit 701 to fuse the first information feature map output by the N-1 th upsampling layer in the target decoder with the second information feature map output by the other decoders in the upsampling layer with the same layer number may specifically be:
an extracting unit 701, configured to add the first feature, in a first dimension, of the first information feature map output by the (N-1)-th upsampling layer in the target decoder to the second feature, in the same dimension, of the second information feature map output by the upsampling layer of the same depth in the other decoder; or to splice the first feature corresponding to the first information feature map output by the (N-1)-th upsampling layer in the target decoder with the second feature corresponding to the second information feature map output by the same upsampling layer of the other decoder.
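A minimal sketch of the two fusion options; tensor shapes are invented for illustration:

import torch

first_map = torch.rand(1, 32, 64, 64)    # from the target decoder's (N-1)-th upsampling layer
second_map = torch.rand(1, 32, 64, 64)   # from the other decoder's layer of the same depth

fused_by_addition = first_map + second_map                      # features added in the same dimension
fused_by_splicing = torch.cat([first_map, second_map], dim=1)   # spliced along the channel dimension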
By implementing the image processing device, the first information feature map and the second information feature map can be fused in a feature addition or splicing mode, so that the information feature maps output by different decoders can be mutually guided and supervised, and the robustness of the image processing result determined later is improved.
As an optional implementation manner, the way for the extracting unit 701 to fuse the first information feature map output by the N-1 th upsampling layer in the target decoder with the second information feature map output by the other decoders in the upsampling layer with the same layer number may specifically be:
the extracting unit 701 is configured to fuse the first information feature map output by the (N-1)-th upsampling layer in the target decoder, the second information feature map output by the same upsampling layer of the other decoder, and a target image feature, where the target image feature is an image feature output by the encoder that has the same resolution as the first information feature map output by the (N-1)-th upsampling layer.
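A sketch of fusing the encoder feature of matching resolution together with the two decoder feature maps; concatenation followed by a 1x1 convolution is one assumed way to merge them:

import torch
import torch.nn as nn

first_map = torch.rand(1, 32, 64, 64)        # target decoder, (N-1)-th upsampling layer
second_map = torch.rand(1, 32, 64, 64)       # other decoder, layer of the same depth
encoder_feat = torch.rand(1, 32, 64, 64)     # target image feature from the encoder, same resolution

merge = nn.Conv2d(96, 32, kernel_size=1)     # bring the concatenated channels back down
fused = merge(torch.cat([first_map, second_map, encoder_feat], dim=1))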
By implementing the image processing device, the image features extracted by the encoder and the information feature maps restored by the decoder can be fused, which improves the generalization capability of the image processing model and thus the accuracy of the image processing result subsequently obtained by the decoder.
As an optional implementation manner, the manner in which the extracting unit 701 is configured to obtain the image processing result according to the image feature of the first image may specifically be:
an extracting unit 701, configured to exchange a part of downsampling layers in the target decoder with downsampling layers belonging to the same layer number in other decoders, so as to obtain a new target decoder; the target decoder is any decoder included in the image processing model, and the other decoders are any decoder except the target decoder in the image processing model; and processing the image features extracted by the encoder by the new target decoder to obtain an image processing result corresponding to the target decoder.
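A hypothetical sketch of building the new target decoder by swapping layers of the same depth, reusing the FusingDecoder class from the earlier sketch; which layers to swap is an assumption:

import copy

def swap_layers(target_decoder, other_decoder, layer_indices=(1,)):
    # replace the listed layers of the target decoder with the other decoder's
    # layers of the same depth, yielding the new target decoder
    new_decoder = copy.deepcopy(target_decoder)
    for i in layer_indices:
        new_decoder.up_layers[i] = copy.deepcopy(other_decoder.up_layers[i])
    return new_decoder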
By implementing the image processing device, the image features extracted by the encoder can be processed directly by the new target decoder, so that different decoders can guide and supervise one another, improving the robustness of the image processing result determined subsequently. Moreover, because no feature fusion operation needs to be performed, the amount of computation of the electronic device can be reduced, which improves the battery endurance of the electronic device.
As an alternative embodiment, the first decoder is obtained by training a first decoder to be trained on a first sample set, where the first sample set includes a first sample portrait image and a depth image corresponding to the first sample portrait image; the second decoder is obtained by training a second decoder to be trained on a second sample set, where the second sample set includes a second sample portrait image and a portrait mask corresponding to the second sample portrait image; and the third decoder is obtained by training a third decoder to be trained on a third sample set, where the third sample set includes a third sample portrait image and a hair mask corresponding to the third sample portrait image.
As an optional implementation manner, the blurring unit 702 performs blurring processing on the first image according to at least two results of the depth estimation result, the portrait segmentation result and the hairline matting result, which may specifically be:
the blurring unit 702 is configured to determine a background area of the first image and depth information corresponding to the background area according to the depth estimation result and the portrait segmentation result; and blurring the background area according to the depth information to obtain a second image.
By implementing the image processing device, different sub-areas can be subjected to blurring to different degrees according to the depth of field of the different sub-areas in the background area, so that the layering sense of the first image is increased, and the blurring effect of the image is further improved.
As an optional implementation manner, the blurring unit 702 determines, according to the depth estimation result and the portrait segmentation result, a background area of the first image, and a manner of depth information corresponding to the background area may specifically be:
a blurring unit 702, configured to correct a portrait area in the depth estimation result according to the portrait segmentation result, and determine a background area in the first image according to the corrected portrait area; obtaining depth information of a background area according to a depth estimation result;
And, the blurring unit 702 performs blurring processing on the background area according to the depth information, so as to obtain a second image, which may specifically be:
the blurring unit 702 is configured to divide a background area according to depth information of the background area, and determine a first blurring parameter corresponding to each background sub-area according to the depth information corresponding to each background sub-area obtained by the division; and blurring processing is carried out on each background subarea according to the first blurring parameters corresponding to each background subarea so as to obtain a second image.
By implementing the image processing device, different sub-areas can be subjected to blurring to different degrees according to the depth of field of the different sub-areas in the background area, so that the layering sense of the first image is increased, and the blurring effect of the image is further improved.
As an optional implementation manner, the hair matting result includes probability information that each pixel point in the hair region in the first image belongs to hair; and, the image processing apparatus shown in fig. 7 further includes a determination unit 703 in which:
a determining unit 703, configured to: before the blurring unit 702 performs blurring processing on each background sub-area according to the first blurring parameter corresponding to each background sub-area to obtain the second image, determine, according to the probability information, a second blurring parameter corresponding to each pixel point in the hair area of the first image, and perform blurring processing on each pixel point according to its second blurring parameter to obtain a first blurring image;
And, the blurring unit 702 performs blurring processing on each background sub-region according to the first blurring parameter corresponding to each background sub-region, so as to obtain a second image, which may specifically be:
the blurring unit 702 is configured to perform blurring processing on each background sub-region according to the first blurring parameter corresponding to each background sub-region, so as to obtain a second blurring image, and fuse the second blurring image with the first blurring image, so as to obtain a second image.
By implementing the image processing device, the hair region of the first image can be locally blurred according to the hair matting result to obtain a first blurred image of the hair region, and then the first image is integrally blurred according to the depth estimation result and the portrait segmentation result to obtain a second blurred image. And then replacing the hair region in the second blurring image with the first blurring image so as to ensure that the blurring effect of the hair region in the second image is better.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
As shown in fig. 8, the electronic device may include:
a memory 801 storing executable program code;
A processor 802 coupled to the memory 801;
the processor 802 calls executable program codes stored in the memory 801 to execute the image processing method disclosed in each of the above embodiments.
The present embodiment discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the image processing method disclosed in each of the above embodiments.
An embodiment of the application also discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, and the computer program product, when executed, performs some or all of the steps of the methods in the above method embodiments.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments and that the acts and modules referred to are not necessarily required in the present application.
In various embodiments of the present application, it should be understood that the size of the sequence numbers of the above processes does not mean that the execution sequence of the processes is necessarily sequential, and the execution sequence of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and in particular may be a processor in the computer device) to perform part or all of the steps of the methods of the various embodiments of the present application.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by a program that instructs associated hardware, the program may be stored in a computer readable storage medium including Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), programmable Read-Only Memory (Programmable Read-Only Memory, PROM), erasable programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), one-time programmable Read-Only Memory (OTPROM), electrically erasable programmable Read-Only Memory (EEPROM), compact disc Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disk Memory, magnetic disk Memory, tape Memory, or any other medium that can be used for carrying or storing data that is readable by a computer.
The image processing method and apparatus, the electronic device, and the computer-readable storage medium disclosed in the embodiments of the present application are described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make modifications to the specific implementations and the application scope in accordance with the ideas of the present application; in view of the above, the contents of this specification should not be construed as limiting the present application.

Claims (15)

1. An image processing method, the method comprising:
extracting image features of a first image through a pre-trained image processing model, and obtaining an image processing result according to the image features of the first image; the image processing result comprises at least two results of a depth estimation result, a portrait segmentation result and a hairline matting result, wherein the depth estimation result is used for describing depth information of the first image, the portrait segmentation result is used for describing a portrait area of the first image, the hairline matting result is used for describing a hair area of the first image, the image processing model comprises an encoder and at least two of a first decoder, a second decoder and a third decoder, the encoder is used for extracting image features of the first image, the first decoder is used for obtaining the depth estimation result according to the image features extracted by the encoder, the second decoder is used for obtaining the portrait segmentation result according to the image features extracted by the encoder, the third decoder is used for obtaining the hairline matting result according to the image features extracted by the encoder, and the encoder is respectively connected, through skip connections, with at least two of the first decoder, the second decoder and the third decoder;
And blurring the first image according to at least two results of the depth estimation result, the portrait segmentation result and the hairline matting result.
2. The method according to claim 1, wherein the obtaining an image processing result according to the image feature of the first image includes:
processing the image features extracted by the encoder through a target decoder in the image processing model to obtain one or more first information feature images corresponding to the target decoder, fusing the one or more first information feature images with one or more second information feature images output by other decoders, and obtaining an image processing result corresponding to the target decoder according to the fused information feature images;
wherein the target decoder is any decoder included in the image processing model, and the other decoders are any decoder other than the target decoder in the image processing model.
3. The method of claim 2, wherein the image processing model comprises the first decoder and the second decoder; and processing the image features extracted by the encoder through a target decoder in the image processing model to obtain one or more first information feature images corresponding to the target decoder, fusing the one or more first information feature images with one or more second information feature images output by other decoders, and obtaining an image processing result corresponding to the target decoder according to the fused information feature images, wherein the processing result comprises:
Performing depth estimation according to the image features extracted by the encoder through the first decoder, generating one or more depth feature images, fusing the one or more depth feature images with one or more image segmentation feature images output by the second decoder, and obtaining a depth estimation result according to the fused feature images; and/or
And identifying a portrait region of the first image according to the image features extracted by the encoder through the second decoder, generating one or more portrait segmentation feature images according to the portrait region, fusing the one or more portrait segmentation feature images with one or more depth feature images output by the first decoder, and obtaining a portrait segmentation result according to the fused feature images.
4. The method of claim 2, wherein the image processing model comprises the second decoder and the third decoder; and processing the image features extracted by the encoder through a target decoder in the image processing model to obtain one or more first information feature images corresponding to the target decoder, fusing the one or more first information feature images with one or more second information feature images output by other decoders, and obtaining an image processing result corresponding to the target decoder according to the fused information feature images, wherein the processing result comprises:
And identifying a hair region in the first image according to the image features extracted by the encoder through the third decoder, generating one or more hair feature images according to the hair region, fusing the one or more hair feature images with one or more image segmentation feature images output by the second decoder, and obtaining a hair matting result according to the fused feature images.
5. The method of any of claims 2-4, wherein the target decoder comprises an M-layer upsampling layer; m is a positive integer greater than or equal to 2; and processing the image features extracted by the encoder through a target decoder in the image processing model to obtain one or more first information feature images corresponding to the target decoder, fusing the one or more first information feature images with one or more second information feature images output by other decoders, and obtaining an image processing result corresponding to the target decoder according to the fused information feature images, wherein the processing result comprises:
the image features extracted by the encoder are subjected to up-sampling processing through a first up-sampling layer in the target decoder so as to obtain a first information feature map output by the first up-sampling layer;
Fusing a first information feature image output by an up-sampling layer of an N-1 layer in the target decoder with a second information feature image output by other decoders in the up-sampling layer of the same layer number, inputting the fused information feature image to the up-sampling layer of the N layer in the target decoder, and performing up-sampling treatment through the up-sampling layer of the N layer to obtain a first information feature image output by the up-sampling layer of the N layer, wherein N is a positive integer greater than or equal to 2 and less than or equal to M;
and generating an image processing result corresponding to the target decoder according to the first information feature map output by the M-layer up-sampling layer.
6. The method of claim 5, wherein fusing the first information feature map output by the N-1 upsampling layer in the target decoder with the second information feature map output by other decoders at the same number of upsampling layers, comprises:
adding a first feature of a first dimension corresponding to a first information feature map output by an up-sampling layer of an N-1 layer in the target decoder and a second feature of a first dimension corresponding to a second information feature map output by the same up-sampling layer of other decoders;
Or splicing the first information feature map output by the up-sampling layer of the N-1 layer in the target decoder corresponding to the first feature and the second feature corresponding to the second information feature map output by the same up-sampling layer of other decoders.
7. The method of claim 5, wherein fusing the first information feature map output by the N-1 upsampling layer in the target decoder with the second information feature map output by the same upsampling layer as the other decoders, comprises:
and fusing the first information characteristic image output by the N-1 layer up-sampling layer in the target decoder, the second information characteristic image output by the same up-sampling layer in other decoders and target image characteristics, wherein the target image characteristics are image characteristics which are output by the encoder and have the same resolution as the first information characteristic image output by the N-1 layer up-sampling layer.
8. The method according to claim 1, wherein the obtaining an image processing result according to the image feature of the first image includes:
exchanging part of downsampling layers in the target decoder with downsampling layers belonging to the same layer number in other decoders to obtain a new target decoder; wherein the target decoder is any decoder included in the image processing model, and the other decoders are any decoder other than the target decoder in the image processing model;
And processing the image characteristics extracted by the encoder by the new target decoder to obtain an image processing result corresponding to the target decoder.
9. The method according to any one of claims 1-4, 6-8, wherein the first decoder is trained on a first decoder to be trained from a first set of samples, the first set of samples comprising a first sample portrait image and a depth image corresponding to the first sample portrait image;
the second decoder is obtained by training a second decoder to be trained according to a second sample set, and the second sample set comprises a second sample portrait image and a portrait mask corresponding to the second sample portrait image;
the third decoder is obtained by training a third decoder to be trained according to a third sample set, and the third sample set comprises a third sample portrait image and a hair mask corresponding to the third sample portrait image.
10. The method of claim 1, wherein blurring the first image based on at least two of the depth estimation result, the portrait segmentation result, and the hairline matting result comprises:
Determining a background area of the first image and depth information corresponding to the background area according to the depth estimation result and the portrait segmentation result;
and blurring the background area according to the depth information to obtain a second image.
11. The method according to claim 10, wherein determining a background area of the first image and depth information corresponding to the background area according to the depth estimation result and the image segmentation result includes:
correcting the portrait area in the depth estimation result according to the portrait segmentation result, and determining a background area in the first image according to the corrected portrait area;
acquiring depth information of the background area according to the depth estimation result;
and blurring the background area according to the depth information to obtain a second image, including:
dividing the background area according to the depth information of the background area, and determining a first blurring parameter corresponding to each background subarea according to the depth information corresponding to each background subarea obtained by dividing;
And carrying out blurring processing on each background subarea according to the first blurring parameters corresponding to each background subarea so as to obtain a second image.
12. The method according to claim 11, wherein the hair matting result comprises probability information that each pixel point within a hair region in the first image belongs to hair; before the blurring processing is performed on each background subarea according to the first blurring parameters corresponding to each background subarea so as to obtain a second image, the method further includes:
determining second blurring parameters corresponding to all pixel points in the hair region in the first image according to the probability information;
according to the second blurring parameters corresponding to the pixel points, blurring processing is carried out on the pixel points so as to obtain a first blurring image;
and performing blurring processing on each background subarea according to the first blurring parameter corresponding to each background subarea to obtain a second image, where the blurring processing includes:
and carrying out blurring processing on each background subarea according to the first blurring parameters corresponding to each background subarea so as to obtain a second blurring image, and fusing the second blurring image with the first blurring image so as to obtain a second image.
13. An image processing apparatus, comprising:
the extraction unit is used for extracting the image features of the first image through the pre-trained image processing model and obtaining an image processing result according to the image features of the first image; the image processing result comprises at least two results of a depth estimation result, a portrait segmentation result and a hairline matting result, wherein the depth estimation result is used for describing depth information of the first image, the portrait segmentation result is used for describing a portrait area of the first image, the hairline matting result is used for describing a hair area of the first image, the image processing model comprises an encoder and at least two of a first decoder, a second decoder and a third decoder, the encoder is used for extracting image features of the first image, the first decoder is used for obtaining the depth estimation result according to the image features extracted by the encoder, the second decoder is used for obtaining the portrait segmentation result according to the image features extracted by the encoder, the third decoder is used for obtaining the hairline matting result according to the image features extracted by the encoder, and the encoder is respectively connected, through skip connections, with at least two of the first decoder, the second decoder and the third decoder;
And the blurring unit is used for blurring the first image according to at least two results of the depth estimation result, the portrait segmentation result and the hairline matting result.
14. An electronic device comprising a memory storing executable program code, and a processor coupled to the memory; wherein the processor invokes the executable program code stored in the memory to perform the method of any one of claims 1-12.
15. A computer readable storage medium storing a computer program, which when executed by a processor implements the method of any one of claims 1 to 12.
CN202110586467.9A 2021-05-27 2021-05-27 Image processing method and device, electronic equipment and computer readable storage medium Active CN113313646B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110586467.9A CN113313646B (en) 2021-05-27 2021-05-27 Image processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110586467.9A CN113313646B (en) 2021-05-27 2021-05-27 Image processing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113313646A CN113313646A (en) 2021-08-27
CN113313646B true CN113313646B (en) 2024-04-16

Family

ID=77375693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110586467.9A Active CN113313646B (en) 2021-05-27 2021-05-27 Image processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113313646B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481186A (en) * 2017-08-24 2017-12-15 广东欧珀移动通信有限公司 Image processing method, device, computer-readable recording medium and computer equipment
CN107509031A (en) * 2017-08-31 2017-12-22 广东欧珀移动通信有限公司 Image processing method, device, mobile terminal and computer-readable recording medium
CN110111239A (en) * 2019-04-28 2019-08-09 叠境数字科技(上海)有限公司 A kind of portrait head background-blurring method based on the soft segmentation of tof camera
CN111402258A (en) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111507994A (en) * 2020-04-24 2020-08-07 Oppo广东移动通信有限公司 Portrait extraction method, portrait extraction device and mobile terminal
CN112258528A (en) * 2020-11-02 2021-01-22 Oppo广东移动通信有限公司 Image processing method and device and electronic equipment
CN112308866A (en) * 2020-11-04 2021-02-02 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN112446380A (en) * 2019-09-02 2021-03-05 华为技术有限公司 Image processing method and device

Also Published As

Publication number Publication date
CN113313646A (en) 2021-08-27

Similar Documents

Publication Publication Date Title
CN111145238B (en) Three-dimensional reconstruction method and device for monocular endoscopic image and terminal equipment
US10334168B2 (en) Threshold determination in a RANSAC algorithm
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110827200A (en) Image super-resolution reconstruction method, image super-resolution reconstruction device and mobile terminal
US8903139B2 (en) Method of reconstructing three-dimensional facial shape
US9183634B2 (en) Image processing apparatus and image processing method
CN109005334A (en) A kind of imaging method, device, terminal and storage medium
JP7116262B2 (en) Image depth estimation method and apparatus, electronic device, and storage medium
CN111553841B (en) Real-time video splicing method based on optimal suture line updating
CN112819875B (en) Monocular depth estimation method and device and electronic equipment
CN113630549B (en) Zoom control method, apparatus, electronic device, and computer-readable storage medium
CN112215877A (en) Image processing method and device, electronic equipment and readable storage medium
CN111680573B (en) Face recognition method, device, electronic equipment and storage medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN113610865B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN113313646B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN112634298B (en) Image processing method and device, storage medium and terminal
CN112818743B (en) Image recognition method and device, electronic equipment and computer storage medium
CN112203023B (en) Billion pixel video generation method and device, equipment and medium
CN113592777A (en) Image fusion method and device for double-shooting and electronic system
CN113807124B (en) Image processing method, device, storage medium and electronic equipment
CN112395912B (en) Face segmentation method, electronic device and computer readable storage medium
CN115564663A (en) Image processing method, image processing device, electronic equipment and readable storage medium
CN117058183A (en) Image processing method and device based on double cameras, electronic equipment and storage medium
CN116109646A (en) Training data generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant