CN113409331A - Image processing method, image processing apparatus, terminal, and readable storage medium

Image processing method, image processing apparatus, terminal, and readable storage medium

Info

Publication number
CN113409331A
CN113409331A
Authority
CN
China
Prior art keywords
image
probability
depth
pixel
value
Prior art date
Legal status
Granted
Application number
CN202110636329.7A
Other languages
Chinese (zh)
Other versions
CN113409331B (en)
Inventor
戴夏强
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110636329.7A
Publication of CN113409331A
Application granted
Publication of CN113409331B

Classifications

    • G06T Image data processing or generation, in general (under G Physics; G06 Computing, Calculating or Counting)
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/11 Region-based segmentation
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, an image processing apparatus, a terminal, and a non-volatile computer-readable storage medium. The image processing method includes performing portrait segmentation processing on an acquired original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region; acquiring a depth image and a second probability image, where the depth image indicates the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image; and acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image. Compared with directly segmenting the original image with a single segmentation model, the method and the apparatus can avoid problems such as false detection and missed detection in complex scenes.

Description

Image processing method, image processing apparatus, terminal, and readable storage medium
Technical Field
The present application relates to the field of image technologies, and in particular, to an image processing method, an image processing apparatus, a terminal, and a readable storage medium.
Background
At present, the portrait region in an image is usually obtained by methods such as semantic segmentation or matting, which classify the foreground and the background by integrating image features. However, if the scene in the image is complex, it is difficult to determine whether some features belong to the foreground or the background, so false detections and missed detections easily occur.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, a terminal and a non-volatile computer readable storage medium.
The embodiment of the application provides an image processing method. The image processing method includes: performing portrait segmentation processing on an acquired original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region; acquiring a depth image and a second probability image, where the depth image indicates the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image; and acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image.
The embodiment of the application also provides an image processing apparatus. The image processing apparatus includes a first obtaining module, a second obtaining module, and a fusion module. The first obtaining module is configured to perform portrait segmentation processing on the acquired original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region. The second obtaining module is configured to obtain a depth image and a second probability image, where the depth image indicates the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image. The fusion module is configured to obtain a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image.
The embodiment of the application also provides a terminal. The terminal includes a housing and one or more processors. The one or more processors are combined with the housing. The one or more processors are configured to: perform portrait segmentation processing on an acquired original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region; acquire a depth image and a second probability image, where the depth image indicates the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image; and acquire a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image.
The embodiment of the application also provides a non-volatile computer-readable storage medium containing a computer program. The computer program, when executed by a processor, causes the processor to perform the image processing method described below: performing portrait segmentation processing on an acquired original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region; acquiring a depth image and a second probability image, where the depth image indicates the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image; and acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image.
According to the image processing method, the image processing apparatus, the terminal, and the non-volatile computer-readable storage medium of the embodiments of the application, the target segmentation image is obtained from the initial segmentation image, the first probability image, the depth image, and the second probability image. Compared with directly segmenting the original image with a single segmentation model, this avoids problems such as false detection and missed detection in complex scenes and improves the stability and accuracy of portrait region segmentation.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of an image processing method in some embodiments of the present application;
FIG. 2 is a schematic diagram of an image processing apparatus in some embodiments of the present application;
FIG. 3 is a schematic block diagram of a terminal in some embodiments of the present application;
FIGS. 4-5 are schematic flow charts of image processing methods in some embodiments of the present application;
FIG. 6 is a schematic diagram of an original image taken in horizontal (landscape) mode in some embodiments of the present application;
FIG. 7 is a schematic diagram of an original image taken in vertical (portrait) mode in some embodiments of the present application;
FIG. 8 is a schematic illustration of rotating a horizontally shot image into a vertically shot image in certain embodiments of the present application;
FIGS. 9-11 are schematic diagrams of a segmentation model segmenting a preprocessed image in some embodiments of the present application;
FIG. 12 is a schematic flow chart of an image processing method in some embodiments of the present application;
FIG. 13 is a schematic diagram of a depth estimation network model in some embodiments of the present application;
FIG. 14 is a schematic diagram of a monocular depth estimation network model in some embodiments of the present application;
FIG. 15 is a schematic diagram of a binocular depth estimation network model in some embodiments of the present application;
FIGS. 16-17 are schematic flow charts of image processing methods in certain embodiments of the present application;
FIG. 18 is a schematic diagram of performing first image processing on an original depth information image based on an initial segmentation image in some embodiments of the present application;
FIGS. 19-20 are schematic flow charts of image processing methods in certain embodiments of the present application;
FIG. 21 is a schematic diagram of generating a target segmentation image in some embodiments of the present application;
FIGS. 22-23 are schematic flow charts of image processing methods in certain embodiments of the present application;
FIG. 24 is a schematic diagram of the interaction between a non-volatile computer-readable storage medium and a processor in some embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present application, and are not to be construed as limiting the embodiments of the present application.
Referring to fig. 1, an embodiment of the present application provides an image processing method. The image processing method comprises the following steps:
01: performing portrait segmentation processing on the obtained original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region;
02: acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and
03: acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image.
Referring to fig. 2, an image processing apparatus 10 is further provided in the present embodiment. The image processing apparatus 10 includes a first obtaining module 11, a second obtaining module 12, and a fusion module 13. Step 01 of the image processing method can be implemented by the first obtaining module 11; step 02 can be implemented by the second obtaining module 12; and step 03 can be implemented by the fusion module 13. That is, the first obtaining module 11 is configured to perform portrait segmentation processing on the obtained original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region. The second obtaining module 12 is configured to obtain a depth image and a second probability image, where the depth image indicates the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image. The fusion module 13 is configured to obtain a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image.
Referring to fig. 3, the present embodiment further provides a terminal 100. The terminal 100 includes a housing 20 and one or more processors 30, and the one or more processors 30 are integrated with the housing 20. Steps 01, 02, and 03 of the image processing method can be implemented by the one or more processors 30. That is, the one or more processors 30 are configured to: perform portrait segmentation processing on the acquired original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait region and a background region, the first probability image includes a first probability value for each pixel in the initial segmentation image, and the first probability value represents the probability that the pixel lies in the portrait region; acquire a depth image and a second probability image, where the depth image indicates the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image; and acquire a target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image. It should be noted that the terminal 100 may be a mobile phone, a camera, a notebook computer, or a smart wearable device; the following embodiments describe the terminal 100 as a mobile phone only by way of example.
In the image processing method, the image processing apparatus 10, and the terminal 100 according to the embodiment of the application, by acquiring the target segmentation image according to the initial segmentation image, the first probability image, the depth image, and the second probability image, compared with a method of directly segmenting the original image by using a single segmentation model, the method can avoid problems of false detection, missed detection, and the like in a complex scene, and improve stability and accuracy of segmentation of the portrait region.
In an example, the terminal 100 or the image processing apparatus 10 may further include an imaging module 40, and the terminal 100 or the image processing apparatus 10 may acquire an image of a person through the imaging module 40 to obtain an original image; in another example, the terminal 100 or the image processing apparatus 10 may further include a storage module 50, the storage module 50 stores an original image containing a portrait in advance, and the processor 30 of the terminal 100 may obtain the original image from the storage module 50; in still another example, the terminal 100 or the image processing apparatus 10 may also acquire an original image including a portrait through input of a user. The specific method for acquiring the original image is not limited herein, and the acquired original image needs to contain a portrait.
After the original image is acquired, the processor 30 (or the first obtaining module 11) performs portrait segmentation processing on the acquired original image to obtain an initial segmentation image and a first probability image. Specifically, referring to fig. 1 and 4, in some embodiments, step 01, performing portrait segmentation processing on the obtained original image to obtain an initial segmentation image and a first probability image, includes:
011: preprocessing an original image to obtain a preprocessed image;
012: inputting the preprocessed image into a preset segmentation model to obtain an initial segmentation image and a first probability image.
Referring to fig. 2, in some embodiments, the steps 011 and 012 can be implemented by the first obtaining module 11 of the image processing apparatus 10. That is, the first obtaining module 11 is further configured to pre-process the original image to obtain a pre-processed image; and inputting the preprocessed image into a preset segmentation model to obtain an initial segmentation image and a first probability image.
Referring to fig. 3, in some embodiments, the steps 011 and 012 can be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are also configured to pre-process the raw image to obtain a pre-processed image; and inputting the preprocessed image into a preset segmentation model to obtain an initial segmentation image and a first probability image.
For example, in some embodiments, the image input to the preset segmentation model needs to meet its input requirements; that is, the preset segmentation model may place requirements on the attributes of the input image, and only an input image that meets these requirements can be processed correctly. Therefore, after the original image containing the portrait is obtained, the original image is preprocessed to obtain a preprocessed image that meets the requirements of the preset segmentation model on the input image. In this way, after the preprocessed image is input into the preset segmentation model, the preset segmentation model can process the preprocessed image correctly. Specifically, referring to fig. 4 and 5, in some embodiments, step 011, preprocessing the original image to obtain a preprocessed image, includes:
0111: detecting whether the original image is a horizontally shot image or a vertically shot image;
0112: if the original image is a horizontally shot image, rotating the original image to change the original image into a vertically shot image; performing normalization processing on the rotated original image to obtain a preprocessed image;
0113: and if the original image is a vertically shot image, performing normalization processing on the original image to obtain a preprocessed image.
Referring to fig. 2, in some embodiments, the steps 0111, 0112, and 0113 can be implemented by the first obtaining module 11 of the image processing apparatus 10. That is, the first obtaining module 11 is further configured to detect whether the original image is a horizontally shot image or a vertically shot image; if the original image is a horizontally shot image, rotating the original image to change the original image into a vertically shot image; performing normalization processing on the rotated original image to obtain a preprocessed image; and if the original image is a vertically shot image, performing normalization processing on the original image to obtain a preprocessed image.
Referring to fig. 3, in some embodiments, the steps 0111, 0112, and 0113 can be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are also configured to detect whether the original image is a horizontally shot image or a vertically shot image; if the original image is a horizontally shot image, rotate the original image to change it into a vertically shot image, and perform normalization processing on the rotated original image to obtain a preprocessed image; and if the original image is a vertically shot image, perform normalization processing on the original image to obtain a preprocessed image.
Illustratively, after an original image containing a portrait is acquired, whether the original image is a horizontally shot image or a vertically shot image is detected. If the terminal 100 or the image processing apparatus 10 was in the horizontal shooting mode when the current original image was shot, the current original image is a horizontally shot image; if the terminal 100 or the image processing apparatus 10 was in the portrait mode, the current original image is a vertically shot image. In some embodiments, whether the original image is a horizontally shot image or a vertically shot image may be detected from the width and height of the original image. For example, the terminal 100 captures the current original image. As shown in fig. 6 and fig. 7, the terminal 100 includes a first side 101 and a second side 102 adjacent to each other, and the first side 101 is longer than the second side 102. The first side 101 is the long side of the terminal 100, and the second side 102 is the wide side of the terminal 100. The width w and the height h of the original image are acquired, where the length of the side of the original image parallel to the long side of the terminal 100 is the height h, and the length of the side parallel to the wide side of the terminal 100 is the width w. If the width w of the original image is greater than the height h (as shown in fig. 6), the original image is a horizontally shot image; if the width w is smaller than the height h (as shown in fig. 7), the original image is a vertically shot image.
Referring to fig. 8, when the current image is determined to be a horizontally shot image, the original image is rotated into a vertically shot image, and normalization processing is performed on the rotated original image to obtain a preprocessed image, so that the preset segmentation model can process the preprocessed image correctly. For example, in one example, normalization may be performed by dividing the pixel values of all pixels in the rotated original image by 255; in another example, normalization may also be performed by subtracting 127.5 from the pixel values of all pixels in the rotated original image and dividing the difference by 127.5, which is not limited here. Further, in some embodiments, after the horizontally shot original image is rotated into a vertically shot image, the rotated original image is scaled to a preset size, and then the scaled original image is normalized, where the preset size is the input image size required by the preset segmentation model.
When the current image is determined to be a vertically shot image, normalization processing is directly performed on the original image to obtain a preprocessed image, so that the preset segmentation model can process the preprocessed image correctly. The specific normalization is the same as the normalization of the rotated original image described above and is not repeated here. Of course, in some embodiments, the vertically shot original image may first be scaled to the preset size, and then the scaled original image may be normalized.
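For illustration only, a minimal sketch of this preprocessing, assuming OpenCV and NumPy; the preset input size of 512×512 and the (x - 127.5) / 127.5 normalization are assumptions chosen from the examples above, not values fixed by this application:

```python
import cv2
import numpy as np

def preprocess(original: np.ndarray, preset_size=(512, 512)) -> np.ndarray:
    """Rotate a horizontally shot image into a vertically shot one, scale it
    to the size expected by the segmentation model, and normalize it."""
    h, w = original.shape[:2]
    if w > h:  # width > height: horizontally shot, so rotate to vertical
        original = cv2.rotate(original, cv2.ROTATE_90_CLOCKWISE)
    # scale to the preset size required by the preset segmentation model
    resized = cv2.resize(original, preset_size, interpolation=cv2.INTER_LINEAR)
    # one of the two normalizations mentioned above: (x - 127.5) / 127.5
    return (resized.astype(np.float32) - 127.5) / 127.5
```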
Referring to fig. 9, after obtaining the preprocessed image, the processor 30 (or the first obtaining module 11) inputs the preprocessed image into the preset segmentation model to obtain an initial segmentation image and a first probability image. The initial segmentation image includes a portrait region (e.g., the white region of the initial segmentation image in fig. 9) and a background region (e.g., the black region of the initial segmentation image in fig. 9); the first probability image includes a first probability value I1 for each pixel in the initial segmentation image, and the first probability value I1 characterizes the probability that the pixel lies in the portrait region. That is, the larger the first probability value I1 corresponding to a certain pixel in the initial segmentation image, the higher the probability that the pixel is in the portrait region; similarly, the smaller the first probability value I1, the lower the probability that the pixel is in the portrait region. In particular, in some embodiments, the initial segmentation image includes a segmentation value for each pixel, in the range of 0 to 1. If the segmentation value of a pixel in the initial segmentation image is larger than a first preset segmentation value, the region where the pixel is located is considered a portrait region; if the segmentation value is not greater than the first preset segmentation value, the region where the pixel is located is considered a background region.
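As a sketch of the thresholding just described, assuming NumPy; the value 0.5 for the first preset segmentation value is a hypothetical choice, since the text does not fix one:

```python
import numpy as np

def split_regions(segmentation_values: np.ndarray,
                  first_preset_value: float = 0.5) -> np.ndarray:
    """Classify each pixel of the initial segmentation image: segmentation
    values above the first preset segmentation value are treated as the
    portrait region, the rest as the background region."""
    portrait_mask = segmentation_values > first_preset_value  # True: portrait
    return portrait_mask
```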
Referring to fig. 10, in an embodiment, the preset segmentation model includes a coding module and a decoding module. The coding module performs multiple convolutions and pooling on the preprocessed image input into the preset segmentation model to obtain a feature image containing portrait feature information. The decoding module obtains the initial segmentation image containing the portrait region and the first probability image according to the portrait feature information in the feature image. Specifically, referring to fig. 11, in some embodiments, the preset segmentation model further includes a semantic branch (Semantic Branch) and a detail branch (Detail Branch), where the detail branch extracts micro (fine-grained) features of the preprocessed image and the semantic branch extracts macro (coarse-grained) features of the preprocessed image; after the micro features and the macro features are fused, the initial segmentation image containing the portrait region and the first probability image are obtained according to the fused features. The preset segmentation model integrates features of different levels (micro features and macro features) to meet the requirements of both coarse segmentation and fine segmentation, which improves the accuracy with which the preset segmentation model segments the image. It should be noted that, in some embodiments, the preset segmentation model may adopt a MODNet network structure or a Spectral Matting network structure; of course, the preset segmentation model may also adopt other network structures, as long as the initial segmentation image containing the portrait region and the first probability image can be obtained, which is not limited here.
In addition, in some embodiments, the image processing method further includes iteratively training an initial segmentation model according to a sample image set containing a plurality of sample images to obtain the preset segmentation model. Specifically, a portrait region is labeled in a sample image, and the sample image is input into the initial segmentation model to obtain a training image containing a portrait region. The value of the loss function of the initial segmentation model is obtained according to the difference between the portrait region of the training image and the portrait region labeled in the sample image. After the value of the loss function is obtained, the initial model can be iteratively trained according to that value to obtain the segmentation model. In some embodiments, the initial model may be iteratively trained using an Adam optimizer according to the loss function until the loss value of the initial model's output converges, and the model at this point is saved as the trained segmentation model. The Adam optimizer combines the advantages of two optimization algorithms, AdaGrad (Adaptive Gradient) and RMSProp: it considers both the first moment estimation (the mean of the gradient) and the second moment estimation (the uncentered variance of the gradient) to calculate the update step.
It should be noted that the termination condition of the iterative training may include: the number of iterations reaches a target number; or the loss value of the output of the initial model satisfies a set convergence condition. In one example, the convergence condition is to make the total loss value as small as possible: with an initial learning rate of 1e-3, a learning rate that decays with the cosine of the step number, and a batch_size of 8, training is considered converged after 16 epochs. Here batch_size can be understood as a batch parameter whose upper limit is the total number of samples in the training set, and epoch refers to the number of times the entire data set is trained using all samples in the training set; colloquially, the value of epoch is the number of times the entire data set is cycled, and 1 epoch equals one round of training using all samples in the training set. In another example, the loss value satisfying the set convergence condition may include: the total loss value Loss is less than a set threshold. Of course, the specific conditions are not limited here.
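A minimal sketch of this training loop, assuming PyTorch; the model, data loader, and loss function are placeholders, the assumption that the model returns a mask and a probability map is for illustration, and the optimizer settings follow the example above (Adam, initial learning rate 1e-3, cosine decay, 16 epochs, batch_size 8 set in the loader):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def train_segmentation(model, train_loader, loss_fn, epochs=16):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # learning rate decreases with the cosine of the step number
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs * len(train_loader))
    for epoch in range(epochs):
        for sample, labeled_mask in train_loader:
            pred_mask, _ = model(sample)             # training image with portrait area
            loss = loss_fn(pred_mask, labeled_mask)  # difference from labeled region
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
    return model  # saved as the trained segmentation model once the loss converges
```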
In some embodiments, the trained segmentation model may be stored locally in the terminal 100 (or the image processing apparatus 10), for example, in the storage module 50, and the trained segmentation model may also be stored in a server communicatively connected to the terminal 100 (or the image processing apparatus 10), so as to reduce the storage space occupied by the terminal 100 (or the image processing apparatus 10) and improve the operation efficiency of the terminal 100 (or the image processing apparatus 10). Of course, in some embodiments, the segmentation model may also periodically or aperiodically acquire new training data, train and update the segmentation model. For example, when there is a portrait image that is segmented by mistake, the portrait image may be used as a sample image, and after the sample image is labeled, the training may be performed again through the above training method, so as to improve the accuracy of the segmentation model.
In some embodiments, the original image may be input into a preset depth estimation network model, and the depth image including the depth information of the initial segmented image and the second probability image may be directly obtained. Specifically, referring to fig. 1 and 12, in some embodiments, the second probability value represents a probability that the depth value of each pixel in the depth image is the corresponding depth value, and step 02: acquiring a depth image and a second probability image, comprising:
021: preprocessing an original image to obtain a preprocessed image; and
022: and inputting the preprocessed image into a depth estimation network model to obtain a depth image and a second probability image.
Referring to fig. 2, in some embodiments, the steps 021 and 022 can be performed by the second obtaining module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to pre-process the original image to obtain a pre-processed image; and inputting the preprocessed image into the depth estimation network model to obtain a depth image and a second probability image.
Referring to fig. 3, in some embodiments, step 021 and step 022 can be implemented by being executed by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are also configured to pre-process the raw image to obtain a pre-processed image; and inputting the preprocessed image into the depth estimation network model to obtain a depth image and a second probability image.
Similarly, in some embodiments, the image input to the preset depth estimation network model needs to meet its input requirements; that is, the preset depth estimation network model may place requirements on the attributes of the input image, and only an input image that meets these requirements can be processed correctly. Therefore, before the original image is input into the preset depth estimation network model, the original image also needs to be preprocessed; the specific implementation of preprocessing the original image to obtain a preprocessed image is the same as in the above embodiment and is not repeated here. It should be noted that, in order for the depth image output by the depth estimation network model to correspond to the initial segmentation image, in some embodiments the preprocessing performed before the original image enters the segmentation model is exactly the same as the preprocessing performed before the original image enters the depth estimation network model. In particular, in some embodiments, the original image may be preprocessed once to obtain a preprocessed image, which is then input into the segmentation model and the depth estimation network model separately for processing. In this way, the depth image output by the depth estimation network model corresponds to the initial segmentation image, and the image processing speed can be increased because the original image does not need to be preprocessed twice.
Referring to fig. 13, after the pre-processed image is obtained, the pre-processed image is input into a preset depth estimation network model to obtain a depth image and a second probability image. The depth image indicates depth values of all pixel points in the original image, the second probability image comprises a second probability value I2 corresponding to each pixel in the depth image, and the second probability value I2 represents the probability that the depth value of each pixel in the depth image is a corresponding value. For example, if the depth value of the pixel point located at row 1 and column 1 in the depth image is 0.5, and the second probability value I2 located at row 1 and column 1 in the second probability image is 80%, it indicates that the probability that the depth value of the pixel point located at row 1 and column 1 in the depth image is 0.5 is 80%.
Referring to fig. 14, in some embodiments, the preset depth estimation network model may be a monocular depth estimation network. For example, the monocular depth estimation network is a trained network obtained in advance by supervised learning on a large amount of training data, and can output the depth image and the second probability image corresponding to the preprocessed image, that is, the depth image and the second probability image corresponding to the initial segmentation image. The monocular depth estimation network may be obtained using a deep learning algorithm, such as a CNN (Convolutional Neural Network), the U-Net algorithm, an FCN (Fully Convolutional Network), and the like. In one embodiment, the monocular depth estimation network includes an Encoder coding module and a Decoder decoding module. The Encoder coding module implements coding using a backbone, not limited to MobileNet, ResNet, VGG, and the like; the coding module, i.e., the feature extraction module, performs operations such as convolution, activation, and pooling on the preprocessed image to extract features of the input image. The decoding module performs convolution, activation, softmax calculation, and other processing on the image features to obtain the depth image and the second probability image, where each second probability value I2 in the second probability image represents the probability that the depth value of the corresponding pixel in the depth image is the given value. As shown in fig. 14, the preprocessed image is input into an encoder-decoder codec network, i.e., the monocular depth estimation network, to obtain the depth image and the second probability image.
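A minimal sketch of such an encoder-decoder network, assuming PyTorch; the channel widths are arbitrary, and the sigmoid confidence head stands in for the softmax computation described above, so this is an illustration rather than the network of fig. 14:

```python
import torch
import torch.nn as nn

class MonoDepthNet(nn.Module):
    """Encoder-decoder sketch: the encoder extracts features by
    convolution/activation/pooling; the decoder produces a depth image
    and a per-pixel second probability image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # feature extraction module
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(16, 1, 1)  # per-pixel depth value
        self.prob_head = nn.Conv2d(16, 1, 1)   # per-pixel second probability

    def forward(self, x):
        features = self.decoder(self.encoder(x))
        depth = self.depth_head(features)
        prob = torch.sigmoid(self.prob_head(features))  # probability in [0, 1]
        return depth, prob
```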
Referring to fig. 15, in some embodiments, the preset depth estimation network model may also be a binocular depth estimation network. In this case the imaging module 40 includes a left camera and a right camera, which can acquire images separately. Similarly, the binocular depth estimation network is a trained network obtained in advance by supervised learning on a large amount of training data, and can output the depth image and the second probability image corresponding to the preprocessed image, that is, corresponding to the initial segmentation image. It should be noted that the original image may be the left image (acquired by the left camera) or the right image (acquired by the right camera), which is not limited here. In one embodiment, as shown in fig. 15, the binocular depth estimation network includes two branches: a cost volume branch and a feature branch. The left camera and the right camera simultaneously acquire a left image and a right image, which are preprocessed to obtain a preprocessed left image and a preprocessed right image. Both preprocessed images are output to the binocular depth estimation network; after the preprocessed left and right images enter the cost volume branch for convolution, batch normalization, activation, and the like, a first result is obtained through a convolution layer, and at the same time, the preprocessed right image enters the feature branch and is convolved multiple times to obtain a second result. After the first result and the second result are obtained, they are input into a joint filter for convolution and activation, and the obtained result is passed through three convolutions and a soft argmax calculation to obtain the depth image. The result of inputting the first result and the second result into the joint filter for convolution and activation is also passed through multiple convolutions, a softmax calculation, and other processing to obtain the second probability image, where each second probability value I2 in the second probability image represents the probability that the depth value of the corresponding pixel in the depth image is the given value.
Referring to fig. 1 and 16, in some embodiments, the image processing method further includes:
04: acquiring an original depth information image by using a depth information acquisition module.
At this time, step 02: obtaining a depth image and a second probability image, further comprising:
023: performing first image processing on the original depth information image according to the initial segmentation image to obtain a depth image corresponding to the initial segmentation image; and
024: and acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to acquire a second probability image.
Referring to fig. 2, in some embodiments, the image processing apparatus 10 further includes a depth information collecting module 60, and the depth information collecting module 60 obtains an original depth information image. Step 023 and step 024 may both be implemented by the second obtaining module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to perform first image processing on the original depth information image according to the initial segmented image to obtain a depth image corresponding to the initial segmented image; and acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to acquire a second probability image.
Referring to fig. 3, in some embodiments, the terminal 100 further includes a depth information collecting module 60, and the depth information collecting module 60 obtains an original depth information image. Steps 023 and 024 may both be implemented by execution of one or more processors 30 of terminal 100. That is, the one or more processors 30 are further configured to perform a first image processing on the original depth information image according to the initial segmentation image to obtain a depth image corresponding to the initial segmentation image; and acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to acquire a second probability image.
It should be noted that, in some embodiments, the depth information acquisition module 60 and the imaging module 40 acquire the original depth information image and the original image at the same time; alternatively, the time between the acquisition of the original depth information image by the depth information acquisition module 60 and the acquisition of the original image by the imaging module 40 is less than a preset time difference. This avoids a large time difference between the two acquisitions, which would cause the image information in the original depth information image to mismatch that of the original image.
Further, in one example, the depth information collection module 60 may be a time of flight module (TOF module). Illustratively, the depth acquisition module 60 includes a light emitter and a light receiver, the light emitter is configured to emit infrared light to the object to be photographed, and the light receiver is configured to receive the infrared light reflected by the object to be photographed, and obtain the original depth information image according to a time difference or a phase difference between the emitted infrared light and the received reflected infrared light. In another example, the depth information collection module 60 may also be a structured light module. Illustratively, the depth collecting module 60 includes a structured light projector and a structured light camera, the structured light projector is used for projecting a laser image pattern to the object to be shot, and the structured light camera is used for collecting a reflected laser image pattern of the object to be shot and obtaining an original depth information image according to the received laser image pattern. Of course, the original depth information image may be obtained in other manners, for example, two frames of images are obtained by using a binocular camera, and the original depth information image is obtained according to the parallax between the images obtained by the left and right cameras, which is not limited herein.
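For reference, the basic relation behind such a time-of-flight module is standard physics rather than anything specific to this application: with c the speed of light and Δt the measured round-trip time difference between the emitted and the received reflected infrared light, the depth of the photographed object is

```latex
d = \frac{c \cdot \Delta t}{2}
```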
The original depth information image corresponds to the original image, the original depth information image comprises the depth value of each pixel point in the image, the original image is subjected to preprocessing and portrait segmentation processing to obtain an initial segmentation image, and if the depth image corresponding to the initial segmentation image is required to be obtained, the original depth information image needs to be subjected to first image processing. Specifically, referring to fig. 16 and 17, in some embodiments, step 023: performing first image processing on an original depth information image according to an initial segmentation image to obtain a depth image corresponding to the initial segmentation image, including:
0231: carrying out image alignment processing on the original depth information image to obtain an original depth information image aligned with the initial segmentation image; and
0232: and carrying out interpolation zooming processing and Gaussian blur processing on the aligned original depth information image to obtain a depth image corresponding to the initial segmentation image.
Referring to fig. 2, in some embodiments, the steps 0231 and 0232 can be implemented by the second obtaining module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to perform image alignment processing on the original depth information image to obtain an original depth information image aligned with the initial segmentation image; and carrying out interpolation zooming processing and Gaussian blur processing on the aligned original depth information images to obtain a depth image corresponding to the initial segmentation image.
Referring to fig. 3, in some embodiments, steps 0231 and 0232 can be implemented by one or more processors 30 of terminal 100. That is, the one or more processors 30 are further configured to perform an image alignment process on the original depth information image to obtain an original depth information image aligned with the initial segmentation image; and carrying out interpolation zooming processing and Gaussian blur processing on the aligned original depth information images to obtain a depth image corresponding to the initial segmentation image.
Because the original depth information image and the original image are not obtained by the same imaging device, the coordinate system corresponding to the original depth information image is not the same as the coordinate system corresponding to the original image, that is, not the same as the coordinate system corresponding to the initial segmentation image. After the initial segmentation image and the original depth information image are obtained, image alignment processing needs to be performed on the original depth information image to obtain an original depth information image aligned with the initial segmentation image, where the coordinate system of the aligned original depth information image is the same as that of the initial segmentation image. Specifically, in some embodiments, the pixel coordinates of the pixels in the original depth information image are converted into coordinates in the world coordinate system according to the intrinsic and extrinsic parameters of the depth camera (the camera that acquires the original depth information image), and the coordinates in the world coordinate system are then converted into pixel coordinates in the original image coordinate system according to the intrinsic and extrinsic parameters of the color camera (the camera that acquires the original image), that is, into pixel coordinates in the initial segmentation image coordinate system. For example, the three-dimensional coordinates of a pixel of the original depth information image in the depth camera coordinate system are calculated from its two-dimensional pixel coordinates in the depth image coordinate system and the intrinsic parameters of the depth camera; then, the coordinates of the point in the world coordinate system are calculated from its three-dimensional coordinates in the depth camera coordinate system and the extrinsic matrix that converts the depth camera coordinate system into the world coordinate system; next, the three-dimensional coordinates of the point in the color camera coordinate system are calculated from its coordinates in the world coordinate system and the extrinsic matrix that converts the world coordinate system into the color camera coordinate system; finally, the pixel coordinates of the point in the color camera coordinate system are calculated from its three-dimensional coordinates in the color camera coordinate system and the intrinsic matrix of the color camera. Performing this coordinate conversion for all pixels in the original depth information image yields the aligned original depth information image, whose coordinate system is the same as that of the original image, i.e., the same as that of the initial segmentation image.
Of course, in some embodiments, other embodiments may also be adopted to perform image alignment processing on the original depth information image to obtain the original depth information image aligned with the initial segmentation image, which is not limited herein.
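A sketch of the per-pixel coordinate conversion described above, assuming NumPy; K_depth and K_color stand for the 3×3 intrinsic matrices, and T_depth2world and T_world2color for the 4×4 extrinsic matrices, all hypothetical placeholders that camera calibration would supply:

```python
import numpy as np

def depth_pixel_to_color_pixel(u, v, depth, K_depth, T_depth2world,
                               T_world2color, K_color):
    """Map one pixel (u, v) with depth value `depth` from the depth image
    coordinate system into the color (original) image coordinate system."""
    # depth pixel -> 3D point in the depth camera coordinate system
    p_depth = depth * np.linalg.inv(K_depth) @ np.array([u, v, 1.0])
    # depth camera -> world coordinate system (homogeneous coordinates)
    p_world = T_depth2world @ np.append(p_depth, 1.0)
    # world -> color camera coordinate system
    p_color = (T_world2color @ p_world)[:3]
    # 3D point -> pixel coordinates in the color image via the color intrinsics
    uvw = K_color @ p_color
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```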
Referring to fig. 18, after the aligned original depth information image is obtained, although its coordinate system is the same as that of the initial segmentation image, the size of the initial segmentation image may differ from that of the aligned original depth information image, since the initial segmentation image is obtained by preprocessing the original image before segmentation. Therefore, the aligned original depth information image needs to be scaled by interpolation to obtain a scaled original depth information image whose size corresponds to that of the initial segmentation image. Specifically, in some embodiments, a bilinear interpolation algorithm may be used to scale the aligned original depth information image. Bilinear interpolation is a relatively good image scaling algorithm: it uses the depth values of the four real pixels around a virtual point in the aligned original depth information image to jointly determine one depth value in the scaled original depth information image. Of course, in some embodiments, a nearest-neighbor interpolation algorithm may instead be used to scale the aligned original depth information image to obtain a scaled original depth information image corresponding to the size of the initial segmentation image, which is not limited here.
Referring to fig. 18, in some embodiments, after the scaled original depth information image corresponding to the size of the initial segmentation image is obtained, Gaussian blurring is applied to it to obtain the depth image corresponding to the initial segmentation image. The Gaussian blur removes noise from the scaled original depth information image and makes the depth value of each pixel in the depth image smoother, which facilitates subsequent image processing. Of course, in some embodiments, the Gaussian blur may be omitted and the scaled original depth information image used directly as the depth image, which is not limited here.
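A sketch of the interpolation scaling and Gaussian blur steps, assuming OpenCV; the 5×5 kernel is an assumed choice, since the text does not fix one:

```python
import cv2

def depth_from_aligned(aligned_depth, seg_height, seg_width):
    """Scale the aligned original depth information image to the size of the
    initial segmentation image (bilinear interpolation), then smooth it with
    a Gaussian blur to remove noise."""
    scaled = cv2.resize(aligned_depth, (seg_width, seg_height),
                        interpolation=cv2.INTER_LINEAR)  # bilinear interpolation
    return cv2.GaussianBlur(scaled, (5, 5), 0)  # kernel size is an assumption
```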
Referring to fig. 16 and 19, in some embodiments, step 024: obtaining a corresponding second probability value according to the depth value of each pixel in the depth image, including:
0241: calculating the ratio of the depth value of each pixel in the depth image to the maximum depth value in the depth image;
0242: and calculating a difference value between the preset value and the ratio to acquire a second probability value of each pixel in the depth image.
Referring to fig. 2, in some embodiments, step 0241 and step 0242 may be performed by the second obtaining module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to calculate a ratio between the depth value of each pixel in the depth image and the maximum depth value in the depth image; and calculating a difference value between the preset value and the ratio to acquire a second probability value of each pixel in the depth image.
Referring to fig. 3, in some embodiments, step 0241 and step 0242 may be implemented by one or more processors 30 of terminal 100. That is, the one or more processors 30 are also operable to calculate a ratio between the depth value of each pixel in the depth image and the maximum depth value in the depth image; and calculating a difference value between the preset value and the ratio to acquire a second probability value of each pixel in the depth image.
After the original depth information image acquired by the hardware (the depth information acquiring module 60) is subjected to the first image processing to obtain the depth image corresponding to the initial segmentation image, the processor 30 (or the second obtaining module 12) calculates the ratio between the depth value of each pixel in the depth image and the maximum depth value in the depth image, and calculates the difference between the preset value and the ratio to obtain the second probability value I2 corresponding to each pixel in the depth image. The second probability values of all pixels are arranged to generate the second probability image. For example, in one embodiment, the preset value is 1; assume that the depth value of the pixel point at row 1, column 1 of the depth image is d1 and the maximum depth value in the depth image is dmax. Then the second probability value I2 corresponding to that pixel point is (1 - d1/dmax), and the second probability value I2 at row 1, column 1 of the second probability image is (1 - d1/dmax). Thus, in the depth image, the larger the depth value, the smaller the second probability value I2 of the pixel point, that is, the more likely the pixel point belongs to the background area, the smaller its second probability value I2; conversely, the smaller the depth value, the larger the second probability value I2 of the pixel point, that is, the more likely the pixel point belongs to the portrait area, the larger its second probability value I2.
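A minimal sketch of steps 0241 and 0242, assuming the preset value is 1 and the depth image is a floating-point NumPy array (names are illustrative):

```python
import numpy as np

def second_probability_image(depth_image, preset=1.0):
    # I2 = preset - d / d_max: a small depth value (near subject, likely
    # portrait) yields a large I2; a large depth value (far away, likely
    # background) yields a small I2.
    d_max = float(depth_image.max())
    return preset - depth_image / d_max
```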
Referring to fig. 1 and 20, in some embodiments, the initial segmentation image includes a segmentation value of each pixel, and step 03: acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image, includes:
031: and acquiring a target segmentation value of a pixel Pi 'j' at a position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of a pixel Pi 'j' corresponding to the pixel Pij in the depth image and the second probability value of the pixel Pi 'j'.
Referring to fig. 2, in some embodiments, step 031 can be implemented by the fusion module 13 of the image processing apparatus 10. That is, the fusion module 13 is configured to obtain a target segmentation value of a pixel Pi "j" at a position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of a pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'.
Referring to fig. 3, in some embodiments, step 031 can also be implemented by one or more processors 30 of terminal 100. That is, the one or more processors 30 are further configured to obtain a target segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'.
It should be noted that, in some embodiments, before the target segmentation value is calculated, the depth values of all pixels in the depth image are divided by the maximum depth value, so that the depth value of each pixel in the depth image also lies in the range of 0 to 1; this ensures that the subsequently calculated target segmentation value also lies in the range of 0 to 1. The depth values in the depth image described in the following embodiments are such processed depth values, that is, they all lie in the range of 0 to 1, which is not repeated below.
Referring to fig. 21, after the initial segmented image, the first probability image, the depth image and the second probability image are obtained, the processor 30 (or the fusion module 13) obtains the target segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmented image according to the segmentation value of each pixel Pij in the initial segmented image, the first probability value of the pixel Pij, the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'. Specifically, in some embodiments, the target segmentation value may be calculated by the formula Sout(i,j) = I2(i,j) * (1 - d(i,j)) * I1(i,j) * Sin(i,j), where Sout(i,j) represents the target segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmented image; Sin(i,j) represents the segmentation value of the pixel Pij in the initial segmented image; I1(i,j) represents the first probability value of the pixel Pij; d(i,j) represents the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image; and I2(i,j) represents the second probability value of the pixel Pi 'j'. That is, the difference between the value 1 and the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image is calculated, and the target segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmented image is then the product of the segmentation value of the pixel Pij, the first probability value of the pixel Pij, the second probability value of the pixel Pi 'j', and that difference. For example, to calculate the target segmentation value of the pixel at row 1, column 1 of the target segmented image, the difference between the value 1 and the depth value of the pixel at row 1, column 1 of the depth image is first calculated; the product of the segmentation value of the pixel at row 1, column 1 of the initial segmented image, the first probability value I1 at row 1, column 1 of the first probability image, the second probability value I2 at row 1, column 1 of the second probability image, and that difference is then taken as the target segmentation value of the pixel at row 1, column 1 of the target segmented image.
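Assuming all four inputs have been brought to the same size and the depth values are already normalized to the range 0 to 1 (as noted above), a minimal sketch of the fusion of step 031 is:

```python
def fuse_target_segmentation(S_in, I1, depth, I2):
    # All inputs are same-sized NumPy arrays.
    # Sout(i,j) = I2(i,j) * (1 - d(i,j)) * I1(i,j) * Sin(i,j),
    # evaluated elementwise for every pixel at once.
    return I2 * (1.0 - depth) * I1 * S_in
```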
Referring to fig. 1 and 22, in some embodiments, the initial segmentation image includes a segmentation value of each pixel, and step 03: acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image, includes:
032: acquiring a middle segmentation value of a pixel Pi 'j' at a position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of a pixel Pi 'j' corresponding to the pixel Pij in the depth image and the second probability value of the pixel Pi 'j';
033: the target split value of each pixel Pi "j" is obtained from the intermediate split value of each pixel Pi "j" and the largest intermediate split value among all the pixels Pi "j".
Referring to fig. 2, in some embodiments, step 032 and step 033 may be implemented by the fusion module 13 of the image processing apparatus 10. That is, the fusion module 13 is further configured to obtain an intermediate segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'; and to obtain the target segmentation value of each pixel Pi "j" according to the intermediate segmentation value of each pixel Pi "j" and the maximum intermediate segmentation value among all the pixels Pi "j".
Referring to fig. 3, in some embodiments, step 032 and step 033 may also be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are further configured to obtain an intermediate segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'; and to obtain the target segmentation value of each pixel Pi "j" according to the intermediate segmentation value of each pixel Pi "j" and the maximum intermediate segmentation value among all the pixels Pi "j".
Referring to fig. 21, after the initial segmented image, the first probability image, the depth image and the second probability image are obtained, the processor 30 (or the fusion module 13) obtains an intermediate segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmented image according to the segmentation value of each pixel Pij in the initial segmented image, the first probability value of the pixel Pij, the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'. The target segmentation value of each pixel Pi "j" is then obtained based on the intermediate segmentation value of each pixel Pi "j" and the maximum intermediate segmentation value among all the pixels Pi "j". In this embodiment, after the intermediate segmentation values are obtained as described above, all the intermediate segmentation values are further divided by the maximum intermediate segmentation value to obtain the target segmentation values.
Specifically, in some embodiments, the target segmentation value may be calculated by the formula Sout(i,j) = [I2(i,j) * (1 - d(i,j)) * I1(i,j) * Sin(i,j)] / E, where Sout(i,j) represents the target segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmented image; Sin(i,j) represents the segmentation value of the pixel Pij in the initial segmented image; I1(i,j) represents the first probability value of the pixel Pij; d(i,j) represents the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image; I2(i,j) represents the second probability value of the pixel Pi 'j'; and E is the maximum intermediate segmentation value among all the pixels Pi "j". That is, the difference between the value 1 and the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image is calculated, and the intermediate segmentation value of the pixel Pi "j" at the position corresponding to the pixel Pij in the target segmented image is the product of the segmentation value of the pixel Pij, the first probability value of the pixel Pij, the second probability value of the pixel Pi 'j', and that difference. After the intermediate segmentation values of all pixels are obtained, the intermediate segmentation value of each pixel Pi "j" is divided by the maximum intermediate segmentation value among all the pixels Pi "j" of the target segmented image to obtain the target segmentation value of the pixel Pi "j". For example, to calculate the target segmentation value of the pixel at row 1, column 1 of the target segmented image, the difference between the value 1 and the depth value of the pixel at row 1, column 1 of the depth image is first calculated; the product of the segmentation value of the pixel at row 1, column 1 of the initial segmented image, the first probability value I1 at row 1, column 1 of the first probability image, the second probability value I2 at row 1, column 1 of the second probability image, and that difference is taken as the intermediate segmentation value of the pixel at row 1, column 1 of the target segmented image. The intermediate segmentation values of all pixels in the target segmented image are then calculated; assume the maximum intermediate segmentation value is m. The intermediate segmentation value of the pixel at row 1, column 1 of the target segmented image is divided by the maximum intermediate segmentation value m, and the result is taken as the target segmentation value of the pixel at row 1, column 1 of the target segmented image.
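A sketch of the normalized variant of steps 032 and 033 (the small epsilon guarding against division by zero is an assumption the disclosure does not discuss):

```python
def fuse_target_segmentation_normalized(S_in, I1, depth, I2, eps=1e-8):
    # Intermediate segmentation values: the same elementwise product as above.
    intermediate = I2 * (1.0 - depth) * I1 * S_in
    # Divide by the maximum intermediate value E so the target segmentation
    # values span the range 0 to 1.
    return intermediate / (intermediate.max() + eps)
```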
It should be noted that, in some embodiments, if the segmentation value of a pixel in the target segmented image is greater than the second preset segmentation value, the region where the pixel is located is considered to be a portrait area (for example, the white region in the target segmented image in fig. 21); if the segmentation value is not greater than the second preset segmentation value, the region where the pixel is located is considered to be a background area (for example, the black region in the target segmented image in fig. 21). The second preset segmentation value may be the same as or different from the first preset segmentation value in the above embodiments, which is not limited herein.
Because the target segmentation value is obtained by fusing the segmentation value in the initial segmentation image, the first probability value, the depth value in the depth image, and the second probability value, compared with directly segmenting the original image with a single segmentation model, problems such as false detection and missed detection in complex scenes can be avoided, and the stability and accuracy of portrait-area segmentation are improved.
Referring to fig. 23, in some embodiments, the target segmentation image includes a portrait area, and the image processing method further includes:
05: and performing second image processing on the original image according to the portrait area in the target segmentation image to obtain a target image.
Referring to fig. 2, in some embodiments, the image processing apparatus 10 further includes a processing module 14, and step 05 can be implemented by the processing module 14. That is, the processing module 14 is configured to perform the second image processing on the original image according to the portrait area in the target segmentation image to obtain the target image.
Referring to fig. 3, in some embodiments, step 05 can also be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are further configured to perform the second image processing on the original image according to the portrait area in the target segmentation image to obtain the target image.
In some embodiments, after the target segmented image including the portrait area is acquired, the original image may be subjected to the second image processing according to the portrait area of the target segmented image to acquire the target image. Specifically, in some embodiments, the processor 30 (or the processing module 14) obtains the portrait area in the original image according to the portrait area of the target segmented image (i.e., the white region in the target segmented image of fig. 21). Illustratively, in one example, if the original image was scaled during preprocessing, the target segmented image is enlarged to the same size as the original image; and if the original image was rotated during preprocessing, the target segmented image is rotated in the reverse direction. For example, if the original image was rotated 90° to the left during preprocessing, the target segmented image is rotated 90° to the right so that it corresponds to the original image. Once the target segmented image corresponds to the original image, the position in the original image corresponding to the portrait area in the target segmented image (i.e., the white region in the target segmented image in fig. 21) is the portrait area in the original image. Of course, the portrait area in the original image may also be obtained from the portrait area in the target segmented image in other manners, which is not limited herein. After the portrait area of the original image is acquired, the processor 30 (or the processing module 14) may blur the background, that is, the region of the original image other than the portrait area; or it may beautify the portrait area in the original image; or it may extract the portrait region from the original image and place it in another frame of image to generate a new image containing the portrait of the original image.
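As one illustrative possibility for the second image processing, a background-blurring sketch (the threshold of 0.5 and the 21x21 blur kernel are assumptions, not requirements of this disclosure):

```python
import cv2
import numpy as np

def blur_background(original, target_seg, threshold=0.5):
    # Resize the target segmented image back to the original image size,
    # then binarize it into a portrait mask.
    h, w = original.shape[:2]
    mask = cv2.resize(target_seg, (w, h),
                      interpolation=cv2.INTER_LINEAR) > threshold
    # Blur the whole image, then keep portrait pixels from the original
    # and take background pixels from the blurred copy.
    blurred = cv2.GaussianBlur(original, (21, 21), 0)
    return np.where(mask[..., None], original, blurred)
```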
Referring to fig. 24, the present application also provides a non-transitory computer readable storage medium 400 containing a computer program 410. The computer program 410, when executed by the processor 420, causes the processor 420 to perform the image processing method of any of the embodiments described above.
Referring to fig. 1, for example, when the computer program 410 is executed by the processor 420, the processor 420 is caused to perform the methods of steps 01, 011, 0111, 0112, 0113, 012, 02, 021, 022, 023, 0231, 0232, 024, 0241, 0242, 03, 031, 032, 033, 04, and 05. For example, the following image processing method is performed:
01: performing portrait segmentation processing on the obtained original image to obtain an initial segmentation image and a first probability image, wherein the initial segmentation image comprises a portrait area and a background area, the first probability image comprises a first probability value of each pixel in the initial segmentation image, and the first probability value represents the probability that each pixel in the initial segmentation image belongs to the portrait area;
02: acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and
03: and acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image.
It should be noted that the processor 420 may be disposed in the terminal 100, that is, the processor 420 and the processor 30 may be the same processor; of course, the processor 420 may also not be disposed in the terminal 100, that is, the processor 420 and the processor 30 may not be the same processor, which is not limited herein.
In the description herein, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example" or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
Although embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are exemplary and not to be construed as limiting the present application, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (11)

1. An image processing method, comprising:
performing portrait segmentation processing on an acquired original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait area and a background area, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value represents the probability that each pixel in the initial segmented image belongs to the portrait area;
acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and
and acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image.
2. The image processing method according to claim 1, wherein said performing a portrait segmentation process on the acquired original image to obtain an initial segmented image and a first probability image comprises:
preprocessing the original image to obtain a preprocessed image;
inputting the preprocessed image into a preset segmentation model to obtain the initial segmentation image and the first probability image.
3. The image processing method of claim 1, wherein the second probability value characterizes a probability that a depth value of each pixel in the depth image is the corresponding depth value; the acquiring the depth image and the second probability image includes:
preprocessing the original image to obtain a preprocessed image; and
inputting the preprocessed image into a depth estimation network model to obtain the depth image and the second probability image.
4. The image processing method according to claim 1, characterized in that the image processing method further comprises:
acquiring an original depth information map by using a depth information acquisition module;
the acquiring the depth image and the second probability image includes:
performing first image processing on the original depth information map according to the initial segmentation image to obtain the depth image corresponding to the initial segmentation image; and
and acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to acquire the second probability image.
5. The method of claim 4, wherein the obtaining a corresponding second probability value according to the depth value of each pixel in the depth image comprises:
calculating a ratio between a depth value of each pixel in the depth image and a maximum depth value in the depth image;
and calculating a difference value between a preset value and the ratio to acquire a second probability value of each pixel in the depth image.
6. The image processing method according to any one of claims 1 to 5, wherein the initial segmentation image includes a segmentation value for each pixel; the acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image includes:
and acquiring a target segmentation value of a pixel Pi "j" at a position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of a pixel Pi 'j' corresponding to the pixel Pij in the depth image and the second probability value of the pixel Pi 'j'.
7. The image processing method according to any one of claims 1 to 5, wherein the initial segmentation image includes a segmentation value for each pixel; the acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image includes:
acquiring an intermediate segmentation value of a pixel Pi "j" at a position corresponding to the pixel Pij in the target segmentation image according to a segmentation value of each pixel Pij in the initial segmentation image, a first probability value of the pixel Pij, a depth value of a pixel Pi 'j' corresponding to the pixel Pij in the depth image and a second probability value of the pixel Pi 'j';
and acquiring a target segmentation value of each pixel Pi "j" according to the intermediate segmentation value of each pixel Pi "j" and the maximum intermediate segmentation value among all the pixels Pi "j".
8. The image processing method according to claim 6, wherein the target segmented image includes a portrait area, the image processing method further comprising:
and performing second image processing on the original image according to the portrait area in the target segmentation image to obtain a target image.
9. An image processing apparatus characterized by comprising:
a first obtaining module, configured to perform a portrait segmentation process on an obtained original image to obtain an initial segmentation image and a first probability image, where the initial segmentation image includes a portrait area and a background area, the first probability image includes a first probability value of each pixel in the initial segmentation image, and the first probability value represents the probability that each pixel in the initial segmentation image belongs to the portrait area;
a second obtaining module, configured to obtain a depth image and a second probability image, where the depth image is used to indicate a depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image; and
and a fusion module, configured to acquire a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image.
10. A terminal, comprising:
a housing; and
one or more processors in combination with the housing, the one or more processors configured to perform the image processing method of any of claims 1 to 8.
11. A non-transitory computer-readable storage medium containing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the image processing method of any one of claims 1 to 8.
CN202110636329.7A 2021-06-08 2021-06-08 Image processing method, image processing device, terminal and readable storage medium Active CN113409331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110636329.7A CN113409331B (en) 2021-06-08 2021-06-08 Image processing method, image processing device, terminal and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110636329.7A CN113409331B (en) 2021-06-08 2021-06-08 Image processing method, image processing device, terminal and readable storage medium

Publications (2)

Publication Number Publication Date
CN113409331A true CN113409331A (en) 2021-09-17
CN113409331B CN113409331B (en) 2024-04-12

Family

ID=77676946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110636329.7A Active CN113409331B (en) 2021-06-08 2021-06-08 Image processing method, image processing device, terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN113409331B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521646A (en) * 2011-11-11 2012-06-27 浙江捷尚视觉科技有限公司 Complex scene people counting algorithm based on depth information cluster
CN106651867A (en) * 2017-01-04 2017-05-10 努比亚技术有限公司 Interactive image segmentation method and apparatus, and terminal
WO2019023819A1 (en) * 2017-08-03 2019-02-07 汕头市超声仪器研究所有限公司 Simulated and measured data-based multi-target three-dimensional ultrasound image segmentation method
CN111340864A (en) * 2020-02-26 2020-06-26 浙江大华技术股份有限公司 Monocular estimation-based three-dimensional scene fusion method and device
CN112634296A (en) * 2020-10-12 2021-04-09 深圳大学 RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN112258618A (en) * 2020-11-04 2021-01-22 中国科学院空天信息创新研究院 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN112365604A (en) * 2020-11-05 2021-02-12 深圳市中科先见医疗科技有限公司 AR equipment depth of field information application method based on semantic segmentation and SLAM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
焦任直; 陈芬; 汪辉; 彭宗举; 蒋刚毅: "Depth video correction algorithm based on image segmentation" (基于图像分割的深度视频校正算法), 光电子·激光 (Journal of Optoelectronics·Laser), no. 01, 15 January 2016 (2016-01-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883479A (en) * 2023-05-29 2023-10-13 杭州飞步科技有限公司 Monocular image depth map generation method, monocular image depth map generation device, monocular image depth map generation equipment and monocular image depth map generation medium
CN116883479B (en) * 2023-05-29 2023-11-28 杭州飞步科技有限公司 Monocular image depth map generation method, monocular image depth map generation device, monocular image depth map generation equipment and monocular image depth map generation medium

Also Published As

Publication number Publication date
CN113409331B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
EP3510561B1 (en) Predicting depth from image data using a statistical model
KR101096468B1 (en) Complexity-adaptive 2d-to-3d video sequence conversion
WO2018000752A1 (en) Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN109389086B (en) Method and system for detecting unmanned aerial vehicle image target
CN106981078B (en) Sight line correction method and device, intelligent conference terminal and storage medium
CN111402170B (en) Image enhancement method, device, terminal and computer readable storage medium
CN113066017A (en) Image enhancement method, model training method and equipment
US10861213B1 (en) System and method for automatic generation of artificial motion blur
CN113724379B (en) Three-dimensional reconstruction method and device for fusing image and laser point cloud
CN112883940A (en) Silent in-vivo detection method, silent in-vivo detection device, computer equipment and storage medium
CN113610865B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN111553940B (en) Depth image edge optimization method and processing device
CN116051736A (en) Three-dimensional reconstruction method, device, edge equipment and storage medium
CN113409331B (en) Image processing method, image processing device, terminal and readable storage medium
US10504235B2 (en) Method for generating three dimensional images
CN113409329A (en) Image processing method, image processing apparatus, terminal, and readable storage medium
CN111695403B (en) Depth perception convolutional neural network-based 2D and 3D image synchronous detection method
CN117058183A (en) Image processing method and device based on double cameras, electronic equipment and storage medium
CN111630569B (en) Binocular matching method, visual imaging device and device with storage function
CN102609958A (en) Method and device for extracting video objects
CN111275045B (en) Image main body recognition method and device, electronic equipment and medium
CN107194931A (en) It is a kind of that the method and system for obtaining target depth information is matched based on binocular image
Liu et al. Three dimensional moving pictures with a single imager and microfluidic lens
Yang et al. Depth map reconstruction and rectification through coding parameters for mobile 3D video system
CN115836322A (en) Image cropping method and device, electronic equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant