CN115409759A - Disparity map generation method and device, storage medium and terminal equipment - Google Patents

Disparity map generation method and device, storage medium and terminal equipment Download PDF

Info

Publication number
CN115409759A
Authority
CN
China
Prior art keywords
image
feature extraction
portrait
processed
disparity map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110584249.1A
Other languages
Chinese (zh)
Inventor
李鹏
刘阳兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd filed Critical Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202110584249.1A priority Critical patent/CN115409759A/en
Publication of CN115409759A publication Critical patent/CN115409759A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/97 Determining parameters from multiple pictures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20228 Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a disparity map generation method and device, a storage medium and terminal equipment, wherein the method comprises the following steps: acquiring an image to be processed and an auxiliary image corresponding to the image to be processed, and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image; inputting the image to be processed into a portrait segmentation model, and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model; and adjusting the candidate disparity map according to the portrait mask image to obtain a target disparity map corresponding to the image to be processed. The invention uses the portrait mask image to optimize the consistency of the disparity values of the portrait in the disparity map, which reduces false blurring and missed blurring of the portrait, and at the same time uses the depth gradient in the disparity map so that the blurred image shows a gradual blurring transition, thereby improving the blurring effect of the image to be processed.

Description

Disparity map generation method and device, storage medium and terminal equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a disparity map generation method and apparatus, a storage medium, and a terminal device.
Background
Dual cameras have been increasingly applied to mobile terminal devices, and the photographing function of existing mobile terminal devices configured with dual cameras generally provides a background blurring function, so that a user can capture a picture with a blurred background and a prominent foreground. The disparity map obtained from the two cameras is generally used for this background blurring; however, the existing portrait blurring function generally suffers from a poor blurring effect.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a disparity map generation method, apparatus, storage medium and terminal device, aiming at the above-mentioned disadvantages of the prior art.
In order to solve the above technical problem, a first aspect of the embodiments of the present application provides a disparity map generating method, including:
acquiring an image to be processed and an auxiliary image corresponding to the image to be processed, and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image;
inputting the image to be processed into a portrait segmentation model, and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model;
and adjusting the candidate disparity map according to the portrait mask map to obtain a target disparity map corresponding to the image to be processed.
The disparity map generation method, wherein the image to be processed and the auxiliary image are captured by an imaging module, the imaging module comprising at least a main imager and an auxiliary imager; the main imager is used to capture the image to be processed, and the auxiliary imager is used to capture the auxiliary image.
The disparity map generation method, wherein the determining, according to the image to be processed and the auxiliary image, the candidate disparity map corresponding to the image to be processed specifically includes:
respectively adjusting the image size of the image to be processed and the image size of the auxiliary image to be preset image sizes to obtain an adjusted image to be processed and an adjusted auxiliary image;
and determining a candidate disparity map corresponding to the image to be processed according to the adjusted image to be processed and the adjusted auxiliary image.
The disparity map generation method, wherein the portrait segmentation model comprises a first feature extraction module, a second feature extraction module, a third feature extraction module, a fourth feature extraction module and a fifth feature extraction module, an output item of the first feature extraction module is an input item of the second feature extraction module, an output item of the second feature extraction module is an input item of the third feature extraction module, an output item of the third feature extraction module is an input item of the fourth feature extraction module, and an output item of the fourth feature extraction module is an input item of the fifth feature extraction module.
The disparity map generation method, wherein the first feature extraction module comprises a first convolution layer, a first feature extraction unit, a second feature extraction unit and a third feature extraction unit which are sequentially cascaded, and the second feature extraction module comprises a fourth feature extraction unit, a fifth feature extraction unit and a sixth feature extraction unit which are connected in parallel; an output item of the first convolution layer is an input item of the first feature extraction unit, an output item of the first feature extraction unit is an input item of the second feature extraction unit and the fourth feature extraction unit, an output item of the second feature extraction unit is an input item of the third feature extraction unit and the fifth feature extraction unit, and an output item of the third feature extraction unit is an input item of the sixth feature extraction unit.
In the disparity map generation method, the third feature extraction module and the fifth feature extraction module each include a plurality of seventh feature extraction units and a plurality of first feature fusion units which are sequentially cascaded and arranged at intervals, the fourth feature extraction module comprises a plurality of seventh feature extraction units and a plurality of second feature fusion units which are sequentially cascaded and arranged at intervals, the output items of all the feature extraction units in the second feature extraction module are respectively horizontal input items of all the seventh feature extraction units in the third feature extraction module, the output items of the seventh feature extraction units in the third feature extraction module are horizontal input items of the seventh feature extraction units in the fourth feature extraction module respectively, the output items of the seventh feature extraction units in the fourth feature extraction module are horizontal input items of the seventh feature extraction units in the fifth feature extraction module respectively, the input items of the first feature fusion units in the third feature extraction module and the fifth feature extraction module are respectively the output items of the previous seventh feature extraction unit adjacent to the input items, the output item of each first feature fusion unit in the third feature extraction module and the fifth feature extraction module is a vertical input item of a next seventh feature extraction unit adjacent to the output item, the input items of the second feature fusion units in the fourth feature extraction module are respectively the output items of the previous seventh feature extraction unit adjacent to the input items, and the output item of each second feature fusion unit in the fourth feature extraction module is a vertical input item of a next seventh feature extraction unit adjacent to the output item.
The disparity map generation method includes that the seventh feature extraction unit includes a third feature fusion layer, a plurality of second convolution layers, a first normalization layer, a fourth feature fusion layer, and a first activation layer, which are sequentially cascaded, a second normalization layer, a second activation layer, and a regularization layer are sequentially cascaded between two adjacent convolution layers of the plurality of second convolution layers, an output item of the third feature fusion layer is an input item of the plurality of second convolution layers, an output item of the plurality of second convolution layers is an input item of the first normalization layer, an output item of the third fusion layer and an output item of the first normalization layer are input items of the fourth fusion layer, and an output item of the fourth fusion layer is an input item of the first activation layer.
The disparity map generation method, wherein the training process of the portrait segmentation model specifically comprises:
inputting training images in a preset training sample into a preset network model, and determining a predicted portrait mask image corresponding to the training images through the preset network model; wherein the training image set comprises training images and real portrait mask images corresponding to the training images;
training the preset network model based on the predicted portrait mask image, the real portrait mask image and the loss function of the preset network model to obtain a portrait segmentation model; wherein the loss functions include a first loss function for segmentation of the portrait and a second loss function for optimization of edges of the portrait.
The disparity map generation method, wherein the method for constructing the second loss function comprises:
respectively performing a dilation operation and an erosion operation on the real portrait mask image, and determining the weight corresponding to the second loss function based on the dilated real portrait mask image and the eroded real portrait mask image;
and constructing the second loss function based on the predicted portrait mask image, the real portrait mask image and the weight corresponding to the second loss function.
The disparity map generation method, wherein after the portrait mask image corresponding to the image to be processed is output through the portrait segmentation model, the method further comprises:
acquiring a plurality of first portrait areas of the portrait mask image and the number of pixels corresponding to each first portrait area;
and preprocessing the portrait mask image according to the number of the pixels.
The disparity map generation method, wherein the preprocessing the portrait mask image according to the number of the pixels specifically comprises:
determining the pixel proportion corresponding to each first portrait area according to the number of the pixels;
and comparing the pixel proportion with a preset threshold value, and removing the first portrait area corresponding to the pixel proportion from the portrait mask image when the pixel proportion is smaller than the preset threshold value.
The disparity map generation method, wherein the adjusting the candidate disparity map according to the portrait mask map to obtain the target disparity map corresponding to the image to be processed specifically includes:
adjusting the image size of the portrait mask image to be the same as the image size of the candidate disparity map, and determining a second portrait area of the candidate disparity map according to the adjusted portrait mask image;
and acquiring a plurality of first disparity values corresponding to the second portrait area, and adjusting the candidate disparity map according to the plurality of first disparity values to obtain a target disparity map corresponding to the image to be processed.
The disparity map generation method, wherein the adjusting the candidate disparity map according to the first disparity values to obtain the target disparity map corresponding to the image to be processed specifically includes:
determining a disparity average value corresponding to the second portrait area according to the plurality of first disparity values;
and adjusting the candidate disparity map according to the disparity average value to obtain a target disparity map corresponding to the image to be processed.
The disparity map generation method, wherein the adjusting the candidate disparity map according to the disparity average value specifically includes:
and setting the plurality of first disparity values corresponding to the second portrait area to the disparity average value.
The disparity map generation method, wherein the adjusting the candidate disparity map according to the disparity average value specifically includes:
acquiring a plurality of second disparity values corresponding to the non-portrait area of the candidate disparity map;
and comparing the second disparity values with the disparity average value, screening out a plurality of third disparity values equal to the disparity average value from the second disparity values, and setting the third disparity values to a preset fourth disparity value.
The disparity map generation method, wherein after the target disparity map corresponding to the image to be processed is obtained, the method further comprises:
and performing background blurring on the image to be processed according to the target disparity map.
A second aspect of the embodiments of the present application provides a disparity map generating apparatus, including:
the device comprises an acquisition module, a segmentation module and an adjusting module, wherein the acquisition module is used for acquiring an image to be processed and an auxiliary image corresponding to the image to be processed and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image;
the segmentation module is used for inputting the image to be processed into a portrait segmentation model and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model;
and the adjusting module is used for adjusting the candidate disparity map according to the portrait mask image to obtain a target disparity map corresponding to the image to be processed.
A third aspect of embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps in the disparity map generating method according to any one of the above descriptions.
A fourth aspect of the embodiments of the present application provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the disparity map generating method as described in any one of the above.
Beneficial effects: compared with the prior art, the method comprises the steps of obtaining an image to be processed and an auxiliary image corresponding to the image to be processed, and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image; inputting the image to be processed into a portrait segmentation model, and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model; and adjusting the candidate disparity map according to the portrait mask image to obtain a target disparity map corresponding to the image to be processed. The consistency of the disparity values of the portrait in the disparity map is optimized through the portrait mask image, false blurring and missed blurring of the portrait are reduced, and at the same time the depth gradient in the disparity map is used so that the blurred image shows a gradual blurring transition, thereby improving the blurring effect of the image to be processed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a disparity map generating method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a portrait segmentation model in a disparity map generation method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a seventh feature extraction unit in a portrait segmentation model of a disparity map generation method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a disparity map generating apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the purpose, technical solutions and effects of the present application clearer and more explicit, the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In particular implementations, the terminal devices described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having touch-sensitive surfaces (e.g., touch-screen displays and/or touch pads). It should also be understood that in some embodiments, the device may not be a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch-screen display and/or a touch pad).
In the discussion that follows, a terminal device that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal device may also include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal device supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a video conferencing application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal device may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and the corresponding information displayed on the terminal may be adjusted and/or changed between applications and/or within a respective application. In this way, a common physical framework (e.g., the touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an order of execution; the execution order of each process is determined by its function and internal logic, and should not constitute any limitation on the implementation process of this embodiment.
The inventor has found through research that dual cameras have been increasingly applied to mobile terminal devices, and the photographing function of existing mobile terminal devices configured with dual cameras is generally configured with a background blurring function. Based on the background blurring function, a picture with a blurred background and a prominent foreground can be captured. Commonly used background blurring methods mainly include a disparity-based method and a portrait-segmentation-based method: the disparity-based method mainly uses binocular disparity estimation to obtain a disparity map of the captured scene, and the disparity map is then used for background blurring; the portrait-segmentation-based method uses portrait segmentation to obtain a portrait mask image of the current scene, and the portrait mask image is then used for background blurring.
However, both background blurring methods have certain limitations in complex scenes: the disparity-based method cannot handle portrait edges well (fingers, ears, hollow regions, etc.), and missed blurring and false blurring easily occur at the portrait edges; the portrait-segmentation-based method cannot handle scenes without people, and because no disparity information is available, the background blurring has no gradual transition effect.
Based on this, in the embodiment of the application, an image to be processed and an auxiliary image corresponding to the image to be processed are obtained, and a candidate disparity map corresponding to the image to be processed is determined according to the image to be processed and the auxiliary image; the image to be processed is input into a portrait segmentation model, and a portrait mask image corresponding to the image to be processed is output through the portrait segmentation model; and the candidate disparity map is adjusted according to the portrait mask image to obtain a target disparity map corresponding to the image to be processed. The consistency of the disparity values of the portrait in the disparity map is optimized through the portrait mask image, false blurring and missed blurring of the portrait are reduced, and at the same time the depth gradient in the disparity map is used so that the blurred image shows a gradual blurring transition, thereby improving the blurring effect of the image to be processed.
The following further describes the content of the application by describing the embodiments with reference to the attached drawings.
The present embodiment provides a disparity map generating method, as shown in fig. 1, the method includes:
s100, obtaining an image to be processed and an auxiliary image corresponding to the image to be processed, and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image.
Specifically, the image to be processed and the auxiliary image may be images captured by an imaging module, where the imaging module includes at least two imagers, and the two imagers are a main imager and an auxiliary imager, respectively. The main imager and the auxiliary imager are arranged on the same plane, and the main imager and the auxiliary imager can be arranged together in a transverse adjacent mode or in a vertical adjacent mode. The primary imager and the secondary imager may be dual cameras of an electronic device (e.g., a smartphone), i.e., both the primary imager and the secondary imager are cameras. For example, the main imager and the auxiliary imager may be dual rear cameras or dual front cameras, wherein the main imager and the auxiliary imager may be one color imager and the other black and white imager (e.g., the main imager is a color imager and the auxiliary imager is a black and white imager), and the main imager and the auxiliary imager may also be imagers with different focal lengths, and of course, the main imager and the auxiliary imager may also be the same imager. Of course, the imaging module may further include 3 imagers (e.g., a smartphone having three cameras, etc.), and may also include 4 imagers, etc.
The image to be processed and the auxiliary image can be images acquired by an imaging module configured in the electronic equipment, or images acquired by imaging modules of other electronic equipment through a network, bluetooth, infrared and the like. In a specific implementation manner of this embodiment, the image to be processed and the auxiliary image are obtained by shooting through an imaging module configured in an electronic device itself, the image to be processed is obtained by shooting through a main imager of the imaging module, and the auxiliary image is obtained by shooting through an auxiliary imager of the imaging module. It is understood that the electronic device is configured with an imaging module comprising at least a primary imager and a secondary imager; the main imager is used for shooting an image to be processed, the auxiliary imager is used for shooting an auxiliary image, and the auxiliary image is used for assisting in determining a candidate disparity map of the image to be processed. For example, when a mobile phone configured with two cameras takes a picture, the main camera in the two cameras collects an image a, and the auxiliary camera in the two cameras collects an image B, then the image a is a to-be-processed image, and the image B is an auxiliary image corresponding to the image a.
The candidate disparity map is formed by disparities of first pixel points and corresponding second pixel points in the image to be processed, the disparities refer to the horizontal distance between central pixel points of two matched image blocks in the left binocular image and the right binocular image, the corresponding second pixel points of the first pixel points are contained in the auxiliary image, and the pixel positions of the first pixel points in the image to be processed are the same as the pixel positions of the corresponding second pixel points in the auxiliary image. Therefore, after the image to be processed and the auxiliary image corresponding to the image to be processed are obtained, the candidate disparity map corresponding to the image to be processed is determined based on the image to be processed and the auxiliary image.
Further, in an implementation manner of this embodiment, the determining, according to the image to be processed and the auxiliary image, a candidate disparity map corresponding to the image to be processed specifically includes:
s110, respectively adjusting the image size of the image to be processed and the image size of the auxiliary image to be preset image sizes to obtain the adjusted image to be processed and the adjusted auxiliary image.
And S120, determining a candidate disparity map corresponding to the image to be processed according to the adjusted image to be processed and the adjusted auxiliary image.
Specifically, in order to increase the determination speed of the candidate disparity map, the image size of the image to be processed and the image size of the auxiliary image may be adjusted when determining the candidate disparity map, and the image size of the image to be processed after the adjustment is equal to the image size of the auxiliary image after the adjustment and is smaller than the image size of the image to be processed before the adjustment. In addition, for each first pixel point in the adjusted image to be processed, a candidate parallax pixel point exists in the candidate parallax map, the candidate parallax pixel point is used for reflecting the parallax value of the first pixel point and the corresponding second pixel point, and the pixel position of the candidate parallax pixel point in the candidate parallax map is the same as the pixel position of the first pixel point in the adjusted image to be processed. It should be noted that, in practical applications, the candidate disparity map corresponding to the image to be processed may be determined directly based on the image to be processed and the auxiliary image without performing image size adjustment on the image to be processed and the auxiliary image. In addition, when determining the candidate disparity map based on the image to be processed and the auxiliary image, an SGBM algorithm may be used to determine the candidate disparity map, and after acquiring the candidate disparity map, post-processing may be performed on the candidate disparity map to improve the accuracy of the candidate disparity map, for example, the post-processing may include hole filling, a joint bilateral filtering algorithm, and the like. The SGBM algorithm, the hole filling algorithm, and the joint bilateral filtering algorithm are all existing algorithms, and are not described herein.
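As an illustration only, the sketch below shows one way this step could be implemented with OpenCV's SGBM matcher; the preset image size, the SGBM parameter values, and the file-path handling are assumptions for demonstration and are not values specified by this embodiment.

```python
# Illustrative sketch only: compute a candidate disparity map with OpenCV SGBM
# after resizing both images to an assumed preset size (640x480 here).
import cv2
import numpy as np

def candidate_disparity(to_process_path, auxiliary_path, preset_size=(640, 480)):
    main = cv2.imread(to_process_path, cv2.IMREAD_GRAYSCALE)   # image to be processed (main imager)
    aux = cv2.imread(auxiliary_path, cv2.IMREAD_GRAYSCALE)     # auxiliary image (auxiliary imager)
    main = cv2.resize(main, preset_size)
    aux = cv2.resize(aux, preset_size)

    # SGBM parameters below are example values, not values given by the embodiment.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5,
                                 P1=8 * 5 * 5, P2=32 * 5 * 5, uniquenessRatio=10)
    disparity = sgbm.compute(main, aux).astype(np.float32) / 16.0  # SGBM returns fixed-point values

    # Post-processing (e.g., hole filling, joint bilateral filtering) would follow here.
    return disparity
```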
S200, inputting the image to be processed into a portrait segmentation model, and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model.
Specifically, the portrait mask image is used to reflect a portrait area in the image to be processed, the portrait mask image is obtained by performing portrait segmentation on the image to be processed through a neural network model, and correspondingly, the portrait mask image for obtaining the image to be processed may specifically be: inputting the image to be processed into a trained portrait segmentation model, and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model. As shown in fig. 2, the portrait segmentation model includes a first feature extraction module, a second feature extraction module, a third feature extraction module, a fourth feature extraction module, and a fifth feature extraction module, an output item of the first feature extraction module is an input item of the second feature extraction module, an output item of the second feature extraction module is an input item of the third feature extraction module, an output item of the third feature extraction module is an input item of the fourth feature extraction module, and an output item of the fourth feature extraction module is an input item of the fifth feature extraction module.
Correspondingly, the inputting the image to be processed into the portrait segmentation model, and outputting the portrait mask image corresponding to the image to be processed through the portrait segmentation model specifically includes:
s210, inputting the image to be processed into a first feature extraction module, and outputting a first feature map corresponding to the image to be processed through the first feature extraction module;
s220, inputting the first feature map into a second feature extraction module, and outputting a second feature map corresponding to the image to be processed through the second feature extraction module;
s230, inputting the second feature map into a third feature extraction module, and outputting a third feature map corresponding to the image to be processed through the third feature extraction module;
s240, inputting the third feature map into the fourth feature extraction module, and outputting a fourth feature map corresponding to the image to be processed through the fourth feature extraction module;
and S250, inputting the fourth feature map into the fifth feature extraction module, and outputting a portrait mask map corresponding to the image to be processed through the fifth feature extraction module.
Specifically, the first feature extraction module includes a first convolution layer (i.e., 5x5 conv, 64 in fig. 2), a first feature extraction unit (i.e., CNN Block C1 channels in fig. 2), a second feature extraction unit (i.e., CNN Block C2 channels in fig. 2), and a third feature extraction unit (i.e., CNN Block C3 channels in fig. 2) that are sequentially cascaded. The convolution kernel of the first convolution layer may be 5x5 with a stride of 2; the convolution kernels of the first feature extraction unit, the second feature extraction unit, and the third feature extraction unit are all 3x3 with a stride of 2, and their numbers of feature channels are 128, 256, and 512, respectively. In one embodiment, the image to be processed is the input item of the first convolution layer, the output item of the first convolution layer is the input item of the first feature extraction unit, the output item of the first feature extraction unit is the input item of the second feature extraction unit, the output item of the second feature extraction unit is the input item of the third feature extraction unit, and the sizes of the feature maps output by the first feature extraction unit, the second feature extraction unit and the third feature extraction unit are 1/4, 1/8 and 1/16 of the size of the image to be processed, respectively.
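For illustration, a minimal PyTorch sketch of such a stem is given below. Only the kernel sizes, strides, and channel counts come from the description above; the internal structure of each CNN Block (a plain convolution-BN-ReLU block here) and the 3-channel input are assumptions.

```python
# Illustrative sketch of the first feature extraction module: a 5x5 stride-2 convolution
# followed by three 3x3 stride-2 CNN blocks with 128, 256, and 512 channels.
import torch.nn as nn

def cnn_block(in_ch, out_ch):
    # Assumed block structure (conv + BN + ReLU); the patent does not detail the CNN Block internals.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class FirstFeatureModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 5, stride=2, padding=2)  # first convolution layer
        self.unit1 = cnn_block(64, 128)    # first feature extraction unit (1/4 resolution)
        self.unit2 = cnn_block(128, 256)   # second feature extraction unit (1/8 resolution)
        self.unit3 = cnn_block(256, 512)   # third feature extraction unit (1/16 resolution)

    def forward(self, x):
        x = self.conv1(x)
        f1 = self.unit1(x)
        f2 = self.unit2(f1)
        f3 = self.unit3(f2)
        return f1, f2, f3  # these also feed the parallel 1x1 units of the second module
```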
The second feature extraction module comprises a fourth feature extraction unit (i.e., 1x1 conv, bn, relu, 64 in fig. 2), a fifth feature extraction unit (i.e., 1x1 conv, bn, relu, 128 in fig. 2) and a sixth feature extraction unit (i.e., 1x1 conv, bn, relu, 256 in fig. 2) which are connected in parallel. The convolution kernels of the fourth feature extraction unit, the fifth feature extraction unit and the sixth feature extraction unit are all 1x1 with a stride of 1, and their numbers of feature channels are 64, 128, and 256, respectively; the fourth feature extraction unit, the fifth feature extraction unit and the sixth feature extraction unit each perform a convolution operation, batch normalization, and an activation operation on their input items in sequence. In a specific embodiment, the output item of the first feature extraction unit is the input item of the fourth feature extraction unit, the output item of the second feature extraction unit is the input item of the fifth feature extraction unit, and the output item of the third feature extraction unit is the input item of the sixth feature extraction unit.
The third feature extraction module and the fifth feature extraction module both include a plurality of seventh feature extraction units (i.e., S-blocks in fig. 2) and a plurality of first feature fusion units (i.e., conv trans in fig. 2) that are sequentially cascaded and arranged at intervals, and the fourth feature extraction module includes a plurality of seventh feature extraction units (i.e., S-blocks in fig. 2) and a plurality of second feature fusion units (i.e., conv stride in fig. 2) that are sequentially cascaded and arranged at intervals. The convolution kernel of the first feature fusion unit may be 3x3 with a stride of 1; the first feature fusion unit performs a convolution operation on its input item and halves the number of feature channels. The convolution kernel of the second feature fusion unit may be 3x3 with a stride of 2; the second feature fusion unit performs a convolution operation on its input item and doubles the number of feature channels. In a specific embodiment, the output items of the respective feature extraction units in the second feature extraction module are the horizontal input items of the respective seventh feature extraction units in the third feature extraction module, the output items of the respective seventh feature extraction units in the third feature extraction module are the horizontal input items of the respective seventh feature extraction units in the fourth feature extraction module, and the output items of the respective seventh feature extraction units in the fourth feature extraction module are the horizontal input items of the respective seventh feature extraction units in the fifth feature extraction module. The input item of each first feature fusion unit in the third feature extraction module and the fifth feature extraction module is the output item of the preceding adjacent seventh feature extraction unit, and the output item of each first feature fusion unit is the vertical input item of the following adjacent seventh feature extraction unit; likewise, the input item of each second feature fusion unit in the fourth feature extraction module is the output item of the preceding adjacent seventh feature extraction unit, and the output item of each second feature fusion unit is the vertical input item of the following adjacent seventh feature extraction unit.
In one implementation of this embodiment, as shown in fig. 3, the seventh feature extraction unit (i.e., S-Block in fig. 2) includes a third feature fusion layer (i.e., the first add from top to bottom in fig. 3), a plurality of second convolution layers (i.e., 3x3 Conv in fig. 3), a first normalization layer (i.e., the second Batch Norm from top to bottom in fig. 3), a fourth feature fusion layer (i.e., the second add from top to bottom in fig. 3), and a first activation layer (i.e., the second ReLU from top to bottom in fig. 3), which are sequentially cascaded; between two adjacent convolution layers of the plurality of second convolution layers, a second normalization layer (i.e., the first Batch Norm from top to bottom in fig. 3), a second activation layer (i.e., the first ReLU from top to bottom in fig. 3), and a regularization layer (i.e., drop out in fig. 3) are sequentially cascaded. In other words, the seventh feature extraction unit may include two second convolution layers, or may include three or more second convolution layers, with a second normalization layer, a second activation layer, and a regularization layer sequentially cascaded between every two adjacent second convolution layers. The convolution kernel of the second convolution layer is 3x3; the third feature fusion layer and the fourth feature fusion layer perform pixel-wise addition on their input items, and the regularization layer performs random deactivation (dropout) on its input item. In a specific embodiment, the output item of the third feature fusion layer is the input item of the plurality of second convolution layers, the output item of the plurality of second convolution layers is the input item of the first normalization layer, the output item of the third feature fusion layer and the output item of the first normalization layer are the input items of the fourth feature fusion layer, and the output item of the fourth feature fusion layer is the input item of the first activation layer. Between two adjacent second convolution layers, the output item of the previous second convolution layer is the input item of the second normalization layer, the output item of the second normalization layer is the input item of the second activation layer, the output item of the second activation layer is the input item of the regularization layer, and the output item of the regularization layer is the input item of the next second convolution layer.
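A minimal PyTorch sketch of a seventh feature extraction unit (S-Block) as described above is shown below for illustration; the number of second convolution layers (two here), the channel count, and the dropout rate are assumptions.

```python
# Illustrative S-Block sketch: fuse the horizontal and vertical inputs by pixel-wise
# addition, pass the result through stacked 3x3 convolutions, and add a residual connection.
import torch.nn as nn

class SBlock(nn.Module):
    def __init__(self, channels, dropout=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),   # second convolution layer
            nn.BatchNorm2d(channels),                      # second normalization layer
            nn.ReLU(inplace=True),                         # second activation layer
            nn.Dropout2d(dropout),                         # regularization layer
            nn.Conv2d(channels, channels, 3, padding=1),   # next second convolution layer
            nn.BatchNorm2d(channels),                      # first normalization layer
        )
        self.act = nn.ReLU(inplace=True)                   # first activation layer

    def forward(self, horizontal, vertical=None):
        # Third feature fusion layer: pixel-wise addition of the horizontal and vertical inputs.
        fused = horizontal if vertical is None else horizontal + vertical
        # Fourth feature fusion layer: add the fused input and the convolution branch output.
        return self.act(fused + self.body(fused))
```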
In an implementation manner of this embodiment, the portrait segmentation model is trained, and the training process of the portrait segmentation model specifically includes:
m210, inputting a training image in a preset training sample into a preset network model, and determining a predicted portrait mask image corresponding to the training image through the preset network model; the training image set comprises training images and real portrait mask images corresponding to the training images;
m220, training the preset network model based on the predicted portrait mask image, the real portrait mask image and a loss function of the preset network model to obtain a portrait segmentation model; wherein the loss functions comprise a first loss function for segmentation of the portrait and a second loss function for optimization of the edges of the portrait.
Specifically, the preset training sample comprises a plurality of training image groups, each training image group in the plurality of training image groups comprises a training image and a real portrait mask image corresponding to the training image, the training image comprises a portrait area, the real portrait mask image is a mask image corresponding to the portrait area in the training image, and the real portrait mask image is used as a judgment basis for a predicted portrait mask image output through a preset network model, so that whether the predicted portrait mask image output through the preset network model meets requirements or not is determined.
The model structure of the preset network model is the same as that of the portrait segmentation model, and the preset network model is different from the portrait segmentation model in that: the model parameters of the preset network model are preset initial model parameters, and the model parameters of the portrait segmentation model are model parameters trained based on preset training samples. Therefore, the model structure of the preset network model is not repeated here, and the model structure of the portrait segmentation model can be referred to specifically.
In an implementation manner of this embodiment, the training sample data corresponding to the portrait segmentation model may include a preset number (e.g., 100,000) of portrait segmentation data pairs, where each portrait segmentation data pair includes a training image and a real portrait mask image corresponding to the training image. When the preset network model is trained based on the training samples, data enhancement operations such as 90-degree rotation, contrast stretching, brightness adjustment, noise addition, saturation adjustment, and random cropping may be applied to the training samples to improve their diversity. In addition, the number of training epochs of the segmentation network model may be 300 and the batch size may be 32; the network may be optimized by stochastic gradient descent (SGD), with the initial learning rate set to 0.05, the learning rate multiplied by 0.1 at the 100th and 200th epochs, and the momentum set to 0.85. In addition, the preset network model uses a first loss function to optimize the model parameters during training, where the first loss function is as follows:
(First loss function: given as an equation image in the original publication.)
In the first loss function, β is the weight of the loss function, p represents the confidence of the real portrait mask image, and p̂ represents the confidence of the predicted portrait mask image; in a particular implementation, β = 1.5.
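For reference, the training schedule described above (SGD with initial learning rate 0.05 and momentum 0.85, learning rate multiplied by 0.1 at epochs 100 and 200, 300 epochs, batch size 32) might be set up as in the following sketch; the model, the data loader, and the way the two loss terms are combined are placeholders rather than details given by this embodiment.

```python
# Illustrative training-schedule sketch matching the hyperparameters described above.
import torch

def train(model, train_loader, first_loss_fn, second_loss_fn, epochs=300):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.85)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 200], gamma=0.1)
    for epoch in range(epochs):
        for images, true_masks in train_loader:          # batch size assumed to be 32
            pred_masks = model(images)                    # predicted portrait mask images
            # Assumed combination: segmentation loss plus the edge-optimization loss.
            loss = first_loss_fn(pred_masks, true_masks) + second_loss_fn(pred_masks, true_masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```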
In an implementation manner of this embodiment, in order to improve the sensitivity of the portrait segmentation model to the portrait edge, in this embodiment, a second loss function for optimizing the portrait edge is pre-constructed, and the preset network model performs model parameter optimization by using the first loss function and the second loss function simultaneously in the training process, so that the portrait mask image output by the portrait segmentation model has a good portrait edge. Wherein the second loss function has the formula:
(Second loss function: given as an equation image in the original publication.) The weight of the second loss function is computed as:
Mc = Gauss(K * ((M)+ - (M)-))
where p represents the confidence of the real portrait mask image, p̂ represents the confidence of the predicted portrait mask image, M represents the real portrait mask image, (M)+ denotes the result of the dilation operation on M, (M)- denotes the result of the erosion operation on M, K is a constant, and Gauss(·) denotes Gaussian blurring of its argument; in a particular implementation, K = 5.
As can be seen from the above formula of the second loss function, in order to construct the second loss function, a dilation operation and an erosion operation are respectively performed on the real portrait mask image M, Gaussian blurring is then performed on the difference between the dilated real portrait mask image and the eroded real portrait mask image to obtain the weight Mc corresponding to the second loss function, and finally the second loss function is constructed based on the predicted portrait mask image, the real portrait mask image and the weight corresponding to the second loss function.
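Purely as an illustration, the weight construction described above could be sketched with OpenCV as follows; the structuring element and the Gaussian kernel size are assumptions, since the embodiment only states K = 5.

```python
# Illustrative sketch: build the edge weight Mc = Gauss(K * (dilate(M) - erode(M)))
# from the real portrait mask image M, as used by the second (edge) loss function.
import cv2
import numpy as np

def edge_weight(real_mask, K=5.0):
    kernel = np.ones((5, 5), np.uint8)            # assumed structuring element
    dilated = cv2.dilate(real_mask, kernel)       # dilation operation (M)+
    eroded = cv2.erode(real_mask, kernel)         # erosion operation (M)-
    band = K * (dilated.astype(np.float32) - eroded.astype(np.float32))
    return cv2.GaussianBlur(band, (5, 5), 0)      # Gaussian blurring yields the weight Mc
```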
In an implementation manner of this embodiment, after the portrait mask image corresponding to the image to be processed is output through the portrait segmentation model, the method further includes:
r210, acquiring a plurality of first portrait areas of the portrait mask image and the number of pixels corresponding to each first portrait area;
and R220, preprocessing the portrait mask image according to the number of the pixels.
Specifically, the first portrait areas refer to portrait areas in the portrait mask image. For the portrait mask image, a portrait area generally has a certain size and shape, that is, a portrait area generally consists of a certain number of pixels; for example, when the number of pixels of a certain portrait outline in the portrait mask image is 2, it cannot form a complete outline, so it is determined that it may not be a portrait outline but background that has not been cleanly removed. In order to further eliminate non-portrait areas in the portrait mask image, in this embodiment, after the portrait mask image corresponding to the image to be processed is output through the portrait segmentation model, outer contour extraction is first performed on the portrait mask image to obtain a plurality of first portrait areas in the portrait mask image (the outer contour extraction may be implemented by using the findContours() function built into OpenCV), then the number of pixels in each first portrait area is obtained, and the portrait mask image is preprocessed according to the number of pixels in each first portrait area to obtain the preprocessed portrait mask image.
For example, after 4 portrait areas are extracted from the portrait mask image, the number of pixels of the four portrait areas is respectively obtained, whether the four portrait areas are all real portrait areas is determined according to the number of pixels of the four portrait areas, and the non-real portrait areas are removed from the portrait mask image, so that the portrait mask image is preprocessed.
In an implementation manner of this embodiment, the preprocessing the portrait mask image according to the number of the pixels specifically includes:
step R221, determining the pixel proportion corresponding to each first portrait area according to the number of the pixels;
and step R222, comparing the pixel proportion with a preset threshold, and removing the first portrait area corresponding to the pixel proportion from the portrait mask image when the pixel proportion is smaller than the preset threshold.
In order to measure whether a plurality of extracted first portrait areas are real portrait areas, in this embodiment, a threshold is preset, after the number of pixels of each first portrait area is obtained, a pixel proportion of each first portrait area is determined according to the number of pixels of each first portrait area and the total number of pixels of the portrait mask image, then the pixel proportion is compared with a preset threshold, and when the pixel proportion is greater than or equal to the preset threshold, the first portrait area corresponding to the pixel proportion is considered to be a real portrait outline; and when the pixel proportion is smaller than a preset threshold value, considering a first portrait area corresponding to the pixel proportion as a background, and removing the first portrait area from the portrait mask image.
For example, when the number of pixels in a certain portrait area is 16, and the total number of pixels in the portrait mask is 100, the pixel proportion corresponding to the portrait area is calculated to be 0.16, and the preset threshold value is 0.15, that is, the pixel proportion corresponding to the portrait area is greater than the preset threshold value, and the portrait area is considered to be a real portrait outline; when the number of pixels in a certain portrait area is 14 and the total number of pixels in the portrait mask is 100, calculating that the proportion of pixels corresponding to the portrait area is 0.14 and the preset threshold value is 0.15, namely the proportion of pixels corresponding to the portrait area is less than the preset threshold value, considering the portrait area as a background, and removing the portrait area from the portrait mask.
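A minimal sketch of this preprocessing step, using OpenCV outer-contour extraction and the example threshold of 0.15 from above, might look as follows; the binary mask format (0/255, uint8) is an assumption.

```python
# Illustrative sketch: remove first portrait areas whose pixel proportion is below a threshold.
import cv2
import numpy as np

def preprocess_mask(mask, threshold=0.15):
    total_pixels = mask.shape[0] * mask.shape[1]
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cleaned = mask.copy()
    for contour in contours:                       # each outer contour is a first portrait area
        region = np.zeros_like(mask)
        cv2.drawContours(region, [contour], -1, 255, thickness=cv2.FILLED)
        # Pixel proportion of this area relative to the whole portrait mask image.
        ratio = cv2.countNonZero(cv2.bitwise_and(mask, region)) / total_pixels
        if ratio < threshold:                      # treat the area as background and remove it
            cleaned[region > 0] = 0
    return cleaned
```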
S300, adjusting the candidate disparity map according to the portrait mask map to obtain a target disparity map corresponding to the image to be processed.
Specifically, the target disparity map is the disparity map obtained after the candidate disparity map is adjusted by the portrait mask image. The disparity consistency of the portrait in the candidate disparity map is poor, the portrait edges (fingers, ears, hollow regions, and the like) cannot be handled well, and missed blurring and false blurring easily occur at the portrait edges. The portrait mask image can optimize the disparity consistency of the portrait in the candidate disparity map and the portrait edges, so background blurring using the target disparity map can greatly alleviate the problems of missed blurring and false blurring at the portrait edges and can achieve a gradual blurring effect; moreover, background blurring can be performed even in a scene without people, so the application range is wide.
In an implementation manner of this embodiment, the adjusting the candidate disparity map according to the portrait mask map to obtain the target disparity map corresponding to the image to be processed specifically includes:
step S310, adjusting the image size of the portrait mask image to be the same as the image size of the candidate disparity map, and determining a second portrait area of the candidate disparity map according to the adjusted portrait mask image;
step S320, obtaining a plurality of first disparity values corresponding to the second portrait area, and adjusting the candidate disparity map according to the plurality of first disparity values to obtain a target disparity map corresponding to the image to be processed.
Specifically, the second portrait area is the portrait area in the candidate disparity map that corresponds to the first portrait area in the portrait mask image. In order to increase the speed of determining the second portrait area, the image size of the portrait mask image may be adjusted before the second portrait area is determined, so that the image size of the adjusted portrait mask image is the same as the image size of the candidate disparity map. In the foregoing steps, in order to reduce the computational complexity of the portrait segmentation process, the image input into the portrait segmentation model is a down-sampled image, that is, the size of the portrait mask image is smaller than that of the candidate disparity map; therefore, in this embodiment, after the preprocessed portrait mask image is obtained, it is up-sampled by nearest neighbor interpolation so that its size matches that of the candidate disparity map. The nearest neighbor interpolation algorithm is a simple interpolation algorithm: among the four pixels adjacent to the pixel to be solved, the gray value of the adjacent pixel closest to the pixel to be solved is assigned to the pixel to be solved, so that the pixel points of the preprocessed portrait mask image are filled in to match the size of the candidate disparity map. For example, let the pixel point to be solved be (i + u, j + v), where u and v are decimals greater than zero and smaller than 1, and let pixel point A of the four adjacent pixels be (i, j), pixel point B be (i + 1, j), pixel point C be (i, j + 1), and pixel point D be (i + 1, j + 1). When u is less than 0.5 and v is less than 0.5, pixel point A is closest to the pixel to be solved, and the gray value of pixel point A is assigned to the pixel to be solved; when u is greater than 0.5 and v is less than 0.5, pixel point B is closest to the pixel to be solved, and the gray value of pixel point B is assigned to the pixel to be solved; when u is less than 0.5 and v is greater than 0.5, pixel point C is closest to the pixel to be solved, and the gray value of pixel point C is assigned to the pixel to be solved; and when u is greater than 0.5 and v is greater than 0.5, pixel point D is closest to the pixel to be solved, and the gray value of pixel point D is assigned to the pixel to be solved.
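As a simple illustration, the nearest-neighbor up-sampling described above can be expressed with OpenCV in a single call, as in the following sketch.

```python
# Illustrative sketch: up-sample the preprocessed portrait mask image to the size of the
# candidate disparity map using nearest-neighbor interpolation.
import cv2

def upsample_mask(mask, candidate_disparity):
    h, w = candidate_disparity.shape[:2]
    return cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
```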
After the image size of the portrait mask map has been adjusted to be the same as that of the candidate disparity map, each pixel in the first portrait area of the portrait mask map has a corresponding candidate disparity pixel in the candidate disparity map, and the area formed by the candidate disparity pixels corresponding to all pixels in the first portrait area is the second portrait area of the candidate disparity map. After the second portrait area is determined, the pixel value of each pixel in the second portrait area is obtained, which yields the plurality of first disparity values corresponding to the second portrait area, and the candidate disparity map is adjusted according to these first disparity values to obtain the target disparity map corresponding to the image to be processed.
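Continuing the sketch above, once the mask and the candidate disparity map share the same size, the second portrait area and its first disparity values can be read out with a boolean index. The names portrait_disparities, disp and mask_up are hypothetical.

```python
import numpy as np

def portrait_disparities(disp: np.ndarray, mask_up: np.ndarray) -> np.ndarray:
    """Return the first disparity values inside the second portrait area.

    disp    -- candidate disparity map, shape (H, W)
    mask_up -- portrait mask up-sampled to (H, W), nonzero inside the portrait
    """
    second_portrait_area = mask_up > 0   # pixels corresponding to the first portrait area
    return disp[second_portrait_area]    # plurality of first disparity values
```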
In a specific embodiment, the step of adjusting the candidate disparity map according to the plurality of first disparity values to obtain a target disparity map corresponding to the image to be processed specifically includes:
step S321, determining a parallax average value corresponding to the second portrait area according to the plurality of first parallax values;
and step S322, adjusting the candidate disparity maps according to the disparity average value to obtain a target disparity map corresponding to the image to be processed.
Specifically, when the candidate disparity map is adjusted according to the plurality of first disparity values, the candidate disparity map is adjusted by using the disparity mean value corresponding to the second portrait area. After the plurality of first disparity values corresponding to the second portrait area are obtained, the disparity mean value corresponding to the second portrait area can be determined; for example, if the plurality of first disparity values are R1, R2, R3, ..., Rn, the disparity mean value is (R1 + R2 + R3 + ... + Rn) / n. The candidate disparity map is then adjusted according to this disparity mean value.
The candidate disparity map contains the second portrait area and a non-portrait area, and when the candidate disparity map is adjusted with the disparity mean value, the second portrait area and the non-portrait area are adjusted separately. Each pixel of the second portrait area in the candidate disparity map has a corresponding first disparity value; when the second portrait area is adjusted, all of the first disparity values corresponding to the second portrait area are set to the disparity mean value, which ensures the disparity consistency of the second portrait area in the candidate disparity map. When the non-portrait area of the candidate disparity map is adjusted, a plurality of second disparity values corresponding to the non-portrait area are first obtained; these second disparity values are then compared with the disparity mean value, a plurality of third disparity values equal to the disparity mean value are screened out from the second disparity values, and the third disparity values are set to a preset fourth disparity value, so that the non-portrait area is not mistaken for the portrait and left unblurred during background blurring. In one specific implementation, the fourth disparity value is d - 10, where d is the disparity mean value.
For example, suppose the first disparity values corresponding to the second portrait area are M1, M2, M3, ..., Mn and the disparity mean value is d. When the second portrait area is adjusted with the disparity mean value, M1, M2, M3, ..., Mn are all replaced by the disparity mean value d. Suppose the second disparity values corresponding to the non-portrait area of the candidate disparity map are R1, R2, R3, ..., Rn. When the non-portrait area is adjusted with the disparity mean value, R1, R2, R3, ..., Rn are compared with the disparity mean value d one by one, the third disparity values equal to the disparity mean value, for example R2, R3, ..., Rs, are screened out, and these third disparity values R2, R3, ..., Rs are replaced by d - 10.
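A minimal Python/NumPy sketch of this adjustment, assuming the portrait area is non-empty and using the d - 10 rule from the example above, might look as follows; the function name adjust_disparity and the variable names are hypothetical.

```python
import numpy as np

def adjust_disparity(disp: np.ndarray, mask_up: np.ndarray) -> np.ndarray:
    """Adjust the candidate disparity map with the up-sampled portrait mask (sketch)."""
    target = disp.copy()
    portrait = mask_up > 0                 # second portrait area (assumed non-empty)
    d = float(disp[portrait].mean())       # disparity mean value of the second portrait area

    # Set all first disparity values in the portrait to the mean, ensuring consistency.
    target[portrait] = d

    # In the non-portrait area, screen out second disparity values equal to the mean
    # and set them to the preset fourth disparity value d - 10, so that they are not
    # mistaken for the portrait during background blurring.
    non_portrait = ~portrait
    target[non_portrait & (disp == d)] = d - 10
    return target
```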
In a specific embodiment, after the target disparity map corresponding to the image to be processed is obtained, the method further includes:
step S400, performing background blurring on the image to be processed according to the target disparity map.
Specifically, in this embodiment, after the disparity consistency and the portrait edges in the candidate disparity map are adjusted through the portrait mask map to obtain the target disparity map corresponding to the image to be processed, the target disparity map is used to perform background blurring on the image to be processed, which reduces false blurring and missed blurring of the portrait and thereby improves the blurring effect of the image to be processed.
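One way such disparity-guided background blurring is commonly realized (not a procedure prescribed by this application) is to blur each background pixel more strongly the further its disparity lies from the portrait disparity. The sketch below uses OpenCV Gaussian blurs at a few illustrative strengths; every name and threshold in it is an assumption.

```python
import cv2
import numpy as np

def background_blur(image: np.ndarray, target_disp: np.ndarray, d: float) -> np.ndarray:
    """Blur the background in proportion to its disparity distance from the portrait disparity d."""
    # Pre-compute a few blur levels: sharp, mild, strong.
    levels = [image,
              cv2.GaussianBlur(image, (9, 9), 0),
              cv2.GaussianBlur(image, (25, 25), 0)]
    # Distance of each pixel's disparity from the portrait disparity.
    dist = np.abs(target_disp - d)
    # Map the distance to a blur-level index (0 keeps the pixel sharp); thresholds are illustrative.
    idx = np.digitize(dist, bins=[1.0, 5.0])
    out = np.empty_like(image)
    for level, blurred in enumerate(levels):
        sel = idx == level
        out[sel] = blurred[sel]
    return out
```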
In summary, the present embodiment provides a disparity map generation method. The method includes: acquiring an image to be processed and an auxiliary image corresponding to the image to be processed, and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image; inputting the image to be processed into a portrait segmentation model, and outputting a portrait mask map corresponding to the image to be processed through the portrait segmentation model; and adjusting the candidate disparity map according to the portrait mask map to obtain a target disparity map corresponding to the image to be processed. By optimizing the consistency of the disparity values of the portrait in the disparity map through the portrait mask map, the method reduces false blurring and missed blurring of the portrait; at the same time, because the disparity map preserves the gradual change of depth, the blurred image obtained by blurring has a gradually varying blurring effect, so the blurring effect of the image to be processed is improved.
Based on the above disparity map generation method, the present embodiment provides a disparity map generation apparatus, as shown in fig. 4, including:
an obtaining module 410, configured to obtain an image to be processed and an auxiliary image corresponding to the image to be processed, and determine a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image;
the segmentation module 420 is configured to input the image to be processed into a portrait segmentation model, and output a portrait mask image corresponding to the image to be processed through the portrait segmentation model;
and the adjusting module 430 is configured to adjust the candidate disparity map according to the portrait mask map to obtain a target disparity map corresponding to the image to be processed.
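As a rough illustration of how the three modules of the apparatus might be composed in code, a hypothetical sketch is given below; the class name DisparityMapGenerator and all attribute and method names are illustrative and not part of this application.

```python
class DisparityMapGenerator:
    """Hypothetical composition of the obtaining, segmentation and adjusting modules."""

    def __init__(self, obtaining_module, segmentation_module, adjusting_module):
        self.obtaining_module = obtaining_module        # corresponds to module 410
        self.segmentation_module = segmentation_module  # corresponds to module 420
        self.adjusting_module = adjusting_module        # corresponds to module 430

    def generate(self, image_to_process, auxiliary_image):
        # Determine the candidate disparity map from the two images.
        candidate_disp = self.obtaining_module(image_to_process, auxiliary_image)
        # Obtain the portrait mask map through the portrait segmentation model.
        portrait_mask = self.segmentation_module(image_to_process)
        # Adjust the candidate disparity map to obtain the target disparity map.
        return self.adjusting_module(candidate_disp, portrait_mask)
```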
Based on the above-described disparity map generation method, the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the disparity map generation method according to the above-described embodiment.
Based on the disparity map generation method, the present application further provides a terminal device, as shown in fig. 5, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory) 22, and may further include a communication Interface (Communications Interface) 23 and a bus 24. The processor 20, the display 21, the memory 22 and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
The memory 22, as a computer-readable storage medium, may be configured to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present application. The processor 20 performs functional applications and data processing, that is, implements the methods in the above embodiments, by running the software programs, instructions or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. In addition, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory, for example, a variety of media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; it may also be a transitory storage medium.
In addition, the specific working processes of the above disparity map generation apparatus and storage medium, and the specific process by which the processor of the terminal device loads and executes the plurality of instructions, have been described in detail in the method above and are not repeated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (19)

1. A disparity map generation method, comprising:
acquiring an image to be processed and an auxiliary image corresponding to the image to be processed, and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image;
inputting the image to be processed into a portrait segmentation model, and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model;
and adjusting the candidate parallax map according to the portrait mask map to obtain a target parallax map corresponding to the image to be processed.
2. The disparity map generation method according to claim 1, wherein the image to be processed and the auxiliary image are captured by an imaging module, wherein the imaging module comprises at least a main imager and an auxiliary imager; the main imager is used for shooting the image to be processed, and the auxiliary imager is used for shooting the auxiliary image.
3. The method according to claim 1, wherein the determining the candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image specifically comprises:
respectively adjusting the image size of the image to be processed and the image size of the auxiliary image to preset image sizes to obtain an adjusted image to be processed and an adjusted auxiliary image;
and determining a candidate disparity map corresponding to the image to be processed according to the adjusted image to be processed and the adjusted auxiliary image.
4. The disparity map generation method according to claim 1, wherein the human image segmentation model includes a first feature extraction module, a second feature extraction module, a third feature extraction module, a fourth feature extraction module, and a fifth feature extraction module, an output item of the first feature extraction module is an input item of the second feature extraction module, an output item of the second feature extraction module is an input item of the third feature extraction module, an output item of the third feature extraction module is an input item of the fourth feature extraction module, and an output item of the fourth feature extraction module is an input item of the fifth feature extraction module.
5. The disparity map generation method according to claim 4, wherein the first feature extraction module includes a first convolution layer, a first feature extraction unit, a second feature extraction unit, and a third feature extraction unit, which are sequentially cascaded, the second feature extraction module includes a fourth feature extraction unit, a fifth feature extraction unit, and a sixth feature extraction unit, which are connected in parallel, an output item of the first convolution layer is an input item of the first feature extraction unit, an output item of the first feature extraction unit is an input item of the second feature extraction unit and the fourth feature extraction unit, an output item of the second feature extraction unit is an input item of the third feature extraction unit and the fifth feature extraction unit, and an output item of the third feature extraction unit is an input item of the sixth feature extraction unit.
6. The disparity map generation method according to claim 5, wherein the third feature extraction module and the fifth feature extraction module each include a plurality of seventh feature extraction units and a plurality of first feature fusion units that are sequentially cascaded and arranged at intervals, the fourth feature extraction module comprises a plurality of seventh feature extraction units and a plurality of second feature fusion units which are sequentially cascaded and arranged at intervals, the output items of all the feature extraction units in the second feature extraction module are horizontal input items of all the seventh feature extraction units in the third feature extraction module respectively, the output items of the seventh feature extraction units in the third feature extraction module are horizontal input items of the seventh feature extraction units in the fourth feature extraction module respectively, the output items of the seventh feature extraction units in the fourth feature extraction module are horizontal input items of the seventh feature extraction units in the fifth feature extraction module respectively, the input items of the first feature fusion units in the third feature extraction module and the fifth feature extraction module are respectively the output items of the previous seventh feature extraction unit adjacent to the input items, the output item of each first feature fusion unit in the third feature extraction module and the fifth feature extraction module is a vertical input item of a next seventh feature extraction unit adjacent to the output item, the input items of the second feature fusion units in the fourth feature extraction module are respectively the output items of the previous seventh feature extraction unit adjacent to the input items, and the output item of each second feature fusion unit in the fourth feature extraction module is a vertical input item of a next seventh feature extraction unit adjacent to the output item.
7. The disparity map generation method according to claim 6, wherein the seventh feature extraction unit includes a third feature fusion layer, a plurality of second convolution layers, a first normalization layer, a fourth feature fusion layer, and a first activation layer, which are sequentially cascaded, wherein a second normalization layer, a second activation layer, and a regularization layer are sequentially cascaded between two adjacent convolution layers of the plurality of second convolution layers, an output item of the third feature fusion layer is an input item of the plurality of second convolution layers, an output item of the plurality of second convolution layers is an input item of the first normalization layer, an output item of the third feature fusion layer and an output item of the first normalization layer are input items of the fourth feature fusion layer, and an output item of the fourth feature fusion layer is an input item of the first activation layer.
8. The disparity map generation method according to any one of claims 1 to 7, wherein the training process of the human image segmentation model specifically includes:
inputting training images in a preset training image set into a preset network model, and determining a predicted portrait mask image corresponding to the training images through the preset network model; wherein the training image set comprises the training images and real portrait mask images corresponding to the training images;
training the preset network model based on the predicted portrait mask image, the real portrait mask image and the loss function of the preset network model to obtain a portrait segmentation model; wherein the loss functions include a first loss function for segmentation of the portrait and a second loss function for optimization of edges of the portrait.
9. The disparity map generation method according to claim 8, wherein the second loss function is constructed by:
respectively performing a dilation operation and an erosion operation on the real portrait mask image, and determining the weight corresponding to the second loss function based on the dilated real portrait mask image and the eroded real portrait mask image;
and constructing the second loss function based on the predicted portrait mask image, the real portrait mask image and the weight corresponding to the second loss function.
10. The disparity map generation method according to any one of claims 1 to 7, wherein after outputting the portrait mask map corresponding to the image to be processed by the portrait segmentation model, the method further comprises:
acquiring a plurality of first portrait areas of the portrait mask image and the number of pixels corresponding to each first portrait area;
and preprocessing the human image mask image according to the number of the pixels.
11. The disparity map generation method according to claim 10, wherein the preprocessing the human image mask map according to the number of pixels specifically comprises:
determining the pixel proportion corresponding to each first portrait area according to the number of the pixels;
and comparing the pixel proportion with a preset threshold value, and removing the first portrait area corresponding to the pixel proportion from the portrait mask image when the pixel proportion is smaller than the preset threshold value.
12. The method for generating a disparity map according to any one of claims 1 to 7, wherein the adjusting the candidate disparity map according to the portrait mask map to obtain the target disparity map corresponding to the image to be processed specifically includes:
adjusting the image size of the portrait mask image to be the same as the image size of the candidate parallax image, and determining a second portrait area of the candidate parallax image according to the adjusted portrait mask image;
and acquiring a plurality of first parallax values corresponding to the second portrait area, and adjusting the candidate parallax map according to the plurality of first parallax values to obtain a target parallax map corresponding to the image to be processed.
13. The method for generating a disparity map according to claim 12, wherein the adjusting the candidate disparity map according to the plurality of first disparity values to obtain the target disparity map corresponding to the image to be processed specifically comprises:
determining a parallax mean value corresponding to the second portrait area according to the plurality of first parallax values;
and adjusting the candidate disparity map according to the disparity average value to obtain a target disparity map corresponding to the image to be processed.
14. The method of claim 13, wherein the adjusting the candidate disparity map according to the disparity mean specifically comprises:
and setting a plurality of first parallax values corresponding to the second portrait area as the parallax average value.
15. The method of claim 13, wherein the adjusting the candidate disparity map according to the disparity mean specifically comprises:
acquiring a plurality of second parallax values corresponding to the non-portrait areas of the candidate parallax map;
and comparing the plurality of second parallax values with the parallax average value, screening a plurality of third parallax values which are equal to the parallax average value from the plurality of second parallax values, and setting the plurality of third parallax values as preset fourth parallax values.
16. The disparity map generation method according to any one of claims 1 to 7, wherein the step of obtaining the target disparity map corresponding to the image to be processed includes:
and performing background blurring on the image to be processed according to the target disparity map.
17. A disparity map generation device, comprising:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an image to be processed and an auxiliary image corresponding to the image to be processed and determining a candidate disparity map corresponding to the image to be processed according to the image to be processed and the auxiliary image;
the segmentation module is used for inputting the image to be processed into a portrait segmentation model and outputting a portrait mask image corresponding to the image to be processed through the portrait segmentation model;
and the adjusting module is used for adjusting the candidate parallax map according to the portrait mask map to obtain a target parallax map corresponding to the image to be processed.
18. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps in the disparity map generating method according to any one of claims 1 to 16.
19. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the disparity map generating method according to any of claims 1-16.
CN202110584249.1A 2021-05-27 2021-05-27 Disparity map generation method and device, storage medium and terminal equipment Pending CN115409759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584249.1A CN115409759A (en) 2021-05-27 2021-05-27 Disparity map generation method and device, storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN115409759A true CN115409759A (en) 2022-11-29

Family

ID=84154825


Similar Documents

Publication Publication Date Title
CN108898567B (en) Image noise reduction method, device and system
US11882357B2 (en) Image display method and device
CN110717851B (en) Image processing method and device, training method of neural network and storage medium
CN108921806B (en) Image processing method, image processing device and terminal equipment
US20210004962A1 (en) Generating effects on images using disparity guided salient object detection
CN109064390B (en) Image processing method, image processing device and mobile terminal
US11526995B2 (en) Robust use of semantic segmentation for depth and disparity estimation
US11276177B1 (en) Segmentation for image effects
CN111402170B (en) Image enhancement method, device, terminal and computer readable storage medium
WO2021169404A1 (en) Depth image generation method and apparatus, and storage medium
US10846560B2 (en) GPU optimized and online single gaussian based skin likelihood estimation
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN113673545A (en) Optical flow estimation method, related device, equipment and computer readable storage medium
CN111131688B (en) Image processing method and device and mobile terminal
Dutta Depth-aware blending of smoothed images for bokeh effect generation
KR20220052359A (en) Joint Depth Prediction with Dual Cameras and Dual Pixels
CN114782296A (en) Image fusion method, device and storage medium
CN114677286A (en) Image processing method and device, storage medium and terminal equipment
US20230245277A1 (en) Image restoration method and device
CN115409759A (en) Disparity map generation method and device, storage medium and terminal equipment
CN111382753A (en) Light field semantic segmentation method and system, electronic terminal and storage medium
CN115205111A (en) Image splicing method and device, terminal equipment and storage medium
CN111754411B (en) Image noise reduction method, image noise reduction device and terminal equipment
CN111383171B (en) Picture processing method, system and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination