CN117808860A - Image processing method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117808860A
CN117808860A
Authority
CN
China
Prior art keywords: image, parallax, sample, eye, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311870810.8A
Other languages
Chinese (zh)
Inventor
吴若溪
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202311870810.8A
Publication of CN117808860A


Landscapes

  • Image Processing (AREA)

Abstract

The application discloses an image processing method, an image processing apparatus, an electronic device and a storage medium. The image processing method includes: acquiring an original image as a first eye image; acquiring a depth image corresponding to the first eye image; generating a target parallax image based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image; and performing post-processing on the target parallax image to obtain a second eye image corresponding to the first eye image. The method can generate, for a monocular image, an image for the other eye with parallax, so that the field of view corresponding to the image content can be expanded when the image content is displayed.

Description

Image processing method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing device, an electronic device, and a storage medium.
Background
With the rapid progress of technology and living standards, electronic devices (such as smartphones and tablet computers) have become some of the most commonly used electronic products in daily life. Since electronic devices generally have a photographing function, people often take photos and videos with them; however, in the related art, the field of view of the content in images captured by electronic devices is usually limited.
Disclosure of Invention
The application provides an image processing method, an image processing apparatus, an electronic device and a storage medium, which can generate, for a monocular image, an image for the other eye with parallax, so that the field of view corresponding to the image content can be expanded when the image content is displayed.
In a first aspect, an embodiment of the present application provides an image processing method, including: acquiring an original image as a first eye image; acquiring a depth image corresponding to the first eye image; generating a target parallax image based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image; and performing post-processing on the target parallax image to obtain a second eye image corresponding to the first eye image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including a first image acquisition module, a second image acquisition module, a parallax image generation module and an image post-processing module. The first image acquisition module is used for acquiring an original image as a first eye image; the second image acquisition module is used for acquiring a depth image corresponding to the first eye image; the parallax image generation module is used for generating a target parallax image based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image; and the image post-processing module is used for performing post-processing on the target parallax image to obtain a second eye image corresponding to the first eye image.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the image processing method provided in the first aspect above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein program code that is callable by a processor to perform the image processing method provided in the first aspect described above.
According to the scheme provided by the application, an original image is acquired as a first eye image; a depth image corresponding to the first eye image is acquired; a target parallax image is generated based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image; and post-processing is performed on the target parallax image to obtain a second eye image corresponding to the first eye image. In this way, an image for the other eye with parallax can be generated from a monocular image, so that the field of view corresponding to the image content can be expanded when the image content is displayed. Moreover, because the parallax image obtained for the monocular image is further post-processed, the accuracy and quality of the resulting image for the other eye can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flow diagram of an image processing method according to an embodiment of the present application.
Fig. 2 shows a schematic diagram of a depth estimation model according to an embodiment of the present application.
Fig. 3 shows a flow diagram of an image processing method according to another embodiment of the present application.
Fig. 4 shows a schematic diagram of affine transformation provided by an embodiment of the present application.
Fig. 5 shows another schematic diagram of affine transformation provided by an embodiment of the present application.
Fig. 6 shows a flow diagram of an image processing method according to a further embodiment of the present application.
Fig. 7 is a schematic diagram of bilinear interpolation processing according to an embodiment of the present application.
Fig. 8 shows a schematic view of an effect provided by an embodiment of the present application.
Fig. 9 shows a schematic view of an effect provided by an embodiment of the present application.
Fig. 10 shows a flow diagram of an image processing method according to a further embodiment of the present application.
Fig. 11 shows a flow diagram of an image processing method according to yet another embodiment of the present application.
Fig. 12 shows a schematic view of a scenario provided in an embodiment of the present application.
Fig. 13 shows a block diagram of an image processing apparatus according to an embodiment of the present application.
Fig. 14 is a block diagram of an electronic device for performing an image processing method according to an embodiment of the present application.
Fig. 15 shows a storage unit for storing or carrying program code for implementing the image processing method according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
With the development of technology, electronic devices are usually provided with cameras to realize a photographing function. In daily life, electronic devices (such as smartphones and tablet computers) are nearly ubiquitous, and the camera module has become one of their main functional points: a user can take photos and videos through the camera function of the electronic device conveniently and quickly, and can also upload the captured images to the Internet to share them with others.
When shooting with an electronic device, the field of view of the imaging is usually limited, so that the captured image, when later used for presentation, shows only content within that limited field of view. In addition, with the application of Virtual Reality (VR), Augmented Reality (AR) and Extended Reality (XR) head-mounted display devices (e.g., smart glasses) in home scenarios, electronic devices and head-mounted display devices are often linked. When an electronic device and a head-mounted display device are linked, there is generally a scenario in which the electronic device transmits captured content to the head-mounted display device for display. In such a scenario, if the head-mounted display device directly displays the captured monocular image, only a 2D display effect can be achieved, and the field of view of the displayed image content is limited.
In the related art, in order to solve the problem that the field of view of a captured monocular image is limited, field-of-view completion may be performed using a known monocular image and depth map, thereby expanding the field of view of the presented image content. However, the accuracy and precision of field-of-view completion from a monocular image are usually insufficient, and the completed image may exhibit shadows, fractures, severe texture errors and the like, leading to significant errors when the image is displayed by a device.
In order to solve the above problems, the inventor proposes the image processing method, apparatus, electronic device and storage medium of the embodiments of the present application, which can generate an image for the other eye with parallax from a monocular image, so as to expand the field of view corresponding to the image content when it is displayed; furthermore, after the parallax image is obtained for the monocular image, post-processing is performed on the parallax image, so that the accuracy and quality of the obtained image for the other eye can be improved. The specific image processing method is described in detail in the following embodiments.
The image processing method provided in the embodiment of the present application will be described in detail with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application. In a specific embodiment, the image processing method is applied to an image processing apparatus 600 as shown in fig. 13 and to an electronic device 100 (fig. 14) provided with the image processing apparatus 600. The following describes the specific flow of this embodiment taking an electronic device as an example; it will be understood that the electronic device applied in this embodiment may be a smartphone, a tablet computer, an e-book reader, a head-mounted display device, etc., which is not limited herein. The flowchart shown in fig. 1 is described in detail below; the image processing method may specifically include the following steps:
Step S110: acquire an original image as a first eye image.
In the embodiment of the application, the electronic device may acquire an original image and use it as the first eye image, so as to generate the image for the other eye from the original image, thereby completing the other eye's view and expanding the field of view. The original image may be a monocular RGB image or one frame of a video; the original image may be a left eye image or a right eye image.
In some embodiments, the electronic device may obtain the original image by photographing a real scene and take the photographed image as the first eye image. The electronic device may be a mobile terminal provided with a camera, such as a smartphone, a tablet computer or a smartwatch, and may capture images through a front camera or a rear camera to obtain the above original image; for example, the electronic device may capture an image through the rear camera and use the captured image as the original image.
In some embodiments, the electronic device may obtain the original image locally, that is, from a locally stored file. For example, when the electronic device is a mobile terminal, it may obtain the original image from an album: the electronic device may capture an image through a camera in advance, or download an image from a network in advance, store it in the local album, and then read the original image from the album when field-of-view completion is needed.
In some embodiments, when the electronic device is a mobile terminal or a computer, the original image may also be downloaded from a network, for example, the electronic device may download the required original image from a corresponding server through a wireless network, a data network, or the like. When the electronic device is a head-mounted display device, images transmitted by other devices can be received and used as original images.
Of course, the specific manner in which the electronic device acquires the original image is not limited.
Step S120: acquire a depth image corresponding to the first eye image.
In the embodiment of the present application, after acquiring the first eye image, the electronic device may acquire a depth image corresponding to it, so as to generate an image with parallax relative to the first eye image according to the depth image. A depth image is an image that takes the distance (depth) from the image collector to each point in the scene as its pixel values. A depth image can reflect the geometry of the visible surface of an object; point cloud data can be calculated from it through coordinate conversion, and regular point cloud data with the necessary information can conversely be back-calculated into depth image data.
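As an illustration of the coordinate conversion mentioned above, the sketch below back-projects a depth image into a point cloud under an assumed pinhole camera model; the function name and the intrinsics `fx, fy, cx, cy` are assumptions for illustration, not part of the original disclosure:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into a 3-D point cloud (pinhole model).

    depth: (H, W) array of depths Z; fx, fy, cx, cy: camera intrinsics.
    Returns an (H*W, 3) array of (X, Y, Z) points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx                       # X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy                       # Y = (v - cy) * Z / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

Running the inverse direction (projecting points back onto the image plane) would recover the depth image, which is the back-calculation the paragraph above refers to.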
In some embodiments, the electronic device may input the first eye image into a pre-trained depth estimation model, and obtain the depth image output by the model as the depth image corresponding to the first eye image. The depth estimation model may be a neural-network-based depth estimation model, and may be obtained by training on a large number of sample monocular images annotated with depth images.
In one possible implementation, the above depth estimation model may be trained as follows: acquire a plurality of sample monocular images and a depth image corresponding to each sample monocular image; annotate each sample monocular image with its corresponding depth image; and train an initial model on the annotated sample monocular images to obtain the depth estimation model.
Optionally, when acquiring the above sample monocular images and their corresponding depth images, binocular images captured by a binocular camera may be obtained, and one image of each binocular pair may be used as a sample monocular image (for example, the left eye image or the right eye image); the depth image corresponding to the sample monocular image is then determined from the sample monocular image and the corresponding image for the other eye.
Alternatively, referring to fig. 2, the above depth estimation model 300 may be an encoder-decoder model, where the encoder 301 is configured to extract image features from the input image, and the decoder 302 is configured to map those features into a depth image. The encoder may be a network such as BEiT, Swin2 or ResNet, and may reduce the size of the feature map layer by layer through multiple convolution and pooling layers to capture context information at different scales, so as to better understand the texture and shape of objects. The decoder may be a recurrent neural network (RNN), a variational auto-encoder (VAE), or the like; it may upsample the feature maps extracted by the encoder and fuse feature maps from different layers by means of skip connections (also called residual connections), i.e., directly connecting an earlier layer to a later layer, so as to obtain more detailed information, reduce blurring of the depth map, and alleviate the vanishing-gradient problem in the network.
The BEiT network predicts the visual tokens of the original image based on the encoded vectors of a corrupted image; the network has two representations of the image, namely image patches and visual tokens. The image is split into a series of patches, and the patches are randomly masked (typically about 40% of them); the patches are flattened into vectors and linearly projected to obtain patch encodings, and each patch is position-encoded to obtain the position encodings of the image. Next, the patch encodings and position encodings are fused as the input of the Transformer layers; a masked image modeling (MIM) head then predicts the token representations of the masked patches. The Swin2 network is modified on the basis of the Transformer encoder network. The main difference between Swin2 and Swin1 is that the layer normalization (LN) layers are placed after the residual connections (post-normalization), which mitigates the effects of large amplitude differences in cross-layer activations and of unstable training; in addition, the attention mechanism adopts cosine attention instead of dot-product attention, avoiding saturation at extreme values. The ResNet network is a deep residual network that helps the network learn more complex feature representations by introducing residual connections, which allow gradients to pass more directly through the deep network while preserving the input information. Specifically, a residual connection in ResNet comprises an identity map, which passes the input directly to the output, and a residual map, which processes the input through a series of convolution layers and activation functions; the sum of the identity map and the residual map is the final output.
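The identity-plus-residual structure described above can be sketched in a few lines; the linear "layer" and ReLU below are simplifications chosen for illustration, not the actual ResNet convolution layers:

```python
import numpy as np

def residual_block(x, weight):
    """Toy residual map: a linear 'layer' plus ReLU, added back to the input.

    Mirrors the structure described above: output = F(x) + x, so gradients
    can flow through the identity path even when F's gradient is small.
    """
    f_x = np.maximum(weight @ x, 0.0)   # residual branch F(x): linear + ReLU
    return f_x + x                      # identity branch passes x through
```

Because the identity branch contributes 1 to the Jacobian, stacking many such blocks keeps gradients from vanishing, which is the point made in the paragraph above.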
Optionally, when training the initial model on the sample monocular images annotated with depth images, each sample monocular image may be input into the initial model to obtain a corresponding estimation result; a loss value is then determined from the estimation result of each sample monocular image and the depth image with which it is annotated, and the model parameters of the initial model are adjusted according to the calculated loss value, completing 1 epoch (round). The procedure then returns to the step of inputting each sample monocular image into the initial model, completing the next epoch, and so on for multiple epochs. Here, an epoch refers to one pass through all the sample monocular images; colloquially, the epoch count is the number of times the whole dataset is traversed, and 1 epoch equals training once with all sample monocular images.
Optionally, according to the loss value determined above, an Adam optimizer is used to update the initial model iteratively so that the loss value decreases each time until it converges; the model at that point is saved to obtain the trained depth estimation model. The Adam optimizer combines the advantages of two optimization algorithms, AdaGrad (Adaptive Gradient) and RMSProp, and computes an update step by jointly considering the first moment estimate of the gradient (i.e., the mean of the gradient) and the second moment estimate (i.e., the uncentered variance of the gradient). The end conditions of the iterative training may include: the number of training iterations reaches a target number; or the loss value satisfies a set condition. The convergence goal is to make the loss value as small as possible; an initial learning rate of 1e-3 is used, the learning rate decays with the cosine of the step number, and batch_size = 512; after several epochs of training, convergence can be considered complete. Here batch_size can be understood as a batch parameter, whose upper limit is the total number of training-set samples. The loss value satisfying the set condition may include: the total loss value is less than a set threshold. Of course, the specific training end condition is not limited here.
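As a minimal sketch of the optimizer described above, the toy loop below implements the Adam update (bias-corrected first and second moment estimates) with cosine decay from the stated initial learning rate of 1e-3; the quadratic loss and the step count are illustrative assumptions, not the actual depth-estimation loss:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad            # first moment: mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2       # second moment: uncentered variance
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy loop: fit theta to a target by minimising (theta - target)^2.
target = np.array([3.0])
theta, m, v = np.zeros(1), np.zeros(1), np.zeros(1)
total_steps = 10000
for t in range(1, total_steps + 1):
    lr = 1e-3 * 0.5 * (1 + np.cos(np.pi * t / total_steps))  # cosine decay
    grad = 2 * (theta - target)             # gradient of the quadratic loss
    theta, m, v = adam_step(theta, grad, m, v, t, lr)
```

In a real training run the scalar gradient here would be replaced by the gradient of the depth-estimation loss with respect to all model parameters, with one update per batch of 512 samples.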
Step S130: generate a target parallax image based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image.
In the embodiment of the present application, after the depth image corresponding to the first eye image is obtained, a target parallax image having parallax with respect to the first eye image may be generated based on the first eye image and the depth image, so as to obtain the image for the other eye corresponding to the first eye image. A parallax image is an image whose size is that of a reference image and whose element values are the parallax values relative to the other image of an image pair.
In some embodiments, it is considered that there is generally a correspondence between depth and parallax; for example, the correspondence may take the form d = s × (Z_max − Z) / Z_max, where Z is the depth of each pixel, Z_max is the maximum depth existing in the depth map, and s is a random variable, so that the parallax image can be determined based on the first eye image and the depth image according to this correspondence. Of course, the specific manner of generating the target parallax image based on the depth image and the first eye image is not limited; for example, the parallax image may also be generated by means of artificial intelligence (AI).
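A minimal sketch of a depth-to-parallax mapping, assuming an inverse-style relation d = s × (Z_max − Z) / Z_max built from the variables Z, Z_max and s defined above (nearer pixels get larger parallax; the exact form of the correspondence in the original disclosure may differ):

```python
import numpy as np

def depth_to_parallax(depth, s=0.05):
    """Map a depth image to a parallax image: nearer pixels shift more.

    Assumes d = s * (Z_max - Z) / Z_max, so the farthest pixel gets
    parallax 0 and the nearest pixel approaches parallax s.
    """
    z_max = depth.max()                      # maximum depth in the map
    return s * (z_max - depth) / z_max
```

With `s = 0.05` (an assumed scale), parallax values land in [0, 0.05], which could then be multiplied by the image width to obtain pixel shifts.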
Step S140: perform post-processing on the target parallax image to obtain a second eye image corresponding to the first eye image.
In the embodiment of the present application, after obtaining the target parallax image, the electronic device may further perform post-processing on it to obtain the second eye image corresponding to the first eye image. The post-processing is used to improve the precision and image quality of the target parallax image, and may include interpolation, calibration with the maximum parallax value, affine transformation and other processing; what the post-processing specifically includes is not limited, and it may include other processing that improves the precision and image quality of the target parallax image. It will be appreciated that the accuracy of an image for the other eye obtained by field-of-view completion from a monocular image in the related art is often insufficient and its image quality poor, so post-processing can be performed on the obtained target parallax image to obtain an image for the other eye with higher accuracy and image quality. If the first eye image is a left eye image, the second eye image is a right eye image; if the first eye image is a right eye image, the second eye image is a left eye image.
In some embodiments, the electronic device may post-process the target parallax image with an algorithm corresponding to each operation included in the post-processing; alternatively, an image processing model may be trained in advance for the post-processing, and the target parallax image may then be processed by that model.
According to the image processing method, an original image is acquired as a first eye image; a depth image corresponding to the first eye image is acquired; a target parallax image is generated based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image; and post-processing is performed on the target parallax image to obtain a second eye image corresponding to the first eye image. In this way, an image for the other eye with parallax can be generated from a monocular image, so that the field of view corresponding to the image content can be expanded when the image content is displayed; and because the parallax image obtained for the monocular image is further post-processed, the accuracy and quality of the resulting image for the other eye can be improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating an image processing method according to another embodiment of the present application. The image processing method is applied to the electronic device, and will be described in detail below with respect to the flowchart shown in fig. 3, where the image processing method specifically includes the following steps:
Step S210: acquire an original image as a first eye image.
Step S220: acquire a depth image corresponding to the first eye image.
Step S230: generate a target parallax image based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image.
In the embodiment of the present application, the steps S210 to S230 may refer to the content of other embodiments, which are not described herein.
Step S240: perform interpolation processing on the target parallax image to obtain a first parallax image.
In the embodiment of the application, when post-processing the obtained target parallax image, interpolation may be performed on it to obtain a first parallax image, so as to improve the parallax variation between far and near points through a nonlinear transformation. Interpolation is a mathematical method for interpolating between known data points to obtain more accurate values; in image processing, interpolation can increase the resolution of an image or improve its details. The interpolation performed on the target parallax image may be cubic interpolation, bilinear interpolation, etc. Cubic interpolation is a higher-order interpolation method that fits a cubic polynomial using the gray values of the 16 pixels around the pixel to be solved; bilinear interpolation linearly interpolates in two directions using the gray values of the four pixels adjacent to the pixel to be solved. Both can achieve good results.
In some embodiments, the electronic device may perform cubic interpolation on the target parallax image. In this case, the target parallax image may first be normalized to obtain a first intermediate parallax image; a second interpolation process is then performed on the first intermediate parallax image to obtain a second intermediate parallax image; and each pixel value of the second intermediate parallax image is adjusted according to the maximum pixel value in the target parallax image to obtain the first parallax image. Normalization means converting the image data into a standardized form with a uniform distribution and pixel values in the range 0 to 1, i.e., centering the data, which facilitates subsequent analysis and processing.
In one possible implementation, the above second interpolation process may be cubic interpolation. When performing it on the first intermediate parallax image, a search interval, i.e., the pixel range around the pixel to be solved, may be determined according to the position of the target pixel; a corresponding cubic polynomial is selected to fit the gray-level trend of the known pixel values in the search interval; the fitted cubic polynomial is differentiated and the derivative set to zero to solve for the extreme points, i.e., the possible positions of the pixel to be solved; the optimal pixel position is selected according to the positions of the extreme points and the corresponding gray values to obtain the best interpolation effect; and the optimal pixel position is then applied to the original image to obtain the above second intermediate parallax image.
In one possible embodiment, when each pixel value of the second intermediate parallax image is adjusted according to the maximum pixel value in the target parallax image, the pixel value at each pixel position in the second intermediate parallax image may be multiplied by the above maximum pixel value to obtain the first parallax image. It can be understood that the first intermediate parallax image is obtained by normalizing the target parallax image, so its pixel values lie between 0 and 1; since a pixel value in the target parallax image indicates a parallax value, the maximum pixel value corresponds to the position where parallax is largest, and multiplying the pixel value at each pixel position by this maximum pixel value restores the parallax scale, which addresses the parallax variation at near points.
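The three post-processing steps described above (normalize, interpolate, rescale by the original maximum) can be sketched as follows; bilinear interpolation stands in for the cubic interpolation of the embodiment purely for brevity, and the function names are assumptions:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation: each output pixel is a weighted average of
    its four nearest neighbours in the input image."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def post_process_parallax(parallax, out_h, out_w):
    """Sketch of the three steps above: normalise to [0, 1], interpolate,
    then rescale by the original maximum parallax value."""
    d_max = parallax.max()
    norm = parallax / d_max if d_max > 0 else parallax  # first intermediate image
    interp = bilinear_resize(norm, out_h, out_w)        # second intermediate image
    return interp * d_max                               # restore the parallax scale
```

Since the interpolation operates on normalized values, multiplying by `d_max` afterwards guarantees the output never exceeds the original maximum parallax.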
Step S250: and carrying out affine transformation on the first parallax image to obtain a second parallax image aligned with the first eye image, and taking the second parallax image as a second eye image corresponding to the first eye image.
In this embodiment of the present application, after interpolation processing is performed on the target parallax image to obtain the first parallax image, affine transformation may be performed on the first parallax image to align it with the first-eye image, obtaining a second parallax image, which may be used as the other eye image (i.e., the second-eye image) to be displayed together with the first-eye image. Affine transformation of images is an image processing technique that aims to align two or more images so that corresponding pixels can overlap while keeping the relative positional relations of the image content unchanged; in image registration, affine transformation can align images taken from different viewing angles for subsequent comparison or fusion. Thus, after the first-eye image and the second-eye image are displayed together, the user's brain can fuse them, and the user sees an image with a larger field of view than that of the first-eye image alone.
In some embodiments, considering that the above first-eye image may be a left-eye image corresponding to the left eye or a right-eye image corresponding to the right eye, the above second parallax image may correspondingly be a right-eye image or a left-eye image. When affine transformation is performed on the first parallax image to align it with the first-eye image, the first parallax image may be subjected to forward mapping (warping) or backward mapping. If the first-eye image is a left-eye image, forward mapping processing is performed on the first parallax image to obtain a second parallax image aligned with the first-eye image, which serves as the second-eye image corresponding to the first-eye image; if the first-eye image is a right-eye image, backward mapping processing may be performed on the first parallax image to obtain a second parallax image aligned with the first-eye image, which serves as the second-eye image corresponding to the first-eye image.
It can be understood that, if the first-eye image is a left-eye image, the first parallax image corresponds to the right eye, so the first parallax image needs to be aligned to the left-eye image; as shown in fig. 4, the pixel points in the first parallax image may be subjected to forward mapping processing. If the first-eye image is a right-eye image, the first parallax image corresponds to the left eye, so the first parallax image needs to be aligned to the right-eye image; as shown in fig. 5, the pixel points in the first parallax image may be subjected to backward mapping processing.
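As a hedged illustration of what forward mapping does here, the sketch below pushes each source pixel sideways by its disparity; target positions that receive no pixel are left at a sentinel value and become the holes discussed later. Backward mapping would instead, for every target pixel, pull from `column - disparity` in the source. All names are illustrative, not the patent's implementation:

```python
import numpy as np

def forward_warp(image, disparity):
    """Forward mapping (warping) sketch: each source pixel is pushed to a
    new column offset by its disparity value. Positions that receive no
    pixel keep the sentinel value -1.0 (the 'holes')."""
    h, w = image.shape
    out = np.full((h, w), -1.0)
    for y in range(h):
        for x in range(w):
            nx = x + int(round(disparity[y, x]))
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    return out
```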
According to the image processing method, another eye image with parallax can be generated from a monocular image, so that the field of view corresponding to the image content can be expanded when the image content is displayed. In addition, interpolating the parallax image obtained for the monocular image improves the accuracy of the resulting other eye image, and applying affine transformation to the parallax image aligns the resulting other eye image with the original monocular image, so that the expanded field-of-view content can be displayed accurately.
Referring to fig. 6, fig. 6 is a flowchart illustrating an image processing method according to another embodiment of the present application. The image processing method is applied to the electronic device, and will be described in detail below with respect to the flowchart shown in fig. 6, where the image processing method specifically includes the following steps:
Step S310: an original image is acquired as the first-eye image.
Step S320: and acquiring a depth image corresponding to the first eye image.
Step S330: and generating a target parallax image based on the depth image and the first-eye image, wherein parallax exists between the target parallax image and the first-eye image.
Step S340: and carrying out interpolation processing on the target parallax image to obtain a first parallax image.
Step S350: and carrying out affine transformation on the first parallax image to obtain a second parallax image aligned with the first eye image.
In the embodiment of the present application, the steps S310 to S350 may refer to the content of other embodiments, which are not described herein.
Step S360: and carrying out hole filling processing on the second parallax image based on the occlusion positions in the second parallax image, and taking the hole-filled second parallax image as the second-eye image corresponding to the first-eye image.
In this embodiment of the present application, after affine transformation is performed on the first parallax image to obtain the second parallax image aligned with the first-eye image, it should be considered that, during the affine transformation, regions visible in the second parallax image but occluded in the first-eye image have no pixels mapped from the first-eye image, so holes may appear in those regions. Therefore, after the above second parallax image is obtained, hole filling processing may be performed on it based on the occlusion positions, so as to avoid problems such as fractures, shadows, texture errors, and unclear textures in the image, thereby improving the image quality of the second parallax image.
In some embodiments, the electronic device performing hole filling processing on the second parallax image based on the occlusion positions in the second parallax image may include: performing first interpolation processing on the occlusion positions in the second parallax image; and filling target pixel values in the other regions of the second parallax image apart from the occlusion positions.
In the above embodiment, an occlusion position in the second parallax image refers to a hole that appears because no corresponding pixel point in the first-eye image was mapped during the affine transformation. When the electronic device performs hole filling processing on the second parallax image based on the occlusion positions, the occlusion positions may be identified from the pixel value of each pixel point and the pixel points in its neighbouring region; for example, a region whose pixel values are black while the pixels in the adjacent region are not black may be identified as an occlusion position. After the occlusion positions are identified, the other regions of the second parallax image apart from the occlusion positions can be determined; these other regions can be regarded as regions prone to deformation. When hole filling processing is performed on the second parallax image, interpolation processing may be applied to the identified occlusion positions, and the target pixel values are filled into the above other regions.
In one possible embodiment, the first interpolation processing performed on the occlusion positions in the second parallax image may be bilinear interpolation processing. Bilinear interpolation approximates the gray value between pixels of an image with a bilinear polynomial: the gray value of a new pixel is computed as a weighted combination of the gray values of the four surrounding pixel points. The formula of bilinear interpolation processing is as follows:
Q11 = (x1, y1), Q12 = (x1, y2)
Q21 = (x2, y1), Q22 = (x2, y2)
f(x, y) = f(Q11)(x2 - x)(y2 - y) + f(Q21)(x - x1)(y2 - y) + f(Q12)(x2 - x)(y - y1) + f(Q22)(x - x1)(y - y1)
where f(x, y) represents the gray value of the target pixel; f(Q11), f(Q21), f(Q12) and f(Q22) respectively represent the gray values of the four surrounding pixel points; and (x2 - x)(y2 - y), (x - x1)(y2 - y), (x2 - x)(y - y1), (x - x1)(y - y1) are the weights formed from the differences between the position coordinates of the target pixel and those of the four surrounding points (on a unit-spaced grid, where x2 - x1 = y2 - y1 = 1, the usual normalization factor equals 1). Referring to fig. 7, the formula performs linear interpolation in the two directions separately and sums the results. Specifically, for the gray value of the target pixel, four weights are computed from the gray values of the four pixel points and their distances to the target pixel, and the gray value of the target pixel is then computed from these four weights and the corresponding gray values. This process is performed once for each pixel at the occlusion positions, yielding a new gray-value distribution for the occlusion positions in the second parallax image.
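The formula above can be transcribed directly; this sketch assumes a unit-spaced grid so no normalization factor is needed, and the parameter names simply mirror the symbols in the formula:

```python
def bilinear(f11, f21, f12, f22, x1, x2, y1, y2, x, y):
    """Bilinear interpolation on a unit-spaced grid (x2 - x1 = y2 - y1 = 1).
    f11 = f(Q11) at (x1, y1), f21 = f(Q21) at (x2, y1),
    f12 = f(Q12) at (x1, y2), f22 = f(Q22) at (x2, y2)."""
    return (f11 * (x2 - x) * (y2 - y)
            + f21 * (x - x1) * (y2 - y)
            + f12 * (x2 - x) * (y - y1)
            + f22 * (x - x1) * (y - y1))
```

At a grid corner the formula reduces to the corner's own gray value, and at the centre of the cell it returns the average of the four corners, which is a quick sanity check on the weights.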
In one possible implementation, filling the target pixel values into the above other regions of the second parallax image may mean filling the pixel points in those regions with black pixel values; alternatively, the colour information of adjacent pixels in the background region may be used to fill these regions. It can be understood that, in addition to the holes appearing in the occlusion regions, there may be a small number of blank pixels that can be regarded as noise points, so the above filling can improve the image quality.
For example, referring to fig. 8, fig. 8 shows a schematic diagram of the other eye image (second-eye image) obtained by the image processing method provided by the embodiment of the present application for an image (first-eye image) captured in an indoor scene; it can be seen that, for the image captured in the indoor scene, the field of view at the edges is completed and an image of higher quality is obtained. Referring to fig. 9, fig. 9 shows a schematic diagram of the other eye image (second-eye image) obtained for an image (first-eye image) captured in an outdoor scene; likewise, the field of view at the edges is completed and an image of higher quality is obtained.
According to the image processing method, another eye image with parallax can be generated from a monocular image, so that the field of view corresponding to the image content can be expanded when the image content is displayed. Interpolating the parallax image obtained for the monocular image improves the accuracy of the resulting other eye image; applying affine transformation to the parallax image aligns it with the original monocular image, so that the expanded field-of-view content can be displayed accurately; and performing hole filling processing after the affine transformation further improves the image quality of the finally obtained other eye image.
Referring to fig. 10, fig. 10 is a flowchart illustrating an image processing method according to still another embodiment of the present application. The image processing method is applied to the electronic device, and will be described in detail below with respect to the flowchart shown in fig. 10, where the image processing method specifically includes the following steps:
Step S410: an original image is acquired as the first-eye image.
Step S420: and acquiring a depth image corresponding to the first eye image.
In the embodiment of the present application, step S410 and step S420 may refer to the content of the foregoing embodiment, and are not described herein.
Step S430: and inputting the depth image and the first-eye image into a pre-trained parallax estimation model to obtain the parallax image output by the parallax estimation model, wherein the parallax estimation model is trained in advance on a first sample image and a depth image corresponding to the first sample image, the first sample image is labelled with a corresponding second sample image, and parallax exists between the second sample image and the first sample image.
In the embodiment of the present application, when the target parallax image is generated based on the above depth image and the first-eye image, the depth image and the first-eye image may be input into a pre-trained parallax estimation model, and the parallax image output by the parallax estimation model is obtained. The parallax estimation model may be a convolutional neural network. Illustratively, the parallax estimation model includes an encoding network and a decoding network: the image input to the encoding network passes through convolution, batch normalization (BN) and a ReLU activation function, and image features are output; the decoding network applies convolution, batch normalization and ReLU activation to the input image features, and outputs the parallax image after several residual blocks and a convolution layer.
In some embodiments, the parallax estimation model is trained by: acquiring a plurality of sample image pairs collected by a binocular camera; taking the sample image corresponding to a first eye in the plurality of sample image pairs as the first sample image and the image corresponding to a second eye as the second sample image; for each first sample image, labelling it with the corresponding second sample image and acquiring the depth image corresponding to the first sample image, to obtain a sample image set; and training an initial estimation model based on the sample image set to obtain the parallax estimation model. The plurality of sample image pairs collected by the binocular camera may be obtained by photographing different scenes. Since each sample image pair is collected by a binocular camera, parallax exists between the two sample images in a pair; that is, each pair includes a left-eye sample image corresponding to the left eye and a right-eye sample image corresponding to the right eye. One image of the pair may be used as the first sample image input to the model, and the other as the label of the first sample image, so that the model is constrained to generate accurate parallax images.
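A minimal sketch of this training-set construction, where `depth_fn` stands in for the pre-trained depth estimation model mentioned below and all names are illustrative assumptions:

```python
def build_sample_set(image_pairs, depth_fn):
    """Each binocular pair yields one training example: the input is the
    first-eye image plus its depth image, and the label is the other
    eye's image. Here the left eye plays the first-eye role."""
    samples = []
    for left_img, right_img in image_pairs:
        samples.append({
            "input": left_img,            # first sample image, fed to the model
            "depth": depth_fn(left_img),  # depth image for the first sample image
            "label": right_img,           # second sample image, used as the label
        })
    return samples
```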
In some embodiments, training an initial estimation model based on the sample image set to obtain the disparity estimation model may include: inputting the first sample image in the sample image set and the depth image corresponding to the first sample image into an initial estimation model to obtain a parallax estimation image output by the initial estimation model; determining a loss value based on a difference between the second sample image noted by the first sample image and the parallax estimation image; and based on the loss value, carrying out iterative updating on the initial estimation model to obtain the parallax estimation model.
In the above embodiment, when the depth image corresponding to the first sample image is acquired, the manner of acquiring the depth image may be the same as that of acquiring the depth image corresponding to the first-eye image in the foregoing embodiment. For example, the first sample image may be input into a pre-trained depth estimation model to obtain the depth image corresponding to the first sample image output by the depth estimation model.
In one possible implementation, after the first sample image and its corresponding depth image are input into the above initial estimation model, the initial estimation model may output a parallax estimation image corresponding to the first sample image; the loss value may then be determined from the difference between the parallax estimation image output by the initial estimation model and the label corresponding to the first sample image (i.e., the second sample image with which the first sample image is labelled). Alternatively, the above loss value may be determined by means of an L2 loss calculation.
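One plausible reading of the "L2 loss calculation" mentioned above is the mean squared error between the disparity estimate and the labelled second sample image; this is a hedged sketch, not necessarily the exact formulation used in the patent:

```python
import numpy as np

def l2_loss(pred, label):
    """Mean-squared-error (L2) loss between the parallax estimation image
    and the labelled second sample image."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.mean((pred - label) ** 2))
```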
In one possible implementation, when the initial estimation model is iteratively updated according to the obtained loss value, the model parameters of the initial estimation model may be adjusted according to the calculated loss value; the process then returns to the step of inputting the first sample images in the sample image set and their corresponding depth images into the initial estimation model to obtain the parallax estimation images output by the model, until the training ending condition is satisfied and the trained parallax estimation model is obtained.
That is, each first sample image and its depth image are input into the initial estimation model to obtain the parallax estimation image corresponding to each first sample image; a loss value is determined from the parallax estimation images and the second sample images labelled for the first sample images, and the model parameters of the initial estimation model are adjusted according to the calculated loss value, completing 1 epoch (round). The process then returns to the step of inputting the first sample images and their depth images into the initial estimation model to obtain the parallax estimation images, completing the next epoch, and so on for a plurality of epochs. Here, an epoch refers to one use of all the first sample images and their depth images; colloquially, the number of epochs is the number of passes over the entire dataset, and 1 epoch equals training once with all the first sample images and their depth images.
In some embodiments, an Adam optimizer may be used to iteratively update the initial estimation model according to the loss value, so that the loss obtained in each round decreases until the loss value converges; the model at that point is saved as the trained parallax estimation model. The Adam optimizer combines the advantages of the AdaGrad and RMSProp optimization algorithms, taking both the first-moment and second-moment estimates of the gradient into account to compute the update step size. The training ending condition of the iterative training may include: the number of training iterations reaching a target number; or the above loss value satisfying a set condition.
Optionally, the convergence condition is that the target loss value becomes as small as possible. An initial learning rate of 1e-3 may be used, with the learning rate decaying according to a cosine schedule over the training steps, and batch_size = 512; after training for a number of epochs, the model can be considered converged. Here batch_size can be understood as the batch parameter, whose upper limit is the total number of samples in the training set.
Alternatively, the loss value satisfying the set condition may include: the loss value being smaller than a set threshold. Of course, the specific training ending condition is not limited herein.
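A skeleton of the iterative-update loop with the two ending conditions named above (a target number of iterations, or the loss falling below a set threshold); `model_step` is a hypothetical callable that runs one epoch (forward pass, loss computation, parameter update) and returns that epoch's loss:

```python
def train(model_step, max_epochs=100, loss_threshold=1e-3):
    """Iterate until the epoch count reaches max_epochs or the loss
    satisfies the set condition (falls below loss_threshold)."""
    for epoch in range(max_epochs):
        loss = model_step(epoch)
        if loss < loss_threshold:
            break  # loss value satisfies the set condition
    return epoch + 1, loss
```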
In one possible implementation, if the parallax estimation model to be obtained should output the parallax image corresponding to the right eye given an input left-eye image and its depth image (that is, the above original image is a left-eye image corresponding to the left eye and the obtained second-eye image is a right-eye image), the above first sample image may be the left-eye image in a sample image pair and the second sample image the right-eye image. Conversely, if the parallax estimation model should output the parallax image corresponding to the left eye given an input right-eye image and its depth image (that is, the original image is a right-eye image corresponding to the right eye and the obtained second-eye image is a left-eye image), the first sample image may be the right-eye image in a sample image pair and the second sample image the left-eye image.
In one possible implementation, the above initial estimation model includes a generation network and a discrimination network, where the generation network is used to output the corresponding parallax estimation image from each input first sample image and its depth image. That is, the initial estimation model may be a generative adversarial network (Generative Adversarial Network, GAN), a deep learning model whose framework contains at least two models: a generative model and a discriminative model, which learn through a mutual game to produce a reasonably good output. During GAN training, the goal of the generation network is to generate parallax images realistic enough to deceive the discrimination network, while the goal of the discrimination network is to distinguish the pictures generated by the generation network from real pictures as well as possible, forming a dynamic game between the two networks. In the training process, the parallax estimation image output by the generation network may be input into the discrimination network, and the second sample image labelled for the first sample image may also be input into the discrimination network, so as to obtain the discrimination results of the discrimination network for the generated parallax estimation image and for the real second sample image; the discrimination loss value of the discrimination network is then determined from these two discrimination results.
In addition, a loss value may be determined from the difference between the second sample image labelled for the first sample image and the parallax estimation image, and this loss value serves as the generation loss value. The total loss value of the initial estimation model is determined based on the discrimination loss value and the generation loss value, and the initial estimation model is iteratively trained based on the total loss value until training ends; the generation network obtained at that point is taken as the parallax estimation model in the embodiment of the present application.
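The patent does not give the rule for combining the two losses into the total loss; the simple weighted sum below, including the `weight` parameter, is an assumption for illustration only:

```python
def total_loss(generation_loss, discrimination_loss, weight=1.0):
    """Combine the generation loss (e.g. L2 difference between the
    parallax estimation image and the labelled second sample image) with
    the adversarial discrimination loss as a weighted sum."""
    return generation_loss + weight * discrimination_loss
```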
Step S440: and carrying out post-processing on the target parallax image to obtain the second-eye image corresponding to the first-eye image.
In the embodiment of the present application, step S440 may refer to the content of other embodiments, which is not described herein.
According to the image processing method, another eye image with parallax can be generated from a monocular image, so that the field of view corresponding to the image content can be expanded when the image content is displayed; after the parallax image is obtained for the monocular image, post-processing it further improves the accuracy and quality of the obtained other eye image. In addition, when the parallax image is acquired, generating it from the monocular image with a pre-trained parallax estimation model can further improve the accuracy of the obtained other eye image.
Referring to fig. 11, fig. 11 is a flowchart illustrating an image processing method according to still another embodiment of the present application. The image processing method is applied to the electronic device, and will be described in detail below with respect to the flowchart shown in fig. 11, where the image processing method specifically includes the following steps:
Step S510: an original image is acquired as the first-eye image.
Step S520: and acquiring a depth image corresponding to the first eye image.
Step S530: and generating a target parallax image based on the depth image and the first-eye image, wherein parallax exists between the target parallax image and the first-eye image.
Step S540: and carrying out post-processing on the target parallax image to obtain the second-eye image corresponding to the first-eye image.
In the embodiment of the present application, the steps S510 to S540 may refer to the content of the foregoing embodiment, and are not described herein.
Step S550: and sending the first-eye image and the second-eye image to a head-mounted display device so that the head-mounted display device displays the first-eye image and the second-eye image.
In this embodiment of the present application, the electronic device may be connected to the head-mounted display device, and after the electronic device obtains the second-eye image corresponding to the first-eye image, it may send the first-eye image and the second-eye image to the head-mounted display device for display. In this way, when the first-eye image and the second-eye image are displayed, the image content of the first-eye image enters the user's left eye and the image content of the second-eye image enters the user's right eye; after fusion by the user's brain, the user sees an image with a larger field of view than the first-eye image displayed alone, thereby expanding the field of view.
For example, referring to fig. 12, the head-mounted display device 200 may be a wireless device, and may be connected to the electronic device 100 when displaying content. The first-eye image and the second-eye image obtained by the electronic device 100 are transmitted to the head-mounted display device 200, and the head-mounted display device 200 displays them. The head-mounted display device 200 may be AR glasses, an AR helmet, VR glasses, a VR helmet, Mixed Reality (MR) glasses, an MR helmet, or the like, which is not limited herein.
In some embodiments, in an application scenario of the image processing method provided by the embodiment of the present application, the electronic device may capture images in real time and execute the image processing method on the captured images. That is, the electronic device may acquire a monocular image collected by an image collection device as the first-eye image; acquire the depth image corresponding to the first-eye image; generate the target parallax image based on the depth image and the first-eye image, where parallax exists between the target parallax image and the first-eye image; post-process the target parallax image to obtain the second-eye image corresponding to the first-eye image; and then send the first-eye image and the second-eye image to the head-mounted display device. In this way, the electronic device can generate the other eye image (i.e., the second-eye image) in real time for the captured image (i.e., the first-eye image) and send both to the head-mounted display device for display. The captured images may be photographs or video frames, which is not limited herein.
In some embodiments, the electronic device may also store the first-eye image and the second-eye image after acquiring the above second-eye image. Thus, when the first-eye image needs to be displayed through the head-mounted display device, the electronic device can send the first-eye image and the second-eye image to the head-mounted display device, so that the head-mounted display device can display an image with a larger field of view when displaying the first-eye image, expanding the field of view corresponding to the image content.
In some embodiments, after obtaining the second-eye image corresponding to the first-eye image, the electronic device may further use the first-eye image and the second-eye image as stereo data for any stereo vision algorithm, such as a SLAM system, so as to improve the accuracy of a monocular system.
According to the image processing method, another eye image with parallax can be generated from the monocular original image, and the monocular original image and the generated other eye image are sent to the head-mounted display device for display, so that the user can see an image with a larger field of view than the original image displayed alone, thereby expanding the field of view.
Referring to fig. 13, a block diagram of an image processing apparatus 600 according to an embodiment of the present application is shown. The image processing apparatus 600 is applied to the above-described electronic device, and includes: a first image acquisition module 610, a second image acquisition module 620, a parallax image generation module 630, and an image post-processing module 640. The first image acquisition module 610 is configured to acquire an original image as the first-eye image; the second image acquisition module 620 is configured to acquire the depth image corresponding to the first-eye image; the parallax image generation module 630 is configured to generate the target parallax image based on the depth image and the first-eye image, where parallax exists between the target parallax image and the first-eye image; and the image post-processing module 640 is configured to post-process the target parallax image to obtain the second-eye image corresponding to the first-eye image.
In some embodiments, the image post-processing module 640 may be specifically configured to: performing interpolation processing on the target parallax image to obtain a first parallax image; and carrying out affine transformation on the first parallax image to obtain a second parallax image aligned with the first eye image, and taking the second parallax image as a second eye image corresponding to the first eye image.
In one possible implementation, the image post-processing module 640 may be specifically configured to: if the first eye image is a left eye image, performing forward mapping processing on the first parallax image to obtain a second parallax image aligned with the first eye image, and using the second parallax image as a second eye image corresponding to the first eye image; and if the first eye image is a right eye image, performing backward mapping processing on the first parallax image to obtain a second parallax image aligned with the first eye image, and taking the second parallax image as a second eye image corresponding to the first eye image.
In one possible implementation, the image post-processing module 640 may be specifically configured to: perform affine transformation on the first parallax image to obtain a second parallax image aligned with the first-eye image; and perform hole filling processing on the second parallax image based on the occlusion positions in the second parallax image, taking the hole-filled second parallax image as the second-eye image.
Alternatively, the image post-processing module 640 may be specifically configured to: perform first interpolation processing on the occlusion position in the second parallax image; and fill target pixel values in other areas of the second parallax image except the occlusion position.
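The hole-filling step above can be sketched as follows; this is a simplified stand-in, assuming occlusions are marked by a boolean mask and filled by per-row linear interpolation between the nearest valid pixels:

```python
import numpy as np

def fill_holes(warped: np.ndarray, hole_mask: np.ndarray) -> np.ndarray:
    """Fill occluded pixels row by row via linear interpolation between
    the nearest valid neighbours (a simple 'first interpolation')."""
    out = warped.astype(np.float32).copy()
    cols = np.arange(out.shape[1])
    for y in range(out.shape[0]):
        valid = ~hole_mask[y]
        if valid.any() and (~valid).any():
            out[y, ~valid] = np.interp(cols[~valid], cols[valid], out[y, valid])
    return out

# One row with a single occlusion hole at x = 1.
row = np.array([[1.0, 0.0, 3.0, 4.0]])
mask = np.array([[False, True, False, False]])
filled = fill_holes(row, mask)  # the hole is interpolated from its neighbours
```

More elaborate schemes (inpainting, background-aware filling) could replace the per-row interpolation without changing the surrounding pipeline.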
In one possible implementation, the image post-processing module 640 may be specifically configured to: perform normalization processing on the target parallax image to obtain a first intermediate parallax image; perform second interpolation processing on the first intermediate parallax image to obtain a second intermediate parallax image; and adjust each pixel value of the second intermediate parallax image according to the maximum pixel value in the target parallax image to obtain the first parallax image.
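A minimal sketch of those three steps (normalize, interpolate, restore the original range), assuming normalization divides by the maximum pixel value and nearest-neighbour upsampling stands in for the second interpolation processing — both illustrative choices:

```python
import numpy as np

def refine_disparity(target: np.ndarray, scale: int = 2) -> np.ndarray:
    """Normalize -> interpolate (upsample) -> restore the original range."""
    max_val = target.max()
    first_intermediate = target / max_val                    # normalization
    second_intermediate = np.kron(first_intermediate,
                                  np.ones((scale, scale)))   # interpolation
    return second_intermediate * max_val                     # rescale by the max

d = np.array([[1.0, 2.0], [3.0, 4.0]])
refined = refine_disparity(d)  # 4x4, values back in the original range
```

Normalizing before interpolation keeps the intermediate values in a fixed range; multiplying by the stored maximum afterwards recovers the original disparity scale.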
In some embodiments, the parallax image generation module 630 may be specifically configured to: input the depth image and the first eye image into a pre-trained parallax estimation model to obtain a parallax image output by the parallax estimation model, where the parallax estimation model is trained in advance according to a first sample image and a depth image corresponding to the first sample image, the first sample image is annotated with a corresponding second sample image, and parallax exists between the second sample image and the first sample image.
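The depth image is a natural input here because, for a rectified stereo pair, disparity and depth are linked by the classic relation d = f·B / Z (focal length times baseline over depth). A sketch of that relation, with made-up focal length and baseline values; the patent's learned parallax estimation model is not reduced to this formula:

```python
import numpy as np

def depth_to_disparity(depth: np.ndarray, focal_px: float, baseline: float) -> np.ndarray:
    """Classic rectified-stereo relation: disparity = focal * baseline / depth."""
    return focal_px * baseline / np.maximum(depth, 1e-6)  # guard against zero depth

depth = np.array([[1.0, 2.0], [4.0, 8.0]])
disp = depth_to_disparity(depth, focal_px=100.0, baseline=0.5)
# Nearer points (smaller depth) receive larger disparity.
```

A learned model can go beyond this pixel-wise formula, e.g. by correcting depth errors using image context, which is presumably why both the depth image and the first eye image are fed to it.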
In one possible implementation, the image processing apparatus 600 may further include an image pair acquisition module, a sample image acquisition module, an image set construction module, and a model training module. The image pair acquisition module is used for acquiring a plurality of sample image pairs acquired by a binocular camera; the sample image acquisition module is used for acquiring a sample image corresponding to the first eye in the plurality of sample image pairs as the first sample image, and a sample image corresponding to the second eye as the second sample image; the image set construction module is used for annotating, for each first sample image, the corresponding second sample image, and acquiring a depth image corresponding to the first sample image to obtain a sample image set; and the model training module is used for training an initial estimation model based on the sample image set to obtain the parallax estimation model.
Alternatively, the model training module may be specifically configured to: input the first sample image in the sample image set and the depth image corresponding to the first sample image into an initial estimation model to obtain a parallax estimation image output by the initial estimation model; determine a loss value based on a difference between the second sample image annotated for the first sample image and the parallax estimation image; and iteratively update the initial estimation model based on the loss value to obtain the parallax estimation model.
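The loop above (predict, compare against the annotated target, update on the loss) can be illustrated with a toy, library-free stand-in; the one-parameter "model", the synthetic labels, and the learning rate are all illustrative assumptions, not the patent's network:

```python
import numpy as np

rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 5.0, size=(8, 8))  # stand-in depth input
gt_disparity = 3.0 / depth                  # synthetic "annotated" target

theta = 1.0  # single model parameter to learn (true value: 3.0)
lr = 0.1
for _ in range(200):
    pred = theta / depth                                 # model output
    grad = np.mean(2.0 * (pred - gt_disparity) / depth)  # dLoss/dtheta for MSE
    theta -= lr * grad                                   # iterative update
# theta now approximates 3.0, the parameter that generated the labels
```

A real implementation would replace the scalar parameter with network weights and the hand-written gradient with automatic differentiation, but the predict/compare/update structure is the same.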
In some embodiments, the second image acquisition module 620 may be specifically configured to: input the first eye image into a pre-trained depth estimation model to obtain a depth image output by the depth estimation model as the depth image corresponding to the first eye image.
In some embodiments, the image processing apparatus 600 may further include an image sending module. The image sending module is used for sending the first eye image and the second eye image to a head-mounted display device, so that the head-mounted display device displays the first eye image and the second eye image.
In some embodiments, the first image acquisition module 610 may be specifically configured to: and acquiring a monocular image acquired by an image acquisition device as the first eye image.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In the several embodiments provided herein, the coupling between modules may be electrical, mechanical, or take other forms.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
In summary, according to the solution provided by the application, an original image is acquired as a first eye image; a depth image corresponding to the first eye image is acquired; a target parallax image is generated based on the depth image and the first eye image, where parallax exists between the target parallax image and the first eye image; and the target parallax image is post-processed to obtain a second eye image corresponding to the first eye image. In this way, another eye image with parallax can be generated from a monocular image, so that the field of view corresponding to the image content can be expanded during display; moreover, post-processing the parallax image obtained for the monocular image improves the accuracy and quality of the resulting other eye image.
Referring to fig. 14, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 100 may be a smart phone, a tablet computer, an electronic book, a head mounted display device, or the like capable of running an application program. The electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more applications, wherein the one or more applications may be stored in the memory 120 and configured to be executed by the one or more processors 110, the one or more applications configured to perform the method as described in the foregoing method embodiments.
Processor 110 may include one or more processing cores. The processor 110 connects various parts of the entire electronic device 100 using various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and invoking data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 110 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the electronic device 100 in use (e.g., phonebook, audio and video data, chat log data), and the like.
Referring to fig. 15, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable medium 800 has stored therein program code which can be invoked by a processor to perform the methods described in the method embodiments described above.
The computer readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 800 includes a non-transitory computer-readable storage medium. The computer readable storage medium 800 has storage space for program code 810 for performing any of the method steps described above. The program code can be read from or written into one or more computer program products. The program code 810 may, for example, be compressed in a suitable form.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present application, not limiting them; although the present application has been described in detail with reference to the foregoing embodiments, one of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

1. An image processing method, the method comprising:
acquiring an original image as a first eye image;
acquiring a depth image corresponding to the first eye image;
generating a target parallax image based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image;
and carrying out post-processing on the target parallax image to obtain a second eye image corresponding to the first eye image.
2. The method according to claim 1, wherein the post-processing the target parallax image to obtain a second eye image corresponding to the first eye image includes:
Performing interpolation processing on the target parallax image to obtain a first parallax image;
and carrying out affine transformation on the first parallax image to obtain a second parallax image aligned with the first eye image, and taking the second parallax image as a second eye image corresponding to the first eye image.
3. The method according to claim 2, wherein affine transforming the first parallax image to obtain a second parallax image aligned with the first eye image as a second eye image corresponding to the first eye image, includes:
if the first eye image is a left eye image, performing forward mapping processing on the first parallax image to obtain a second parallax image aligned with the first eye image, and using the second parallax image as a second eye image corresponding to the first eye image;
and if the first eye image is a right eye image, performing backward mapping processing on the first parallax image to obtain a second parallax image aligned with the first eye image, and taking the second parallax image as a second eye image corresponding to the first eye image.
4. The method according to claim 2, wherein affine transforming the first parallax image to obtain a second parallax image aligned with the first eye image as a second eye image corresponding to the first eye image, includes:
Affine transformation is carried out on the first parallax image, and a second parallax image aligned with the first eye image is obtained;
and carrying out hole filling processing on the second parallax image based on the occlusion position in the second parallax image, and taking the second parallax image subjected to the hole filling processing as the second eye image.
5. The method of claim 4, wherein the performing hole filling processing on the second parallax image based on the occlusion position in the second parallax image comprises:
performing first interpolation processing on the occlusion position in the second parallax image;
and filling target pixel values in other areas of the second parallax image except the occlusion position.
6. The method according to claim 2, wherein the interpolating the target parallax image to obtain a first parallax image includes:
performing normalization processing on the target parallax image to obtain a first intermediate parallax image;
performing second interpolation processing on the first intermediate parallax image to obtain a second intermediate parallax image;
and adjusting each pixel value of the second intermediate parallax image according to the maximum pixel value in the target parallax image to obtain the first parallax image.
7. The method of claim 1, wherein the generating a target parallax image based on the depth image and the first eye image comprises:
and inputting the depth image and the first eye image into a pre-trained parallax estimation model to obtain a parallax image output by the parallax estimation model, wherein the parallax estimation model is trained in advance according to a first sample image and a depth image corresponding to the first sample image, the first sample image is annotated with a corresponding second sample image, and parallax exists between the second sample image and the first sample image.
8. The method of claim 7, wherein the disparity estimation model is trained by:
acquiring a plurality of sample image pairs acquired by a binocular camera;
acquiring a sample image corresponding to a first eye in the plurality of sample image pairs as the first sample image, and a sample image corresponding to a second eye in the plurality of sample image pairs as the second sample image;
labeling the second sample image corresponding to the first sample image for each first sample image, and acquiring a depth image corresponding to the first sample image to obtain a sample image set;
And training an initial estimation model based on the sample image set to obtain the parallax estimation model.
9. The method of claim 8, wherein training an initial estimation model based on the set of sample images to obtain the disparity estimation model comprises:
inputting the first sample image in the sample image set and the depth image corresponding to the first sample image into an initial estimation model to obtain a parallax estimation image output by the initial estimation model;
determining a loss value based on a difference between the second sample image annotated for the first sample image and the parallax estimation image;
and based on the loss value, carrying out iterative updating on the initial estimation model to obtain the parallax estimation model.
10. The method according to claim 1, wherein the acquiring the depth image corresponding to the first eye image includes:
and inputting the first eye image into a pre-trained depth estimation model to obtain a depth image output by the depth estimation model as the depth image corresponding to the first eye image.
11. The method according to any one of claims 1-10, wherein after the post-processing the target parallax image to obtain a second eye image corresponding to the first eye image, the method further comprises:
and sending the first eye image and the second eye image to a head-mounted display device so that the head-mounted display device displays the first eye image and the second eye image.
12. The method according to any one of claims 1-10, wherein said acquiring an original image as a first eye image comprises:
and acquiring a monocular image acquired by an image acquisition device as the first eye image.
13. An image processing apparatus, characterized in that the apparatus comprises: a first image acquisition module, a second image acquisition module, a parallax image generation module and an image post-processing module, wherein,
the first image acquisition module is used for acquiring an original image as a first eye image;
the second image acquisition module is used for acquiring a depth image corresponding to the first eye image;
the parallax image generation module is used for generating a target parallax image based on the depth image and the first eye image, wherein parallax exists between the target parallax image and the first eye image;
the image post-processing module is used for carrying out post-processing on the target parallax image to obtain a second eye image corresponding to the first eye image.
14. An electronic device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-12.
15. A computer readable storage medium having stored therein program code which is callable by a processor to perform the method according to any one of claims 1-12.
CN202311870810.8A 2023-12-29 2023-12-29 Image processing method, device, electronic equipment and storage medium Pending CN117808860A (en)


Publications (1)

Publication Number Publication Date
CN117808860A true CN117808860A (en) 2024-04-02



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination