CN116342617A - Depth map generation method and device, electronic equipment and storage medium - Google Patents

Depth map generation method and device, electronic equipment and storage medium

Info

Publication number
CN116342617A
CN116342617A (application CN202111530298.3A)
Authority
CN
China
Prior art keywords
depth
eye
map
region
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111530298.3A
Other languages
Chinese (zh)
Inventor
黄天胤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zte Nanjing Co ltd
ZTE Corp
Original Assignee
Zte Nanjing Co ltd
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zte Nanjing Co ltd, ZTE Corp filed Critical Zte Nanjing Co ltd
Priority to CN202111530298.3A priority Critical patent/CN116342617A/en
Priority to PCT/CN2022/139100 priority patent/WO2023109871A1/en
Publication of CN116342617A publication Critical patent/CN116342617A/en
Pending legal-status Critical Current

Classifications

    • G06N 3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/11: Region-based segmentation
    • G06T 7/55: Depth or shape recovery from multiple images
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. SIFT or bags of words [BoW]; salient regional features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/20021: Dividing image into blocks, subimages or windows
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20221: Image fusion; image merging
    • Y02T 10/40: Engine management systems (ICE-based road transport)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application relates to the technical field of computer stereoscopic vision, in particular to a method and a device for generating a depth map, electronic equipment and a storage medium. The method for generating the depth map comprises the following steps: acquiring a left eye color chart and a right eye color chart of a target object; image segmentation is carried out on the left eye color image and the right eye color image, and a left eye area image and a right eye area image are generated; generating a first depth region map according to the left eye region map and the right eye region map; extracting target depth areas meeting preset conditions in the first depth area image and a preset monocular depth image of the target object to form an initial depth image; and generating a depth map of the target object according to the initial depth map and the left eye area map. The method and the device can improve the robustness and the precision of the depth image generation in a complex scene.

Description

Depth map generation method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computer stereoscopic vision, in particular to a method and a device for generating a depth map, electronic equipment and a storage medium.
Background
Common approaches to depth image acquisition include binocular cameras, RGB-D cameras, laser radar (lidar) scanners, and deep-learning-based depth recovery from a monocular image. Binocular depth recovery has low efficiency and accuracy. An RGB-D camera generally uses a Time-of-Flight (TOF) depth sensor to acquire depth data, and since the maximum acquisition depth of a TOF depth sensor usually does not exceed 5 meters, the confidence of the depth map recovered by an RGB-D camera is low in complex scenes. A lidar scanner is too expensive and generally yields only a sparse depth map. Monocular depth recovery lacks true scale information and is therefore not robust enough in complex scenes.
Disclosure of Invention
The main purpose of the embodiment of the application is to provide a depth map generation method, a depth map generation device, electronic equipment and a storage medium. The method aims to improve the robustness and the accuracy of depth image generation in a complex scene.
In order to achieve the above object, an embodiment of the present application provides a method for generating a depth map, including: acquiring a left eye color chart and a right eye color chart of a target object; image segmentation is carried out on the left eye color image and the right eye color image, and a left eye area image and a right eye area image are generated; generating a first depth region map according to the left eye region map and the right eye region map; extracting target depth areas meeting preset conditions in the first depth area image and a preset monocular depth image of the target object to form an initial depth image; and generating a depth map of the target object according to the initial depth map and the left eye area map.
In order to achieve the above object, an embodiment of the present application further provides a depth map generating device, including: the acquisition module is used for acquiring a left eye color chart and a right eye color chart of the target object; the segmentation module is used for carrying out image segmentation on the left-eye color image and the right-eye color image to generate a left-eye area image and a right-eye area image; the first generation module is used for generating a first depth region map according to the left eye region map and the right eye region map; the second generation module is used for extracting target depth areas meeting preset conditions in the first depth area image and the preset monocular depth image of the target object to form an initial depth image; and the third generation module is used for generating the depth map of the target object according to the initial depth map and the left eye area map.
To achieve the above object, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the depth map generating method.
To achieve the above object, an embodiment of the present application further provides a computer readable storage medium storing a computer program, where the computer program is executed by a processor to implement the method for generating a depth map described above.
According to the depth map generation method provided by the embodiment of the application, in the process of generating the depth map, a left-eye color map and a right-eye color map of the target object are obtained; image segmentation is performed on the left-eye color map and the right-eye color map to generate a left-eye region map and a right-eye region map; a first depth region map is generated according to the left-eye region map and the right-eye region map; the target depth regions satisfying the preset condition in the first depth region map and in a preset monocular depth map of the target object are extracted to form an initial depth map; and the depth map of the target object is generated according to the initial depth map and the left-eye region map. Target regions satisfying the condition are selected from the first depth region map generated from the binocular images and from the preset monocular depth map to form the initial depth map, and the initial depth map is then combined with the left-eye map of the binocular images to generate the depth map. In this way, the regions with higher confidence can be selected from the monocular depth map and the binocular depth map to generate a new depth map, so that high-precision depth information can be acquired in complex scenes and the robustness and accuracy of depth map generation in complex scenes are improved; this solves the technical problem that existing depth recovery methods are not robust in complex scenes.
Drawings
Fig. 1 is a flowchart of a method for generating a depth map according to an embodiment of the present application;
fig. 2 is a flowchart of step 103 in the method for generating a depth map according to the embodiment of the present application;
fig. 3 is a flowchart of a method for generating a depth map according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image acquisition device in the depth map generating method according to the embodiment of the present application;
fig. 5 is a schematic terminal diagram in a method for generating a depth map according to an embodiment of the present application;
fig. 6 is a flowchart of a method for generating a depth map according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a depth map generating device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. As will be appreciated by those of ordinary skill in the art, numerous technical details are set forth in the various embodiments of the present application in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description and should not be construed as limiting the specific implementation of the present application, and the embodiments may be combined with and refer to each other where there is no contradiction.
One embodiment of the present application relates to a method for generating a depth map, as shown in fig. 1, including:
step 101, obtaining a left eye color chart and a right eye color chart of a target object.
In an example implementation, a target object is photographed by a binocular camera or a binocular sensor to generate a left eye color map and a right eye color map of the target object; the left-eye color map and the right-eye color map generated by the binocular camera or the binocular sensor may be a left-eye color map sequence and a right-eye color map sequence.
And 102, performing image segmentation on the left-eye color map and the right-eye color map to generate a left-eye area map and a right-eye area map.
In an example implementation, the left-eye color map and the right-eye color map are input into a preset semantic segmentation neural network for image segmentation; the left-eye color map and the right-eye color map are segmented according to the information of each pixel point, and a left-eye region map and a right-eye region map are generated.
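In an example implementation, a minimal sketch of this segmentation step is given below. The application does not specify the concrete preset semantic segmentation network; torchvision's pre-trained DeepLabV3 model is used here purely as an assumed stand-in:

```python
# Hypothetical sketch: segment a color map into a region map with a generic
# pre-trained semantic segmentation network (DeepLabV3 as a stand-in).
import torch
import torchvision.transforms.functional as TF
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()

def segment_region_map(color_map):
    """color_map: HxWx3 uint8 RGB image -> HxW label map (one label per region)."""
    x = TF.normalize(TF.to_tensor(color_map),
                     mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    with torch.no_grad():
        logits = model(x.unsqueeze(0))["out"]    # 1 x C x H x W class scores
    return logits.argmax(dim=1).squeeze(0)       # per-pixel region label

# left_region_map = segment_region_map(left_color_map)
# right_region_map = segment_region_map(right_color_map)
```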
And step 103, generating a first depth region map according to the left eye region map and the right eye region map.
In an example implementation, the first depth region map is generated based on the principle of triangulation ranging: the depth value of each pixel point on the left-eye region map is obtained from the parallax information of the left-eye region map and the right-eye region map together with the camera parameters of the binocular camera or binocular sensor that captured the left-eye color map and the right-eye color map, so as to form the first depth region map. When the left-eye camera and the right-eye camera of the binocular camera or binocular sensor are not on the same horizontal line, the rotation between the left-eye region map and the right-eye region map also needs to be taken into account. Step 103 may be implemented by the sub-steps shown in fig. 2, and specifically includes:
in the substep 1031, feature extraction processing is performed on the left-eye region map and the right-eye region map, respectively, to obtain each left-eye feature point of the left-eye region map and each right-eye feature point of the right-eye region map.
In an example implementation, Scale-Invariant Feature Transform (SIFT) feature extraction is performed on the left-eye region map and the right-eye region map to obtain each left-eye feature point of the left-eye region map and each right-eye feature point of the right-eye region map.
And sub-step 1032, performing feature matching on the left eye feature points and the right eye feature points to obtain the corresponding relation between the left eye feature points and the right eye feature points.
In an example implementation, feature matching is performed between each left-eye feature point on the left-eye region map and each right-eye feature point on the right-eye region map to obtain the correspondence between the left-eye feature points and the right-eye feature points; the correspondence identifies, for each left-eye feature point, the right-eye feature point located at the position that the left-eye feature point maps to on the right-eye region map after the change of viewpoint.
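In an example implementation, sub-steps 1031 and 1032 can be sketched with OpenCV's SIFT implementation and a brute-force matcher with a ratio test; the ratio threshold below is an illustrative assumption, not a value taken from this application:

```python
# Hypothetical sketch of feature extraction (sub-step 1031) and matching (sub-step 1032).
import cv2

def match_left_right(left_gray, right_gray, ratio=0.75):
    """Return matched (left_point, right_point) pixel-coordinate pairs."""
    sift = cv2.SIFT_create()
    kp_l, desc_l = sift.detectAndCompute(left_gray, None)
    kp_r, desc_r = sift.detectAndCompute(right_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []
    for candidates in matcher.knnMatch(desc_l, desc_r, k=2):
        if len(candidates) == 2 and candidates[0].distance < ratio * candidates[1].distance:
            m = candidates[0]                     # Lowe's ratio test passed
            pairs.append((kp_l[m.queryIdx].pt, kp_r[m.trainIdx].pt))
    return pairs
```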
Sub-step 1033 obtains parallax information between each left-eye feature point and the corresponding right-eye feature point according to the correspondence.
In an example implementation, the correspondence is in fact a set of feature point pairs, where each pair contains one left-eye feature point and one right-eye feature point; the parallax (disparity) information of each left-eye feature point and its corresponding right-eye feature point is calculated from each feature point pair in the correspondence.
Sub-step 1034, generating depth information of each left-eye feature point according to the camera parameters and parallax information of each left-eye feature point.
In an example implementation, after the parallax information is obtained, the depth information of each left-eye feature point can be calculated through the triangulation ranging formula by combining the camera parameters of the binocular camera or binocular sensor used when the left-eye color map and the right-eye color map were captured; for a rectified binocular pair this reduces to the standard relation Z = f·B/d, where f is the focal length, B the baseline, and d the disparity.
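In an example implementation, sub-steps 1033 and 1034 reduce, for a rectified binocular pair, to computing the horizontal disparity of each matched pair and applying Z = f·B/d; the sketch below assumes rectified images and uses illustrative variable names:

```python
# Hypothetical sketch of sub-steps 1033/1034: disparity and depth for each matched pair.
def depth_from_matches(pairs, focal_px, baseline_m, min_disparity=0.5):
    """pairs: [((xl, yl), (xr, yr)), ...] -> [((xl, yl), depth_m), ...]"""
    results = []
    for (xl, yl), (xr, yr) in pairs:
        d = xl - xr                               # disparity for rectified images
        if d > min_disparity:                     # discard tiny/negative disparities
            results.append(((xl, yl), focal_px * baseline_m / d))
    return results
```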
Substep 1035 generates a first depth region map from the depth information of each left-eye feature point and the position information of each left-eye feature point.
In an example implementation, the position of each left-eye feature point within its left-eye region in the left-eye region map is determined, and the depth values of a pixel block of preset size (such as 9×9 or 3×3) around the feature point are set to the depth information of that left-eye feature point; the first depth region map can then be generated by traversing all the left-eye feature points.
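In an example implementation, sub-step 1035 can be sketched as writing each feature point's depth into a fixed-size pixel block around that point (the 3×3 block size is one of the examples mentioned above):

```python
# Hypothetical sketch of sub-step 1035: build the first depth region map by filling a
# small block around every left-eye feature point with that point's depth value.
import numpy as np

def build_depth_region_map(height, width, point_depths, block=3):
    """point_depths: [((x, y), depth), ...]; returns HxW float32 map, 0 = unknown."""
    depth_map = np.zeros((height, width), dtype=np.float32)
    r = block // 2
    for (x, y), z in point_depths:
        x, y = int(round(x)), int(round(y))
        depth_map[max(0, y - r):min(height, y + r + 1),
                  max(0, x - r):min(width, x + r + 1)] = z
    return depth_map
```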
And 104, extracting target depth areas meeting preset conditions in the first depth area image and the preset monocular depth image of the target object to form an initial depth image.
In an example implementation, the monocular depth map of the target object is generated by photographing the target object with a TOF signal transmitter/receiver; each left-eye color map in the left-eye color map sequence and each right-eye color map in the right-eye color map sequence may correspond to one monocular depth map. The shooting environment, shooting distance and shooting angle of the binocular camera or binocular sensor and of the TOF signal transmitter/receiver should be kept consistent, so as to ensure that the generated left-eye color map and right-eye color map can be registered to the monocular depth map.
In an example implementation, the monocular depth map contains depth regions corresponding to the regions of the first depth region map. For each first depth region, one depth region is selected, according to the preset condition, from the first depth region itself and the corresponding depth region of the monocular depth map, and taken as a target depth region; all first depth regions and their corresponding depth regions in the monocular depth map are traversed to screen out each target depth region, and the target depth regions together constitute the initial depth map. The preset condition may be based on, for example, the average depth value of the region or the distribution of depth values within it.
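In an example implementation, the region-by-region selection described above might look like the following sketch, in which the preset condition is taken to be an average-depth threshold; the 5-meter value (echoing the TOF range mentioned in the background) and the label-map input are assumptions for illustration:

```python
# Hypothetical sketch of step 104: for each region of the left-eye region map, keep
# either the binocular depth (first depth region map) or the TOF/monocular depth.
import numpy as np

def build_initial_depth(first_depth_map, mono_depth_map, region_labels,
                        depth_threshold_m=5.0):
    """region_labels: HxW label map aligned with both depth maps."""
    initial = np.zeros_like(first_depth_map)
    for label in np.unique(region_labels):
        mask = region_labels == label
        valid = mask & (first_depth_map > 0)
        avg = first_depth_map[valid].mean() if valid.any() else np.inf
        # Near regions: trust the TOF/monocular depth; far regions: keep binocular depth.
        source = mono_depth_map if avg < depth_threshold_m else first_depth_map
        initial[mask] = source[mask]
    return initial
```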
In an example implementation, performing semantic segmentation on the left-eye color map and the right-eye color map before performing depth estimation improves the robustness of the depth estimation algorithm in complex scenes.
And 105, generating a depth map of the target object according to the initial depth map and the left eye area map.
In an example implementation, the initial depth map and the left-eye region map are input into a preset optimization network for optimization, where the optimization network can be a depth completion network that performs, for example, semantic-edge depth-consistency optimization and completion of hole regions of the depth map; the boundary of each left-eye region in the left-eye region map is used as a boundary constraint during the optimization of the initial depth map, so that the initial depth map is optimized and the depth map of the target object is finally generated.
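In an example implementation, the optimization network itself is a learned depth completion model; the deliberately simple classical stand-in below is shown only to illustrate how the left-eye region boundaries act as constraints, by completing holes region by region so that filled depth values never cross a semantic boundary:

```python
# Hypothetical, simplified stand-in for the preset optimization/completion network:
# fill the holes of each left-eye region with that region's median depth, so that
# completed values never leak across a semantic region boundary.
import numpy as np

def complete_depth_per_region(initial_depth, region_labels):
    completed = initial_depth.copy()
    for label in np.unique(region_labels):
        region = region_labels == label
        valid = region & (initial_depth > 0)
        holes = region & (initial_depth <= 0)
        if valid.any() and holes.any():
            completed[holes] = np.median(initial_depth[valid])
    return completed
```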
In the embodiment of the application, in the process of generating the depth map, a left-eye color map and a right-eye color map of the target object are obtained; image segmentation is performed on the left-eye color map and the right-eye color map to generate a left-eye region map and a right-eye region map; a first depth region map is generated according to the left-eye region map and the right-eye region map; the target depth regions satisfying the preset condition in the first depth region map and in the preset monocular depth map of the target object are extracted to form an initial depth map; and the depth map of the target object is generated according to the initial depth map and the left-eye region map. Target regions satisfying the condition are selected from the first depth region map generated from the binocular images and from the preset monocular depth map to form the initial depth map, and the initial depth map is then combined with the left-eye map of the binocular images to generate the depth map. In this way, the regions with higher confidence can be selected from the monocular depth map and the binocular depth map to generate a new depth map, so that high-precision depth information can be acquired in complex scenes and the robustness and accuracy of depth map generation in complex scenes are improved; this solves the technical problem that existing depth recovery methods are not robust in complex scenes.
One embodiment of the present application relates to a method for generating a depth map, as shown in fig. 3, including:
step 201, acquiring a left-eye image sequence and a right-eye image sequence through a binocular camera, wherein the left-eye image sequence comprises each left-eye image and a time stamp of each left-eye image, and the left-eye image sequence comprises each right-eye image and a time stamp of each right-eye image.
In an example implementation, the left-eye color map and the right-eye color map can be obtained through a smart device (such as smart glasses) that includes a binocular camera; a structural schematic of the smart glasses is shown in fig. 4, and the smart glasses further include a time-of-flight transmitter/receiver used for obtaining the monocular depth map of the target object. Normally the smart device is connected to a terminal; after the left-eye color map and the right-eye color map are acquired, they are sent to the terminal and forwarded by the terminal to the cloud for processing.
In an example implementation, the smart device obtains the monocular depth map of the target object through the time-of-flight transmitter/receiver, and obtains the left-eye image sequence and the right-eye image sequence of the target object through the binocular camera, where the left-eye image sequence includes each left-eye image and its time stamp, and the right-eye image sequence includes each right-eye image and its time stamp; each left-eye image in the left-eye image sequence and each right-eye image in the right-eye image sequence may correspond to one monocular depth map.
Step 202, extracting key points of each left-eye image and key points of each right-eye image.
In an example implementation, a key point extraction operation is performed on each left-eye image in the left-eye image sequence and each right-eye image in the right-eye image sequence, and key points with rotation invariance and luminosity invariance are extracted from each left-eye image in the left-eye image sequence and each right-eye image in the right-eye image sequence.
And 203, selecting left-eye key frame images from the left-eye images according to the number of key points of the left-eye images and the time stamps of the left-eye images, and taking the left-eye key frame images as left-eye color images.
In an example implementation, for each left-eye image, the number of key points of the left-eye image and the time stamp of the left-eye image are obtained; when the number of key points is greater than 75 and the time-stamp interval between the left-eye image and the previous left-eye image is greater than 0.5 s, or the number of key points is greater than 35 and the time-stamp interval between the left-eye image and the previous left-eye image is greater than 1 s, the current left-eye image is a key frame image and is taken as a left-eye color map.
And 204, selecting a right-eye key frame image from the right-eye images according to the number of key points of the right-eye images and the time stamp of the right-eye images, and taking the right-eye key frame image as a right-eye color image.
In an example implementation, for each right-eye image in the right-eye image sequence, the number of key points of the right-eye image and the time stamp of the right-eye image are obtained; when the number of key points is greater than 75 and the time-stamp interval between the right-eye image and the previous right-eye image is greater than 0.5 s, or the number of key points is greater than 35 and the time-stamp interval between the right-eye image and the previous right-eye image is greater than 1 s, the current right-eye image is a key frame image and is taken as a right-eye color map.
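In an example implementation, the selection rules of steps 203 and 204 can be written directly as the following sketch; this is only an illustration, and it interprets "the previous image" as the previously selected key frame (an assumption), with timestamps in seconds:

```python
# Hypothetical sketch of the key-frame selection rule described in steps 203/204.
def select_key_frames(frames):
    """frames: list of (image, num_keypoints, timestamp_s) in capture order."""
    key_frames = []
    last_key_ts = None
    for image, num_kp, ts in frames:
        interval = float("inf") if last_key_ts is None else ts - last_key_ts
        if (num_kp > 75 and interval > 0.5) or (num_kp > 35 and interval > 1.0):
            key_frames.append(image)     # this frame becomes a left/right color map
            last_key_ts = ts
    return key_frames

# left_color_maps = select_key_frames(left_sequence)
# right_color_maps = select_key_frames(right_sequence)
```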
In an example implementation, the screening of the left-eye color maps and right-eye color maps is performed by a processor of the smart device. After all left-eye color maps and right-eye color maps are obtained, the left-eye image sequence, the right-eye image sequence, the left-eye color maps, the right-eye color maps and the monocular depth maps are sent to the terminal together; the left-eye color maps, right-eye color maps and monocular depth maps are uploaded to the cloud or processed locally by the terminal, and the left-eye image sequence and right-eye image sequence are displayed on the image preview interface of the terminal. The terminal may control the frame rate (up to 30 Hz) and the resolution (up to 1920×1080) of the binocular camera's video recording.
In an example implementation, as shown in fig. 5, the interface of the terminal further includes a start button, and when the user triggers the button, the smart device starts to collect related data, and displays the left-eye image sequence and the right-eye image sequence on the image preview interface.
And step 205, performing image segmentation on the left-eye color map and the right-eye color map to generate a left-eye area map and a right-eye area map.
In an example implementation, the step is substantially the same as step 102 in the embodiment of the present application, and is not described here in detail.
Step 206, generating a first depth region map according to the left eye region map and the right eye region map.
In an example implementation, the step is substantially the same as step 103 in the embodiment of the present application, and is not described here in detail.
Step 207, extracting the target depth region satisfying the preset condition in the first depth region map and the preset monocular depth map of the target object, so as to form an initial depth map.
In an example implementation, the step is substantially the same as step 104 in the embodiment of the present application, and is not described here in detail.
Step 208, generating a depth map of the target object according to the initial depth map and the left eye region map.
In an exemplary implementation, the step is substantially the same as step 105 in the embodiment of the present application, and is not described here in detail.
On the basis of the other embodiments, in this embodiment of the application the smart device can collect the relevant images, and the image collection procedure is very simple: the person collecting the images only needs to wear the smart device, click the record start button on the terminal after setting the relevant parameters, and walk through the environment to be scanned. Because the scene seen by the image collector is almost identical to the images actually collected, the collector does not need to constantly watch the collected images on a mobile phone or camera as in the traditional approach.
One embodiment of the present application relates to a method for generating a depth map, as shown in fig. 6, including:
step 301, obtaining a left eye color chart and a right eye color chart of a target object.
In an example implementation, the step is substantially the same as step 101 in the embodiment of the present application, and is not described here in detail.
Step 302, image segmentation is performed on the left-eye color map and the right-eye color map to generate a left-eye region map and a right-eye region map, and the monocular depth map is segmented according to the left-eye region map to generate a second depth region map.
In an example implementation, the step is substantially the same as step 102 in the embodiment of the present application, and is not described here in detail.
And step 303, generating a first depth region map according to the left-eye region map and the right-eye region map.
In an example implementation, the step is substantially the same as step 103 in the embodiment of the present application, and is not described here in detail.
Step 304, for each first depth region in the first depth region map, obtaining an average depth value of the first depth region.
In an example implementation, for each first depth region, an average depth value for the first depth region is calculated from depth information for each left-eye feature point or depth information for each pixel point in the first depth region.
In step 305, when the average depth value satisfies the condition, a depth region corresponding to the first depth region in the monocular depth map is taken as the target depth region.
In an example implementation, when the average depth value of the first depth region is smaller than a preset threshold, the first depth region is considered a small-depth region; in this case the data acquired by the TOF signal transmitter/receiver is the more reliable source, so the depth region corresponding to the first depth region in the monocular depth map is taken as the target depth region. An average depth value smaller than the preset threshold is considered to satisfy the condition.
In an example implementation, before the target depth regions are extracted, the monocular depth map may be segmented according to the positions of the left-eye regions in the left-eye region map to generate a second depth region map; that is, the positions of the left-eye regions in the left-eye region map are mapped onto the monocular depth map. After segmentation, each left-eye region corresponds to a first depth region, and each first depth region in the first depth region map has a corresponding second depth region in the monocular depth map; when the average depth value of a first depth region is smaller than the preset threshold, the second depth region corresponding to that first depth region in the second depth region map is taken as the target depth region.
And 306, inputting the first depth region and the left eye region corresponding to the first depth region in the left eye region graph into a preset depth recovery network when the average depth value does not meet the condition, generating a first depth recovery region, and taking the first depth recovery region as a target depth region.
In an example implementation, when the average depth value of the first depth region is greater than or equal to the preset threshold, the first depth region is considered a large-depth region; in this case the data collected by the binocular camera is relied upon, and the first depth region and the left-eye region corresponding to it in the left-eye region map are input into a preset depth recovery network to generate a first depth recovery region, which is taken as the target depth region. The preset depth recovery network can be a fully convolutional depth recovery neural network based on residual learning, composed of, for example, convolutional layers, pooling layers and fully connected layers; examples include MonoDepth, FCRN-DepthPrediction and DenseDepth.
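In an example implementation, the preset depth recovery network can be any of the monocular depth estimation models named above; the sketch below uses the publicly available MiDaS model loaded through torch.hub purely as an interchangeable stand-in (the model choice, the region handling and the lack of metric scale alignment are all assumptions of this illustration, not part of the embodiment):

```python
# Hypothetical sketch of step 306: recover depth for a large-depth region from the
# left-eye color map with a monocular depth estimation network. MiDaS is used here
# only as a readily available stand-in for MonoDepth / FCRN-DepthPrediction / DenseDepth;
# note that MiDaS predicts relative (inverse) depth, so metric scaling is not handled.
import numpy as np
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def recover_region_depth(left_color_rgb, region_mask):
    """left_color_rgb: HxWx3 uint8 RGB; region_mask: HxW bool for one left-eye region."""
    with torch.no_grad():
        pred = midas(transform(left_color_rgb))                 # 1 x h x w
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=left_color_rgb.shape[:2],
            mode="bicubic", align_corners=False).squeeze().numpy()
    return np.where(region_mask, pred, 0.0)                     # depth only inside region
```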
Step 307, traversing each first depth region in the first depth region map, and obtaining each target depth region to form an initial depth map.
In an example implementation, traversing each first depth region in the first depth region map may obtain each target depth region used to construct the initial depth map, thereby constructing the initial depth map.
Step 308, generating a depth map of the target object according to the initial depth map and the left eye region map.
In an exemplary implementation, the step is substantially the same as step 105 in the embodiment of the present application, and is not described here in detail.
In the embodiment of the present application, on the basis of the other embodiments, the target depth regions used to form the initial depth map can be selected from either the first depth region map or the monocular depth map, using the average depth value of the feature points of each first depth region in the first depth region map as the criterion; the initial depth map generated in this way fuses the higher-confidence regions of the first depth region map and of the monocular depth map, which improves the accuracy of the generated initial depth map.
The steps of the above methods are divided only for clarity of description; when implemented, they may be combined into one step or split into multiple steps, and as long as the same logical relationship is preserved they fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithm or flow without changing the core design of the algorithm and flow also falls within the protection scope of this patent.
Another embodiment of the present application relates to a depth map generating apparatus. Details of the apparatus of this embodiment are described below; the following are implementation details provided only for ease of understanding and are not necessary for implementing this embodiment. Fig. 7 is a schematic diagram of the depth map generating apparatus of this embodiment, which includes: an acquisition module 401, a segmentation module 402, a first generation module 403, a second generation module 404, and a third generation module 405.
The acquiring module 401 is configured to acquire a left eye color chart and a right eye color chart of the target object.
The segmentation module 402 is configured to perform image segmentation on the left-eye color map and the right-eye color map, and generate a left-eye area map and a right-eye area map.
The first generation module 403 is configured to generate a first depth region map according to the left-eye region map and the right-eye region map.
The second generating module 404 is configured to extract a target depth region that satisfies a preset condition in the first depth region map and the monocular depth map of the target object, and form an initial depth map.
And a third generating module 405, configured to generate a depth map of the target object according to the initial depth map and the left-eye area map.
It is to be noted that this embodiment is a system embodiment corresponding to the above-described method embodiment, and this embodiment may be implemented in cooperation with the above-described method embodiment. The related technical details and technical effects mentioned in the above embodiments are still valid in this embodiment, and in order to reduce repetition, they are not described here again. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the above-described embodiments.
It should be noted that this apparatus embodiment is described mainly in terms of the depth map generation method provided by the method embodiments, and its implementation needs to be supported by hardware; for example, the functions of the related modules may be deployed on a processor so that the processor performs the corresponding functions, and the related data generated during operation may be stored in a memory for subsequent inspection and use.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, elements that are not so close to solving the technical problem presented in the present application are not introduced in the present embodiment, but it does not indicate that other elements are not present in the present embodiment.
Another embodiment of the present application relates to an electronic device, as shown in fig. 8, comprising: at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501, so that the at least one processor 501 can execute the depth map generating method in the above embodiments.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over the wireless medium via the antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory may be used to store data used by the processor in performing operations.
Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps in the methods of the embodiments described herein. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of implementing the present application and that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (10)

1. The method for generating the depth map is characterized by comprising the following steps of:
acquiring a left eye color chart and a right eye color chart of a target object;
image segmentation is carried out on the left eye color image and the right eye color image, and a left eye area image and a right eye area image are generated;
generating a first depth region map according to the left eye region map and the right eye region map;
extracting target depth areas meeting preset conditions in the first depth area image and a preset monocular depth image of the target object to form an initial depth image;
and generating a depth map of the target object according to the initial depth map and the left eye area map.
2. The method of generating a depth map of claim 1, further comprising: acquiring camera parameters of a binocular camera generating the left eye color chart and the right eye color chart;
the generating a first depth region map according to the left eye region map and the right eye region map includes:
respectively carrying out feature extraction processing on the left eye region diagram and the right eye region diagram to obtain each left eye feature point of the left eye region diagram and each right eye feature point of the right eye region diagram;
performing feature matching on the left eye feature points and the right eye feature points to obtain corresponding relations between the left eye feature points and the right eye feature points;
acquiring parallax information between each left eye feature point and the corresponding right eye feature point according to the corresponding relation;
generating depth information of each left-eye feature point according to the camera parameters and the parallax information of each left-eye feature point;
and generating the first depth region map according to the depth information of each left-eye feature point and the position information of each left-eye feature point.
3. The method for generating a depth map according to claim 1, wherein the extracting the target depth region satisfying a preset condition in the first depth region map and the preset monocular depth map of the target object, to form an initial depth map, includes:
for each first depth region in the first depth region map, acquiring an average depth value of the first depth region;
when the average depth value meets the condition, taking a depth region corresponding to the first depth region in the monocular depth map as the target depth region; or,
when the average depth value does not meet the condition, inputting the first depth region and a left eye region corresponding to the first depth region in the left eye region graph into a preset depth recovery network, generating a first depth recovery region, and taking the first depth recovery region as the target depth region;
traversing each first depth region in the first depth region map, and obtaining each target depth region to form the initial depth map.
4. The method for generating a depth map according to claim 3, wherein the extracting the target depth region satisfying the preset condition in the first depth region map and the preset monocular depth map of the target object comprises:
dividing the monocular depth map according to the positions of the left eye areas in the left eye area map to generate a second depth area map;
the extracting the target depth region meeting the preset condition in the first depth region map and the preset monocular depth map of the target object includes: and extracting the target depth region in the first depth region map and the second depth region map.
5. The method according to any one of claims 1 to 4, wherein the obtaining a left-eye color map and a right-eye color map of the target object includes:
obtaining a left-eye image sequence and a right-eye image sequence through a binocular camera, wherein the left-eye image sequence comprises left-eye images and time stamps of the left-eye images, and the right-eye image sequence comprises right-eye images and time stamps of the right-eye images;
extracting key points of the left eye images and key points of the right eye images;
selecting left-eye key frame images from the left-eye images according to the number of key points of the left-eye images and the time stamps of the left-eye images, and taking the left-eye key frame images as the left-eye color images;
and selecting a right-eye key frame image from the right-eye images according to the number of key points of the right-eye images and the time stamp of the right-eye images, and taking the right-eye key frame image as the right-eye color image.
6. The method according to any one of claims 1 to 4, wherein the generating the depth map of the target object from the initial depth map and the left-eye region map includes:
and inputting the initial depth map and the left eye region map into a preset optimizing network for optimizing processing, and generating the depth map.
7. The method for generating a depth map according to claim 6, wherein the step of inputting the initial depth map and the left-eye region map into a preset optimization network for optimization processing, and generating the depth map comprises:
and in the optimization network, taking the boundaries of each left eye area in the left eye area graph as boundary constraint conditions, performing optimization operation on the initial depth graph, and generating the depth graph of the target object.
8. A depth map generating apparatus, comprising:
the acquisition module is used for acquiring a left eye color chart and a right eye color chart of the target object;
the segmentation module is used for carrying out image segmentation on the left-eye color image and the right-eye color image to generate a left-eye area image and a right-eye area image;
the first generation module is used for generating a first depth region map according to the left eye region map and the right eye region map;
the second generation module is used for extracting target depth areas meeting preset conditions in the first depth area image and the preset monocular depth image of the target object to form an initial depth image;
and the third generation module is used for generating the depth map of the target object according to the initial depth map and the left eye area map.
9. An electronic device, comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a depth map according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the depth map generation method of any one of claims 1 to 7.
CN202111530298.3A 2021-12-14 2021-12-14 Depth map generation method and device, electronic equipment and storage medium Pending CN116342617A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111530298.3A CN116342617A (en) 2021-12-14 2021-12-14 Depth map generation method and device, electronic equipment and storage medium
PCT/CN2022/139100 WO2023109871A1 (en) 2021-12-14 2022-12-14 Depth image generation method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111530298.3A CN116342617A (en) 2021-12-14 2021-12-14 Depth map generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116342617A true CN116342617A (en) 2023-06-27

Family

ID=86774844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111530298.3A Pending CN116342617A (en) 2021-12-14 2021-12-14 Depth map generation method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN116342617A (en)
WO (1) WO2023109871A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102333234B (en) * 2011-10-28 2014-04-23 清华大学 Binocular stereo video state information monitoring method and device
CN111462096A (en) * 2020-04-03 2020-07-28 浙江商汤科技开发有限公司 Three-dimensional target detection method and device
CN113763449B (en) * 2021-08-25 2022-08-12 合肥的卢深视科技有限公司 Depth recovery method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2023109871A1 (en) 2023-06-22

Legal Events

Date Code Title Description
PB01 Publication