CN112927281A - Depth detection method, depth detection device, storage medium, and electronic apparatus - Google Patents

Depth detection method, depth detection device, storage medium, and electronic apparatus

Info

Publication number
CN112927281A
CN112927281A
Authority
CN
China
Prior art keywords
depth
value
detected
depth value
depth information
Prior art date
Legal status
Granted
Application number
CN202110367514.0A
Other languages
Chinese (zh)
Other versions
CN112927281B (en)
Inventor
庞若愚
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110367514.0A
Publication of CN112927281A
Application granted
Publication of CN112927281B
Legal status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The disclosure provides a depth detection method, a depth detection device, a storage medium, and an electronic device, and relates to the technical field of computer vision. The method includes the following steps: acquiring point cloud data of an object to be detected collected by a laser radar, and at least two images of the object to be detected collected by at least two cameras; determining first depth information of the object to be detected by analyzing the point cloud data, the first depth information including first depth values of different areas of the object to be detected; determining second depth information of the object to be detected by performing stereo matching on the at least two images, the second depth information including second depth values of the different areas; determining, for the different areas, a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value; and fusing the first depth information and the second depth information using the first weight value and the second weight value to obtain target depth information of the object to be detected. The scheme expands the applicable depth detection scenes and has high practicability.

Description

Depth detection method, depth detection device, storage medium, and electronic apparatus
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a depth detection method, a depth detection apparatus, a computer-readable storage medium, and an electronic device.
Background
Depth detection refers to detecting the distance, in the depth direction, between an observer and an object to be detected, so as to recover the three-dimensional information of the object to be detected.
In the related art, depth detection is mostly implemented with a specific sensor and its matching algorithm, where the sensor includes a binocular camera, a LiDAR (Light Detection and Ranging) sensor, a TOF (Time of Flight) sensor, a structured light camera, and the like. Depth detection with any single sensor has certain limitations. For example, all sensors detect depth values with low accuracy for objects beyond their detection range; a binocular camera detects depth values with low accuracy on weakly textured parts of an object; and a laser radar is easily affected by the multipath interference effect, so the accuracy of depth values detected at object edges is low. Therefore, the related art places high requirements on the depth detection scene and has low practicability.
Disclosure of Invention
The disclosure provides a depth detection method, a depth detection device, a computer-readable storage medium, and an electronic device, so as to alleviate, at least to some extent, the problem that the related art places high requirements on depth detection scenes.
According to a first aspect of the present disclosure, there is provided a depth detection method, including: acquiring point cloud data of an object to be detected, which is acquired by a laser radar, and at least two images of the object to be detected, which are acquired by at least two cameras; determining first depth information of the object to be detected by analyzing the point cloud data, wherein the first depth information comprises first depth values of different areas of the object to be detected; determining second depth information of the object to be detected by performing stereo matching on the at least two images, wherein the second depth information comprises second depth values of different areas of the object to be detected; determining a first weight value corresponding to the first depth value of the different area and a second weight value corresponding to the second depth value; and fusing the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
According to a second aspect of the present disclosure, there is provided a depth detection apparatus comprising: a data acquisition module configured to acquire point cloud data of an object to be measured acquired by a laser radar, and at least two images of the object to be measured acquired by at least two cameras; a first depth information determination module configured to determine first depth information of the object to be detected by parsing the point cloud data, the first depth information including first depth values of different regions of the object to be detected; a second depth information determining module configured to determine second depth information of the object to be detected by performing stereo matching on the at least two images, the second depth information including second depth values of different areas of the object to be detected; a weight value determination module configured to determine a first weight value corresponding to a first depth value of the different region and a second weight value corresponding to a second depth value; and the depth information fusion module is configured to fuse the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the depth detection method of the first aspect described above and possible implementations thereof.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the depth detection method of the first aspect described above and possible implementations thereof via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
By fusing the depth information from the laser radar and the binocular (or multi-view) cameras, the above scheme overcomes the limitations of a single-sensor system, expands the detectable depth value range and the applicable depth detection scenes, improves the accuracy of depth detection, and therefore has high practicability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
FIG. 1 shows a system architecture diagram of an operating environment in the present exemplary embodiment;
fig. 2 shows a schematic configuration diagram of an electronic apparatus in the present exemplary embodiment;
FIG. 3 illustrates a flow chart of a depth detection method in the present exemplary embodiment;
FIG. 4 illustrates a flow chart for acquiring point cloud data in the present exemplary embodiment;
fig. 5 shows a flow chart for determining second depth information in the present exemplary embodiment;
FIG. 6 illustrates a flow chart for determining a first weight value and a second weight value in this exemplary embodiment;
FIG. 7 shows a schematic diagram of a range of depth values in the present exemplary embodiment;
fig. 8 shows another flowchart for determining the first weight value and the second weight value in this exemplary embodiment;
FIG. 9 shows a flow chart of another depth detection method in the present exemplary embodiment;
fig. 10 shows a schematic configuration diagram of a depth detection device in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In the related art, schemes that fuse an active depth sensor with a binocular camera have appeared. For example, the data collected by a laser radar and a binocular camera are fused, and the data collected independently by the two sensors are checked against each other to eliminate erroneous data. However, the laser radar and the binocular camera each have their own detection range, and the intersection of the two detection ranges is small, so the applicable depth detection scenes are very limited.
In view of the above, exemplary embodiments of the present disclosure first provide a depth detection method. FIG. 1 shows a system architecture diagram of an environment in which the exemplary embodiment operates. Referring to fig. 1, the system architecture includes a data acquisition device 110 and a computing device 120. The data acquisition device 110 includes a camera system 111, a laser radar 112, and a synchronizer 113. The camera system 111 may be used to capture image data of an object to be measured, and includes at least two cameras, such as a first camera 1111 and a second camera 1112 shown in fig. 1, and the camera system 111 is a binocular camera system. The camera system 111 may further include a third camera, a fourth camera, etc., which is not limited in this disclosure. The laser radar 112 may be configured to emit a laser signal to the object to be measured, and obtain point cloud data of the object to be measured by analyzing the received reflected signal. Synchronizer 113 may be used to time synchronize camera system 111 with lidar 112 such that the time at which camera system 111 acquires image data is synchronized with the time at which lidar 112 acquires point cloud data. The data collection device 110 and the computing device 120 may form a connection over a wired or wireless communication link such that the data collection device 110 transmits collected data to the computing device 120. The computing device 120 includes a processor 121 and a memory 122. The memory 122 is used for storing executable instructions of the processor 121, and may also store application data, such as image data, video data, and the like. The processor 121 is configured to execute the depth detection method of the present exemplary embodiment via executing executable instructions to process the data sent by the data acquisition device 110 to obtain corresponding target depth information.
In one embodiment, the data collection device 110 and the computing device 120 may be two devices independent of each other, for example, the data collection device 110 is a robot and the computing device 120 is a computer for controlling the robot.
In another embodiment, the data acquisition device 110 and the computing device 120 may also be integrated in the same device, for example, the vehicle-mounted smart device includes the data acquisition device 110 and the computing device 120, and the depth detection and the automatic driving of the vehicle are realized by performing the whole process of data acquisition and data processing.
Application scenarios of the depth detection method of the present exemplary embodiment include, but are not limited to: the method comprises the steps that when a vehicle or a robot runs automatically, at least two cameras are controlled to collect image data of a to-be-detected object in front, a laser radar is controlled to collect point cloud data of the to-be-detected object, the collected image data and the point cloud data are processed by executing the depth detection method of the exemplary embodiment, target depth information of the to-be-detected object is obtained, then the three-dimensional structure of the to-be-detected object is reconstructed, and a decision in automatic running is determined according to the target depth information.
Exemplary embodiments of the present disclosure also provide an electronic device for performing the above depth detection method. The electronic device may be the computing device 120 described above or the computing device 120 including the data collection device 110.
The structure of the electronic device is exemplarily described below by taking the mobile terminal 200 in fig. 2 as an example. It will be appreciated by those skilled in the art that the configuration of figure 2 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes.
As shown in fig. 2, the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a USB (Universal Serial Bus) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display 290, a camera module 291, an indicator 292, a motor 293, buttons 294, and a SIM (Subscriber Identity Module) card interface 295.
Processor 210 may include one or more processing units, such as: the Processor 210 may include an AP (Application Processor), a modem Processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband Processor, and/or an NPU (Neural-Network Processing Unit), etc.
The encoder may encode (i.e., compress) the image or video data, for example, encode the photographed image of the object to be measured to form corresponding code stream data, so as to reduce the bandwidth occupied by data transmission; the decoder may decode (i.e., decompress) the code stream data of the image or the video to restore the image or the video data, for example, decode the code stream data corresponding to the image of the object to be detected to obtain the original image data. The mobile terminal 200 may support one or more encoders and decoders. In this way, the mobile terminal 200 may process images or video in a variety of encoding formats, such as image formats like JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats like MPEG-1 (Moving Picture Experts Group), MPEG-2, H.263, H.264, and HEVC (High Efficiency Video Coding).
In one embodiment, processor 210 may include one or more interfaces through which connections are made to other components of mobile terminal 200.
Internal memory 221 may be used to store computer-executable program code, including instructions. The internal memory 221 may include volatile memory and nonvolatile memory. The processor 210 executes various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the internal memory 221.
The external memory interface 222 may be used to connect an external memory, such as a Micro SD card, for expanding the storage capability of the mobile terminal 200. The external memory communicates with the processor 210 through the external memory interface 222 to implement data storage functions, such as storing images, videos, and other files.
The USB interface 230 is an interface conforming to the USB standard specification, and may be used to connect a charger to charge the mobile terminal 200, or connect an earphone or other electronic devices.
The charge management module 240 is configured to receive a charging input from a charger. While the charging management module 240 charges the battery 242, the power management module 241 may also supply power to the device; the power management module 241 may also monitor the status of the battery.
The wireless communication function of the mobile terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication applied on the mobile terminal 200. The wireless communication module 260 may provide wireless communication solutions applied to the mobile terminal 200, including WLAN (Wireless Local Area Network, e.g., a Wi-Fi (Wireless Fidelity) network), BT (Bluetooth), GNSS (Global Navigation Satellite System), FM (Frequency Modulation), NFC (Near Field Communication), IR (Infrared) technology, and the like.
The mobile terminal 200 may implement a display function through the GPU, the display screen 290, the AP, and the like, and display a user interface. For example, when the user starts a photographing function, the mobile terminal 200 may display a photographing interface, a preview image, and the like in the display screen 290.
The mobile terminal 200 may implement a photographing function through the ISP, the camera module 291, the encoder, the decoder, the GPU, the display 290, the AP, and the like. For example, the user may start a service related to depth detection, trigger the start of a shooting function, and at this time, may acquire an image of the object to be detected through the camera module 291.
The mobile terminal 200 may implement an audio function through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the earphone interface 274, the AP, and the like.
The sensor module 280 may include an ambient light sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, a barometric pressure sensor 2804, etc. to implement a corresponding inductive detection function.
Indicator 292 may be an indicator light that may be used to indicate a state of charge, a change in charge, or may be used to indicate a message, missed call, notification, etc. The motor 293 may generate a vibration cue, may also be used for touch vibration feedback, and the like. The keys 294 include a power-on key, a volume key, and the like.
The mobile terminal 200 may support one or more SIM card interfaces 295 for connecting SIM cards to implement functions such as call and mobile communication.
The depth detection method of the present exemplary embodiment is described below with reference to fig. 3, where fig. 3 shows an exemplary flow of the depth detection method, and may include:
step S310, point cloud data of the object to be detected collected by the laser radar and at least two images of the object to be detected collected by at least two cameras are obtained;
step S320, determining first depth information of the object to be detected by analyzing the point cloud data, wherein the first depth information comprises first depth values of different areas of the object to be detected;
step S330, determining second depth information of the object to be detected by performing stereo matching on the at least two images, wherein the second depth information comprises second depth values of different areas of the object to be detected;
step S340, determining, for the different areas, a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value;
and step S350, fusing the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
By the method, the fusion of the depth information detected by the laser radar and the binocular (or multi-view) camera is realized, the limitation of a single sensor system can be overcome, the range of the detected depth value and the applicable depth detection scene can be expanded, the accuracy of depth detection is improved, and the method has high practicability.
Each step in fig. 3 is explained in detail below.
Referring to fig. 3, in step S310, point cloud data of an object to be measured, which is acquired by a laser radar, and at least two images of the object to be measured, which are acquired by at least two cameras, are acquired.
The object to be measured refers to the environment in front of the laser radar and the cameras, including the objects in that environment. A laser radar generally comprises a transmitter and a receiver: the transmitter emits a laser signal, and after the signal is reflected at the object to be measured, the receiver receives the reflected laser signal. The depth information of the object to be measured can be calculated by analyzing the time difference between the emitted and the received laser signals, and the three-dimensional information of the object to be measured is determined in the coordinate system of the laser radar, thereby generating the point cloud data of the object to be measured.
At the same time, a camera system comprising at least two cameras can acquire images of the object to be measured. Taking the binocular camera system as an example, the at least two acquired images include a first image and a second image, the first image may be a left view of binocular, and the second image may be a right view.
In an embodiment, as shown in fig. 4, the acquiring point cloud data of the object to be measured acquired by the laser radar may include:
step S410, acquiring multi-frame point cloud data acquired by the laser radar in the movement process;
and step S420, registering the multi-frame point cloud data, and fusing the registered multi-frame point cloud data to obtain the point cloud data of the object to be detected.
During the movement of the laser radar, its coordinate system changes as its pose changes, and the multiple frames of point cloud data are acquired under different poses; each frame of point cloud data therefore describes the three-dimensional information of the object to be measured in a different laser radar coordinate system. Accordingly, the multiple frames of point cloud data can be registered so that they lie in the same coordinate system and then fused, which yields point cloud data that is denser than a single frame; some erroneous points in the point cloud data can also be removed, improving the accuracy of the point cloud data.
In one embodiment, one frame of the multi-frame point cloud data can be selected as a reference frame, and the other frames are registered to the reference frame. For example, if the laser radar acquires k frames of point cloud data during its movement, frames 2 through k are registered to frame 1, with frame 1 as the reference.
Generally, the object to be measured is a static object, that is, the shape of the object to be measured is not changed during the movement of the laser radar, so that different frames of point cloud data correspond to the object to be measured with the same shape. Therefore, during registration, the optimal transformation parameters are determined for the frame to be registered, and the transformed frame is enabled to be superposed with the reference frame as much as possible.
The present disclosure is not limited to a particular registration algorithm. For example, the ICP (Iterative Closest Point) algorithm may be adopted. When registering frame 2 to frame 1, the frame-2 point cloud data is first transformed into the coordinate system of frame 1 based on initial transformation parameters (generally including a rotation matrix and a translation vector), and each transformed point is paired with its closest point in the frame-1 point cloud data; the average distance of the closest point pairs is calculated to construct a loss function; the transformation parameters are iteratively optimized to continuously reduce the loss function value until convergence, yielding optimized transformation parameters; finally, the frame-2 point cloud data is transformed into the frame-1 coordinate system with the optimized transformation parameters, completing the registration of frame 2 to frame 1.
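As an illustration only, and not part of the disclosed text, the following is a minimal sketch of this frame-to-reference ICP registration. The use of the Open3D library, the function and variable names, and the correspondence-distance value are assumptions.

```python
import numpy as np
import open3d as o3d

def register_frames_to_reference(frames, max_corr_dist=0.2):
    """Register frames 2..k to frame 1 with point-to-point ICP (illustrative sketch)."""
    reference = frames[0]
    registered = [np.asarray(reference.points)]
    for frame in frames[1:]:
        # Iteratively optimize the rotation/translation (starting from the identity)
        # that minimizes the average closest-point distance to the reference frame.
        result = o3d.pipelines.registration.registration_icp(
            frame, reference, max_corr_dist, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        frame.transform(result.transformation)  # move the frame into frame 1's coordinate system
        registered.append(np.asarray(frame.points))
    return registered  # list of (N_i, 3) arrays, all in the reference coordinate system
```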
In one embodiment, step S420 may be implemented by:
determining a reference frame in the multi-frame point cloud data, and registering other frame point cloud data except the reference frame to a coordinate system corresponding to the reference frame, wherein the coordinate system is a three-dimensional coordinate system;
in a coordinate system corresponding to the reference frame, dividing a cube or a cuboid grid according to the resolution, actual requirements and the like of the laser radar;
dividing points in each frame of point cloud data after registration into the grids according to x, y and z coordinates of the points, and taking the points in the same grid as homonymous points;
counting the number of points in each grid; if the number of points is less than a same-name-point number threshold, the points in that grid are judged to be erroneous points and are removed, where the same-name-point number threshold may be determined empirically or in combination with the number of frames of point cloud data, for example, when k frames of point cloud data are acquired in total, the threshold may be s·k, with s being a coefficient less than 1, such as 0.5 or 0.25;
and forming a set of the remaining points to obtain the fused point cloud data (see the sketch following this list).
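A minimal sketch of the grid-based fusion described above, assuming the registered frames are given as (N_i, 3) NumPy arrays in the reference coordinate system; the grid size and the coefficient s are illustrative values, not values fixed by this disclosure.

```python
import numpy as np
from collections import defaultdict

def fuse_registered_frames(frames, grid_size=0.05, s=0.25):
    """Fuse k registered point cloud frames; drop grid cells with too few same-name points."""
    k = len(frames)
    points = np.vstack(frames)                             # all points, already in the reference coordinate system
    cells = np.floor(points / grid_size).astype(np.int64)  # cubic grid index (x, y, z) of each point
    buckets = defaultdict(list)
    for pt, cell in zip(points, map(tuple, cells)):
        buckets[cell].append(pt)                           # points falling in the same cell are same-name points
    fused = [pt for pts in buckets.values() if len(pts) >= s * k for pt in pts]  # keep cells above the threshold
    return np.asarray(fused)                               # fused, denser point cloud with error points removed
```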
With continued reference to fig. 3, in step S320, first depth information of the object to be measured is determined by analyzing the point cloud data, where the first depth information includes first depth values of different areas of the object to be measured.
The first depth information refers to depth information of the object to be measured determined based on point cloud data of the laser radar, and comprises first depth values of different areas of the object to be measured. For the convenience of distinguishing, the depth value obtained based on the laser radar is recorded as a first depth value, and the depth value obtained based on the camera system is recorded as a second depth value.
The point cloud data includes the coordinates of points, where the coordinate along the axis corresponding to the depth direction is the depth value; typically the z-axis coordinate is the depth value, although the x-axis or the y-axis may also serve as the depth-direction axis, depending on how the coordinate system is oriented, which this disclosure does not limit. Therefore, the depth values of the object to be detected can be obtained directly from the point cloud data.
Considering that the depth values in the point cloud data are coordinate values in the lidar coordinate system, they may be further transformed into the coordinate system of the camera system to facilitate the fusion of subsequent depth information. In one embodiment, step S320 may include:
and projecting the point cloud data to a coordinate system of a first camera of the at least two cameras based on a first calibration parameter between the laser radar and the first camera to obtain first depth information of the object to be detected.
In a camera system, each camera has its own camera coordinate system, and usually one camera is selected as the main camera, whose coordinate system is used as the coordinate system of the whole camera system. In the present exemplary embodiment, the selected main camera is referred to as the first camera, which may be any one of the at least two cameras. For example, in a binocular camera system, the left camera is generally used as the main camera and may be taken as the first camera.
In the present exemplary embodiment, the laser radar and the first camera may be calibrated in advance, for example with Zhang's calibration method. The first calibration parameter is a calibration parameter between the laser radar and the first camera, and may be a transformation parameter between the coordinate system of the laser radar and the coordinate system of the first camera. Therefore, after the point cloud data of the laser radar is obtained, the point cloud data can be projected from the coordinate system of the laser radar into the coordinate system of the first camera using the first calibration parameter, so as to obtain the coordinates of each point in the coordinate system of the first camera, and further the depth value of each point in that coordinate system, namely the first depth value.
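For illustration, a sketch of this projection under the assumption that the first calibration parameter is expressed as a rotation R and a translation t; the disclosure does not fix a particular parameterization, and the names below are hypothetical.

```python
import numpy as np

def lidar_to_first_camera(points_lidar, R, t):
    """Transform (N, 3) LiDAR points into the first camera's coordinate system.

    R (3x3 rotation) and t (3-vector translation) stand for the first calibration
    parameter; the z coordinate in the camera frame is taken as the first depth value.
    """
    points_cam = points_lidar @ R.T + t   # rigid transform of every point
    first_depth = points_cam[:, 2]        # depth along the first camera's optical axis
    return points_cam, first_depth
```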
It should be noted that both the lidar and the camera system have a certain resolution when detecting depth information. The receiver, for example a lidar, comprises an array of elements receiving laser signals reflected from different areas of the object to be measured, the depth value resolved by each element being represented as the depth value of a point on the object to be measured, which actually corresponds to a local area of the object to be measured. Thus, the greater the number of elements, the greater the density of the array, and the higher the resolution of the resulting depth values, i.e., the smaller the area corresponding to each dot. For camera systems, the resolution of the detected depth information is related to the number of texture features, feature points, etc. in the image. In the present exemplary embodiment, two concepts of a point and a region in the object to be measured are not particularly distinguished.
The first depth information mainly comprises a set of first depth values of different areas. In addition, the first depth information may also include other information besides the first depth value, such as a first confidence degree corresponding to the first depth value.
In one embodiment, the lidar may output a first confidence level. For example, according to the intensity of the laser signal received by the receiver, a first confidence corresponding to the first depth value of the different regions is quantitatively calculated, and generally, the intensity of the laser signal is positively correlated to the first confidence.
In another embodiment, the first confidence may be determined according to the fusion result of the multiple frames of point cloud data. For example, after dividing the points in each frame of point cloud data after registration into grids according to the x, y, and z coordinates thereof, the number of points in each grid and the depth difference between different points in each grid (i.e., the difference between the depth direction axis coordinates of different points, such as the difference between the z axis coordinates) are counted, and the greater the number of points in a grid, the smaller the depth difference between points, the higher the first confidence of the points. Illustratively, the following equation (1) may be used for calculation:
Conf1(p) = (count(grid_i) / k)^a × (1 − σd(grid_i) / Δd(grid_i))^b,  p ∈ grid_i  (1)
where Conf1 denotes the first confidence, grid_i denotes the i-th grid cell, p is a point in grid_i, and count(grid_i) is the number of points in grid_i. k is the number of frames of point cloud data. d denotes depth: σd(grid_i) is the standard deviation of the depth values of the points in grid_i, and Δd(grid_i) is the span of the depth values in grid_i, for example the span along the z-axis. a and b are two empirical exponents; illustratively, both a and b lie in the range (0, 1).
As can be seen from equation (1), the first confidence is obtained by multiplying the two parts. The larger the number of points in the grid is, the larger the ratio of the points to the number of frames k is, the larger the first part value is, and the larger the first confidence coefficient is; the more concentrated the depth values in the grid are, the smaller the standard deviation is, the smaller the ratio to the depth value span is, the larger the second partial value is, the larger the first confidence is.
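A sketch of this per-cell confidence computation. Since equation (1) is published as an image, the exact form used here (frame-count ratio and complement of the normalized depth spread, each raised to an empirical exponent) is an assumption consistent with the description above, and the exponent values are illustrative.

```python
import numpy as np

def first_confidence(cell_points, k, a=0.5, b=0.5):
    """First confidence for the points of one grid cell (hypothetical reconstruction of equation (1))."""
    depths = cell_points[:, 2]                         # the z coordinate is assumed to be the depth axis
    count_part = len(cell_points) / k                  # more points relative to the k frames -> higher confidence
    span = depths.max() - depths.min()
    spread = depths.std() / span if span > 0 else 0.0  # concentrated depth values -> small normalized spread
    return (count_part ** a) * ((1.0 - spread) ** b)   # product of the two parts, with empirical exponents a, b
```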
With reference to fig. 3, in step S330, second depth information of the object to be measured is determined by performing stereo matching on the at least two images, where the second depth information includes second depth values of different areas of the object to be measured.
The second depth information refers to depth information of the object to be measured determined based on the camera system, and comprises second depth values of different areas of the object to be measured. The at least two images are images acquired by different cameras for the same object to be detected, so that three-dimensional information of the object to be detected can be recovered through three-dimensional reconstruction, and second depth information is obtained.
In one embodiment, referring to fig. 5, step S330 may include:
step S510, performing stereo matching on the at least two images based on a second calibration parameter between the at least two cameras to obtain a binocular disparity map;
and step S520, determining second depth information of the object to be detected according to the binocular disparity map.
As noted above, each camera in the camera system has its own camera coordinate system, and the coordinate system of the first camera (i.e., the main camera) is used as the coordinate system of the whole camera system. The other cameras in the camera system may be calibrated to the first camera in advance, for example with Zhang's calibration method. The second calibration parameter is a calibration parameter between another camera and the first camera, and may be a transformation parameter between the coordinate system of that camera and the coordinate system of the first camera. When the camera system includes three or more cameras, the second camera, the third camera, and so on may each be calibrated to the first camera to obtain multiple sets of second calibration parameters.
Based on the second calibration parameter, pairwise stereo matching can be performed on the images. For example, if the camera system includes two cameras, the two cameras respectively capture one image, wherein the first camera captures a first image and the second camera captures a second image; and performing stereo matching on the first image and the second image based on a second calibration parameter between the first camera and the second camera to obtain a binocular disparity map corresponding to the first image and the second image. If the camera system comprises three cameras, the three cameras respectively collect one image, wherein the first camera collects a first image, the second camera collects a second image, and the third camera collects a third image; performing stereo matching on the first image and the second image based on a second calibration parameter between the first camera and the second camera to obtain a binocular disparity map corresponding to the first image and the second image, wherein the binocular disparity map can be recorded as (1-2) binocular disparity maps for convenience of distinguishing; and performing stereo matching on the first image and the third image based on a second calibration parameter between the first camera and the third camera to obtain a binocular disparity map corresponding to the first image and the third image, which can be recorded as (1-3) binocular disparity maps.
The specific algorithm of stereo matching is not limited in the present disclosure, and may be implemented by, for example, a Semi-Global Matching (SGM) algorithm.
The binocular disparity map includes the disparity value of each point. A second depth value of each point can be calculated by combining the internal parameters of the cameras with the second calibration parameters (mainly the baseline lengths between different cameras); the second depth value of each point in the first image is generally calculated with the first camera as reference, thereby obtaining the second depth information.
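A sketch of this disparity-to-depth step using OpenCV's semi-global matcher; the matcher parameters and the conversion Z = f·B/d, with the first camera's focal length f (in pixels) and baseline B, are illustrative assumptions rather than the exact implementation of this disclosure.

```python
import cv2
import numpy as np

def second_depth_from_stereo(left_gray, right_gray, focal_px, baseline_m, num_disp=128):
    """Binocular disparity by SGM, then depth via Z = f * B / d (illustrative parameters)."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=num_disp, blockSize=5)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0  # SGBM outputs 16x fixed point
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]  # second depth value per pixel of the first image
    return disparity, depth
```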
When a plurality of binocular disparity maps are obtained, a set of second depth information can be obtained by calculation according to each binocular disparity map, for example, (1-2) second depth information is obtained according to the (1-2) binocular disparity maps, and (1-3) second depth information is obtained according to the (1-3) binocular disparity maps. Further, the second depth values of the same area (or the same point) in different sets of second depth information are fused, for example, an average value may be calculated, so as to obtain a set of fused second depth information.
The second depth information comprises a set of second depth values of the different regions. In addition, the second depth information may also include other information besides the second depth value, such as a second confidence degree corresponding to the second depth value.
In one embodiment, the second confidence level may be estimated using a machine learning model. For example, a convolutional neural network is trained in advance, the at least two images and second depth information (which may be a depth image corresponding to the first image) are input into the convolutional neural network, and after processing, an image with a second confidence level, including the second confidence level of each point in the object to be measured, is output.
In another embodiment, an LRC (Left-Right Consistency) check may be used to detect erroneous disparity matches, especially occluded regions at depth discontinuities of the object to be detected, and a lower second confidence is assigned to those occluded regions.
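A minimal left-right consistency check, sketched under the assumption that disparity maps for both views are available; the tolerance and confidence values are illustrative. Pixels whose left and right disparities disagree (typically occluded pixels) receive a low second confidence.

```python
import numpy as np

def lrc_confidence(disp_left, disp_right, tol=1.0, low_conf=0.1, high_conf=0.9):
    """Assign a low second confidence to pixels failing the left-right consistency check."""
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_right = np.clip((xs - disp_left).astype(np.int32), 0, w - 1)  # where each left pixel lands in the right view
    mismatch = np.abs(disp_left - disp_right[ys, x_right]) > tol    # occlusions / wrong matches disagree
    return np.where(mismatch, low_conf, high_conf)
```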
With continued reference to fig. 3, in step S340, a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different region are determined.
As can be seen from the above, the first depth information and the second depth information are depth information of the object to be measured obtained through different approaches, and different areas of the object to be measured have the first depth value in the first depth information and the second depth value in the second depth information. The exemplary embodiment determines a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different regions, so as to facilitate subsequent weighted fusion.
It should be noted that the first depth information and the second depth information generally cannot each cover all areas or all points of the object to be measured. Therefore, for some areas a depth value is detected in only one of the first depth information and the second depth information, i.e., those areas have only a first depth value or only a second depth value. When the first depth information and the second depth information are fused, that available depth value can be adopted directly for such areas, without calculating a first weight value and a second weight value. Other areas have depth values detected in both the first depth information and the second depth information, i.e., they have both a first depth value and a second depth value; step S340 mainly calculates the first weight value and the second weight value for these areas.
In one embodiment, referring to fig. 6, step S340 may include:
step S610, acquiring a first depth value range and a second depth value range, wherein the first depth value range is a depth value detection range of the laser radar, and the second depth value range is a depth value detection range of the at least two cameras;
step S620, determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different regions.
The first depth value range, i.e., the ranging range of the laser radar, is generally one of the performance specifications of the laser radar. The second depth value range may be determined from the above-mentioned internal parameters, baseline length, and the like of the at least two cameras. Exemplarily, when a laser radar and a binocular camera are arranged in a mobile phone for depth detection, the depth value detection range of the laser radar is generally near, while that of the binocular camera is relatively far; for example, referring to fig. 7, the first depth value range of the laser radar is 0.1 to 3 meters and the second depth value range of the camera system is 0.6 to 5 meters.
In the range of the first depth value, the first depth value detected based on the laser radar is credible, a higher first weight value can be set, and in the range of the second depth value, the second depth value detected based on the camera system is credible, and a higher second weight value can be set. Therefore, the first weight value and the second weight value can be determined by comparing the first depth value and the second depth value of different areas of the object to be detected with the first depth value range and the second depth value range.
In one embodiment, referring to fig. 8, step S620 may include:
step S810, determining a first depth median and a second depth median, wherein the first depth median is a median of a first depth value range, and the second depth median is a median of a second depth value range;
in step S820, a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value are determined according to a difference between the first depth value and the first depth median and a difference between the second depth value and the second depth median.
Generally, the closer the first depth value is to the median of the first depth, the higher the confidence level is, the larger the first weight value can be set, and the same applies to the second depth value. Therefore, the difference between the first depth value and the first depth median value and the difference between the second depth value and the second depth median value can be used as the basis for measuring the first weight value and the second weight value.
In one embodiment, a first weight value corresponding to the first depth value may be calculated according to a difference between the first depth value and the first depth median, a second weight value corresponding to the second depth value may be calculated according to a difference between the second depth value and the second depth median, and the first weight value and the second weight value may be normalized.
In one embodiment, the first and second weight values may be calculated simultaneously in combination with a difference between the first depth value and a first depth median and a difference between the second depth value and a second depth median. Illustratively, the following equation (2) can be used for calculation:
diff1(p) = |d1(p) − med1| / Δd1,  diff2(p) = |d2(p) − med2| / Δd2
w1(p) = diff2(p) / (diff1(p) + diff2(p)),  w2(p) = diff1(p) / (diff1(p) + diff2(p))  (2)
where w1(p) denotes the first weight value of point p and w2(p) denotes the second weight value of point p. d1(p) denotes the first depth value of point p and d2(p) denotes the second depth value of point p. med1 denotes the first depth median and med2 denotes the second depth median. |d1(p) − med1| is the difference between the first depth value and the first depth median, and |d2(p) − med2| is the difference between the second depth value and the second depth median. Δd1 denotes the span of the first depth value range (i.e., the difference between its upper and lower limits), and Δd2 denotes the span of the second depth value range. It can be seen that diff1(p) is the normalized difference between the first depth value and the first depth median, and diff2(p) is the normalized difference between the second depth value and the second depth median. The first weight value and the second weight value are then calculated from diff1(p) and diff2(p): the smaller the normalized difference, the larger the corresponding weight.
In one embodiment, the first depth value range and the second depth value range may be partitioned into a common range and single-sided ranges. Referring to fig. 7, the first depth value range of the laser radar intersects the second depth value range of the camera system to give a common range, i.e., the range of 0.6 to 3 meters; the complement of the common range within the first depth value range is the first single-sided range, i.e., the range of 0.1 to 0.6 meters; the complement of the common range within the second depth value range is the second single-sided range, i.e., the range of 3 to 5 meters. For any area of the object to be measured: if the first depth value lies in the first single-sided range and the second depth value is close to the boundary between the first single-sided range and the common range (for example, the difference between the second depth value and 0.6 meters is less than a boundary threshold, such as 0.1 meters), the first weight value of the area is set to 1 and the second weight value to 0; if the first depth value is close to the boundary between the second single-sided range and the common range (for example, the difference between the first depth value and 3 meters is less than the boundary threshold, such as 0.1 meters) and the second depth value lies in the second single-sided range, the first weight value of the area is set to 0 and the second weight value to 1; if the first depth value and the second depth value both lie in the common range, the first weight value and the second weight value are calculated from the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median, for example with reference to equation (2) above.
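A sketch combining the single-sided-range rules and the median-difference weighting for one area. The depth ranges and border threshold follow the example values above; taking the range midpoints as the depth medians and the normalized inverse-difference form of the weights are assumptions (equation (2) is published as an image) consistent with the description.

```python
def range_based_weights(d1, d2, range1=(0.1, 3.0), range2=(0.6, 5.0), border=0.1):
    """First/second weight for one area that has both a first and a second depth value."""
    low1, high1 = range1
    low2, high2 = range2
    common_low, common_high = max(low1, low2), min(high1, high2)   # common range, e.g. 0.6-3 m
    med1, med2 = (low1 + high1) / 2.0, (low2 + high2) / 2.0        # range midpoints taken as the depth medians
    span1, span2 = high1 - low1, high2 - low2

    if d1 < common_low and abs(d2 - common_low) < border:          # first single-sided range
        return 1.0, 0.0
    if abs(d1 - common_high) < border and d2 > common_high:        # second single-sided range
        return 0.0, 1.0
    # otherwise (typically both values in the common range): weight by distance to the depth medians
    diff1 = abs(d1 - med1) / span1
    diff2 = abs(d2 - med2) / span2
    if diff1 + diff2 == 0:
        return 0.5, 0.5
    return diff2 / (diff1 + diff2), diff1 / (diff1 + diff2)        # smaller difference -> larger weight
```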
When the first depth information and the second depth information are calculated, corresponding first confidence degree and second confidence degree can be obtained, and the first confidence degree and the second confidence degree can be used for calculating a first weight value and a second weight value. In one embodiment, step S620 may include:
and determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to at least one of the first depth value range, the second depth value range, the first depth value and the second depth value of the different regions, and a first confidence coefficient corresponding to the first depth value and a second confidence coefficient corresponding to the second depth value.
The following two cases are described separately:
(1) one of the first confidence level and the second confidence level is obtained.
Taking the obtaining of the first confidence level as an example, for any region of the object to be detected, if the first confidence level of the region is lower than the first confidence lower threshold, setting the first weight value of the region to be 0, and setting the second weight value to be 1; if the first confidence of the region is higher than the first confidence upper limit threshold, setting a first weight value of the region to be 1, and setting a second weight value to be 0; and if the first confidence degree of the area is between the first upper confidence limit threshold and the first lower confidence limit threshold, calculating a first weight value and a second weight value according to the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median. The first upper confidence threshold and the first lower confidence threshold may be set according to experience and actual demand, for example, the first upper confidence threshold is 0.8, and the first lower confidence threshold is 0.2.
(2) Obtaining a first confidence degree and a second confidence degree
In one embodiment, for any region of the object to be detected, if the first confidence of the region is lower than the first confidence threshold and the second confidence is higher than the second confidence threshold, the first weight value of the region is set to 0, and the second weight value is set to 1; if the first confidence degree is higher than the first confidence threshold value and the second confidence degree is lower than the second confidence threshold value, setting the first weight value of the region as 1 and setting the second weight value as 0; if the first confidence is higher than the first confidence threshold and the second confidence is higher than the second confidence threshold, calculating a first weight value and a second weight value according to a difference between the first depth value and the first depth median and a difference between the second depth value and the second depth median, for example, referring to the above formula (2); if the first confidence degree is lower than the first confidence threshold value and the second confidence degree is lower than the second confidence threshold value, the first depth value and the second depth value of the area are abandoned. The first confidence threshold is a confidence lower limit threshold of the depth value detected by the laser radar, the second confidence threshold is a confidence lower limit threshold of the depth value detected by the camera system, and the first confidence threshold and the second confidence threshold may be the same or different and may be set according to the performance of the sensor, the actual requirements, and the like. Illustratively, the first confidence threshold and the second confidence threshold may both be 0.2.
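The case analysis above, sketched as a small helper. The threshold value 0.2 follows the example in the text; `median_weights` is a hypothetical callback standing for the equation-(2) style weighting (for instance, the `range_based_weights` sketch above).

```python
def confidence_gated_weights(conf1, conf2, d1, d2, median_weights, thr1=0.2, thr2=0.2):
    """Weights for one area gated by the two confidences; returns None to discard the area."""
    if conf1 < thr1 and conf2 >= thr2:
        return 0.0, 1.0                # only the camera-based depth is credible
    if conf1 >= thr1 and conf2 < thr2:
        return 1.0, 0.0                # only the LiDAR-based depth is credible
    if conf1 >= thr1 and conf2 >= thr2:
        return median_weights(d1, d2)  # both credible: fall back to the equation-(2) style weighting
    return None                        # both unreliable: discard this area's depth values
```

For example, `confidence_gated_weights(0.9, 0.1, 1.2, 1.3, range_based_weights)` returns (1.0, 0.0), i.e., only the LiDAR depth is kept for that area.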
In an embodiment, the first weight value and the second weight value calculated according to the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median may be further fused with the first confidence degree and the second confidence degree, for example, the first weight value may be subjected to an index or coefficient correction calculation according to the first confidence degree, the second weight value may be subjected to an index or coefficient correction calculation according to the second confidence degree to optimize the first weight value and the second weight value, the optimized first weight value and the optimized second weight value are normalized, and the final first weight value and the final second weight value are output.
Continuing to refer to fig. 3, in step S350, the first depth information and the second depth information are fused by using the first weight value and the second weight value, so as to obtain the target depth information of the object to be measured.
Specifically, for any region of the object to be detected: if the region has a first depth value but no second depth value, the first depth value is taken as the target depth value of the region; if the region has a second depth value but no first depth value, the second depth value is taken as the target depth value of the region; and if the region has both a first depth value and a second depth value, the first depth value and the second depth value are weighted using the first weight value and the second weight value to obtain the target depth value of the region. In this way a target depth value is obtained for each region, and the set of these target depth values constitutes the target depth information.
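The per-region fusion rule above can be written compactly as follows. This is a minimal sketch assuming dense per-pixel arrays in which the value 0 marks a missing depth; the disclosure itself does not prescribe this representation.

```python
import numpy as np

def fuse_depth(depth1, depth2, w1, w2):
    """Fuse lidar depth (depth1) and stereo depth (depth2); all inputs share one shape."""
    has1 = depth1 > 0
    has2 = depth2 > 0
    target = np.zeros_like(depth1)
    only1 = has1 & ~has2
    only2 = has2 & ~has1
    both = has1 & has2
    target[only1] = depth1[only1]                    # only the first depth value exists
    target[only2] = depth2[only2]                    # only the second depth value exists
    target[both] = (w1[both] * depth1[both]          # both exist: weighted combination
                    + w2[both] * depth2[both])
    return target
```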
By fusing the first depth information and the second depth information, the depth detection method combines the respective advantages of lidar-based and camera-based depth detection: depth holes or missing information in the first depth information caused by the material reflectivity of the object to be detected or by multipath interference can be filled, depth holes or missing information at depth discontinuities caused by occlusion in the second depth information can be filled, and depth values of low reliability in either the first depth information or the second depth information can be improved. The accuracy of depth detection is thereby increased and more accurate and reliable target depth information is obtained. The depth value detection range and the applicable scenarios are also extended, giving the scheme higher practicability.
In an embodiment, the target depth information may further be filtered, for example with an edge-preserving filtering algorithm such as BF (Bilateral Filtering), GF (Guided Filtering), or FBS (Fast Bilateral Solver), so that edge information of the object to be detected is preserved while the depth information is smoothed.
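As a hedged example of such edge-preserving post-filtering, OpenCV's bilateral filter can be applied to the fused depth map; the filter parameters below are placeholders chosen for illustration rather than values specified by the disclosure.

```python
import cv2
import numpy as np

def smooth_depth(target_depth):
    # Bilateral filtering smooths the depth map while preserving depth edges.
    # cv2.bilateralFilter accepts single-channel 32-bit float input.
    depth32 = target_depth.astype(np.float32)
    return cv2.bilateralFilter(depth32, d=9, sigmaColor=0.1, sigmaSpace=5)
```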
Fig. 9 shows an implementation flow of the depth detection method, taking a hardware configuration of a laser radar and a binocular camera as an example; the flow includes:
step S901, calibrating a binocular camera;
step S902, calibrating a first camera in a binocular camera and a laser radar;
step S903, collecting multi-frame point cloud data through a laser radar, and collecting two images through a binocular camera;
step S904, registering the collected multi-frame point cloud data, and fusing the multi-frame point cloud data after registration to obtain a frame of dense point cloud data;
step S905, projecting the point cloud data to a coordinate system of the first camera according to calibration parameters of the first camera and the laser radar to obtain first depth information;
step S906, carrying out binocular stereo matching on the two acquired images according to calibration parameters of a binocular camera to obtain second depth information;
step S907, processing the two images and the second depth information by using an LRC algorithm or a machine learning model, and calculating a second confidence coefficient;
step S908, determining a first weight value and a second weight value according to the second confidence coefficient, the first depth information and the second depth information, and performing weighted fusion on the first depth information and the second depth information;
step S909, performing filtering processing on the fused depth information;
in step S910, the target depth information is output after the filtering process.
In one embodiment, the target depth information and the images acquired by the camera system may form a data set, in which the images serve as training data and the target depth information serves as annotation data (ground truth). Such a data set may be used to train machine learning models related to depth estimation, and this helps improve the accuracy and completeness of the data set.
Exemplary embodiments of the present disclosure also provide a depth detection apparatus. Referring to fig. 10, the depth detection apparatus 1000 may include:
a data acquisition module 1010 configured to acquire point cloud data of an object to be measured acquired by a laser radar, and at least two images of the object to be measured acquired by at least two cameras;
a first depth information determining module 1020 configured to determine first depth information of the object to be detected by parsing the point cloud data, the first depth information including first depth values of different regions of the object to be detected;
a second depth information determining module 1030 configured to determine second depth information of the object to be detected by performing stereo matching on the at least two images, where the second depth information includes second depth values of different regions of the object to be detected;
a weight value determining module 1040 configured to determine a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different region;
and a depth information fusion module 1050 configured to fuse the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
In one embodiment, the data acquisition module 1010 is configured to:
acquiring multi-frame point cloud data acquired by a laser radar in a movement process;
and registering the multi-frame point cloud data, and fusing the registered multi-frame point cloud data to obtain the point cloud data of the object to be detected.
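A minimal sketch of this registration and fusion step, assuming the Open3D library and pairwise point-to-point ICP against the first frame (the disclosure does not mandate a particular registration algorithm, and the voxel size is only an example):

```python
import open3d as o3d

def register_and_fuse(frames, max_corr_dist=0.05):
    """Register each lidar frame to the first frame with ICP, then merge them."""
    reference = frames[0]
    merged = o3d.geometry.PointCloud()
    merged += reference
    for frame in frames[1:]:
        result = o3d.pipelines.registration.registration_icp(
            frame, reference, max_corr_dist,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
        frame.transform(result.transformation)   # bring the frame into the reference frame
        merged += frame
    # Optional: voxel downsampling thins out duplicated points in the fused cloud.
    return merged.voxel_down_sample(voxel_size=0.01)
```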
In one embodiment, the first depth information determining module 1020 is configured to:
projecting the point cloud data to the coordinate system of a first camera of the at least two cameras based on a first calibration parameter between the laser radar and the first camera, to obtain the first depth information of the object to be detected.
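For illustration, projecting the registered point cloud into the first camera with a known 4x4 lidar-to-camera extrinsic matrix `T` and 3x3 intrinsic matrix `K` might look as follows; the nearest-point-wins handling of pixels hit by several points is an assumption of this sketch.

```python
import numpy as np

def project_to_camera(points_lidar, T, K, height, width):
    """points_lidar: (N, 3) array in the lidar frame; returns a sparse depth map."""
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T @ pts_h.T).T[:, :3]              # points in the first camera's coordinates
    pts_cam = pts_cam[pts_cam[:, 2] > 0]          # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]
    depth1 = np.zeros((height, width), dtype=np.float32)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        if depth1[vi, ui] == 0 or zi < depth1[vi, ui]:    # nearest point wins
            depth1[vi, ui] = zi
    return depth1
```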
In one embodiment, the second depth information determining module 1030 is configured to:
performing stereo matching on the at least two images based on a second calibration parameter between the at least two cameras to obtain a binocular disparity map;
and determining second depth information of the object to be detected according to the binocular disparity map.
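A hedged example of deriving the second depth information from a rectified stereo pair with OpenCV's semi-global block matching; the matcher parameters, focal length `fx`, and `baseline` are placeholders to be taken from the second calibration parameter, not values fixed by the disclosure.

```python
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, fx, baseline, num_disp=128, block_size=5):
    """Compute a depth map from a rectified grayscale stereo pair."""
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=num_disp,
                                    blockSize=block_size)
    # StereoSGBM returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth2 = np.zeros_like(disparity)
    valid = disparity > 0
    depth2[valid] = fx * baseline / disparity[valid]   # depth = focal length * baseline / disparity
    return depth2
```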
In one embodiment, the weight value determination module 1040 is configured to:
acquiring a first depth value range and a second depth value range, wherein the first depth value range is the depth value detection range of the laser radar, and the second depth value range is the depth value detection range of the at least two cameras;
and determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different area.
In one embodiment, the weight value determination module 1040 is configured to:
determining a first depth median value and a second depth median value, the first depth median value being a median value of the first range of depth values and the second depth median value being a median value of the second range of depth values;
and determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the difference between the first depth value and the first depth median value and the difference between the second depth value and the second depth median value.
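One way to realize this, assuming (as an interpretation rather than the patent's exact formula) that a depth value closer to the middle of its sensor's detection range is treated as more reliable, is the vectorized sketch below; the detection ranges are example values.

```python
import numpy as np

def median_based_weights(depth1, depth2, range1=(0.5, 50.0), range2=(0.3, 10.0)):
    """range1/range2: example lidar and stereo detection ranges in meters."""
    median1 = 0.5 * (range1[0] + range1[1])
    median2 = 0.5 * (range2[0] + range2[1])
    # Deviation of each depth value from its depth median, normalized by half the range.
    e1 = np.abs(depth1 - median1) / (0.5 * (range1[1] - range1[0]))
    e2 = np.abs(depth2 - median2) / (0.5 * (range2[1] - range2[0]))
    w1 = e2 / np.maximum(e1 + e2, 1e-6)    # smaller deviation -> larger weight
    return w1, 1.0 - w1
```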
In one embodiment, the weight value determination module 1040 is configured to:
and determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to at least one of the first depth value range, the second depth value range, the first depth value and the second depth value of the different area, and a first confidence coefficient corresponding to the first depth value and a second confidence coefficient corresponding to the second depth value.
The details of the above-mentioned parts of the apparatus have been described in detail in the method part embodiments, and thus are not described again.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product including program code; when the program product runs on an electronic device, the program code causes the electronic device to perform the steps described in the above "exemplary method" section of this specification according to various exemplary embodiments of the present disclosure. In one embodiment, the program product may be implemented as a portable compact disc read-only memory (CD-ROM) including the program code, and may be run on an electronic device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (10)

1. A depth detection method, comprising:
acquiring point cloud data of an object to be detected, which is acquired by a laser radar, and at least two images of the object to be detected, which are acquired by at least two cameras;
determining first depth information of the object to be detected by analyzing the point cloud data, wherein the first depth information comprises first depth values of different areas of the object to be detected;
determining second depth information of the object to be detected by performing stereo matching on the at least two images, wherein the second depth information comprises second depth values of different areas of the object to be detected;
determining a first weight value corresponding to the first depth value of the different area and a second weight value corresponding to the second depth value;
and fusing the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
2. The method of claim 1, wherein the acquiring point cloud data of the object to be measured acquired by the lidar comprises:
acquiring multi-frame point cloud data acquired by a laser radar in a movement process;
and registering the multi-frame point cloud data, and fusing the registered multi-frame point cloud data to obtain the point cloud data of the object to be detected.
3. The method of claim 1, wherein determining the first depth information of the object to be measured by parsing the point cloud data comprises:
and projecting the point cloud data to a coordinate system of a first camera of the at least two cameras based on a first calibration parameter between the laser radar and the first camera to obtain first depth information of the object to be detected.
4. The method according to claim 1, wherein the determining second depth information of the object to be measured by stereo matching the at least two images comprises:
performing stereo matching on the at least two images based on a second calibration parameter between the at least two cameras to obtain a binocular disparity map;
and determining second depth information of the object to be detected according to the binocular disparity map.
5. The method of claim 1, wherein the determining a first weight value corresponding to the first depth value of the different area and a second weight value corresponding to the second depth value comprises:
acquiring a first depth value range and a second depth value range, wherein the first depth value range is the depth value detection range of the laser radar, and the second depth value range is the depth value detection range of the at least two cameras;
and determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different area.
6. The method of claim 5, wherein determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different region comprises:
determining a first depth median value that is a median of the first range of depth values and a second depth median value that is a median of the second range of depth values;
and determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the difference between the first depth value and the first depth median value and the difference between the second depth value and the second depth median value.
7. The method of claim 5, wherein determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different region comprises:
determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to at least one of the first depth value range, the second depth value range, the first depth value and the second depth value of the different region, and a first confidence degree corresponding to the first depth value and a second confidence degree corresponding to the second depth value.
8. A depth detection apparatus, comprising:
a data acquisition module configured to acquire point cloud data of an object to be measured acquired by a laser radar, and at least two images of the object to be measured acquired by at least two cameras;
a first depth information determination module configured to determine first depth information of the object to be detected by parsing the point cloud data, the first depth information including first depth values of different regions of the object to be detected;
a second depth information determining module configured to determine second depth information of the object to be detected by performing stereo matching on the at least two images, the second depth information including second depth values of different areas of the object to be detected;
a weight value determination module configured to determine a first weight value corresponding to a first depth value of the different region and a second weight value corresponding to a second depth value;
and the depth information fusion module is configured to fuse the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 7 via execution of the executable instructions.
CN202110367514.0A 2021-04-06 2021-04-06 Depth detection method, depth detection device, storage medium and electronic equipment Active CN112927281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110367514.0A CN112927281B (en) 2021-04-06 2021-04-06 Depth detection method, depth detection device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110367514.0A CN112927281B (en) 2021-04-06 2021-04-06 Depth detection method, depth detection device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112927281A true CN112927281A (en) 2021-06-08
CN112927281B CN112927281B (en) 2024-07-02

Family

ID=76174203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110367514.0A Active CN112927281B (en) 2021-04-06 2021-04-06 Depth detection method, depth detection device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112927281B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706599A (en) * 2021-10-29 2021-11-26 纽劢科技(上海)有限公司 Binocular depth estimation method based on pseudo label fusion
CN114862931A (en) * 2022-05-31 2022-08-05 小米汽车科技有限公司 Depth distance determination method and device, vehicle, storage medium and chip

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127788A (en) * 2016-07-04 2016-11-16 触景无限科技(北京)有限公司 A kind of vision barrier-avoiding method and device
CN109213138A (en) * 2017-07-07 2019-01-15 北京臻迪科技股份有限公司 A kind of barrier-avoiding method, apparatus and system
CN107610084A (en) * 2017-09-30 2018-01-19 驭势科技(北京)有限公司 A kind of method and apparatus that information fusion is carried out to depth image and laser spots cloud atlas
CN111275750A (en) * 2020-01-19 2020-06-12 武汉大学 Indoor space panoramic image generation method based on multi-sensor fusion
CN112184589A (en) * 2020-09-30 2021-01-05 清华大学 Point cloud intensity completion method and system based on semantic segmentation
CN112312113A (en) * 2020-10-29 2021-02-02 贝壳技术有限公司 Method, device and system for generating three-dimensional model

Also Published As

Publication number Publication date
CN112927281B (en) 2024-07-02

Similar Documents

Publication Publication Date Title
CN111784614B (en) Image denoising method and device, storage medium and electronic equipment
EP2887311B1 (en) Method and apparatus for performing depth estimation
US9374571B2 (en) Image processing device, imaging device, and image processing method
CN111784734B (en) Image processing method and device, storage medium and electronic equipment
CN102158719A (en) Image processing apparatus, imaging apparatus, image processing method, and program
CN112927281B (en) Depth detection method, depth detection device, storage medium and electronic equipment
CN101416520A (en) Efficient encoding of multiple views
CN112270710A (en) Pose determination method, pose determination device, storage medium, and electronic apparatus
CN112270755B (en) Three-dimensional scene construction method and device, storage medium and electronic equipment
CN112269851A (en) Map data updating method and device, storage medium and electronic equipment
CN112927271A (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN113096185A (en) Visual positioning method, visual positioning device, storage medium and electronic equipment
CN111652933B (en) Repositioning method and device based on monocular camera, storage medium and electronic equipment
CN117596411A (en) Method, apparatus and computer readable medium for generating hierarchical depth data of a scene
CN113269823A (en) Depth data acquisition method and device, storage medium and electronic equipment
CN114862828A (en) Light spot searching method and device, computer readable medium and electronic equipment
CN112489097B (en) Stereo matching method based on mixed 2D convolution and pseudo 3D convolution
CN113743517A (en) Model training method, image depth prediction method, device, equipment and medium
CN108062765A (en) Binocular image processing method, imaging device and electronic equipment
US20200267374A1 (en) Data recording apparatus, image capturing apparatus, data recording method, and storage medium
CN115937290B (en) Image depth estimation method and device, electronic equipment and storage medium
US10489933B2 (en) Method for modelling an image device, corresponding computer program product and computer-readable carrier medium
CN114241039A (en) Map data processing method and device, storage medium and electronic equipment
EP3624050B1 (en) Method and module for refocusing at least one plenoptic video
CN113706598B (en) Image processing method, model training method and device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant