WO2023015880A1 - Method for acquiring a training sample set, model training method and related devices - Google Patents

Method for acquiring a training sample set, model training method and related devices

Info

Publication number
WO2023015880A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
depth map
sparse
depth
dense
Prior art date
Application number
PCT/CN2022/080515
Other languages
English (en)
French (fr)
Inventor
刘浏
徐玉华
闫敏
余宇山
杨晓立
赵鑫
Original Assignee
深圳奥锐达科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳奥锐达科技有限公司
Publication of WO2023015880A1 publication Critical patent/WO2023015880A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Definitions

  • the present application relates to the technical field of computer vision, and in particular to a method for obtaining a training sample set, a method for training a model, and related devices.
  • Existing depth information completion models generally adopt supervised training: dense depth measurement results are required as the supervised-training ground truth (GroundTruth), and the depth information completion model is trained through supervised learning.
  • The dense depth measurement used as the training ground truth is commonly determined by simulation.
  • However, there will be a deviation between the dense depth measurement obtained by simulation and the actual scene. Such deviation degrades the model performance of the trained depth information completion model, which in turn reduces the accuracy of the dense depth map determined by the depth information completion model.
  • the technical problem to be solved in this application is to provide a method for obtaining a training sample set, a method for training a model, and a related device in view of the deficiencies in the prior art.
  • the first aspect of the embodiment of the present application provides a method for obtaining a training sample set, and the method for obtaining includes:
  • each image group in the several image groups includes a first scene image obtained by the binocular camera, a first dense depth map corresponding to the first scene image, and a first sparse depth map corresponding to the first scene image obtained by the distance measuring device;
  • An image group whose image difference degree satisfies a preset condition is selected from several image groups, and a data set composed of the selected image groups is used as a training sample set.
  • the method for obtaining the training sample set, wherein the first sparse depth map and the first dense depth map in the image group are obtained by collecting the same scene information at the same time with a binocular camera and a distance measuring device whose relative positions are fixed.
  • the method for obtaining the training sample set wherein the respectively determining the degree of image difference between the first dense depth map and the first sparse depth map in each image group specifically includes:
  • the method for obtaining the training sample set, wherein selecting, in the first dense depth map, the image region corresponding to each sparse depth point in the projected first sparse depth map, and determining the depth threshold based on all the selected image regions, specifically includes:
  • For each sparse depth point in the projected first sparse depth map select an image region corresponding to the sparse depth point in the first dense depth map, wherein the image region includes the sparse depth point;
  • the dense depth mean value of all acquired dense depth points is calculated, and the dense depth mean value is used as a depth threshold.
  • the method for obtaining the training sample set, wherein determining the degree of image difference between the first sparse depth map and the first dense depth map based on the sparse depth values corresponding to the respective sparse depth points and the depth threshold, to obtain the image difference degree corresponding to each image group, specifically includes:
  • the method for obtaining the training sample set wherein, before determining the degree of image difference between the first dense depth map and the first sparse depth map in each image group, the method further includes:
  • the first sparse depth map and the first dense depth map in each image group of the plurality of image groups are transformed into the same coordinate system.
  • the acquisition method of the training sample set wherein the method also includes:
  • All the obtained first enhanced image groups are added to the training sample set, and the added training sample set is used as the training sample set.
  • the acquisition method of the training sample set wherein the method also includes:
  • the rotation and translation matrix that maps the first sparse depth map in the image group to the coordinate system of the first scene image in the image group, wherein the rotation and translation matrix is the rotation and translation matrix of the distance measuring device relative to the camera in the binocular camera that is used to acquire the first scene image;
  • All the obtained second enhanced image groups are added to the training sample set, and the added training sample set is used as the training sample set.
  • the second aspect of the embodiment of the present application provides a training method for a depth information completion model, characterized in that the training method uses the training sample set obtained by the training sample set acquisition method described above; the training method comprises:
  • the preset network model is trained to obtain a depth information completion model.
  • the third aspect of the embodiment of the present application provides a dense depth map acquisition method, characterized in that the acquisition method applies the depth information completion model obtained by the above-mentioned depth information completion model training method;
  • the acquisition method specifically includes:
  • controlling the distance measuring device to obtain a second sparse depth map of the target scene, and synchronously controlling the camera to obtain a second scene image of the target scene;
  • the fourth aspect of the embodiment of the present application provides a training sample set acquisition device, the acquisition device includes:
  • an acquisition module configured to acquire several image groups, wherein each image group in the several image groups includes a first scene image obtained through a binocular camera, a first dense depth map corresponding to the first scene image, and a first sparse depth map corresponding to the first scene image acquired through the distance measuring device;
  • a determining module configured to respectively determine the image difference between the first dense depth map and the first sparse depth map in each image group
  • the selection module is used to select an image group whose degree of image difference satisfies a preset condition among several image groups, and use the data set formed by the selected image group as a training sample set.
  • the fifth aspect of the embodiment of the present application provides a computer-readable storage medium; the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to realize the steps in the method for obtaining a training sample set as described above, to realize the steps in the method for training the depth information completion model as described above, and/or to realize the steps in the method for obtaining a dense depth map as described above.
  • a terminal device which includes: a processor, a memory, and a communication bus; a computer-readable program that can be executed by the processor is stored on the memory;
  • the communication bus realizes connection and communication between the processor and the memory
  • when the processor executes the computer-readable program, it realizes the steps in the method for obtaining a training sample set as described above, realizes the steps in the method for training the depth information completion model as described above, and/or realizes the steps in the method for obtaining a dense depth map as described above.
  • the present application provides a training sample set acquisition method, a model training method and related devices.
  • the acquisition method includes acquiring several image groups; respectively determining the degree of image difference between the first dense depth map and the first sparse depth map in each image group; and selecting, from the several image groups, the image groups whose image difference degree satisfies a preset condition, and using the data set composed of the selected image groups as a training sample set.
  • the first dense depth map is obtained by a binocular camera, the image difference between the first dense depth map and the first sparse depth map obtained by a distance measuring device is then determined, and the image groups whose image difference satisfies the preset condition are selected as training samples.
  • On one hand, obtaining the dense depth map with the binocular camera improves the reliability of the dense depth map.
  • On the other hand, the sparse depth map obtained by the distance measuring device can be used to filter the dense depth maps, eliminating dense depth maps that deviate greatly from the actual scene; this improves the accuracy of the dense depth map used as the training ground truth, and in turn improves the model performance of the depth information completion model trained on the training sample set.
  • FIG. 1 is a flow chart of the method for obtaining a training sample set provided by the present application.
  • FIG. 2 is a schematic diagram of the positional relationship between the distance measuring device and the binocular camera in the method for obtaining the training sample set provided by the present application.
  • FIG. 3 is a flow chart of step S20 in the method for obtaining a training sample set provided by the present application.
  • FIG. 4 is a flow chart of the training method of the depth information completion model provided by the present application.
  • FIG. 5 is a structural schematic diagram of a preset network model in the method for obtaining a training sample set provided by the present application.
  • FIG. 6 is a flow chart of the method for obtaining a dense depth map provided by the present application.
  • FIG. 7 is a schematic diagram of the structure of the device for obtaining the training sample set provided by the present application.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by the present application.
  • the distance measurement device performs depth measurements to obtain a sparse depth map.
  • the binocular system will be limited by structural jitter, ambient temperature, and surface texture during use.
  • Although the distance measuring device can guarantee the accuracy of the depth measurement results, the resolution of the acquired sparse depth map is limited.
  • The training methods for the depth information completion model are mainly divided into supervised training and unsupervised training. Unsupervised training can relax the requirements on training data to a certain extent, but at the current state of technological development, the measurement accuracy and performance of depth information completion models obtained by unsupervised training are poor and cannot meet the needs of practical applications.
  • Supervised training requires dense depth measurement results as the supervised-training ground truth (GroundTruth), and the depth information completion model is trained through supervised learning, wherein the dense depth measurement used as the training ground truth is generally determined by simulation.
  • an embodiment of the present application provides a method for acquiring a training sample set, the acquisition method includes acquiring several image groups; respectively determining the image difference between the first dense depth map and the first sparse depth map in each image group degree; select an image group whose image difference degree satisfies a preset condition among several image groups, and use the data set formed by the selected image group as a training sample set.
  • the first dense depth map is obtained by a binocular camera, the image difference between the first dense depth map and the first sparse depth map obtained by a distance measuring device is then determined, and the image groups whose image difference satisfies the preset condition are selected as training samples.
  • On one hand, obtaining the dense depth map with the binocular camera improves the reliability of the dense depth map.
  • On the other hand, the sparse depth map obtained by the distance measuring device can be used to filter the dense depth maps, eliminating dense depth maps that deviate greatly from the actual scene; improving the accuracy of the dense depth map used as the training ground truth in this way improves the model performance of the depth information completion model trained on the training sample set.
  • This embodiment provides a method for obtaining a training sample set. As shown in Figure 1, the method includes:
  • each image group in the several image groups includes a first scene image, a first dense depth map, and a first sparse depth map, wherein both the first dense depth map and the first sparse depth map are depth maps corresponding to the first scene image; that is to say, both the first dense depth map and the first sparse depth map are obtained by collecting the scene corresponding to the first scene image.
  • the first dense depth map is obtained through a binocular camera
  • the first sparse depth map is obtained through a distance measurement device
  • the first scene image can be obtained through the left camera or the right camera of the binocular camera
  • the first scene image can be an RGB image or an IR image
  • the binocular camera and the distance measuring device simultaneously collect the collection scene.
  • image group A includes a first scene image a, a first sparse depth map b, and a first dense depth map c; both the first scene image a and the first dense depth map c are obtained through the binocular camera
  • the first sparse depth map b is obtained by the distance measuring device synchronously collecting the collection scene corresponding to the first scene image.
  • In the several image groups, the collection scenes corresponding to the first scene images included in each image group may all be the same; or the collection scenes corresponding to the first scene images of some image groups may be the same while those of other image groups differ; or the collection scenes corresponding to the first scene images of all the image groups may be different.
  • When the collection scenes corresponding to the first scene images included in each image group are the same, the collection times corresponding to the first scene images are different.
  • For example, the image groups include image group A and image group B; image group A includes a first scene image a and image group B includes a first scene image b; the collection scene corresponding to the first scene image a is the same as that corresponding to the first scene image b, and the collection time corresponding to the first scene image a is different from the collection time corresponding to the first scene image b.
  • The binocular camera used to collect the first scene image and the first dense depth map, and the distance measuring device used to collect the first sparse depth map, can be configured on the electronic device that runs the method for obtaining the training sample set provided by this embodiment, and the electronic device directly collects the first scene image, the first dense depth map, and the first sparse depth map so as to obtain the several image groups.
  • the binocular camera and the distance measuring device can be used as a separate collection device, and the separate collection device is connected to the electronic equipment used to run the method for obtaining the training sample set provided in this embodiment, and the collected first scene image , the first dense depth map, and the first sparse depth map are sent to the electronic device, so that the electronic device acquires several image groups.
  • Alternatively, the binocular camera and the distance measuring device can be used as a separate acquisition device; after the first scene image, the first dense depth map, and the first sparse depth map are collected by the separate acquisition device, they are stored in the cloud, so that the electronic device used to run the method for obtaining the training sample set provided by this embodiment can acquire the several image groups through the cloud.
  • several image groups may also be obtained in other ways, for example, through external devices, etc., which will not be described here.
  • the distance measuring device is used for emitting and receiving laser beams to form a first sparse depth map.
  • the left and right cameras in the binocular camera respectively receive the light spots or tracks projected by the laser beam emitted by the distance measuring device and perform high-resolution imaging, and then stereoscopically match the images collected by the left camera with the images collected by the right camera to obtain the first A dense depth map.
  • the image collected by the left camera or the right camera can be used as the first scene image.
  • the electronic equipment used to collect several image groups includes a distance measuring device 11 and a binocular camera 12, the distance measuring device 11 is located between the left and right cameras of the binocular camera 12, and the distance The measuring device 11 and the left and right cameras of the binocular camera 12 are located on the same horizontal line.
  • the distance measuring device is a LiDAR or a depth camera using a fixed-array emission mode; it may be an area-array emission or mechanical scanning LiDAR, or a depth camera based on the time-of-flight principle (including DTOF, ITOF, etc.).
  • the distance measuring device may include a transmitter, a collector, and a control and processing circuit, and the transmitter includes a light source and an optical emission element, preferably, a beam splitting element and the like.
  • the light source is used to emit laser beams.
  • the light source can be a single light source or a light source array composed of multiple light sources.
  • the light source array can be configured with several sub-light source arrays so that the light source array can emit light in groups.
  • the emitter when the emitter is controlled by the control and processing circuit to emit laser beams, only one sub-light source array or only one light source in each sub-light source array can be turned on at a time to generate a fixed point array for projection on the target surface.
  • the light source is configured as a VCSEL (Vertical-Cavity Surface-Emitting Laser) array light source, which emits in arrays through column addressing or binary addressing; the emitted beams pass through an emitting optical element composed of one or more lenses, are modulated, and are projected onto the target surface in the form of a fixed point array.
  • the light source can use EEL (Edge-emitting Laser, edge emitting laser) or VCSEL to emit spot beams
  • the emitting optical elements include a collimating lens and a beam splitting element: the spot beams are optically collimated after passing through the collimating lens, and the beam splitting element splits the beams, likewise producing a fixed point array that is projected onto the surface of the object.
  • the beam splitting element can be a diffractive optical element (Diffractive Optical Element, DOE), a microlens array, etc.
  • the collector may include a pixel unit composed of at least one pixel, a filter unit, and a receiving optical element; the receiving optical element images the laser beam reflected by the target onto the pixel array, the filter unit is used to filter out background light and stray light, and each pixel can be one of photodetectors such as APD, SiPM, SPAD, CCD, and CMOS.
  • the pixel unit may be an image sensor for light time-of-flight measurement, and the pixel unit may also be integrated into a photosensitive chip for light time-of-flight measurement.
  • a pixel unit includes a plurality of SPADs that can respond to an incident single photon and output a photon signal indicating the corresponding arrival time of the received photon at each SPAD.
  • the collector also includes a readout circuit composed of one or more of devices such as a signal amplifier connected to the pixel unit, a time-to-digital converter (TDC), and an analog-to-digital converter (ADC); these circuits can be integrated with the pixels as part of the collector, or be part of the control and processing circuitry.
  • the control and processing circuit can be an independent dedicated circuit, such as an independent circuit with computing power belonging to the depth camera itself; it can also include a general processing circuit, for example, when the depth camera is integrated into a smart terminal such as a mobile phone, TV, or computer, the processor in the terminal can perform the functions of the control and processing circuit.
  • the control and processing circuit simultaneously controls the emitter and the collector, and calculates the depth of the target based on the time difference or phase difference between the emitted beam and the reflected beam.
  • the indirect time-of-flight (ITOF) method can also be used, which solves for the time of flight from the phase difference between the transmitted waveform and the received waveform.
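For illustration only, the following is a minimal sketch (not part of the original disclosure) of how depth would be computed in the two time-of-flight variants mentioned above: directly from the round-trip time (DTOF), or from the phase difference at an assumed modulation frequency (ITOF).

```python
import math

C = 299_792_458.0  # speed of light in m/s

def dtof_depth(round_trip_time_s: float) -> float:
    """Direct ToF: depth from the time difference between the emitted and reflected beam."""
    return C * round_trip_time_s / 2.0

def itof_depth(phase_diff_rad: float, modulation_freq_hz: float) -> float:
    """Indirect ToF: depth from the phase difference between transmitted and received waveforms."""
    return C * phase_diff_rad / (4.0 * math.pi * modulation_freq_hz)

# Example (illustrative values): a 10 ns round trip, and a 20 MHz signal with a pi/2 phase shift
print(dtof_depth(10e-9))              # ~1.5 m
print(itof_depth(math.pi / 2, 20e6))  # ~1.87 m
```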
  • the distance measuring device is a mechanical scanning LiDAR, and the distance measuring device further includes a mechanical scanning unit, which may be a vibrating mirror, a mirror, MEMS, a wedge mirror, a rotating motor, and the like.
  • the light source is configured as a point light source or a column light source
  • the pixel unit is configured as a single pixel or a pixel column
  • the scanning unit includes a rotating motor or the like, and the transceiver system performs 360-degree scanning around the rotation axis.
  • the light source also emits in a fixed point array form, and scans and images the surrounding environment with the rotation of the entire transceiver system.
  • the light source is configured as a point light source or a column light source, which also produces a fixed point array to emit light.
  • the dot matrix beam emitted by the light source is projected onto the surface of the object through the mechanical scanning unit.
  • the mechanical scanning unit includes MEMS mirrors, vibrating mirrors, etc., which are used to receive the laser beam emitted by the light source, deflect it and project it onto the surface of the object to form a fixed point array.
  • the binocular camera includes a left camera and a right camera, both of which are high-resolution imaging cameras.
  • the first image can be collected by the left camera while the right camera is controlled synchronously to collect the second image; the dense disparity map of the first image and the second image is determined through a stereo matching algorithm, the first dense depth map is determined based on the internal and external parameters of the binocular camera and the dense disparity map, and the first image or the second image is then used as the first scene image, so as to obtain the first scene image and the first dense depth map.
  • the laser beam sent by the distance measurement device is an infrared laser beam.
  • the left and right cameras of the binocular camera are both infrared cameras, so that the distance measuring device and the binocular camera can acquire the first sparse depth map, the first scene image, and the first dense depth map under dark-light conditions.
  • the process of determining the first dense depth map from the first image and the second image can be: using the geometric constraints of the binocular camera, perform de-distortion and epipolar rectification on the first image and the second image; then perform a pixel-by-pixel search and compute the matching cost based on the first image and the second image, optimize the matching cost using neighboring pixel information, and calculate the disparity of each pixel based on the optimized matching cost to obtain a disparity map; finally, perform hole filling, filtering, and denoising on the disparity map to obtain the first dense depth map.
  • the stereo matching algorithm used to determine the dense disparity map can be any binocular matching algorithm, for example, SAD (Sum of absolute differences) algorithm and SGBM (Stereo Processing by Semiglobal Matching and Mutual Information) algorithm, etc.
  • a stereo matching model based on deep learning can also be used: after the first image and the second image are acquired, they are used as the input of the stereo matching model, and the first dense depth map is output by the stereo matching model.
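As a sketch only (the patent does not prescribe an implementation), the classical route described above can be reproduced with OpenCV's SGBM matcher, one of the algorithms named in the text; the matcher parameters below, and the conversion Z = f*B/d from disparity to depth, are illustrative assumptions for an already rectified image pair.

```python
import cv2
import numpy as np

def dense_depth_from_stereo(left_img, right_img, focal_px, baseline_m):
    """Compute a dense depth map from a rectified, undistorted stereo pair (SGBM sketch)."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,        # must be divisible by 16; illustrative value
        blockSize=5,
        P1=8 * 5 * 5,
        P2=32 * 5 * 5,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    disparity = matcher.compute(left_img, right_img).astype(np.float32) / 16.0  # SGBM output is fixed-point
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]  # Z = f * B / d
    return depth
```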
  • the binocular camera is time-synchronized with the distance measuring device, and the binocular camera can clearly image the point-by-point or array light spots projected by the transmitter under the near-infrared band emitted by the distance measuring device.
  • the first sparse depth map and the first dense depth map in the image group are obtained from the same scene information collected by the binocular camera and the distance measuring device at a relatively fixed position at the same time.
  • the binocular camera can be triggered based on the instruction of the distance measuring device to emit a laser beam, or the binocular camera can be triggered based on the laser signal collected by the distance measuring device; for example, when acquiring the image group, it can be determined whether the pixel unit of the distance measuring device has received the laser light, and if so, an instruction is triggered to the binocular camera so that the binocular camera captures images.
  • it may also be implemented in other manners, which are not specifically limited here.
  • the image difference degree is used to reflect the degree of deviation between the first dense depth map and the first sparse depth map, wherein the greater the image difference degree, the greater the degree of deviation between the first dense depth map and the first sparse depth map.
  • the smaller the image difference the smaller the deviation between the first dense depth map and the first sparse depth map.
  • For example, image group A includes a first dense depth map a and a first sparse depth map b
  • image group B includes a first dense depth map c and a first sparse depth map d
  • the degree of image difference between the first dense depth map a and the first sparse depth map b is d1, and the degree of image difference between the first dense depth map c and the first sparse depth map d is d2.
  • If d1 is greater than d2, it indicates that the degree of deviation between the first dense depth map a and the first sparse depth map b is greater than the degree of deviation between the first dense depth map c and the first sparse depth map d.
  • the method before determining the degree of image difference between the first dense depth map and the first sparse depth map in each image group in several image groups, the method also includes:
  • the first sparse depth map and the first dense depth map in each image group of the plurality of image groups are transformed into the same coordinate system.
  • the coordinate system may be the coordinate system where the first sparse depth map is located, or the coordinate system where the first dense depth map is located, or the like.
  • The coordinate system is the coordinate system where the first dense depth map is located, so when converting the first sparse depth map and the first dense depth map into the same coordinate system, it is only necessary to map the first sparse depth map into the coordinate system of the first dense depth map through a first rotation and translation matrix; this simplifies the implementation process of converting the first sparse depth map and the first dense depth map into the same coordinate system, with a small amount of calculation.
  • the rotation-translation matrix is an extrinsic parameter between the distance measurement device and the binocular camera, which can be determined through pre-calibration.
  • the determining the degree of image difference between the first dense depth map and the first sparse depth map in each of several image groups specifically includes:
  • the first sparse depth map and the first dense depth map are located in the same coordinate system, and the first sparse depth map and the first dense depth map are at least partially overlapped.
  • When the first sparse depth map and the first dense depth map are used, as shown in FIG. 2, the positions of the distance measuring device 11 and the binocular camera 12 are relatively fixed, and the field of view (FOV) 1 of the distance measuring device 11 overlaps the field of view (FOV) 2 of the binocular camera 12.
  • When the distance measuring device 11 determines the first sparse depth map, the light source configured in it emits an area-array spot beam toward the calibration plate, and the collector configured in it collects the reflected area-array spot beam, with each spot corresponding to a depth value; the binocular camera 12 also images the area-array spot beam, the left camera collects one image and the right camera collects another, and the first dense depth map is then determined from the two images through a stereo matching algorithm.
  • There may be an overlapping area between the field of view (FOV) 1 of the distance measuring device 11 and the field of view (FOV) 2 of the binocular camera 12, so that there is an overlapping area between the first sparse depth map determined by the distance measuring device and the first dense depth map determined by the binocular camera.
  • That is, the field of view (FOV) of the distance measuring device and the field of view (FOV) of the binocular camera partially overlap, and the corresponding first sparse depth map and first dense depth map also partially overlap.
  • the first sparse depth map and the first dense depth map can be projected into a unified coordinate system to generate overlapping regions.
  • the positions of the distance measuring device and the binocular camera are relatively fixed, and the pixel resolution of the collector in the distance measuring device and the pixel resolution of the binocular camera may be the same, for example, both are 640*480.
  • the resolutions of the first sparse depth map and the first dense depth map are different.
  • the first sparse depth map and the first dense depth map are transformed into the same coordinate system; at this time, the first sparse depth map and the first dense depth map at least partially overlap, that is, depth points in the first sparse depth map coincide with depth points in the first dense depth map.
  • the overlapping area may be a square area, a rectangular area, etc., and the overlapping area may be a part of the image area in the first sparse depth map, or all the image areas in the first sparse depth map. That is to say, the first sparse depth map may overlap part of the image area with the first dense depth map, or all image areas may overlap with the first dense depth map.
  • the image regions corresponding to each sparse depth point in the projected first sparse depth map are selected in the first dense depth map, and based on all the selected image regions Determining the depth threshold specifically includes:
  • For each sparse depth point in the projected first sparse depth map select an image region corresponding to the sparse depth point in the first dense depth map, wherein the image region includes the sparse depth point;
  • the dense depth mean value of all acquired dense depth points is calculated, and the dense depth mean value is used as a depth threshold.
  • each sparse depth point selected in the projected first sparse depth map is a sparse depth point in the overlapping area
  • the image region corresponding to each sparse depth point is included in the first dense depth map, and the sparse depth point is located within that image region.
  • the size of the image region corresponding to each sparse depth point may be the same, and all of them are much smaller than the image size of the first dense depth map.
  • the image size of the first dense depth map is 640*480
  • the area size of the image region is 3*3.
  • a neighborhood interval of a preset length can be selected centered on the sparse depth point; for example, centered on the sparse depth point, a neighborhood interval with a side length of 3 is selected, that is, a 3*3 neighborhood interval.
  • the dense depth value of each selected dense depth point is obtained, and the average value of all the obtained dense depth values is calculated to obtain the depth threshold, where the depth mean value can be calculated as: $\bar{d} = \frac{1}{N}\sum_{j=1}^{N} d_j$, where $d_j$ is the dense depth value of the j-th acquired dense depth point in the image regions, and $N$ is the number of all the dense depth points obtained.
  • the average value of the dense depth values of all dense depth points in the image regions corresponding to the projected sparse depth points in the overlapping area is used as the depth threshold, which can reflect the overall deviation between the first sparse depth map and the first dense depth map, thereby improving the accuracy of the subsequent screening of the first dense depth map.
  • the depth thresholds corresponding to the sparse depth points in the first sparse depth map may be different, or the depth thresholds corresponding to some sparse depth points are the same, and the depth thresholds corresponding to some sparse depth points are different.
  • For example, after projecting the first sparse depth map onto the first dense depth map, for each sparse depth point a neighborhood region of a preset length is selected around that sparse depth point, and the average of the dense depth values of all dense depth points in that neighborhood is used as the depth threshold corresponding to that sparse depth point, as illustrated in the sketch below.
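The following is a minimal sketch, under the assumption that the sparse depth points have already been projected into the dense map's pixel coordinates, of the neighborhood-mean depth threshold described above (a global threshold over all 3x3 windows; a per-point variant would simply return one mean per window).

```python
import numpy as np

def depth_threshold(dense_depth: np.ndarray, sparse_pixels, half_win: int = 1) -> float:
    """Global depth threshold: mean of the dense depth values inside the neighbourhood
    (3x3 when half_win=1) around every projected sparse depth point."""
    h, w = dense_depth.shape
    collected = []
    for u, v in sparse_pixels:                        # (u, v) = column, row of a projected sparse point
        u0, u1 = max(0, u - half_win), min(w, u + half_win + 1)
        v0, v1 = max(0, v - half_win), min(h, v + half_win + 1)
        region = dense_depth[v0:v1, u0:u1]
        collected.append(region[region > 0])          # keep valid dense depth points only
    all_depths = np.concatenate(collected)
    return float(all_depths.mean())                   # bar(d) = (1/N) * sum_j d_j
```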
  • the image difference between the first sparse depth map and the first dense depth map can be determined based on the depth threshold and the sparse depth values corresponding to each sparse depth point in the first sparse depth map, wherein the image difference degree can be the sum of the differences between the sparse depth value of each sparse depth point and the depth threshold, or the proportion of sparse depth points whose difference between the sparse depth value and the depth threshold is greater than a preset threshold, or the average of the differences between the sparse depth value of each sparse depth point and the depth threshold, and so on.
  • the image difference between the first sparse depth map and the first dense depth map is determined based on the sparse depth values corresponding to the respective sparse depth points and the depth threshold, In order to obtain the corresponding image difference degree of each image group, it specifically includes:
  • the depth deviation value is used to reflect the deviation between the sparse depth value of the sparse depth point and the depth threshold, wherein the depth deviation value is equal to the absolute value of the difference between the sparse depth value of the sparse depth point and the depth threshold. For example, if the depth threshold is A1 and the sparse depth value of the sparse depth point is A2, then the depth deviation value is |A1 - A2|.
  • the depth offset value may be determined in other ways, for example, the depth offset value is calculated as the square root of the square difference between the sparse depth value of the sparse depth point and the depth threshold, and the like.
  • the image difference degree between the first sparse depth map and the first dense depth map is calculated based on the obtained depth deviation values, wherein the image difference degree may be equal to the sum of all depth deviation values, or to the arithmetic square root of the sum of the squares of all depth deviation values.
  • the calculation formula of the image difference degree can be: $\Delta d = \lVert sd - \bar{d} \rVert_{\rho}$, where $\Delta d$ represents the degree of image difference, $sd$ represents the first sparse depth map, $\bar{d}$ represents the depth threshold, and $\lVert \cdot \rVert_{\rho}$ represents the $\rho$-norm, which is used to measure the deviation of the first sparse depth map from the depth threshold.
  • The value of $\rho$ can be 1, in which case the image difference degree is the sum of the absolute values of the differences between the sparse depth value of each sparse depth point in the first sparse depth map and the depth threshold: $\Delta d = \sum_{i} \lvert sd_i - \bar{d} \rvert$, where $sd_i$ represents the sparse depth value of the i-th sparse depth point.
  • The value of $\rho$ may also be 2, in which case the degree of image difference is the arithmetic square root of the sum of the squares of the differences between the sparse depth value of each sparse depth point in the first sparse depth map and the depth threshold: $\Delta d = \sqrt{\sum_{i} (sd_i - \bar{d})^2}$, where $sd_i$ represents the sparse depth value of the i-th sparse depth point. Both variants are illustrated in the sketch below.
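A minimal sketch of the rho-norm based image difference degree defined above, assuming the depth threshold has already been computed as in the previous sketch.

```python
import numpy as np

def image_difference(sparse_depths, depth_thresh: float, rho: int = 1) -> float:
    """Image difference degree: rho-norm of the deviation of each sparse depth value from
    the depth threshold (rho=1: sum of absolute deviations; rho=2: root of sum of squares)."""
    deviation = np.abs(np.asarray(sparse_depths, dtype=np.float64) - depth_thresh)
    if rho == 1:
        return float(deviation.sum())
    if rho == 2:
        return float(np.sqrt((deviation ** 2).sum()))
    return float((deviation ** rho).sum() ** (1.0 / rho))
```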
  • the image difference between the first sparse depth map and the first dense depth map is determined based on the sparse depth values corresponding to the respective sparse depth points and the depth threshold, In order to obtain the corresponding image difference degree of each image group, it specifically includes:
  • the preset threshold is preset and serves as the basis for judging the depth deviation value.
  • When the depth deviation value is greater than the preset threshold, it indicates that the depth deviation value does not meet the requirement, that is, the sparse depth point corresponding to the depth deviation value is a sparse depth point that does not meet the requirement; conversely, when the depth deviation value is less than or equal to the preset threshold, it indicates that the depth deviation value meets the requirement, that is, the sparse depth point corresponding to the depth deviation value is a sparse depth point that meets the requirement.
  • the ratio value is the ratio of all sparse depth points that do not meet the requirement to all sparse depth points included in the first sparse depth map. That is, after the depth deviation value corresponding to each sparse depth point is obtained, each depth deviation value is compared with the preset threshold to select the sparse depth points whose depth deviation value is greater than the preset threshold, and the number of selected sparse depth points is then compared with the number of all sparse depth points in the first sparse depth map to obtain the ratio value.
  • For example, if the number of sparse depth points included in the first sparse depth map is a2, and the number of sparse depth points in the first sparse depth map whose depth deviation value is greater than the preset threshold is a1 (a1 is less than or equal to a2), then the ratio of the sparse depth points whose depth deviation value is greater than the preset threshold to all sparse depth points in the first sparse depth map is a1/a2, as illustrated in the sketch below.
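A minimal sketch of the ratio-based image difference degree a1/a2 described above; the preset threshold is an assumed parameter.

```python
import numpy as np

def outlier_ratio(sparse_depths, depth_thresh: float, preset_thresh: float) -> float:
    """Ratio of sparse depth points whose depth deviation value exceeds the preset
    threshold (a1) to all sparse depth points in the first sparse depth map (a2)."""
    deviation = np.abs(np.asarray(sparse_depths, dtype=np.float64) - depth_thresh)
    a1 = int(np.count_nonzero(deviation > preset_thresh))
    a2 = int(deviation.size)
    return a1 / a2
```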
  • the preset condition is preset and is a measure of the image difference degree. When the image difference degree satisfies the preset condition, it indicates that the degree of image deviation between the first sparse depth map and the first dense depth map in the image group meets the requirement, and the image group can be used as a training sample; conversely, when the image difference degree does not satisfy the preset condition, it indicates that the degree of image deviation between the first sparse depth map and the first dense depth map in the image group does not meet the requirement, and the image group cannot be used as a training sample.
  • image groups include image group A, image group B, and image group C, wherein the image difference degrees corresponding to image group A and image group B meet the preset conditions, and the image difference degrees corresponding to image group C do not meet the preset conditions. condition, then image group A and image group B can be used as training samples, and image group C cannot be used as a training sample, so the training sample set includes image group A and image group B.
  • the preset condition may be determined based on the manner in which the image difference degree is determined. For example, when the image difference degree is calculated from the acquired depth deviation values, the preset condition may be that the image difference degree is less than a deviation threshold: when the image difference degree is less than the deviation threshold, the image difference degree satisfies the preset condition; otherwise, when the image difference degree is greater than or equal to the deviation threshold, the image difference degree does not satisfy the preset condition. For another example, when the image difference degree is the ratio of the sparse depth points whose depth deviation value is greater than the preset threshold to all sparse depth points in the first sparse depth map, the preset condition may be that the ratio value is less than a preset ratio threshold: when the ratio value is less than the preset ratio threshold, the image difference degree satisfies the preset condition; otherwise, when the ratio value is greater than or equal to the preset ratio threshold, the image difference degree does not satisfy the preset condition. A sketch of this selection step follows.
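A minimal sketch of the selection step: image groups whose image difference degree satisfies the preset condition are kept as training samples. The dictionary keys and the simple "less than threshold" condition are assumptions for illustration.

```python
def build_training_sample_set(image_groups, difference_fn, threshold):
    """Keep only the image groups whose image difference degree satisfies the preset
    condition (here assumed to be: difference degree below a deviation threshold)."""
    training_sample_set = []
    for group in image_groups:                                     # group: dict holding the three maps
        diff = difference_fn(group["sparse_depth"], group["dense_depth"])
        if diff < threshold:                                       # preset condition satisfied
            training_sample_set.append(group)
    return training_sample_set
```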
  • the method for obtaining the training sample set may also include:
  • All the obtained first enhanced image groups are added to the training sample set, and the added training sample set is used as the training sample set.
  • the sparse depth values in the first sparse depth map collected by the distance measuring device are easily affected by ambient light, the circuit's own noise, thermal noise, and so on, resulting in a certain range of jitter error in the measurement results; therefore, an enhanced depth map containing enhanced depth values, obtained by adding such jitter error to the sparse depth values, is input into the depth completion model for training, to improve the adaptability and robustness of the model to measurement errors.
  • the first number may be preset, or determined based on the number of training samples included in the training sample set, and the first number is less than or equal to the number of training samples included in the training sample set, for example, The first number is half of the number of training samples included in the training sample set, and for another example, the first number is equal to the number of training samples included in the training sample set. In a typical implementation manner, the first number is equal to the number of training samples included in the training sample set, so that each training sample in the training sample set can be enhanced.
  • a number of sparse depth points are selected in the first sparse depth map, wherein the number of selected sparse depth points is less than or equal to the number of all sparse depth points in the first sparse depth map. For example, if the number of all sparse depth points in the first sparse depth map is 100, then the number of selected sparse depth points is less than or equal to 100.
  • the enhanced sparse depth map corresponding to the first sparse depth map is generated by adjusting the sparse depth values of some or all of the sparse depth points in the first sparse depth map, and the enhanced sparse depth map together with the first scene image and the first dense depth map corresponding to the first sparse depth map is then added to the training sample set as a first enhanced image group. In this way, the training sample set includes both the image group composed of the first sparse depth map, its corresponding first scene image, and the first dense depth map, and the first enhanced image group composed of the enhanced sparse depth map, the same first scene image, and the same first dense depth map, thereby enriching the training data in the training sample set.
  • when the enhanced sparse depth map is determined by adjusting the sparse depth values of the sparse depth points in the first sparse depth map in the image group, multiple enhanced sparse depth maps can be determined by adjusting different sparse depth points, or by using different adjustment values. It can be understood that many factors affect the jitter error in the distance measuring device, so the value of the jitter error is not unique; it can be determined by experimental measurement or calculated by theoretical derivation, and is not specifically limited in the present invention.
  • each sparse depth point in the first sparse depth map may be adjusted in turn to obtain multiple enhanced sparse depth maps.
  • alternatively, when the enhanced sparse depth map is determined by adjusting the sparse depth values of the sparse depth points in the first sparse depth map in the image group, multiple different adjustment values can be used to adjust the sparse depth values of the sparse depth points in the first sparse depth map, so as to obtain multiple enhanced sparse depth maps, and so on.
  • adjusting the sparse depth values of several sparse depth points in the first sparse depth map in the image group may be done by adding a preset adjustment value to the sparse depth value of each of the several sparse depth points, or by adding to the sparse depth value of each sparse depth point a depth adjustment value determined for that point, wherein the depth adjustment value corresponding to each sparse depth point can be determined from its sparse depth value; for example, the depth adjustment value corresponding to each sparse depth point can be one percent of the sparse depth value of that sparse depth point.
  • the measurement-error enhancement is performed on the first sparse depth map by adding an adjustment value to the sparse depth values of the sparse depth points (see the sketch below), so that the training sample set can include training samples with measurement errors; the depth information completion model trained on this training sample set is therefore adaptable and robust to the measurement errors of the distance measuring device, and the trained depth information completion model can be applied to sparse depth maps collected by the distance measuring device in application scenarios with noise such as ambient light noise, circuit thermal noise, and circuit readout noise.
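A minimal sketch of the measurement-error enhancement described above: a random subset of valid sparse depth points receives a small jitter, here assumed to be a relative jitter of up to one percent of each depth value (the patent leaves the jitter magnitude open).

```python
import numpy as np

def jitter_sparse_depth(sparse_depth: np.ndarray, num_points=None,
                        rel_jitter: float = 0.01, rng=None) -> np.ndarray:
    """Return an enhanced sparse depth map with jitter added to selected sparse depth points."""
    rng = rng or np.random.default_rng()
    enhanced = sparse_depth.astype(np.float64).copy()
    valid_idx = np.flatnonzero(enhanced > 0)                       # indices of valid sparse points
    if num_points is None or num_points > valid_idx.size:
        num_points = valid_idx.size
    chosen = rng.choice(valid_idx, size=num_points, replace=False)
    jitter = rng.uniform(-rel_jitter, rel_jitter, size=num_points) * enhanced.flat[chosen]
    enhanced.flat[chosen] += jitter                                # adjusted sparse depth values
    return enhanced
```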
  • the input data of the preset network model are the first sparse depth map and the first scene image, and the first sparse depth map collected by the distance measuring device and the first scene image are mapped into the same coordinate system for alignment and registration.
  • the rotation and translation matrix is the external parameter of the distance measurement device and one camera in the binocular camera.
  • the external parameters of the system are determined and calibrated through offline production calibration or online real-time calibration. However, there will be certain errors in the calibration results themselves.
  • the camera will have structural jitter, which will cause spatial registration errors between the sparse depth map and the scene image mapped to the same coordinate system based on the external parameters of the calibration.
  • therefore, noise and jitter are added during the alignment and registration of the first sparse depth map and the first scene image, so that the training sample set carries training samples with spatial registration errors, which in turn improves the robustness of the depth information completion model to spatial registration errors.
  • the method for obtaining the training sample set may also include:
  • the rotation and translation matrix that maps the first sparse depth map in the image group to the coordinate system of the first scene image in the image group, wherein the rotation and translation matrix is the rotation and translation matrix of the distance measuring device relative to the camera in the binocular camera that is used to acquire the first scene image;
  • All the obtained second enhanced image groups are added to the training sample set, and the added training sample set is used as the training sample set.
  • the second number may be preset, or determined based on the number of training samples included in the training sample set, and the second number is less than or equal to the number of training samples included in the training sample set, for example, The second number is half of the number of training samples included in the training sample set, and for another example, the second number is equal to the number of training samples included in the training sample set.
  • the second number is equal to the number of training samples included in the training sample set, so that each training sample in the training sample set can be enhanced.
  • the rotation and translation matrix is used to map the first sparse depth map to the coordinate system of the first scene image.
  • the first scene image is captured by a camera in the binocular camera, then the distance measurement device and the camera are used as a fusion system, and the calibration algorithm is used to calibrate the external parameters of the distance measurement device and the camera to obtain A rotation-translation matrix between the distance measurement device and the target camera, through which the first sparse depth map can be mapped to the coordinate system of the first scene image.
  • the distance measuring device measures a certain three-dimensional space point P_w(X_w, Y_w, Z_w) to obtain the depth point corresponding to that three-dimensional space point in the first sparse depth map and the corresponding depth value, and the projection relationship from the three-dimensional space point to the coordinate system of the distance measuring device is thus constructed.
  • Let the rotation and translation matrix between the distance measuring device and the target camera be $[R \; t]$; then the projection of the three-dimensional space point $P_w(X_w, Y_w, Z_w)$ onto the target camera coordinate system can be expressed as:
$$ Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [R \; t] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} $$
  • where $(u, v)$ are the pixel coordinates and $Z_c$ the depth in the target camera coordinate system, K is the intrinsic parameter matrix of the target camera, R is the rotation matrix from the coordinate system of the distance measuring device to the coordinate system of the target camera, and t is the translation matrix from the coordinate system of the distance measuring device to the coordinate system of the target camera.
  • the translation matrix includes three degrees of freedom x, y, and z, and the translation distance along the three degrees of freedom (x, y, z) can be expressed as (t 1 , t 2 , t 3 ).
  • the rotation matrix includes three degrees of freedom x, y, and z.
  • the rotation about the three degrees of freedom (x, y, z) can be represented by the Euler angles $\theta(\theta_x, \theta_y, \theta_z)$, with the corresponding elementary rotations expressed as:
$$ R_x(\theta_x) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y(\theta_y) = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z(\theta_z) = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix} $$
  • the rotation matrix R can then be expressed as: $R = R_z(\theta_z)\, R_y(\theta_y)\, R_x(\theta_x)$, as also used in the sketch below.
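A minimal sketch, under the usual pinhole-model assumptions, of composing the rotation matrix from the three Euler angles and projecting the 3D points measured by the distance measuring device into the target camera image with K[R t]; none of this code appears in the patent itself.

```python
import numpy as np

def rotation_from_euler(theta_x: float, theta_y: float, theta_z: float) -> np.ndarray:
    """Rotation matrix R = Rz(theta_z) @ Ry(theta_y) @ Rx(theta_x) from the Euler angles
    of the three degrees of freedom."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project_points(points_w: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project Nx3 points P_w from the distance measuring device's coordinate system into
    the target camera image: s * [u, v, 1]^T = K (R P_w + t)."""
    cam = R @ points_w.T + t.reshape(3, 1)       # points in the target camera coordinate system
    uvw = K @ cam
    return (uvw[:2] / uvw[2]).T                  # pixel coordinates (u, v)
```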
  • the Euler angles corresponding to the three degrees of freedom in the rotation matrix can be adjusted, and the translation distances of the three degrees of freedom in the translation matrix can also be adjusted, wherein,
  • the adjustment process can be to adjust one parameter among the Euler angles of the three degrees of freedom and the translation distances of the three degrees of freedom, or to adjust two or more of these parameters; or only the Euler angles of the three degrees of freedom are adjusted; or only the translation distances of the three degrees of freedom are adjusted, and so on.
  • the specific adjustment method can be further adjusted according to actual training requirements. All methods of adjusting the training sample set by adjusting the Euler angles of the three degrees of freedom and the translation distances of the three degrees of freedom belong to the protection scope of the present application.
  • the upper limits of the adjustment values of the Euler angles of the three degrees of freedom of the rotation matrix and of the translation distances of the three degrees of freedom of the translation matrix can be determined according to the structural constraints of the distance measuring device and the camera; for example, these upper limits can be equal to the maximum structural jitter of the distance measuring device and the camera in each degree of freedom, or be determined from that maximum and kept below it, and so on.
  • the Euler angles of three degrees of freedom are ⁇ ( ⁇ x , ⁇ y , ⁇ z ), and the adjusted Euler angles can be expressed as ⁇ ( ⁇ x ⁇ x , ⁇ y ⁇ y , ⁇ z ⁇ z ), where ⁇ x , ⁇ y , and ⁇ z are the adjustment values of the Euler angles of the three degrees of freedom x, y, and z, respectively, and the value range of ⁇ x is all Euler angles from 0 to x degrees of freedom Adjust the maximum value, the value range of ⁇ y is the maximum value of Euler angle adjustment from 0 to y degree of freedom, and the value range of ⁇ z is the maximum value of Euler angle adjustment from 0 to z degree of freedom.
  • the translational distance of three degrees of freedom can be expressed as (t 1 , t 2 , t 3 ), and the adjusted translational distance can be expressed as (t 1 ⁇ t 1 , t 2 ⁇ t 2 , t 3 ⁇ t 3 ), where , ⁇ t 1 , ⁇ t 2 , ⁇ t 3 are the adjustment values of the translation distance of the three degrees of freedom x, y, and z respectively, and the value range of ⁇ t 1 is the maximum value of the translation distance adjustment from 0 to the x degree of freedom, ⁇ t 2
  • the value range of ⁇ t 3 is the maximum value of translation distance adjustment from 0 to y degree of freedom, and the value range of ⁇ t 3 is the maximum value of translation distance adjustment from 0 to z degree of freedom.
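One possible way to apply the bounded jitter described above is to draw the Euler-angle and translation offsets uniformly within their per-axis upper limits and then rebuild the extrinsics before re-projecting the first sparse depth map. The limits below are assumed placeholder values; in practice they would come from the structural constraints of the ranging device and the camera.

```python
import numpy as np

# Assumed per-axis upper limits for the jitter (radians / metres); placeholder values only.
MAX_DTHETA = np.array([0.002, 0.002, 0.002])
MAX_DT = np.array([0.003, 0.003, 0.003])

def jitter_extrinsics(theta, t, rng=None):
    """Return (theta', t') with per-axis offsets drawn within the allowed bounds."""
    rng = rng or np.random.default_rng()
    d_theta = rng.uniform(-MAX_DTHETA, MAX_DTHETA)
    d_t = rng.uniform(-MAX_DT, MAX_DT)
    return theta + d_theta, t + d_t

theta_adj, t_adj = jitter_extrinsics(np.array([0.0, 0.01, 0.0]),
                                     np.array([0.05, 0.0, 0.0]))
# The adjusted rotation matrix is rebuilt from theta_adj (e.g. with the helper in the
# previous sketch) and the first sparse depth map is re-mapped with the adjusted
# extrinsics to form the second enhanced image group.
```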
  • when the first sparse depth map and the first scene image are input into the preset network model for training, they need to be projected into the same coordinate system according to the system extrinsic parameters (the rotation-translation matrix) before being used for model training.
  • in order to effectively account for errors caused by jitter between the distance measurement device and the target camera, error-jitter enhancement is performed on the rotation-translation matrix, and the first sparse depth map is mapped to the coordinate system of the first scene image based on the adjusted matrix to obtain the second enhanced image group corresponding to the image group; the model is then trained with the second enhanced image group together with the original image group, which enriches the training data in the training sample set and also improves the robustness of the model for depth estimation.
  • when the training sample set is enhanced, it can be enhanced only by adjusting the sparse depth values of the first sparse depth map, or only by adjusting the rotation-translation matrix, or by doing both. That is, the training sample set can first be enhanced by adjusting the sparse depth values of the first sparse depth map and the enhanced set can then be further enhanced by adjusting the rotation-translation matrix; or the training sample set can first be enhanced by adjusting the rotation-translation matrix and the enhanced set can then be further enhanced by adjusting the sparse depth values of the first sparse depth map, and so on. A sketch of the depth-value jitter step is given below.
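As a concrete illustration of the first enhancement path, the sketch below perturbs a random subset of valid sparse depth values to simulate measurement jitter; the fraction of perturbed points and the relative jitter amplitude are assumed example values, not parameters fixed by the application. Chaining this with the extrinsic jitter from the previous sketch, in either order, yields further enhanced image groups.

```python
import numpy as np

def jitter_sparse_depth(sparse_depth, fraction=0.5, rel_noise=0.01, rng=None):
    """Add measurement-error jitter to a random subset of valid sparse depth points.

    sparse_depth: H x W array with 0 at empty positions; rel_noise is a relative
    jitter amplitude, applied as d * (1 + U(-rel_noise, rel_noise)).
    """
    rng = rng or np.random.default_rng()
    out = sparse_depth.copy()
    rows, cols = np.nonzero(sparse_depth)
    if rows.size == 0:
        return out
    idx = rng.choice(rows.size, size=max(1, int(fraction * rows.size)), replace=False)
    noise = rng.uniform(-rel_noise, rel_noise, size=idx.size)
    out[rows[idx], cols[idx]] *= (1.0 + noise)
    return out

# A first enhanced image group keeps the scene image and dense depth map unchanged and
# replaces the sparse depth map with jitter_sparse_depth(sparse_depth).
```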
  • in summary, several image groups are acquired through a binocular camera and a distance measuring device, and the image difference degrees between the first dense depth map and the first sparse depth map in each image group are then determined; finally, among the several image groups, the image groups whose image difference degrees satisfy the preset condition are selected, and the data set formed by the selected image groups is used as the training sample set.
  • the training samples in the training sample set determined in this embodiment use the first dense depth map obtained by the binocular camera as the training ground truth, which ensures the reliability of the ground truth; moreover, the image difference degree between the first dense depth map and the first sparse depth map obtained by the distance measuring device satisfies the preset condition, which ensures that the first dense depth map matches the captured scene of the first scene image, so that when the depth information completion model is trained in a supervised manner on the training sample set provided by this embodiment, the model performance of the trained depth information completion model can be improved. A sketch of the selection rule is given below.
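One possible realization of the selection rule, sketched under assumptions: the sparse points are already projected into the dense map, the neighbourhood window is 3x3, and the image difference degree is taken as the fraction of sparse points whose deviation from the depth threshold exceeds a preset value. The dictionary keys, window size, and thresholds are illustrative choices, not names fixed by the application.

```python
import numpy as np

def image_difference_ratio(sparse_pts, dense, window=3, dev_thresh=0.05):
    """Fraction of sparse points deviating from the depth threshold by more than dev_thresh.

    sparse_pts: iterable of (row, col, sparse_depth) already projected into the dense map;
    dense: H x W array of dense depth values from the binocular camera.
    The depth threshold is the mean dense depth over all selected neighbourhood regions.
    """
    half = window // 2
    region_values = []
    for r, c, _ in sparse_pts:
        region = dense[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1]
        region_values.append(region.ravel())
    depth_threshold = np.concatenate(region_values).mean()
    deviations = np.array([abs(d - depth_threshold) for _, _, d in sparse_pts])
    return float(np.mean(deviations > dev_thresh))

def select_training_groups(groups, ratio_thresh=0.1):
    """Keep only image groups whose image difference degree satisfies the preset condition."""
    return [g for g in groups
            if image_difference_ratio(g["sparse_pts"], g["dense"]) < ratio_thresh]
```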
  • this embodiment further provides a training method for the depth information completion model; the training method uses the training sample set obtained by the method for obtaining the training sample set described in the above embodiment;
  • the training method comprises:
  • the predicted dense depth map is predicted by the preset network model based on the first sparse depth map and the first scene image in the training sample, and the image resolution of the predicted dense depth map is equal to the resolution of the first dense depth map, so that the dense depth points in the predicted dense depth map correspond one-to-one to the dense depth points in the first dense depth map.
  • for example, if the resolution of the first dense depth map is 640*480, then the resolution of the predicted dense depth map is also 640*480.
  • the preset network model is set in advance; the preset network model and the trained depth information completion model share the same model structure, and the difference between the two is that the model parameters of the preset network model are initial parameters, whereas the model parameters of the depth information completion model are the parameters obtained after training on the training sample set.
  • the preset network model can adopt a neural network model based on deep learning, for example, a convolutional neural network model, a recurrent neural network model, a bidirectional recurrent neural network model, or a long short-term memory network model.
  • in a specific implementation, the preset network model adopts an encoder-decoder convolutional neural network; as shown in Figure 5, the preset network model includes an encoding module and a decoding module, the input items of the encoding module are the first sparse depth map and the first scene image, and the output item of the encoding module is the depth feature map; the input item of the decoding module is the depth feature map, and its output item is the predicted dense depth map. A minimal sketch of such a network is shown below.
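For illustration, a minimal PyTorch sketch of such an encoder-decoder network follows. The channel widths, layer count, and kernel sizes are assumptions made for the sketch; the embodiment only fixes the encoder/decoder split with the stated inputs and outputs.

```python
import torch
import torch.nn as nn

class DepthCompletionNet(nn.Module):
    """Encoder-decoder CNN: (sparse depth, scene image) -> predicted dense depth map."""
    def __init__(self, image_channels=3):
        super().__init__()
        in_ch = image_channels + 1                      # scene image plus one sparse-depth channel
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, sparse_depth, scene_image):
        features = self.encoder(torch.cat([sparse_depth, scene_image], dim=1))  # depth feature map
        return self.decoder(features)                                           # predicted dense depth map

# Example: a 640*480 input yields a 640*480 prediction, matching the first dense depth map.
# model = DepthCompletionNet()
# pred = model(torch.zeros(1, 1, 480, 640), torch.zeros(1, 3, 480, 640))
```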
  • the loss function can be calculated based on the predicted dense depth map and the first dense depth map, and training iterations can then be performed on the preset network model based on the loss function.
  • the loss function may be formed by weighting one or more of a pixel-wise depth loss function, a pixel-wise loss function of the depth map gradient, and a model structure loss function, etc.
  • the pixel-wise depth loss function can use the mean squared error loss function MSE (Mean Squared Error Loss), the mean absolute error loss function MAE (Mean Absolute Error Loss), or the Huber loss function combining the MSE loss function and the MAE loss function, which in their standard forms can be written as:
    L_MSE = (1/N) · Σ_i (d_i - d̂_i)²,
    L_MAE = (1/N) · Σ_i |d_i - d̂_i|,
    L_Huber = (1/N) · Σ_i [ 0.5 · (d_i - d̂_i)² if |d_i - d̂_i| ≤ δ, else δ · |d_i - d̂_i| - 0.5 · δ² ],
    where N is the number of dense depth points in the first dense depth map, d_i is the dense depth value of a dense depth point in the predicted dense depth map, d̂_i is the dense depth value of the corresponding dense depth point in the first dense depth map, and δ is the preset depth deviation threshold. A PyTorch sketch of these losses follows below.
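The three pixel-wise depth losses above can be written directly in PyTorch; the sketch below hand-rolls them rather than using library built-ins so that the preset deviation threshold δ stays explicit. The default threshold value is an assumed example.

```python
import torch

def mse_loss(pred, gt):
    return torch.mean((pred - gt) ** 2)

def mae_loss(pred, gt):
    return torch.mean(torch.abs(pred - gt))

def huber_loss(pred, gt, delta=1.0):
    """Quadratic below delta, linear above it; delta plays the role of the preset depth deviation threshold."""
    err = torch.abs(pred - gt)
    quadratic = 0.5 * err ** 2
    linear = delta * err - 0.5 * delta ** 2
    return torch.mean(torch.where(err <= delta, quadratic, linear))
```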
  • the pixel-wise loss function of the depth map gradient can be, for example, of the form L_grad = (1/N) · Σ_i ( |∇_x d_i - ∇_x d̂_i| + |∇_y d_i - ∇_y d̂_i| ), where ∇_x and ∇_y denote the gradients of the dense depth points in the x and y directions for the predicted dense depth map and the first dense depth map respectively.
  • the model structure loss L_weight can use an L1 weight decay function or an L2 weight decay function.
  • accordingly, the loss function determined by weighting the pixel-wise depth loss function, the pixel-wise loss function of the depth map gradient, and the model structure loss function can be expressed as L = a·L_pixel + b·L_grad + c·L_weight, where L_pixel may be L_MSE, L_MAE, or L_Huber. A sketch of the gradient term and of the weighted total is given below.
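A sketch of the gradient term and of the weighted total loss follows. The finite-difference gradient and the weights a, b, c are illustrative assumptions; the pixel term uses the MSE form here, but MAE or Huber could be substituted.

```python
import torch

def gradient_loss(pred, gt):
    """L1 difference of x/y finite-difference gradients between predicted and ground-truth depth maps."""
    dpred_x = pred[..., :, 1:] - pred[..., :, :-1]
    dgt_x = gt[..., :, 1:] - gt[..., :, :-1]
    dpred_y = pred[..., 1:, :] - pred[..., :-1, :]
    dgt_y = gt[..., 1:, :] - gt[..., :-1, :]
    return torch.mean(torch.abs(dpred_x - dgt_x)) + torch.mean(torch.abs(dpred_y - dgt_y))

def total_loss(pred, gt, model, a=1.0, b=0.5, c=1e-4):
    """Weighted sum L = a*L_pixel + b*L_grad + c*L_weight (weights are assumed example values)."""
    l_pixel = torch.mean((pred - gt) ** 2)                        # MSE form of the pixel-wise term
    l_grad = gradient_loss(pred, gt)
    l_weight = sum(p.pow(2).sum() for p in model.parameters())    # L2 weight-decay term
    return a * l_pixel + b * l_grad + c * l_weight
```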
  • this embodiment further provides a method for obtaining a dense depth map; the obtaining method applies the depth information completion model obtained by the training method of the depth information completion model described in the above embodiment.
  • as shown in Figure 6, the obtaining method specifically includes:
  • the camera may be a monocular camera, or a camera in a binocular camera, for example, a left camera.
  • the electronic device running the dense depth map acquisition method provided in this embodiment is configured with a distance measuring device and a monocular camera or a binocular camera.
  • when the electronic device is configured with a monocular camera, the distance measuring device and the monocular camera are controlled to synchronously acquire the second sparse depth map and the second scene image of the target scene, and the second dense depth map is then determined through the trained depth information completion model.
  • when the electronic device is configured with a binocular camera, one camera of the binocular camera and the distance measuring device can be controlled to synchronously acquire the second sparse depth map and the second scene image of the target scene, and the second dense depth map is then determined through the trained depth information completion model. A minimal inference sketch is shown below.
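At inference time the trained model is simply applied to a synchronously captured sparse depth map and scene image. The sketch below assumes the captured data are already available as tensors of the indicated shapes; the capture call shown in the comment is hypothetical, since the embodiment does not prescribe a specific device driver API.

```python
import torch

@torch.no_grad()
def complete_depth(model, second_sparse_depth, second_scene_image):
    """second_sparse_depth: 1 x 1 x H x W tensor; second_scene_image: 1 x C x H x W tensor."""
    model.eval()
    second_dense_depth = model(second_sparse_depth, second_scene_image)
    return second_dense_depth

# sparse, image = capture_synchronously(ranging_device, camera)   # hypothetical capture step
# dense = complete_depth(trained_model, sparse, image)
```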
  • in practical applications, in order to obtain the second dense depth map under dark-light conditions as well, the camera may be an infrared camera.
  • this embodiment further provides a device for obtaining a training sample set; as shown in FIG. 7, the obtaining device includes:
  • the obtaining module 100, configured to obtain several image groups, wherein each image group in the several image groups includes a first scene image obtained through a binocular camera, a first dense depth map corresponding to the first scene image, and a first sparse depth map corresponding to the first scene image acquired by the distance measuring device;
  • a determining module 200 configured to respectively determine the degree of image difference between the first dense depth map and the first sparse depth map in each image group;
  • the selection module 300 is configured to select an image group whose degree of image difference satisfies a preset condition among several image groups, and use the data set formed by the selected image groups as a training sample set.
  • this embodiment further provides a computer-readable storage medium; the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps in the method for obtaining the training sample set described in the above embodiments.
  • the present application also provides a terminal device, as shown in FIG. 8, which includes at least one processor (processor) 20, a display screen 21, and a memory (memory) 22, and may further include a communications interface (Communications Interface) 23 and a bus 24, where the processor 20, the display screen 21, the memory 22, and the communications interface 23 can communicate with one another through the bus 24.
  • the display screen 21 is configured to display the preset user guidance interface in the initial setting mode.
  • the communication interface 23 can transmit information.
  • the processor 20 can invoke logic instructions in the memory 22 to execute the methods in the above-mentioned embodiments.
  • the logic instructions in the memory 22 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the memory 22 can be configured to store software programs and computer-executable programs, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure.
  • the processor 20 runs software programs, instructions or modules stored in the memory 22 to execute functional applications and data processing, ie to implement the methods in the above-mentioned embodiments.
  • the memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application required by a function; the data storage area may store data created according to the use of the terminal device, and the like.
  • the memory 22 may include a high-speed random access memory, and may also include a non-volatile memory.
  • the storage medium may be any of various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, and it may also be a transitory storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of Optical Distance (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Processing (AREA)

Abstract

The present application discloses a method for obtaining a training sample set, a model training method, and related devices. The method includes obtaining several image groups and determining the image difference degree between the first dense depth map and the first sparse depth map in each image group; image groups whose image difference degrees satisfy a preset condition are selected from the several image groups to obtain the training sample set. In the present application, a dense depth map is obtained through a binocular camera, the image difference degree between the dense depth map and a sparse depth map obtained through a distance measuring device is determined, and the image groups whose image difference degrees satisfy the preset condition are used as training samples. In this way, the dense depth map obtained through the binocular camera guarantees the reliability of the dense depth map, while the sparse depth map obtained through the distance measuring device is used to screen the dense depth maps and remove those deviating greatly from the actual scene, so as to improve the accuracy of the dense depth maps used as training ground truth and improve the model performance of the depth information completion model trained on the training sample set.

Description

训练样本集的获取方法、模型训练方法及相关装置
本申请要求于2021年8月9日提交中国专利局,申请号为202110910264.0,发明名称为“训练样本集的获取方法、模型训练方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机视觉技术领域,特别涉及一种训练样本集的获取方法、模型训练方法及相关装置。
背景技术
现有深度信息补全模型普遍采用有监督训练,而在采用监督训练时需要稠密深度测量结果作为有监督的训练真值(GroundTruth),通过监督学习来训练深度信息补全模型,其中,用于作为训练真值的稠密深度测量普遍通过仿真方式确定的。然而,仿真方式获取的稠密深度测量与实际场景之间会存在偏差,这样偏差影响训练得到的深度信息补全模型的模型性能,进而会降低通过深度信息补全模型确定的稠密深度图的准确性。
因而现有技术还有待改进和提高。
发明内容
本申请要解决的技术问题在于,针对现有技术的不足,提供一种训练样本集的获取方法、模型训练方法及相关装置。
为了解决上述技术问题,本申请实施例第一方面提供了一种训练样本集的获取方法,所述的获取方法包括:
获取若干图像组,其中,若干图像组中的每个图像组均包括通过双目相机获取的第一场景图像和所述第一场景图像对应的第一稠密深度图,以及通过距离测量装置获取的所述第一场景图像对应的第一稀疏深度图;
分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度;
在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。
所述训练样本集的获取方法,其中,所述图像组中的第一稀疏深度图和第一稠密深度图为同一时间下位置相对固定的双目相机以及距离测量装置采集同一场景信息所得到的。
所述训练样本集的获取方法,其中,所述分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度具体包括:
对于若干图像组中的每个图像组,将该图像组中的第一稀疏深度图投影至该图像组中的第一稠密深度图;
在所述第一稠密深度图中选取投影后的第一稀疏深度图中的各稀疏深度点各自对应的图像区域,并基于选取到的所有图像区域确定深度阈值;
基于各稀疏深度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
所述训练样本集的获取方法,其中,所述在所述第一稠密深度图中选取投影后的第一稀疏深度图中的各稀疏深度点各自对应的图像区域,并基于选取到的所有图像区域确定深度阈值具体包括:
对于投影后的第一稀疏深度图中的每个稀疏深度点,在所述第一稠密深度图中选取该稀疏深度点对应的图像区域,其中,所述图像区域包括该稀疏深度点;
获取选取的所有图像区域所包括的稠密深度点;
计算获取的所有稠密深度点的稠密深度均值,并将所述稠密深度均值作为深度阈值。
所述训练样本集的获取方法,其中,所述基于各稀疏深度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度具体包括:
逐点计算各稀疏深度点各自对应的稀疏深度值与所述深度阈值的深度偏差值;
基于获取到的深度偏差值计算所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
所述训练样本集的获取方法,其中,所述基于各稀疏深度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度具体包括:
逐点计算各稀疏深度点各自对应的稀疏深度值与所述深度阈值的深度偏差值;
确定深度偏差值大于预设阈值的稀疏深度点占所述第一稀疏深度图中的稀疏深度点的比例值,并将所述比例值作为所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
所述训练样本集的获取方法,其中,所述分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度之前,所述方法还包括:
将若干图像组中的每个图像组中的第一稀疏深度图和第一稠密深度图转换至同一坐标系内。
所述训练样本集的获取方法,其中,所述方法还包括:
在所述训练样本集中选取第一数量的图像组;
对于选取到的图像组中的每个图像组,调整该图像组中的第一稀疏深度图中的若干稀疏深度点的稀疏深度值,得到该图像组对应的第一增强图像组;
将获取到的所有第一增强图像组添加到训练样本集中,并将添加得到的训练样本集作为训练样本集。
所述训练样本集的获取方法,其中,所述方法还包括:
在训练样本集中选取第二数量的图像组;
对于选取到的图像组中的每个图像组,获取将图像组中的第一稀疏深度图映射至图像组中的第一场景图像所在坐标系的旋转平移矩阵,其中,所述旋转平移矩阵为所述距离测量装置相对于所述双目相机中用于获取第一场景图像的相机的旋转平移矩阵;
调整所述旋转平移矩阵,并基于调整后的旋转平移矩阵将所述第一稀疏深度图映射至所述第一场景图像所在坐标系,得到该图像组对应的第二增强图像组;
将获取到的所有第二增强图像组添加到训练样本集中,并将添加得到的训练样本集作为训练样本集。
本申请实施例第二方面提供了一种深度信息补全模型的训练方法,其特征在于,所述的训练方法应用采用如上任一所述的训练样本集的获取方法所获取到的训练样本集;所述的训练方法包括:
将训练样本集中的图像组中的第一稀疏深度图和第一场景图像输入预设网络模型,并通过所述预设网络模型输出所述第一稀疏深度图对应的预测稠密深度图;
基于图像组中的第一稠密深度图以及所述预测稠密深度图,对所述预设网络模型进 行训练,以得到深度信息补全模型。
本申请实施例第三方面提供了、一种稠密深度图的获取方法,其特征在于,所述的获取方法应用如上所述的深度信息补全模型的训练方法所得到的深度信息补全模型,所述的获取方法具体包括:
控制距离测量装置获取目标场景的第二稀疏深度图,并同步控制相机获取目标场景的第二场景图像;
将所述第二稀疏深度图以及所述第二场景图像输入所述深度信息补全模型,以得到所述第二场景图像对应的第二稠密深度图。
本申请实施例第四方面提供了一种训练样本集的获取装置,所述的获取装置包括:
获取模块,用于获取若干图像组,其中,若干图像组中的每个图像组均包括通过双目相机获取的第一场景图像和所述第一场景图像对应的第一稠密深度图,以及通过距离测量装置获取的所述第一场景图像对应的第一稀疏深度图;
确定模块,用于分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度;
选取模块,用于在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。
本申请实施例第五方面提供了一种计算机可读存储介质,所述计算机可读存储介质存储有一个或者多个程序,所述一个或者多个程序可被一个或者多个处理器执行,以实现如上任一所述的稠密深度图的获取方法中的步骤,以实现如上所述的深度信息补全模型的训练方法中的步骤,和/或以实现如上所述的稠密深度图的获取方法中的步骤。
一种终端设备,其包括:处理器、存储器及通信总线;所述存储器上存储有可被所述处理器执行的计算机可读程序;
所述通信总线实现处理器和存储器之间的连接通信;
所述处理器执行所述计算机可读程序时实现如上任一所述的稠密深度图的获取方法中的步骤,实现如上所述的深度信息补全模型的训练方法中的步骤,和/或实现如上所述的稠密深度图的获取方法中的步骤。
有益效果:与现有技术相比,本申请提供了一种训练样本集的获取方法、模型训练方法及相关装置,所述的获取方法包括获取若干图像组;分别确定各图像组中的第一稠 密深度图与第一稀疏深度图的图像差异度;在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。本申请中通过双目相机获取第一稠密深度图,然后再确定第一稠密深度图与通过距离测量装置获取的第一稀疏深度图的图像差异度,将图像差异度满足预设条件的图像组作为训练样本,这样一方面通过双目相机获取的稠密深度图可以提高稠密深度图的可靠性,另一方面通过距离测量装置获取的稀疏深度图对稠密深度图进行筛选,可以剔除与实际场景偏差大的稠密深度图,进而提高作为训练真值的稠密深度图的精确性,进而可以提高基于训练样本集训练得到的深度信息补全模型的模型性能。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员而言,在不符创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请提供的训练样本集的获取方法的流程图。
图2为本申请提供的训练样本集的获取方法中用于距离测量装置与双目相机的位置关系示意图。
图3为本申请提供的训练样本集的获取方法中步骤S20的流程图。
图4为本申请提供的深度信息补全模型的训练方法的流程图。
图5为本申请提供的训练样本集的获取方法中的预设网络模型的结构原理图。
图6为本申请提供的稠密深度图的获取方法的流程图。
图7为本申请提供的训练样本集的获取装置的结构原理图。
图8为本申请提供的终端设备的结构原理图。
具体实施方式
本申请提供一种训练样本集的获取方法、模型训练方法及相关装置,为使本申请的目的、技术方案及效果更加清楚、明确,以下参照附图并举实施例对本申请进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。
本技术领域技术人员可以理解,除非特意声明,这里使用的单数形式“一”、“一个”、“所述”和“该”也可包括复数形式。应该进一步理解的是,本申请的说明书中 使用的措辞“包括”是指存在所述特征、整数、步骤、操作、元件和/或组件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元件、组件和/或它们的组。应该理解,当我们称元件被“连接”或“耦接”到另一元件时,它可以直接连接或耦接到其他元件,或者也可以存在中间元件。此外,这里使用的“连接”或“耦接”可以包括无线连接或无线耦接。这里使用的措辞“和/或”包括一个或更多个相关联的列出项的全部或任一单元和全部组合。
本技术领域技术人员可以理解,除非另外定义,这里使用的所有术语(包括技术术语和科学术语),具有与本申请所属领域中的普通技术人员的一般理解相同的意义。还应该理解的是,诸如通用字典中定义的那些术语,应该被理解为具有与现有技术的上下文中的意义一致的意义,并且除非像这里一样被特定定义,否则不会用理想化或过于正式的含义来解释。
应理解,本实施例中各步骤的序号和大小并不意味着执行顺序的先后,各过程的执行顺序以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
发明人经过研究发现,中远距离深度传感技术普遍应用于自动驾驶以及智能机器人领域,在采用中远距离深度传感技术时,可以通过双目系统进行深度测量以得到稠密深度图,也可以是通过距离测量装置进行深度测量以得到稀疏深度图。然而,双目系统在使用过程中会受结构抖动、环境温度以及物体表面纹理的限制,同时还需要平衡镜头焦距与基线来保证测量精度与盲区,进而限制了双目系统的实际使用,使得许多场景无法应用双目系统进行稠密深度测量。距离测量装置虽然可以保证深度测量结果的精确性,但是获取的稀疏深度图的分辨率有限。
为了解决上述问题,目前普遍是通过距离测量装置来获取深度结果可靠的稀疏深度结果,再基于深度学习的深度信息补全模型以及稀疏深度结果确定稠密深度结果,其中,基于深度学习的深度信息补全模型的训练方式主要分为有监督训练和无监督训练。无监督训练可以在一定程度上摆脱对于训练数据的要求,但是从目前的技术发展来看,无监督训练得到的深度信息补全模型的测量精度及性能均较差,无法满足实际应用需求。监督训练则需要稠密深度测量结果作为有监督的训练真值(GroundTruth),通过监督学习来训练深度信息补全模型,其中,用于作为训练真值的稠密深度测量普遍通过仿真方式确定的。然而,仿真方式获取的稠密深度测量与实际场景之间会存在偏差,这样偏 差影响训练得到的深度信息补全模型的模型性能,进而会降低通过深度信息补全模型确定的稠密深度图的准确性。
基于此,在本申请实施例提供了一种训练样本集的获取方法,所述获取方法包括获取若干图像组;分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度;在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。本申请中通过双目相机获取第一稠密深度图,然后再确定第一稠密深度图与通过距离测量装置获取的第一稀疏深度图的图像差异度,将图像差异度满足预设条件的图像组作为训练样本,这样一方面通过双目相机获取的稠密深度图可以提高稠密深度图的可靠性,另一方面通过距离测量装置获取的稀疏深度图对稠密深度图进行筛选,可以剔除与实际场景偏差大的稠密深度图,进而提高作为训练真值的稠密深度图的精确性,可以提高基于训练样本集训练得到的深度信息补全模型的模型性能。
下面结合附图,通过对实施例的描述,对申请内容作进一步说明。
本实施例提供了一种训练样本集的获取方法,如图1所示,所述方法包括:
S10、获取若干图像组。
具体地,若干图像组中的每个图像组均包括第一场景图像、第一稠密深度图以及第一稀疏深度图,其中,第一稠密深度图和第一稀疏深度图均为第一场景图像对应的深度图,也就是说,第一稠密深度图和第一稀疏深度图均是通过对第一场景图像对应的采集场景进行采集得到。第一稠密深度图为通过双目相机获取得到的,第一稀疏深度图为通过距离测量装置获取得到,第一场景图像可以是通过双目相机的左相机或者右相机获取得到,第一场景图像可以是RGB图像也可以是IR图像,并且双目相机和距离测量装置同步对采集场景进行采集。例如,若干图像组中包括图像组A,图像组包括第一场景图a、第一稀疏深度图b和第一稠密深度图c,第一场景图像a和第一稠密深度图c均是通过双目相机获取得到的,第一稀疏深度图b是通过距离测量装置同步对第一场景图像对应的采集场景进行采集所得到的。
各图像组各自包括的第一场景图像各自对应的采集场景可以相同;或者是,若干图像组中的存在部分图像组包括的第一场景图像对应的采集场景相同,部分图像组包括的第一场景图像对应的采集场景不同;或者是,若干图像组中的各图像组各自包括的第一场景图像对应的采集场景均不相同等。在一个典型实现方式中,各图像组各自包括的第 一场景图像各自对应的采集场景可以相同,各第一场景图像各自对应的采集时间不相同。例如,若干图像组包括图像组A和图像组B,图像组A包括第一场景图像a,图像组B包括第一场景图像b,第一场景图像a对应的采集场景和第一场景图像b对应的采集场景相同,第一场景图像a对应的采集时间和第一场景图像b对应的采集时间不同。
此外,用于采集第一场景图像和第一稠密深度图的双目相机和用于采集第一稀疏深度图的距离测量装置可以装配于用于运行本实施例提供的训练样本集的获取方法的电子设备,通过该电子设备直接采集第一场景图像、第一稠密深度图以及第一稀疏深度图,以获取到若干图像组。或者是,双目相机和距离测量装置可以作为单独采集装置,并且该单独采集装置与用于运行本实施例提供的训练样本集的获取方法的电子设备相连接,将采集到的第一场景图像、第一稠密深度图以及第一稀疏深度图发送给电子设备,以使得电子设备获取到若干图像组。或者是,双目相机和距离装置可以作为单独采集装置,该单独采集装置在采集到第一场景图像、第一稠密深度图以及第一稀疏深度图后,将第一场景图像、第一稠密深度图以及第一稀疏深度图存储于云端,以使得用于运行本实施例提供的训练样本集的获取方法的电子设备可以通过云端获取到若干图像组。当然,在实际应用中,还可以采用其他方式获取若干图像组,例如,通过外部设备获取等,这里就不一一说明。
距离测量装置用于发射并接收激光光束以形成第一稀疏深度图。双目相机中的左右相机分别接收距离测量装置发射的激光光束所投射出的光斑或者轨迹并进行高分辨率成像,再将左相机采集的图像与右相机采集的图像进行立体匹配,以得到第一稠密深度图,同时可以将左相机或者右相机采集的图像作为第一场景图像。在一个典型实现方式中,如图2所示,用于采集若干图像组的电子设备包括距离测量装置11和双目相机12,距离测量装置11位于双目相机12的左右相机之间,并且距离测量装置11与双目相机12的左右相机位于同一水平线上。
本实施例的一个实现方式中,距离测量装置为采用固定阵列发射模式的LiDAR或深度相机,可以是面阵发射式或机械扫描式的LiDAR,或者也可以是基于飞行时间原理(包括DTOF、ITOF等)的深度相机。在一个具体实现方式中,距离测量装置可以包括发射器、采集器以及控制和处理电路,发射器包括光源以及发射光学元件,优选地,还包括分束元件等。其中,光源用于发射激光光束,光源可以是单个光源或者是由多个光源组 成的光源阵列,光源阵列可以被配置若干子光源阵列以使得光源阵列可以分组发光,例如,将一行或一列光源作为一个子光源阵列,或者是,将两行或两列作为一个子光源阵列等。由此,当通过控制和处理电路控制发射器发射激光光束时,可以一次仅开启一个子光源阵列或者仅开启每个子光源阵列中的一个光源,以产生固定的点阵列形式投影在目标表面。一种典型的实例,光源配置为VCSEL(Vertical-Cavity Surface-Emitting Laser,垂直腔面发射激光器)阵列光源,通过列寻址或二位寻址进行阵列发射,并经过单个或多个透镜构成的发射光学元件调制后以固定的点阵列形式投影在目标表面。又一种典型的实例,光源可以使用EEL(Edge-emitting Laser,边发射激光器)或VCSEL发射斑点光束,发射光学元件包括准直透镜以及分束元件,经过发射光学元件后进行光学准直并由分束元件进行分束,同样产生固定的点阵列形式投影在物体表面,分束元件可以是衍射光源元件(Difractive Optical Element,DOE)、微透镜阵列等。
采集器可以包括由至少一个像素组成的像素单元、过滤单元和接收光学元件,接收光学元件将目标反射的激光光束成像到像素阵列上,过滤单元用于滤除背景光和杂散光,所述像素可以是APD、SiPM、SPAD、CCD、CMOS等光电探测器中的一种。在一些实施例中,像素单元可以为用于光飞行时间测量的图像传感器,像素单元也可以集成到用于光飞行时间测量的感光芯片中。在一个典型实施例中,像素单元包括多个SPAD,SPAD可以对入射的单个光子进行响应并输出指示所接收光子在每个SPAD处相应到达时间的光子信号。一般地,采集器还包括有与像素单元连接的信号放大器、时数转换器(TDC)以及数模转换器(ADC)等器件中的一种或多种组成的读出电路(这些电路即可以与像素整合在一起,作为采集器的一部分,也可以作为控制和处理电路的一部分)。
控制和处理电路可以是独立的专用电路,比如深度相机自身具有计算能力的独立电路;也可以包含通用处理电路,比如当该深度相机被集成到如手机、电视、电脑等智能终端中时,终端中的处理器可以执行控制和处理电路的功能。控制和处理电路同时控制发射器和采集器,并根据发射光束与反射光束之间的时间差或相位差计算目标的深度。
其中,对于控制和处理电路执行深度计算的测量原理,典型的是通过直接(DTOF)飞行时间方法,计算脉冲发射时刻与接收时刻间的差值来计算飞行时间t,进一步根据公式D=ct/2计算物体距离。另外,也可以通过间接(ITOF)飞行时间方法,通过求解发射波形与接收波形的相位差来求解飞行时间。或者,还可以是通过发射调制编码的连 续波信号,接收端通过相关匹配等信号处理方法间接求解飞行时间,例如:AMCW调幅连续波,FMCW调频连续波,编码脉冲发射等,上述不同的测距方案均不会影响本方案的实现。
本实施例的另一实现方式中,距离测量装置为机械扫描式LiDAR,距离测量装置还包括机械扫描单元,机械扫描单元可以是振镜、反射镜、MEMS、楔形镜以及旋转电机等。在一个典型的实施例中,光源被配置为点光源或列光源,并且像素单元被配置为单个像素或者像素列,扫描单元包括旋转电机等,用于同步控制发射器和采集器绕垂直地面的旋转轴线进行360度扫描,此时光源同样产生固定的点阵列形式出射,随着整个收发系统的旋转对周边环境进行扫描成像。在另一个典型的实施例中,光源被配置为点光源或列光源,同样产生固定的点阵列形式出射,光源发射的点阵光束经过机械扫描单元投射到物体表面,随着机械扫描单元的旋转对物体进行扫描成像。其中,机械扫描单元包括MEMS反射镜、振镜等,用于接收光源发出的激光光束并进行偏转投射到物体表面上形成固定的点阵列形式。
本实施例的一个实现方式中,双目相机包括左相机和右相机,左相机和右相机均为高分辨率成像相机,在通过双目相机获取第一场景图像和第一稠密深度图时,可以左相机采集第一图像,并同步控制右相机采集第二图像,通过立体匹配算法确定第一图像和第二图像的稠密视差图,基于双目相机的内外参数以及稠密视差图确定第一稠密深度图,再将第一图像或者第二图像作为第一场景图像,以得到第一场景图像和第一稠密深度图。此外,在实际应用中,为了避免提高基于训练样本集训练得到的深度信息补全模型的适用场景,距离测量装置发送的激光光束为红外激光光束,相应的,双目相机的左相机和右相机均采用红外相机,这样使得距离测量装置和双目相机可以获取到暗光条件下的第一稀疏深度图、第一场景图像以及第一稠密深度图。
在本实施例的一个实现方式中,在通过双目相机的左相机获取到第一图像和右相机获取到第二图像后,通过第一图像和第二图像确定第一稠密深度图的过程可以为:将利用双目相机的几何约束对第一图像和第二图像进行去畸变和极限校正,然后根据第一图像和第二图像进行逐像素搜索并计算匹配代价,利用相邻像素信息对匹配代价进行优化,再基于优化后的匹配代价计算各像素的视差以得到视差图,然后在对视差图进行滤波以及空洞填补等去噪处理,以得到第一稠密深度图。
此外,在实际应用中,用于确定稠密视差图的立体匹配算法可以为任意的双目匹配算法,例如,SAD(Sum of absolute differences)算法和SGBM(Stereo Processing by Semiglobal Matching and Mutual Information)算法等;或者是,也可以采用基于深度学习的立体匹配模型,在获取到第一图像和第二图像后,将第一图像和第二图像作为立体匹配模型的输入项,通过立体匹配模型输出第一稠密深度图。
双目相机与距离测量装置时间同步,并且双目相机可以在距离测量装置发射的近红外波段下对发射器投射出的逐点或阵列光斑进行清晰成像。在实际产品使用过程中,可以根据实际使用场景选择对可见光或近红外光进行视觉感知。可以理解的是,图像组中的第一稀疏深度图和第一稠密深度图为同一时间下位置相对固定的双目相机以及距离测量装置采集同一场景信息所得到的。而为了保证距离测量装置与双目相机捕捉的是当前场景相同时刻下的信息,可以基于距离测量装置发射激光光束的指令触发双目相机来实现,或者基于距离测量装置采集到激光信号来触发双目相机,例如,在获取图像组时,可以判断距离测量装置的像素单元是否接收到激光,若是,则触发指令至双目相机,以使得双目相机进行拍摄。当然,在其他实现方式中,也可以通过其他方式实现,在此不做具体限制。
S20、确定若干图像组中的每个图像组中的第一稠密深度图与第一稀疏深度图的图像差异度。
具体地,所述图像差异度用于反映第一稠密深度图与第一稀疏深度图的偏差程度,其中,图像差异度越大,说明第一稠密深度图与第一稀疏深度图的偏差程度越大;反之,图像差异度越小,说明第一稠密深度图与第一稀疏深度图的偏差程度越小。例如,图像组A包括第一稠密深度图a和第一稀疏深度图b,图像组B包括第一稠密深度c和第一稀疏深度图d,第一稠密深度图a与第一稀疏深度图b的图像差异度为d1,第一稠密深度图c与第一稀疏深度图d的图像差异度为d2,当d1大于d2时,说明第一稠密深度图a与第一稀疏深度图b的偏差程度大于第一稠密深度图c与第一稀疏深度图d的偏差程度。
在本实施例中,由于第一稠密深度图通过双目相机获取的,第一稀疏深度图通过距离测量装置获取的,从而第一稠密深度图所处的坐标系与第一稀疏深度图所处的坐标系不相同。基于此,所述确定若干图像组中的每个图像组中的第一稠密深度图与第一稀疏 深度图的图像差异度之前,所述方法还包括:
将若干图像组中的每个图像组中的第一稀疏深度图和第一稠密深度图转换至同一坐标系内。
具体地,坐标系可以是第一稀疏深度图所在的坐标系,也可以是第一稠密深度图所在坐标系等。在本实施例的一个实现方式中,该坐标系为第一稠密深度图所在的坐标系,那么在将第一稀疏深度图和第一稠密深度图转换至同一坐标系时仅需要通过将第一稀疏深度图通过第一旋转平移矩阵映射至第一稠密深度图所在坐标系,这样可以简化第一稀疏深度图和第一稠密深度图转换至同一坐标系内的实现过程且计算量小,第一旋转平移矩阵为距离测量装置和双目相机之间的外参,通过预先标定可以确定。
在本实施例的一个实现方式中,如图3所示,所述确定若干图像组中的每个图像组中的第一稠密深度图与第一稀疏深度图的图像差异度具体包括:
S21、对于若干图像组中的每个图像组,将该图像组中的第一稀疏深度图投影至该图像组中的第一稠密深度图;
S22、在所述第一稠密深度图中选取投影后的第一稀疏深度图中的各稀疏深度点各自对应的图像区域,并基于选取到的所有图像区域确定深度阈值;
S23、基于各稀疏深度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
具体地,第一稀疏深度图和第一稠密深度图位于同一坐标系内,第一稀疏深度图和第一稠密深度图至少部分重合,这是由于在通过距离测量装置和双目相机获取第一稀疏深度图和第一稠密深度图时,如图2所示,距离测量装置11与双目相机12的位置是相对固定,并且距离测量装置11的视场角(FOV)1与双目相机12的视场角(FOV)2之间存在重叠部分,其中,距离测量装置11在确定第一稀疏深度图时,其自身配置的光源向标定板发射面阵斑点光束,自身配置的采集器采集面阵斑点光束,每个光斑对应测出一个深度;同时双目相机12也采集面阵斑点光束,左相机采集一幅图像,右相机采集一幅图像,然后左右相机通过立体匹配算法确定第一稠密深度。那么当距离测量装置11的视场角(FOV)1与双目相机12的视场角(FOV)2之间存在重叠部分时,距离测量装置11采集的光斑区域与双目相机12采集的光斑区域会存在重合区域,从而使得通过 距离测量装置确定的第一稀疏深度图与通过双目相机确定的第一稠密深度图之间存在重合区域。例如,在图2中,距离测量装置的视场角(FOV)和距离测量装置的视场角(FOV)部分重合,相应的第一稀疏深度图和第一稠密深度图也部分重合。在一些其他实施例中,距离测量装置的视场角(FOV)和双目相机的视场角(FOV)之间也存在全部重合,相应的第一稀疏深度图和第一稠密深度图也全部重合。通过距离测量装置和双目相机之间的外参,即可将第一稀疏深度图和第一稠密深度图投影至统一坐标系下产生重合区域。
在一个典型实现方式中,距离测量装置与双目相机的位置是相对固定,距离测量装置中的采集器的像素分辨率与双目相机的像素分辨率可以相同,例如,均为640*480。但由于两者采集深度图的原理不同,而导致第一稀疏深度图和第一密集深度图的分辨率不同,当将第一稀疏深度图和第一稠密深度图转化到同一坐标系内时,此时第一稀疏深度图和第一稠密深度图至少存在部分重合区域,即第一稀疏深度图内的深度点与第一稠密图内的深度点重合。其中,重合区域可以为正方形区域,长方形区域等,并且重合区域可以是第一稀疏深度图中的部分图像区域,也可以是第一稀疏深度图的全部图像区域。也就是说,第一稀疏深度图可以部分图像区域与第一稠密深度图重合,也可以全部图像区域均与第一稠密深度图重合。
在本实施例的一个实现方式中,所述在所述第一稠密深度图中选取投影后的第一稀疏深度图中的各稀疏深度点各自对应的图像区域,并基于选取到的所有图像区域确定深度阈值具体包括:
对于投影后的第一稀疏深度图中的每个稀疏深度点,在所述第一稠密深度图中选取该稀疏深度点对应的图像区域,其中,所述图像区域包括该稀疏深度点;
获取选取的所有图像区域所包括的稠密深度点;
计算获取的所有稠密深度点的稠密深度均值,并将所述稠密深度均值作为深度阈值。
具体地,上述选取投影后的第一稀疏深度图中的各稀疏深度点为重合区域内的稀疏深度点,第一稀疏深度点对应的图像区域包含于第一稠密深度图内,并且第一稀疏深度点位于图像区域内。此外,各第一稀疏深度点各自对应的图像区域的区域尺寸可以相同,且均远小于第一稠密深度图的图像尺寸。例如,第一稠密深度图的图像尺寸为640*480,图像区域的区域尺寸为3*3。在一个典型实现方式中,在选取第一稀疏深度点对应的图 像区域时,可以以该第一稀疏深度点为中心,选取预设长度的邻域区间,例如,以第一稀疏深度点为中心,选取长度为3的邻域区间,即选取3*3邻域区间。
在选取到各第一稀疏深度点各自对应的图像区域后,获取各图像区域包含的稠密深度点,然后可以直接计算获取到的所有稠密深度点的稠密深度均值,也可以是去除获取到的所有稠密深度点中的重复稠密深度点,再计算去重后的稠密深度点的稠密深度均值以得到深度阈值。
在选取到稠密深度点后，获取每个选取的稠密深度点的稠密深度值，并计算获取到的所有稠密深度值的平均值以得到深度阈值，其中，深度均值的计算公式可以为：
d̄ = (1/N) · Σ_{j=1}^{N} d_j
其中，d̄表示深度阈值，d_j为获取的所有图像区域中的稠密深度点的稠密深度值，N为获取的所有稠密深度点的数量。
本实施例通过将投影后重叠区域内的第一稀疏深度点对应的图像区域中的所有稠密深度点的稠密深度值的平均值作为深度阈值,可以整体反应第一稀疏深度图与第一稠密深度图的偏差情况,进而提高后续第一稠密深度图筛选的准确性。在其他实现方式中,第一稀疏深度图中的各稀疏深度点各自对应的深度阈值可以不同,或者是,部分稀疏深度点对应的深度阈值相同,部分稀疏深度点对应的深度阈值不同。例如,在将第一稀疏深度图投影至第一稠密深度图后,对于每个稀疏深度点,均为该稀疏深度点选取预设长度的邻域区域,并确定该邻域区域包括的所有稠密深度点的稠密深度值的平均值,将该平均值作为该稀疏深度点对应的深度阈值等。
在获取到深度阈值后,可以基于深度阈值以及第一稀疏图中的各稀疏深度点各自对应的稀疏深度值来确定第一稀疏深度图与第一稠密深度图的图像差异度,其中,所述图像差异度可以是各稀疏深度点的稀疏深度值与深度阈值的差值的和,也可以是稀疏深度点中稀疏深度值与深度阈值的差值大于预设阈值的数量,还可以是各稀疏深度点的稀疏深度值与深度阈值的差值的平均值等等。
在本实施例的一个实现方式中,所述基于各稀疏深度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度具体包括:
逐点计算各稀疏深度点各自对应的稀疏深度值与所述深度阈值的深度偏差值;
基于获取到的深度偏差值计算所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
具体地,深度偏差值用于反映稀疏深度点的稀疏深度值与深度阈值之间的偏差,其中,深度偏差值等于稀疏深度点的稀疏深度值与深度阈值的差值绝对值。例如,深度阈值为A1,稀疏深度点的稀疏深度值为A2,那么深度偏差值可以为|A1-A2|。此外,在其他实现方式中,深度偏差值可以采用其他方式确定,例如,深度偏差值为稀疏深度点的稀疏深度值与深度阈值的平方差的算是平方根等。
在获取到各稀疏深度点各自对应的深度偏差值后,基于获取到的深度偏差值计算第一稀疏深度图与所述第一稠密深度图的图像差异度,其中,图像差异度可以等于所有深度偏差值的和,也可以是等于所有深度偏差值的平方和的算术平方根。例如,图像差异度的计算公式可以为:
Δd = ||sd - d̄||_α
其中，Δd表示图像差异度，sd表示第一稀疏深度图，d̄表示深度阈值，||x||_α表示α范数，用于衡量第一稀疏深度图与深度阈值的偏差。
在一个实现方式中,α的取值可以为1,图像差异度为第一稀疏深度图中各稀疏深度点的稀疏深度值与深度阈值的差值绝对值之和,相应的,图像差异值的计算公式可以为:
Δd = Σ_i |sd_i - d̄_i|
其中，sd_i表示第i个稀疏深度点的稀疏深度值，d̄_i表示第i个稀疏深度点对应的深度阈值。
在一个实现方式中,α的取值可以为2,图像差异度为第一稀疏深度图中各稀疏深度点的稀疏深度值与深度阈值的差值绝对值的平方和的算术平方根,相应的,图像差异值的计算公式可以为:
Δd = ( Σ_i (sd_i - d̄_i)² )^{1/2}
其中，sd_i表示第i个稀疏深度点的稀疏深度值，d̄_i表示第i个稀疏深度点对应的深度阈值。
在本实施例的一个实现方式中,所述基于各稀疏深度点各自对应的稀疏深度值以及 所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度具体包括:
逐点计算各稀疏深度点各自对应的稀疏深度值与所述深度阈值的深度偏差值;
确定深度偏差值大于预设阈值的稀疏深度点占所述第一稀疏深度图中的稀疏深度点的比例值,并将所述比例值作为所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
具体地,预设阈值为预先设置的,用于衡量深度偏差值的依据,其中,当深度偏差值大于预设阈值时,说明深度偏差值不符合要求,即说明该深度偏差值对应的稀疏深度点为不符合要求的稀疏深度点;反之,当深度偏差小于或者等于预设阈值时,说明深度偏差值符合要求,即说明该深度偏差值对应的稀疏深度点为符合要求的稀疏深度点。由此,比例值为所有不符合要求的稀疏深度点与第一稀疏深度图包含的所有稀疏深度点的比例,也就是说,在获取到各稀疏深度点各自对应的深度偏差值后,分别将各稀疏深度点各自对应的深度偏差值与预设阈值进行比较,以选取深度偏差值大于预设阈值的稀疏深度点,再将选取到的稀疏深度点的数量与第一稀疏深度图包括的所有稀疏深度点的数量相比得到该比例值。
例如,第一稀疏深度图包括的稀疏深度点的数量为a2,第一稀疏深度图中深度偏差值大于预设阈值的稀疏深度点的数量为a1,a1小于或者等于a2,那么深度偏差值大于预设阈值的稀疏深度点占所述第一稀疏深度图中的稀疏深度点的比例值为a1/a2。
S30、在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。
具体地,所述预设条件为预先设置的,为图像差异度的衡量标准,其中,当图像差异度满足预设条件时,说明该图像组中的第一稀疏深度图和第一稠密深度图的图像偏差程度满足要求,图像组可以作为训练样本;反之,当图像差异度不满足预设条件时,说明该图像组中的第一稀疏深度图和第一稠密深度图的图像偏差程度不满足要求,图像组不可以作为训练样本。例如,若干图像组包括图像组A、图像组B以及图像组C,其中,图像组A和图像组B各自对应的图像差异度满足预设条件,图像组C对应的图像差异度不满足预设条件,那么图像组A和图像组B可以作为训练样本,图像组C不可以作为训练样本,从而训练样本集包括图像组A和图像组B。
所述预设条件可以基于图像差异度的确定方式而确定,例如,当图像差异度为基于获取到的深度偏差值计算所述第一稀疏深度图与所述第一稠密深度图的图像差异度确定的时,所述预设条件可以为图像差异度小于偏差阈值,也就是说,当图像差异度小于偏差阈值时,说明图像差异度满足预设条件,反之,当图像差异度大于或者等于偏差阈值时,说明图像差异度不满足预设条件;又如,当图像差异度为基于深度偏差值大于预设阈值的稀疏深度点与第一稀疏深度图中的所有稀疏深度点的比例值时,预设条件可以为比例值小于预设比例阈值,也就是说,当图像差异度小于预设比例阈值时,说明图像差异度满足预设条件,反之,当图像差异度大于或者等于预设比例阈值时,说明图像差异度不满足预设条件。
在本实施例的一个实现方式中,在获取到训练样本集之后,可以对训练样本集进行数据增强处理,以丰富训练样本集中的训练数据,以提高训练样本集的多样性,以提高基于训练样本集训练得到的深度信息补全模型的鲁棒性。基于此,在获取到训练样本集后,所述的训练样本集的获取方法还可以包括:
在所述训练样本集中选取第一数量的图像组;
对于选取到的图像组中的每个图像组,调整该图像组中的第一稀疏深度图中的若干稀疏深度点的稀疏深度值,得到该图像组对应的第一增强图像组;
将获取到的所有第一增强图像组添加到训练样本集中,并将添加得到的训练样本集作为训练样本集。
距离测量装置采集的第一稀疏深度图中的稀疏深度值容易受到环境光、电路本身的噪声、热噪声等影响导致测量结果存在一定范围的抖动误差,由此通过对稀疏深度值增加抖动误差获取包含增强深度值的增强深度图,输入到深度补全模型进行训练时提高模型对测量误差的适应性以及鲁棒性。
具体地,所述第一数量可以为预先设定的,也可以是基于训练样本集包括的训练样本的数量确定的,并且第一数量小于或者等于训练样本集包括的训练样本的数量,例如,第一数量为训练样本集包括的训练样本的数量的一半,又如,第一数量等于训练样本集包括的训练样本的数量。在一个典型实现方式中,第一数量等于训练样本集包括的训练样本的数量,这样可以对训练样本集中的每个训练样本均进行增强。
在选取到第一数量的图像组后,对于选取的图像组中的每个图像组中的第一稀疏深 度图,在该第一稀疏深度图中选取若干稀疏深度点,其中,选取到的若干稀疏深度点的数量小于获取等于第一稀疏深度图中的所有稀疏深度点的数量。例如,第一稀疏深度图中的所有稀疏深度点的数量为100,那么选取到的若干稀疏深度点的数量小于或者等于100。本实施例通过对第一稀疏深度图中的部分稀疏深度点或全部稀疏深度点的稀疏深度值进行调整来生成第一稀疏深度图对应的增强稀疏深度图,然后将第一稀疏深度图对应的增强稀疏深度图,第一稀疏深度图对应的第一场景图像以及第一稠密深度图作为第一增强图像组添加到训练样本集中,以使得训练样本集包括该第一稀疏深度图、第一稀疏深度图对应的第一场景图像以及第一稠密深度图构成的图像组以及第一稀疏深度图对应的增强稀疏深度图,第一稀疏深度图对应的第一场景图像以及第一稠密深度图构成的第一增强图像组,进而丰富了训练样本集中的训练数据。
此外,在实际应用中,在通过对图像组中的第一稀疏深度图中的稀疏深度点的稀疏深度值进行调整来确定增强稀疏深度图时,可以通过对不同稀疏深度点的调整来确定多张增强稀疏深度图,也可以通过采用不同的调整值进行调整来确定多张增强稀疏深度图。可以理解的是,距离测量装置中引起抖动误差的影响因素较多,因此抖动误差的值并不唯一,可以通过实验测量确定也可以采用理论推导的方式计算,在本发明中不做具体限制。例如,在通过对图像组中的第一稀疏深度图中的稀疏深度点的稀疏深度值进行调整来确定增强稀疏深度图时,可以通过对第一稀疏深度图中的每个稀疏深度点依次进行调整,得到多张增强稀疏深度图。又如,在通过对图像组中的第一稀疏深度图中的稀疏深度点的稀疏深度值进行调整来确定增强稀疏深度图时,可以采用多个不同的调整值对第一稀疏深度图中的稀疏深度点的稀疏深度值进行调整,得到多张增强稀疏深度图等。
在本实施例的一个实现方式中,调整该图像组中的第一稀疏深度图中的若干稀疏深度点的稀疏深度值可以采用在若干稀疏深度点中的每个稀疏深度点的稀疏深度值上加上预设调整值的调整方法,也可以是在若干稀疏深度点中的各稀疏深度点的稀疏深度值上加上各稀疏深度点各自对应的深度调整值的调整方法,其中,各稀疏深度点各自对应的深度调整值可以根据各稀疏深度点的稀疏深度值确定的,例如,各稀疏深度点各自对应的深度调整值可以根据各稀疏深度点的稀疏深度值的百分之一等。
本实施通过在稀疏深度点的稀疏深度值上增加调整值的方式来对第一稀疏深度图进行测量误差增强,可以使得训练样本集中可以包括具有测量误差的训练样本,以使得 基于训练样本集训练得到的深度信息补全模型对距离测量装置的测量误差的适应性和鲁棒性,从而使得训练得到的深度信息补全模型可以适用距离测量装置在存在环境光噪声、电路热噪声以及电路读出噪声等噪声的应用场景中采集到的稀疏深度图。
在本实施例的一个实现方式中,预设网络模型的输入数据为第一稀疏深度图和第一场景图,距离测量装置采集的第一稀疏深度图与第一场景图像根据旋转平移矩阵映射至同一坐标中进行对齐配准,在本申请所述的方案中,旋转平移矩阵即为距离测量装置与双目相机中一个相机的外参。其中,系统外参通过离线生产标定或者在线实时标定来确定与校准的,然而,校准结果本身会存在一定误差,同时在实际使用过程中,特别是动态使用场景,距离测量装置和/或双目相机会存在结构抖动,从而会使得基于标定的外参映射至同一坐标系中的稀疏深度图和场景图像之间存在空间配准误差。由此,为了提高基于训练样本集训练得到的深度信息补全模型对具有空间配准误差的稀疏深度图和场景图像的处理效果,在获取到训练样本集后,可以在训练样本集中的第一稀疏深度图和第一场景图像对齐配准的映射过程中增加噪声抖动,以使得训练样本集携带有具有空间配准误差的训练样本,进而可以提高深度信息补全模型对具有空间配准误差的稀疏深度图和场景图像的鲁棒性。
基于此,所述训练样本集的获取方法还可以包括:
在训练样本集中选取第二数量的图像组;
对于选取到的图像组中的每个图像组,获取将图像组中的第一稀疏深度图映射至图像组中的第一场景图像所在坐标系的旋转平移矩阵,其中,所述旋转平移矩阵为所述距离测量装置相对于所述双目相机中用于获取第一场景图像的相机的旋转平移矩阵;
调整所述旋转平移矩阵,并基于调整后的旋转平移矩阵将所述第一稀疏深度图映射至所述第一场景图像所在坐标系,得到该图像组对应的第二增强图像组;
将获取到的所有第二增强图像组添加到训练样本集中,并将添加得到的训练样本集作为训练样本集。
具体地,所述第二数量可以为预先设定的,也可以是基于训练样本集包括的训练样本的数量确定的,并且第二数量小于或者等于训练样本集包括的训练样本的数量,例如,第二数量为训练样本集包括的训练样本的数量的一半,又如,第二数量等于训练样本集包括的训练样本的数量。在一个典型实现方式中,第二数据等于训练样本集包括的训练 样本的数量,这样可以对训练样本集中的每个训练样本均进行增强。
在选取到第二数据的图像组后,对于选取的图像组中的每个图像组,确定第一稀疏深度图映射至图像组中的第一场景图像所在坐标系的旋转平移矩阵,旋转平移矩阵用于将第一稀疏深度图映射至第一场景图像所在坐标系。在一个实现方式中,第一场景图像为双目相机中一个相机拍摄得到,则将距离测量装置与该相机作为一个融合系统,利用标定算法进行距离测量装置和所述相机的外参标定,得到距离测量装置与目标相机之间的旋转平移矩阵,并可以通过该旋转平移矩阵将第一稀疏深度图映射至第一场景图像所在坐标系。
例如，假设距离测量装置测量某一个三维空间点P_w(X_w, Y_w, Z_w)得到该三维空间点对应的第一稀疏深度图中的深度点以及对应的深度值，由此，构建出三维空间点投影到距离测量装置坐标系的投影关系。并且，距离测量装置与目标相机之间的旋转平移矩阵设定为[R t]，那么三维空间点P_w(X_w, Y_w, Z_w)投影到目标相机坐标系的投影过程可以表示为：
Z_c · [u, v, 1]^T = K · [R t] · [X_w, Y_w, Z_w, 1]^T
其中,K为目标相机的内参矩阵,R为从距离测量装置所在坐标系到目标相机所在坐标系的旋转矩阵,t为从距离测量装置所在坐标系到目标相机所在坐标系的平移矩阵。
平移矩阵包括x,y,z三个自由度，沿(x,y,z)三个自由度的平移距离可以表示为(t_1, t_2, t_3)。旋转矩阵包括x,y,z三个自由度，沿(x,y,z)三个自由度的旋转可以通过欧拉角θ(θ_x, θ_y, θ_z)表示，各轴旋转矩阵（标准形式）分别表示为：
R_x = [[1, 0, 0], [0, c_x, -s_x], [0, s_x, c_x]]，R_y = [[c_y, 0, s_y], [0, 1, 0], [-s_y, 0, c_y]]，R_z = [[c_z, -s_z, 0], [s_z, c_z, 0], [0, 0, 1]]
相应的，旋转矩阵R可以表示为：
R = R_z · R_y · R_x
其中，s_i = sinθ_i，c_i = cosθ_i，i = x, y, z。
基于此,在调整所述旋转平移矩阵时,可以对旋转矩阵中的三个自由度各自对应的欧拉角进行调整,也可以对平移矩阵中的三个自由度的平移距离进行调整,其中,调整过程可以为对三个自由度的欧拉角和三个自由度的平移距离中的一个参量进行调整,或者是,对三个自由度的欧拉角和三个自由度的平移距离中的一个参量进行调整中的两个参量或者两个以上参量进行调整;或者是,对三个自由度的欧拉角进行调整;或者是,对三个自由度的平移距离进行调整等。具体地调整方式可以根据实际训练需求而进而调整,所有通过调整三个自由度的欧拉角和三个自由度的平移距离的方式来调整训练样本集的方式均属于本申请的保护范围。
此外,旋转矩阵的三个自由度的欧拉角以及平移矩阵的三个自由度的平移距离的调整值的上限值可以根据距离测量装置和相机的结构约束而确定,例如,旋转矩阵的三个自由度的欧拉角以及平移矩阵的三个自由度的平移距离的调整值的上限值可以等于距离测量装置和相机的在各自由度的结构抖动的最大值,或者是根据最大值确定的且小于最大值等。例如,三个自由度的欧拉角为θ(θ xyz),调整后的欧拉角可以表示为θ(θ x±Δθ xy±Δθ yz±Δθ z),其中,Δθ x,Δθ y,Δθ z分别为x,y,z三个自由度的欧拉角的调整值,并且Δθ x的取值范围均为0到x自由度的欧拉角调整最大值,Δθ y的取值范围均为0到y自由度的欧拉角调整最大值,Δθ z的取值范围均为0到z自由度的欧拉角调整最大值。三个自由度的平移距离可以表示为(t 1,t 2,t 3),调整后的平移距离可以表示为(t 1±Δt 1,t 2±Δt 2,t 3±Δt 3),其中,Δt 1,Δt 2,Δt 3分别为x,y,z三个自由度的平移距离的调整值,并且Δt 1的取值范围均为0到x自由度的平移距离调整最大值,Δt 2的取值范围均为0到y自由度的平移距离调整最大值,Δt 3的取值范围均为0到z自由度的平移距离调整最大值。
可以理解的是,在将第一稀疏深度图和第一场景图输入预设网络模型进行训练时需要根据系统外参(旋转平移矩阵)将第一稀疏深度图和第一场景图投影到同一坐标系下用于进行模型的训练,为了有效提高距离测量装置和目标相机之间的抖动引起的误差,通过对旋转平移矩阵进行误差抖动增强,并基于调整后的旋转平移矩阵将所述第一稀疏深度图映射至所述第一场景图像所在坐标系,得到该图像组对应的第二增强图像组,用第二增强图像组与原始图像组一起对模型进行训练,进而丰富了训练样本集中的训练数据,也提高了模型进行深度估计的鲁棒性。
此外,值得说明的是,在对训练样本集进行增强时,可以仅采用调整第一稀疏深度图的稀疏深度值的方式进行增强,或者是,仅采用调整旋转平移矩阵的方式进行增强,或者是,同时采用调整第一稀疏深度图的稀疏深度值的方式进行增强和调整旋转平移矩阵的方式进行增强,也就是说,可以先采用调整第一稀疏深度图的稀疏深度值的方式对训练样本集进行增强,然后再采用调整旋转平移矩阵的方式对增强后的训练样本集进行增强;或者是,先采用调整旋转平移矩阵的方式对训练样本集进行增强,然后再采用调整第一稀疏深度图的稀疏深度值的方式对增强后的训练样本集进行增强等。
综上所述,本实施例通过双目相机以及距离测量装置获取若干图像组,再分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度;最后在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。本实施例确定的训练样本集中的训练样本是以双目相机获取的第一稠密深度图作为训练真值,可以保证训练真值的可靠性;通道该训练样本中的第一稠密深度图与通过距离测量装置获取的第一稀疏深度图的图像差异度满足预设条件,可以保证第一稠密深度图与第一场景图像的采集场景的匹配度,从而采用本实施例提供的训练样本集对深度信息补全模型进行有监督训练时,可以提高训练得到的深度信息补全模型的模型性能。
基于上述训练样本集的获取方法,本实施例提供了一种深度信息补全模型的训练方法,所述的训练方法应用上述实施例所述的训练样本集的获取方法所获取到的训练样本集;如图4所示,所述的训练方法包括:
N10、将训练样本集中的图像组中的第一稀疏深度图和第一场景图像输入预设网络模型,并通过所述预设网络模型输出所述第一稀疏深度图对应的预测稠密深度图;
N20、基于图像组中的第一稠密深度图以及所述预测稠密深度图,对所述预设网络模型进行训练,以得到深度信息补全模型。
具体地,预测稠密深度图为预设网络模型基于训练样本中的第一稀疏深度图和第一场景图像预测得到的,预测稠密深度图的图像分辨率与第一稠密深度图的分辨率相等,以使得预设稠密深度图中的稠密深度点与第一稠密深度图中的稠密深度点一一对应。例如,第一稠密深度图的分辨率为640*480,那么预测稠密深度图的分辨率为640*480。
所述预设网络模型为预先设置的,预设网络模型与训练得到的深度信息补全模型的模型结构,两者的区别在于预设网络模型的模型参数为初始参数,深度信息补全模型的模型参数为经过训练样本集训练后的模型参数。其中,预设网络模型可以采用基于深度 学习的神经网络模型,例如,卷积神经网络模型、循环神经网络模型、双向循环神经网络模型以及长短期记忆网络模型等。在一个具体实现方式中,预设网络模型采用编解码卷积神经网络,如图5所示,预设网络模型包括编码模块和解码模块,编码模块的输入项为第一稀疏深度图和第一场景图像,编码模块的输出项为深度特征图;解码模块的输入项为深度特征图,输出项为预测稠密深度图。
在通过预设网络模型输入预测稠密深度图后,在基于预测稠密深度图和第一稠密深度图对预设网络模型进行训练时,可以基于预测稠密深度图和第一稠密深度图来计算损失函数,再基于损失函数对预设网络模型进行训练迭代。所述损失函数可以采用逐像素深度的损失函数、深度图梯度的逐像素损失函数以及模型结构损失函数中的一种或者多种加权所形成的损失函数等。
逐像素深度的损失函数可以采用均方差损失函数MSE(Mean Squared Error Loss)、平均绝对误差损失函数MAE(Mean Absolute Error Loss),以及MSE损失函数与MAE损失函数结合的Huber损失函数,其中,MSE损失函数、MAE损失函数以及Huber损失函数的表达式可以为:
L_MSE = (1/N) · Σ_i (d_i - d̂_i)²
L_MAE = (1/N) · Σ_i |d_i - d̂_i|
L_Huber = (1/N) · Σ_i [ 0.5·(d_i - d̂_i)²，当|d_i - d̂_i| ≤ δ；δ·|d_i - d̂_i| - 0.5·δ²，其他 ]
其中，N为第一稠密深度图中的稠密深度点的数量，d_i为预测稠密深度图中的稠密深度点的稠密深度值，d̂_i为第一稠密深度图中的稠密深度点的稠密深度值，δ为预设的深度偏差阈值。
深度图梯度的逐像素损失函数可以为:
L_grad = (1/N) · Σ_i ( |∇_x d_i - ∇_x d̂_i| + |∇_y d_i - ∇_y d̂_i| )
其中，∇_x d_i为预测稠密深度图中的稠密深度点在x方向的梯度，∇_x d̂_i为第一稠密深度图中的稠密深度点在x方向的梯度，∇_y d_i为预测稠密深度图中的稠密深度点在y方向的梯度，∇_y d̂_i为第一稠密深度图中的稠密深度点在y方向的梯度。
模型结构损失L_weight可以采用L1衰减函数或者L2权重衰减函数。相应的，逐像素深度的损失函数、深度图梯度的逐像素损失函数以及模型结构损失函数加权确定的损失函数可以表示为：
L = a·L_pixel + b·L_grad + c·L_weight
其中，L_pixel可以为L_MSE、L_MAE或者L_Huber。
基于上述训练样本集的获取方法,本实施例提供了一种稠密深度图的获取方法,所述的获取方法应用上述实施例所述的深度信息补全模型的训练方法所得到的深度信息补全模型,如图6所示,所述的获取方法具体包括:
H10、控制距离测量装置获取目标场景的第二稀疏深度图,并同步控制相机获取目标场景的第二场景图像;
H20、将所述第二稀疏深度图以及所述第二场景图像输入所述深度信息补全模型,以得到所述第二场景图像对应的第二稠密深度图。
具体地,相机可以是单目相机,也可以是双目相机中的一个相机,例如,左相机。可以理解的是,运行本实施例提供的稠密深度图的获取方法的电子设备配置有距离测量装置以及单目相机或者双目相机,当电子设备配置有单目相机时,控制距离测量装置和单目相机同步获取目标场景的第二稀疏深度图和第二场景图像,然后通过经过训练的深度信息补全模型确定第二稠密深度图。当电子设备配置有双目相机时,可以控制双目相机中的一个相机以及距离测量装置同步获取目标场景的第二稀疏深度图和第二场景图像,然后通过经过训练的深度信息补全模型确定第二稠密深度图。当然,在实际应用中,为了在暗光条件下也可以获取到第二稠密深度图,相机可以采用红外相机。
基于上述训练样本集的获取方法,本实施例提供了一种训练样本集的获取装置,如图7所示,所述的获取装置包括:
获取模块100,用于获取若干图像组,其中,若干图像组中的每个图像组均包括通过双目相机获取的第一场景图像和所述第一场景图像对应的第一稠密深度图,以及通过距离测量装置获取的所述第一场景图像对应的第一稀疏深度图;
确定模块200,用于分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度;
选取模块300,用于在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。
基于上述训练样本集的获取方法,本实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有一个或者多个程序,所述一个或者多个程序可被一个或者多个处理器执行,以实现如上述实施例所述的训练样本集的获取方法中的步骤。
基于上述训练样本集的获取方法,本申请还提供了一种终端设备,如图8所示,其包括至少一个处理器(processor)20;显示屏21;以及存储器(memory)22,还可以包括通信接口(Communications Interface)23和总线24。其中,处理器20、显示屏21、存储器22和通信接口23可以通过总线24完成相互间的通信。显示屏21设置为显示初始设置模式中预设的用户引导界面。通信接口23可以传输信息。处理器20可以调用存储器22中的逻辑指令,以执行上述实施例中的方法。
此外,上述的存储器22中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。
存储器22作为一种计算机可读存储介质,可设置为存储软件程序、计算机可执行程序,如本公开实施例中的方法对应的程序指令或模块。处理器20通过运行存储在存储器22中的软件程序、指令或模块,从而执行功能应用以及数据处理,即实现上述实施例中的方法。
存储器22可包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序;存储数据区可存储根据终端设备的使用所创建的数据等。此外,存储器22可以包括高速随机存取存储器,还可以包括非易失性存储器。例如,U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等多种可以存储程序代码的介质,也可以是暂态存储介质。
此外,上述训练样本集的获取装置的具体工作过程,存储介质以及终端设备中的多条指令处理器加载并执行的具体过程在上述方法中已经详细说明,在这里就不再一一陈述。
最后应说明的是:以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (14)

  1. 一种训练样本集的获取方法,其特征在于,所述的获取方法包括:
    获取若干图像组,其中,若干图像组中的每个图像组均包括通过双目相机获取的第一场景图像和所述第一场景图像对应的第一稠密深度图,以及通过距离测量装置获取的所述第一场景图像对应的第一稀疏深度图;
    分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度;
    在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。
  2. 根据权利要求1所述训练样本集的获取方法,其特征在于,所述图像组中的第一稀疏深度图和第一稠密深度图为同一时间下位置相对固定的双目相机以及距离测量装置采集同一场景信息所得到的。
  3. 根据权利要求1所述训练样本集的获取方法,其特征在于,所述分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度具体包括:
    对于若干图像组中的每个图像组,将该图像组中的第一稀疏深度图投影至该图像组中的第一稠密深度图;
    在所述第一稠密深度图中选取投影后的第一稀疏深度图中的各稀疏深度点各自对应的图像区域,并基于选取到的所有图像区域确定深度阈值;
    基于所述各稀疏深度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
  4. 根据权利要求3所述训练样本集的获取方法,其特征在于,所述在所述第一稠密深度图中选取投影后的第一稀疏深度图中的各稀疏深度点各自对应的图像区域,并基于选取到的所有图像区域确定深度阈值具体包括:
    对于投影后的第一稀疏深度图中的每个稀疏深度点,在所述第一稠密深度图中选取该稀疏深度点对应的图像区域,其中,所述图像区域包括该稀疏深度点;
    获取选取的所有图像区域所包括的稠密深度点;
    计算获取的所有稠密深度点的稠密深度均值,并将所述稠密深度均值作为深度阈值。
  5. 根据权利要求3所述训练样本集的获取方法,其特征在于,所述基于各稀疏深 度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度具体包括:
    逐点计算各稀疏深度点各自对应的稀疏深度值与所述深度阈值的深度偏差值;
    基于获取到的深度偏差值计算所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
  6. 根据权利要求3所述训练样本集的获取方法,其特征在于,所述基于各稀疏深度点各自对应的稀疏深度值以及所述深度阈值,确定所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度具体包括:
    逐点计算各稀疏深度点各自对应的稀疏深度值与所述深度阈值的深度偏差值;
    确定深度偏差值大于预设阈值的稀疏深度点占所述第一稀疏深度图中的稀疏深度点的比例值,并将所述比例值作为所述第一稀疏深度图与所述第一稠密深度图的图像差异度,以得到各图像组各自对应的图像差异度。
  7. 根据权利要求1或3所述训练样本集的获取方法,其特征在于,所述分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度之前,所述方法还包括:
    将若干图像组中的每个图像组中的第一稀疏深度图和第一稠密深度图转换至同一坐标系内。
  8. 根据权利要求1所述训练样本集的获取方法,其特征在于,所述方法还包括:
    在所述训练样本集中选取第一数量的图像组;
    对于选取到的图像组中的每个图像组,调整该图像组中的第一稀疏深度图中的若干稀疏深度点的稀疏深度值,得到该图像组对应的第一增强图像组;
    将获取到的所有第一增强图像组添加到训练样本集中,并将添加得到的训练样本集作为训练样本集。
  9. 根据权利要求1或8所述训练样本集的获取方法,其特征在于,所述方法还包括:
    在训练样本集中选取第二数量的图像组;
    对于选取到的图像组中的每个图像组,获取将图像组中的第一稀疏深度图映射至图像组中的第一场景图像所在坐标系的旋转平移矩阵,其中,所述旋转平移矩阵为所述距离测量装置相对于所述双目相机中用于获取第一场景图像的相机的旋转平移矩阵;
    调整所述旋转平移矩阵,并基于调整后的旋转平移矩阵将所述第一稀疏深度图映射至所述第一场景图像所在坐标系,得到该图像组对应的第二增强图像组;
    将获取到的所有第二增强图像组添加到训练样本集中,并将添加得到的训练样本集作为训练样本集。
  10. 一种深度信息补全模型的训练方法,其特征在于,所述的训练方法应用采用如权利要求1-9任一所述的训练样本集的获取方法所获取到的训练样本集;所述的训练方法包括:
    将训练样本集中的图像组中的第一稀疏深度图和第一场景图像输入预设网络模型,并通过所述预设网络模型输出所述第一稀疏深度图对应的预测稠密深度图;
    基于图像组中的第一稠密深度图以及所述预测稠密深度图,对所述预设网络模型进行训练,以得到深度信息补全模型。
  11. 一种稠密深度图的获取方法,其特征在于,所述的获取方法应用如权利要求10所述的深度信息补全模型的训练方法所得到的深度信息补全模型,所述的获取方法具体包括:
    控制距离测量装置获取目标场景的第二稀疏深度图,并同步控制相机获取目标场景的第二场景图像;
    将所述第二稀疏深度图以及所述第二场景图像输入所述深度信息补全模型,以得到所述第二场景图像对应的第二稠密深度图。
  12. 一种训练样本集的获取装置,其特征在于,所述的获取装置包括:
    获取模块,用于获取若干图像组,其中,若干图像组中的每个图像组均包括通过双目相机获取的第一场景图像和所述第一场景图像对应的第一稠密深度图,以及通过距离测量装置获取的所述第一场景图像对应的第一稀疏深度图;
    确定模块,用于分别确定各图像组中的第一稠密深度图与第一稀疏深度图的图像差异度;
    选取模块,用于在若干图像组中选取图像差异度满足预设条件的图像组,并将选取到的图像组构成的数据集作为训练样本集。
  13. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有一个或者多个程序,所述一个或者多个程序可被一个或者多个处理器执行,以实现如权利要 求1-9任意一项所述的稠密深度图的获取方法中的步骤,以实现如权利要求10所述的深度信息补全模型的训练方法中的步骤,和/或以实现如权利要求11所述的稠密深度图的获取方法中的步骤。
  14. 一种终端设备,其特征在于,包括:处理器、存储器及通信总线;所述存储器上存储有可被所述处理器执行的计算机可读程序;
    所述通信总线实现处理器和存储器之间的连接通信;
    所述处理器执行所述计算机可读程序时实现如权利要求1-9任意一项所述的稠密深度图的获取方法中的步骤,实现如权利要求10所述的深度信息补全模型的训练方法中的步骤,和/或实现如权利要求11所述的稠密深度图的获取方法中的步骤。
PCT/CN2022/080515 2021-08-09 2022-03-13 训练样本集的获取方法、模型训练方法及相关装置 WO2023015880A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110910264.0 2021-08-09
CN202110910264.0A CN113780349B (zh) 2021-08-09 2021-08-09 训练样本集的获取方法、模型训练方法及相关装置

Publications (1)

Publication Number Publication Date
WO2023015880A1 true WO2023015880A1 (zh) 2023-02-16

Family

ID=78837160

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/080515 WO2023015880A1 (zh) 2021-08-09 2022-03-13 训练样本集的获取方法、模型训练方法及相关装置

Country Status (2)

Country Link
CN (1) CN113780349B (zh)
WO (1) WO2023015880A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115225A (zh) * 2023-09-01 2023-11-24 安徽羽亿信息科技有限公司 一种自然资源智慧综合信息化管理平台
CN117456124A (zh) * 2023-12-26 2024-01-26 浙江大学 一种基于背靠背双目鱼眼相机的稠密slam的方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780349B (zh) * 2021-08-09 2023-07-11 深圳奥锐达科技有限公司 训练样本集的获取方法、模型训练方法及相关装置
CN117201705B (zh) * 2023-11-07 2024-02-02 天津云圣智能科技有限责任公司 一种全景图像的获取方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148086A (zh) * 2019-04-28 2019-08-20 暗物智能科技(广州)有限公司 稀疏深度图的深度补齐方法、装置及三维重建方法、装置
WO2021013334A1 (en) * 2019-07-22 2021-01-28 Toyota Motor Europe Depth maps prediction system and training method for such a system
CN112330729A (zh) * 2020-11-27 2021-02-05 中国科学院深圳先进技术研究院 图像深度预测方法、装置、终端设备及可读存储介质
CN112541482A (zh) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 深度信息补全模型训练方法、装置、设备以及存储介质
CN113780349A (zh) * 2021-08-09 2021-12-10 深圳奥锐达科技有限公司 训练样本集的获取方法、模型训练方法及相关装置

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109300151B (zh) * 2018-07-02 2021-02-12 浙江商汤科技开发有限公司 图像处理方法和装置、电子设备
CN109325972B (zh) * 2018-07-25 2020-10-27 深圳市商汤科技有限公司 激光雷达稀疏深度图的处理方法、装置、设备及介质
CN111741283A (zh) * 2019-03-25 2020-10-02 华为技术有限公司 图像处理的装置和方法
CN110308547B (zh) * 2019-08-12 2021-09-07 青岛联合创智科技有限公司 一种基于深度学习的稠密样本无透镜显微成像装置与方法
CN110610486B (zh) * 2019-08-28 2022-07-19 清华大学 单目图像深度估计方法及装置
CN113034562B (zh) * 2019-12-09 2023-05-12 百度在线网络技术(北京)有限公司 用于优化深度信息的方法和装置
CN111680596B (zh) * 2020-05-29 2023-10-13 北京百度网讯科技有限公司 基于深度学习的定位真值校验方法、装置、设备及介质
CN111563923B (zh) * 2020-07-15 2020-11-10 浙江大华技术股份有限公司 获得稠密深度图的方法及相关装置
CN112560875B (zh) * 2020-12-25 2023-07-28 北京百度网讯科技有限公司 深度信息补全模型训练方法、装置、设备以及存储介质
CN113160327A (zh) * 2021-04-09 2021-07-23 上海智蕙林医疗科技有限公司 一种点云补全的实现方法和系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148086A (zh) * 2019-04-28 2019-08-20 暗物智能科技(广州)有限公司 稀疏深度图的深度补齐方法、装置及三维重建方法、装置
WO2021013334A1 (en) * 2019-07-22 2021-01-28 Toyota Motor Europe Depth maps prediction system and training method for such a system
CN112330729A (zh) * 2020-11-27 2021-02-05 中国科学院深圳先进技术研究院 图像深度预测方法、装置、终端设备及可读存储介质
CN112541482A (zh) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 深度信息补全模型训练方法、装置、设备以及存储介质
CN113780349A (zh) * 2021-08-09 2021-12-10 深圳奥锐达科技有限公司 训练样本集的获取方法、模型训练方法及相关装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115225A (zh) * 2023-09-01 2023-11-24 安徽羽亿信息科技有限公司 一种自然资源智慧综合信息化管理平台
CN117115225B (zh) * 2023-09-01 2024-04-30 安徽羽亿信息科技有限公司 一种自然资源智慧综合信息化管理平台
CN117456124A (zh) * 2023-12-26 2024-01-26 浙江大学 一种基于背靠背双目鱼眼相机的稠密slam的方法
CN117456124B (zh) * 2023-12-26 2024-03-26 浙江大学 一种基于背靠背双目鱼眼相机的稠密slam的方法

Also Published As

Publication number Publication date
CN113780349B (zh) 2023-07-11
CN113780349A (zh) 2021-12-10

Similar Documents

Publication Publication Date Title
WO2023015880A1 (zh) 训练样本集的获取方法、模型训练方法及相关装置
CN113538591B (zh) 一种距离测量装置与相机融合系统的标定方法及装置
CN110596721B (zh) 双重共享tdc电路的飞行时间距离测量系统及测量方法
US11328446B2 (en) Combining light-field data with active depth data for depth map generation
CN113538592B (zh) 一种距离测量装置与相机融合系统的标定方法及装置
CN106405572B (zh) 基于空间编码的远距离高分辨率激光主动成像装置及方法
TWI624170B (zh) 影像掃描系統及其方法
CN209028562U (zh) 一种基于无介质空中成像的交互和反馈装置
CN111045029B (zh) 一种融合的深度测量装置及测量方法
US11977167B2 (en) Efficient algorithm for projecting world points to a rolling shutter image
WO2022017366A1 (zh) 一种深度成像方法及深度成像系统
CN102438111A (zh) 一种基于双阵列图像传感器的三维测量芯片及系统
WO2023103198A1 (zh) 一种计算测距系统相对外参的方法、装置和存储介质
JP2018155709A (ja) 位置姿勢推定装置および位置姿勢推定方法、運転支援装置
CN110986816B (zh) 一种深度测量系统及其测量方法
CN113034567A (zh) 一种深度真值获取方法、装置、系统及深度相机
CN202406199U (zh) 一种基于双阵列图像传感器的三维测量芯片及系统
US20240127566A1 (en) Photography apparatus and method, electronic device, and storage medium
WO2022052366A1 (zh) 一种融合的深度测量方法及测量装置
WO2023138697A1 (zh) 基于图像融合激光的雷达系统扫描方法及装置
CN111510700A (zh) 图像采集装置
CN116299341A (zh) 基于tof的双目深度信息获取系统及方法
CN216133412U (zh) 一种距离测量装置与相机融合系统
CN111982071B (zh) 一种基于tof相机的3d扫描方法及系统
CN212471510U (zh) 移动机器人

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE