WO2022241874A1 - Infrared thermal imaging monocular vision ranging method and related components - Google Patents

Infrared thermal imaging monocular vision ranging method and related components Download PDF

Info

Publication number
WO2022241874A1
WO2022241874A1 PCT/CN2021/099057 CN2021099057W WO2022241874A1 WO 2022241874 A1 WO2022241874 A1 WO 2022241874A1 CN 2021099057 W CN2021099057 W CN 2021099057W WO 2022241874 A1 WO2022241874 A1 WO 2022241874A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
infrared
edge
ranging
loss function
Prior art date
Application number
PCT/CN2021/099057
Other languages
English (en)
French (fr)
Inventor
王建生
刘斌
李港庆
Original Assignee
烟台艾睿光电科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 烟台艾睿光电科技有限公司 filed Critical 烟台艾睿光电科技有限公司
Publication of WO2022241874A1 publication Critical patent/WO2022241874A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00Measuring distances in line of sight; Optical rangefinders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01JMEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J5/00Radiation pyrometry, e.g. infrared or optical thermometry
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01JMEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J5/00Radiation pyrometry, e.g. infrared or optical thermometry
    • G01J2005/0077Imaging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20192Edge enhancement; Edge preservation

Definitions

  • the present application relates to the technical field of visual distance measurement, in particular to a method, device, equipment and readable storage medium for infrared thermal imaging monocular vision distance measurement.
  • Visual distance measurement is an essential key technology in the field of automatic driving and infrared precise temperature measurement.
  • in visual ranging, unsupervised monocular visual ranging has attracted the attention of many researchers because, compared with the commonly used lidar and binocular ranging, it is low-cost, easy to deploy, and imposes low process requirements.
  • visible light is not suitable for scenes such as night and fog, where infrared thermal imaging is a useful supplement.
  • compared with visible-light images, infrared images suffer from low contrast, wide dynamic range, image discontinuity, low signal-to-noise ratio, and weak texture; simply applying visible-light monocular ranging algorithms to infrared thermal imaging monocular visual ranging leads to problems such as training collapse, low accuracy, and blurred depth-image edges, making it difficult to meet user needs.
  • the purpose of this application is to provide a method, device, equipment and readable storage medium for infrared thermal imaging monocular vision distance measurement, so as to realize high-precision infrared thermal imaging monocular vision distance measurement.
  • An infrared thermal imaging monocular vision ranging method comprising:
  • the loss function of the infrared monocular ranging deep neural network includes edge loss function
  • the edge loss function is a function that performs edge loss constraints according to the difference between the edge feature of the image frame and the edge feature space projection of the adjacent image frame
  • the multi-scale feature extraction layer of the infrared monocular ranging deep neural network includes a BiFPN layer, and the BiFPN layer is used to strengthen feature fusion according to the correlation between disparity maps of different scales.
  • the residual network of the infrared monocular ranging deep neural network adopts a CSPNet network.
  • the infrared monocular ranging deep neural network includes a depth estimation network and an attitude network.
  • the training method of the infrared monocular ranging deep neural network includes:
  • the loss function further includes: a reprojection loss function and an edge smoothness loss function.
  • the loss function is a weighted sum of the reprojection loss function, the edge smoothness loss function, and the edge loss function.
  • An infrared thermal imaging monocular vision ranging device comprising:
  • the data acquisition unit is used to acquire the internal reference matrix of the infrared thermal imager, and the infrared image collected and generated by the infrared thermal imager for the target object;
  • the network inference unit is used to call the pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix, to obtain a relative disparity map; wherein the loss function of the infrared monocular ranging deep neural network includes an edge loss function, and the edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame;
  • a depth calculation unit configured to determine an absolute depth according to the relative disparity map, and use the absolute depth as a ranging result.
  • An infrared thermal imaging monocular vision ranging equipment, comprising:
  • a memory for storing a computer program; and
  • a processor configured to implement the steps of the above-mentioned infrared thermal imaging monocular vision ranging method when executing the computer program.
  • a readable storage medium wherein a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned infrared thermal imaging monocular vision ranging method are realized.
  • the method provided in the embodiments of the present application proposes a new loss function for the infrared monocular ranging deep neural network used for disparity inference on infrared images: an edge loss function, which constrains the edge loss according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame.
  • under the constraint of this loss function, the edges of the infrared image are extracted first, and then the edges of the source image and the target image are matched, which increases the discrimination of edge pixels.
  • after the network parameters are constrained by this edge loss function, the infrared monocular ranging deep neural network can locate each pixel more accurately, thereby reducing pixel mismatches during image reprojection, improving the update accuracy of the predicted image depth and the estimated camera pose, and improving the accuracy of the relative disparity map, which in turn further improves the ranging accuracy.
  • the embodiments of the present application also provide an infrared thermal imaging monocular vision distance measuring device, equipment, and readable storage medium corresponding to the above-mentioned infrared thermal imaging monocular vision distance measurement method, which have the above technical effects and are not repeated here.
  • Fig. 1 is the implementation flowchart of a kind of infrared thermal imaging monocular vision ranging method in the embodiment of the present application;
  • Fig. 2 is a kind of infrared monocular ranging deep neural network input infrared image in the embodiment of the present application and the comparative schematic diagram of the relative parallax map of output;
  • Fig. 3 is a schematic diagram of an infrared image using a Laplacian operator to realize the comparison before and after the image edge extraction in the embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of an infrared monocular ranging deep neural network in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a BiFPN structure in the embodiment of the present application.
  • Fig. 6 is a schematic diagram of an original image in the embodiment of the present application.
  • FIG. 7 is a schematic diagram of a relative disparity map generated after feature extraction by a multi-scale feature extraction layer without BiFPN in the embodiment of the present application;
  • FIG. 8 is a schematic diagram of a relative disparity map generated after feature extraction by a multi-scale feature extraction layer with BiFPN in the embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of a PAnet in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a basic module of a Resnet18 in the embodiment of the present application.
  • FIG 11 is a schematic diagram of an improved Resnet18 basic module in the embodiment of the present application.
  • Fig. 12 is a schematic structural diagram of an infrared thermal imaging monocular vision ranging device in an embodiment of the present application.
  • Fig. 13 is a schematic structural diagram of an infrared thermal imaging monocular vision ranging device in an embodiment of the present application.
  • the core of the present application is to provide a method of infrared thermal imaging monocular vision distance measurement, which can realize high-precision infrared thermal imaging monocular vision distance measurement.
  • FIG. 1 is a flow chart of a method for infrared thermal imaging monocular vision ranging in an embodiment of the present application. The method includes the following steps:
  • the internal reference matrix of the infrared thermal imager and the infrared image collected and generated by the infrared thermal imager for the target object.
  • a calibration method for the internal reference matrix of an infrared thermal imager is introduced here.
  • the internal reference matrix of the thermal imager can be calibrated with a special multi-circular-hole aluminum plate. Specifically, the infrared thermal imager can be started in an indoor environment; after the system reaches thermal equilibrium, the special multi-circular-hole aluminum plate is placed about 1 to 3 meters from the infrared thermal imager and multiple infrared images are collected.
  • with currently general-purpose techniques, the internal reference matrix K of the infrared thermal imager can then be calibrated.
  • the above generation method of the internal reference matrix is only used as an example; other calculation and generation methods may refer to the introduction of this embodiment and are not repeated here.
  • the infrared monocular ranging deep neural network called in this application is mainly used to perform disparity inference on infrared images and generate a relative disparity map. Figure 2 shows a schematic comparison of an infrared image input to an infrared monocular ranging deep neural network and the relative disparity map it outputs.
  • the loss function of a common infrared monocular ranging deep neural network includes one or more of a reprojection loss function, an edge smoothness loss function, and the like, and the loss function can be configured according to actual needs.
  • because the characteristics of infrared images differ from those of visible-light images, the current photometric loss function based on grayscale loss and SSIM can no longer meet the needs of infrared monocular ranging.
  • the principle of using image reconstruction as a supervisory signal is to find, for the pixels of the source image, the corresponding pixels of the target image according to the predicted depth and camera pose and to compute a loss from them, thereby updating the depth and camera pose so as to obtain an accurate depth and camera pose.
  • visible-light images have pixel values in three RGB channels and rich detail texture, so it is relatively easy to find the pixels of the source image in the target image.
  • infrared images, however, lack texture: when a program reads an infrared image, its three RGB channels hold the same pixel value, so when looking for the same pixel of the source image in the target image, a nearby pixel with the same value is easily mistaken for that pixel, which leads to wrong depth and pose estimation.
  • therefore, an edge loss function for infrared images is proposed in this application to enhance the edge features of the image and avoid the weakening or even disappearance of edge features.
  • the edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame. Under the constraint of this loss function, the edges of the infrared image are extracted first, and then the edges of the source image and the target image are matched; since edge pixels are easy to distinguish, they can be located more accurately, so that the predicted image depth and the estimated camera pose are updated more accurately.
  • one expression of the edge loss is L_e = |edge(I_t) - Trans(edge(I_{t+1}))|, where edge(·) denotes the edge features of an image, Trans denotes the spatial projection that ensures the two images are aligned in space, I_t denotes the image frame at time t, I_{t+1} denotes the image frame at time t+1, and L_e is the resulting edge loss value.
  • Trans = K T_{t→t+1} D_t(p) K^{-1}, where K is the internal reference matrix, T_{t→t+1} denotes the pose change between the image frame at time t and the image frame at time t+1, and D_t(p) is the depth of point p (a pixel in the image frame at time t) estimated by the depth network.
  • the edge features are enhanced through the constraint of the edge loss function provided by this embodiment, so as to ensure that the edge features do not disappear.
  • FIG 3 is a schematic diagram of an infrared image before and after image edge extraction using the Laplacian operator. It can be seen that under the constraints of the edge loss function proposed in this embodiment, the edge features of the image are obvious and the edge features are highly preserved.
  • the absolute depth is used as the ranging result of the acquired infrared image.
  • the edge loss function is a function that constrains the edge loss based on the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame. Under the constraint of this loss function, the edges of the infrared image are extracted first, and then the edges of the source image and the target image are matched, which increases the discrimination of edge pixels.
  • after the network parameters are constrained in this way, the infrared monocular ranging deep neural network can locate each pixel more accurately, thereby reducing pixel mismatches during image reprojection, improving the update accuracy of the predicted image depth and the estimated camera pose, and improving the accuracy of the relative disparity map, which in turn further improves the ranging accuracy.
  • the embodiments of the present application also provide corresponding improvement solutions.
  • the same steps as those in the above embodiment or corresponding steps can be referred to each other, and the corresponding beneficial effects can also be referred to each other, and will not be repeated in the preferred/improved embodiment herein.
  • the infrared monocular ranging deep neural network may specifically include a depth estimation network and a pose network, wherein the depth estimation network is used to estimate the depth of each frame, and the pose network is used to estimate the pose change of the camera between two frames.
  • a schematic diagram of a network structure under the above network composition is shown in FIG. 4.
  • a network training method is as follows:
  • I_t is the target image, that is, the first image, and I_s is the previous or next frame of I_t, that is, the second image.
  • according to the pose change matrix and the depth map, the pixel projection relationship formula between adjacent images is called to reconstruct the first image, obtaining the reconstructed first image.
  • the difference between the reconstructed image and the real image I_t constitutes the supervision signal of the training process; by making the reconstruction continuously approach I_t during training, a D_t closer to the true value can be obtained.
  • the loss function of the infrared monocular ranging deep neural network needs to include the edge loss function proposed in this application and, in addition to the edge loss function, may further include one or more losses of other types, such as a reprojection loss function.
  • this embodiment proposes that a reprojection loss function and an edge smoothness loss function can be further set, so as to better measure the reprojection error and eliminate noise in the disparity map; the network parameters are then optimized and constrained according to the reprojection loss function, the edge smoothness loss function, and the edge loss function at the same time.
  • the reprojection loss function can also be composed of two parts, namely a structural similarity (SSIM) measurement function and an L1-norm loss function; one expression is pe(I_t, Î_t) = α·(1 − SSIM(I_t, Î_t))/2 + (1 − α)·|I_t − Î_t|.
  • the expression of SSIM is SSIM(x, y) = (2μ_xμ_y + C_1)(2σ_xy + C_2) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)), where μ is the mean of the image pixels in the loss function (μ_x and μ_y are the means of the two compared images x and y), σ² is the variance (σ_x² and σ_y² are the variances of x and y), σ_xy is the covariance of x and y, and C_1, C_2 are constants used to maintain stability.
  • the value of SSIM ranges from 0 to 1; the more similar the two images are, the closer the SSIM value is to 1. The value of the hyperparameter α here can specifically be chosen as 0.85.
  • the edge smoothness loss is used to eliminate noise in the disparity map; the depth-gradient-aware term makes the output disparity map smoother, while the edge-aware term encourages the model to learn better object edge information. In its expression, L_s is the smoothness loss and d_t is the depth map.
  • the final loss function of the infrared monocular ranging deep neural network is a combination of the reprojection loss function, the edge smoothness loss function, and the edge loss function; the way the three loss functions are combined is not limited in this embodiment, for example a weighted sum may be used.
  • the automatic masking loss function can be used as the weighting value of the reprojection term.
  • the infrared monocular ranging deep neural network can be trained for 20 rounds using the Adam gradient descent method, with a batch size of 12 and an input/output resolution of 640×192; for the first 15 rounds the learning rate is 10^-4, and for the remaining 5 rounds the learning rate is reduced to 10^-5, after which the training of the infrared monocular ranging deep neural network is completed.
  • in current multi-scale feature extraction, the disparity maps of four different scales are usually all upsampled to the resolution of the original image to calculate the loss; this method ignores the correlations between different scales. Based on this, this embodiment proposes a method that can improve the effect of feature extraction.
  • specifically, a BiFPN layer can be set in the multi-scale feature extraction layer of the infrared monocular ranging deep neural network; Figure 5 is a schematic diagram of a BiFPN structure.
  • the BiFPN layer strengthens feature fusion between different scales through downsampling, upsampling, and cross-scale connections, which makes better use of the correlation between disparity maps of different scales and solves the problem of blurred edges in the disparity images.
  • Figure 6 is a schematic diagram of an original image
  • Figure 7 is a schematic diagram of a relative disparity map generated after feature extraction by a multi-scale feature extraction layer without BiFPN
  • Figure 8 is a schematic diagram of the relative disparity map generated after feature extraction by a multi-scale feature extraction layer with BiFPN.
  • BiFPN is improved on the basis of PAnet, and a structure of PAnet is shown in Figure 9.
  • the BiFPN layer is called to strengthen the feature fusion according to the correlation between disparity maps of different scales, those nodes with only one input edge are first deleted. If a node has only one input edge and no feature fusion, it will contribute less to the feature network which aims to fuse different features. This results in a simplified bidirectional network. Second, if the original input and output nodes are at the same level, an extra edge is added between the original input and output nodes to fuse more features without increasing the cost.
  • third, unlike PANet, which has only one top-down and one bottom-up path, each bidirectional (top-down and bottom-up) path is treated as one feature network layer, and the same layer is repeated multiple times to enable more high-level feature fusion.
  • when fusing features with different resolutions, a common approach is to first resize them to the same resolution and then aggregate them.
  • the Pyramid Attention Network introduces global self-attention upsampling to recover pixel localization. All previous methods treat all input features equally, without distinction. However, the applicant found that, since different input feature maps have different resolutions, they usually contribute unequally to the output feature map. To solve this problem, an additional weight can be added to each input so that the network learns the importance of each input feature. Specifically, a weighted fusion method is proposed in this embodiment:
  • the fast normalized fusion is: O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i, where each w_i ≥ 0 is ensured by applying a ReLU after it, ε is a small value that avoids numerical instability, and each normalized weight therefore also lies between 0 and 1.
  • as an example, the outputs of level 3 of the BiFPN are P_3^td = Conv((w_1·P_3^in + w_2·Resize(P_4^in)) / (w_1 + w_2 + ε)) and P_3^out = Conv((w_1'·P_3^in + w_2'·P_3^td + w_3'·Resize(P_2^out)) / (w_1' + w_2' + w_3' + ε)), where P_3^td is the intermediate feature of level 3 on the top-down path, P_3^in is the input feature of level 3, P_4^in is the input feature of level 4, P_2^out is the output feature of level 2 on the bottom-up path, P_3^out is the output feature of level 3 on the bottom-up path, and w_1, w_2, w_3 (and their primed counterparts) are the learned fusion weights of the corresponding inputs. All other features are constructed in a similar manner.
  • it is worth noting that, in order to further improve efficiency, depthwise separable convolution is used for feature fusion in this embodiment, and batch normalization and activation are added after each convolution.
  • bidirectional feature pyramid network (BiFPN) is applied to feature fusion of multi-scale disparity maps, which can solve the problem of edge blurring of disparity maps.
  • the residual network in the current general-purpose monocular visual ranging network model is usually built on the basis of the Resnet18 network.
  • a basic module of Resnet18 is shown in Figure 10. In practice, the applicant found that building a residual network based on Resnet18 requires a large amount of computation, which further leads to high overall complexity, high requirements on the application equipment, and low computational efficiency.
  • this embodiment proposes not to simply use the Resnet18 network as the depth and pose estimation network (a residual network), but to use the CSPNet network to improve the original Resnet18 network.
  • the improved basic module of Resnet18 is shown in Figure 11.
  • the main working idea of CSPNet is to split the feature map into two parts, one part performs convolution operation, and the other part is directly spliced with the result of the previous part convolution operation.
  • using CSPNet as the residual network can greatly reduce the amount of computation and the memory cost; this structure can also enhance the learning ability of the CNN and reduce the complexity of the network, maintaining computational accuracy while making the system lightweight, which makes it convenient to port the algorithm to low-end AI processing chips.
  • corresponding to the above method embodiment, the embodiment of the present application also provides an infrared thermal imaging monocular vision distance measuring device; the infrared thermal imaging monocular vision distance measuring device described below and the infrared thermal imaging monocular vision distance measuring method described above can be referred to in correspondence with each other.
  • the device includes the following modules:
  • the data acquisition unit 110 is mainly used to acquire the internal reference matrix of the infrared thermal imager, and the infrared image collected and generated by the infrared thermal imager for the target object;
  • the network inference unit 120 is mainly used to call the pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix to obtain a relative disparity map; wherein, the loss function of the infrared monocular ranging deep neural network includes edge A loss function, the edge loss function is a function that performs edge loss constraints according to the difference between the edge feature of the image frame and the edge feature space projection of the adjacent image frame;
  • the depth calculation unit 130 is mainly used to determine the absolute depth according to the relative disparity map, and use the absolute depth as the ranging result.
  • corresponding to the above method embodiment, the embodiment of the present application also provides an infrared thermal imaging monocular vision ranging equipment; the infrared thermal imaging monocular vision ranging equipment described below and the infrared thermal imaging monocular vision ranging method described above can be referred to in correspondence with each other.
  • the infrared thermal imaging monocular vision ranging equipment includes:
  • the processor is configured to implement the steps of the infrared thermal imaging monocular vision ranging method in the above method embodiment when executing the computer program.
  • FIG. 13 is a schematic structural diagram of an infrared thermal imaging monocular vision distance measuring device provided in this embodiment.
  • the infrared thermal imaging monocular vision ranging equipment may vary greatly due to different configurations or performance, and may include one or more processors (central processing units, CPU) 322 (for example, one or more processors) and a memory 332, where the memory 332 stores one or more computer application programs 342 or data 344.
  • the storage 332 may be a short-term storage or a persistent storage.
  • the program stored in the memory 332 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device.
  • the processor 322 may be configured to communicate with the memory 332 , and execute a series of instruction operations in the memory 332 on the infrared thermal imaging monocular vision ranging device 301 .
  • the infrared thermal imaging monocular vision ranging device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input and output interfaces 358, and/or, one or more operating systems 341.
  • the steps in the infrared thermal imaging monocular vision distance measuring method described above can be realized by the structure of the infrared thermal imaging monocular vision distance measurement device.
  • the embodiment of the present application also provides a readable storage medium.
  • the readable storage medium described below and the infrared thermal imaging monocular vision ranging method described above can be referred to in correspondence with each other.
  • a readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the infrared thermal imaging monocular vision ranging method in the above method embodiment are realized.
  • the readable storage medium can be a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, etc., which can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electromagnetism (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Image Processing (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

An infrared thermal imaging monocular vision ranging method. For the deep neural network used to perform disparity inference on infrared images, the method proposes a new loss function: an edge loss function, which performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame. Under the constraint of this loss function, the edges of the infrared image are extracted first and then the edges of the source image and the target image are matched, so that the infrared monocular ranging deep neural network can locate each pixel more accurately, thereby reducing pixel mismatches during image reprojection, improving the update accuracy of the predicted image depth and the estimated camera pose, and improving the accuracy of the relative disparity map, which in turn further improves the ranging accuracy. Also disclosed are an infrared thermal imaging monocular vision ranging device, equipment, and a readable storage medium, which have corresponding technical effects.

Description

Infrared thermal imaging monocular vision ranging method and related components
This application claims priority to the Chinese patent application with application number 202110541321.2, entitled "Infrared thermal imaging monocular vision ranging method and related components", filed with the Chinese Patent Office on May 18, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of visual ranging, and in particular to an infrared thermal imaging monocular vision ranging method, device, equipment, and readable storage medium.
Background
Visual ranging is an indispensable key technology in the fields of autonomous driving and accurate infrared temperature measurement. In visual ranging, compared with the commonly used lidar and binocular ranging, unsupervised monocular visual ranging has attracted the attention of many researchers because of its low cost, ease of deployment, and low process requirements.
Although monocular visual ranging based on visible light has made significant progress, visible light is not applicable in scenes such as night and fog, where infrared thermal imaging is a useful supplement. However, compared with visible-light images, infrared images suffer from low contrast, wide dynamic range, image discontinuity, low signal-to-noise ratio, and weak texture. Simply applying the visible-light monocular visual ranging algorithms to infrared thermal imaging monocular visual ranging leads to problems such as training collapse, low accuracy, and blurred depth-image edges, making it difficult to meet user needs.
In summary, how to achieve high-accuracy infrared thermal imaging monocular vision ranging is a technical problem that those skilled in the art urgently need to solve.
Summary
The purpose of the present application is to provide an infrared thermal imaging monocular vision ranging method, device, equipment, and readable storage medium, so as to achieve high-accuracy infrared thermal imaging monocular vision ranging.
To solve the above technical problem, the present application provides the following technical solutions:
An infrared thermal imaging monocular vision ranging method, comprising:
acquiring an internal reference matrix of an infrared thermal imager and an infrared image collected and generated by the infrared thermal imager for a target object;
calling a pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix, to obtain a relative disparity map; wherein the loss function of the infrared monocular ranging deep neural network includes an edge loss function, and the edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame; and
determining an absolute depth according to the relative disparity map, and using the absolute depth as the ranging result.
Optionally, the multi-scale feature extraction layer of the infrared monocular ranging deep neural network includes a BiFPN layer, and the BiFPN layer is used to strengthen feature fusion according to the correlation between disparity maps of different scales.
Optionally, the residual network of the infrared monocular ranging deep neural network adopts a CSPNet network.
Optionally, the infrared monocular ranging deep neural network includes a depth estimation network and a pose network.
Optionally, the training method of the infrared monocular ranging deep neural network includes:
acquiring consecutive images collected and generated by the infrared thermal imager, a first image and a second image, wherein the second image is an adjacent frame of the first image;
calling the depth estimation network to perform depth calculation on the first image to obtain a depth map;
calling the pose network to perform pose-change calculation on the first image and the second image to obtain a pose change matrix;
calling the pixel projection relationship formula between adjacent images according to the pose change matrix and the depth map to reconstruct the first image, obtaining a reconstructed first image; and
performing network training according to the reconstructed first image and the first image.
Optionally, the loss function further includes a reprojection loss function and an edge smoothness loss function.
Optionally, the loss function is a weighted sum of the reprojection loss function, the edge smoothness loss function, and the edge loss function.
An infrared thermal imaging monocular vision ranging device, comprising:
a data acquisition unit, configured to acquire an internal reference matrix of an infrared thermal imager and an infrared image collected and generated by the infrared thermal imager for a target object;
a network inference unit, configured to call a pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix, to obtain a relative disparity map; wherein the loss function of the infrared monocular ranging deep neural network includes an edge loss function, and the edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame; and
a depth calculation unit, configured to determine an absolute depth according to the relative disparity map and use the absolute depth as the ranging result.
An infrared thermal imaging monocular vision ranging equipment, comprising:
a memory for storing a computer program; and
a processor configured to implement the steps of the above infrared thermal imaging monocular vision ranging method when executing the computer program.
A readable storage medium, on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the above infrared thermal imaging monocular vision ranging method are implemented.
The method provided in the embodiments of the present application proposes, for the infrared monocular ranging deep neural network used to perform disparity inference on infrared images, a new loss function: an edge loss function, which performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame. Under the constraint of this loss function, the edges of the infrared image are extracted first and then the edges of the source image and the target image are matched, which increases the discrimination of edge pixels. After the network parameters are constrained by this edge loss function, the infrared monocular ranging deep neural network can locate each pixel more accurately, thereby reducing pixel mismatches during image reprojection, improving the update accuracy of the predicted image depth and the estimated camera pose, and improving the accuracy of the relative disparity map, which in turn further improves the ranging accuracy.
Correspondingly, the embodiments of the present application also provide an infrared thermal imaging monocular vision ranging device, equipment, and readable storage medium corresponding to the above infrared thermal imaging monocular vision ranging method, which have the above technical effects and are not repeated here.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings required for the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is an implementation flowchart of an infrared thermal imaging monocular vision ranging method in an embodiment of the present application;
Fig. 2 is a schematic comparison of an infrared image input to the infrared monocular ranging deep neural network and the relative disparity map it outputs in an embodiment of the present application;
Fig. 3 is a schematic comparison of an infrared image before and after image edge extraction using the Laplacian operator in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an infrared monocular ranging deep neural network in an embodiment of the present application;
Fig. 5 is a schematic diagram of a BiFPN structure in an embodiment of the present application;
Fig. 6 is a schematic diagram of an original image in an embodiment of the present application;
Fig. 7 is a schematic diagram of a relative disparity map generated after feature extraction by a multi-scale feature extraction layer without BiFPN in an embodiment of the present application;
Fig. 8 is a schematic diagram of a relative disparity map generated after feature extraction by a multi-scale feature extraction layer with BiFPN in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a PAnet in an embodiment of the present application;
Fig. 10 is a schematic diagram of a basic module of Resnet18 in an embodiment of the present application;
Fig. 11 is a schematic diagram of an improved Resnet18 basic module in an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an infrared thermal imaging monocular vision ranging device in an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an infrared thermal imaging monocular vision ranging equipment in an embodiment of the present application.
Detailed Description
The core of the present application is to provide an infrared thermal imaging monocular vision ranging method that can achieve high-accuracy infrared thermal imaging monocular vision ranging.
In order to enable those skilled in the art to better understand the solutions of the present application, the present application is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Referring to Fig. 1, Fig. 1 is a flowchart of an infrared thermal imaging monocular vision ranging method in an embodiment of the present application. The method includes the following steps:
S101. Acquire the internal reference matrix of the infrared thermal imager and the infrared image collected and generated by the infrared thermal imager for the target object.
The internal reference matrix of the infrared thermal imager and the infrared image collected and generated by the infrared thermal imager for the target object are acquired. The way the internal reference matrix is generated is not limited in this embodiment and may follow the related art. To deepen understanding, a calibration method for the internal reference matrix of an infrared thermal imager is introduced here: the internal reference matrix of the thermal imager can be calibrated with a special multi-circular-hole aluminum plate. Specifically, in an indoor environment, the infrared thermal imager is started; after the system reaches thermal equilibrium, the special multi-circular-hole aluminum plate is placed about 1 to 3 meters from the infrared thermal imager and multiple infrared images are collected, from which the internal reference matrix K of the infrared thermal imager can be calibrated with currently general-purpose techniques. This generation method of the internal reference matrix is only used as an example; other calculation and generation methods may refer to the introduction of this embodiment and are not repeated here.
The infrared thermal imager with the calibrated internal reference matrix K is mounted on the equipment to be used for ranging (such as a vehicle); the infrared thermal imager is started, and after the system reaches thermal equilibrium, images of the target object are collected, obtaining the infrared image generated by the infrared thermal imager.
S102. Call the pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix, to obtain a relative disparity map.
The infrared monocular ranging deep neural network called in this application is mainly used to perform disparity inference on infrared images and generate a relative disparity map. Fig. 2 shows a schematic comparison of an infrared image input to an infrared monocular ranging deep neural network and the relative disparity map it outputs.
The structure and training method of the called infrared monocular ranging deep neural network are not limited in this embodiment; reference may be made to the descriptions of related infrared ranging networks.
Every deep neural network needs a loss function to constrain its parameters during training. At present, the loss function of a common infrared monocular ranging deep neural network includes one or more of a reprojection loss function, an edge smoothness loss function, and the like, and the loss function can be configured according to actual needs. However, because the characteristics of infrared images differ from those of visible-light images, the current photometric loss function composed of a grayscale loss and SSIM can no longer meet the needs of infrared monocular ranging. Specifically, the principle of using image reconstruction as a supervisory signal is to find, for the pixels of the source image, the corresponding pixels of the target image according to the predicted depth and camera pose and to compute a loss from them, thereby updating the depth and camera pose to obtain an accurate depth and camera pose. Visible-light images have pixel values in three RGB channels and rich detail texture, so it is relatively easy to find the pixels of the source image in the target image. Infrared images, however, lack texture: when a program reads an infrared image, its three RGB channels hold the same pixel value, so when looking for the same pixel of the source image in the target image, a nearby pixel with the same value is easily mistaken for that pixel, which leads to wrong depth and pose estimation.
Therefore, in view of the problem that the low texture of infrared images prevents accurate pairing of pixel points, this application proposes an edge loss function for infrared images to enhance the edge features of the image and avoid the weakening or even disappearance of edge features. The edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame. Under the constraint of this loss function, the edges of the infrared image are extracted first, and then the edges of the source image and the target image are matched; since edge pixels are easy to distinguish, they can be located more accurately, so that the predicted image depth and the estimated camera pose are updated more accurately.
Specifically, one expression for the above is: L_e = |edge(I_t) − Trans(edge(I_{t+1}))|, where edge(·) denotes the edge features of an image, Trans denotes the spatial projection that ensures the two images are aligned in space, I_t denotes the image frame at time t, I_{t+1} denotes the image frame at time t+1, and L_e is the resulting edge loss value. Trans = K T_{t→t+1} D_t(p) K^{-1}, where K is the internal reference matrix, T_{t→t+1} denotes the pose change between the image frame at time t and the image frame at time t+1, and D_t(p) is the depth of point p (a pixel in the image frame at time t) estimated by the depth network.
As the number of network layers increases and the edge features become blurred, the constraint of the edge loss function provided in this embodiment enhances the edge features and ensures that they do not disappear.
Fig. 3 is a schematic comparison of an infrared image before and after image edge extraction using the Laplacian operator. It can be seen that, under the constraint of the edge loss function proposed in this embodiment, the edge features of the image are obvious and are preserved to a high degree.
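A minimal PyTorch sketch of such an edge loss is given below, assuming a Laplacian kernel for edge extraction and a hypothetical `warp_to_t(...)` helper that performs the spatial projection Trans (an implementation of this projection is sketched later, after the reconstruction step); the kernel choice and helper name are assumptions of this sketch, not the patent's implementation.

```python
import torch
import torch.nn.functional as F

# 3x3 Laplacian kernel used here to extract edge features (one possible edge operator).
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def edge_map(img):
    """Edge features of a single-channel image batch [B, 1, H, W]."""
    return F.conv2d(img, LAPLACIAN.to(img.device), padding=1)

def edge_loss(img_t, img_t1, depth_t, pose_t_t1, K, warp_to_t):
    """L_e = |edge(I_t) - Trans(edge(I_{t+1}))| averaged over pixels.

    `warp_to_t` is a hypothetical function that resamples a tensor defined on
    frame t+1 into frame t using depth D_t, the pose T_{t->t+1} and intrinsics K,
    i.e. the projection Trans = K T D_t(p) K^-1 described above.
    """
    edge_t = edge_map(img_t)
    edge_t1_in_t = warp_to_t(edge_map(img_t1), depth_t, pose_t_t1, K)
    return (edge_t - edge_t1_in_t).abs().mean()
```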
S103. Determine the absolute depth according to the relative disparity map, and use the absolute depth as the ranging result.
After the relative disparity map is obtained, it is converted into an absolute depth. The implementation is not limited in this embodiment; reference may be made to implementations in the related art, which are not repeated here.
After the absolute depth value is obtained, the absolute depth is used as the ranging result of the acquired infrared image.
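Since the embodiment leaves the conversion open, the following is only one common convention, borrowed from self-supervised monocular depth work and shown purely as an illustrative assumption: the network's sigmoid disparity output is mapped to depth between preset near and far limits, and a metric scale factor is still required for absolute ranging.

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Convert a relative (sigmoid) disparity map in [0, 1] to depth.

    min_depth / max_depth are assumed scene limits; the resulting depth is
    relative and still needs a metric scale factor for absolute ranging.
    """
    min_disp, max_disp = 1.0 / max_depth, 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp
```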
Based on the above description, the technical solution provided in the embodiments of the present application starts from the characteristics of infrared images (low contrast, wide dynamic range, image discontinuity, low signal-to-noise ratio, and weak texture) and proposes, for the infrared monocular ranging deep neural network used to perform disparity inference on infrared images, a new loss function: an edge loss function, which performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame. Under the constraint of this loss function, the edges of the infrared image are extracted first and then the edges of the source image and the target image are matched, which increases the discrimination of edge pixels. After the network parameters are constrained by this edge loss function, the infrared monocular ranging deep neural network can locate each pixel more accurately, thereby reducing pixel mismatches during image reprojection, improving the update accuracy of the predicted image depth and the estimated camera pose, and improving the accuracy of the relative disparity map, which in turn further improves the ranging accuracy.
It should be noted that, based on the above embodiment, the embodiments of the present application also provide corresponding improvement solutions. Steps in the preferred/improved embodiments that are the same as or correspond to those in the above embodiment can be referred to mutually, and the corresponding beneficial effects can also be referred to mutually; they are not repeated one by one in the preferred/improved embodiments herein.
The above embodiment does not limit the structure and training method of the called infrared monocular ranging deep neural network; this embodiment introduces one structure and training method of an infrared monocular ranging deep neural network for reference.
When a monocular infrared video is input to train the network model, the pose of the infrared thermal imager keeps changing as each frame is captured; accordingly, the relative disparity can be estimated with respect to the pose change. Correspondingly, the infrared monocular ranging deep neural network may specifically include a depth estimation network and a pose network, where the depth estimation network is used to estimate the depth of each frame and the pose network is used to estimate the pose change of the camera between two frames.
A schematic diagram of a network structure under the above composition is shown in Fig. 4. For this network structure, one training method is as follows:
(1) Acquire consecutive images collected and generated by the infrared thermal imager, a first image and a second image, where the second image is an adjacent frame of the first image.
Consecutive infrared video frames are used as the data set and input to the network for training. Suppose the images continuously captured by the camera are I_t and I_s, where I_t is the target image, i.e., the first image, and I_s is the previous or next frame of I_t, i.e., the second image.
(2) Call the depth estimation network to perform disparity calculation on the first image to obtain a disparity map.
I_t is fed into the depth estimation network to obtain its disparity map D_t.
(3) Call the pose network to perform pose-change calculation on the first image and the second image to obtain a pose change matrix.
I_t and I_s are fed into the pose network to obtain the pose change matrix T of the camera between the two frames.
(4) Call the pixel projection relationship formula between adjacent images according to the pose change matrix and the depth map to reconstruct the first image, obtaining a reconstructed first image.
Take a fixed point p_t on the image I_t; after I_t is converted into the disparity map D_t, this point becomes D_t(p), and its projection point on I_s is denoted p_s. Since the two previous networks respectively provide the depth information of point p_t and the pose matrix between this point and its projection in the adjacent frames, according to the geometric constraints of pixel projection, p_t and p_s satisfy: p_s = K T_{t→s} D_t(p) K^{-1} p_t, where K is the camera intrinsic matrix obtained by calibration, T is the pose matrix between consecutive frames obtained by the pose network, and D_t(p) is the depth of point p_t estimated by the depth network.
If the depth D_t of every point of the image at time t and the pose change T between time t and time s can both be obtained, the correspondence between every pixel of the image at time t and the pixels of its adjacent frames can be established. Therefore, extending the relationship of one point to the whole image, once the depth of every point on I_t and T_{t→s} are obtained, an image Î_t can be reconstructed.
Therefore, in this embodiment, the pixel projection relationship formula between adjacent images is called according to the pose change matrix and the depth map to reconstruct the first image, obtaining the reconstructed first image.
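For illustration, a compact PyTorch sketch of this reconstruction step (back-projecting with D_t, transforming with T_{t→s}, projecting with K, and sampling I_s) is given below; the tensor layouts and the use of `grid_sample` are assumptions of this sketch rather than the patent's implementation, and the same routine can serve as the `warp_to_t` helper assumed in the earlier edge-loss sketch.

```python
import torch
import torch.nn.functional as F

def reconstruct_target(img_s, depth_t, T_t_to_s, K):
    """Synthesize the target frame I_t by sampling the source frame I_s.

    img_s:    [B, C, H, W] source image I_s
    depth_t:  [B, 1, H, W] depth D_t predicted for the target frame
    T_t_to_s: [B, 4, 4] pose change from frame t to frame s
    K:        [B, 3, 3] internal reference matrix
    """
    b, _, h, w = img_s.shape
    device = img_s.device

    # Homogeneous pixel grid of the target frame: p_t = (u, v, 1).
    v, u = torch.meshgrid(torch.arange(h, device=device),
                          torch.arange(w, device=device), indexing="ij")
    ones = torch.ones_like(u)
    pix = torch.stack([u, v, ones], dim=0).float().view(1, 3, -1).expand(b, 3, -1)

    # Back-project: X = D_t(p) * K^-1 * p_t, then extend to homogeneous 3D points.
    cam = torch.linalg.inv(K) @ pix * depth_t.view(b, 1, -1)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, device=device)], dim=1)

    # Transform into frame s and project: p_s = K * T_{t->s} * X.
    proj = K @ (T_t_to_s @ cam_h)[:, :3, :]
    uv = proj[:, :2, :] / proj[:, 2:3, :].clamp(min=1e-6)

    # Normalize to [-1, 1] for grid_sample and resample I_s at p_s.
    grid = torch.stack([2 * uv[:, 0] / (w - 1) - 1,
                        2 * uv[:, 1] / (h - 1) - 1], dim=-1).view(b, h, w, 2)
    return F.grid_sample(img_s, grid, padding_mode="border", align_corners=True)
```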
(5) Perform network training according to the reconstructed first image and the first image.
The difference between the reconstructed image Î_t and the real image I_t constitutes the supervision signal of the training process; by making Î_t continuously approach I_t through training, a D_t that is closer to the true value can be obtained.
It should be noted that no real depth information is used in the network training process introduced in this embodiment, so the method of this application is completely unsupervised.
This embodiment only introduces the structure and training process of the infrared monocular ranging deep neural network based on pose-based disparity estimation; other types of network structures and training processes may refer to the introduction of this embodiment and are not repeated here.
In addition, the above embodiment requires the loss function of the infrared monocular ranging deep neural network to include the edge loss function proposed in this application; besides the edge loss function, it may further include one or more loss functions of other types, such as a reprojection loss function.
To improve the pose matrix and the disparity estimation, this embodiment proposes that, in addition to the edge loss function, a reprojection loss function and an edge smoothness loss function can be further set, so as to better measure the reprojection error and eliminate noise in the disparity map; the network parameters are then optimized and constrained according to the reprojection loss function, the edge smoothness loss function, and the edge loss function at the same time.
(1) In order to measure the reprojection error better both globally and in detail, the reprojection loss function can be composed of two parts, a structural similarity (SSIM) measurement function and an L1-norm loss function: pe(I_t, Î_t) = α·(1 − SSIM(I_t, Î_t))/2 + (1 − α)·|I_t − Î_t|.
The expression of SSIM is: SSIM(x, y) = (2μ_xμ_y + C_1)(2σ_xy + C_2) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)).
Here μ is the mean, i.e., the mean of the image pixels in the loss function, with μ_x and μ_y the means of the two compared images x and y; σ² is the variance, with σ_x² and σ_y² the variances of x and y; σ_xy is the covariance of x and y, which represents the joint pixel statistics in the loss function; and C_1, C_2 are constants used to maintain stability. The value of SSIM ranges from 0 to 1; the more similar the two images are, the closer the SSIM value is to 1. The value of the hyperparameter α here can specifically be chosen as 0.85.
When the reprojection error is computed from two consecutive source images, existing self-supervised depth estimation methods average the reprojection error over the two images, which can lead to a higher photometric error. Such problematic pixels mainly come from two categories: pixels that move out of view due to ego-motion at the image boundaries, and occluded pixels. The influence of out-of-view pixels can be reduced by masking such pixels in the reprojection loss, but this does not solve the occlusion problem, because averaging the reprojection still leads to blurred depth discontinuities.
This application borrows the approach of Monodepth2: instead of averaging the photometric error of the two adjacent frames at each pixel, only the minimum is used. The final per-pixel photometric loss in this embodiment is therefore: L_p = min_s pe(I_t, I_{s→t}).
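As an illustrative sketch only (the SSIM window size and padding are assumptions), the per-pixel reprojection loss described above could be written in PyTorch as:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM computed over 3x3 average-pooled local statistics."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return (num / den).clamp(0, 1)

def reprojection_error(pred, target, alpha=0.85):
    """pe = alpha * (1 - SSIM)/2 + (1 - alpha) * |pred - target|, per pixel."""
    l1 = (pred - target).abs().mean(1, keepdim=True)
    ssim_term = (1 - ssim(pred, target)).mean(1, keepdim=True) / 2
    return alpha * ssim_term + (1 - alpha) * l1

def photometric_loss(reconstructions, target):
    """Per-pixel minimum of the reprojection error over the warped source frames."""
    errors = torch.cat([reprojection_error(r, target) for r in reconstructions], dim=1)
    return errors.min(dim=1, keepdim=True)[0]
```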
(2) The edge smoothness loss is used to eliminate noise in the disparity map; the depth-gradient-aware term makes the output disparity map smoother, while the edge-aware term encourages the model to learn better object edge information.
Its expression is: L_s = |∂_x d_t| e^{−|∂_x I_t|} + |∂_y d_t| e^{−|∂_y I_t|}, where L_s is the smoothness loss and d_t is the depth map.
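A short PyTorch sketch of this edge-aware smoothness term follows; normalizing the disparity by its mean is an assumption borrowed from common self-supervised depth implementations, not a requirement of the embodiment.

```python
import torch

def smoothness_loss(disp, img):
    """Edge-aware smoothness: disparity gradients are penalized less at image edges."""
    disp = disp / (disp.mean(dim=(2, 3), keepdim=True) + 1e-7)  # assumed normalization
    grad_disp_x = (disp[:, :, :, :-1] - disp[:, :, :, 1:]).abs()
    grad_disp_y = (disp[:, :, :-1, :] - disp[:, :, 1:, :]).abs()
    grad_img_x = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
    grad_img_y = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)
    return (grad_disp_x * torch.exp(-grad_img_x)).mean() + \
           (grad_disp_y * torch.exp(-grad_img_y)).mean()
```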
The final loss function of the infrared monocular ranging deep neural network is a combination of the reprojection loss function, the edge smoothness loss function, and the edge loss function. The way the three loss functions are combined is not limited in this embodiment; for example, a weighted sum of the reprojection loss function, the edge smoothness loss function, and the edge loss function may be used, i.e., the loss function is: L = μ L_p + λ L_s + β L_e, where μ, λ, and β are three hyperparameters that indicate the importance of each loss and satisfy μ + λ + β = 1.
For the weighting value μ, the automatic masking loss function can be used as this weighting value: μ = [ min_s pe(I_t, I_{s→t}) < min_s pe(I_t, I_s) ], where [·] is the Iverson bracket. When the camera and another object move at similar speeds, μ prevents pixels that remain static in the image from producing a small loss that is detrimental to gradient descent; likewise, when the camera is static, this loss can filter out all the pixels in the image.
To deepen understanding, a network training setup under the above loss configuration is introduced here. A computer running Ubuntu 18.04 is used; the training and testing models of the infrared monocular ranging deep neural network can be built under the PyTorch 1.4 framework, and the server used can be an RTX 6000. The data set is collected with an HD1280 infrared camera; there are 19,000 images in total for training, and another 1,000 images are used as the validation set. The three hyperparameters in the total loss function are set to μ = 0.7, λ = 0.2, and β = 0.1, respectively. The infrared monocular ranging deep neural network can be trained for 20 epochs using the Adam gradient descent method, with a batch size of 12 and an input/output resolution of 640×192. For the first 15 epochs the learning rate is 10^-4; for the remaining 5 epochs, the learning rate is reduced to 10^-5, after which the training of the infrared monocular ranging deep neural network is completed.
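Combining the pieces above, a minimal sketch of the loss weighting and the optimizer schedule under the stated hyperparameters might look as follows; the placeholder parameter list stands in for the depth and pose network parameters, which are not specified here.

```python
import torch

MU, LAM, BETA = 0.7, 0.2, 0.1  # loss weights from the embodiment: mu + lambda + beta = 1

def total_loss(photo_loss, smooth_loss, edge_loss):
    """Weighted sum L = mu*L_p + lambda*L_s + beta*L_e."""
    return MU * photo_loss + LAM * smooth_loss + BETA * edge_loss

# Optimizer and schedule matching the description: Adam, 20 epochs,
# lr 1e-4 for the first 15 epochs, then 1e-5 for the last 5.
params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder for depth+pose network params
optimizer = torch.optim.Adam(params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[15], gamma=0.1)
for epoch in range(20):
    # ... one training epoch over the 19,000-image IR training set would run here ...
    scheduler.step()
```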
Based on the above embodiments: at present, in the multi-scale feature extraction of an infrared monocular ranging deep neural network, the disparity maps of four different scales are usually all upsampled to the resolution of the original image to calculate the loss; this method ignores the correlations between different scales. Based on this, this embodiment proposes a method that can improve the effect of feature extraction.
Specifically, a BiFPN layer can be set in the multi-scale feature extraction layer of the infrared monocular ranging deep neural network, i.e., a BiFPN layer is added in the multi-scale feature extraction layer. Fig. 5 is a schematic diagram of a BiFPN structure. The BiFPN layer is used to strengthen feature fusion according to the correlation between disparity maps of different scales; through downsampling, upsampling, and cross-scale connections it strengthens feature fusion between different scales, which makes better use of the correlation between disparity maps of different scales and solves the problem of blurred edges in the disparity images. Fig. 6 is a schematic diagram of an original image, Fig. 7 is a schematic diagram of the relative disparity map generated after feature extraction by a multi-scale feature extraction layer without BiFPN, and Fig. 8 is a schematic diagram of the relative disparity map generated after feature extraction by a multi-scale feature extraction layer with BiFPN. Comparison shows that, after adding BiFPN multi-scale feature fusion, the details of the image are more obvious and the edges are clearer; for example, for the utility pole and the bicycle in the figures, the edge-blurring problem is clearly improved after adding the BiFPN structure.
Specifically, BiFPN is obtained by improving on PAnet; one structure of PAnet is shown in Fig. 9. When the BiFPN layer is called to strengthen feature fusion according to the correlation between disparity maps of different scales, the nodes that have only one input edge are first deleted: if a node has only one input edge and no feature fusion, it contributes little to a feature network whose aim is to fuse different features, and removing it yields a simplified bidirectional network. Second, if an original input node and an output node are at the same level, an extra edge is added between them so that more features are fused without increasing the cost. Third, unlike PANet, which has only one top-down and one bottom-up path, each bidirectional (top-down and bottom-up) path is treated as one feature network layer, and the same layer is repeated multiple times to enable more high-level feature fusion. When fusing features with different resolutions, a common approach is to first resize them to the same resolution and then aggregate them. The Pyramid Attention Network introduces global self-attention upsampling to recover pixel localization. All previous methods treat all input features equally, without distinction. However, the applicant found that, since different input feature maps have different resolutions, they usually contribute unequally to the output feature map. To solve this problem, an additional weight can be added to each input so that the network learns the importance of each input feature. Specifically, a weighted fusion method is proposed in this embodiment:
The fast normalized fusion is as follows: O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i, where w_i ≥ 0 is ensured by applying a ReLU after each w_i, and ε = 0.0001 is a small value used to avoid numerical instability; likewise, the value of each normalized weight lies between 0 and 1. As an example, the outputs of the third level of a BiFPN are:
P_3^td = Conv( (w_1·P_3^in + w_2·Resize(P_4^in)) / (w_1 + w_2 + ε) ),
P_3^out = Conv( (w_1'·P_3^in + w_2'·P_3^td + w_3'·Resize(P_2^out)) / (w_1' + w_2' + w_3' + ε) ),
where P_3^td is the intermediate feature of level 3 on the top-down path, P_3^in is the input feature of level 3, P_4^in is the input feature of level 4, P_2^out is the output feature of level 2 on the bottom-up path, P_3^out is the output feature of level 3 on the bottom-up path, and w_1, w_2, w_3 (and their primed counterparts) are the learned fusion weights of the corresponding inputs. All other features are constructed in a similar manner. It is worth noting that, to further improve efficiency, depthwise separable convolution is used for feature fusion in this embodiment, and batch normalization and activation are added after each convolution.
In this embodiment, the bidirectional feature pyramid network (BiFPN) is applied to the feature fusion of multi-scale disparity maps, which can solve the problem of blurred disparity-map edges.
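A minimal PyTorch sketch of one fast normalized fusion node as described above is given below; the number of inputs, the channel count, and the use of a depthwise separable convolution block are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    """Fuse N same-resolution feature maps with learned non-negative weights,
    then apply a depthwise separable convolution with BN and activation."""

    def __init__(self, num_inputs, channels, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise
            nn.Conv2d(channels, channels, 1),                              # pointwise
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, inputs):
        w = F.relu(self.weights)              # ensure w_i >= 0
        w = w / (w.sum() + self.eps)          # fast normalized fusion weights
        fused = sum(wi * x for wi, x in zip(w, inputs))
        return self.conv(fused)

# Example: fuse the level-3 input with the upsampled level-4 input (the P3_td node).
fuse_p3_td = FastNormalizedFusion(num_inputs=2, channels=64)
p3_in, p4_in = torch.randn(1, 64, 48, 160), torch.randn(1, 64, 24, 80)
p3_td = fuse_p3_td([p3_in, F.interpolate(p4_in, scale_factor=2, mode="nearest")])
```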
In addition, the residual network in current general-purpose monocular visual ranging network models is usually built on the basis of the Resnet18 network; a basic module of Resnet18 is shown in Fig. 10. In practice, the applicant found that building the residual network on Resnet18 requires a large amount of computation, which further leads to high overall complexity, high requirements on the application equipment, and low computational efficiency.
To further solve the above problem, this embodiment proposes not to simply use the Resnet18 network as the depth and pose estimation network (a residual network), but to use the CSPNet network to improve the original Resnet18 network. The improved Resnet18 basic module is shown in Fig. 11. The main working idea of CSPNet is to split the feature map into two parts: one part undergoes the convolution operation, and the other part is directly concatenated with the result of the convolution operation on the first part. Using CSPNet as the residual network can greatly reduce the amount of computation and the memory cost; this structure can also enhance the learning ability of the CNN and reduce the complexity of the network, maintaining computational accuracy while making the system lightweight, which makes it convenient to port the algorithm to low-end AI processing chips.
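The following PyTorch sketch illustrates one possible CSP-style variant of a Resnet18 basic block along the lines described above; the exact split ratio and layer arrangement are assumptions of this sketch, not the structure of Fig. 11.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Standard Resnet18-style residual block (3x3 conv, BN, ReLU, 3x3 conv, BN)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class CSPBasicBlock(nn.Module):
    """CSP-style block: half of the channels pass through the residual body,
    the other half bypass it and are concatenated back, then fused by a 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.block = BasicBlock(half)
        self.fuse = nn.Conv2d(channels, channels, 1, bias=False)
    def forward(self, x):
        part1, part2 = x.chunk(2, dim=1)   # split the feature map into two parts
        return self.fuse(torch.cat([self.block(part1), part2], dim=1))

# Example usage on a dummy feature map.
feat = torch.randn(1, 64, 48, 160)
out = CSPBasicBlock(64)(feat)
```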
Corresponding to the above method embodiment, the embodiment of the present application also provides an infrared thermal imaging monocular vision ranging device; the infrared thermal imaging monocular vision ranging device described below and the infrared thermal imaging monocular vision ranging method described above can be referred to in correspondence with each other.
Referring to Fig. 12, the device includes the following modules:
a data acquisition unit 110, mainly used to acquire the internal reference matrix of the infrared thermal imager and the infrared image collected and generated by the infrared thermal imager for the target object;
a network inference unit 120, mainly used to call the pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix, to obtain a relative disparity map, wherein the loss function of the infrared monocular ranging deep neural network includes an edge loss function, and the edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame; and
a depth calculation unit 130, mainly used to determine the absolute depth according to the relative disparity map and use the absolute depth as the ranging result.
Corresponding to the above method embodiment, the embodiment of the present application also provides an infrared thermal imaging monocular vision ranging equipment; the infrared thermal imaging monocular vision ranging equipment described below and the infrared thermal imaging monocular vision ranging method described above can be referred to in correspondence with each other.
The infrared thermal imaging monocular vision ranging equipment includes:
a memory for storing a computer program; and
a processor configured to implement, when executing the computer program, the steps of the infrared thermal imaging monocular vision ranging method of the above method embodiment.
Specifically, referring to Fig. 13, which is a schematic diagram of a specific structure of an infrared thermal imaging monocular vision ranging equipment provided in this embodiment, the infrared thermal imaging monocular vision ranging equipment may vary greatly due to different configurations or performance, and may include one or more processors (central processing units, CPU) 322 (for example, one or more processors) and a memory 332, where the memory 332 stores one or more computer application programs 342 or data 344. The memory 332 may be a short-term storage or a persistent storage. The program stored in the memory 332 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device. Further, the processor 322 may be configured to communicate with the memory 332 and execute, on the infrared thermal imaging monocular vision ranging equipment 301, the series of instruction operations in the memory 332.
The infrared thermal imaging monocular vision ranging equipment 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps in the infrared thermal imaging monocular vision ranging method described above can be implemented by the structure of the infrared thermal imaging monocular vision ranging equipment.
Corresponding to the above method embodiment, the embodiment of the present application also provides a readable storage medium; the readable storage medium described below and the infrared thermal imaging monocular vision ranging method described above can be referred to in correspondence with each other.
A readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the steps of the infrared thermal imaging monocular vision ranging method of the above method embodiment are implemented.
The readable storage medium may specifically be various readable storage media capable of storing program codes, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Those skilled in the art may further realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described generally in terms of function in the above description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of the present application.

Claims (10)

  1. An infrared thermal imaging monocular vision ranging method, characterized by comprising:
    acquiring an internal reference matrix of an infrared thermal imager and an infrared image collected and generated by the infrared thermal imager for a target object;
    calling a pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix, to obtain a relative disparity map; wherein the loss function of the infrared monocular ranging deep neural network includes an edge loss function, and the edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame; and
    determining an absolute depth according to the relative disparity map, and using the absolute depth as the ranging result.
  2. The infrared thermal imaging monocular vision ranging method according to claim 1, characterized in that the multi-scale feature extraction layer of the infrared monocular ranging deep neural network includes a BiFPN layer, and the BiFPN layer is used to strengthen feature fusion according to the correlation between disparity maps of different scales.
  3. The infrared thermal imaging monocular vision ranging method according to claim 1, characterized in that the residual network of the infrared monocular ranging deep neural network adopts a CSPNet network.
  4. The infrared thermal imaging monocular vision ranging method according to claim 1, characterized in that the infrared monocular ranging deep neural network includes a depth estimation network and a pose network.
  5. The infrared thermal imaging monocular vision ranging method according to claim 4, characterized in that the training method of the infrared monocular ranging deep neural network includes:
    acquiring consecutive images collected and generated by the infrared thermal imager, a first image and a second image, wherein the second image is an adjacent frame of the first image;
    calling the depth estimation network to perform depth calculation on the first image to obtain a depth map;
    calling the pose network to perform pose-change calculation on the first image and the second image to obtain a pose change matrix;
    calling the pixel projection relationship formula between adjacent images according to the pose change matrix and the depth map to reconstruct the first image, obtaining a reconstructed first image; and
    performing network training according to the reconstructed first image and the first image.
  6. The infrared thermal imaging monocular vision ranging method according to claim 1, characterized in that the loss function further includes a reprojection loss function and an edge smoothness loss function.
  7. The infrared thermal imaging monocular vision ranging method according to claim 6, characterized in that the loss function is a weighted sum of the reprojection loss function, the edge smoothness loss function, and the edge loss function.
  8. An infrared thermal imaging monocular vision ranging device, characterized by comprising:
    a data acquisition unit, configured to acquire an internal reference matrix of an infrared thermal imager and an infrared image collected and generated by the infrared thermal imager for a target object;
    a network inference unit, configured to call a pre-trained infrared monocular ranging deep neural network to perform disparity inference on the infrared image according to the internal reference matrix, to obtain a relative disparity map; wherein the loss function of the infrared monocular ranging deep neural network includes an edge loss function, and the edge loss function is a function that performs an edge loss constraint according to the difference between the edge features of an image frame and the spatial projection of the edge features of the adjacent image frame; and
    a depth calculation unit, configured to determine an absolute depth according to the relative disparity map and use the absolute depth as the ranging result.
  9. An infrared thermal imaging monocular vision ranging equipment, characterized by comprising:
    a memory for storing a computer program; and
    a processor configured to implement, when executing the computer program, the steps of the infrared thermal imaging monocular vision ranging method according to any one of claims 1 to 7.
  10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the infrared thermal imaging monocular vision ranging method according to any one of claims 1 to 7 are implemented.
PCT/CN2021/099057 2021-05-18 2021-06-09 Infrared thermal imaging monocular vision ranging method and related components WO2022241874A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110541321.2 2021-05-18
CN202110541321.2A CN113140011B (zh) 2021-05-18 Infrared thermal imaging monocular vision ranging method and related components

Publications (1)

Publication Number Publication Date
WO2022241874A1 true WO2022241874A1 (zh) 2022-11-24

Family

ID=76817558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099057 WO2022241874A1 (zh) 2021-05-18 2021-06-09 Infrared thermal imaging monocular vision ranging method and related components

Country Status (2)

Country Link
CN (1) CN113140011B (zh)
WO (1) WO2022241874A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051632A (zh) * 2022-12-06 2023-05-02 中国人民解放军战略支援部队航天工程大学 一种双通道transformer卫星六自由度姿态估计算法
CN116168070A (zh) * 2023-01-16 2023-05-26 南京航空航天大学 一种基于红外图像的单目深度估计方法及系统
CN116245927A (zh) * 2023-02-09 2023-06-09 湖北工业大学 一种基于ConvDepth的自监督单目深度估计方法及系统
CN116524017A (zh) * 2023-03-13 2023-08-01 明创慧远科技集团有限公司 一种用于矿山井下检测识别定位系统
CN116524201A (zh) * 2023-03-29 2023-08-01 锋睿领创(珠海)科技有限公司 多尺度门控融合单元的特征提取方法、装置、设备及介质
CN117152397A (zh) * 2023-10-26 2023-12-01 慧医谷中医药科技(天津)股份有限公司 一种基于热成像投影的三维人脸成像方法及系统
CN117197229A (zh) * 2023-09-22 2023-12-08 北京科技大学顺德创新学院 一种基于亮度对齐的多阶段估计单目视觉里程计方法
CN117670753A (zh) * 2024-01-30 2024-03-08 浙江大学金华研究院 基于深度多亮度映射无监督融合网络的红外图像增强方法
CN117930224A (zh) * 2024-03-19 2024-04-26 山东科技大学 一种基于单目视觉深度估计的车辆测距方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037087B (zh) * 2021-10-29 2024-02-09 北京百度网讯科技有限公司 模型训练方法及装置、深度预测方法及装置、设备和介质
CN116295356A (zh) * 2023-03-31 2023-06-23 国广顺能(上海)能源科技有限公司 一种单目检测与测距方法、电子设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106918321A (zh) * 2017-03-30 2017-07-04 西安邮电大学 一种利用图像上目标物视差进行测距的方法
CN110108253A (zh) * 2019-05-31 2019-08-09 烟台艾睿光电科技有限公司 单目红外热像仪的测距方法、装置、设备及可读存储设备
JP2019148865A (ja) * 2018-02-26 2019-09-05 パナソニックIpマネジメント株式会社 識別装置、識別方法、識別プログラムおよび識別プログラムを記録した一時的でない有形の記録媒体
CN111340867A (zh) * 2020-02-26 2020-06-26 清华大学 图像帧的深度估计方法、装置、电子设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961327B (zh) * 2018-05-22 2021-03-30 深圳市商汤科技有限公司 一种单目深度估计方法及其装置、设备和存储介质
CN109087349B (zh) * 2018-07-18 2021-01-26 亮风台(上海)信息科技有限公司 一种单目深度估计方法、装置、终端和存储介质
CN111105432B (zh) * 2019-12-24 2023-04-07 中国科学技术大学 基于深度学习的无监督端到端的驾驶环境感知方法
CN111462206B (zh) * 2020-03-24 2022-06-24 合肥的卢深视科技有限公司 一种基于卷积神经网络的单目结构光深度成像方法
CN111899295B (zh) * 2020-06-06 2022-11-15 东南大学 一种基于深度学习的单目场景深度预测方法
CN112051853B (zh) * 2020-09-18 2023-04-07 哈尔滨理工大学 基于机器视觉的智能避障系统及方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106918321A (zh) * 2017-03-30 2017-07-04 西安邮电大学 一种利用图像上目标物视差进行测距的方法
JP2019148865A (ja) * 2018-02-26 2019-09-05 パナソニックIpマネジメント株式会社 識別装置、識別方法、識別プログラムおよび識別プログラムを記録した一時的でない有形の記録媒体
CN110108253A (zh) * 2019-05-31 2019-08-09 烟台艾睿光电科技有限公司 单目红外热像仪的测距方法、装置、设备及可读存储设备
CN111340867A (zh) * 2020-02-26 2020-06-26 清华大学 图像帧的深度估计方法、装置、电子设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU SHU-SHU, WANG YUAN-QING, ZHANG ZHAO-YANG: "Extracting disparity map from bifocal monocular stereo vision in a novel way", JOURNAL OF COMPUTER APPLICATIONS, JISUANJI YINGYONG, CN, vol. 31, no. 2, 28 February 2011 (2011-02-28), CN , XP093006507, ISSN: 1001-9081, DOI: 10.3724/SP.J.1087.2011.00341 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051632A (zh) * 2022-12-06 2023-05-02 中国人民解放军战略支援部队航天工程大学 一种双通道transformer卫星六自由度姿态估计算法
CN116051632B (zh) * 2022-12-06 2023-12-05 中国人民解放军战略支援部队航天工程大学 一种双通道transformer卫星六自由度姿态估计算法
CN116168070A (zh) * 2023-01-16 2023-05-26 南京航空航天大学 一种基于红外图像的单目深度估计方法及系统
CN116168070B (zh) * 2023-01-16 2023-10-13 南京航空航天大学 一种基于红外图像的单目深度估计方法及系统
CN116245927A (zh) * 2023-02-09 2023-06-09 湖北工业大学 一种基于ConvDepth的自监督单目深度估计方法及系统
CN116245927B (zh) * 2023-02-09 2024-01-16 湖北工业大学 一种基于ConvDepth的自监督单目深度估计方法及系统
CN116524017A (zh) * 2023-03-13 2023-08-01 明创慧远科技集团有限公司 一种用于矿山井下检测识别定位系统
CN116524017B (zh) * 2023-03-13 2023-09-19 明创慧远科技集团有限公司 一种用于矿山井下检测识别定位系统
CN116524201A (zh) * 2023-03-29 2023-08-01 锋睿领创(珠海)科技有限公司 多尺度门控融合单元的特征提取方法、装置、设备及介质
CN116524201B (zh) * 2023-03-29 2023-11-17 锋睿领创(珠海)科技有限公司 多尺度门控融合单元的特征提取方法、装置、设备及介质
CN117197229B (zh) * 2023-09-22 2024-04-19 北京科技大学顺德创新学院 一种基于亮度对齐的多阶段估计单目视觉里程计方法
CN117197229A (zh) * 2023-09-22 2023-12-08 北京科技大学顺德创新学院 一种基于亮度对齐的多阶段估计单目视觉里程计方法
CN117152397A (zh) * 2023-10-26 2023-12-01 慧医谷中医药科技(天津)股份有限公司 一种基于热成像投影的三维人脸成像方法及系统
CN117152397B (zh) * 2023-10-26 2024-01-26 慧医谷中医药科技(天津)股份有限公司 一种基于热成像投影的三维人脸成像方法及系统
CN117670753A (zh) * 2024-01-30 2024-03-08 浙江大学金华研究院 基于深度多亮度映射无监督融合网络的红外图像增强方法
CN117930224A (zh) * 2024-03-19 2024-04-26 山东科技大学 一种基于单目视觉深度估计的车辆测距方法

Also Published As

Publication number Publication date
CN113140011B (zh) 2022-09-06
CN113140011A (zh) 2021-07-20

Similar Documents

Publication Publication Date Title
WO2022241874A1 (zh) Infrared thermal imaging monocular vision ranging method and related components
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
KR102319177B1 (ko) 이미지 내의 객체 자세를 결정하는 방법 및 장치, 장비, 및 저장 매체
CN111598998B (zh) 三维虚拟模型重建方法、装置、计算机设备和存储介质
CN110782490B (zh) 一种具有时空一致性的视频深度图估计方法及装置
WO2019223382A1 (zh) 单目深度估计方法及其装置、设备和存储介质
WO2021179820A1 (zh) 图像处理方法、装置、存储介质及电子设备
US20190362511A1 (en) Efficient scene depth map enhancement for low power devices
US11748894B2 (en) Video stabilization method and apparatus and non-transitory computer-readable medium
CN112967341B (zh) 基于实景图像的室内视觉定位方法、系统、设备及存储介质
CN112927279A (zh) 一种图像深度信息生成方法、设备及存储介质
CN111798485B (zh) 一种利用imu增强的事件相机光流估计方法及系统
CN111325782A (zh) 一种基于多尺度统一的无监督单目视图深度估计方法
CN108665541A (zh) 一种基于激光传感器的地图生成方法及装置和机器人
CN112580558A (zh) 红外图像目标检测模型构建方法、检测方法、装置及系统
CN113711276A (zh) 尺度感知单目定位和地图构建
CN112907557A (zh) 道路检测方法、装置、计算设备及存储介质
CN116977674A (zh) 图像匹配方法、相关设备、存储介质及程序产品
CN116452810A (zh) 一种多层次语义分割方法、装置、电子设备及存储介质
CN116912675B (zh) 一种基于特征迁移的水下目标检测方法及系统
WO2023086398A1 (en) 3d rendering networks based on refractive neural radiance fields
CN116258756A (zh) 一种自监督单目深度估计方法及系统
EP4199498A1 (en) Site model updating method and system
CN115249269A (zh) 目标检测方法、计算机程序产品、存储介质及电子设备
CN114494574A (zh) 一种多损失函数约束的深度学习单目三维重建方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21940323

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21940323

Country of ref document: EP

Kind code of ref document: A1