TWI760128B - Method and system for generating depth image and positioning system using the method - Google Patents

Method and system for generating depth image and positioning system using the method

Info

Publication number
TWI760128B
Authority
TW
Taiwan
Prior art keywords
depth
image
radar
module
feature point
Application number
TW110107954A
Other languages
Chinese (zh)
Other versions
TW202236214A (en)
Inventor
郭峻因
皓升 鄭
Original Assignee
國立陽明交通大學 (National Yang Ming Chiao Tung University)
Application filed by 國立陽明交通大學 (National Yang Ming Chiao Tung University)
Priority to TW110107954A
Application granted
Publication of TWI760128B
Publication of TW202236214A

Landscapes

  • Radar Systems Or Details Thereof (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a generation method and a generation system for a depth image. A pixel image with relative depth and a radar image with measured depth are used to compute a magnification, and the pixel image is corrected by the magnification to generate a depth image; that is, accurate depth is obtained by fusing the pixel image with the radar image. In addition, a corresponding positioning system is constructed using the above method and system: the depth image generated by the depth-image generation system is positioned by a simultaneous localization and mapping module, achieving positioning with depth images that carry absolute depth.

Description

Method and system for generating a depth image, and positioning system using the method

The present invention relates to an image generation technology and a positioning method, and more particularly to a method and system for generating a depth image, and a positioning system applying the method.

In recent years, technologies such as robots and autonomous vehicles (hereinafter referred to as self-driving cars) have become increasingly mature. Taking self-driving cars as an example, they can already be seen on the road in certain areas. They must rely on accurate positioning technology to move smoothly along a predetermined route; otherwise, deviation from that route can cause many accidents. Positioning technology is therefore essential to the operation of self-driving cars.

The positioning technology that has attracted the most attention recently is Simultaneous Localization and Mapping (hereinafter SLAM). Thanks to its precise positioning capability and its independence from the Global Positioning System (GPS), SLAM can be applied both outdoors and indoors, giving it a wide range of applications. Conventional SLAM typically requires a light detection and ranging (LIDAR) sensor to accurately measure the distance of objects in the environment and determine image depth, but LIDAR equipment is expensive; visual SLAM (Vision SLAM) has therefore emerged as a hoped-for replacement.

Visual SLAM mainly uses a camera as the sensor. Modern cameras are inexpensive and simple to mount, configure, and calibrate, so visual SLAM has gradually attracted attention. Nevertheless, some difficulties remain to be overcome, one of which is that its algorithms are relatively complex and computationally demanding.

To address these problems, current visual SLAM techniques are, in terms of image processing, roughly divided by the size of the pixel region used into the direct method and the feature-based method. The direct method uses the entire image for localization and camera pose estimation; it is less time-consuming, but it is sensitive to brightness changes and its stored map occupies considerable space. The feature-based method first extracts feature points from the whole frame; although more time-consuming, it is highly robust to ambient-light noise and its map occupies little storage.

In terms of input sensing, visual SLAM can be divided into monocular (Monocular) SLAM, color-depth (RGB-D) SLAM, and stereo (Stereo) SLAM. Monocular SLAM uses a single image for localization, estimating distance from the positions of feature points between keyframes together with a constant-velocity model; that is, when Monocular SLAM performs mapping and localization, it must compute the actual positions of feature points from the parallax between keyframes, so it cannot build a map at the same scale as the actual trajectory. Its biggest problem is therefore the lack of true depth information about the environment. RGB-D SLAM uses a monocular camera with infrared ranging, but infrared cannot operate properly under strong outdoor light; when used outdoors, its accuracy is degraded by ambient light and the measured depth becomes unreliable, so RGB-D SLAM is generally limited to indoor environments. Stereo SLAM estimates depth for localization from the parallax of two cameras; specifically, it computes depth from the disparity between two fisheye cameras and, together with camera calibration, derives real-world distances, but its problem is poor extensibility.

From the above, the biggest problem with current visual SLAM is that it cannot effectively provide a depth map with accurate absolute depth, while conventional Monocular SLAM, RGB-D SLAM, and Stereo SLAM each have their own shortcomings, making it difficult for visual SLAM to replace existing LIDAR-based SLAM. How to overcome these deficiencies of the prior art has thus become an urgent problem to be solved.

In view of the above problems, the present invention provides a method for generating a depth image, comprising: providing a pixel image including a depth feature point and a relative depth corresponding to the depth feature point; providing a radar image including a radar point corresponding to the depth feature point and a measured depth corresponding to the radar point; calculating a magnification between the measured depth and the relative depth; and correcting the pixel image by the magnification to form a depth image.

In one embodiment, the method further comprises analyzing the pixel image through a deep learning network to obtain the depth feature point from the pixel image, so that the relative depth is derived from the depth feature point.

In another embodiment, the radar image and the pixel image are related by projection so that the radar point corresponds to the depth feature point.

In another embodiment, the pixel image includes a plurality of the depth feature points and a plurality of the relative depths, and the radar image includes a plurality of the radar points and a plurality of the measured depths; a plurality of magnifications are obtained by computation between the plurality of measured depths and the plurality of relative depths, and the plurality of magnifications are averaged to produce an average magnification with which the pixel image is corrected.

In yet another embodiment, the method further comprises obtaining a plurality of frame magnifications from a plurality of the pixel images and a plurality of the radar images, and averaging the plurality of frame magnifications to produce a correction magnification, so that the pixel image is corrected by the correction magnification to form the depth image.

The present invention further provides a system for generating a depth image, comprising: a depth prediction module for receiving and analyzing a pixel image to obtain a depth feature point of the pixel image and a relative depth corresponding to the depth feature point; a radar capture module for capturing a radar image including a radar point corresponding to the depth feature point and a measured depth corresponding to the radar point; and a depth fusion module, connected to the depth prediction module and the radar capture module, for calculating a magnification between the measured depth and the relative depth and correcting the pixel image by the magnification to form a depth image.

In one embodiment, the system further includes an image capture module connected to the depth prediction module for capturing the pixel image and providing it to the depth prediction module.

In another embodiment, the depth prediction module analyzes the pixel image through a deep learning network to obtain the depth feature point from the pixel image and derive the relative depth from it.

In another embodiment, the system further includes a calibration module connected to the depth prediction module and the radar capture module for making the radar point correspond to the depth feature point by projection, and transmitting the projected pixel image and radar image to the depth fusion module.

In another embodiment, the pixel image includes a plurality of the depth feature points and a plurality of the relative depths, and the radar image includes a plurality of the radar points and a plurality of the measured depths; the depth fusion module obtains a plurality of magnifications by computation between the plurality of measured depths and the plurality of relative depths, and averages them to produce an average magnification with which the pixel image is corrected.

In yet another embodiment, the depth fusion module obtains a plurality of frame magnifications by computation between a plurality of the pixel images and a plurality of the radar images, and averages them to produce a correction magnification by which the pixel image is corrected to form the depth image.

The present invention further provides a positioning system, comprising: a depth prediction module for receiving and analyzing a pixel image to obtain a depth feature point of the pixel image and a relative depth corresponding to the depth feature point; a radar capture module for capturing a radar image including a radar point corresponding to the depth feature point and a measured depth corresponding to the radar point; a depth fusion module, connected to the depth prediction module and the radar capture module, for calculating a magnification between the measured depth and the relative depth and correcting the pixel image by the magnification to form a depth image; and a simultaneous localization and mapping module for comparing the depth image formed by the depth fusion module with a pre-stored area map, so as to perform positioning within the area map according to the comparison result.

In one embodiment, the positioning system further includes an image capture module and a calibration module, wherein the image capture module captures the pixel image and provides it to the depth prediction module, and the calibration module receives the pixel image from the depth prediction module and the radar image from the radar capture module, making the radar point correspond to the depth feature point by projection and transmitting the projected pixel image and radar image to the depth fusion module.

In another embodiment, the simultaneous localization and mapping module further includes: a feature point extraction unit for extracting image feature points from the depth image; a tracking or relocalization unit, connected to the feature point extraction unit, for comparing the image feature points of the depth image with the previous image feature points of the preceding depth image to perform feature point tracking; and an optimization and area localization unit, connected to the tracking or relocalization unit, for analyzing the tracking result of the tracking or relocalization unit to obtain the positioning position corresponding to the depth image.

In another embodiment, the simultaneous localization and mapping module further includes a camera pose output unit for outputting the positioning position corresponding to the depth image.

In another embodiment, the camera pose output unit is an output port of a Video Graphics Array (VGA), Digital Visual Interface (DVI), High-Definition Multimedia Interface (HDMI), or DisplayPort.

In another embodiment, the simultaneous localization and mapping module further includes an area map unit for storing the area map, so that when the tracking or relocalization unit cannot track the image feature points of the depth image against the previous image feature points of the preceding depth image, the depth image is compared with the area map for relocalization.

In another embodiment, the optimization and area localization unit synchronously updates the area map when the positioning position corresponding to the depth image is obtained.

In another embodiment, the simultaneous localization and mapping module further includes a keyframe generation unit for inserting a keyframe into the area map and, when inserting the keyframe, comparing it with the other keyframes in the area map, so that when the comparison matches, the image feature points are used to form a three-dimensional map.

In yet another embodiment, the simultaneous localization and mapping module further includes a loop closure detection unit for calculating the similarity between the keyframe and the other keyframes of the area map, so that the area map is optimized when the similarity exceeds a threshold.

In summary, the depth image generation method and system of the present invention fuse a pixel image with relative depth and a radar image with measured depth, calculate the magnification between the two, and correct the pixel image by that magnification to generate a depth image, thereby obtaining accurate depth. In addition, the present invention proposes a positioning system applying this generation method, in which the generated depth image is further positioned by a simultaneous localization and mapping module, achieving positioning with a depth image having absolute depth. This not only addresses the potential problems of visual SLAM but also avoids the high cost of existing LIDAR-based SLAM.

10, 10': Depth image generation system
11: Depth prediction module
12: Radar capture module
13: Depth fusion module
14: Image capture module
15: Calibration module
201-211: Process
60: Positioning system
61: Depth prediction module
62: Radar capture module
63: Depth fusion module
64: Simultaneous localization and mapping module
641: Feature point extraction unit
642: Tracking or relocalization unit
643: Optimization and area localization unit
644: Camera pose output unit
645: Area map unit
646: Keyframe generation unit
647: Loop closure detection unit
65: Image capture module
66: Calibration module
801~807: Process
901~907: Process
S101~S104: Steps

FIG. 1 is a flowchart of the steps of the depth image generation method of the present invention.

FIG. 2 is a flowchart of the depth fusion performed by the depth image generation method of the present invention.

FIG. 3 is a schematic diagram of radar image sampling according to the present invention.

FIG. 4 is a schematic diagram of the architecture of the first embodiment of the depth image generation system of the present invention.

FIG. 5 is a schematic diagram of the architecture of the second embodiment of the depth image generation system of the present invention.

FIG. 6 is a schematic diagram of the system architecture of the positioning system of the present invention.

FIG. 7 is a schematic diagram of the architecture of the simultaneous localization and mapping module in the positioning system of the present invention.

FIG. 8 is a flowchart of forming a projection matrix according to the present invention.

FIG. 9 is a flowchart of camera pose projection of the positioning system of the present invention.

FIGS. 10A-10D are trajectory diagrams of the positioning effect of the present invention.

The following describes the technical content of the present invention through specific embodiments; those skilled in the art can readily understand the advantages and effects of the present invention from the disclosure herein. The present invention may also be implemented or applied through other different specific embodiments.

FIG. 1 is a flowchart of the steps of the depth image generation method of the present invention. As shown, in step S101, a pixel image including a depth feature point and a relative depth corresponding to the depth feature point is provided. Specifically, the pixel image is a picture or video data captured by a device for capturing environmental images, such as a camera or video camera; by analyzing the pixel image, the depth feature points and their corresponding relative depth information are obtained from it. In other words, the present invention analyzes the pixel image through a deep learning network to obtain depth feature points and uses them to derive the relative depth of the pixel image, where a depth feature point refers to a specific region, object, pixel, or coordinate point in the pixel image that can serve as input for the deep learning network's analysis.

In step S102, a radar image including a radar point corresponding to the depth feature point and a measured depth corresponding to the radar point is provided. Specifically, the radar image is a radar point cloud captured from the environment by the radar; it contains one or more radar points, and the corresponding distance information can be obtained from each radar point. Further, the radar image and the pixel image are related by projection so that each radar point corresponds to a depth feature point: the pixel image and the radar image each have their own coordinate axes, and by mapping the radar image's coordinate axes to those of the pixel image, i.e., projecting the radar image onto the pixel image, each radar point of the radar image can correspond to a pixel of the pixel image.
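Although the patent provides no code, the projection described here (mapping radar coordinates onto pixel coordinates) can be illustrated with a minimal sketch. The names `T_radar_to_cam` (a 4x4 radar-to-camera extrinsic matrix) and `K` (a 3x3 camera intrinsic matrix) are assumptions for illustration, not names used in the patent:

```python
import numpy as np

def project_radar_to_image(points_radar, T_radar_to_cam, K):
    """Project 3D radar points into pixel coordinates.

    points_radar:   (N, 3) radar points in the radar frame (assumed layout)
    T_radar_to_cam: (4, 4) extrinsic transform, radar frame -> camera frame
    K:              (3, 3) camera intrinsic matrix
    Returns (M, 2) pixel coordinates and (M,) camera-frame depths.
    """
    # Homogeneous coordinates: (N, 4)
    pts_h = np.hstack([points_radar, np.ones((points_radar.shape[0], 1))])
    # Transform into the camera frame and keep x, y, z
    pts_cam = (T_radar_to_cam @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    depths = pts_cam[:, 2]
    # Perspective projection onto the image plane
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, depths
```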

In step S103, the magnification between the measured depth and the relative depth is calculated. This step takes the relative depth as the basis and computes the ratio of the measured depth to the relative depth. For example, if the relative depth obtained by analyzing a depth feature point in the pixel image is 5 meters and the measured depth of the corresponding depth feature point in the radar image is 10 meters, the magnification between them is 2.

In step S104, the pixel image is corrected by the magnification to form a depth image. After the magnification between the relative depth and the measured depth is obtained, the pixel image is corrected with it so that it becomes a depth image with absolute depth. That is, the pixel image carries abundant relative-distance information and is a dense depth map, whereas the radar image has accurate distance information but little data and is thus a sparse point cloud; the resulting depth image, like the pixel image, is a depth map with abundant distance information, but it additionally carries absolute depth. Following the example above, when the magnification between the relative and measured depth is 2, the relative depths in the pixel image are corrected to the measured scale by this magnification, so that after correction the pixel image becomes a depth image with absolute depth.

In one embodiment, pixel images of a plurality of frames and the radar images corresponding to those frames may be captured over a time interval. The magnification of each corresponding pixel-image/radar-image frame pair is first computed to obtain a plurality of frame magnifications, which are then averaged to obtain a correction magnification; each pixel image is corrected by this correction magnification to form the corresponding depth image. In this way, the present invention reduces the larger error that a single magnification might introduce.

In another embodiment, each pixel image includes a plurality of depth feature points corresponding to a plurality of relative depths, and each radar image correspondingly includes a plurality of radar points corresponding to a plurality of measured depths. Accordingly, through computation between the plurality of measured depths and the corresponding relative depths within a single image, each pair of relative and measured depths yields its own magnification, so a plurality of magnifications are obtained in a single pixel image; these are then averaged to produce an average magnification with which the pixel image is corrected, reducing the error that could result from using only a single magnification per image.
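The following is a minimal sketch of steps S103-S104 together with the per-point and per-frame averaging described in the two embodiments above; the array-based inputs and function names are illustrative assumptions, not part of the patent:

```python
import numpy as np

def scale_for_frame(relative_depths, measured_depths):
    """Per-frame magnification: average the per-point ratios of
    measured (radar) depth to relative (predicted) depth."""
    ratios = np.asarray(measured_depths) / np.asarray(relative_depths)
    return ratios.mean()

def correct_depth_maps(relative_maps, per_frame_samples):
    """Correct relative depth maps with the magnification averaged
    over several frames (steps S103-S104 with frame averaging)."""
    frame_scales = [scale_for_frame(rel, meas) for rel, meas in per_frame_samples]
    correction = np.mean(frame_scales)          # averaged frame magnifications
    return [m * correction for m in relative_maps]

# Example from the description: relative 5 m, measured 10 m -> magnification 2
print(scale_for_frame([5.0], [10.0]))  # 2.0
```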

FIG. 2 is a flowchart of the depth fusion performed by the depth image generation method of the present invention. As shown, the radar image of the present invention is a radar point cloud of the environment captured by the radar, which contains a plurality of radar points and the measured depth corresponding to each. In flow 201, the radar image is provided. In flows 202-203, the radar image is filtered to reduce the influence of noise: it is first confirmed whether the radar point cloud is validly projected, i.e., the radar image is projected onto the pixel image and each radar point is checked for a valid projection with a corresponding feature point; if the projection is invalid, the radar image is discarded. If the projection is valid, it is then confirmed whether the distances of the radar points fall within the depth range of the radar in use; if not, the radar image is discarded, weeding out unsuitable radar images. If the radar points fall within the specified depth range, flow 204 is entered: after filtering, the filtered radar image with depth is used to sample the relative depth map. FIG. 3 is a schematic diagram of the radar image sampling of the present invention: by sampling around each radar point of the radar image, noise can be suppressed, after which the relative depth value corresponding to each valid radar point is obtained. In flow 205, the relative depth values corresponding to the valid radar points are used to compute the magnification of a single frame. In flow 206, the pixel images and corresponding radar images of multiple frames (for example, M frames) are used to compute multiple magnifications. In flow 207, the magnifications of the multiple frames are averaged to obtain the final correction magnification. In flow 208, the obtained correction magnification is used to correct the relative depth. In flow 209, the depth range is limited, i.e., the corrected relative depth is required to remain within the depth range. In flow 210, a median-alignment adjustment is performed. Finally, in flow 211, a depth image with absolute depth is obtained.
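The condensed sketch below walks through flows 201-211 for a single frame under several assumptions the patent leaves open: a "valid projection" is reduced to an in-bounds check, sampling around a radar point is taken as the median of a small window, and median alignment is taken as rescaling the corrected map so that its median at the radar points matches the radar median:

```python
import numpy as np

def fuse_depth(rel_depth, radar_uv, radar_depth, depth_range=(0.5, 100.0), win=2):
    """Sketch of flows 201-211: filter radar points, sample the relative
    depth map around each point, scale, clip, and median-align."""
    h, w = rel_depth.shape
    u, v = radar_uv[:, 0].astype(int), radar_uv[:, 1].astype(int)

    # Flows 202-203: keep points that project inside the image and whose
    # measured distance lies within the radar's depth range.
    ok = (u >= win) & (u < w - win) & (v >= win) & (v < h - win) \
         & (radar_depth >= depth_range[0]) & (radar_depth <= depth_range[1])
    u, v, meas = u[ok], v[ok], radar_depth[ok]

    # Flow 204: sample the relative depth around each radar point to reduce noise.
    rel = np.array([np.median(rel_depth[y - win:y + win + 1, x - win:x + win + 1])
                    for x, y in zip(u, v)])
    keep = rel > 0
    u, v, meas, rel = u[keep], v[keep], meas[keep], rel[keep]

    # Flow 205: magnification of this frame (flows 206-207 would average
    # such magnifications over M frames).
    scale = float(np.mean(meas / rel))

    # Flows 208-209: correct the relative depth and limit it to the depth range.
    fused = np.clip(rel_depth * scale, *depth_range)

    # Flow 210: median alignment against the radar measurements.
    fused *= np.median(meas) / np.median(fused[v, u])
    return fused
```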

FIG. 4 is a schematic diagram of the architecture of the first embodiment of the depth image generation system of the present invention. As shown, in this embodiment the depth image generation system 10 of the present invention includes a depth prediction module 11, a radar capture module 12, and a depth fusion module 13 connected to both, where the depth fusion module 13 obtains the relative depth and the corresponding measured depth from the depth prediction module 11 and the radar capture module 12, respectively, and forms a depth image after computation and correction. The generation system 10 of the present invention is described in detail below.

The depth prediction module 11 generates depth information from a single RGB image of the pixel image; that is, after receiving the pixel image, the depth prediction module 11 analyzes it to obtain the depth feature points of the pixel image and the relative depths corresponding to them. Specifically, the depth prediction module 11 analyzes the pixel image using a pre-trained deep learning network to obtain depth feature points from the pixel image and derive the relative depth from them. The network's analysis of the pixel image can be divided into an encoding stage and a decoding stage: the encoding stage finds the corresponding feature points in the pixel image, while the decoding stage uses the feature points to compute the corresponding depth information.
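As an illustration of the encode-decode split described above, here is a deliberately tiny encoder-decoder network; the patent names no specific architecture, so every layer choice below is an assumption:

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Minimal encoder-decoder for relative depth prediction.

    Illustrative only: the encoder extracts feature maps, and the
    decoder regresses a per-pixel relative depth map from them.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # finds feature maps
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(            # turns features into depth
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Softplus(),
        )

    def forward(self, rgb):                      # rgb: (B, 3, H, W)
        return self.decoder(self.encoder(rgb))   # (B, 1, H, W) relative depth

depth = TinyDepthNet()(torch.rand(1, 3, 64, 64))
```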

The radar capture module 12 is a capture device that can interface with an embedded system device; it mainly transmits radar point cloud information to the system via serial transmission for further computation. The radar capture module 12 captures a radar image including radar points corresponding to the depth feature points of the pixel image and the measured depths corresponding to the radar points; that is, through the radar capture module 12 the present invention obtains a radar image with radar points and measured depths.

The depth fusion module 13 fuses the relative depth obtained by the depth prediction module 11 with the measured depth in the radar image obtained by the radar capture module 12. The depth fusion module 13 of the present invention is connected to the depth prediction module 11 and the radar capture module 12 and receives a pixel image with relative depth and a radar image with measured depth; it calculates the magnification between the measured depth and the relative depth and corrects the pixel image by this magnification, i.e., the relative depth of the pixel image is corrected into absolute depth by the magnification, thereby forming a depth image with absolute depth. It should be noted that the depth fusion module 13 mainly processes the radar point cloud and the relative depth map: the radar image, being a radar point cloud, is only a sparse point cloud but has accurate distance information, whereas the pixel image has only object-to-object relative depth but is a dense depth map. The present invention fuses the relative depth with the measured depth through the depth fusion module 13 to form the depth image.

In one embodiment, after receiving a plurality of pixel images and the corresponding radar images, the depth fusion module 13 first computes the magnification of each corresponding frame from its pixel image and radar image, thereby obtaining a plurality of frame magnifications, and then averages them to obtain the correction magnification. The depth fusion module 13 can then correct the pixel images by the correction magnification to form the corresponding depth images; averaging the data of multiple images reduces the larger error values that could arise from using only a single image.

In another embodiment, the pixel image includes a plurality of depth feature points and a plurality of relative depths, and the radar image includes a plurality of radar points corresponding to the depth feature points and a plurality of measured depths. The depth fusion module 13 can obtain a plurality of magnifications through computation between the plurality of measured depths and the plurality of relative depths, and then average them to produce an average magnification. Accordingly, the present invention can use the average magnification to correct the pixel image, likewise reducing the error that could result from using only a single magnification per image.

FIG. 5 is a schematic diagram of the architecture of the second embodiment of the depth image generation system of the present invention. As shown, the depth prediction module 11, radar capture module 12, and depth fusion module 13 of the depth image generation system 10' of this embodiment are the same as in FIG. 4 and are not repeated here; the difference is that in this embodiment the depth image generation system 10' further includes an image capture module 14 and a calibration module 15.

The image capture module 14 captures the pixel images analyzed by the depth prediction module 11. In short, it is a module that can interface with an embedded device, such as a webcam, digital camera, smartphone, or dashboard camera; through the image capture module 14 the present invention can obtain pixel images of the surrounding environment in real time.

The calibration module 15 is connected to the depth prediction module 11, the radar capture module 12, and the depth fusion module 13 to receive the pixel image and the radar image. The calibration module 15 projects the radar image onto the pixel image so that the radar points correspond to the depth feature points; the projected pixel image and radar image are then transmitted to the depth fusion module 13 for depth fusion. In other words, the calibration module 15 projects the radar points from the coordinates of the radar image onto the coordinates of the pixel image so that each radar point corresponds to a depth feature point of the depth image, i.e., the pixel image shows the position of the pixel to which each radar point corresponds. Specifically, taking a camera as the image capture module 14, before the calibration module 15 performs projection, the camera's intrinsic parameters and the radar's position relative to the camera must be computed; afterwards, provided that the camera intrinsics and the radar-camera relative position remain unchanged, no further calibration is needed. That is, only one calibration between radar and camera is required, and in the projection stage the pre-established projection matrix can be used directly for the transformation.

FIG. 6 is a schematic diagram of the system architecture of the positioning system applying the depth image generation method. As shown, the positioning system 60 of the present invention includes a depth prediction module 61, a radar capture module 62, a depth fusion module 63, and a simultaneous localization and mapping module 64, and may further include an image capture module 65 and a calibration module 66. The functions and operation of the depth prediction module 61, radar capture module 62, depth fusion module 63, image capture module 65, and calibration module 66 are as described for the depth image generation system above and are not repeated; only the simultaneous localization and mapping module 64 is described below.

The simultaneous localization and mapping module 64 compares the depth image received from the depth fusion module 63 with the area map pre-stored in the module, so as to perform positioning within the area map according to the comparison result and thereby obtain the position within that map. In short, the positioning system 60 performs positioning within the area map based on the depth image generated by the depth fusion module 63; the specific positioning procedure is detailed below.

FIG. 7 is a schematic diagram of the architecture of the simultaneous localization and mapping module 64 in the positioning system 60 of the present invention. As shown, the simultaneous localization and mapping module 64 of the present invention includes a feature point extraction unit 641, a tracking or relocalization unit 642, and an optimization and area localization unit 643.

The feature point extraction unit 641 performs image feature point extraction on each frame of the depth image that the simultaneous localization and mapping module 64 obtains from the depth fusion module 63.

The tracking or relocalization unit 642 is connected to the feature point extraction unit 641 so that, when the feature point extraction unit 641 has extracted image feature points from the currently received depth image, those image feature points are used to track the previous image feature points of the preceding frame's depth image. In short, the tracking or relocalization unit 642 performs feature point tracking by comparing the image feature points of the current frame's depth image with the previous image feature points of the preceding depth image.
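The patent does not specify the feature type used by units 641 and 642; ORB features with brute-force Hamming matching, as used in many visual SLAM systems, are a plausible stand-in. A sketch under that assumption:

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def track_features(prev_gray, curr_gray):
    """Extract feature points in two consecutive frames and match them,
    mimicking extraction unit 641 feeding tracking unit 642."""
    kp_prev, des_prev = orb.detectAndCompute(prev_gray, None)
    kp_curr, des_curr = orb.detectAndCompute(curr_gray, None)
    if des_prev is None or des_curr is None:
        return []  # tracking failed; a relocalization path would run instead
    matches = matcher.match(des_prev, des_curr)
    # Keep the most reliable correspondences for pose estimation.
    return sorted(matches, key=lambda m: m.distance)[:200]
```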

The optimization and area localization unit 643 is connected to the tracking or relocalization unit 642 and, after the tracking or relocalization unit 642 tracks successfully, analyzes the tracking result to obtain the positioning position corresponding to the depth image, thereby obtaining the camera pose of the positioning system of the present invention. In other words, after tracking succeeds, the positioning system of the present invention computes the camera pose through the optimization and area localization unit 643; the camera pose information is output and, at the same time, the current camera pose is recorded in the area map.

Accordingly, the present invention can generate a depth image through the depth prediction module 61, radar capture module 62, and depth fusion module 63 of FIG. 6, and provide it to the simultaneous localization and mapping module 64 for tracking, thereby determining the actual position in the area map and achieving the purpose of positioning.

Further, the simultaneous localization and mapping module 64 shown in FIG. 7 further includes a camera pose output unit 644, an area map unit 645, a keyframe generation unit 646, and a loop closure detection unit 647.

The camera pose output unit 644 outputs the camera pose corresponding to the positioning position of the depth image through an output port such as Video Graphics Array (VGA), Digital Visual Interface (DVI), High-Definition Multimedia Interface (HDMI), or DisplayPort (DP) for connection to an embedded or external display device; through such an output port connected to an image display device, the camera pose of the simultaneous localization and mapping module 64 is provided. In short, the present invention can provide visualized output information through the camera pose output unit 644, which can connect to a compatible display device via an output interface such as VGA, DVI, HDMI, or DisplayPort. In other words, the camera pose output unit 644 provides an interface channel for other modules or devices to obtain camera pose information or to further visualize it. In addition, the camera pose output unit 644 also provides the transformation matrix of the camera pose in three-dimensional space; this matrix contains the current position information of the camera pose and can be supplied to other systems or used for purposes such as recording trajectories.
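For reference, the transformation matrix mentioned here is conventionally a 4x4 homogeneous matrix combining a rotation and a translation; the sketch below composes one and reads the current position back out (names are illustrative):

```python
import numpy as np

def pose_matrix(R, t):
    """Build the 4x4 homogeneous transformation matrix of a camera pose
    from a 3x3 rotation R and a 3-vector translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

T = pose_matrix(np.eye(3), np.array([1.0, 0.0, 2.5]))
position = T[:3, 3]      # current position carried by the pose matrix
```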

The area map unit 645 stores the area map and, when the tracking or relocalization unit 642 cannot track the image feature points of the depth image against the previous image feature points of the preceding depth image, provides the area map for comparison with the depth image so that relocalization can be performed from the comparison result. As noted above, when the positioning system relocalizes in this way, the optimization and area localization unit 643 obtains the positioning position corresponding to the depth image, at which point the area map in the area map unit 645 can be updated synchronously.

In addition, after the tracking or relocalization unit 642 tracks successfully, the positioning system of the present invention computes the camera pose through the optimization and area localization unit 643, and the camera pose information is simultaneously sent to the camera pose output unit 644 and to the area map unit 645, which records the current camera pose.

The keyframe generation unit 646 inserts keyframes into the area map and, when inserting a keyframe, compares it with the other keyframes in the area map so that, when the comparison matches, the image feature points are used to form a three-dimensional map. Specifically, the keyframe generation unit 646 accesses the area map in the area map unit 645 and, during tracking by the tracking or relocalization unit 642, inserts keyframes at certain intervals (for example, time intervals or frame intervals); three-dimensional (3D) map points are then generated from the feature points matched between the current keyframe and the keyframes in the area map.

The simultaneous localization and mapping module 64 uses the loop closure detection unit 647 to calculate the similarity between a keyframe and the other keyframes of the area map, so as to optimize the area map when the similarity exceeds a threshold. In other words, the loop closure detection unit 647 computes the similarity between newly generated and existing map points and keyframes; if the similarity is above the threshold, all map information is optimized to eliminate drift error, and the optimized result is stored back into the area map.
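How the similarity is computed is not stated; one common choice is a cosine similarity over bag-of-visual-words keyframe descriptors. A sketch under that assumption, with an illustrative threshold:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # illustrative value; the patent only says "threshold"

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_loop(new_kf_descriptor, map_kf_descriptors):
    """Compare a new keyframe descriptor against the map's keyframes;
    a score above the threshold triggers map optimization."""
    for i, d in enumerate(map_kf_descriptors):
        if cosine_similarity(new_kf_descriptor, d) > SIMILARITY_THRESHOLD:
            return i   # loop closure found with keyframe i
    return None
```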

FIG. 8 is a flowchart of forming the projection matrix according to the present invention. As shown, in order to bring the trajectory of the simultaneous localization and mapping module 64 of the present invention close to the accuracy of the Global Positioning System (GPS), the present invention further uses a projection matrix to project each camera pose. In flow 801, a ROSBAG database is built or pre-recorded; in flows 802-804, the built ROSBAG database is used to generate a SLAM map, which is stored, and a preliminary mapping trajectory is obtained; in flow 805, the GPS trajectory is obtained; in flows 806-807, the correspondence between the mapping trajectory and the GPS trajectory is computed to produce the projection matrix. Accordingly, after the positioning stage, the present invention uses the projection matrix produced from the mapping trajectory and the GPS trajectory to subsequently project each camera pose onto the GPS trajectory, obtaining a more accurate trajectory map.
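Flows 806-807 do not specify how the trajectory correspondence is computed; one standard way to obtain such a projection between a SLAM trajectory and a GPS trajectory is a least-squares similarity transform (the Umeyama method), sketched below under that assumption:

```python
import numpy as np

def umeyama_alignment(slam_xyz, gps_xyz):
    """Least-squares similarity transform (scale s, rotation R,
    translation t) mapping SLAM trajectory points onto GPS points.
    Both inputs are (N, 3) arrays of corresponding positions."""
    mu_s, mu_g = slam_xyz.mean(0), gps_xyz.mean(0)
    S, G = slam_xyz - mu_s, gps_xyz - mu_g
    U, D, Vt = np.linalg.svd(G.T @ S / len(slam_xyz))
    sign = np.sign(np.linalg.det(U @ Vt))          # keep R a proper rotation
    R = U @ np.diag([1.0, 1.0, sign]) @ Vt
    s = (D * [1.0, 1.0, sign]).sum() / S.var(0).sum()
    t = mu_g - s * R @ mu_s
    return s, R, t   # camera poses are then projected as s * R @ p + t
```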

FIG. 9 is a flowchart of camera pose projection of the positioning system of the present invention. As shown, in flows 901-902, the SLAM map and the projection matrix produced in FIG. 8 are provided to the positioning system; in flows 903-904, the positioning system of the present invention receives the SLAM map and the projection matrix for analysis and outputs the camera pose.

FIGS. 10A-10D are trajectory diagrams of the positioning effect of the present invention. As shown, on four different routes, the positioning effect and mapping results obtained by the positioning system of the present invention are close to those of conventional LIDAR SLAM and GPS; the technology of the present invention can therefore indeed replace existing LIDAR-based SLAM, with accuracy approaching the GPS trajectory.

In summary, the present invention establishes a depth image generation system that fuses radar and camera sensors, exploiting the advantages of each (for example, a monocular camera) to realize the fusion. Monocular depth estimation can estimate a dense depth for every pixel of the environment from a single pixel image, but the pixel image lacks absolute depth information; the present invention therefore introduces radar to make up for this lack. Radar has the advantage of low cost and can provide sparse depth information about the surrounding environment; the present invention thus fuses the sparse depth information provided by the radar with the estimated depth produced by the monocular camera's depth estimation to create a depth image that provides the absolute depth of the surroundings. Furthermore, the present invention feeds the depth image into a SLAM system, which then only needs to use the generated depth image to obtain depth information, without using triangulation across the keypoints of subsequent frames to compute the surrounding depth.

The present invention benefits a SLAM system in two respects. In the mapping task, it provides more accurate depth information about the surroundings and can therefore create a map with a scale better than monocular SLAM and comparable to LIDAR SLAM. In the localization task, it can localize not only by feature matching between subsequent frames but also by the visual trajectory, reducing the chance of losing tracking due to pure rotation; this increases the stability of the SLAM system and the chances of relocalization and successful localization. In sum, the present invention can generate a depth map with absolute depth using only radar and an image capture device such as a camera, achieving a map with a scale comparable to GPS or LIDAR SLAM and helping to reduce the cost of system construction.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify and alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be as listed in the claims below.

S101~S104: Steps

Claims (17)

1. A method for generating a depth image, comprising: providing a pixel image including a depth feature point and a relative depth corresponding to the depth feature point; providing a radar image including a radar point corresponding to the depth feature point and a measured depth corresponding to the radar point; calculating a magnification between the measured depth and the relative depth; and correcting the pixel image by the magnification to form a depth image, wherein the pixel image includes a plurality of the depth feature points and a plurality of the relative depths, the radar image includes a plurality of the radar points and a plurality of the measured depths, a plurality of magnifications are obtained through computation between the plurality of the measured depths and the plurality of the relative depths, and the plurality of magnifications are averaged to produce an average magnification with which the pixel image is corrected.

2. The method for generating a depth image of claim 1, further comprising analyzing the pixel image through a deep learning network to obtain the depth feature point from the pixel image, so as to derive the relative depth from the depth feature point.

3. The method for generating a depth image of claim 1, wherein the radar image and the pixel image are related by projection so that the radar point corresponds to the depth feature point.

4. The method for generating a depth image of claim 1, further comprising obtaining a plurality of frame magnifications through a plurality of the pixel images and a plurality of the radar images, and averaging the plurality of frame magnifications to produce a correction magnification by which the pixel image is corrected to form the depth image.
5. A system for generating a depth image, comprising: a depth prediction module for receiving and analyzing a pixel image to obtain depth feature points of the pixel image and relative depths corresponding to the depth feature points; a radar capture module for capturing a radar image that includes radar points corresponding to the depth feature points and measured depths corresponding to the radar points; and a depth fusion module, connected to the depth prediction module and the radar capture module, for calculating a magnification between the measured depth and the relative depth and correcting the pixel image with the magnification to form a depth image, wherein the pixel image includes a plurality of the depth feature points and a plurality of the relative depths, the radar image includes a plurality of the radar points and a plurality of the measured depths, and the depth fusion module obtains a plurality of magnifications by calculation between the plurality of measured depths and the plurality of relative depths and averages the plurality of magnifications to produce an average magnification with which the pixel image is corrected.

6. The system of claim 5, further comprising an image capture module connected to the depth prediction module for capturing the pixel image and providing it to the depth prediction module.

7. The system of claim 5, wherein the depth prediction module analyzes the pixel image through a deep-learning network to obtain the depth feature points from the pixel image, so that the relative depths are derived from the depth feature points.

8. The system of claim 5, further comprising a correction module connected to the depth prediction module and the radar capture module for making the radar points correspond to the depth feature points by projection, so that the projected pixel image and radar image are transmitted to the depth fusion module.

9. The system of claim 5, wherein the depth fusion module obtains a plurality of frame magnifications by calculation between a plurality of the pixel images and a plurality of the radar images and averages the plurality of frame magnifications to produce a correction magnification with which the pixel image is corrected to form the depth image.
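The correction module of claim 8 aligns the two modalities by projecting radar points into the image plane. Below is a hedged sketch of a standard pinhole projection; the intrinsic matrix K and the radar-to-camera extrinsic T_cam_radar are assumed to come from an offline calibration, which the claims do not specify.

```python
import numpy as np

def project_radar_points(points_radar: np.ndarray, K: np.ndarray,
                         T_cam_radar: np.ndarray) -> np.ndarray:
    """Map (N, 3) radar-frame points to (N, 2) pixel coordinates."""
    # Homogeneous radar coordinates -> camera frame via the 4x4 extrinsic.
    homo = np.hstack([points_radar, np.ones((points_radar.shape[0], 1))])
    cam = (T_cam_radar @ homo.T).T[:, :3]
    # Perspective projection with the 3x3 camera intrinsic matrix.
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]
```

Each projected radar point can then be associated with the nearest depth feature point in pixel coordinates, giving the (relative depth, measured depth) pairs that the depth fusion module consumes.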
10. A positioning system, comprising: a depth prediction module for receiving and analyzing a pixel image to obtain depth feature points of the pixel image and relative depths corresponding to the depth feature points; a radar capture module for capturing a radar image that includes radar points corresponding to the depth feature points and measured depths corresponding to the radar points; a depth fusion module, connected to the depth prediction module and the radar capture module, for calculating a magnification between the measured depth and the relative depth and correcting the pixel image with the magnification to form a depth image; and a simultaneous localization and mapping module for comparing the depth image formed by the depth fusion module with a pre-stored area map and performing localization within the area map according to the comparison result, wherein the simultaneous localization and mapping module further includes: a feature point extraction unit for extracting image feature points from the depth image; a tracking or relocalization unit, connected to the feature point extraction unit, for comparing the image feature points of the depth image with the previous image feature points of the preceding depth image to perform feature point tracking; and an optimization and area localization unit, connected to the tracking or relocalization unit, for analyzing the tracking result of the tracking or relocalization unit to obtain the position corresponding to the depth image.

11. The positioning system of claim 10, further comprising an image capture module and a correction module, wherein the image capture module captures the pixel image and provides it to the depth prediction module, and the correction module receives the pixel image from the depth prediction module and the radar image from the radar capture module and makes the radar points correspond to the depth feature points by projection, so that the projected pixel image and radar image are transmitted to the depth fusion module.

12. The positioning system of claim 10, wherein the simultaneous localization and mapping module further includes an area map unit for storing the area map, so that when the tracking or relocalization unit cannot track the image feature points of the depth image against the previous image feature points of the preceding depth image, the depth image is compared with the area map to perform relocalization.

13. The positioning system of claim 10, wherein the optimization and area localization unit updates the area map synchronously when the position corresponding to the depth image is obtained.
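The tracking-or-relocalization flow of claims 10 and 12 can be sketched with off-the-shelf ORB features. The snippet below is an illustrative skeleton only, assuming an 8-bit grayscale image and precomputed descriptor sets for the previous frame and the stored area map; the match-count threshold of 20 is an assumption, not a value from the patent.

```python
import cv2

# ORB extractor and a brute-force Hamming matcher for binary descriptors.
orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

MIN_MATCHES = 20  # assumed threshold for "tracking succeeded"

def track_or_relocalize(frame_gray, prev_desc, area_map_desc):
    """Extract features, track against the previous frame, and fall back
    to relocalization against the stored area map when tracking fails."""
    kps, desc = orb.detectAndCompute(frame_gray, None)
    if prev_desc is not None and desc is not None:
        matches = matcher.match(prev_desc, desc)
        if len(matches) >= MIN_MATCHES:
            return kps, desc, matches, "tracking"
    # Tracking lost (e.g. during pure rotation): compare with the area map.
    matches = matcher.match(area_map_desc, desc) if desc is not None else []
    return kps, desc, matches, "relocalized"
```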
14. The positioning system of claim 10, wherein the simultaneous localization and mapping module further includes a keyframe generation unit for inserting a keyframe into the area map, so that when the keyframe is inserted, it is compared with the other keyframes in the area map, and when the comparison matches, a three-dimensional map is formed from the image feature points.

15. The positioning system of claim 14, wherein the simultaneous localization and mapping module further includes a loop closure detection unit for calculating the similarity between the keyframe and the other keyframes in the area map, and optimizing the area map when the similarity exceeds a threshold.

16. The positioning system of claim 10, wherein the simultaneous localization and mapping module further includes a camera pose output unit for outputting the position corresponding to the depth image.

17. The positioning system of claim 16, wherein the camera pose output unit is an output port of a video graphics array (VGA), a digital visual interface (DVI), a high-definition multimedia interface (HDMI), or a display port (DisplayPort).
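The loop-closure check of claim 15 reduces to a similarity test between keyframe descriptors. The sketch below uses a cosine score over global descriptor vectors as a stand-in for a bag-of-words similarity; the 0.8 threshold and the vector representation are assumptions for illustration.

```python
import numpy as np

def detect_loop(new_kf_vec, keyframe_vecs, threshold=0.8):
    """Return the index of a sufficiently similar stored keyframe, or None."""
    for i, vec in enumerate(keyframe_vecs):
        sim = float(new_kf_vec @ vec) / (
            np.linalg.norm(new_kf_vec) * np.linalg.norm(vec) + 1e-12)
        if sim > threshold:
            return i  # loop candidate found: trigger area-map optimization
    return None
```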
TW110107954A 2021-03-05 2021-03-05 Method and system for generating depth image and positioning system using the method TWI760128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110107954A TWI760128B (en) 2021-03-05 2021-03-05 Method and system for generating depth image and positioning system using the method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110107954A TWI760128B (en) 2021-03-05 2021-03-05 Method and system for generating depth image and positioning system using the method

Publications (2)

Publication Number Publication Date
TWI760128B true TWI760128B (en) 2022-04-01
TW202236214A TW202236214A (en) 2022-09-16

Family

ID=82199080

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110107954A TWI760128B (en) 2021-03-05 2021-03-05 Method and system for generating depth image and positioning system using the method

Country Status (1)

Country Link
TW (1) TWI760128B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202020693A (en) * 2018-11-28 2020-06-01 台達電子工業股份有限公司 Map constructing apparatus and map constructing method
WO2020234906A1 (en) * 2019-05-17 2020-11-26 Alma Mater Studiorum - Universita' Di Bologna Method for determining depth from images and relative system
CN110443842A (en) * 2019-07-24 2019-11-12 大连理工大学 Depth map prediction technique based on visual angle fusion
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
CN111861962A (en) * 2020-07-28 2020-10-30 湖北亿咖通科技有限公司 Data fusion method and electronic equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI820740B (en) * 2022-04-19 2023-11-01 聯發科技股份有限公司 Method and electronic device for motion prediction

Also Published As

Publication number Publication date
TW202236214A (en) 2022-09-16
