TWI639136B - Real-time video stitching method - Google Patents

Real-time video stitching method

Info

Publication number
TWI639136B
Authority
TW
Taiwan
Prior art keywords
points
feature point
feature points
image
feature
Prior art date
Application number
TW106141636A
Other languages
Chinese (zh)
Other versions
TW201926244A (en)
Inventor
陳昭和
陳聰毅
余碩文
Original Assignee
國立高雄科技大學 (National Kaohsiung University of Science and Technology)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立高雄科技大學 (National Kaohsiung University of Science and Technology)
Priority to TW106141636A priority Critical patent/TWI639136B/en
Application granted granted Critical
Publication of TWI639136B publication Critical patent/TWI639136B/en
Publication of TW201926244A publication Critical patent/TW201926244A/en

Landscapes

  • Image Processing (AREA)

Abstract

The present invention discloses a real-time video frame stitching method comprising five parts: (1) Image pre-processing: bilinear interpolation is used to downsample the input images and reduce subsequent processing time. (2) Feature point search: feature points are detected with the SIFT method, abnormal feature points are removed with the RANSAC method, and the DBSCAN method then finds the feature point clusters in each frame and computes the core point of each cluster for later use. (3) Computation of the optimal perspective transform matrix: an adaptive scene partition line is computed for the frame, the core points are divided into two groups according to their positions relative to that line, and the ratio of the core-point counts of the two groups is calculated; feature point sets whose ratio indicates that a better perspective transform matrix can be obtained are stored in a candidate feature-point-set register, after which the candidate set containing the largest number of feature points is selected and its homography matrix is computed as the optimal perspective transform matrix. (4) Scene correction and compensation: the obtained optimal perspective transform matrix is used to correct and compensate the full matched frame. (5) Video stitching: the frame is divided into an overlapping region and non-overlapping regions according to the feature point distribution; the overlapping region is stitched with multi-band blending, the non-overlapping regions are stitched with linear blending, and the two stitched regions are finally superimposed to produce the final stitching result.

Description

Real-time video frame stitching method

The present invention relates to a frame stitching method, and more particularly to a real-time video frame stitching method capable of producing high-quality stitched frames.

Surveillance cameras are now widely deployed in all kinds of locations, and multiple cameras covering the same area are quite common. The effectiveness of surveillance, however, has not grown with the number of cameras but has instead declined: as the number of sub-views on a monitoring screen increases, operators easily become spatially disoriented and cannot attend to every sub-view, causing visual fatigue. A surveillance-frame stitching system is therefore necessary in today's environment.

Moreover, rampant terrorist activity in recent years has inflicted serious damage on many countries, so the number of surveillance cameras in crowded public venues such as large squares and parks keeps rising, drawing particular attention to the field of security monitoring. Computer vision techniques for security monitoring have consequently appeared, such as intrusion detection, face detection and recognition, and abandoned-object detection and recognition; yet these detection and recognition systems run into many problems because the monitoring views are not consolidated. The high-quality stitched frame produced by the present invention can serve as the front-end image input of such computer vision systems, so that each detection and recognition system gains function and application range from the widened field of view.

Traditional stitching methods fall into two main categories. The first stitches single still images and is already widely used in mobile devices; although the seams fit almost perfectly, the computation usually takes several seconds, so it cannot be applied to live surveillance-camera video stitching. The second is the panoramic stitching camera, typically built around fisheye lenses; this type reduces stitching errors in the overlapping region by fixing the camera positions, but it cannot render the stitching result precisely and finely. Although a fisheye camera covers most of the viewing angle, the image distortion is severe, and the distorted images cannot be fed to subsequent computer vision detection and recognition applications.

In view of the above problems in the prior art, the object of the present invention is to provide a real-time video frame stitching method capable of producing high-quality stitched frames, so as to solve the problems faced by the prior art.

Based on the above objective, the present invention provides a real-time video frame stitching method comprising the following steps: downsampling a plurality of input images by bilinear interpolation and normalizing them to produce a plurality of normalized input images; searching the normalized input images for a plurality of feature points with the SIFT feature point detection algorithm; screening the feature points with the RANSAC algorithm to remove the unmatchable feature points in the non-overlapping regions of each normalized input image; clustering the remaining feature points of each normalized input image with the DBSCAN algorithm to obtain a plurality of feature point clusters and their core points; extracting the maximum and minimum y coordinates of the core points and averaging them to obtain the y position of a scene partition line; dividing the core points into an upper feature point set and a lower feature point set according to the scene partition line, counting the feature points of the two sets, and computing their ratio, the upper and lower sets that satisfy a preset ratio range serving as candidate matrices; computing the number of feature point matches in each candidate matrix and selecting the candidate matrix with the most matches as the optimal perspective transform matrix; using the optimal perspective transform matrix to correct a target image among the input images, where the perspective transform comprises scaling, translation, and rotation and performs the three-dimensional warp through a projective linear transformation; and dividing the input images into an overlapping region and non-overlapping regions according to the distribution of the feature points, stitching the overlapping region with multi-band blending and the non-overlapping regions with linear blending, and finally superimposing the two stitched regions to produce a stitched image.

Preferably, when the unmatchable feature points are removed, the distribution of the plurality of feature points is analyzed to estimate a model of the reasonable distribution of outliers and inliers, after which the unreasonable outliers are removed.

Preferably, to raise the accuracy of the model estimating the reasonable distribution, the number of iterations can be increased, as shown in the following formula: 1 − P = (1 − O^M)^K

where P is the probability of computing a reasonable group, K is the number of iterations, O is the number of inlier points, and M is the number of feature points selected by the reasonable-distribution model.

Preferably, the grouping formula that divides the plurality of core points into the upper feature point set and the lower feature point set is as follows: Fu = { p ∈ D_p | y_p ≤ T_SM }, Fd = { p ∈ D_p | y_p > T_SM }

where T_SM is the adaptive scene line value, D_p is a feature point cluster obtained by the DBSCAN algorithm, p is a feature point in D_p, Fu is the upper feature point set, and Fd is the lower feature point set.

Preferably, a candidate matrix satisfies the following condition: IF R_min ≤ Num(Fu)/Num(Fd) ≤ R_max, THEN D_p is stored into HC

where HC is the candidate matrix set, R_min is the lower limit of the preset ratio range, R_max is the upper limit of the preset ratio range, Num(Fu) is the number of feature points of the upper set, and Num(Fd) is the number of feature points of the lower set.

Preferably, the maximum and minimum x coordinates are extracted from the x coordinates of the plurality of core points, and the input image is divided accordingly into an overlapping region and a non-overlapping region that satisfy the following condition: overlap(x, y) = { Img(x, y) | XMin(p) ≤ x ≤ XMax(p) }, non-overlap(x, y) = { Img(x, y) | otherwise }, p ∈ D_p

where Img(x, y) is the input image, XMin(p) is the minimum x coordinate, XMax(p) is the maximum x coordinate, D_p is a feature point cluster obtained by the DBSCAN algorithm, and p is a feature point in D_p.

As stated above, the real-time video frame stitching method of the present invention can adapt to the environment when stitching and dynamically computes the overlapping region, and the stitching result is an intuitive image output. This not only aids the observation of monitoring personnel but also lends an auxiliary effect to subsequent intelligent detection system applications.

S1 to S52: steps

FIG. 1 is a flowchart of the real-time video frame stitching method of the present invention.

To facilitate understanding of the features, content, and advantages of the present invention and the effects it can achieve, the invention is described in detail below with reference to the drawings in the form of embodiments. The drawings are intended only for illustration and as aids to the specification; they do not necessarily reflect the true proportions and precise configuration of the invention as practiced, so the proportions and layout of the attached drawings should not be used to interpret or limit the scope of the present invention in actual implementation.

The advantages and features of the present invention, and the technical methods by which they are achieved, will be described in more detail with reference to exemplary embodiments and the accompanying drawings so as to be more easily understood. The invention may be realized in different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough, comprehensive, and complete in conveying the scope of the invention to those of ordinary skill in the art, the invention being defined only by the appended claims.

Please refer to FIG. 1, which is a flowchart of the real-time video frame stitching method of the present invention. As shown in the figure, the method comprises the following steps. In step S1: image pre-processing; in step S2: feature point search; in step S3: computation of the optimal perspective transform matrix; in step S4: scene correction and compensation; in step S5: video stitching.

Furthermore, step S1 comprises step S11: downsampling. In more detail, mainstream surveillance cameras are moving toward ever higher resolutions; when processing such images or video, a high-resolution image carries abundant information but also a correspondingly high processing cost, and in practice some operations simply do not need that many pixels. Therefore, after the video is input, the present invention first downsamples each frame by bilinear interpolation, which preserves adequate edge information at a low computational cost. This not only normalizes the input camera frames consistently but also greatly reduces the time of the subsequent feature point detection.
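As a minimal sketch of step S11 in Python with OpenCV (the target width is an illustrative assumption, not a value from the patent):

    import cv2

    def normalize_frame(frame, target_width=640):
        """Downsample a frame with bilinear interpolation, keeping aspect ratio."""
        h, w = frame.shape[:2]
        scale = target_width / float(w)
        # cv2.INTER_LINEAR is OpenCV's bilinear interpolation.
        return cv2.resize(frame, (target_width, int(h * scale)),
                          interpolation=cv2.INTER_LINEAR)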

Step S2 further comprises step S21: feature point detection with SIFT; step S22: removal of mismatched feature points with RANSAC; and step S23: extraction of the core points of the feature point clusters with DBSCAN.

To elaborate, feature point detection is a very important step in image stitching methods, mainly because it effectively resolves the parallax problem at the seam. Feature point processing is therefore usually refined so that more, and more accurate, feature point matches can be found in the frame. Complex computation can achieve this, but it costs correspondingly more time, which makes it hard to apply to real-time video stitching. To avoid the high computational complexity of traditional image feature point processing and to cut the workload enough for real-time video stitching, the proposed real-time video frame stitching method finds the optimal perspective transform matrix by analyzing the feature point distribution. To overcome the horizontal, vertical, and rotational differences between frames shot in the same space, and to find the range of the frames that can be stitched, feature point matches in the images must be detected for use in the subsequent image correction. When the overlapping region of the frames is narrow, however, computing the homography matrix becomes harder and less stable: the smaller the overlap, the fewer feature point matches can be detected, and region matrices without computed matches are easily ignored, so the correction matrix warps only toward the positions of densely matched feature points and finally yields a biased correction result. A more accurate feature point detection algorithm is therefore needed to find correct feature points and obtain a more correct correction result.

In step S21 the SIFT feature point detection algorithm is used. It builds a multi-level pyramid model to make the detection robust to scale change, and in the descriptor it tallies the dominant orientation of the feature points in a region to confer rotation invariance and a more stable result. Formula (1) gives the regional rotation angle θ (the direction of the vector information), where L(x, y) denotes the Laplace of Gaussian of the input image I(x, y) and G(x, y, σ) is the Gaussian function: θ(x, y) = tan⁻¹( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) ) (1)
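A hedged sketch of step S21's detection plus a subsequent matching pass, using OpenCV's SIFT implementation; the brute-force matcher and Lowe's ratio test are assumptions for illustration, not details taken from the patent:

    import cv2

    def detect_and_match(img_left, img_right, ratio=0.75):
        """Detect SIFT keypoints in both frames and match them with a ratio test."""
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img_left, None)
        kp2, des2 = sift.detectAndCompute(img_right, None)
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        raw = matcher.knnMatch(des1, des2, k=2)
        # Keep a match only when it is clearly better than the second-best candidate.
        good = [m for m, n in raw if m.distance < ratio * n.distance]
        return kp1, kp2, good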

The results of feature point matching are not always stable, and mismatches occur frequently. The reason is that ordinary surveillance cameras are set up to capture a wide field of view, so the overlapping region between frames taken in the same space is small; the feature point descriptors of the overlapping region then easily produce false matches with feature points in the non-overlapping regions, leading to a disordered image correction afterwards.

In step S22, to solve the above problem, the RANSAC algorithm screens for an appropriate feature point group and further removes the feature points in the non-overlapping regions that cannot possibly match. By analyzing the distribution data of the image feature points, a model of the reasonable distribution of outliers and inliers is estimated and the unreasonable outliers are removed. To raise the accuracy of the estimated model, the number of iterations sometimes has to be increased, as in formula (2), where P is the probability of computing a reasonable group, K is the number of iterations, O is the number of inlier points, and M is the number of points selected by the model: 1 − P = (1 − O^M)^K (2)
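A small sketch of the iteration-count relation in formula (2); treating O as the inlier fraction here is my reading, not the patent's wording:

    import math

    def ransac_iterations(p_success=0.99, inlier_ratio=0.5, sample_size=4):
        """Number of RANSAC iterations K such that 1 - P = (1 - O^M)^K holds."""
        return math.ceil(math.log(1.0 - p_success) /
                         math.log(1.0 - inlier_ratio ** sample_size))

For example, with P = 0.99, O = 0.5, and M = 4 (the minimal sample for a homography), this gives K = 72 iterations.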

After appropriate feature points are obtained, clustering often occurs among them, since feature points are typically edge points or corner points; this frequently causes wrong decisions when the optimal perspective transform matrix is searched for later. Therefore, in step S23, the present invention runs DBSCAN cluster analysis on the feature points to screen them, obtains the core points of the clustered feature points, and stores them in the feature point set for subsequent processing.
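One way to realize step S23 with scikit-learn's DBSCAN; the eps and min_samples values are illustrative assumptions:

    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_core_points(points, eps=20.0, min_samples=3):
        """Cluster matched keypoint coordinates and return the core points per cluster."""
        points = np.asarray(points, dtype=np.float64)
        db = DBSCAN(eps=eps, min_samples=min_samples).fit(points)
        cores = points[db.core_sample_indices_]
        labels = db.labels_[db.core_sample_indices_]
        # Group the core samples by cluster label (core samples are never noise).
        return {lbl: cores[labels == lbl] for lbl in set(labels)}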

Step S3 further comprises step S31: computing the adaptive scene partition line and grouping the feature points; step S32: computing the ratio of the two feature point sets and, if the threshold condition is met, storing the set in the candidate feature-point-set register; and step S33: selecting the set with the most feature points and computing its homography matrix as the optimal perspective transform matrix.

To elaborate, in order to process video stitching in real time, the present invention computes the optimal perspective transform matrix for the fixed surveillance camera views; all later frames of the image sequence are corrected with this matrix, so the computationally heavier feature point detection no longer needs to be repeated, and real-time processing is achieved.

In step S31, after the core points of the feature point clusters have been obtained, the adaptive scene partition line is computed. Even in normal use, the cameras covering the same space are not mounted at the same horizontal position, which affects the resulting feature point distribution and considerably influences the later screening of candidate feature point sets. A scene partition line that adapts to the environment must therefore be computed first so that the feature points are correctly divided into an upper group and a lower group. Let D_p be a feature point set after DBSCAN clustering and p a point in D_p; the maximum YMax(p) and minimum YMin(p) of the y coordinates in the feature point set are extracted and averaged to obtain the y position Y_SM of the scene partition line, as in formula (3): Y_SM = (YMax(p) + YMin(p)) / 2 (3)
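A direct sketch of formula (3), assuming the core points are (x, y) coordinate pairs:

    import numpy as np

    def scene_partition_line(core_points):
        """Formula (3): average of the extreme y coordinates of the core points."""
        ys = np.asarray(core_points)[:, 1]
        return (ys.max() + ys.min()) / 2.0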

In step S32, the adaptive scene partition line obtained above separates the two feature point sets Fu and Fd for the subsequent analysis of the feature point distribution. The grouping formula is shown in (4), where T_SM is the adaptive scene line value used to classify the feature points in the D_p set; in FIG. 6, red circles denote the Fu set and yellow circles the Fd set: Fu = { p ∈ D_p | y_p ≤ T_SM }, Fd = { p ∈ D_p | y_p > T_SM } (4)

Because the feature point distribution in the scene affects the homography matrix, in-depth analysis and experiments show that if the feature points concentrate in certain regions, the homography matrix leans toward the blocks where they concentrate, so the corrected frame as a whole becomes unreasonable and the stitched region tends to break apart. To find the candidate feature point set with the most uniform distribution, the sizes of the Fu and Fd groups are counted and their ratio computed, and it is checked whether the ratio lies between R_min and R_max, where Num(Fu) is the number of feature points of Fu and Num(Fd) that of Fd; if condition (5) is met, the candidate feature point set (candidate matrix) is stored into the candidate matrix set HC: IF R_min ≤ Num(Fu)/Num(Fd) ≤ R_max, THEN D_p is stored into HC (5)
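A sketch combining formulas (4) and (5); R_min and R_max are left as parameters since the patent does not fix their values here:

    def is_candidate(core_points, t_sm, r_min, r_max):
        """Formulas (4)/(5): split points at the scene line and test the count ratio."""
        fu = [p for p in core_points if p[1] <= t_sm]   # upper set
        fd = [p for p in core_points if p[1] > t_sm]    # lower set
        if not fd:                                      # avoid division by zero
            return False
        ratio = len(fu) / len(fd)
        return r_min <= ratio <= r_max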

In step S33, once the number of sets admitted under condition (5) to the candidate matrix set (i.e., the candidate feature point set HC) reaches the configured target, selection of the optimal perspective transform matrix begins. The selection criteria are chiefly the number of feature point matches and their even distribution, so the selection procedure computes the number of feature point matches of each matrix in the candidate set, picks the one with the most, and forms its homography matrix as the optimal perspective transform matrix, as in formula (6), where Num(HC(i)) is the function counting the feature point matches of matrix HC(i), M_HC is the total number of matrices in the candidate set HC, HC(ω) is the matrix with the most feature point matches, and HC_best is the selected optimal perspective transform matrix. The rationale is that when the feature points of a candidate matrix are distributed more evenly, the computed homography matrix is less prone to skew; and the more feature point matches there are, the more stable the homography matrix becomes, so other regions are less likely to be ignored during correction and a better result is produced.

HC_best = HC(ω), ω = arg max Num(HC(i)), i = 1, ..., M_HC (6)
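A sketch of the selection in formula (6) followed by the homography fit; using OpenCV's LMedS-based findHomography here reflects the estimator named in step S4, though the exact call is an assumption:

    import cv2
    import numpy as np

    def best_perspective_matrix(candidates):
        """Formula (6): pick the candidate set with the most matches, fit its homography.

        candidates -- list of (src_pts, dst_pts) arrays of matched coordinates.
        """
        src, dst = max(candidates, key=lambda c: len(c[0]))
        H, _ = cv2.findHomography(np.float32(src), np.float32(dst), cv2.LMEDS)
        return H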

Step S4 further comprises step S41: correcting and compensating the matched image with the optimal perspective transform matrix. In general, both the perspective transform and the affine transform are used for image correction. Although the change between adjacent frames is small, the affine transform requires three stable feature points to be designated as correction points, yet the captured frames often contain multiple moving objects that disturb the selection of stable feature points, and the correction result is affected by the quality of the chosen points. The present invention therefore adopts the perspective transform combined with Least Median of Squares (LMedS) to select comparatively stable feature points as correction points. This method has a high breakdown point (about 50%, the highest attainable among all robust regression estimators, meaning extreme values of outlier feature points have little influence on LMedS), so it is less affected by outlier feature points and can effectively classify and screen out multiple outliers in the data, preventing them from affecting the transform matrix. However, because the overlapping region of surveillance footage is small, the number of detectable feature points is limited, and overly strong edge points of background buildings easily make the feature points over-concentrate in certain regions, so the matrix parameters obtained by least median of squares can converge toward outlying positions. The earlier DBSCAN screening removes most of the edge feature points within a region and keeps only its core points, and the subsequent search for the feature point set whose Fu/Fd ratio is near the average then provides a more effective and stable homography matrix.

After the optimal perspective transform matrix is obtained, the present invention performs image correction on the target image. The perspective transform mainly comprises scaling, translation, and rotation and performs the three-dimensional warp through a projective linear transformation; by the nature of projective geometry, a projective transformation preserves neither size nor angle but preserves incidence and the cross ratio, as in formula (7): x ≃ K_x H_xy K_y⁻¹ y (7)

where x and y are the perspective points of an overlapping pixel P_i in the two frames, and K_x and K_y are the intrinsic parameter matrices. Under Cartesian coordinates the projection ray is a nonlinear transformation, so the division required by perspective projection cannot be carried out by matrix multiplication alone; H_xy is given by formula (8): H_xy = R − t nᵀ / d (8)

where R is the rotation matrix between x and y, n and d are respectively the normal vector of the plane and the distance to the plane, and t is the translation vector from x to y. The transform is processed as in formula (9): x' = (h₁₁x + h₁₂y + h₁₃) / (h₃₁x + h₃₂y + h₃₃), y' = (h₂₁x + h₂₂y + h₂₃) / (h₃₁x + h₃₂y + h₃₃) (9)

The above requires at least four matching points to solve for the eight unknowns. To achieve image correction, image alignment, or the computation of camera motion (rotation and translation) between images, the error of the back-projection formula (10) must be minimized; since the homography matrix is defined only up to scale, formula (10) is normalized so that h₃₃ = 1: Σᵢ [ (x'ᵢ − (h₁₁xᵢ + h₁₂yᵢ + h₁₃)/(h₃₁xᵢ + h₃₂yᵢ + h₃₃))² + (y'ᵢ − (h₂₁xᵢ + h₂₂yᵢ + h₂₃)/(h₃₁xᵢ + h₃₂yᵢ + h₃₃))² ] (10)
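Applying the normalized homography to warp the target frame, sketched with OpenCV; the canvas size parameter is an assumption:

    import cv2

    def correct_target(target, H, canvas_size):
        """Warp the target frame with the optimal perspective transform matrix."""
        H = H / H[2, 2]          # normalize so that h33 = 1
        return cv2.warpPerspective(target, H, canvas_size)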

The present invention obtains those eight unknowns with the Least Median of Squares (LMedS) method, whose algorithm is described below.

From formula (10) it follows that the eight unknowns of the homography matrix converted from the images are required, these eight unknowns corresponding to four points of the feature point group. The least median of squares is used here to find the solution: all the data are first sorted and the values of the middle block taken; for this block the squared distance of each group of points to the line is computed, and the data group with the smallest value is selected as the needed unknowns. The flow of this method is shown in formulas (11), (12), and (13): lᵢ = {(aᵢ, bᵢ) | i = 1, 2, 3, ..., l} (11)

where lᵢ is the line segment through a and b, and aᵢ and bᵢ are any two points.

r = |ax + by + 1| (12)

This formula computes the distance from a point to the line, where r is the residual.

In formula (13), Mⱼ is the result computed from the two formulas (11) and (12) above, n is the total number of data point groups, l is the total number of groups taken to draw lines, and pⱼ is the distance of the taken points; the value obtained through this formula is the unknown parameter required in the homography matrix: Mⱼ = med(rᵢ,ⱼ²), i = 1, ..., n, with the group j minimizing Mⱼ selected, j = 1, ..., l (13)
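A compact sketch of the LMedS idea in formulas (11)-(13) for line fitting; the random sampling loop and trial count are illustrative assumptions:

    import numpy as np

    def lmeds_line(points, trials=100, seed=0):
        """Fit ax + by + 1 = 0 by least median of squared residuals (formulas 11-13)."""
        pts = np.asarray(points, dtype=np.float64)
        rng = np.random.default_rng(seed)
        best, best_med = None, np.inf
        for _ in range(trials):
            p, q = pts[rng.choice(len(pts), 2, replace=False)]
            try:
                a, b = np.linalg.solve(np.array([p, q]), [-1.0, -1.0])  # line through p, q
            except np.linalg.LinAlgError:
                continue
            r = np.abs(pts @ [a, b] + 1.0)        # formula (12) residuals
            med = np.median(r ** 2)               # formula (13) median of squares
            if med < best_med:
                best, best_med = (a, b), med
        return best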

Step S5 further comprises step S51: stitching the overlapping region with multi-band blending; and step S52: stitching the non-overlapping regions with linear blending.

To continue, the optimal perspective transform matrix obtained above is used to correct the subsequent image sequence (the video), so the computationally heavier feature point detection need not be repeated for every frame, which greatly reduces the total processing time. For the stitching itself, the minimum XMin(p) and maximum XMax(p) of the x coordinates in the D_p feature point set are found from the feature point distribution, and the image Img(x, y) is divided into an overlapping region overlap(x, y) and a non-overlapping region non-overlap(x, y) as in formula (14), where D_p is the feature point set after DBSCAN clustering. The overlapping region, which distorts more easily, is processed with multi-band blending, while the non-overlapping region, which is less prone to breakage, is processed with the computationally cheaper linear blending; this raises execution speed and yields a smoother stitching result. The present invention achieves real-time video stitching in this manner. overlap(x, y) = { Img(x, y) | XMin(p) ≤ x ≤ XMax(p) }, non-overlap(x, y) = { Img(x, y) | otherwise }, p ∈ D_p (14)
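Splitting a frame by the x extent of the clustered feature points, per formula (14); column-wise slicing is my reading of the condition:

    import numpy as np

    def split_regions(img, core_points):
        """Formula (14): columns within [XMin(p), XMax(p)] form the overlap region."""
        xs = np.asarray(core_points)[:, 0]
        x_min, x_max = int(xs.min()), int(xs.max())
        overlap = img[:, x_min:x_max + 1]
        non_overlap_left = img[:, :x_min]
        non_overlap_right = img[:, x_max + 1:]
        return overlap, non_overlap_left, non_overlap_right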

In step S51, because parallax frequently occurs in the overlapping region and readily produces distortion, the multi-band blending algorithm stitches the image finely. The base image of the input pair (left and right frames) and the corrected target image are first cropped to their respective overlapping regions overlapᵢ(x, y), and overlapᵢ^σ(x, y) is then constructed through formula (15), where i indexes the input images, g_σ(x, y) is the Gaussian blur function, and * denotes convolution: overlapᵢ^σ(x, y) = g_σ(x, y) * overlapᵢ(x, y) (15)

Next, overlapᵢ^kσ(x, y) and overlapᵢ^(k+1)σ(x, y) obtained at different scales are subtracted to give Bᵢ^(k+1)σ(x, y), as in formula (16), where k is the number of frequency bands and Bᵢ^(k+1)σ(x, y) is the spatial-domain information in the scale interval [kσ, (k+1)σ]: Bᵢ^(k+1)σ(x, y) = overlapᵢ^kσ(x, y) − overlapᵢ^(k+1)σ(x, y) (16)

In formula (17), Wᵢ^(k+1)σ is the mask weight in the scale interval [kσ, (k+1)σ], obtained by convolving the mask weight Wᵢ^kσ(x, y) with the Gaussian blur function g_σ(x, y): Wᵢ^(k+1)σ(x, y) = g_σ(x, y) * Wᵢ^kσ(x, y) (17)

Formula (18) stitches the spatial-domain information Bᵢ^kσ(x, y) in scale band k with the corresponding mask weights Wᵢ^kσ(x, y), where overlap_multi^kσ(x, y) is the stitching result in that band: overlap_multi^kσ(x, y) = Σᵢ Bᵢ^kσ(x, y) Wᵢ^kσ(x, y) / Σᵢ Wᵢ^kσ(x, y) (18)

For the overlapping-region images, high-band (small k) blending stitches the fine detail, while low-band (large k) blending restores large-scale regions such as broad areas of color and tone; finally the multiple bands are superimposed.
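A condensed sketch of the band decomposition in formulas (15)-(18), assuming two aligned overlap crops and a float mask of the same shape (1.0 where the first crop should dominate); the band count and sigma are assumptions:

    import cv2
    import numpy as np

    def multi_band_blend(overlap_a, overlap_b, mask, bands=4, sigma=2.0):
        """Blend two overlap crops band by band (formulas 15-18)."""
        a = overlap_a.astype(np.float64)
        b = overlap_b.astype(np.float64)
        w = mask.astype(np.float64)
        result = np.zeros_like(a)
        for _ in range(bands):
            a_blur = cv2.GaussianBlur(a, (0, 0), sigma)
            b_blur = cv2.GaussianBlur(b, (0, 0), sigma)
            w_blur = cv2.GaussianBlur(w, (0, 0), sigma)
            # Formula (16): band-pass detail for this scale interval.
            band_a, band_b = a - a_blur, b - b_blur
            # Formula (18): weighted combination of the two bands.
            result += band_a * w_blur + band_b * (1.0 - w_blur)
            a, b, w = a_blur, b_blur, w_blur
        result += a * w + b * (1.0 - w)      # lowest band restores color/tone
        return np.clip(result, 0, 255).astype(np.uint8)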

In step S52, the non-overlapping regions are processed with the computationally cheaper linear blending algorithm, since they are less prone to breakage; this raises execution speed and yields a smoother stitching result. Stitching of the non-overlapping regions mainly performs linear blending of the completed overlapping region with the non-overlapping regions of the base frame and the target frame respectively, so that a complete stitched image is obtained. Formula (19) computes the weight α from the stitched-image width Width_img and the current pixel position Width_target; formula (20) uses α to blend the overlap pixels overlap(x, y) linearly with the left-frame pixels beside the overlap (non-overlap_L(x, y)) and stores the result temporarily in Temp_img(x, y) for the next step. Temp_img(x, y) is then blended with the right-frame pixels beside the overlap (non-overlap_R(x, y)) in a final stitching pass to obtain the result image result(x, y), as in formula (21). FIG. 8 shows the stitching of the non-overlapping regions: panel (a) is the linear stitching result of the overlapping region (FIG. 7(e)) with the non-overlapping region of the base frame (that is, non-overlap_L(x, y)); panel (b) is the linear stitching result of panel (a) with the non-overlapping region of the target frame (that is, non-overlap_R(x, y)), which is the final stitching result. α = Width_target / Width_img (19)

Temp_img(x, y) = α × overlap(x, y) + (1 − α) × non-overlap_L(x, y) (20)

result(x, y) = α × non-overlap_R(x, y) + (1 − α) × Temp_img(x, y) (21)
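The per-column weighting of formulas (19)-(21), sketched for one row-aligned blend; the direction of the alpha ramp is an assumption:

    import numpy as np

    def linear_blend(overlap, non_overlap, left_to_right=True):
        """Formulas (19)-(21): per-column alpha ramp across the blend width."""
        width = overlap.shape[1]
        alpha = np.linspace(0.0, 1.0, width)           # formula (19) per column
        if not left_to_right:
            alpha = alpha[::-1]
        alpha = alpha.reshape(1, width, *([1] * (overlap.ndim - 2)))
        # Formulas (20)/(21): weighted sum of the two same-size images.
        out = alpha * overlap + (1.0 - alpha) * non_overlap
        return out.astype(overlap.dtype)

Calling this once for the left seam (formula 20) and once more on the intermediate result for the right seam (formula 21) reproduces the two-pass structure described above.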

As stated above, the real-time video frame stitching method of the present invention can adapt to the environment when stitching and dynamically computes the overlapping region, and the stitching result is an intuitive image output; this not only aids the observation of monitoring personnel but also lends an auxiliary effect to subsequent intelligent detection system applications.

The embodiments described above serve only to illustrate the technical ideas and features of the present invention; their purpose is to enable persons skilled in the art to understand the content of the invention and implement it accordingly, and they shall not limit the patent scope of the invention. Any equivalent change or modification made in accordance with the spirit disclosed by the present invention shall still be covered by the patent scope of the invention.

Claims (6)

1. A real-time video frame stitching method, comprising the following steps: downsampling a plurality of input images by bilinear interpolation and normalizing the plurality of input images to produce a plurality of normalized input images; searching the plurality of normalized input images for a plurality of feature points with the SIFT feature point detection algorithm; screening the plurality of feature points with the RANSAC algorithm to remove those feature points in the non-overlapping regions of each normalized input image that cannot be matched; obtaining a plurality of feature point clusters and their core points by applying the DBSCAN algorithm to the feature points remaining after the unmatchable feature points in the non-overlapping regions of each normalized input image have been removed; extracting a maximum y coordinate and a minimum y coordinate from the y coordinates of the plurality of core points and averaging the two to obtain the y position of a scene partition line; dividing the plurality of core points into an upper feature point set and a lower feature point set according to the scene partition line, counting the feature points of the upper feature point set and of the lower feature point set, and computing their ratio, the upper and lower feature point sets that satisfy a preset ratio range serving as candidate matrices; computing the number of feature point matches in each candidate matrix and selecting the candidate matrix having the most feature point matches as the optimal perspective transform matrix; using the optimal perspective transform matrix to perform image correction on a target image among the plurality of input images by means of the perspective transform and least median of squares, the perspective transform comprising scaling, translation, and rotation and performing three-dimensional warping through a projective linear transformation; and dividing the plurality of input images into an overlapping region and a non-overlapping region according to the distribution of the plurality of core points obtained by the DBSCAN algorithm, stitching the overlapping region with multi-band blending, stitching the non-overlapping region with linear blending, and finally superimposing the stitched overlapping region and non-overlapping region to produce a stitched image.

2. The real-time video frame stitching method of claim 1, wherein, when the unmatchable feature points are removed, the distribution of the plurality of feature points is analyzed to estimate a model of the reasonable distribution of outliers and inliers, after which the unreasonable outliers are removed.
3. The real-time video frame stitching method of claim 2, wherein raising the accuracy of the model estimating the reasonable distribution corresponds to raising the number of iterations, as shown in the following formula: 1 − P = (1 − O^M)^K, where P is the probability of computing a reasonable group, K is the number of iterations, O is the number of inlier points, and M is the number of feature points selected by the reasonable-distribution model.
4. The real-time video frame stitching method of claim 1, wherein the grouping formula dividing the plurality of core points into the upper feature point set and the lower feature point set is as follows: Fu = { p ∈ D_p | y_p ≤ T_SM }, Fd = { p ∈ D_p | y_p > T_SM }, where T_SM is the adaptive scene line value, D_p is the feature point cluster obtained by the DBSCAN algorithm, p is a feature point in D_p, Fu is the upper feature point set, and Fd is the lower feature point set.
5. The real-time video frame stitching method of claim 1, wherein the candidate matrix satisfies the following condition: IF R_min ≤ Num(Fu)/Num(Fd) ≤ R_max, THEN D_p is stored into HC, where HC is the candidate matrix set, R_min is the lower limit of the preset ratio range, R_max is the upper limit of the preset ratio range, Num(Fu) is the number of feature points of the upper feature point set, and Num(Fd) is the number of feature points of the lower feature point set.
6. The real-time video frame stitching method of claim 1, wherein a maximum x coordinate and a minimum x coordinate are extracted from the x coordinates of the plurality of core points, and the input image is divided accordingly into the overlapping region and the non-overlapping region, satisfying the following condition: overlap(x, y) = { Img(x, y) | XMin(p) ≤ x ≤ XMax(p) }, non-overlap(x, y) = { Img(x, y) | otherwise }, p ∈ D_p, where Img(x, y) is the input image, XMin(p) is the minimum x coordinate, XMax(p) is the maximum x coordinate, D_p is the feature point cluster obtained by the DBSCAN algorithm, and p is a feature point in D_p.
TW106141636A 2017-11-29 2017-11-29 Real-time video stitching method TWI639136B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106141636A TWI639136B (en) 2017-11-29 2017-11-29 Real-time video stitching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW106141636A TWI639136B (en) 2017-11-29 2017-11-29 Real-time video stitching method

Publications (2)

Publication Number Publication Date
TWI639136B true TWI639136B (en) 2018-10-21
TW201926244A TW201926244A (en) 2019-07-01

Family

ID=64802927

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106141636A TWI639136B (en) 2017-11-29 2017-11-29 Real-time video stitching method

Country Status (1)

Country Link
TW (1) TWI639136B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930298A (en) * 2019-11-29 2020-03-27 北京市商汤科技开发有限公司 Image processing method and apparatus, image processing device, and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827975A (en) 2016-04-26 2016-08-03 电子科技大学 Color on-line correction method for panoramic video stitching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Brown, Matthew, and David G. Lowe. "Automatic panoramic image stitching using invariant features", International Journal of Computer Vision, Vol.74, No. 1, pp. 59-73, 2007.
He, Bin, Gang Zhao, and Qifang Liu. "Panoramic video stitching in multi-camera surveillance system", 25th International Conference of Image and Vision Computing New Zealand (IVCNZ), Queenstown, New Zealand, pp. 1-6, 2010.

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584158A (en) * 2018-10-25 2019-04-05 成都城电电力工程设计有限公司 A kind of distant view photograph automatic splicing method
TWI676964B (en) * 2018-12-11 2019-11-11 晶睿通訊股份有限公司 Image stitching method and related monitoring camera device
US10915988B2 (en) 2018-12-11 2021-02-09 Vivotek Inc. Image stitching method and related monitoring camera device
WO2020135394A1 (en) * 2018-12-28 2020-07-02 清华大学 Video splicing method and device
US11538177B2 (en) 2018-12-28 2022-12-27 Tsinghua University Video stitching method and device
CN111640187A (en) * 2020-04-20 2020-09-08 中国科学院计算技术研究所 Video splicing method and system based on interpolation transition
CN111640187B (en) * 2020-04-20 2023-05-02 中国科学院计算技术研究所 Video stitching method and system based on interpolation transition
CN112215925A (en) * 2020-11-05 2021-01-12 中国矿业大学 Self-adaptive follow-up tracking multi-camera video splicing method for coal mining machine
CN117541764A (en) * 2024-01-09 2024-02-09 北京大学 Image stitching method, electronic equipment and storage medium
CN117541764B (en) * 2024-01-09 2024-04-05 北京大学 Image stitching method, electronic equipment and storage medium

Also Published As

Publication number Publication date
TW201926244A (en) 2019-07-01

Similar Documents

Publication Publication Date Title
TWI639136B (en) Real-time video stitching method
CN105245841B (en) A kind of panoramic video monitoring system based on CUDA
Li et al. Supervised people counting using an overhead fisheye camera
CN104392416B (en) Video stitching method for sports scene
CN110992263B (en) Image stitching method and system
CN111815517B (en) Self-adaptive panoramic stitching method based on snapshot pictures of dome camera
Bonny et al. Feature-based image stitching algorithms
CN105023260A (en) Panorama image fusion method and fusion apparatus
CN111127353A (en) High-dynamic image ghost removing method based on block registration and matching
TWI496115B (en) Video frame stabilization method for the moving camera
CN112465702B (en) Synchronous self-adaptive splicing display processing method for multi-channel ultrahigh-definition video
CN114331835A (en) Panoramic image splicing method and device based on optimal mapping matrix
CN106971381B (en) A kind of wide angle camera visual field line of demarcation generation method with the overlapping ken
CN114926508B (en) Visual field boundary determining method, device, equipment and storage medium
CN115619636A (en) Image stitching method, electronic device and storage medium
Hu et al. Digital video stabilization based on multilayer gray projection
KR20160000533A (en) The method of multi detection and tracking with local feature point for providing information of an object in augmented reality
Neves et al. A Master‐Slave Calibration Algorithm with Fish‐Eye Correction
Pan et al. Parallax-tolerant image stitching based on mesh optimization
CN113870307A (en) Target detection method and device based on interframe information
Jagadeeswari et al. A comparative study based on video stitching methods
Chen et al. Applying Image Processing Technology to Automatically Detect and Adjust Paper Benchmark for Printing Machine.
CN113723465B (en) Improved feature extraction method and image stitching method based on same
Yang et al. RTStitch: Real-time Stitching of High-resolution PCB Images
CN111696161B (en) Calibration method and system for external parameters of double-station camera

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees