TWI795885B

TWI795885B - Visual positioning method, device and computer-readable storage medium

Info

Publication number: TWI795885B
Application number: TW110131503A
Authority: TW
Inventors: 黃凱; 章國鋒; 鮑虎軍; 王楠; 舒向前
Original assignee: 大陸商浙江商湯科技開發有限公司
Priority date: 2020-10-23
Filing date: 2021-08-25
Publication date: 2023-03-11
Also published as: CN112348889A; JP7280385B2; WO2022083038A1; KR20220054582A; JP2023502192A; TW202217755A

Abstract

The present disclosure provides a visual positioning method, a device and a computer-readable storage medium. The visual positioning method includes: obtaining gravity information of a camera; using the gravity information to obtain camera pose parameters of a current image captured by the camera in a preset motion state; and, based on the camera pose parameters of the current image, obtaining camera pose parameters of an image to be processed after the current image. This solution reduces the cost of using visual positioning technology and expands its scope of use.

Description

Visual positioning method, device and computer-readable storage medium

The present invention relates to the technical field of computer vision, and in particular to a visual positioning method, a device and a computer-readable storage medium.

With the development of electronic information technology, visual positioning technologies such as SLAM (Simultaneous Localization And Mapping) have gradually been applied to fields such as autonomous driving, indoor navigation, AR (Augmented Reality) and VR (Virtual Reality).

Visual positioning technologies such as SLAM accomplish tasks such as autonomous positioning and navigation of a mobile device by obtaining the camera pose of the mobile device, which is essentially a complex mathematical problem. At present, such technologies depend on sensors in hardware and usually require a camera, an accelerometer, a gravimeter, an IMU (Inertial Measurement Unit) and other sensors. In practice, however, generally only mid-range and high-end mobile devices are fully equipped with these sensors. Low-end mobile devices are generally equipped with fewer sensors and generally have no IMU, so existing visual positioning technology is costly to use and narrow in scope. In view of this, how to reduce the cost of using visual positioning technology and expand its scope of use has become an urgent problem to be solved.

The present invention provides a visual positioning method, a device and a computer-readable storage medium.

A first aspect of the present invention provides a visual positioning method, including: obtaining gravity information of a camera; using the gravity information to obtain camera pose parameters of a current image captured by the camera in a preset motion state; and, based on the camera pose parameters of the current image, obtaining camera pose parameters of an image to be processed after the current image.

Therefore, the gravity information of the camera is obtained and used to obtain the camera pose parameters of the current image captured by the camera in the preset motion state, and the camera pose parameters of the image to be processed after the current image are obtained based on the camera pose parameters of the current image. Visual positioning can thus rely on only the camera and the gravity information, which reduces the cost of using visual positioning technology and expands its scope of use.

The gravity information includes gravity direction information. Before obtaining, based on the camera pose parameters of the current image, the camera pose parameters of the image to be processed after the current image, the method further includes: obtaining feature direction information of feature points in the current image; and using the feature direction information of the feature points and the gravity direction information to obtain depth information of the feature points in the current image. Obtaining, based on the camera pose parameters of the current image, the camera pose parameters of the image to be processed after the current image includes: based on the depth information of the feature points in the current image and the camera pose parameters of the current image, obtaining depth information of feature points in the image to be processed after the current image and the camera pose parameters of the image to be processed.

Therefore, by obtaining the feature direction information of the feature points in the current image and using it together with the gravity direction information contained in the gravity information to obtain the depth information of the feature points in the current image, both the depth information of the feature points in the current image and the camera pose parameters of the current image can be initialized from the current image alone. Based on these, the depth information of the feature points in the image to be processed after the current image and the camera pose parameters of the image to be processed can be obtained without scanning multiple frames for initialization, which improves the response speed of visual positioning.

The feature direction information includes a direction vector of the feature point, the gravity direction information includes a gravity vector, and the depth information includes a depth value of the feature point. Using the feature direction information of the feature points and the gravity direction information to obtain the depth information of the feature points in the current image includes: performing a first preset operation on the direction vector of the feature point and the gravity vector to obtain the included angle between them; and performing a second preset operation on a preset height of the camera and the included angle to obtain the depth value of the feature point.

Therefore, with the feature direction information including the direction vector of the feature point, the gravity direction information including the gravity vector, and the depth information including the depth value of the feature point, the first preset operation on the direction vector and the gravity vector yields the included angle between them, and the second preset operation on the preset height of the camera and the included angle yields the depth value of the feature point. This helps reduce the computational complexity of obtaining the depth value of the feature point.

The first preset operation includes an inner product operation, and/or the second preset operation includes dividing the preset height by the cosine of the included angle.

Therefore, setting the first preset operation to include an inner product operation helps reduce the complexity of obtaining the included angle between the direction vector and the gravity vector, and setting the second preset operation to include dividing the preset height by the cosine of the included angle helps reduce the complexity of obtaining the depth value.
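The two preset operations described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `feature_depth`, the tuple representation of vectors, and the guard against rays that do not point toward the ground are assumptions added here. The included angle is taken from the normalized inner product, and the depth is the preset camera height divided by its cosine.

```python
import math

def feature_depth(direction, gravity, camera_height):
    """Depth of a feature point from its viewing ray and the gravity vector.

    direction: viewing-ray direction of the feature point (camera frame).
    gravity: gravity vector in the same frame.
    camera_height: preset height of the camera (the symbol h in the text).
    All three names are hypothetical and used only for this sketch.
    """
    # First preset operation: inner product, normalized to give cos(alpha).
    dot = sum(d * g for d, g in zip(direction, gravity))
    norm_d = math.sqrt(sum(d * d for d in direction))
    norm_g = math.sqrt(sum(g * g for g in gravity))
    cos_alpha = dot / (norm_d * norm_g)
    # Assumed guard: a ray at or above the horizontal gives no finite depth.
    if cos_alpha <= 1e-6:
        raise ValueError("feature ray does not point toward the ground")
    # Second preset operation: preset height divided by cos(alpha).
    return camera_height / cos_alpha
```

For a ray that coincides with the gravity direction, the depth equals the preset height; as the included angle grows toward 90 degrees, the depth grows without bound, which is why the sketch rejects such rays.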

Obtaining, based on the depth information of the feature points in the current image and the camera pose parameters of the current image, the depth information of the feature points in the image to be processed after the current image and the camera pose parameters of the image to be processed includes: tracking the depth information of the feature points in the current image and the camera pose parameters of the current image using a preset pose tracking method, to obtain depth information of feature points in the next frame image after the current image and camera pose parameters of the next frame image; and taking the next frame image as the current image and re-executing the tracking step and the subsequent steps.

Therefore, the depth information of the feature points in the current image and the camera pose parameters of the current image are tracked using the preset pose tracking method to obtain the depth information of the feature points in the next frame image and the camera pose parameters of the next frame image; the next frame image is then taken as the current image, and the tracking step and the subsequent steps are re-executed. The camera pose parameters can thus be computed frame by frame, which helps reduce the cumulative error of the camera pose parameters.

Tracking the depth information of the feature points in the current image and the camera pose parameters of the current image using the preset pose tracking method, to obtain the depth information of the feature points in the next frame image and the camera pose parameters of the next frame image, includes: using the depth information of the feature points in the current image to determine projection points of the feature points in the next frame image; obtaining pose transformation parameters between the current image and the next frame image based on the difference between the pixel values of the local region of each feature point in the current image and the pixel values of the local region of its projection point in the next frame image; using the pose transformation parameters and the camera pose parameters of the current image to obtain the camera pose parameters of the next frame image; optimizing the camera pose parameters of the next frame image using three-dimensional points that have already converged; and obtaining a probability distribution of the depth information of the feature points and using the probability distribution to obtain the depth information of the feature points in the next frame image.

Therefore, the depth information of the feature points in the current image is used to determine the projection points of the feature points in the next frame image; the pose transformation parameters between the current image and the next frame image are obtained from the difference between the pixel values of the local region of each feature point in the current image and the pixel values of the local region of its projection point in the next frame image; and the pose transformation parameters and the camera pose parameters of the current image are used to obtain the camera pose parameters of the next frame image. Optimizing the camera pose parameters of the next frame image with three-dimensional points that have already converged refines them further, which helps improve their accuracy. Obtaining the probability distribution of the depth information of the feature points, and using it to obtain the depth information of the feature points in the next frame image, allows the depth information to be optimized during shooting. Here, the camera pose parameters and the depth information are optimized separately, which reduces the amount of optimization computation.

The camera pose parameters include a rotation parameter and a displacement parameter. After obtaining, based on the camera pose parameters of the current image, the camera pose parameters of the image to be processed after the current image, the method further includes: in response to the camera pose parameters of the image to be processed not satisfying a preset steady-state condition, determining that the displacement parameter of the image to be processed cannot be obtained; and using the pixel values of the previous frame image of the image to be processed and the camera pose parameters of the previous frame image to obtain the rotation parameter of the image to be processed.

Therefore, the camera pose parameters include a rotation parameter and a displacement parameter, and after the camera pose parameters of the image to be processed are obtained, if they do not satisfy the preset steady-state condition, it is determined that the displacement parameter of the image to be processed cannot be obtained, and the rotation parameter of the image to be processed is obtained from the pixel values of the previous frame image and the camera pose parameters of the previous frame image. When the camera pose parameters are inaccurate, the rotation parameter can thus be estimated directly from image pixels, which helps reduce the probability of problems such as virtual objects sticking to the screen in virtual reality, caused by the rotation parameter failing to update.

Using the pixel values of the previous frame image of the image to be processed and the camera pose parameters of the previous frame image to obtain the rotation parameter of the image to be processed includes: performing a projection transformation on at least some pixels in the previous frame image using pose transformation parameters between the image to be processed and the previous frame image, to obtain projection points of the at least some pixels in the image to be processed; constructing an objective function with respect to the pose transformation parameters from the difference between the pixel values of the at least some pixels in the previous frame image and the pixel values of the corresponding projection points in the image to be processed; and transforming the camera pose parameters of the previous frame image using the pose transformation parameters obtained by solving the objective function, to obtain the rotation parameter of the image to be processed.

Therefore, a projection transformation is performed on at least some pixels in the previous frame image using the pose transformation parameters between the image to be processed and the previous frame image, to obtain the projection points of those pixels in the image to be processed; an objective function with respect to the pose transformation parameters is constructed from the difference between the pixel values of those pixels in the previous frame image and the pixel values of the corresponding projection points in the image to be processed; and the camera pose parameters of the previous frame image are transformed using the pose transformation parameters obtained by solving the objective function, to obtain the rotation parameter of the image to be processed. Because the rotation parameter can be obtained from only some of the pixels, the amount of computation required to calculate it is reduced.

Before performing the projection transformation on at least some pixels in the previous frame image using the pose transformation parameters between the image to be processed and the previous frame image, to obtain the projection points of the at least some pixels in the image to be processed, the method further includes: down-sampling the previous frame image to obtain a thumbnail image of the previous frame image. Performing the projection transformation on at least some pixels in the image to be processed using the pose transformation parameters between the image to be processed and the previous frame image, to obtain the projection points of the at least some pixels in the image to be processed, includes: performing a projection transformation on the pixels in the thumbnail image using the pose transformation parameters between the image to be processed and the previous frame image, to obtain projection points of the pixels of the thumbnail image in the image to be processed.

Therefore, the previous frame image is down-sampled to obtain a thumbnail image, and the pixels in the thumbnail image are projectively transformed using the pose transformation parameters between the image to be processed and the previous frame image, to obtain the projection points of those pixels in the image to be processed for the subsequent construction and solution of the objective function. This helps reduce the amount of computation required to calculate the rotation parameter.

After using the pixel values of the previous frame image of the image to be processed and the camera pose parameters of the previous frame image to obtain the rotation parameter of the image to be processed, the method further includes: detecting the current acceleration information of the camera, and determining whether the acceleration information indicates the preset motion state; if so, re-executing the step of obtaining the gravity information of the camera and the subsequent steps; if not, re-executing the step of detecting the current acceleration information of the camera and the subsequent steps.

Therefore, after the rotation parameter of the image to be processed is obtained, the current acceleration information of the camera is detected and it is determined whether the acceleration information indicates the preset motion state. If the camera is in the preset motion state, the step of obtaining the gravity information of the camera and the subsequent steps are re-executed; if not, the step of detecting the current acceleration information of the camera and the subsequent steps are re-executed. This helps improve the robustness of visual positioning.

The gravity information includes gravity direction information, and the camera pose parameters include a rotation parameter and a displacement parameter. Using the gravity information to obtain the camera pose parameters of the current image captured by the camera in the preset motion state includes: using the gravity direction information to obtain the rotation angles of the camera relative to the x, y and z coordinate axes of the world coordinate system, where the gravity direction after the camera is rotated by the rotation angles coincides with the negative direction of the z coordinate axis; and using the rotation angles to obtain the rotation parameter, and setting the displacement parameter to a preset value.

Therefore, the gravity direction information is used to obtain the rotation angles of the camera relative to the x, y and z coordinate axes of the world coordinate system, such that the gravity direction after the camera is rotated by the rotation angles coincides with the negative direction of the z coordinate axis; the rotation angles are used to obtain the rotation parameter, and the displacement parameter is set to a preset value. The rotation parameter can thus be obtained through gravity alignment and the camera pose parameters initialized, which helps reduce the amount of computation required to initialize the camera pose parameters.

The origin of the world coordinate system is the position of the camera when it captures the current image, and the preset value is 0.

Therefore, setting the origin of the world coordinate system to the position of the camera when it captures the current image, and setting the preset value to 0, helps reduce the complexity of initializing the displacement parameter.
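The gravity-alignment initialization described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the text speaks of rotation angles about the x, y and z axes, whereas this sketch builds an equivalent 3x3 rotation matrix directly with Rodrigues' formula, and the names `gravity_aligned_pose`, `mat_vec` and `mat_mul` are hypothetical. Gravity alone does not constrain rotation about the gravity axis, so the sketch simply picks the smallest rotation that aligns the two vectors.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def mat_mul(a, b):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def gravity_aligned_pose(gravity):
    """Return (rotation, translation) so that rotated gravity points along -z.

    gravity: measured gravity vector in the camera frame (any magnitude).
    The translation is the preset value 0, since the world origin is the
    camera position at the current image.
    """
    n = math.sqrt(sum(g * g for g in gravity))
    a = [g / n for g in gravity]      # unit gravity direction
    b = [0.0, 0.0, -1.0]              # target: negative z axis
    c = sum(x * y for x, y in zip(a, b))
    if c < -1.0 + 1e-9:
        # Gravity points along +z: rotate 180 degrees about the x axis.
        rotation = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    else:
        # Rodrigues' formula: R = I + K + K^2 / (1 + c), with K = [a x b]_x.
        v = [a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0]]
        k = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
        k2 = mat_mul(k, k)
        f = 1.0 / (1.0 + c)
        rotation = [[(1.0 if i == j else 0.0) + k[i][j] + f * k2[i][j]
                     for j in range(3)] for i in range(3)]
    translation = [0.0, 0.0, 0.0]
    return rotation, translation
```

Applying the returned rotation to the unit gravity direction should give a vector along the negative z axis, which is a quick sanity check for any alternative parameterization such as Euler angles.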

The preset motion state is a stationary state or a state of uniform motion; and/or the gravity information is obtained from the acceleration information of the camera in the preset state.

Therefore, setting the preset motion state to a stationary state or a state of uniform motion helps improve the accuracy of initializing the camera pose parameters of the current image. Obtaining the gravity information from the acceleration information of the camera in the preset state means that the gravity information can be obtained with an accelerometer alone, which further reduces the cost of using visual positioning technology and expands its scope of use.

A second aspect of the present invention provides a visual positioning device, including a gravity information acquisition part, a first pose acquisition part and a second pose acquisition part. The gravity information acquisition part is configured to obtain gravity information of a camera; the first pose acquisition part is configured to use the gravity information to obtain camera pose parameters of a current image captured by the camera in a preset motion state; and the second pose acquisition part is configured to obtain, based on the camera pose parameters of the current image, camera pose parameters of an image to be processed after the current image.

A third aspect of the present invention provides an electronic device, including a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the visual positioning method of the first aspect.

A fourth aspect of the present invention provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the visual positioning method of the first aspect.

A fifth aspect of the present invention provides a computer program including computer-readable code which, when run in an electronic device and executed by a processor of the electronic device, implements the visual positioning method of the first aspect.

In the above solution, the gravity information of the camera is obtained and used to obtain the camera pose parameters of the current image captured by the camera in the preset motion state, and the camera pose parameters of the image to be processed after the current image are obtained based on the camera pose parameters of the current image. Visual positioning can thus rely on only the camera and the gravity information, which reduces the cost of using visual positioning technology and expands its scope of use.

60: visual positioning device

61: gravity information acquisition part

62: first pose acquisition part

63: second pose acquisition part

70: electronic device

71: memory

72: processor

80: computer-readable storage medium

801: program instructions

S11~S13, S131~S133, S41~S47, S451~S453: steps

h: preset height of the camera

[symbol missing in source]: gravity vector of the feature point

α: included angle between the direction vector of the feature point and the gravity vector

[symbol missing in source]: direction vector of the feature point

FIG. 1 is a schematic flowchart of an embodiment of the visual positioning method of the present invention; FIG. 2 is a schematic diagram of an embodiment of obtaining depth information; FIG. 3 is a schematic flowchart of an embodiment of step S13 in FIG. 1; FIG. 4 is a schematic flowchart of another embodiment of the visual tracking method of the present invention; FIG. 5 is a schematic flowchart of an embodiment of step S45 in FIG. 4; FIG. 6 is a schematic framework diagram of an embodiment of the visual positioning device of the present invention; FIG. 7 is a schematic framework diagram of an embodiment of the electronic device of the present invention; FIG. 8 is a schematic framework diagram of an embodiment of the computer-readable storage medium of the present invention.

The solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

In the following description, details such as specific system structures, interfaces and techniques are set forth for purposes of illustration rather than limitation, in order to provide a thorough understanding of the present invention.

The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Furthermore, "multiple" herein means two or more.

Referring to FIG. 1, which is a schematic flowchart of an embodiment of the visual positioning method of the present invention, the visual positioning method may include the following steps.

Step S11: Obtain the gravity information of the camera.

In the embodiments of the present invention, the visual positioning method may be executed by a visual positioning device. For example, the method may be executed by a terminal device, a server or another processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the visual positioning method may be implemented by a processor invoking computer-readable instructions stored in a memory.

In the embodiments of the present invention, the camera may be integrated into a mobile device; the mobile device may include, but is not limited to, a mobile phone, a tablet computer, a robot, and the like. The steps in this embodiment and in the embodiments disclosed below may be executed by the mobile device, which is provided with a visual positioning apparatus. The mobile device may also integrate other sensors, such as an accelerometer, a gravimeter, or an IMU; these may be configured according to the actual application scenario and are not limited here. For example, owing to cost constraints, a low-end mobile device may integrate only a camera and an accelerometer, or only a camera and a gravimeter, whereas a mid-to-high-end mobile device may integrate a camera, an accelerometer, an IMU, and so on; this is not limited here.

In one implementation scenario, the gravity information may be obtained from the acceleration information of the camera in a preset motion state, so that the gravity information can be obtained with an accelerometer alone, without an IMU. The preset motion state is a static state or a uniform motion state. For example, the camera is regarded as being in the preset motion state when the difference between the detected camera acceleration and the gravitational acceleration falls within a preset range: with a gravitational acceleration of 9.8 m/s² and a preset range of 0 to 1 m/s², a detected camera acceleration of 10 m/s² (a difference of 0.2 m/s²) indicates that the camera is in the preset motion state. The preset range may be set according to actual application needs and is not limited here. In addition, when the mobile device integrates a gravimeter, the gravity information may be obtained directly from the gravimeter, again without an IMU.

In one implementation scenario, the mobile device may also determine from the detected camera acceleration whether it is in a static state or a uniform motion state; for example, when the detected camera acceleration is close to the gravitational acceleration, the device may be considered to be in a static state or a uniform motion state. When an accelerometer is provided, the square root of the sum of the squares of the accelerometer's acceleration components on the three axes (e.g., $a_x$, $a_y$, $a_z$) may be calculated as the camera acceleration $a_{camera}$, namely:

$$a_{camera} = \sqrt{a_x^2 + a_y^2 + a_z^2}$$
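The motion-state check described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names and the default values (9.8 m/s² for gravity, 1 m/s² for the upper end of the preset range, both taken from the example in the text) are assumptions.

```python
import math

GRAVITY = 9.8        # m/s^2, gravitational acceleration used in the example
PRESET_RANGE = 1.0   # m/s^2, upper end of the example preset range (0 to 1)


def camera_acceleration(a_x, a_y, a_z):
    """Root of the sum of squares of the three accelerometer components."""
    return math.sqrt(a_x ** 2 + a_y ** 2 + a_z ** 2)


def in_preset_motion_state(a_x, a_y, a_z,
                           gravity=GRAVITY, preset_range=PRESET_RANGE):
    """True if the camera acceleration differs from gravity by at most
    the preset range, i.e. the camera is static or moving uniformly."""
    difference = abs(camera_acceleration(a_x, a_y, a_z) - gravity)
    return difference <= preset_range
```

With these defaults, a reading of (0, 0, 10) m/s² is accepted because it differs from gravity by only 0.2 m/s², while a reading of (0, 0, 12) m/s² is rejected.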

In another implementation scenario, when it is detected that the camera is not in the preset motion state, the mobile device may repeat the detection until the camera is detected to be in the preset motion state. The detection frequency may match the camera's frame rate; for example, if the camera captures 25 images per second, the device may check 25 times per second whether the camera is in the preset motion state. This may be set according to actual application needs and is not limited here.

In one implementation scenario, the gravity information of the camera may include gravity direction information, and the gravity direction information may include a gravity vector. When an accelerometer is provided, the vector sum of the accelerometer's acceleration components on the three axes (i.e., $\vec{a}_x + \vec{a}_y + \vec{a}_z$) may be calculated and used as the gravity vector; alternatively, the unit vector in the same direction as the vector sum may be used as the gravity vector. The gravity vector may be set according to actual application needs and is not limited here.
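As a small illustration of the two options just described (the raw vector sum, or the unit vector in the same direction), the following sketch builds a gravity vector from the three accelerometer components. The function name and the `unit` flag are hypothetical and chosen only for this example.

```python
import math


def gravity_vector(a_x, a_y, a_z, unit=True):
    """Gravity vector from the accelerometer components on the three axes.

    With unit=False the raw vector sum is returned; with unit=True the
    unit vector pointing in the same direction is returned instead.
    """
    if not unit:
        return (a_x, a_y, a_z)
    norm = math.sqrt(a_x ** 2 + a_y ** 2 + a_z ** 2)
    if norm == 0.0:
        raise ValueError("zero acceleration: gravity direction is undefined")
    return (a_x / norm, a_y / norm, a_z / norm)
```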

Step S12: Using the gravity information, obtain the camera pose parameters of the current image captured by the camera in the preset motion state.

In one implementation scenario, the camera pose parameters may include a displacement parameter and a rotation parameter. The gravity direction information may be used to obtain the rotation angles of the camera relative to the x, y, and z coordinate axes of the world coordinate system, such that after the camera is rotated by these angles its gravity direction coincides with the negative direction of the z axis. The rotation parameter can then be obtained from the rotation angles, and the displacement parameter is set to a preset value. Initializing the camera pose parameters of the current image through gravity alignment in this way simplifies initialization and reduces the amount of computation.

In one implementation scenario, the rotation angle of the camera relative to the x coordinate axis may be denoted $\theta$, the rotation angle relative to the y coordinate axis may be denoted $\varphi$, and the rotation angle relative to the z coordinate axis may be denoted $\psi$. The rotation parameter $R_x$ of the camera relative to the x axis of the world coordinate system, the rotation parameter $R_y$ relative to the y axis, and the rotation parameter $R_z$ relative to the z axis can then be expressed respectively as:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$

The rotation parameter $R$ may be obtained from the rotation parameter $R_x$ relative to the x axis of the world coordinate system, the rotation parameter $R_y$ relative to the y axis, and the rotation parameter $R_z$ relative to the z axis. Here, the product of $R_x$, $R_y$, and $R_z$ may be taken as the rotation parameter $R$, that is:

$$R = R_x R_y R_z \tag{2}$$

In another implementation scenario, the mobile device may take the origin of the world coordinate system as the position of the camera when it captures the current image; that is, the displacement of the camera relative to the x, y, and z coordinate axes is 0. The preset value may therefore be set to 0, i.e., the displacement parameter may be set to 0.
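The gravity-alignment initialization of step S12 can be sketched as follows. This is one possible construction under stated assumptions, not necessarily the one used in the patent: the elementary rotations use the standard right-handed convention, the gravity vector is expressed in the camera frame, and the angle about the z axis (which gravity alone cannot determine) is simply set to 0. All function names are hypothetical.

```python
import numpy as np


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def initial_pose_from_gravity(gravity):
    """Return (R, t) such that R maps the gravity direction onto -z.

    R is composed as R_x @ R_y @ R_z and t is the zero displacement.
    """
    g = np.asarray(gravity, dtype=float)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        raise ValueError("gravity vector must be non-zero")
    g = g / norm
    # Angle about y: removes the x component of the gravity direction.
    angle_y = np.arctan2(g[0], -g[2])
    # Angle about x: removes the remaining y component.
    angle_x = np.arctan2(-g[1], np.hypot(g[0], g[2]))
    # Angle about z is unobservable from gravity; 0 is an assumption.
    angle_z = 0.0
    R = rot_x(angle_x) @ rot_y(angle_y) @ rot_z(angle_z)
    t = np.zeros(3)  # displacement parameter set to the preset value 0
    return R, t
```

Applying the returned `R` to the normalized gravity vector yields (0, 0, -1), which is the alignment condition stated in the text.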

Step S13: Based on the camera pose parameters of the current image, obtain the camera pose parameters of the image to be processed after the current image.

In one implementation scenario, the mobile device may scan to acquire multiple frames of images and use triangulation to process mutually matching feature points in the current image and an image adjacent to the current image, obtaining depth information of the matched feature points. The depth information may include the depth value of a feature point, so the three-dimensional coordinates of the feature point in the world coordinate system can be obtained from the calculated depth value. The three-dimensional coordinates of the feature point can then be reprojected into the next frame of image, using the pose transformation parameters between the next frame and the current image, to obtain a projection point in the next frame. The difference between the pixel value of the projection point in the next frame and the pixel value of the corresponding feature point in the current image is used to construct an objective function of the pose transformation parameters. Minimizing this objective function yields the pose transformation parameters, which, together with the camera pose parameters of the current image, give the camera pose parameters of the next frame. By analogy, the mobile device can obtain, frame by frame, the camera pose parameters of the images to be processed after the current image. Here, triangulation refers to observing the same three-dimensional point from different positions and, given the two-dimensional projections of the point observed at those positions, recovering the depth information of the three-dimensional point from the triangular relationship; details are not repeated here.
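The text does not say which triangulation algorithm is used. As an illustration only, the sketch below uses linear (DLT) triangulation, a common textbook choice: given two 3×4 projection matrices and the matching normalized image coordinates, it recovers the three-dimensional point, whose z coordinate in the first camera is the depth value. The function and argument names are hypothetical.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2: 3x4 projection matrices of the two views.
    x1, x2: matching 2D observations (same image units as P1, P2).
    Returns the 3D point in the frame in which P1 and P2 are expressed.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector that belongs
    # to the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

For example, with the first camera at the origin and the second camera shifted by one unit along x, the observations (0.125, 0.05) and (-0.125, 0.05) triangulate to the point (0.5, 0.2, 4), i.e. a depth of 4 in the first view.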

In another implementation scenario, to reduce the extra response time incurred by scanning multiple frames of images and to speed up the response of visual positioning, the mobile device may instead obtain the depth information of the feature points in the current image from the feature direction information of those feature points and the gravity direction information. Based on the depth information of the feature points in the current image and the camera pose parameters of the current image, it then obtains the depth information of the feature points in the image to be processed after the current image and the camera pose parameters of that image. Because the depth information can be initialized from the current image alone, scanning multiple frames is unnecessary, which helps improve the response speed of visual positioning.

In one implementation scenario, the feature direction information may include the direction vector of a feature point, and the gravity direction information includes the gravity vector; the direction vector and the gravity vector may be unit vectors, and the depth information includes the depth value of the feature point. Feature points may include pixels that can describe image features, for example pixels on contour edges in the image, pixels where the pixel value changes abruptly, and so on; feature points may be set according to actual needs and are not limited here. For example, feature points and their direction vectors may be obtained with detection methods such as FAST (Features from Accelerated Segment Test), BRIEF (Binary Robust Independent Elementary Features), SIFT (Scale Invariant Feature Transform), or ORB. The feature point detection method may be selected according to actual application needs and is not limited here.

In another implementation scenario, please also refer to FIG. 2, which is a schematic diagram of an embodiment of obtaining depth information. The mobile device may perform a first preset operation on the direction vector $\vec{f}$ of a feature point and the gravity vector $\vec{g}$ to obtain the angle between them. The first preset operation may include an inner product operation; that is, the angle $\alpha$ between the direction vector $\vec{f}$ and the gravity vector $\vec{g}$ can be expressed as:

$$\alpha = \arccos\left(\vec{f} \cdot \vec{g}\right) \tag{3}$$

After the angle $\alpha$ is obtained, a second preset operation may be performed on the preset height $h$ of the camera and the angle $\alpha$ to obtain the depth value $z$ of the feature point. The second preset operation includes dividing the preset height $h$ by the cosine of the angle. The preset height $h$ may be set according to the actual application. Taking an AR application as an example, it may be set according to the size of the virtual object; for instance, if the virtual object is a pet of ordinary size such as a cat or a dog, the preset height may be set to 0.5 m to 1 m. Other applications may be configured according to their actual circumstances, and further examples are not given here. The depth value $z$ can be expressed as:

$$z = \frac{h}{\cos\alpha} \tag{4}$$
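The two preset operations (inner product to get the angle α, then division of the preset height h by cos α) can be sketched as below. The sketch assumes both vectors are unit vectors, as the text allows; the function name and the guard against rays that are perpendicular to gravity or point away from it are additions made for this example.

```python
import math


def feature_depth(direction, gravity, preset_height):
    """Depth value z = h / cos(alpha) of a feature point.

    direction, gravity: unit 3-vectors (feature direction and gravity).
    preset_height: the preset camera height h, e.g. 0.5 to 1.0 metres.
    """
    # First preset operation: inner product, then the included angle.
    dot = sum(d * g for d, g in zip(direction, gravity))
    dot = max(-1.0, min(1.0, dot))  # clamp against rounding error
    alpha = math.acos(dot)
    cos_alpha = math.cos(alpha)
    if cos_alpha <= 1e-6:
        raise ValueError("ray does not point towards the ground plane")
    # Second preset operation: divide the preset height by cos(alpha).
    return preset_height / cos_alpha
```

For a feature direction that makes a 60° angle with gravity and a preset height of 1 m, the depth value is 1 / cos 60° = 2 m.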

In one implementation scenario, the steps in this embodiment and in the embodiments disclosed below may be integrated into applications or web pages running on the mobile device, such as indoor navigation, autonomous driving, AR, or VR applications. This may be set according to actual application needs and is not limited here.

In the above solution, the mobile device obtains the gravity information of the camera, uses the gravity information to obtain the camera pose parameters of the current image captured by the camera in the preset motion state, and, based on the camera pose parameters of the current image, obtains the camera pose parameters of the image to be processed after the current image. Visual positioning therefore depends only on the camera and the gravity information, which lowers the cost of using visual positioning technology and broadens its range of application.

Please refer to FIG. 3, which is a schematic flowchart of an embodiment of step S13 in FIG. 1. Specifically, FIG. 3 is a schematic flowchart of an embodiment of obtaining, based on the depth information of the feature points in the current image and the camera pose parameters of the current image, the depth information of the feature points in the image to be processed after the current image and the camera pose parameters of that image. Step S13 may include the following steps.

Step S131: Use a preset pose tracking method to track the depth information of the feature points in the current image and the camera pose parameters of the current image, obtaining the depth information of the feature points in the next frame of image after the current image and the camera pose parameters of the next frame.

The preset pose tracking method may be set according to actual application needs. In the embodiments of the present invention, it may include steps such as sparse image alignment, feature point alignment, and pose optimization, which together yield the camera pose parameters of the next frame of image. It may further include a map point optimization step, which yields the depth information of the feature points in the next frame.

In one implementation scenario, when performing sparse image alignment, the mobile device may first use the depth information of the feature points in the current image to determine the projection points of the feature points in the next frame of image. This may include: using the two-dimensional-to-three-dimensional back-projection function $\pi^{-1}$ to back-project the first coordinate information $(u, d_u)$, which contains the feature point coordinates $u$ and the feature point depth value $d_u$, into three-dimensional space, obtaining the three-dimensional coordinates $\pi^{-1}(u, d_u)$ of the feature point; and using the pose transformation parameter $T$ between the current image $k-1$ and the next frame of image $k$, together with the three-dimensional-to-two-dimensional projection function $\pi$, to project these three-dimensional coordinates into the next frame $k$, obtaining the projection point $\pi(T \cdot \pi^{-1}(u, d_u))$ of the feature point in the next frame $k$. There is a difference between the pixel values $W_k(\pi(T \cdot \pi^{-1}(u, d_u)))$ of the local region around the projection point in the next frame $k$ and the pixel values $W_{k-1}(u)$ of the local region around the corresponding feature point in the current image $k-1$; the pose transformation parameters between the current image $k-1$ and the next frame $k$ can be obtained based on this difference. The local region may be a rectangular region centered on the feature point (or projection point), e.g., a 3×3, 4×4, or 8×8 region. The difference is given by:

$$r(T, u) = W_k\left(\pi\left(T \cdot \pi^{-1}(u, d_u)\right)\right) - W_{k-1}(u) \tag{5}$$
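The residual of formula (5) can be illustrated with a short sketch. It assumes a pinhole camera with an intrinsic matrix `K` (the text does not specify the camera model), represents the pose transformation as a 4×4 homogeneous matrix, and reads patches at the nearest integer pixel instead of interpolating. All names are hypothetical.

```python
import numpy as np


def back_project(u, depth, K):
    """pi^-1: pixel (x, y) with depth -> 3D point in the camera frame."""
    return depth * (np.linalg.inv(K) @ np.array([u[0], u[1], 1.0]))


def project(p, K):
    """pi: 3D point in the camera frame -> pixel (x, y)."""
    q = K @ p
    return q[:2] / q[2]


def patch(image, center, half=1):
    """Local region W: a (2*half+1) x (2*half+1) window around center."""
    x, y = int(round(center[0])), int(round(center[1]))
    return image[y - half:y + half + 1, x - half:x + half + 1].astype(float)


def photometric_residual(img_prev, img_next, u, depth, T, K, half=1):
    """r(T, u) = W_k(pi(T * pi^-1(u, d_u))) - W_{k-1}(u)."""
    p_prev = back_project(u, depth, K)
    p_next = T[:3, :3] @ p_prev + T[:3, 3]
    u_next = project(p_next, K)
    return patch(img_next, u_next, half) - patch(img_prev, u, half)
```

When `T` is the true motion between the two frames and the brightness of the scene does not change, the residual is (close to) zero; sparse image alignment searches for the `T` that makes the residuals of all feature points as small as possible.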

It should be noted that there are generally multiple feature points, so the above difference can be calculated for each feature point and summed to construct the objective function, as shown in the following formula:

$$T_{k,k-1} = \arg\min_{T} \sum_i \rho\left(\lVert r(T, u_i) \rVert\right) \tag{6}$$

In formula (6), $\sum_i \rho(\lVert r(T, u_i) \rVert)$ is the objective function, where $\rho$ is a robust function used to reduce the influence of noise and $\lVert \cdot \rVert$ denotes the norm operation; $\arg\min_{T}$ denotes minimizing the objective function with the pose transformation parameter $T$ as the optimization variable; and $T_{k,k-1}$ is the pose transformation parameter obtained by solving the objective function.

After the pose transformation parameter $T_{k,k-1}$ is calculated, the mobile device may use $T_{k,k-1}$ and the camera pose parameter $T_{k-1}$ of the current image $k-1$ to obtain the camera pose parameter $T_k$ of the next frame of image. Here, $T_{k,k-1}$ may be multiplied by $T_{k-1}$ to obtain $T_k$, i.e., $T_k = T_{k,k-1} \cdot T_{k-1}$.
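The pose update is a single matrix product once poses are written as 4×4 homogeneous transforms. The sketch below assumes that representation (the text only says the two parameters are multiplied); `make_pose` and `next_frame_pose` are hypothetical helper names.

```python
import numpy as np


def make_pose(R, t):
    """Pack a 3x3 rotation and a 3-vector displacement into a 4x4 matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def next_frame_pose(T_rel, T_prev):
    """T_k = T_{k,k-1} * T_{k-1}: relative transform times previous pose."""
    return T_rel @ T_prev
```

Repeating this product frame after frame is what allows the camera pose to be carried forward from the gravity-aligned initial pose.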

In addition, to reduce the computational complexity of sparse image alignment, the mobile device may down-sample the current image $k-1$ and the next frame of image $k$ to obtain pyramid images of both, and perform the above sparse image alignment on one or more pyramid levels whose resolution equals a preset resolution, thereby reducing the computational complexity.
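A minimal image pyramid can be built by repeatedly halving the resolution. The sketch below averages each 2×2 block, which is one simple down-sampling choice and an assumption of this example (the text does not name a filter); `build_pyramid` is a hypothetical helper.

```python
import numpy as np


def build_pyramid(image, levels):
    """Return [level 0, level 1, ...], each level half the previous size.

    Level 0 is the input image; each further level averages 2x2 blocks.
    Construction stops early once a level is smaller than 2x2.
    """
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        if min(prev.shape) < 2:
            break
        h = (prev.shape[0] // 2) * 2
        w = (prev.shape[1] // 2) * 2
        blocks = prev[:h, :w].reshape(h // 2, 2, w // 2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid
```

Alignment can then be run on the coarse level (or levels) that match the preset resolution, where each patch comparison touches far fewer pixels.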

In one implementation scenario, the sparse image alignment described above inevitably introduces accumulated error, so the camera pose parameter $T_k$ obtained for the next frame of image has low accuracy. To improve accuracy, the mobile device may use three-dimensional points that have already converged (for example, three-dimensional points in a three-dimensional model) to optimize the camera pose parameter $T_k$ of the next frame. Specifically, the converged three-dimensional points can be matched and aligned to obtain projection points, and the projection points are then used to optimize the camera pose parameter $T_k$ obtained by sparse image alignment. The feature point alignment step may include: selecting, from the converged three-dimensional points, those that can be projected into the next frame of image $k$ as target three-dimensional points; selecting, from the captured images onto which a target three-dimensional point can be projected, the earliest captured image as the reference image; obtaining the pixel values $W_r(u_i)$ of the local region of the target three-dimensional point in the reference image; and projecting the target three-dimensional point into the next frame using the roughly estimated camera pose parameter $T_k$, to obtain its projection point $u_i'$ in the next frame. The pixel values $W_k(u_i')$ of the local region around the projection point $u_i'$ in the next frame are then obtained, and $W_r(u_i)$ and $W_k(u_i')$ are used to construct an objective function of the projection point $u_i'$, as follows:

$$u_i' = \arg\min_{u_i'} \lVert W_k(u_i') - A_i \cdot W_r(u_i) \rVert^2 \tag{7}$$

In formula (7), $\lVert W_k(u_i') - A_i \cdot W_r(u_i) \rVert^2$ is the objective function, where $\lVert \cdot \rVert$ denotes the norm operation and $A_i$ is an affine transformation matrix used to compensate for the image distortion caused by different viewing angles; $\arg\min_{u_i'}$ denotes minimizing the objective function with the position of the projection point $u_i'$ as the optimization variable.

After the projection point $u_i'$ is obtained, the mobile device may use the projection points $u_i'$ obtained by feature point alignment to optimize the camera pose parameter $T_k$ of the next frame obtained by sparse image alignment, finally yielding the optimized camera pose parameter $T_{w,k}$ of the next frame. The pose optimization step may include: using the camera pose parameter $T_{w,k}$ of the next frame and the three-dimensional-to-two-dimensional projection function $\pi$ to reproject the target three-dimensional point ${}^{w}p_i$ into the next frame $k$, obtaining the projection point $\pi(T_{w,k} \cdot {}^{w}p_i)$; and using the position difference between this projection point and the projection point $u_i'$ of the next frame optimized in the feature point alignment step to construct an objective function of the camera pose parameter $T_{w,k}$, as follows:

$$T_{w,k} = \arg\min_{T_{w,k}} \sum_i \lVert u_i' - \pi\left(T_{w,k} \cdot {}^{w}p_i\right) \rVert^2 \tag{8}$$

In formula (8), $\arg\min_{T_{w,k}}$ denotes minimizing the objective function with $T_{w,k}$ as the optimization variable.

By solving the objective function shown in formula (8), the camera pose parameter $T_{w,k}$ of the next frame of image is finally obtained.
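To make the pose-optimization objective concrete, the sketch below evaluates a reprojection cost for a candidate pose: each target three-dimensional point is transformed by the pose, projected, and compared with the aligned projection point from the feature alignment step. It assumes a pinhole intrinsic matrix `K`, a 4×4 homogeneous pose, and a squared Euclidean distance as the penalty; these are assumptions of this example, and the function only evaluates the cost rather than minimizing it.

```python
import numpy as np


def reprojection_cost(T_wk, points_w, aligned_px, K):
    """Sum of squared distances between aligned points and reprojections.

    T_wk:       4x4 candidate camera pose for the next frame.
    points_w:   N x 3 target 3D points in the world frame.
    aligned_px: N x 2 projection points from feature point alignment.
    K:          3x3 pinhole intrinsic matrix (assumed camera model).
    """
    pts = np.asarray(points_w, dtype=float)
    cam = pts @ T_wk[:3, :3].T + T_wk[:3, 3]   # world -> camera frame
    proj = cam @ K.T                            # apply intrinsics
    proj = proj[:, :2] / proj[:, 2:3]           # perspective division
    diff = np.asarray(aligned_px, dtype=float) - proj
    return float(np.sum(diff ** 2))
```

An optimizer (for example Gauss-Newton or Levenberg-Marquardt, neither of which is named in the text) would adjust the pose until this cost stops decreasing.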

In one implementation scenario, map point optimization is in essence the optimization of the inverse depth (i.e., the reciprocal of the depth value) at the corresponding position on the reference image in which the three-dimensional point was first observed. The probability distribution of the depth information of a feature point can be obtained; the inlier probability $\gamma$ and the inverse depth value $z$ of a feature point approximately follow a Beta-Gaussian mixture model distribution, as follows:

$$q(z, \gamma \mid a_k, b_k, \mu_k, \sigma_k^2) = \mathrm{Beta}(\gamma \mid a_k, b_k)\, \mathcal{N}(z \mid \mu_k, \sigma_k^2) \tag{9}$$

Formula (9) gives the probability distribution for a feature point $p$ after the $k$-th observation, where $a_k$ and $b_k$ are the parameters of the Beta distribution, and $\mu_k$ and $\sigma_k^2$ are the mean and variance of the inverse-depth Gaussian distribution. After obtaining the probability distribution, the mobile device can use it to obtain the depth information of the feature points in the next frame of image. For example, when the variance $\sigma_k^2$ of the inverse-depth Gaussian distribution is less than a preset depth range (e.g., 1/200), the depth value can be considered to have converged, and the reciprocal of the mean $\mu_k$ at that time is taken as the depth value of the feature point. In this way, the depth values of the feature points are continuously optimized during shooting.
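The convergence test on the inverse-depth estimate can be sketched as follows. The fusion step shown here is deliberately simplified: it multiplies two Gaussians and ignores the Beta-distributed inlier probability, so it is not the full update of the mixture model described in the text. The function names and the default threshold of 1/200 (taken from the example) are assumptions.

```python
def fuse_inverse_depth(mu, sigma2, mu_obs, sigma2_obs):
    """Fuse the current inverse-depth estimate with a new observation.

    Simplified Gaussian-only update (product of two Gaussians); the
    inlier-probability parameters a_k, b_k are not updated here.
    """
    total = sigma2 + sigma2_obs
    fused_mu = (sigma2_obs * mu + sigma2 * mu_obs) / total
    fused_sigma2 = sigma2 * sigma2_obs / total
    return fused_mu, fused_sigma2


def converged_depth(mu_k, sigma2_k, threshold=1.0 / 200):
    """Return the depth 1 / mu_k once the variance is below the threshold,
    otherwise None to signal that more observations are needed."""
    if sigma2_k < threshold:
        return 1.0 / mu_k
    return None
```

Each new observation shrinks the variance, so a feature point that starts out unconverged eventually passes the test and receives a depth value.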

Step S132: Take the next frame of image as the current image.

After obtaining the camera pose parameters of the next frame of image and the depth information of its feature points, the mobile device may take the next frame as the current image and execute step S131 and the subsequent steps again, so that the camera pose parameters of each image and the depth information of the feature points in each image are calculated frame by frame.

Step S133: Execute step S131 and the subsequent steps again.

Different from the foregoing embodiment, the mobile device uses the preset pose tracking method to track the depth information of the feature points in the current image and the camera pose parameters of the current image, obtaining the depth information of the feature points in the next frame of image and the camera pose parameters of the next frame. The next frame is then taken as the current image, and the tracking step and the subsequent steps are executed again. The camera pose parameters can thus be calculated frame by frame, which helps reduce their accumulated error.

Please refer to FIG. 4, which is a schematic flowchart of another embodiment of the visual tracking method of the present invention. The visual tracking method may include the following steps.

Step S41: Obtain gravity information of the camera.

Please refer to the relevant steps in the foregoing embodiments.

Step S42: Using the gravity information, obtain the camera pose parameters of the current image captured by the camera in the preset motion state.

Step S43: Based on the camera pose parameters of the current image, obtain the camera pose parameters of the image to be processed after the current image.

Step S44: Determine whether the camera pose parameters of the image to be processed satisfy a preset stable state condition; if not, execute step S45; if so, execute step S46.

The preset stable state condition may include at least one of the following: the camera pose parameters contain no abnormal values; and the difference between the camera pose parameters of the image to be processed and those of the previous frame is within a preset range. In one implementation scenario, abnormal values may include a displacement parameter greater than a displacement threshold and a rotation parameter greater than a rotation threshold. The displacement threshold, the rotation threshold, and the preset range may be set according to actual application needs and are not limited here.
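A stable-state check along these lines might look like the sketch below. It is illustrative only: a pose is represented as a pair (displacement tuple, rotation-angle tuple), both sub-conditions are required together (the text allows either or both), and the default thresholds are arbitrary placeholder values rather than values from the source.

```python
def has_abnormal_value(pose, disp_threshold, rot_threshold):
    """True if any displacement or rotation component exceeds its threshold."""
    displacement, rotation = pose
    return (any(abs(v) > disp_threshold for v in displacement)
            or any(abs(v) > rot_threshold for v in rotation))


def within_preset_range(pose, prev_pose, preset_range):
    """True if every component changed by at most preset_range."""
    return all(abs(cur - prev) <= preset_range
               for cur_part, prev_part in zip(pose, prev_pose)
               for cur, prev in zip(cur_part, prev_part))


def is_stable(pose, prev_pose,
              disp_threshold=5.0, rot_threshold=3.2, preset_range=0.5):
    """Preset stable state condition: no abnormal values and a bounded
    change relative to the previous frame (placeholder thresholds)."""
    return (not has_abnormal_value(pose, disp_threshold, rot_threshold)
            and within_preset_range(pose, prev_pose, preset_range))
```

When `is_stable` returns False, the flow falls through to step S45, where only the rotation parameter is updated.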

Step S45: Determine that the displacement parameter of the image to be processed cannot be obtained, and use the pixel values of the previous frame of image and the camera pose parameters of the previous frame to obtain the rotation parameter of the image to be processed.

In practical applications, factors such as fast motion and drastic changes in lighting conditions can make the camera pose parameters inaccurate and hence make the visual positioning inaccurate. To improve the robustness of visual positioning, when the camera pose parameters of the image to be processed are determined not to satisfy the preset stable state condition, the mobile device may determine that the camera pose parameters obtained through the above steps, in particular the displacement parameter, are inaccurate. Therefore, to mitigate problems such as virtual objects sticking to the screen when the rotation parameter fails to update, the mobile device may use the pixel values of the previous frame of image and the camera pose parameters of the previous frame to obtain the rotation parameter of the image to be processed, thereby keeping the rotation parameter updated.

在一個實施場景中，請結合參閱圖5，圖5是圖4中步驟S45一實施例的流程示意圖。其中，S45可以包括如下步驟。 In an implementation scenario, please refer to FIG. 5 , which is a schematic flowchart of an embodiment of step S45 in FIG. 4 . Wherein, S45 may include the following steps.

步驟S451：利用待處理圖像和上一幀圖像之間的位姿變換參數對上一幀圖像中的至少部分圖元點進行投影變換，得到至少部分圖元點在待處理圖像的投影點。 Step S451: Use the pose transformation parameters between the image to be processed and the previous frame to perform a projective transformation on at least some pixels in the previous frame, obtaining the projection points of those pixels in the image to be processed.

為了便於描述，可以將待處理圖像表示為$k$，將上一幀圖像表示為$k-1$，位姿變換參數表示為$T_{k,k-1}$，上一幀圖像中的至少部分圖元點的二維座標表示為$u$，至少部分圖元點的深度值表示為$d_u$，二維到三維的逆投影函數可以表示為$\pi^{-1}$，三維到二維的投影函數可以表示為$\pi$，則投影點可以表示為$\pi(T_{k,k-1}\cdot\pi^{-1}(u,d_u))$，這裡，可以參閱前述實施例中的相關步驟，在此不再贅述。 For convenience of description, the image to be processed can be denoted as $k$, the previous frame as $k-1$, and the pose transformation parameters as $T_{k,k-1}$. The two-dimensional coordinates of the at least some pixels in the previous frame are denoted as $u$ and their depth values as $d_u$. The back-projection function from 2D to 3D can be denoted as $\pi^{-1}$ and the projection function from 3D to 2D as $\pi$, so the projection point can be expressed as $\pi(T_{k,k-1}\cdot\pi^{-1}(u,d_u))$. For details, refer to the relevant steps in the foregoing embodiments, which are not repeated here.

在一個實施場景中，為了降低運算複雜度，移動設備還可以將上一幀圖像進行降採樣，得到上一幀圖像的縮略圖像(如，40 * 30或更小的圖像)，從而利用待處理圖像和上一幀圖像之間的位姿變換參數對縮略圖像中的圖元點進行投影變換，得到縮略圖像中的圖元點在待處理圖像的投影點。在另一個實施場景中，為了降低運算複雜度，移動設備還可以將縮略圖像中的圖元點投影至單位球上，即可以將縮略圖像中的圖元點的深度值統一設置為1，此外，還可以根據實際應用需要將深度值統一設置為其他數值，在此不做限定。 In one implementation scenario, to reduce computational complexity, the mobile device may also downsample the previous frame to obtain a thumbnail image of the previous frame (e.g., an image of 40 * 30 or smaller), and then use the pose transformation parameters between the image to be processed and the previous frame to perform a projective transformation on the pixels in the thumbnail image, obtaining the projection points of those pixels in the image to be processed. In another implementation scenario, to reduce computational complexity, the mobile device can also project the pixels of the thumbnail image onto the unit sphere, that is, uniformly set the depth value of the pixels in the thumbnail image to 1. The depth value can also be uniformly set to another value according to actual application needs, which is not limited here.
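As an illustration of the projection in step S451, the following Python sketch warps a pixel of the previous frame into the image to be processed. It assumes a pinhole camera with intrinsic matrix `K` and a 4x4 homogeneous transform `T` from frame k-1 to frame k; the camera model, the function names and the example intrinsics are assumptions for illustration, since the disclosure defers the details to earlier embodiments.

```python
import numpy as np

def back_project(u, depth, K):
    """Back-projection: pixel u = (x, y) with depth -> 3D point in the camera frame."""
    ray = np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])
    return depth * ray

def back_project_unit_sphere(u, K):
    """Variant for the thumbnail case: place the pixel on the unit sphere."""
    ray = np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])
    return ray / np.linalg.norm(ray)

def project(p, K):
    """Projection: 3D point in the camera frame -> pixel coordinates."""
    q = K @ p
    return q[:2] / q[2]

def warp_pixel(u, depth, T, K):
    """Back-project u, move it with T (frame k-1 to frame k), and project it again."""
    p_prev = back_project(u, depth, K)
    p_cur = T[:3, :3] @ p_prev + T[:3, 3]
    return project(p_cur, K)

# Hypothetical intrinsics for a 40 x 30 thumbnail.
K = np.array([[100.0, 0.0, 20.0],
              [0.0, 100.0, 15.0],
              [0.0, 0.0, 1.0]])
```

With the identity transform, `warp_pixel` returns the input pixel unchanged, which is a quick sanity check of the two projection functions.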

步驟S452：利用至少部分圖元點在上一幀圖像中的圖元值和與至少部分圖元點對應的投影點在待處理圖像中的圖元值的差異，構建關於位姿變換參數的目標函數。 Step S452: Construct an objective function with respect to the pose transformation parameters, using the difference between the pixel values of the at least some pixels in the previous frame and the pixel values of the corresponding projection points in the image to be processed.

在一個實施場景中，移動設備可以利用至少部分圖元點在上一幀圖像中局部區域圖元值$W_{k-1}(u)$，以及至少部分圖元點對應的投影點$\pi(T_{k,k-1}\cdot\pi^{-1}(u,d_u))$在待處理圖像中局部區域圖元值$W_k(\pi(T_{k,k-1}\cdot\pi^{-1}(u,d_u)))$的差異，構建關於位姿變換參數的目標函數。 In one implementation scenario, the mobile device may construct the objective function with respect to the pose transformation parameters from the difference between the local-region pixel values $W_{k-1}(u)$ of the at least some pixels in the previous frame and the local-region pixel values $W_k(\pi(T_{k,k-1}\cdot\pi^{-1}(u,d_u)))$ of the corresponding projection points $\pi(T_{k,k-1}\cdot\pi^{-1}(u,d_u))$ in the image to be processed.

在另一個實施場景中，在對上一幀圖像進行降採樣的情況下，移動設備可以利用縮略圖像中圖元點的圖元值和這些圖元值對應的投影點在待處理圖像中的圖元值的差異，構建關於位姿變換參數的目標函數。 In another implementation scenario, when down-sampling the previous frame image, the mobile device can use the primitive values of the primitive points in the thumbnail image and the projection points corresponding to these primitive values in the image to be processed The difference between the primitive values in the image is used to construct the objective function with respect to the pose transformation parameters.

其中，目標函數可以參閱前述實施例中的相關步驟，在此不再贅述。 Wherein, for the objective function, reference may be made to the relevant steps in the foregoing embodiments, which will not be repeated here.

步驟S453：利用求解目標函數得到的位姿變換參數對上一幀圖像的相機位姿參數進行變換處理，得到待處理圖像的旋轉參數。 Step S453: Use the pose transformation parameters obtained by solving the objective function to transform the camera pose parameters of the previous frame image to obtain the rotation parameters of the image to be processed.

對上述目標函數進行優化求解，在優化求解過程中，移動設備可以僅優化旋轉參數，從而利用求解得到的位姿變換參數對上一幀圖像的相機位姿參數進行變換處理，得到待處理圖像的相機位姿參數，並提取相機位姿參數中的旋轉參數，作為待處理圖像的旋轉參數。 The above objective function is optimized and solved. During the optimization and solution process, the mobile device can only optimize the rotation parameters, so that the pose transformation parameters obtained by the solution are used to transform the camera pose parameters of the previous frame image, and the image to be processed is obtained. The camera pose parameters of the image, and the rotation parameters in the camera pose parameters are extracted as the rotation parameters of the image to be processed.
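Steps S452 and S453 can be sketched in Python as follows. This is only an illustration under stated assumptions: pixels of the previous thumbnail are placed on the unit sphere, the pose transformation is restricted to a rotation `R`, nearest-neighbour sampling replaces local-region patches, and a search over a few candidate rotations stands in for the optimizer, which the disclosure does not specify here. The function names and the composition order in `update_rotation` are hypothetical.

```python
import numpy as np

def photometric_cost(prev_img, cur_img, R, K):
    """Mean squared pixel-value difference after rotating prev-frame rays by R."""
    h, w = prev_img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N pixels
    rays = np.linalg.inv(K) @ pix
    rays /= np.linalg.norm(rays, axis=0)       # unit sphere (depth set to 1)
    q = K @ (R @ rays)                         # rotate only, then project
    valid = q[2] > 1e-6                        # keep points in front of the camera
    u = np.round(q[0, valid] / q[2, valid]).astype(int)
    v = np.round(q[1, valid] / q[2, valid]).astype(int)
    src = prev_img.ravel()[valid].astype(float)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not inside.any():
        return float("inf")
    diff = cur_img[v[inside], u[inside]].astype(float) - src[inside]
    return float(np.mean(diff ** 2))

def yaw(angle):
    """Rotation about the camera y axis, used here to build candidates."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def best_rotation(prev_img, cur_img, K, candidates):
    """Pick the candidate rotation with the lowest photometric cost."""
    return min(candidates, key=lambda R: photometric_cost(prev_img, cur_img, R, K))

def update_rotation(R_prev, R_rel):
    """Apply the relative rotation to the previous frame's rotation (assumed order)."""
    return R_rel @ R_prev
```

A practical implementation would more likely minimise the same cost with an iterative solver and sub-pixel interpolation; the candidate search is used here only to keep the sketch short.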

在一個實施場景中，為了提高視覺定位的魯棒性，移動設備在得到待處理圖像的旋轉參數之後，可以繼續檢測相機當前的加速度資訊，並判斷加速度資訊是否處於預設運動狀態，獲取加速度資訊以及判斷加速度資訊是否處於預設運動狀態的步驟可以參閱前述公開實施例中相關步驟，在此不再贅述。若處於預設運動狀態，則可以認為此時相機處於靜止狀態或勻速運動狀態，則可以重新執行獲取相機的重力資訊的步驟以及後續步驟，若不處於預設運動狀態，則可以認為此時相機仍然處於劇烈運動的狀態，則可以重新執行檢測相機當前的加速度資訊的步驟以及後續步驟。通過在視覺定位不準確的情況下，重複檢測相機當前的加速度資訊，並判斷加速度資訊是否處於預設運動狀態，並在處於預設運動狀態的情況下，重新執行獲取相機的重力資訊的步驟以及後續步驟，能夠提高視覺定位的魯棒性。 In one implementation scenario, to improve the robustness of visual positioning, after obtaining the rotation parameter of the image to be processed, the mobile device can continue to detect the current acceleration information of the camera and determine whether the acceleration information corresponds to the preset motion state. For the steps of obtaining the acceleration information and determining whether it corresponds to the preset motion state, refer to the related steps in the foregoing disclosed embodiments, which are not repeated here. If the camera is in the preset motion state, it can be considered to be static or moving at a uniform speed, and the step of obtaining the gravity information of the camera and the subsequent steps can be re-executed. If it is not in the preset motion state, the camera can be considered to still be moving violently, and the step of detecting the current acceleration information of the camera and the subsequent steps can be re-executed. By repeatedly detecting the current acceleration information of the camera when visual positioning is inaccurate, determining whether it corresponds to the preset motion state, and, if so, re-executing the step of obtaining the gravity information of the camera and the subsequent steps, the robustness of visual positioning can be improved.
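The recovery decision described above can be sketched as a small Python helper. The criterion used to detect the preset motion state is an assumption made for this sketch (accelerometer readings that stay nearly constant over a short window); the disclosure's own criterion is given in earlier embodiments and is not reproduced here. The function names, action labels and tolerance are hypothetical.

```python
import numpy as np

def in_preset_motion_state(samples, tolerance=0.2):
    """Assumed criterion: every reading stays close to the window mean.

    samples is an N x 3 sequence of accelerometer readings; tolerance is a
    placeholder value in the same units as the readings.
    """
    samples = np.asarray(samples, dtype=float)
    deviation = np.linalg.norm(samples - samples.mean(axis=0), axis=1)
    return bool(np.all(deviation <= tolerance))

def next_action(samples):
    """Decide which step to re-execute after the rotation-only update."""
    if in_preset_motion_state(samples):
        # Static or uniform motion: restart from acquiring gravity information.
        return "reinitialize_from_gravity"
    # Still moving violently: keep detecting acceleration information.
    return "keep_checking_acceleration"
```

In a tracking loop, `next_action` would be called on each new window of readings until it returns the re-initialization label.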

步驟S46：將待處理圖像作為當前圖像。 Step S46: Take the image to be processed as the current image.

在得到待處理圖像的旋轉參數之後，移動設備可以將待處理圖像作為當前圖像，並重新執行上述基於當前圖像的相機位姿參數，獲取當前圖像之後的待處理圖像的相機位姿參數的步驟以及後續步驟，以在劇烈運動或光照條件劇烈變化的情況下，仍然能夠持續更新旋轉參數。 After obtaining the rotation parameters of the image to be processed, the mobile device can use the image to be processed as the current image, and re-execute the above-mentioned camera pose parameters based on the current image to obtain the camera of the image to be processed after the current image The step of pose parameters and subsequent steps to keep updating the rotation parameters in the case of severe motion or drastic changes in lighting conditions.

步驟S47：重新執行步驟S43以及後續步驟。 Step S47: Re-execute step S43 and subsequent steps.

區別於前述實施例，相機位姿參數設置為包括旋轉參數和位移參數，且在移動設備獲取當前圖像之後的待處理圖像的相機位姿參數之後，回應於待處理圖像的相機位姿參數不滿足預設穩定狀態條件，確定無法獲取待處理圖像的位移參數，從而利用待處理圖像的上一幀圖像的圖元值和上一幀圖像的相機位姿參數，得到待處理圖像的旋轉參數，進而能夠在相機位姿參數不準確的情況下，直接利用圖像圖元，估計旋轉參數，能夠有利於避免旋轉參數無法更新而導致的諸如虛擬實境中虛擬物體貼屏等問題。 Different from the foregoing embodiments, the camera pose parameters are set to include a rotation parameter and a displacement parameter. After the mobile device acquires the camera pose parameters of the image to be processed following the current image, and in response to those camera pose parameters not satisfying the preset steady-state condition, it is determined that the displacement parameter of the image to be processed cannot be obtained. The pixel values of the previous frame of the image to be processed and the camera pose parameters of that previous frame are then used to obtain the rotation parameter of the image to be processed. The rotation parameter can thus be estimated directly from image pixels when the camera pose parameters are inaccurate, which helps avoid problems caused by a rotation parameter that cannot be updated, such as virtual objects sticking to the screen in virtual reality.

請參閱圖6，圖6是本發明視覺定位裝置60一實施例的框架示意圖。視覺定位裝置60包括重力資訊獲取部分61、第一位姿獲取部分62和第二位姿獲取部分63，重力資訊獲取部分61用於獲取相機的重力資訊；第一位姿獲取部分62配置為利用重力資訊，獲取相機在預設運動狀態下拍攝的當前圖像的相機位姿參數；第二位姿獲取部分63配置為基於當前圖像的相機位姿參數，獲取當前圖像之後的待處理圖像的相機位姿參數。 Please refer to FIG. 6. FIG. 6 is a schematic frame diagram of an embodiment of a visual positioning device 60 of the present invention. The visual positioning device 60 includes a gravity information acquisition part 61, a first pose acquisition part 62 and a second pose acquisition part 63. The gravity information acquisition part 61 is used to obtain the gravity information of the camera; the first pose acquisition part 62 is configured to use the gravity information to obtain the camera pose parameters of the current image captured by the camera in a preset motion state; the second pose acquisition part 63 is configured to acquire, based on the camera pose parameters of the current image, the camera pose parameters of the image to be processed after the current image.

在一些公開實施例中，重力資訊包括重力方向資訊，視覺定位裝置60還包括特徵方向獲取部分，配置為獲取當前圖像中的特徵點的特徵方向資訊，視覺定位裝置60還包括深度資訊獲取部分，配置為利用特徵點的特徵方向資訊和重力方向資訊，得到當前圖像中特徵點的深度資訊，第二位姿獲取部分63還配置為基於當前圖像中特徵點的深度資訊和當前圖像的相機位姿參數，獲取當前圖像之後的待處理圖像中特徵點的深度資訊和待處理圖像的相機位姿參數。 In some disclosed embodiments, the gravity information includes gravity direction information, and the visual positioning device 60 further includes a feature direction acquisition part configured to acquire feature direction information of feature points in the current image, and the visual positioning device 60 also includes a depth information acquisition part , configured to use the feature direction information and gravity direction information of the feature points to obtain the depth information of the feature points in the current image, and the second pose acquisition part 63 is also configured to be based on the depth information of the feature points in the current image and the current image The camera pose parameters, obtain the depth information of the feature points in the image to be processed after the current image and the camera pose parameters of the image to be processed.

區別於前述實施例，通過獲取當前圖像中特徵點的特徵方向資訊，並利用特徵點的特徵方向資訊和重力資訊所包含的重力方向資訊，得到當前圖像中特徵點的深度資訊，故能夠僅基於當前圖像來初始化當前圖像中特徵點的深度資訊和當前圖像的相機位姿參數，且能夠基於當前圖像中特徵點的深度資訊和當前圖像的相機位姿參數，獲取當前圖像之後的待處理圖像中特徵點的深度資訊和待處理圖像的相機位姿參數，而無需掃描多幀圖像來進行初始化工作，從而能夠提高視覺定位的回應速度。 Different from the foregoing embodiments, the feature direction information of the feature points in the current image is obtained, and this feature direction information is used together with the gravity direction information contained in the gravity information to obtain the depth information of the feature points in the current image. The depth information of the feature points in the current image and the camera pose parameters of the current image can therefore be initialized from the current image alone. Based on these, the depth information of the feature points in the image to be processed after the current image and the camera pose parameters of the image to be processed can be obtained without scanning multiple frames for initialization, which improves the response speed of visual positioning.

在一些公開實施例中，特徵方向資訊包括特徵點的方向向量，重力方向資訊包括重力向量，深度資訊包括特徵點的深度值，深度資訊獲取部分包括第一運算子部分，配置為對特徵點的方向向量和重力向量進行第一預設運算，得到特徵點的方向向量和重力向量之間的夾角，深度資訊獲取部分第二運算子部分，配置為對相機的預設高度和夾角進行第二預設運算，得到特徵點的深度值。 In some disclosed embodiments, the feature direction information includes the direction vector of the feature point, the gravity direction information includes the gravity vector, the depth information includes the depth value of the feature point, and the depth information acquisition part includes a first operator part configured to Perform the first preset operation on the direction vector and the gravity vector to obtain the angle between the direction vector and the gravity vector of the feature point. The second operator part of the depth information acquisition part is configured to perform a second preset on the preset height and angle of the camera. Set the operation to get the depth value of the feature point.

區別於前述實施例，特徵方向資訊設置為包括特徵點的方向向量，重力方向資訊設置為包括重力向量，深度資訊設置為包括特徵點的深度值，從而對特徵點的方向向量和重力向量進行第一預設運算，得到特徵點的方向向量和重力向量之間的夾角，從而對相機的預設高度和夾角進行第二預設運算，得到特徵點的深度值，故能夠有利於降低獲取特徵點深度值的計算複雜度。 Different from the above-mentioned embodiments, the feature direction information is set to include the direction vector of the feature point, the gravity direction information is set to include the gravity vector, and the depth information is set to include the depth value of the feature point, so that the direction vector and the gravity vector of the feature point are set for the first time. The first preset operation obtains the angle between the direction vector and the gravity vector of the feature point, so as to perform the second preset operation on the preset height and angle of the camera to obtain the depth value of the feature point, so it can help reduce the cost of obtaining the feature point Computational complexity of depth values.

在一些公開實施例中，第一預設運算包括內積運算，和/或，第二預設運算包括將預設高度除以夾角的餘弦值。 In some disclosed embodiments, the first preset operation includes an inner product operation, and/or, the second preset operation includes dividing the preset height by the cosine of the included angle.

區別於前述實施例，將第一預設運算設置為包括內積運算，能夠有利於降低獲取方向向量和重力向量之間夾角的複雜度，將第二預設運算設置為包括將預設高度除以夾角的餘弦值，能夠有利於降低獲取深度值的複雜度。 Different from the foregoing embodiments, setting the first preset operation to include an inner product operation helps reduce the complexity of obtaining the included angle between the direction vector and the gravity vector, and setting the second preset operation to include dividing the preset height by the cosine of the included angle helps reduce the complexity of obtaining the depth value.
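The two preset operations can be illustrated with a short Python sketch: an inner product of the normalized direction vector and gravity vector gives the cosine of the included angle, and the preset camera height is divided by that cosine. The function name, the error handling and the example values are assumptions for illustration. One geometric reading, also an assumption here, is that the feature lies on a plane located at the preset height below the camera along the gravity direction.

```python
import numpy as np

def feature_depth(direction, gravity, camera_height):
    """Depth value of a feature from its viewing direction and the gravity vector."""
    d = np.asarray(direction, dtype=float)
    g = np.asarray(gravity, dtype=float)
    d = d / np.linalg.norm(d)
    g = g / np.linalg.norm(g)
    # First preset operation: the inner product of unit vectors is cos(angle).
    cos_angle = float(np.dot(d, g))
    if cos_angle <= 1e-6:
        raise ValueError("direction is not pointing towards the gravity side")
    # Second preset operation: preset height divided by the cosine of the angle.
    return camera_height / cos_angle
```

For example, a direction 60 degrees away from gravity with a preset height of 1.5 gives a depth value of 3.0, since cos(60°) = 0.5.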

在一些公開實施例中，第二位姿獲取部分63包括位姿跟蹤子部分，配置為利用預設位姿跟蹤方式對當前圖像中特徵點的深度資訊、當前圖像的相機位姿參數進行跟蹤處理，得到當前圖像的下一幀圖像中特徵點的深度資訊和下一幀圖像的相機位姿參數，第二位姿獲取部分63包括重複執行子部分，配置為將下一幀圖像作為當前圖像，並重新執行利用預設位姿跟蹤方式對當前圖像中特徵點的深度資訊、當前圖像的相機位姿參數進行跟蹤處理的步驟以及後續步驟。 In some disclosed embodiments, the second pose acquisition part 63 includes a pose tracking subsection configured to use a preset pose tracking method to carry out the depth information of the feature points in the current image and the camera pose parameters of the current image Tracking processing, to obtain the depth information of the feature points in the next frame image of the current image and the camera pose parameters of the next frame image, the second pose acquisition part 63 includes a repeat execution subsection, configured to convert the next frame The image is used as the current image, and the steps of tracking the depth information of the feature points in the current image and the camera pose parameters of the current image by using the preset pose tracking method and subsequent steps are re-executed.

區別於前述實施例，利用預設位姿跟蹤方式對當前圖像中特徵點的深度資訊、當前圖像的相機位姿參數進行跟蹤處理，得到當前圖像的下一幀圖像中特徵點的深度資訊和下一幀圖像的相機位姿參數，從而將下一幀圖像作為當前圖像，並重新執行利用預設位姿跟蹤方式對當前圖像中特徵點的深度資訊、當前圖像的相機位姿參數進行跟蹤處理的步驟以及後續步驟，進而能夠逐幀計算相機位姿參數，有利於降低相機位姿參數的累積誤差。 Different from the above-mentioned embodiments, the depth information of the feature points in the current image and the camera pose parameters of the current image are tracked by using the preset pose tracking method, and the feature points in the next frame of the current image are obtained. Depth information and the camera pose parameters of the next frame image, so that the next frame image is used as the current image, and the depth information of the feature points in the current image, the current image are re-executed using the preset pose tracking method The steps of tracking and subsequent steps of the camera pose parameters, and then the camera pose parameters can be calculated frame by frame, which is conducive to reducing the cumulative error of the camera pose parameters.

在一些公開實施例中，位姿跟蹤子部分包括：特徵點投影部分，配置為利用當前圖像中特徵點的深度資訊，確定特徵點在下一幀圖像中的投影點；位姿變換參數計算部分，配置為基於特徵點在當前圖像中局部區域的圖元值和投影點在下一幀圖像中局部區域的圖元值之間的差異，得到當前圖像與下一幀圖像之間的位姿變換參數；相機位姿參數計算部分，配置為利用位姿變換參數和當前圖像的相機位姿參數，得到下一幀圖像的相機位姿參數；相機位姿參數優化部分，配置為利用已經收斂的三維點，優化下一幀圖像的相機位姿參數；深度資訊獲取部分，配置為獲取特徵點的深度資訊的概率分佈，並利用概率分佈，得到下一幀圖像中特徵點的深度資訊。 In some disclosed embodiments, the pose tracking sub-part includes: a feature point projection part, configured to use the depth information of the feature points in the current image to determine the projection points of the feature points in the next frame; a pose transformation parameter calculation part, configured to obtain the pose transformation parameters between the current image and the next frame based on the difference between the pixel values of the local region of each feature point in the current image and the pixel values of the local region of its projection point in the next frame; a camera pose parameter calculation part, configured to use the pose transformation parameters and the camera pose parameters of the current image to obtain the camera pose parameters of the next frame; a camera pose parameter optimization part, configured to use the converged three-dimensional points to optimize the camera pose parameters of the next frame; and a depth information acquisition part, configured to obtain the probability distribution of the depth information of the feature points and use the probability distribution to obtain the depth information of the feature points in the next frame.

區別於前述實施例，通過利用當前圖像中特徵點的深度資訊，確定特徵點在下一幀圖像中的投影點，從而基於特徵點在當前圖像中局部區域的圖元值和投影點在下一幀圖像中局部區域的圖元值之間的差異，得到當前圖像與下一幀圖像之間的位姿變換參數，並利用位姿變換參數和當前圖像的相機位姿參數，得到下一幀圖像的相機位姿參數，利用已經收斂的三維點，優化下一幀圖像的相機位姿參數，從而可以對相機位姿參數進行進一步的優化，有利於提高相機位姿參數的準確性；而通過獲取特徵點的深度資訊的概率分佈，並利用概率分佈，得到下一幀圖像中特徵點的深度資訊，從而能夠基於深度資訊的分佈概率，在拍攝過程中對深度資訊進行優化。 Different from the previous embodiments, by using the depth information of the feature points in the current image, the projection points of the feature points in the next frame of image are determined, so that based on the primitive values and projection points of the feature points in the local area of the current image, the The difference between the primitive values of the local area in one frame of image is used to obtain the pose transformation parameters between the current image and the next frame of image, and using the pose transformation parameters and the camera pose parameters of the current image, Get the camera pose parameters of the next frame of image, use the converged 3D points to optimize the camera pose parameters of the next frame of image, so that the camera pose parameters can be further optimized, which is conducive to improving the camera pose parameters Accuracy; and by obtaining the probability distribution of the depth information of the feature points, and using the probability distribution, the depth information of the feature points in the next frame of image can be obtained, so that the depth information can be calculated based on the distribution probability of the depth information during the shooting process. optimize.

在一些公開實施例中，相機位姿參數包括旋轉參數和位移參數，視覺定位裝置60還包括相機位姿檢測部分，配置為回應於待處理圖像的相機位姿參數不滿足預設穩定狀態條件，確定無法獲取待處理圖像的位移參數，視覺定位裝置60還包括旋轉參數更新部分，配置為利用待處理圖像的上一幀圖像的圖元值和上一幀圖像的相機位姿參數，得到待處理圖像的旋轉參數。 In some disclosed embodiments, the camera pose parameters include rotation parameters and displacement parameters, and the visual positioning device 60 further includes a camera pose detection part configured to respond to the camera pose parameters of the image to be processed not satisfying the preset steady state condition , it is determined that the displacement parameter of the image to be processed cannot be obtained, and the visual positioning device 60 also includes a rotation parameter updating part configured to use the primitive value of the previous frame image of the image to be processed and the camera pose of the previous frame image parameter, get the rotation parameter of the image to be processed.

區別於前述實施例，相機位姿參數設置為包括旋轉參數和位移參數，且在獲取當前圖像之後的待處理圖像的相機位姿參數之後，回應於待處理圖像的相機位姿參數不滿足預設穩定狀態條件，確定無法獲取待處理圖像的位移參數，從而利用待處理圖像的上一幀圖像的圖元值和上一幀圖像的相機位姿參數，得到待處理圖像的旋轉參數，進而能夠在相機位姿參數不準確的情況下，直接利用圖像圖元，估計旋轉參數，能夠有利於降低因旋轉參數無法更新而導致的諸如虛擬實境中虛擬物體貼屏等問題發生的概率。 Different from the foregoing embodiments, the camera pose parameters are set to include a rotation parameter and a displacement parameter. After the camera pose parameters of the image to be processed following the current image are acquired, and in response to those camera pose parameters not satisfying the preset steady-state condition, it is determined that the displacement parameter of the image to be processed cannot be obtained. The pixel values of the previous frame of the image to be processed and the camera pose parameters of that previous frame are then used to obtain the rotation parameter of the image to be processed. The rotation parameter can thus be estimated directly from image pixels when the camera pose parameters are inaccurate, which helps reduce the probability of problems caused by a rotation parameter that cannot be updated, such as virtual objects sticking to the screen in virtual reality.

在一些公開實施例中，旋轉參數更新部分包括投影變換子部分，配置為利用待處理圖像和上一幀圖像之間的位姿變換參數對上一幀圖像中的至少部分圖元點進行投影變換，得到至少部分圖元點在待處理圖像的投影點，旋轉參數更新部分包括函數構建子部分，配置為利用至少部分圖元點在上一幀圖像中的圖元值和與至少部分圖元點對應的投影點在待處理圖像中的圖元值的差異，構建關於位姿變換參數的目標函數，旋轉參數更新部分包括參數獲取子部分，配置為利用求解目標函數得到的位姿變換參數對上一幀圖像的相機位姿參數進行變換處理，得到待處理圖像的旋轉參數。 In some disclosed embodiments, the rotation parameter update part includes a projective transformation subsection configured to use the pose transformation parameters between the image to be processed and the previous frame image to update at least some primitive points in the previous frame image Perform projection transformation to obtain the projection points of at least some of the primitive points in the image to be processed, and the rotation parameter update part includes a function construction subsection configured to use at least some of the primitive points in the previous frame image and the sum of The difference between the primitive values of the projection points corresponding to at least some of the primitive points in the image to be processed is used to construct an objective function about the pose transformation parameters, and the rotation parameter update part includes a parameter acquisition subsection, which is configured to use the obtained by solving the objective function The pose transformation parameters transform the camera pose parameters of the previous frame image to obtain the rotation parameters of the image to be processed.

區別於前述實施例，利用待處理圖像和上一幀圖像之間的位姿變換參數對上一幀圖像中的至少部分圖元點進行投影變換，得到至少部分圖元點在待處理圖像的投影點，並利用至少部分圖元點在上一幀圖像中的圖元值和至少部分圖元點對應的投影點在待處理圖像中的圖元值的差異，構建關於位姿變換參數的目標函數，從而利用求解目標函數得到的位姿變換參數對上一幀圖像的相機位姿參數進行變換處理，得到待處理圖像的旋轉參數，故能夠基於至少部分圖元點求得旋轉參數，能夠有利於降低計算旋轉參數的計算量。 Different from the foregoing embodiments, the pose transformation parameters between the image to be processed and the previous frame are used to perform a projective transformation on at least some pixels in the previous frame, obtaining the projection points of those pixels in the image to be processed. The difference between the pixel values of those pixels in the previous frame and the pixel values of the corresponding projection points in the image to be processed is used to construct an objective function with respect to the pose transformation parameters. The pose transformation parameters obtained by solving the objective function are then used to transform the camera pose parameters of the previous frame, obtaining the rotation parameter of the image to be processed. The rotation parameter can therefore be obtained from only some of the pixels, which helps reduce the amount of computation needed to calculate it.

在一些公開實施例中，旋轉參數更新部分包括降採樣子部分，配置為將上一幀圖像進行降採樣處理，得到上一幀圖像的縮略圖像，投影變換子部分還配置為利用待處理圖像和上一幀圖像之間的位姿變換參數對縮略圖像中的圖元點進行投影變換，得到縮略圖像中的圖元點在待處理圖像的投影點。 In some disclosed embodiments, the rotation parameter update part includes a downsampling subsection configured to perform downsampling processing on the previous frame image to obtain a thumbnail image of the previous frame image, and the projection transformation subsection is also configured to use The pose transformation parameters between the image to be processed and the previous frame image perform projection transformation on the primitive points in the thumbnail image to obtain the projection points of the primitive points in the thumbnail image on the image to be processed.

區別於前述實施例，通過將上一幀圖像進行降採樣處理，得到上一幀圖像的縮略圖像，從而利用待處理圖像的上一幀圖像之間的位姿變換參數對縮略圖像中的圖元點進行投影變換，得到縮略圖像中的圖元點在待處理圖像的投影點，以進行後續的目標函數構建以及求解，能夠有利於降低計算旋轉參數的計算量。 Different from the foregoing embodiments, the thumbnail image of the previous frame image is obtained by down-sampling the previous frame image, so that the pose transformation parameters between the previous frame images of the image to be processed are used to The primitive points in the thumbnail image are projected and transformed to obtain the projection points of the primitive points in the thumbnail image on the image to be processed, so as to construct and solve the subsequent objective function, which can help reduce the cost of calculating the rotation parameters Calculations.

在一些公開實施例中，視覺定位裝置60還包括加速度檢測部分，配置為檢測相機當前的加速度資訊，並判斷加速度資訊是否處於預設運動狀態，重力資訊獲取部分61、第一位姿獲取部分62和第二位姿獲取部分63還配置為在判斷結果為是的情況下重新執行獲取相機的重力資訊的步驟以及後續步驟，加速度檢測部分還配置為在判斷結果為否的情況下，重新執行檢測相機當前的加速度資訊的步驟以及後續步驟。 In some disclosed embodiments, the visual positioning device 60 further includes an acceleration detection part configured to detect the current acceleration information of the camera and determine whether the acceleration information corresponds to the preset motion state. The gravity information acquisition part 61, the first pose acquisition part 62 and the second pose acquisition part 63 are further configured to re-execute the step of obtaining the gravity information of the camera and the subsequent steps if the judgment result is yes, and the acceleration detection part is further configured to re-execute the step of detecting the current acceleration information of the camera and the subsequent steps if the judgment result is no.

區別於前述實施例，在得到待處理圖像的旋轉參數之後，進一步檢測相機當前的加速度資訊，並判斷加速度資訊是否處於預設運動狀態，從而在處於預設運動狀態的情況下，重新執行獲取相機的重力資訊的步驟以及後續步驟，並在不處於預設運動狀態的情況下，重新執行檢測相機當前的加速度資訊的步驟以及後續步驟，進而能夠有利於提高視覺定位的魯棒性。 Different from the above-mentioned embodiments, after obtaining the rotation parameters of the image to be processed, the current acceleration information of the camera is further detected, and it is judged whether the acceleration information is in the preset motion state, so that in the case of the preset motion state, re-execute the acquisition The step of detecting the gravity information of the camera and the subsequent steps, and re-executing the step of detecting the current acceleration information of the camera and the subsequent steps when it is not in the preset motion state, can help improve the robustness of visual positioning.

在一些公開實施例中，重力資訊包括重力方向資訊，相機位姿參數包括旋轉參數和位移參數，第一位姿獲取部分62包括旋轉角度獲取子部分，配置為利用重力方向資訊，獲取相機分別相對於世界座標系x座標軸、y座標軸和z座標軸的旋轉角度；且相機按照旋轉角度旋轉後的重力方向與z座標軸的反方向相同，第一位姿獲取部分62參數初始化子部分，配置為利用旋轉角度，得到旋轉參數，並將位移參數設置為預設數值。 In some disclosed embodiments, the gravity information includes gravity direction information, and the camera pose parameters include a rotation parameter and a displacement parameter. The first pose acquisition part 62 includes a rotation angle acquisition sub-part, configured to use the gravity direction information to obtain the rotation angles of the camera relative to the x, y and z coordinate axes of the world coordinate system, such that the direction of gravity after the camera is rotated by these rotation angles is the same as the opposite direction of the z coordinate axis. The first pose acquisition part 62 also includes a parameter initialization sub-part, configured to use the rotation angles to obtain the rotation parameter and to set the displacement parameter to a preset value.

在本發明實施例以及其他的實施例中，“部分”可以是部分電路、部分處理器、部分程式或軟體等等，當然也可以是單元，還可以是模組也可以是非模組化的。 In the embodiments of the present invention and other embodiments, a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a unit, and it may also be a module or non-modular.

區別於前述實施例，通過利用重力方向資訊，獲取相機分別相對於世界座標系x座標軸、y座標軸和z座標軸的旋轉角度，且相機按照旋轉角度旋轉後的重力方向與z座標軸的反方向相同，從而利用旋轉角度，得到旋轉參數，並將位移參數設置為預設數值，能夠通過重力對齊得到旋轉參數，進而初始化相機位姿參數，有利於降低相機位姿參數初始化的計算量。 Different from the foregoing embodiments, the gravity direction information is used to obtain the rotation angles of the camera relative to the x, y and z coordinate axes of the world coordinate system, such that the direction of gravity after the camera is rotated by these rotation angles is the same as the opposite direction of the z coordinate axis. The rotation angles are then used to obtain the rotation parameter, and the displacement parameter is set to a preset value. The rotation parameter can thus be obtained through gravity alignment and the camera pose parameters initialized, which helps reduce the amount of computation needed to initialize the camera pose parameters.

在一些公開實施例中，世界座標系的原點為相機拍攝當前圖像時所在的位置，預設數值為0。 In some disclosed embodiments, the origin of the world coordinate system is the position where the camera captures the current image, and the default value is 0.

區別於前述實施例，將世界座標系的原點設置為相機拍攝當前圖像時所在的位置，預設數值設置為0，能夠有利於降低初始化位移參數的複雜度。 Different from the foregoing embodiments, the origin of the world coordinate system is set to the position where the camera captures the current image, and the preset value is set to 0, which can help reduce the complexity of initializing the displacement parameters.
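The gravity-aligned initialization can be sketched in Python as follows. Instead of computing three separate axis angles as the text describes, this sketch builds a rotation matrix directly with Rodrigues' formula so that the measured gravity direction maps onto the negative z axis; that substitution, the function names and the sample gravity reading are assumptions for illustration. Gravity constrains only two rotational degrees of freedom, so the rotation about the gravity axis is left arbitrary here.

```python
import numpy as np

def rotation_aligning(a, b):
    """Rotation matrix R such that R @ (a / |a|) equals b / |b|."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.linalg.norm(v) < 1e-8:
        if c > 0:
            return np.eye(3)                     # already aligned
        # Opposite directions: rotate by pi about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)  # Rodrigues' formula

def initialize_pose(gravity_in_camera):
    """Rotation from gravity alignment; displacement set to the preset value 0."""
    R = rotation_aligning(gravity_in_camera, [0.0, 0.0, -1.0])
    t = np.zeros(3)  # world origin is the camera position at the current image
    return R, t
```

Calling `initialize_pose([0.0, 9.8, 0.0])` returns a rotation that maps the camera's y axis onto the world's negative z axis, together with a zero displacement.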

在一些公開實施例中，預設運動狀態為靜止狀態或勻速運動狀態；和/或，重力資訊是利用相機在預設狀態下的加速度資訊得到的。 In some disclosed embodiments, the preset motion state is a static state or a uniform motion state; and/or, the gravity information is obtained by using the acceleration information of the camera in the preset state.

區別於前述實施例，將預設運動狀態設置為靜止狀態或勻速運動狀態，能夠有利於提高初始化當前圖像的相機位姿參數的準確性；而利用相機在預設狀態下的加速度資訊得到重力資訊，能夠僅利用加速度計得到重力資訊，從而能夠有利於進一步降低視覺定位技術的使用成本，擴大視覺定位技術的使用範圍。 Different from the foregoing embodiments, setting the preset motion state to a static state or a uniform motion state can help improve the accuracy of initializing the camera pose parameters of the current image; and use the acceleration information of the camera in the preset state to obtain gravity Information can only use the accelerometer to obtain gravity information, which can help to further reduce the cost of using the visual positioning technology and expand the scope of use of the visual positioning technology.

請參閱圖7，圖7是本發明電子設備70一實施例的框架示意圖。電子設備70包括相互耦接的記憶體71和處理器72，處理器72用於執行記憶體71中儲存的程式指令，以實現上述任一視覺定位方法實施例的步驟。在一個實施場景中，電子設備70可以包括但不限於：手機、平板電腦、機器人等移動設備，在此不做限定。 Please refer to FIG. 7. FIG. 7 is a schematic frame diagram of an embodiment of an electronic device 70 of the present invention. The electronic device 70 includes a memory 71 and a processor 72 coupled to each other. The processor 72 is used to execute the program instructions stored in the memory 71 to implement the steps of any one of the above embodiments of the visual positioning method. In one implementation scenario, the electronic device 70 may include, but is not limited to, mobile devices such as mobile phones, tablet computers and robots, which is not limited here.

In the embodiments of the present invention, the processor 72 is configured to control itself and the memory 71 to implement the steps of any of the visual positioning method embodiments above. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip with signal processing capability. The processor 72 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor. In addition, the processor 72 may be jointly implemented by integrated circuit chips.

The above solution can reduce the cost of using visual positioning technology and widen its scope of use.

Please refer to FIG. 8, which is a schematic block diagram of an embodiment of a computer-readable storage medium 80 of the present invention. The computer-readable storage medium 80 stores program instructions 801 that can be run by a processor, and the program instructions 801 are used to implement the steps of any of the visual positioning method embodiments above.

In the several embodiments provided by the present invention, it should be understood that the disclosed methods and devices may be implemented in other ways. For example, the device implementations described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this implementation.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the implementations of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Industrial Applicability

In the embodiments of the present invention, gravity information of the camera is acquired and used to obtain the camera pose parameters of the current image captured by the camera in a preset motion state, and the camera pose parameters of the image to be processed after the current image are then obtained based on the camera pose parameters of the current image. Visual positioning can thus rely on the camera and gravity information alone, which reduces the cost of using visual positioning technology and widens its scope of use.

S11~S13: Steps

Claims

A visual positioning method applicable to a visual positioning device, the method comprising: acquiring gravity information of a camera; using the gravity information to acquire camera pose parameters of a current image captured by the camera in a preset motion state; acquiring, based on the camera pose parameters of the current image, camera pose parameters of an image to be processed after the current image, wherein the camera pose parameters include rotation parameters and displacement parameters; in response to the camera pose parameters of the image to be processed not satisfying a preset steady-state condition, determining that the displacement parameters of the image to be processed cannot be acquired; and using pixel values of a previous frame image of the image to be processed and the camera pose parameters of the previous frame image to obtain the rotation parameters of the image to be processed; wherein using the pixel values of the previous frame image of the image to be processed and the camera pose parameters of the previous frame image to obtain the rotation parameters of the image to be processed comprises: performing a projective transformation on at least some pixel points in the previous frame image using pose transformation parameters between the image to be processed and the previous frame image, to obtain projection points of the at least some pixel points in the image to be processed; constructing an objective function with respect to the pose transformation parameters using the difference between the pixel values of the at least some pixel points in the previous frame image and the pixel values, in the image to be processed, of the projection points corresponding to the at least some pixel points; and transforming the camera pose parameters of the previous frame image using the pose transformation parameters obtained by solving the objective function, to obtain the rotation parameters of the image to be processed.

The method according to claim 1, wherein the gravity information includes gravity direction information, and before acquiring, based on the camera pose parameters of the current image, the camera pose parameters of the image to be processed after the current image, the method further comprises: acquiring feature direction information of feature points in the current image; and using the feature direction information of the feature points and the gravity direction information to obtain depth information of the feature points in the current image; and wherein acquiring, based on the camera pose parameters of the current image, the camera pose parameters of the image to be processed after the current image comprises: acquiring, based on the depth information of the feature points in the current image and the camera pose parameters of the current image, depth information of the feature points in the image to be processed after the current image and the camera pose parameters of the image to be processed.

The method according to claim 2, wherein the feature direction information includes a direction vector of the feature point, the gravity direction information includes a gravity vector, and the depth information includes a depth value of the feature point; and using the feature direction information of the feature points and the gravity direction information to obtain the depth information of the feature points in the current image comprises: performing a first preset operation on the direction vector of the feature point and the gravity vector to obtain an included angle between the direction vector of the feature point and the gravity vector; and performing a second preset operation on a preset height of the camera and the included angle to obtain the depth value of the feature point.

The method according to claim 3, wherein the first preset operation includes an inner product operation; and/or, the second preset operation includes dividing the preset height by a cosine value of the included angle.

The method according to claim 2, wherein acquiring, based on the depth information of the feature points in the current image and the camera pose parameters of the current image, the depth information of the feature points in the image to be processed after the current image and the camera pose parameters of the image to be processed comprises: tracking the depth information of the feature points in the current image and the camera pose parameters of the current image using a preset pose tracking method, to obtain depth information of the feature points in a next frame image of the current image and camera pose parameters of the next frame image; and taking the next frame image as the current image, and re-executing the step of tracking the depth information of the feature points in the current image and the camera pose parameters of the current image using the preset pose tracking method, and the subsequent steps.

The method according to claim 5, wherein tracking the depth information of the feature points in the current image and the camera pose parameters of the current image using the preset pose tracking method, to obtain the depth information of the feature points in the next frame image of the current image and the camera pose parameters of the next frame image, comprises: determining a projection point of the feature point in the next frame image using the depth information of the feature point in the current image; obtaining pose transformation parameters between the current image and the next frame image based on the difference between the pixel values of a local region of the feature point in the current image and the pixel values of a local region of the projection point in the next frame image; obtaining the camera pose parameters of the next frame image using the pose transformation parameters and the camera pose parameters of the current image; optimizing the camera pose parameters of the next frame image using converged three-dimensional points; and acquiring a probability distribution of the depth information of the feature point, and using the probability distribution to obtain the depth information of the feature point in the next frame image.

The method according to claim 1, wherein before performing the projective transformation on at least some pixel points in the previous frame image using the pose transformation parameters between the image to be processed and the previous frame image to obtain the projection points of the at least some pixel points in the image to be processed, the method further comprises: downsampling the previous frame image to obtain a thumbnail image of the previous frame image; and wherein performing the projective transformation on at least some pixel points in the previous frame image using the pose transformation parameters between the image to be processed and the previous frame image to obtain the projection points of the at least some pixel points in the image to be processed comprises: performing a projective transformation on pixel points in the thumbnail image using the pose transformation parameters between the image to be processed and the previous frame image, to obtain projection points, in the image to be processed, of the pixel points in the thumbnail image.

The method according to claim 1, wherein after using the pixel values of the previous frame image of the image to be processed and the camera pose parameters of the previous frame image to obtain the rotation parameters of the image to be processed, the method further comprises: detecting current acceleration information of the camera, and determining whether the acceleration information corresponds to the preset motion state; if so, re-executing the step of acquiring the gravity information of the camera and the subsequent steps; and if not, re-executing the step of detecting the current acceleration information of the camera and the subsequent steps.

The method according to claim 1, wherein the gravity information includes gravity direction information, the camera pose parameters include rotation parameters and displacement parameters, and using the gravity information to acquire the camera pose parameters of the current image captured by the camera in the preset motion state comprises: using the gravity direction information to acquire rotation angles of the camera relative to the x coordinate axis, the y coordinate axis and the z coordinate axis of a world coordinate system, respectively, wherein the gravity direction of the camera after rotation by the rotation angles is the same as the opposite direction of the z coordinate axis; and obtaining the rotation parameters using the rotation angles, and setting the displacement parameters to a preset value.

The method according to claim 9, wherein the origin of the world coordinate system is the position of the camera when it captures the current image, and the preset value is 0.

The method according to any one of claims 1 to 10, wherein the preset motion state is a static state or a uniform motion state; and/or the gravity information is obtained using acceleration information of the camera in the preset motion state.

An electronic device, comprising a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the visual positioning method according to any one of claims 1 to 11.

A computer-readable storage medium, on which program instructions are stored, and when the program instructions are executed by a processor, the visual positioning method described in any one of claims 1 to 11 is implemented.
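
The depth computation recited in claims 3 and 4 (an inner product to obtain the included angle, then the preset height divided by the cosine of that angle) can be illustrated with the short Python sketch below. It is an illustrative sketch only: the function name `feature_depth` is hypothetical, and the geometric reading used here, namely that the feature point lies on a horizontal plane located the preset height below the camera and that the depth is the distance along the viewing ray, is an assumption that the claims do not state explicitly.

```python
import math


def feature_depth(direction, gravity, camera_height):
    """Depth of a feature point from its direction vector, the gravity
    vector, and the preset camera height.

    The cosine of the included angle comes from the normalized inner
    product; the depth is camera_height / cos(angle). Returns None when
    the ray does not point toward the plane (assumed handling).
    """
    dot = sum(d * g for d, g in zip(direction, gravity))
    norm_d = math.sqrt(sum(d * d for d in direction))
    norm_g = math.sqrt(sum(g * g for g in gravity))
    if norm_d == 0.0 or norm_g == 0.0:
        raise ValueError("direction and gravity vectors must be non-zero")
    cos_angle = dot / (norm_d * norm_g)
    if cos_angle <= 1e-6:
        return None
    return camera_height / cos_angle
```

A ray that points straight along gravity yields a depth equal to the preset height, and the depth grows as the ray tilts toward the horizontal.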