TW202203644A - Method and system for simultaneously tracking 6 dof poses of movable object and movable camera - Google Patents


Info

Publication number
TW202203644A
Authority
TW
Taiwan
Prior art keywords
movable
camera
orientations
movable object
feature points
Prior art date
Application number
TW110114401A
Other languages
Chinese (zh)
Other versions
TWI793579B (en)
Inventor
汪德美
謝中揚
Original Assignee
財團法人工業技術研究院 (Industrial Technology Research Institute)
Priority date
Filing date
Publication date
Application filed by 財團法人工業技術研究院 (Industrial Technology Research Institute)
Priority to CN202110554564.XA (published as CN113920189A)
Priority to US17/369,669 (published as US11506901B2)
Publication of TW202203644A
Application granted
Publication of TWI793579B

Landscapes

  • Image Analysis (AREA)

Abstract

A method and a system for simultaneously tracking several 6-DoF poses of a movable object and a movable camera are provided. The method includes the following steps. A series of images is captured by a movable camera. Several environmental feature points are extracted from the images. The environmental feature points are matched to calculate several camera matrices of the movable camera, and the 6-DoF poses of the movable camera are then calculated from the camera matrices. At the same time, several feature points of the movable object are inferred from the images captured by the movable camera. The coordinates of the feature points of the movable object are corrected through the camera matrices corresponding to the images, together with predefined geometric and temporal constraints. The 6-DoF poses of the movable object are then calculated.

Description

Method and system for simultaneously tracking the 6-DoF poses of a movable object and a movable camera

The present disclosure relates to a method and system for simultaneously tracking the 6-DoF poses of a movable object and a movable camera.

Existing tracking technologies such as Simultaneous Localization And Mapping (SLAM) can track the 6-DoF pose of a movable camera, but cannot simultaneously track a movable object. The reason is that the movable camera needs stable environmental feature points for localization, while the feature points of a movable object are unstable; they are usually discarded and cannot be used for tracking.

On the other hand, techniques for tracking movable objects deliberately ignore environmental feature points to avoid interference, so none of these techniques can track a movable camera.

Most features learned by neural networks are used to distinguish object categories, not to compute an object's 6-DoF pose. Some neural networks for recognizing poses or gestures can only output the 2D coordinates (x, y) of skeletal joints in the image plane. Even if the distance between a joint and the camera is estimated by depth sensing, the result is not a true 3D coordinate in space, let alone a 6-DoF pose.

Motion capture systems use multiple fixed cameras to track joint positions, generally with markers attached to the joints to reduce error; they do not track the 6-DoF pose of a movable camera.

Therefore, among currently known techniques, none can simultaneously track a movable object and a movable camera.

The rapid development of mixed reality (MR) has prompted researchers to develop technologies that can simultaneously track the 6-DoF poses of a movable camera and a movable object. In MR applications, the camera mounted on the MR glasses moves with the head, so the camera's 6-DoF pose is needed to determine the user's position and orientation. Objects that the user interacts with also move, so the object's 6-DoF pose is likewise required in order to display virtual content at the proper position and orientation. A user wearing MR glasses may move freely indoors or outdoors, making it difficult to place markers in the environment. Moreover, for a better user experience, no special markers are attached to the object beyond its own features.

Although these conditions increase the difficulty of tracking 6-DoF poses, we have developed a technique that can simultaneously track a movable object and a movable camera, solving the above problems and serving more applications.

The disclosed technique can be applied, for example, as follows: when a user wears MR glasses, one or more virtual screens can be displayed next to the real screen of a handheld device such as a mobile phone, with the default position, orientation, and size of each virtual screen set according to the 6-DoF poses of the phone and of the camera on the MR glasses. Furthermore, through 6-DoF pose tracking, the virtual screens can be automatically rotated and moved to stay consistent with the viewing direction. The disclosed technique offers users the following benefits: (1) a small physical screen is extended to a large virtual screen; (2) a single physical screen is augmented with multiple virtual screens, so more applications can be viewed at the same time; (3) the content of the virtual screens cannot be spied on by others.

According to an embodiment of the present disclosure, a method for simultaneously tracking the 6-DoF poses of a movable object and a movable camera is provided, comprising the following steps: capturing a series of images with the movable camera; extracting several environmental feature points from the images; matching the environmental feature points to calculate several camera matrices of the movable camera, and calculating the 6-DoF poses of the movable camera from the camera matrices; meanwhile, inferring several feature points of the movable object from the images captured by the movable camera; correcting the coordinates of the object's feature points using the camera matrix corresponding to each image together with predefined geometric and temporal constraints; and calculating the 6-DoF poses of the movable object from the corrected feature-point coordinates and the corresponding camera matrices.

According to another embodiment of the present disclosure, a system for simultaneously tracking the 6-DoF poses of a movable object and a movable camera is provided, comprising a movable camera, a movable-camera 6-DoF pose calculation unit, and a movable-object 6-DoF pose calculation unit. The movable camera is used to capture a series of images. The movable-camera 6-DoF pose calculation unit extracts several environmental feature points from the images, matches the environmental feature points to calculate several camera matrices of the movable camera, and calculates the 6-DoF poses of the movable camera from the camera matrices. The movable-object 6-DoF pose calculation unit infers several feature points of the movable object from the images captured by the movable camera, corrects the coordinates of the object's feature points using the camera matrix corresponding to each image together with predefined geometric and temporal constraints, and calculates the 6-DoF poses of the movable object from the corrected feature-point coordinates and the corresponding camera matrices.

In order to better understand the above and other aspects of the present disclosure, embodiments are described in detail below with reference to the accompanying drawings:

Please refer to FIGS. 1A and 1B, which illustrate an application of the disclosed technique for simultaneously tracking a movable object and a movable camera, compared with the prior art. The disclosed technique can be applied, for example, as follows. As shown in FIG. 1A, when a user wears the MR glasses G1 (on which the movable camera 110 is mounted), one or more virtual screens can be displayed next to the real screen of a handheld device such as the mobile phone P1 (i.e., the movable object 900), and the default positions, orientations, and sizes of the virtual screens D2 and D3 are set according to the 6-DoF poses of the mobile phone P1 and of the movable camera 110 on the MR glasses G1. "Movable", for the movable camera 110, means movable relative to a stationary object in three-dimensional space. Furthermore, through 6-DoF pose tracking, the virtual screens D2 and D3 can be automatically rotated and moved to stay consistent with the viewing direction (as shown in FIG. 1B), and the user can also adjust the positions and angles of the virtual screens D2 and D3 according to personal preference. A virtual screen displayed by the prior art moves with the MR glasses G1 and does not follow the 6-DoF pose of the object. The disclosed technique offers users the following benefits: (1) the small physical screen D1 is extended to the large virtual screen D2; (2) the single physical screen D1 is augmented with multiple virtual screens D2 and D3, so more applications can be viewed at the same time; (3) the content of the virtual screens D2 and D3 cannot be spied on by others. The above technique can also be applied to a tablet or notebook computer, placing virtual screens next to its physical screen. Besides a physical screen, the movable object 900 can be any other object with definable features, such as a car, a bicycle, or a pedestrian. The movable camera 110 is not limited to the camera on the MR glasses G1; it can also be a camera on an autonomous mobile robot or a vehicle.

Please refer to FIG. 2A, which illustrates a system 100 and method for simultaneously tracking the 6-DoF poses of a movable object 900 (labeled in FIG. 1A) and a movable camera 110 according to an embodiment. The movable object 900 is, for example, the mobile phone P1 of FIG. 1A; the movable camera 110 is, for example, the camera on the MR glasses G1 of FIG. 1A. The system 100 for simultaneously tracking the 6-DoF poses of the movable object 900 and the movable camera 110 includes the movable camera 110, a movable-camera 6-DoF pose calculation unit 120, and a movable-object 6-DoF pose calculation unit 130. The movable camera 110 is used to capture a series of images IM. The movable camera 110 can be mounted on a head-mounted stereoscopic display, a mobile device, a computer, or a robot. The movable-camera 6-DoF pose calculation unit 120 and/or the movable-object 6-DoF pose calculation unit 130 is implemented, for example, as a circuit, a chip, a circuit board, program code, or a storage device storing program code.

The movable-camera 6-DoF pose calculation unit 120 includes an environmental feature extraction unit 121, a camera matrix calculation unit 122, and a camera pose calculation unit 123, each implemented, for example, as a circuit, a chip, a circuit board, program code, or a storage device storing program code. The environmental feature extraction unit 121 extracts several environmental feature points EF from the images IM. The camera matrix calculation unit 122 matches these environmental feature points EF to calculate several camera matrices CM of the movable camera 110. The camera pose calculation unit 123 then calculates the 6-DoF pose CD of the movable camera 110 from the camera matrices CM.
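For illustration, once a camera matrix is known, the camera's 6-DoF pose follows by simple algebra. A minimal numpy sketch, assuming CM is the 3x4 extrinsic matrix [R|t] (if it also contains the intrinsics K, factor them out first) and using a ZYX Euler-angle convention, which the patent does not fix:

```python
import numpy as np

def pose_from_camera_matrix(Rt):
    """Recover a 6-DoF pose (position + orientation) from a 3x4
    extrinsic camera matrix [R|t], where x_cam = R @ x_world + t."""
    R, t = Rt[:, :3], Rt[:, 3]
    # Camera center in world coordinates: C = -R^T t
    center = -R.T @ t
    # Orientation as ZYX Euler angles (yaw, pitch, roll) from R
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-R[2, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return center, (yaw, pitch, roll)
```

The position and the three angles together form the six degrees of freedom; any other rotation parameterization (quaternion, axis-angle) would serve equally well.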

The movable-object 6-DoF pose calculation unit 130 includes an object feature coordinate estimation unit 131, an object feature coordinate correction unit 132, and an object pose calculation unit 133, each implemented, for example, as a circuit, a chip, a circuit board, program code, or a storage device storing program code. The object feature coordinate estimation unit 131 estimates several feature points OF of the movable object 900 from the images IM captured by the movable camera 110; these feature points OF are predefined and are compared against the images IM captured by the movable camera 110 to estimate the coordinates of the feature points OF. The movable object 900 is a rigid object.

Please refer to FIG. 2B, which illustrates another embodiment in which the method for simultaneously tracking the 6-DoF poses of the movable object 900 and the movable camera 110 comprises a training stage ST1 and a tracking stage ST2. Here, the object feature coordinate estimation unit 131 uses a neural network inference model MD to estimate the coordinates of the feature points OF of the movable object 900 from the images IM captured by the movable camera 110. The neural network inference model MD is pre-trained; its training data are obtained by manual or automatic labeling, and the geometric constraints GC and temporal constraints TC are incorporated during training.

The object feature coordinate correction unit 132 corrects the coordinates of the feature points OF of the movable object 900 using the camera matrix CM corresponding to each image IM together with the predefined geometric constraints GC and temporal constraints TC. Specifically, the object feature coordinate correction unit 132 uses the camera matrices CM to project the two-dimensional coordinates of the feature points OF to the corresponding three-dimensional coordinates; according to the geometric constraints GC, it deletes feature points OF whose three-dimensional coordinate deviation exceeds a predetermined value, or supplements the coordinates of undetected feature points OF from the coordinates of neighboring feature points OF. Moreover, according to the temporal constraints TC, the object feature coordinate correction unit 132 compares the coordinate changes of the feature points OF across multiple consecutive images IM, and corrects the coordinates of any feature point OF whose change exceeds a predetermined value using the coordinates of the corresponding feature points OF in those consecutive images IM, obtaining the corrected coordinates of the feature points OF'.
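A deliberately simplified sketch of the temporal part of this correction, assuming the fallback for an implausible jump is simply the previous frame's coordinate (the actual correction draws on the corresponding points across several consecutive images):

```python
import numpy as np

def temporally_correct(prev_pts, curr_pts, max_disp):
    """For each feature point, if its displacement between consecutive
    frames exceeds max_disp, fall back to the previous frame's
    coordinate; otherwise keep the current detection.
    prev_pts, curr_pts: (N, 2) arrays of 2D coordinates."""
    prev_pts = np.asarray(prev_pts, float)
    curr_pts = np.asarray(curr_pts, float)
    disp = np.linalg.norm(curr_pts - prev_pts, axis=1)
    out = curr_pts.copy()
    out[disp > max_disp] = prev_pts[disp > max_disp]
    return out
```

The threshold max_disp plays the role of the predetermined value; in practice it could be derived from the displacement statistics used by the temporal constraint TC.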

Please refer to FIG. 3A, which illustrates the correspondences of the environmental feature points and the feature points of the movable object across a series of images captured by the movable camera. For a non-planar object, the orientation and position can be defined by the centroid of several selected feature points OF. Please refer to FIG. 3B, which illustrates the position and orientation of an object in space. A best-fit plane PL is fitted to the feature points OF; the center point C of the best-fit plane PL represents the position (x, y, z) of the object in three-dimensional space, and the normal vector N of the best-fit plane PL represents the orientation of the object.
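Fitting the best plane PL to the feature points and reading off the center C and normal N can be done with a singular value decomposition. A minimal numpy sketch, not tied to the patent's exact solver:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a set of 3D feature points.
    Returns (center, unit normal): the center is the centroid, and the
    normal is the right singular vector of the smallest singular
    value, i.e. the direction of least variance."""
    pts = np.asarray(points, float)
    center = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - center)
    normal = vt[-1]
    return center, normal / np.linalg.norm(normal)
```

The centroid minimizes the sum of squared distances along the normal, so the pair (center, normal) is exactly the plane-based position and orientation described above.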

The geometric constraints GC are defined in three-dimensional space: for a rigid object, the distances between the feature points OF should be fixed. After projection onto the two-dimensional image plane through the camera matrix, the positions of all feature points OF must be confined to a reasonable range.
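The rigidity constraint can be checked directly: the pairwise distances among an object's observed 3D feature points should match those of the rigid reference model. A small numpy sketch (the threshold and per-point averaging scheme are illustrative assumptions, not the patent's exact test):

```python
import numpy as np

def violates_rigidity(ref_pts, obs_pts, tol):
    """Flag observed 3D feature points whose pairwise distances to the
    other points deviate from the rigid reference model by more than
    tol on average. Returns a boolean mask over the points."""
    ref = np.asarray(ref_pts, float)
    obs = np.asarray(obs_pts, float)
    # All pairwise distances in the reference model and the observation
    d_ref = np.linalg.norm(ref[:, None] - ref[None, :], axis=-1)
    d_obs = np.linalg.norm(obs[:, None] - obs[None, :], axis=-1)
    err = np.abs(d_obs - d_ref)
    n = len(ref)
    mean_err = err.sum(axis=1) / (n - 1)  # exclude the zero diagonal
    return mean_err > tol
```

A point flagged by this mask is a candidate for deletion or re-estimation under the geometric constraints GC.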

Please refer to FIGS. 4A-4B, which illustrate correcting the coordinates of the feature points OF. The camera matrix CM can be used not only to calculate the 6-DoF poses of the movable camera 110 and the movable object 900, but also to apply the three-dimensional geometric constraints GC, correcting the coordinates of a feature point OF* projected onto the two-dimensional image plane (as shown in FIG. 4A) or adding the coordinates of a missing feature point OF** (as shown in FIG. 4B).
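As a concrete instance of adding a missing point: for the four corners of a rigid rectangular screen in 3D object coordinates, a missing corner follows exactly from its two neighbours and the opposite corner. A minimal sketch, a simplification of the GC-based completion that holds before perspective projection:

```python
import numpy as np

def complete_fourth_corner(p_prev, p_opposite, p_next):
    """Recover a missing corner of a rectangle (or any parallelogram)
    from its two neighbouring corners and the diagonally opposite one:
    missing = p_prev + p_next - p_opposite."""
    return (np.asarray(p_prev, float)
            + np.asarray(p_next, float)
            - np.asarray(p_opposite, float))
```

In the image plane this identity holds only approximately under perspective, which is why the correction is applied in 3D via the camera matrix before reprojection.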

The object pose calculation unit 133 then calculates the 6-DoF pose OD of the movable object 900 from the corrected coordinates of the feature points OF' and the corresponding camera matrices CM. For a planar movable object, the feature points OF are used to compute a best-fit plane, and the 6-DoF pose OD of the movable object 900 is defined by the center point and the normal vector of the plane. For a non-planar movable object, the 6-DoF pose OD of the movable object 900 is defined by the centroid of the three-dimensional coordinates of the feature points OF'.

As shown in FIG. 2B, the training stage ST1 of the system 100 for simultaneously tracking the 6-DoF poses of the movable object 900 and the movable camera 110 includes a training data generation unit 140 and a neural network training unit 150, each implemented, for example, as a circuit, a chip, a circuit board, program code, or a storage device storing program code.

The neural network training unit 150 is used to train the neural network inference model MD. The neural network inference model MD infers the positions and the order of the feature points OF of the movable object 900. In the training data generation unit 140, the training data can be images with manually labeled feature-point positions and order, or automatically augmented labeled images. Please refer to FIGS. 5A-5D, which illustrate various training data using a mobile phone as an example. In these figures, the feature points OF are defined by the four inner corners of the physical screen D4. When the physical screen D4 is placed in portrait orientation, the four feature points OF are ordered clockwise from the upper-left corner to the lower-left corner. As shown in FIG. 5A, the four feature points OF have, in order, the coordinates (x1, y1), (x2, y2), (x3, y3), and (x4, y4). Even if the physical screen D4 is rotated to landscape orientation, the order of the feature points OF remains unchanged (as shown in FIG. 5B). In some cases, not all feature points OF can be captured; the training data therefore needs to include images such as FIG. 5C or FIG. 5D in which some feature points OF are missing. As shown in FIGS. 5A and 5D, the labeling distinguishes the front of the phone (i.e., the screen) from its back, and only the front is labeled. For higher accuracy, each image is magnified while labeling the feature points OF until every pixel is clearly visible. Since manual labeling is very time-consuming, automatic augmentation is needed to scale the training data to the level of millions of images. Methods for automatically augmenting the manually labeled images include: scaling and rotation; mapping by perspective projection; conversion to different colors; adjusting brightness and contrast; adding motion blur and noise; occluding some feature points with other objects (as shown in FIGS. 5C and 5D); changing the content shown on the screen; replacing the background; and so on. The positions of the manually labeled feature points OF are then recomputed in the automatically augmented images according to the corresponding transformations.
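To make the last step concrete: when an augmentation applies a known perspective transform to an image, the labeled coordinates can be pushed through the same transform, so the original manual labels are reused for free. A minimal numpy sketch (the function name and the 3x3-homography representation are illustrative assumptions):

```python
import numpy as np

def warp_labels(H, points):
    """Map labeled 2D feature-point coordinates through the same 3x3
    perspective transform H that was used to augment the image."""
    pts = np.asarray(points, float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous
    mapped = homo @ np.asarray(H, float).T
    return mapped[:, :2] / mapped[:, 2:3]            # back to 2D
```

Scaling, rotation, and perspective mapping all compose into a single such H, so one call relabels an augmented image regardless of how many transforms were chained.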

Please refer to FIG. 6, which illustrates that the main structure of the neural network in the training stage comprises feature extraction and feature-point coordinate prediction. The feature extractor ET can be a deep residual network such as ResNet, or another network with similar functionality. The extracted feature vector FV is passed to the feature-point coordinate prediction layer FL, which infers the coordinates of the feature points OF (for example, the coordinates of a feature point OF in the current image are denoted (x_t, y_t), and the coordinates of the same feature point OF in the previous image are denoted (x_{t-1}, y_{t-1})). In addition to the feature-point prediction layer, this embodiment also adds a geometric constraint layer GCL and a temporal constraint layer TCL to reduce erroneous predictions. In the training stage, each layer computes a loss value LV between the predicted value and the ground truth according to its loss function, and these loss values, weighted by their respective weights, are accumulated to obtain the total loss value OLV.
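The weighted accumulation of the per-layer losses into the total loss value OLV can be sketched in one line (the individual loss functions and weight values are design choices of the training setup, not fixed by the disclosure):

```python
def total_loss(losses, weights):
    """Accumulate per-layer loss values (feature-point prediction,
    geometric-constraint, temporal-constraint) into one weighted
    training objective."""
    return sum(w * l for w, l in zip(weights, losses))
```

Raising the weight on the GCL or TCL term trades raw coordinate accuracy for stronger enforcement of the corresponding constraint.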

Please refer to FIG. 7, which illustrates how the displacement of a feature point between two adjacent images is computed. The coordinates of a feature point OF in the current image are (x_t, y_t), and the coordinates of the same feature point OF in the previous image are (x_{t-1}, y_{t-1}); the displacement between them is defined as d = sqrt((x_t - x_{t-1})^2 + (y_t - y_{t-1})^2).

Unreasonable displacements are constrained by a penalty value P. The penalty value P is computed, for example, according to the following formula (1):

P = 0, if d ≤ m;  P = (d − m) / s, if d > m ………………………………(1)

where m is the mean displacement computed over all training data for each feature point OF, s is the standard deviation of the displacement, and d is the displacement of the same feature point OF between the previous image and the current image. When d ≤ m, the displacement is within the acceptable range and there is no penalty (i.e., P = 0). Please refer to FIG. 8, which illustrates how the temporal constraint TC and the penalty value P are computed and evaluated. The center of the circle represents the coordinates (x_{t-1}, y_{t-1}) of the feature point OF in the previous image, and the area of the circle represents the acceptable displacement of the feature point OF in the current image. If the predicted coordinates (x'_t, y'_t) of the feature point OF in the current image lie inside the circle (displacement d' ≤ m), the penalty value P is zero. If the predicted coordinates (x"_t, y"_t) of the feature point OF in the current image lie outside the circle (displacement d" > m), the penalty value P is (d" − m) / s. The more the displacement exceeds the radius of the circle (i.e., m), the larger the penalty value P and the larger the loss value obtained during training, thereby constraining the coordinates of the feature point OF to a reasonable range.
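The temporal-constraint penalty described above (no penalty while the displacement d stays within the mean m, a penalty growing with the excess beyond it) can be sketched as follows; this is an illustrative reading of formula (1) that standardizes the excess by s, not a verbatim implementation:

```python
def temporal_penalty(d, m, s):
    """Penalty per formula (1): zero inside the acceptable circle of
    radius m, and the standardized excess (d - m) / s beyond it."""
    return 0.0 if d <= m else (d - m) / s
```

Because the penalty is zero inside the circle and grows linearly outside it, gradients during training push outlying predictions back toward the acceptable range without disturbing in-range ones.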

Please refer to FIG. 9, which illustrates an incorrect displacement produced in the absence of the temporal constraint TC. The left side of FIG. 9 shows the previous image, and the right side shows the current image. In the previous image, a feature point OF with coordinates (x2, y2) is identified. In the current image, however, a feature point OF with coordinates (x'2, y'2) is identified from a reflection. The displacement between (x'2, y'2) and (x2, y2) exceeds the range set by the temporal constraint TC, so the coordinates (x'2, y'2) can be judged incorrect.

As shown in FIG. 2B, in the tracking stage ST2, the movable camera 110 captures a series of images IM. Several environmental feature points EF are extracted from these images and then used to calculate the corresponding camera matrices CM and the 6-DoF poses CD of the movable camera 110. Meanwhile, the coordinates of the feature points OF of the movable object 900 are estimated by the neural network inference model MD, and are transformed and corrected by the camera matrices CM to obtain the 6-DoF pose OD of the movable object 900.

Please refer to FIG. 10, which illustrates a system 200 and method for simultaneously tracking the 6-DoF poses of the movable object 900 (labeled in FIG. 1A) and the movable camera 110 with an added incremental learning stage ST3, comprising an automatic augmentation unit 260 and a weight adjustment unit 270, each implemented, for example, as a circuit, a chip, a circuit board, program code, or a storage device storing program code.

In the embodiment of FIG. 10, the training data of the neural network inference model MD consist of manual labeling and automatic augmentation in the training stage, and of automatic labeling and automatic augmentation in the incremental learning stage.

While tracking the movable object 900, the neural network inference model MD performs incremental learning in the background. The training data for incremental learning include the images IM captured by the movable camera 110 and the images IM' automatically augmented from the images IM by the automatic augmentation unit 260. The automatic augmentation unit 260 replaces the manual labels with the corrected coordinates of the feature points OF corresponding to the images IM and IM', using them as the ground-truth feature-point coordinates. The weight adjustment unit 270 adjusts the weights of the neural network inference model MD to update it into the neural network inference model MD', thereby adapting to the usage context so as to accurately track the 6-DoF pose OD of the movable object 900.

In addition, referring to FIG. 11, a system 300 and method for simultaneously tracking the 6 DoF poses of the movable object 900 and the movable camera 110, applied to MR glasses, include a pose correction unit 310, a pose stabilization unit 320, a visual-axis calculation unit 330, a screen pose calculation unit 340, and a stereoscopic image generation unit 350, each of which may be implemented as, for example, a circuit, a chip, a circuit board, program code, or a storage device storing program code. The pose correction unit 310 includes a cross-comparison unit 311 and a correction unit 312; the stereoscopic image generation unit 350 includes an image generation unit 351 and an imaging unit 352, which may likewise be implemented as circuits, chips, circuit boards, program code, or storage devices storing program code.

As the movable camera 110 and the movable object 900 move, their 6 DoF poses CD and OD need to be cross-compared and corrected (as shown in FIG. 8). The cross-comparison unit 311 of the pose correction unit 310 cross-compares the 6 DoF pose OD of the movable object 900 with the 6 DoF pose CD of the movable camera 110. The correction unit 312 corrects the 6 DoF pose OD of the movable object 900 and the 6 DoF pose CD of the movable camera 110.
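A minimal sketch of one possible cross-comparison check is shown below. Validating the object pose against the camera pose through their relative transform is an assumed interpretation of the cross-comparison step, and the tolerance value is illustrative.

```python
import numpy as np

def cross_check(T_cam, T_obj, T_rel_expected, tol=0.05):
    """Cross-compare the two tracked poses: the object pose re-expressed
    in the camera frame should match an expected relative transform.
    A large translation residual flags a pose that needs correction."""
    T_rel = np.linalg.inv(T_cam) @ T_obj
    residual = np.linalg.norm(T_rel[:3, 3] - T_rel_expected[:3, 3])
    return bool(residual < tol)

T_cam = np.eye(4)                       # camera at the world origin
T_obj = np.eye(4); T_obj[:3, 3] = [0.0, 0.0, 1.0]   # object 1 unit ahead
T_exp = np.eye(4); T_exp[:3, 3] = [0.0, 0.0, 1.0]   # expected relative pose
ok = cross_check(T_cam, T_obj, T_exp)
```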

Unintentional slight head movements would otherwise trigger recalculation of the 6 DoF poses of the movable camera and the movable object, making the virtual screen D2 (shown in FIG. 1A) shake accordingly and causing dizziness. To reduce this, the pose stabilization unit 320 keeps the 6 DoF pose OD of the movable object 900 and the 6 DoF pose CD of the movable camera 110 unchanged when the change of either pose is less than a preset value.
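The stabilization rule can be sketched as a simple dead-band filter. The flat pose vector representation and the threshold value below are assumptions for illustration only.

```python
import numpy as np

def stabilize(prev_pose, new_pose, threshold=0.01):
    """Keep the previous 6 DoF pose when the change is below a preset
    threshold, suppressing jitter from slight unintentional head motion.
    Pose is taken as (x, y, z, yaw, pitch, roll)."""
    prev = np.asarray(prev_pose, dtype=float)
    new = np.asarray(new_pose, dtype=float)
    if np.linalg.norm(new - prev) < threshold:
        return prev          # change too small: hold the pose steady
    return new               # real motion: accept the new pose

held = stabilize([0, 0, 1, 0, 0, 0], [0.001, 0, 1, 0, 0, 0])   # jitter
moved = stabilize([0, 0, 1, 0, 0, 0], [0.5, 0, 1, 0, 0, 0])    # real motion
```

A production system would typically use separate thresholds for translation and rotation rather than one norm over mixed units.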

The visual-axis calculation unit 330 calculates the visual axes of the user's eyes according to the 6 DoF pose CD of the movable camera 110.

The screen pose calculation unit 340 calculates the 6 DoF pose DD of the virtual screen D2 according to the 6 DoF pose OD of the movable object 900 and the 6 DoF pose CD of the movable camera 110, so that the virtual screen D2 moves together with the movable object 900 (as shown in FIG. 1B), or the displayed viewing angle of the virtual screen D2 changes with the 6 DoF pose of the movable camera 110.
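One way to make the virtual screen follow the object is to compose the object pose with a fixed offset expressed in the object's own frame. The sketch below assumes 4x4 homogeneous transforms and an illustrative offset value; it is not the patented computation.

```python
import numpy as np

def screen_pose_from_object(T_object, offset):
    """Place the virtual screen at a fixed offset in the object's own
    frame, so the screen follows the object as it moves and rotates."""
    T_offset = np.eye(4)
    T_offset[:3, 3] = offset
    return T_object @ T_offset   # offset applied in the object frame

# Object at (1, 0, 0); screen floats 0.2 units above it (object frame)
T_obj = np.eye(4)
T_obj[:3, 3] = [1.0, 0.0, 0.0]
T_screen = screen_pose_from_object(T_obj, [0.0, 0.2, 0.0])
```

Because the offset is composed on the right, rotating the object rotates the screen's anchor point with it, which is what keeps the screen attached to the object rather than to the world.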

The image generation unit 351 of the stereoscopic image generation unit 350 generates the left-eye image and the right-eye image of the virtual screen D2 according to the 6 DoF pose DD of the virtual screen D2 and the optical parameters of the stereoscopic display (for example, the MR glasses G1 in FIG. 1A). The imaging unit 352 of the stereoscopic image generation unit 350 displays the stereoscopic image of the virtual screen D2 on the stereoscopic display (for example, the MR glasses G1 in FIG. 1A).
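A minimal sketch of deriving per-eye views from the display's optical parameters follows. Here the only parameter modeled is the interpupillary distance, which is an assumption; real MR glasses involve further optical parameters (projection, lens distortion) not shown.

```python
import numpy as np

def eye_view_matrices(T_head, ipd=0.063):
    """Derive left/right-eye view transforms by shifting the head pose
    half the interpupillary distance along its own x-axis. The IPD value
    (63 mm) and the pure-shift model are illustrative assumptions."""
    def shift(dx):
        T = np.eye(4)
        T[0, 3] = dx
        return T_head @ T        # shift expressed in the head frame
    return shift(-ipd / 2), shift(+ipd / 2)

T_head = np.eye(4)               # head at the origin, looking down -z
left, right = eye_view_matrices(T_head)
```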

The imaging unit 352 of the stereoscopic image generation unit 350 can display the virtual screen D2 at a specific position around the movable object 900 according to the user's settings.

To sum up, although the present disclosure has been described above by way of embodiments, it is not intended to limit the present disclosure. Those of ordinary skill in the art to which the present disclosure pertains may make various changes and modifications without departing from the spirit and scope of the present disclosure. Therefore, the scope of protection of the present disclosure shall be defined by the appended claims.

100, 200, 300: system for simultaneously tracking the 6 DoF poses of a movable object and a movable camera; 110: movable camera; 120: movable-camera 6 DoF pose calculation unit; 121: environmental feature extraction unit; 122: camera matrix calculation unit; 123: camera pose calculation unit; 130: movable-object 6 DoF pose calculation unit; 131: object feature coordinate inference unit; 132: object feature coordinate correction unit; 133: object pose calculation unit; 140: training data generation unit; 150: neural network training unit; 260: automatic augmentation unit; 270: weight adjustment unit; 310: pose correction unit; 311: cross-comparison unit; 312: correction unit; 320: pose stabilization unit; 330: visual-axis calculation unit; 340: screen pose calculation unit; 350: stereoscopic image generation unit; 351: image generation unit; 352: imaging unit; 900: movable object; CD: 6 DoF pose of the movable camera; CM: camera matrix; d, d', d'': displacements; D1, D4: physical screens; D2, D3: virtual screens; DD: 6 DoF pose of the virtual screen; EF: environmental feature points; ET: feature extractor; FL: feature-point coordinate prediction layer; FV: feature vector; G1: MR glasses; GC: geometric constraint; GCL: geometric constraint layer; IM, IM': images; LV: loss value; MD: neural network inference model; m: mean displacement; OD: 6 DoF pose of the movable object; OF, OF', OF*, OF**: feature points; OLV: total loss value; P1: mobile phone; s: standard deviation of displacement; ST1: training stage; ST2: tracking stage; ST3: incremental learning stage; TC: temporal constraint; TCL: temporal constraint layer

[Mathematical symbols rendered as images in the original publication, not reproducible in text: coordinates, displacements, and penalty values.] PL: best-fit plane; C: center point; N: normal vector

FIGS. 1A and 1B illustrate the application of the disclosed technique for simultaneously tracking a movable object and a movable camera, compared with the prior art. FIG. 2A illustrates a system and method for simultaneously tracking the 6 DoF poses of a movable object and a movable camera according to an embodiment. FIG. 2B illustrates the system and method with a training stage added. FIG. 3A illustrates the correspondence of environmental feature points and movable-object feature points across a series of images captured by the movable camera. FIG. 3B illustrates the position and orientation of an object in space. FIGS. 4A-4B illustrate repairing the feature points of the movable object. FIGS. 5A-5D illustrate the feature-point definition and various training data, using a mobile phone as an example. FIG. 6 illustrates the structure of the neural network in the training stage. FIG. 7 illustrates how feature-point displacements are calculated between two adjacent images. FIG. 8 illustrates the calculation and decision method of the temporal constraint. FIG. 9 illustrates incorrect displacements caused by the absence of the temporal constraint. FIG. 10 illustrates the system and method with incremental learning added. FIG. 11 illustrates the system and method applied to MR glasses.

100: system for simultaneously tracking the 6 DoF poses of a movable object and a movable camera
110: movable camera
120: movable-camera 6 DoF pose calculation unit
121: environmental feature extraction unit
122: camera matrix calculation unit
123: camera pose calculation unit
130: movable-object 6 DoF pose calculation unit
131: object feature coordinate inference unit
132: object feature coordinate correction unit
133: object pose calculation unit
CD: 6 DoF pose of the movable camera
CM: camera matrix
EF: environmental feature points
GC: geometric constraint
IM: images
MD: neural network inference model
OD: 6 DoF pose of the movable object
OF, OF': feature points of the movable object
ST1: training stage
ST2: tracking stage
TC: temporal constraint

Claims (20)

1. A method for simultaneously tracking a plurality of six-degree-of-freedom (6 DoF) poses of a movable object and a movable camera, comprising: capturing a plurality of images with the movable camera, extracting a plurality of environmental feature points from the images, matching the environmental feature points to calculate a plurality of camera matrices of the movable camera, and then calculating the 6 DoF poses of the movable camera from the camera matrices; and inferring a plurality of feature points of the movable object from the images captured by the movable camera, correcting a plurality of coordinates of the feature points of the movable object through the camera matrices corresponding to the images and through a predefined geometric constraint and a predefined temporal constraint, and then calculating the 6 DoF poses of the movable object from the corrected coordinates of the feature points and the corresponding camera matrices.

2. The method according to claim 1, wherein the feature points of the movable object inferred from the images captured by the movable camera are predefined, and the coordinates of the feature points are inferred by comparison with the images captured by the movable camera.

3. The method according to claim 1, wherein the feature points of the movable object are inferred from the images captured by the movable camera, and the coordinates of the feature points are inferred by a neural network inference model, the neural network inference model being pre-trained with training data consisting of manual labeling and automatic augmentation, the geometric constraint and the temporal constraint being added during training.

4. The method according to claim 3, wherein while the movable object is being tracked, the neural network inference model performs incremental learning in the background, the training data of the incremental learning including the images captured by the movable camera and images automatically augmented from them; manual labels are replaced by the corrected coordinates of the feature points corresponding to the images, and the weights in the neural network inference model are adjusted to update the model so that the coordinates of the feature points of the movable object are inferred accurately.

5. The method according to claim 1, further comprising: cross-comparing the 6 DoF poses of the movable object with the 6 DoF poses of the movable camera to correct the 6 DoF poses of the movable object and of the movable camera; keeping the 6 DoF poses of the movable object and of the movable camera unchanged when the change of either is less than a preset value; calculating the visual axes of a user's eyes according to the 6 DoF poses of the movable camera; calculating 6 DoF poses of a virtual screen according to the 6 DoF poses of the movable object and of the movable camera; and generating a left-eye image and a right-eye image of the virtual screen according to the 6 DoF poses of the virtual screen and the optical parameters of a stereoscopic display, so as to display a stereoscopic image of the virtual screen on the stereoscopic display.

6. The method according to claim 5, wherein the virtual screen is set by the user to be displayed at a specific position around the movable object, and the virtual screen moves together with the movable object.

7. The method according to claim 1, wherein the step of correcting the coordinates of the feature points of the movable object comprises: projecting two-dimensional coordinates of the feature points to corresponding three-dimensional coordinates using the camera matrices; according to the geometric constraint, deleting the feature points whose three-dimensional coordinate deviation is greater than a predetermined value, or supplementing the coordinates of undetected feature points from the coordinates of neighboring feature points; and according to the temporal constraint, comparing the coordinate changes of the feature points across consecutive images, and correcting, using the coordinates of the corresponding feature points in the consecutive images, the coordinates of the feature points whose coordinate change is greater than a set value.

8. The method according to claim 1, wherein in the step of calculating the 6 DoF poses of the movable object: for a planar movable object, a fitted plane is calculated using the feature points, and the 6 DoF poses of the movable object are defined by the center point and the normal vector of the plane; and for a non-planar movable object, the 6 DoF poses of the movable object are defined by the centroid of the three-dimensional coordinates of the feature points.

9. The method according to claim 1, wherein the movable object is a rigid object, and the movable camera is disposed on a head-mounted stereoscopic display, a mobile device, a computer, or a robot.

10. A system for simultaneously tracking a plurality of six-degree-of-freedom (6 DoF) poses of a movable object and a movable camera, comprising: the movable camera, configured to capture a plurality of images; a movable-camera 6 DoF pose calculation unit, configured to extract a plurality of environmental feature points from the images, match the environmental feature points to calculate a plurality of camera matrices of the movable camera, and then calculate the 6 DoF poses of the movable camera from the camera matrices; and a movable-object 6 DoF pose calculation unit, configured to infer a plurality of feature points of the movable object from the images captured by the movable camera, correct the coordinates of the feature points of the movable object through the camera matrices corresponding to the images and through a predefined geometric constraint and a predefined temporal constraint, and then calculate the 6 DoF poses of the movable object from the corrected coordinates of the feature points and the corresponding camera matrices.

11. The system according to claim 10, wherein the movable-camera 6 DoF pose calculation unit comprises: an environmental feature extraction unit, configured to extract the environmental feature points from the images; a camera matrix calculation unit, configured to match the environmental feature points to calculate the camera matrices of the movable camera; and a camera pose calculation unit, configured to calculate the 6 DoF poses of the movable camera from the camera matrices.

12. The system according to claim 10, wherein the movable-object 6 DoF pose calculation unit comprises: an object feature inference unit, configured to infer the feature points of the movable object from the images captured by the movable camera; an object feature coordinate correction unit, configured to correct the coordinates of the feature points of the movable object through the camera matrices corresponding to the images and through the predefined geometric constraint and temporal constraint; and an object pose calculation unit, configured to calculate the 6 DoF poses of the movable object from the corrected coordinates of the feature points and the corresponding camera matrices.

13. The system according to claim 12, wherein the feature points of the movable object inferred by the object feature inference unit from the images captured by the movable camera are predefined, and the coordinates of the feature points are inferred by comparison with the images captured by the movable camera.

14. The system according to claim 12, wherein the object feature inference unit infers the feature points of the movable object from the images captured by the movable camera, and the coordinates of the feature points are inferred by a neural network inference model, the neural network inference model being pre-trained with training data consisting of manual labeling and automatic augmentation, the geometric constraint and the temporal constraint being added during training.

15. The system according to claim 14, wherein while the movable object is being tracked, the neural network inference model performs incremental learning in the background, the training data of the incremental learning including the images captured by the movable camera and images automatically augmented from them; manual labels are replaced by the corrected coordinates of the feature points corresponding to the images, and the weights in the neural network inference model are adjusted to update the model so that the coordinates of the feature points of the movable object are inferred accurately.

16. The system according to claim 10, further comprising: a pose correction unit, configured to cross-compare the 6 DoF poses of the movable object with the 6 DoF poses of the movable camera to correct the 6 DoF poses of the movable object and of the movable camera; a pose stabilization unit, configured to keep the 6 DoF poses of the movable object and of the movable camera unchanged when the change of either is less than a preset value; a visual-axis calculation unit, configured to calculate the visual axes of a user's eyes according to the 6 DoF poses of the movable camera; a screen pose calculation unit, configured to calculate a plurality of 6 DoF poses of a virtual screen according to the 6 DoF poses of the movable object and of the movable camera; and a stereoscopic image generation unit, configured to generate a left-eye image and a right-eye image of the virtual screen according to the 6 DoF poses of the virtual screen and the optical parameters of a stereoscopic display, so as to display a stereoscopic image of the virtual screen on the stereoscopic display.

17. The system according to claim 16, wherein the virtual screen is set by the user to be displayed at a specific position around the movable object, and the virtual screen moves together with the movable object.

18. The system according to claim 12, wherein the object feature coordinate correction unit: projects two-dimensional coordinates of the feature points to corresponding three-dimensional coordinates using the camera matrices; according to the geometric constraint, deletes the feature points whose three-dimensional coordinate deviation is greater than a predetermined value, or supplements the coordinates of undetected feature points from the coordinates of neighboring feature points; and according to the temporal constraint, compares the coordinate changes of the feature points across consecutive images, and corrects, using the coordinates of the corresponding feature points in the consecutive images, the coordinates of the feature points whose coordinate change is greater than a set value.

19. The system according to claim 12, wherein the object pose calculation unit: for a planar movable object, calculates a fitted plane using the feature points, the 6 DoF poses of the movable object being defined by the center point and the normal vector of the plane; and for a non-planar movable object, defines the 6 DoF poses of the movable object by the centroid of the three-dimensional coordinates of the feature points.

20. The system according to claim 10, wherein the movable object is a rigid object, and the movable camera is disposed on a head-mounted stereoscopic display, a mobile device, a computer, or a robot.
TW110114401A 2020-07-08 2021-04-21 Method and system for simultaneously tracking 6 dof poses of movable object and movable camera TWI793579B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110554564.XA CN113920189A (en) 2020-07-08 2021-05-20 Method and system for simultaneously tracking six-degree-of-freedom directions of movable object and movable camera
US17/369,669 US11506901B2 (en) 2020-07-08 2021-07-07 Method and system for simultaneously tracking 6 DoF poses of movable object and movable camera

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063049161P 2020-07-08 2020-07-08
US63/049,161 2020-07-08

Publications (2)

Publication Number Publication Date
TW202203644A true TW202203644A (en) 2022-01-16
TWI793579B TWI793579B (en) 2023-02-21

Family

ID=80787701

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110114401A TWI793579B (en) 2020-07-08 2021-04-21 Method and system for simultaneously tracking 6 dof poses of movable object and movable camera

Country Status (1)

Country Link
TW (1) TWI793579B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI826189B (en) * 2022-12-16 2023-12-11 仁寶電腦工業股份有限公司 Controller tracking system and method with six degrees of freedom

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584295B (en) * 2017-09-29 2022-08-26 阿里巴巴集团控股有限公司 Method, device and system for automatically labeling target object in image
CN111311632B (en) * 2018-12-11 2023-12-01 深圳市优必选科技有限公司 Object pose tracking method, device and equipment


Also Published As

Publication number Publication date
TWI793579B (en) 2023-02-21

Similar Documents

Publication Publication Date Title
US9438878B2 (en) Method of converting 2D video to 3D video using 3D object models
Kawai et al. Diminished reality based on image inpainting considering background geometry
JP4512584B2 (en) Panorama video providing method and apparatus with improved image matching speed and blending method
US8358873B2 (en) Hybrid system for multi-projector geometry calibration
JP2019536170A (en) Virtually extended visual simultaneous localization and mapping system and method
CN106846467B (en) Entity scene modeling method and system based on optimization of position of each camera
CN105678809A (en) Handheld automatic follow shot device and target tracking method thereof
US10867164B2 (en) Methods and apparatus for real-time interactive anamorphosis projection via face detection and tracking
JP2015521419A (en) A system for mixing or synthesizing computer generated 3D objects and video feeds from film cameras in real time
KR101891201B1 (en) Method and apparatus for acquiring depth map from all-around camera
CN102714695A (en) Image processing device, image processing method and program
CN108227920B (en) Motion closed space tracking method and system
CN103914855B (en) The localization method and device of a kind of moving target
WO2019157922A1 (en) Image processing method and device and ar apparatus
WO2023116430A1 (en) Video and city information model three-dimensional scene fusion method and system, and storage medium
US11506901B2 (en) Method and system for simultaneously tracking 6 DoF poses of movable object and movable camera
TWI820246B (en) Apparatus with disparity estimation, method and computer program product of estimating disparity from a wide angle image
TWI793579B (en) Method and system for simultaneously tracking 6 dof poses of movable object and movable camera
CN106843790B (en) Information display system and method
CN112912936A (en) Mixed reality system, program, mobile terminal device, and method
KR20010055957A (en) Image Registration Method Using 3D Tracker And Computer Vision For Augmented Reality
CN111193918B (en) Image processing system and image processing method
JP2018116421A (en) Image processing device and image processing method
CN113379842B (en) RGBD camera-based weak texture and dynamic scene vision SLAM positioning method
CN114913245A (en) Multi-calibration-block multi-camera calibration method and system based on undirected weighted graph