TW202405757A - Computing apparatus and model generation method - Google Patents

Computing apparatus and model generation method

Info

Publication number
TW202405757A
TW202405757A (application TW111140954A)
Authority
TW
Taiwan
Prior art keywords
time point
correlation
data
coordinate system
sensing
Prior art date
Application number
TW111140954A
Other languages
Chinese (zh)
Other versions
TWI822423B (en)
Inventor
杜宇威
張鈞凱
Original Assignee
杜宇威
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杜宇威
Priority to US 18/353,852 (published as US20240029350A1)
Application granted
Publication of TWI822423B
Publication of TW202405757A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00: Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/10: Navigation by using measurements of speed or acceleration
    • G01C 21/12: Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16: Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165: Inertial navigation combined with non-inertial navigation instruments
    • G01C 21/1656: Inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00: Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/20: Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Monitoring And Testing Of Nuclear Reactors (AREA)
  • Separation Using Semi-Permeable Membranes (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)
  • Feedback Control In General (AREA)

Abstract

A computing apparatus and a model generation method are provided. In the method, sensing data are fused to determine depth information of multiple sensing points; the movement trajectories of one or more pixels in the image data are tracked through a visual inertial odometry (VIO) algorithm based on the image data and inertial measurement data; and the sensing points are mapped into a coordinate system through a simultaneous localization and mapping (SLAM) algorithm based on the depth information and the movement trajectories to generate a three-dimensional environment model. An object is placed in the three-dimensional environment model through a setting operation, and shopping information of the object is provided.

Description

Computing device and model generation method

The present invention relates to a spatial modeling technology, and in particular, to a computing device and a model generation method.

To simulate a real environment, the space of the real environment can be scanned to produce a simulated environment that looks like the real one. The simulated environment can be used in applications such as games, home decoration, and robot navigation. Notably, the sensing data obtained by scanning the space may contain errors, which can distort the simulated environment.

Embodiments of the present invention provide a computing device and a model generation method that can compensate for such errors and thereby improve the fidelity of the simulated environment.

The model generation method in an embodiment of the present invention includes: fusing multiple sensing data to determine depth information of multiple sensing points, wherein the sensing data include image data and inertial measurement data. Based on the image data and the inertial measurement data, the movement trajectory of one or more pixels in the image data is tracked through a visual inertial odometry (VIO) algorithm. Based on the depth information and the movement trajectories, the sensing points are mapped to a coordinate system through a simultaneous localization and mapping (SLAM) algorithm to generate a three-dimensional environment model. Positions in the three-dimensional environment model are defined by the coordinate system.

The computing device in an embodiment of the present invention includes a memory and a processor. The memory stores program code. The processor is coupled to the memory and loads the program code to: fuse multiple sensing data to determine depth information of multiple sensing points; track the movement trajectory of one or more pixels in the image data through a visual inertial odometry algorithm based on the image data and the inertial measurement data; and map the sensing points to a coordinate system through a simultaneous localization and mapping algorithm based on the depth information and the movement trajectories to generate a three-dimensional environment model. The sensing data include image data and inertial measurement data. Positions in the three-dimensional environment model are defined by the coordinate system.

Based on the above, the computing device and model generation method of the present invention use VIO and SLAM algorithms to estimate the positions of sensing points in the environment and build a three-dimensional environment model accordingly. This improves both the accuracy of position estimation and the fidelity of the three-dimensional model.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of a model generation system 1 according to an embodiment of the present invention. Referring to FIG. 1, the model generation system 1 includes (but is not limited to) a mobile device 10 and a computing device 30.

The mobile device 10 may be a mobile phone, a tablet computer, a scanner, a robot, a wearable device, a self-propelled vehicle, or a vehicle-mounted system. The mobile device 10 includes (but is not limited to) multiple sensors 11.

The sensor 11 may be an image capture device, a LiDAR, a time-of-flight (ToF) detector, an inertial measurement unit (IMU), an accelerometer, a gyroscope, or an electronic compass. In one embodiment, the sensor 11 is used to obtain sensing data. The sensing data include image data and inertial sensing data. The image data may be one or more images and the sensed intensities of their pixels. The inertial sensing data may be attitude, three-axis acceleration, angular velocity, or displacement.

The computing device 30 may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a server, or an intelligent assistant device. The computing device 30 is communicatively connected to the mobile device 10, for example, transmitting or receiving data through Wi-Fi, Bluetooth, infrared, or other wireless transmission technologies, or through internal circuit wiring, Ethernet, an optical fiber network, Universal Serial Bus (USB), or other wired transmission technologies, possibly via an additional communication transceiver (not shown). The computing device 30 includes (but is not limited to) a memory 31 and a processor 32.

The memory 31 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or similar component. In one embodiment, the memory 31 stores program code, software modules, data (for example, sensing data or three-dimensional models), or files, the details of which are described in subsequent embodiments.

The processor 32 is coupled to the memory 31. The processor 32 may be a central processing unit (CPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), another similar component, or a combination of the above components. In one embodiment, the processor 32 executes all or part of the operations of the computing device 30 and can load and execute the program code, software modules, files, and/or data stored in the memory 31. In one embodiment, the processor 32 performs all or part of the operations of the embodiments of the present invention. In some embodiments, the software modules or program code recorded in the memory 31 may also be implemented by physical circuits.

In some embodiments, the mobile device 10 and the computing device 30 may be integrated into a single standalone device.

In the following, the method described in the embodiments of the present invention is explained in conjunction with the devices and components of the model generation system 1. Each step of the method may be adjusted according to the implementation and is not limited to the order presented.

FIG. 2 is a flow chart of a model generation method according to an embodiment of the present invention. Referring to FIG. 2, the processor 32 of the computing device 30 fuses multiple sensing data to determine depth information of multiple sensing points (step S210). Specifically, scanning the surrounding environment with the sensor 11 produces multiple sensing points. The depth information of a sensing point may be the distance between the sensor 11 and that sensing point. In one embodiment, the processor 32 may segment an image in the image data into multiple image blocks. For example, the processor 32 may identify objects in the image (for example, walls, ceilings, floors, or shelves) through image-feature comparison or a deep learning model, and segment the image into blocks along the contours of the regions where the objects are located. Next, the processor 32 may determine the depth information corresponding to the image blocks. For example, the processor 32 may extract features through a deep learning model and predict the depth information of an image block or its object based on those features. A deep learning model/algorithm analyzes training samples to learn patterns and uses those patterns to make predictions on unseen data. Generally speaking, depth information correlates with an object's size ratio and posture in the scene. The deep learning model is the machine learning model constructed by such training, and it performs inference on the data to be evaluated (for example, an image region). As another example, the processor 32 may compare an image region with feature information, stored in the memory 31, of objects located at different positions, and determine the depth information based on the position whose similarity exceeds a corresponding threshold.
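
The per-block depth assignment described above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: `segment_into_blocks` uses a fixed grid rather than object contours, and `depth_model` is a hypothetical stand-in for the deep learning predictor.

```python
import numpy as np

def segment_into_blocks(image, block_size):
    """Split an H x W image into non-overlapping square blocks.

    Returns a list of ((row, col), block) pairs; a real system would
    segment along object contours as described in the text.
    """
    h, w = image.shape
    blocks = []
    for r in range(0, h, block_size):
        for c in range(0, w, block_size):
            blocks.append(((r, c), image[r:r + block_size, c:c + block_size]))
    return blocks

def estimate_block_depths(image, block_size, depth_model):
    """Assign one depth value per block using a supplied predictor.

    `depth_model` stands in for the deep learning model in the text;
    here it is any callable mapping a block to a scalar depth.
    """
    return {pos: depth_model(block)
            for pos, block in segment_into_blocks(image, block_size)}

# Toy predictor: brighter regions are assumed closer (purely illustrative).
toy_model = lambda block: 10.0 - 9.0 * float(block.mean())

image = np.zeros((4, 4))
image[:2, :2] = 1.0   # one bright block in the corner
depths = estimate_block_depths(image, 2, toy_model)
```

A learned monocular depth network would replace `toy_model`; the surrounding block bookkeeping stays the same.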

In another embodiment, the sensor 11 is a depth sensor or a distance sensor. The processor 32 may determine the depth information of multiple sensing points in the environment based on the sensing data of the depth sensor or the distance sensor.

The processor 32 tracks the movement trajectory of one or more pixels in the image data through a visual inertial odometry (VIO) algorithm based on the image data and the inertial measurement data (step S220). Specifically, VIO is a technique that uses one or more image capture devices and one or more IMUs for state estimation. The state refers to the attitude, velocity, or other physical quantities of the carrier of the sensor 11 (for example, the mobile device 10) in specific degrees of freedom. Since an image capture device captures photons over a certain exposure time to obtain a two-dimensional (2D) image, the image data it produces during low-speed motion record rich environmental information. However, image data are easily affected by the environment and suffer from scale ambiguity. In contrast, an IMU senses its own angular velocity and acceleration; although inertial measurement data are comparatively sparse and accumulate large drift, they are not affected by the environment. In addition, inertial measurement data carry an exact metric scale, which compensates for the shortcoming of the image data. By integrating the image data and the inertial measurement data, more accurate inertial navigation can be obtained.
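
The complementarity described above can be illustrated with a minimal sketch, not part of the disclosure: the camera fixes the direction of motion (up to scale), while double-integrating the IMU's acceleration supplies the missing metric scale. All function names here are hypothetical.

```python
import numpy as np

def imu_displacement(accels, dt):
    """Dead-reckon displacement by double-integrating acceleration (metric, but drifty)."""
    v = np.zeros(3)
    p = np.zeros(3)
    for a in accels:
        v = v + np.asarray(a) * dt
        p = p + v * dt
    return p

def resolve_visual_scale(visual_dir, accels, dt):
    """Scale the camera's unit-norm translation direction by the IMU's metric estimate.

    A minimal illustration of why fusing the two sensors helps: the image
    stream fixes the direction, the IMU supplies the missing scale.
    """
    p_imu = imu_displacement(accels, dt)
    scale = float(np.dot(p_imu, visual_dir))  # project IMU motion onto the visual direction
    return scale * np.asarray(visual_dir)

accels = [[1.0, 0.0, 0.0]] * 10  # constant 1 m/s^2 along x for 1 s
motion = resolve_visual_scale([1.0, 0.0, 0.0], accels, dt=0.1)
```

Real VIO systems solve this jointly (loosely or tightly coupled, as described below) rather than with a single projection.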

FIG. 3 is a schematic diagram of inertial navigation according to an embodiment of the present invention. Referring to FIG. 3, the processor 32 may determine the position difference of an object in the image data between time point T1 and time point T2, where time point T1 is earlier than time point T2. The object occupies some of the pixels in the image. The processor 32 may identify the object, determine its position in the image, and define it as a landmark L. The processor 32 may then compare the positions of the same object captured by the image capture device 112 at the two different time points T1 and T2.

Next, the processor 32 may determine the movement trajectory from time point T1 to time point T2 based on the initial position at time point T1 and the position difference. The initial position is determined from the inertial measurement data at time point T1 (obtained through the IMU 111); for example, integrating the inertial measurements of the IMU 111 yields the initial position. The processor 32 may further convert the position of the landmark L from the sensing coordinate system to the world coordinate system WC. There are many data fusion approaches for VIO, for example, loosely coupled and tightly coupled fusion. A loosely coupled algorithm estimates the pose separately from the image data and the inertial measurement data and then fuses the two pose estimates, whereas a tightly coupled algorithm fuses the image data and the inertial measurement data directly, constructs motion and observation equations from the fused data, and performs state estimation accordingly.
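
The step of combining the T1 initial position with the landmark's pixel position difference can be sketched as below. This is an illustrative pinhole-camera approximation, not the patent's method; the function name, the focal length, and the sign convention are all assumptions.

```python
def trajectory_from_pixel_shift(p_t1, pixel_shift, depth, focal_px):
    """Estimate the T1 -> T2 position from how a landmark's pixels moved.

    Pinhole approximation: a lateral camera motion of d meters shifts a
    landmark at `depth` meters by roughly d * focal_px / depth pixels,
    so we invert that relation. `p_t1` is the initial position obtained
    from the IMU at T1.
    """
    du, dv = pixel_shift
    dx = -du * depth / focal_px  # camera moving right shifts pixels left
    dy = -dv * depth / focal_px
    return (p_t1[0] + dx, p_t1[1] + dy, p_t1[2])

# Landmark 2 m away shifted 50 px left with a 500 px focal length.
p_t2 = trajectory_from_pixel_shift((0.0, 0.0, 0.0), (-50.0, 0.0), depth=2.0, focal_px=500.0)
```

The T1-to-T2 segment of the trajectory is then the line from `p_t1` to `p_t2`; chaining such segments yields the full moving trajectory.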

Referring to FIG. 2, the processor 32 maps the sensing points to a coordinate system through a simultaneous localization and mapping (SLAM) algorithm based on the depth information and the movement trajectories to generate a three-dimensional (3D) environment model (step S230). Specifically, through coordinate transformation, the SLAM algorithm converts the depth information of sensing points observed at different times and positions into the same coordinate system, thereby producing a complete three-dimensional environment model of the environment. Positions in the three-dimensional environment model are defined by this coordinate system.
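
The coordinate transformation at the heart of this mapping step is a rigid-body transform applied per scan. The sketch below is a generic illustration (names are assumptions, not from the patent): given the pose at one time step, points in the sensor frame are mapped into the world coordinate system.

```python
import numpy as np

def sensor_to_world(points_sensor, R_ws, t_ws):
    """Map sensing points from the sensor frame into the world coordinate system.

    p_world = R_ws @ p_sensor + t_ws. Applying this with the pose at each
    time step accumulates all scans into one model, as SLAM does.
    """
    return (np.asarray(points_sensor) @ np.asarray(R_ws).T) + np.asarray(t_ws)

# Sensor yawed 90 degrees about z and sitting at (1, 0, 0) in the world.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
pts_world = sensor_to_world([[2.0, 0.0, 0.0]], R, [1.0, 0.0, 0.0])
```

A point 2 m straight ahead of the rotated sensor lands at world coordinates (1, 2, 0), which is how scans taken from different poses stitch into one consistent model.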

However, an unbiased and highly accurate three-dimensional environment model relies on unbiased movement trajectories and depth information, while the various sensors 11 usually exhibit errors of varying degrees. In addition, noise is usually present in real environments, so the SLAM algorithm must consider not only a unique mathematical solution but also the interaction with the physical quantities related to the result. Notably, at each iteration of the three-dimensional model construction, the measured distances and directions/attitudes carry a predictable series of errors. These errors are usually caused by the limited accuracy of the sensor 11 and by other noise from the environment, and they appear as errors in the points or features of the three-dimensional environment model. As time passes and motion continues, the errors in localization and map construction accumulate, degrading the accuracy of the map itself.

In one embodiment, the processor 32 may match a first correlation at a first time point and a second correlation at a second time point, where the first time point is earlier than the second time point. The first correlation is the correlation between the sensing data at the first time point and the corresponding positions in the three-dimensional environment model, and the second correlation is the correlation between the sensing data at the second time point and the corresponding positions in the three-dimensional environment model; that is, the sensing data at a specific time point and the corresponding landmarks. The SLAM framework resolves the biases of the various sensing data by solving an iterative mathematical problem, for example, forming motion equations and observation equations based on the sensing data (as the state).

The processor 32 may correct the positions of the sensing points on the coordinate system according to the matching result between the first correlation and the second correlation. To compensate for these errors, the processor 32 may match the current three-dimensional environment model with a previous three-dimensional environment model, for example, through a loop closure algorithm that recognizes revisited locations in the three-dimensional environment model, or through probabilistic SLAM algorithms such as Kalman filtering, particle filtering (a Monte Carlo method), and scan matching. Through these algorithms, the processor 32 can progressively optimize the past and present trajectory positions and depth information by comparing the current (for example, second time point) and past (for example, first time point) sensing data. Through recursive optimization, accurate estimates of each point in the environment can be obtained. As the above description shows, the algorithm of the embodiments of the present invention can form a closed loop and can therefore accumulate a complete and accurate three-dimensional environment model as the trajectory progresses. Conversely, if no closed loop is formed, the errors may continue to accumulate and amplify, eventually making earlier and later data incoherent and producing a useless three-dimensional environment model.
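
The effect of loop closure can be shown with a deliberately simplified sketch (not the patent's method): when the device revisits its starting point, the residual between the estimated end pose and the start is pure accumulated drift, which is interpolated away along the trajectory. Production SLAM systems do this with pose-graph optimization rather than linear interpolation.

```python
import numpy as np

def close_loop(trajectory):
    """Spread the end-to-start drift back along the path once a loop is detected.

    On a truly closed loop the last pose should equal the first; any
    residual is accumulated error, removed here by linear interpolation.
    """
    traj = np.asarray(trajectory, dtype=float)
    drift = traj[-1] - traj[0]                # should be zero on a closed loop
    n = len(traj) - 1
    weights = np.linspace(0.0, 1.0, n + 1)[:, None]  # 0 at start, 1 at end
    return traj - weights * drift

# A square path that drifted to (0.2, 0.1) instead of returning to the origin.
loop = [[0, 0], [1, 0], [1, 1], [0, 1], [0.2, 0.1]]
corrected = close_loop(loop)
```

After correction the endpoint coincides with the start, and intermediate poses shift proportionally, mirroring how loop closure re-anchors the whole map rather than just the last pose.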

In one embodiment, the processor 32 may minimize the errors in the positions of the sensing points on the coordinate system through an optimization algorithm based on the first correlation and the second correlation, and estimate the positions of the sensing points on the coordinate system through a filtering algorithm based on the second correlation. An optimization algorithm converts the SLAM state estimation into an error term and minimizes that error term; examples include Newton's method, the Gauss-Newton method, and the Levenberg-Marquardt method. Filtering algorithms include, for example, the Kalman filter, the extended Kalman filter, and the particle filter. The optimization algorithm can take sensing data from different time points into account, whereas the filtering algorithm models noise on the current sensing data.
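
The filtering half of this hybrid scheme can be illustrated with a single scalar Kalman update, a minimal sketch rather than the embodiment's full filter: a prior position estimate is fused with one new noisy measurement, weighted by their variances.

```python
def kalman_update(x, P, z, R):
    """One scalar Kalman update: fuse state estimate x (variance P)
    with measurement z (noise variance R)."""
    K = P / (P + R)              # Kalman gain: trust the less uncertain source more
    x_new = x + K * (z - x)      # corrected estimate
    P_new = (1.0 - K) * P        # reduced uncertainty after the update
    return x_new, P_new

# Prior: point at 5.0 m (variance 4). New range reading: 6.0 m (variance 1).
x, P = kalman_update(5.0, 4.0, 6.0, 1.0)
```

With the measurement four times more certain than the prior, the estimate moves most of the way toward it (to 5.8 m) and the variance drops from 4 to 0.8; extended and particle filters generalize this same predict-update cycle to nonlinear, multi-dimensional SLAM states.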

Different from the prior art, which uses only an optimization algorithm or only a filtering algorithm, the embodiments of the present invention combine the two. The relative weighting of the optimization algorithm and the filtering algorithm depends on the software and hardware resources of the computing device 30 and the required accuracy of the predicted positions. For example, if the available resources or the accuracy requirements are low, the filtering algorithm is weighted more heavily than the optimization algorithm; if the resources or accuracy requirements are high, the optimization algorithm is weighted more heavily than the filtering algorithm.
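
The weighting described above could, in the simplest case, be a linear blend of the two estimates. This is an illustrative stand-in only; the patent does not specify the blending formula, and `w_opt` is a hypothetical parameter.

```python
def blend_estimates(x_opt, x_filt, w_opt):
    """Blend the optimizer's and the filter's position estimates.

    `w_opt` is the optimization weight implied by available compute and
    required accuracy (closer to 1 on capable hardware, closer to 0 on
    constrained devices).
    """
    return w_opt * x_opt + (1.0 - w_opt) * x_filt

# Capable hardware: favor the (more accurate, more expensive) optimizer.
pos = blend_estimates(2.0, 2.4, w_opt=0.75)
```

On a resource-constrained device one would instead choose `w_opt` below 0.5, letting the cheaper filter dominate.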

In one embodiment, the processor 32 may receive a setting operation. The setting operation may be obtained through an input device such as a touch panel, a mouse, or a keyboard, for example, a swipe, press, or click. The processor 32 may place an object in the three-dimensional environment model according to the setting operation. Depending on the application scenario, the object may be, for example, furniture, a picture frame, or a home appliance. The processor 32 may move the object according to the setting operation and place it at a designated position in the three-dimensional environment model. The processor 32 may then provide the shopping information of the object through a display (not shown), for example, the item name, price, shipping method, and payment options. The processor 32 may also connect to a store server through a communication transceiver (not shown) to complete the shopping process.

In one application scenario, the mobile device 10 can quickly scan a space and sense all of its dimensional information, so users can directly and easily arrange furniture in the three-dimensional environment model without any manual measurement. Embodiments of the present invention may also provide a software-as-a-service (SaaS) system that lets users present or adjust arrangements with reference to the actual space, and a shopping application loaded on the computing device 30 can add products to a shopping cart for direct purchase. In addition, cloud connectivity allows users to collaborate on spatial arrangements remotely, forming a large online home-furnishing community. The rapid modeling capability of the embodiments of the present invention is not limited to furniture arrangement and can also be applied to other uses.

To sum up, in the computing device and model generation method of the present invention, the data of sensors such as the LiDAR, camera, and IMU of a mobile phone or other portable mobile device are fused to obtain depth information; the VIO algorithm then tracks the movement trajectories of different pixels in the camera images, and the depth information and movement trajectories are optimized within the SLAM algorithm framework to obtain accurate estimates of each sensing point in the environment.

Although the present invention has been disclosed above through embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the technical field may make some modifications and refinements without departing from the spirit and scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the appended claims.

1: Model generation system; 10: Mobile device; 11: Sensor; 30: Computing device; 31: Memory; 32: Processor; S210~S230: Steps; T1, T2: Time points; 111: IMU; 112: Image capture device; L: Landmark; WC: World coordinate system

FIG. 1 is a schematic diagram of a model generation system according to an embodiment of the present invention. FIG. 2 is a flow chart of a model generation method according to an embodiment of the present invention. FIG. 3 is a schematic diagram of inertial navigation according to an embodiment of the present invention.

S210~S230: Steps

Claims (10)

1. A model generation method, comprising:
fusing multiple sensing data to determine depth information of multiple sensing points, wherein the sensing data comprise image data and inertial measurement data;
tracking a movement trajectory of at least one pixel in the image data through a visual inertial odometry (VIO) algorithm based on the image data and the inertial measurement data; and
mapping the sensing points to a coordinate system through a simultaneous localization and mapping (SLAM) algorithm based on the depth information and the movement trajectory to generate a three-dimensional environment model, wherein positions in the three-dimensional environment model are defined by the coordinate system.

2. The model generation method according to claim 1, wherein the step of mapping the sensing points to the coordinate system comprises:
matching a first correlation at a first time point and a second correlation at a second time point, wherein the first time point is earlier than the second time point, the first correlation is the correlation between the sensing data at the first time point and corresponding positions in the three-dimensional environment model, and the second correlation is the correlation between the sensing data at the second time point and corresponding positions in the three-dimensional environment model; and
correcting the positions of the sensing points on the coordinate system according to a matching result between the first correlation and the second correlation.
3. The model generation method as claimed in claim 2, wherein the step of correcting the positions of the sensing points on the coordinate system according to the matching result between the first correlation and the second correlation comprises:
minimizing an error of the positions of the sensing points on the coordinate system through an optimization algorithm according to the first correlation and the second correlation; and
estimating the positions of the sensing points on the coordinate system through a filtering algorithm according to the second correlation, wherein the relative weighting of the optimization algorithm and the filtering algorithm is related to the resources of the computing device and the accuracy of the predicted positions.

4. The model generation method as claimed in claim 1, wherein the step of fusing the sensing data comprises:
dividing the image data into a plurality of image blocks; and
determining the depth information corresponding to the image blocks;
and wherein the step of tracking the movement trajectory of the at least one pixel in the image data comprises:
determining a position difference of an object in the image data between a third time point and a fourth time point, wherein the third time point is earlier than the fourth time point; and
determining the movement trajectory from the third time point to the fourth time point according to an initial position at the third time point and the position difference, wherein the initial position is determined according to the inertial measurement data at the third time point.
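Claim 3 weights an optimization back-end against a filtering front-end according to the device's resources. A minimal sketch of such a blend — the linear weighting and the `resource_budget` parameter are hypothetical illustrations, not the claimed scheme:

```python
import numpy as np

def blend_estimates(optimized_pos, filtered_pos, resource_budget):
    # Blend a batch-optimized position estimate with a filter estimate.
    # resource_budget in [0, 1]: more spare compute -> lean on the
    # (more accurate but costlier) optimization result.
    w = float(np.clip(resource_budget, 0.0, 1.0))
    return w * np.asarray(optimized_pos) + (1.0 - w) * np.asarray(filtered_pos)
```

A constrained device would pass a small `resource_budget`, effectively falling back to the cheap filtered estimate.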
5. The model generation method as claimed in claim 1, further comprising:
receiving a setting operation;
setting an object in the three-dimensional environment model according to the setting operation; and
providing shopping information of the object.

6. A computing apparatus, comprising:
a memory, configured to store a program code; and
a processor, coupled to the memory and configured to load and execute the program code to:
fuse a plurality of sensing data to determine depth information of a plurality of sensing points, wherein the sensing data comprise image data and inertial measurement data;
track a movement trajectory of at least one pixel in the image data through a visual inertial odometry algorithm according to the image data and the inertial measurement data; and
map the sensing points to a coordinate system through simultaneous localization and mapping according to the depth information and the movement trajectory, so as to generate a three-dimensional environment model, wherein positions in the three-dimensional environment model are defined by the coordinate system.
7. The computing apparatus as claimed in claim 6, wherein the processor is further configured to:
match a first correlation at a first time point with a second correlation at a second time point, wherein the first time point is earlier than the second time point, the first correlation is the correlation between the sensing data at the first time point and the corresponding depth information and the corresponding movement trajectory, and the second correlation is the correlation between the sensing data at the second time point and the corresponding depth information and the corresponding movement trajectory; and
correct the positions of the sensing points on the coordinate system according to a matching result between the first correlation and the second correlation.

8. The computing apparatus as claimed in claim 7, wherein the processor is further configured to:
minimize an error of the positions of the sensing points on the coordinate system through an optimization algorithm according to the first correlation and the second correlation; and
estimate the positions of the sensing points on the coordinate system through a filtering algorithm according to the second correlation, wherein the relative weighting of the optimization algorithm and the filtering algorithm is related to the resources of the computing apparatus and the accuracy of the predicted positions.
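The correlation matching of claims 2 and 7 resembles a loop-closure-style drift correction: associations observed at an earlier and a later time point are compared, and the sensing points are shifted accordingly. A toy sketch, assuming a pure-translation drift model (a deliberate simplification; real SLAM corrections estimate a full rigid transform):

```python
import numpy as np

def correct_points(points, assoc_t1, assoc_t2):
    # Estimate accumulated drift as the mean offset between matched
    # associations at two time points, then subtract it from every
    # sensed point on the coordinate system.
    drift = np.mean(np.asarray(assoc_t2, dtype=float)
                    - np.asarray(assoc_t1, dtype=float), axis=0)
    return np.asarray(points, dtype=float) - drift
```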
9. The computing apparatus as claimed in claim 6, wherein the processor is further configured to:
divide the image data into a plurality of image blocks;
determine the depth information corresponding to the image blocks;
determine a position difference of an object in the image data between a third time point and a fourth time point, wherein the third time point is earlier than the fourth time point; and
determine the movement trajectory from the third time point to the fourth time point according to an initial position at the third time point and the position difference, wherein the initial position is determined according to the inertial measurement data at the third time point.

10. The computing apparatus as claimed in claim 6, wherein the processor is further configured to:
receive a setting operation;
set an object in the three-dimensional environment model according to the setting operation; and
provide shopping information of the object.
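Claims 4 and 9 recover the later position as an IMU-seeded initial position plus an image-measured displacement. The endpoint computation reduces to a one-line dead-reckoning step; the function name is a hypothetical illustration:

```python
def trajectory_between(t3_initial_pos, position_difference):
    # Position at the fourth time point = initial position at the third
    # time point (seeded from the inertial measurement data) plus the
    # position difference measured from the image data.
    x0, y0, z0 = t3_initial_pos
    dx, dy, dz = position_difference
    return (x0 + dx, y0 + dy, z0 + dz)
```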
TW111140954A 2022-07-22 2022-10-27 Computing apparatus and model generation method TWI822423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/353,852 US20240029350A1 (en) 2022-07-22 2023-07-17 Computing apparatus and model generation method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263391333P 2022-07-22 2022-07-22
US63/391,333 2022-07-22

Publications (2)

Publication Number Publication Date
TWI822423B TWI822423B (en) 2023-11-11
TW202405757A true TW202405757A (en) 2024-02-01

Family

ID=86689530

Family Applications (2)

Application Number Title Priority Date Filing Date
TW111140954A TWI822423B (en) 2022-07-22 2022-10-27 Computing apparatus and model generation method
TW111211774U TWM637241U (en) 2022-07-22 2022-10-27 Computing apparatus and model generation system

Family Applications After (1)

Application Number Title Priority Date Filing Date
TW111211774U TWM637241U (en) 2022-07-22 2022-10-27 Computing apparatus and model generation system

Country Status (2)

Country Link
CN (1) CN117437348A (en)
TW (2) TWI822423B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI822423B (en) * 2022-07-22 2023-11-11 杜宇威 Computing apparatus and model generation method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109556611B (en) * 2018-11-30 2020-11-10 广州高新兴机器人有限公司 Fusion positioning method based on graph optimization and particle filtering
US20220137223A1 (en) * 2020-10-30 2022-05-05 Faro Technologies, Inc. Simultaneous localization and mapping algorithms using three-dimensional registration
TWI768776B (en) * 2021-03-19 2022-06-21 國立臺灣大學 Indoor positioning system and indoor positioning method
CN114608554B (en) * 2022-02-22 2024-05-03 北京理工大学 Handheld SLAM equipment and robot instant positioning and mapping method
TWI822423B (en) * 2022-07-22 2023-11-11 杜宇威 Computing apparatus and model generation method

Also Published As

Publication number Publication date
TWI822423B (en) 2023-11-11
CN117437348A (en) 2024-01-23
TWM637241U (en) 2023-02-01

Similar Documents

Publication Publication Date Title
US11295472B2 (en) Positioning method, positioning apparatus, positioning system, storage medium, and method for constructing offline map database
US11127203B2 (en) Leveraging crowdsourced data for localization and mapping within an environment
CN111156998B (en) Mobile robot positioning method based on RGB-D camera and IMU information fusion
CN102622762B (en) Real-time camera tracking using depth maps
US8976172B2 (en) Three-dimensional scanning using existing sensors on portable electronic devices
Panahandeh et al. Vision-aided inertial navigation based on ground plane feature detection
JP7236565B2 (en) POSITION AND ATTITUDE DETERMINATION METHOD, APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM AND COMPUTER PROGRAM
US10157478B2 (en) Enabling use of three-dimensional locations of features with two-dimensional images
JP6434513B2 (en) Inertial navigation based on vision
Sola et al. Fusing monocular information in multicamera SLAM
US20210183100A1 (en) Data processing method and apparatus
WO2022193508A1 (en) Method and apparatus for posture optimization, electronic device, computer-readable storage medium, computer program, and program product
Liu et al. Enabling context-aware indoor augmented reality via smartphone sensing and vision tracking
CN104848861A (en) Image vanishing point recognition technology based mobile equipment attitude measurement method
WO2022247548A1 (en) Positioning method, apparatus, electronic device, and storage medium
TW202238449A (en) Indoor positioning system and indoor positioning method
TWI822423B (en) Computing apparatus and model generation method
Tang et al. LE-VINS: A robust solid-state-LiDAR-enhanced visual-inertial navigation system for low-speed robots
CN113610702B (en) Picture construction method and device, electronic equipment and storage medium
Irmisch et al. Simulation framework for a visual-inertial navigation system
CN112513941A (en) System and method for generating a set of approximately coordinated regional maps
US20240029350A1 (en) Computing apparatus and model generation method
JP2023503750A (en) ROBOT POSITIONING METHOD AND DEVICE, DEVICE, STORAGE MEDIUM
JPWO2021111613A1 (en) 3D map creation device, 3D map creation method, and 3D map creation program
KR102618069B1 (en) Method and apparatus for analyasing indoor building disaster information using point cloud data and visual information from ground survey robot