TWI798999B - Device and method for building three-dimensional video - Google Patents

Device and method for building three-dimensional video

Info

Publication number
TWI798999B
TWI798999B (application TW110146850A)
Authority
TW
Taiwan
Prior art keywords
data
image capture
image
point cloud
model
Prior art date
Application number
TW110146850A
Other languages
Chinese (zh)
Other versions
TW202327347A (en)
Inventor
田永平
徐偉軒
吳家齊
廖燕鈴
Original Assignee
財團法人工業技術研究院
Priority date
Filing date
Publication date
Application filed by 財團法人工業技術研究院 (Industrial Technology Research Institute)
Priority to TW110146850A, patent TWI798999B
Priority to CN202111558062.0A, patent CN116263977A
Application granted
Publication of TWI798999B
Publication of TW202327347A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 — Animation
    • G06T13/20 — 3D [Three Dimensional] animation
    • G06T13/40 — 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Ultra Sonic Diagnosis Equipment (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Generation (AREA)

Abstract

A device and a method for building a three-dimensional video are provided. The device includes a plurality of image capture devices and a processing device. Each image capture device generates an image capture signal. The image capture signal includes image data, depth data, and time stamps associated with the image data. In an image capture stage, for each of the time stamps, the processing device computes point cloud data and texture data based on the image data and the depth data of the image capture signals, and creates a three-dimensional model based on the point cloud data and the texture data. The processing device integrates the time stamps and the three-dimensional models corresponding to the time stamps into a video stream.

Description

Device and method for building three-dimensional video

The invention relates to a device and method for building three-dimensional video.

In application scenarios of augmented reality (AR) or virtual reality (VR) technology, 3D video or dynamic 3D models must be played back so that users can become more immersed in the virtual scene and interact with virtual objects more easily.

However, generating dynamic 3D video (an "animation") first requires building a static 3D model, which is time-consuming and labor-intensive. Furthermore, converting a static 3D model into a dynamic one requires separately building the model's skeleton and its surface texture maps, demanding substantial design manpower before animation production can even begin. Reducing the difficulty of 3D model creation, so that users can build 3D models more easily, would therefore make it considerably more convenient to develop AR and VR applications.

The present invention provides a device and method for building three-dimensional video. A 3D video containing a 3D model is generated from real-scene images; besides recording the motion of the 3D model along the time axis, this also saves the time needed to build the 3D model.

The invention proposes a device for building three-dimensional video. The device includes a plurality of image capture devices and a processing device. Each image capture device generates an image capture signal that includes image data, depth data, and time stamps, and the image capture signals generated by the image capture devices are aligned on the same timeline through these time stamps. The processing device receives the image capture signals of the respective image capture devices. In the image capture stage, the image capture devices capture the image capture signals and the processing device receives them. For each time stamp, the processing device computes point cloud data and texture data from the image data and depth data in the image capture signals, and builds a three-dimensional model from the point cloud data and texture data. The processing device then integrates the time stamps and the three-dimensional models corresponding to them into a video stream.

The invention also proposes a method for building three-dimensional video. The method includes: configuring a plurality of image capture devices, each generating an image capture signal that includes image data, depth data, and a time stamp, the image capture signals being aligned on the same timeline through the time stamps; in the image capture stage, obtaining the image capture signals captured by the plurality of image capture devices; for each time stamp, computing point cloud data and texture data from the image data and depth data in the image capture signals, and building a three-dimensional model from the point cloud data and texture data; and integrating the time stamps and the corresponding three-dimensional models into a video stream.
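The claimed method can be summarized as a small data-flow sketch. The structure below is a hypothetical illustration, not taken from the patent (names such as `CaptureSignal` and `group_by_timestamp` are invented): each device emits frames tagged with a shared time stamp, and the processor groups frames by time stamp so that one 3D model can be built per stamp.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class CaptureSignal:
    """One frame from one image capture device (hypothetical structure)."""
    device_id: int
    timestamp: float  # position on the shared timeline, in seconds
    image: list       # RGB image data (placeholder)
    depth: list       # per-pixel depth data (placeholder)

def group_by_timestamp(signals):
    """Collect, for every time stamp, the frames of all devices that
    share it; each group later yields one 3D model."""
    groups = defaultdict(list)
    for s in signals:
        groups[s.timestamp].append(s)
    return dict(groups)

# three devices, two time stamps on the shared timeline
signals = [CaptureSignal(d, t, [], []) for t in (0.0, 0.1) for d in range(3)]
frames = group_by_timestamp(signals)
```

One model per key of `frames` is exactly the "one 3D model per time stamp" granularity the claims describe.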

Based on the above, the present invention provides a device and method for building three-dimensional video. Real-scene images are captured by multiple image capture devices, and vector data such as point clouds and texture maps are estimated from those images, so that a 3D model is built for every point on the time axis and a 3D video containing the 3D model is produced. Because the 3D model, and the 3D video containing it, are created directly from captured real-scene images, the motion of the 3D model along the time axis is recorded and model-building time is saved. Furthermore, since the 3D video carries the model's dynamic motion, skeleton information corresponding to the model can be estimated or generated more easily, so both the 3D model and its skeleton can be obtained from real-scene images alone, lowering the cost of building 3D models.

FIG. 1 is a block diagram of a device 100 for building three-dimensional video according to an embodiment of the invention. The device 100 mainly includes a plurality of image capture devices 110-1~110-n (n is a positive integer greater than or equal to 2) and a processing device 120. Each image capture device 110-1~110-n generates an image capture signal S110-1~S110-n. In this embodiment the image capture devices are depth cameras. The processing device 120 receives the image capture signals S110-1~S110-n of the respective image capture devices 110-1~110-n and processes them to produce a video stream 190 of 3D models carrying a time axis.

The image capture devices 110-1~110-n include time synchronization modules 112-1~112-n and transceiver modules 114-1~114-n. The time synchronization modules 112-1~112-n generate the time stamps; from the time stamps in the image capture signals S110-1~S110-n, the processing device 120 knows when each piece of image data and depth data was captured. The transceiver modules 114-1~114-n send the image capture signals S110-1~S110-n generated by the image capture devices 110-1~110-n to the processing device 120 over a network, a bus, or a corresponding communication protocol.

The processing device 120 may be a console controlling the aforementioned image capture devices 110-1~110-n, or a server in the cloud. In this embodiment the processing device 120 may include a communication module 122, an image processing module 124, and a central processing unit 126. The communication module 122 communicates with the transceiver modules 114-1~114-n and receives the image capture signals S110-1~S110-n. The image processing module 124 and the central processing unit 126 process those signals. Practitioners may implement the aforementioned modules (for example and without limitation, the time synchronization modules 112-1~112-n, the transceiver modules 114-1~114-n, the communication module 122, and the image processing module 124) as needed, for instance as a central processing unit, a graphics processing unit, an integrated circuit, or a firmware device providing the corresponding functions.

Besides image data and the corresponding depth data, the image capture signals S110-1~S110-n of this embodiment include time stamps. A time stamp records the point in time at which its image data was captured. Through the corresponding time stamps, the image capture signals S110-1~S110-n generated by the image capture devices 110-1~110-n are aligned on the same timeline.

In detail, the image capture devices 110-1~110-n include time synchronization modules 112-1~112-n and transceiver modules 114-1~114-n. The time synchronization modules 112-1~112-n generate the time stamps, from which the processing device 120 knows when each piece of image data and depth data was captured. For example, during the initialization stage the processing device 120 controls each image capture device 110-1~110-n to synchronize its time synchronization module 112-1~112-n, that is, to make the time stamps produced by these modules mutually consistent. The transceiver modules 114-1~114-n of this embodiment send the image capture signals S110-1~S110-n generated by the image capture devices 110-1~110-n to the processing device 120 over a network, a bus, or a corresponding communication protocol.

FIG. 2 is a flowchart of a method 200 for building three-dimensional video according to an embodiment of the invention. The method 200 of FIG. 2 is implemented with the device 100 of FIG. 1, and each step of the method 200 is controlled and executed by the processing device 120. The method 200 includes steps S205 to S230. In step S205, during the initialization stage of the device 100 of FIG. 1, the processing device 120 obtains the image capture signals S110-1~S110-n of the image capture devices 110-1~110-n to acquire initialization background data and the corresponding initialization depth data. In this way, before the object is placed in the scene, the processing device 120 captures the scene's background information for use in the subsequent image capture stage (steps S210 to S230). The initialization stage of this embodiment may further include synchronizing the time synchronization modules 112-1~112-n of FIG. 1 and calibrating the relative positions of the image capture devices 110-1~110-n, described in detail in later embodiments.
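The initialization background data captured in step S205 can later be used to separate the foreground object from the empty scene. A minimal sketch, under assumptions not stated in the patent (the tolerance value and the name `foreground_mask` are invented): pixels whose current depth matches the initialization depth within a tolerance are treated as background.

```python
def foreground_mask(depth, background_depth, tol=0.05):
    """Mark pixels whose depth differs from the initialization
    (empty-scene) depth by more than `tol` metres as foreground."""
    return [abs(d - b) > tol for d, b in zip(depth, background_depth)]

bg = [2.0, 2.0, 2.0, 2.0]    # initialization depth of the empty scene
cur = [2.0, 1.2, 1.1, 2.01]  # depth after the object is placed
mask = foreground_mask(cur, bg)
```

Only the masked-in pixels would then contribute to the object's point cloud.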

In step S210 of the image capture stage, the image capture devices 110-1~110-n capture the image capture signals S110-1~S110-n, and the processing device 120 of FIG. 1 receives these signals from the image capture devices 110-1~110-n.

In step S220 of the image capture stage, for each time stamp, the processing device 120 of FIG. 1 computes point cloud data and texture data from the image data and depth data corresponding to that time stamp in the image capture signals S110-1~S110-n, and builds a 3D model from the point cloud data and texture data. In step S230 of the image capture stage, the processing device 120 of FIG. 1 integrates the time stamps and the 3D models corresponding to them into a video stream. The embodiment thus produces a 3D video containing a 3D model directly from real-scene images, which records the model's motion along the time axis while saving model-building time.
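Computing point cloud data from image-plus-depth data, as in step S220, is conventionally a back-projection of each depth pixel through a pinhole camera model. This is a generic sketch, not the patent's specific algorithm, and the intrinsics `fx`, `fy`, `cx`, `cy` are toy values for illustration:

```python
def depth_to_points(depth, width, fx, fy, cx, cy):
    """Back-project a row-major depth map into camera-space 3D points:
    x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth."""
    points = []
    for i, z in enumerate(depth):
        if z <= 0:  # skip invalid depth readings
            continue
        u, v = i % width, i // width
        points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# 2x2 depth map with one invalid pixel, toy intrinsics
pts = depth_to_points([1.0, 1.0, 0.0, 2.0], width=2,
                      fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each device contributes such a partial cloud; texture data would be sampled from the corresponding image pixels.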

FIG. 3 is a schematic diagram of an arrangement of multiple image capture devices 110-1~110-8 according to an embodiment of the invention. FIG. 4 and FIG. 5 are schematic diagrams of the image data captured by the image capture devices of FIG. 3. The image capture devices of this embodiment capture images of an object 310 (shown as a human body) from multiple angles (image capture devices 110-1~110-5 in part (A) of FIG. 3) and/or multiple heights (image capture devices 110-6~110-8 in part (B) of FIG. 3), thereby generating the image capture signals S110-1~S110-8 of FIG. 1. The XY plane in part (A) of FIG. 3 shows the image capture devices 110-1~110-5 imaging the object 310 from different angles. The XZ plane in part (B) of FIG. 3 shows the image capture devices 110-6~110-8 imaging the object 310 from different heights; for example, image capture device 110-6 is mounted higher than 110-7, and 110-7 higher than 110-8. Practitioners may place multiple image capture devices 110-n at different angles relative to the object 310, at different heights, or at both different angles and heights, as needed. If multiple devices are placed at different angles and heights simultaneously (realizing parts (A) and (B) of FIG. 3 at once), five image capture devices would be placed at each height, for a total of fifteen image capture devices.

In detail, the image capture devices 110-1~110-5 shown in part (A) of FIG. 3 are placed evenly around the object 310 (the human body) in five equal divisions, with an angular difference of 72 degrees between adjacent devices. Image data captured at the same instant by adjacent devices therefore overlaps slightly or is similar. FIG. 4 is a schematic diagram of the image data 430-1, 430-2, 430-5 of the adjacent image capture devices 110-1, 110-2, 110-5 of part (A) of FIG. 3, and of stitching these image data together. Taking image capture device 110-1 as an example: the devices 110-2 and 110-5 adjacent to it are used to produce the viewing-angle image data for the viewpoint of device 110-1. Part of the image data 430-1 captured by device 110-1 overlaps with the image data captured by devices 110-2 and 110-5, so the image processing module 124 of the processing device 120 of FIG. 1 can use an image stitching algorithm to integrate the partially overlapping image data 430-1, 430-2, 430-5 by horizontal concatenation into the viewing-angle image data 440-1 for the viewpoint of image capture device 110-1. The image stitching algorithm crops and assembles multiple images from different viewpoints to stitch them into a wide-field-of-view image; see patent publication TWI672670. Although the embodiment of FIG. 4 shows the image data of devices 110-1, 110-2, 110-5 being integrated into the viewing-angle image data 440-1 of device 110-1, when image data captured by other devices (e.g., devices 110-3, 110-4) also overlaps that of device 110-1, the processing device 120 of FIG. 1 can likewise apply the stitching algorithm to integrate the image data of devices 110-1~110-5 into viewing-angle image data for the viewpoint of device 110-1. By the same operations, viewing-angle image data can be produced for the viewpoint of each of the image capture devices 110-1~110-5.
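In its very simplest form, the horizontal stitching described above amounts to locating the overlap between two adjacent views and keeping it only once. The toy sketch below works on 1-D pixel rows and is only an illustration of the overlap idea; the actual algorithm of TWI672670 operates on full images with cropping and warping.

```python
def stitch_rows(left, right):
    """Concatenate two overlapping pixel rows, keeping the overlap once.
    Finds the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + right  # no overlap found: plain concatenation

# adjacent views sharing the two pixels [3, 4]
row = stitch_rows([1, 2, 3, 4], [3, 4, 5, 6])
```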

On the other hand, the image capture devices 110-6~110-8 shown in part (B) of FIG. 3 are placed near the object 310 at different heights, so image data captured at the same instant by devices at adjacent heights overlaps slightly or is similar. FIG. 5 is a schematic diagram of the image data 530-6, 530-7, 530-8 of the adjacent image capture devices 110-6, 110-7, 110-8 of part (B) of FIG. 3, and of stitching these image data together. Because the devices 110-6~110-8 all share the same viewing angle on the XY plane, part of the image data 530-7 captured by device 110-7 overlaps with the image data 530-6 and 530-8 captured by devices 110-6 and 110-8. The image processing module 124 of the processing device 120 of FIG. 1 can therefore use the image stitching algorithm (see publication TWI672670) to integrate the partially overlapping image data 530-6, 530-7, 530-8 by vertical concatenation into the viewing-angle image data 540-2 for the viewpoints of image capture devices 110-6~110-8.

The processing device 120 of FIG. 1 in this embodiment can generate the partial point cloud data needed for the 3D model from the image data and corresponding depth data captured by each of the image capture devices 110-1~110-8 of FIG. 3, and join these point clouds into a complete 3D model describing the object 310. Alternatively, the processing device 120 can generate the partial point cloud data from the viewing-angle image data of devices 110-1~110-5 (e.g., viewing-angle image data 440-1 of FIG. 4 and 540-2 of FIG. 5) together with the corresponding depth data, and join these into a complete 3D model describing the object 310. Alternatively again, the processing device 120 can integrate the viewing-angle image data of devices 110-1~110-5 into a panoramic image around the object 310, and use this panoramic image together with the corresponding depth data to generate the point cloud data for a complete 3D model of the object 310. In other words, practitioners may, as needed, compute or estimate partial point cloud data describing the object 310 from the image data and depth data captured by devices 110-1~110-8, and integrate those point clouds into the 3D model of the object 310. Besides computing point cloud data, the processing device 120 of this embodiment also computes the surface texture of the object 310 from the captured image data and depth data to produce the corresponding texture data, so as to build a 3D model carrying both point cloud data and texture data.
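Joining per-device partial point clouds into one model requires expressing them in a common coordinate frame, which is what the relative-position calibration provides. A minimal rigid-transform sketch (rotation about the vertical Z axis plus a translation; the pose values below are hypothetical calibration results, not from the patent):

```python
import math

def transform(points, yaw, tx, ty, tz):
    """Rotate points about the Z axis by `yaw` radians, then translate,
    mapping one camera's point cloud into the shared world frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]

def merge_clouds(clouds_with_poses):
    """Concatenate partial clouds after mapping each into the world frame."""
    merged = []
    for points, pose in clouds_with_poses:
        merged.extend(transform(points, *pose))
    return merged

cloud_a = [(1.0, 0.0, 0.0)]
cloud_b = [(1.0, 0.0, 0.0)]
# device B views from the opposite side: rotated 180 degrees and shifted
world = merge_clouds([(cloud_a, (0.0, 0.0, 0.0, 0.0)),
                      (cloud_b, (math.pi, 2.0, 0.0, 0.0))])
```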

FIG. 6 is a schematic diagram of a 3D model of an object 610, for the arrangement of image capture devices 110-1~110-5 of part (A) of FIG. 3, according to an embodiment of the invention. Part (A) of FIG. 6 corresponds to the relative arrangement of devices 110-1~110-5 in part (A) of FIG. 3. Parts (B) and (C) of FIG. 6 show, as examples, the image data 630-1 and 630-5 captured by devices 110-1 and 110-5. Part (D) of FIG. 6 shows the 3D model obtained after the processing device 120 of FIG. 1 has computed and integrated the point cloud data and texture data.

FIG. 7 is a schematic diagram of matching point cloud data to a human skeleton according to an embodiment of the invention. Once the processing device 120 of FIG. 1 has computed the 3D model integrated from the point cloud data 710, it can generate skeleton data 720 corresponding to the 3D model from the point cloud data 710. For the skeleton matching technique, see "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 43, Issue 1, Jan. 2021. This technique can locate the human skeleton from the point cloud data 710, and because the processing device 120 of FIG. 1 produces a video stream of 3D models carrying a time axis, the stream contains the model's dynamic motion, which makes estimating or generating the skeleton information 720 corresponding to the 3D model easier and the resulting skeleton information 720 more accurate.

FIG. 8 is a detailed flowchart of a method 800 for building three-dimensional video according to an embodiment of the invention. The method 800 of FIG. 8 is implemented by the device 100 of FIG. 1 and details each step of the method 200 of FIG. 2. Referring to FIG. 8 and FIG. 2, step S201 of FIG. 8 is the initialization stage, which includes steps S802, S804, and S806.

In step S802, the processing device 120 of FIG. 1 determines and corrects the relative positions of the image capture devices 110-1~110-n from the image capture signals S110-1~S110-n they capture. In one embodiment, the relative positions of the devices 110-1~110-n may be adjusted manually or by electronic actuation; for example, the devices may be mounted on adjustable bases so that the processing device 120 of FIG. 1 can automatically adjust their viewing angles and relative positions.

In step S804, similar to step S205 of FIG. 2, the processing device 120 of FIG. 1 obtains the image capture signals S110-1~S110-n of the image capture devices 110-1~110-n to acquire initialization background data and the corresponding initialization depth data. The processing device 120 of FIG. 1 thereby captures the scene's background information before the object is placed, for use in the subsequent image capture stage.

In step S806, the processing device 120 of FIG. 1 controls the image capture devices 110-1~110-n to synchronize the time synchronization modules 112-1~112-n, so that the time stamps these modules produce are mutually consistent. The time synchronization modules 112-1~112-n generate the time stamps in the image capture signals S110-1~S110-n.
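The patent does not specify how step S806's synchronization is performed; one common approach is to estimate each device clock's offset from a host reference and correct subsequent stamps. The NTP-style sketch below assumes symmetric network delay, and all names and timing values are illustrative:

```python
def clock_offset(t_send, t_device, t_recv):
    """Estimate a device clock's offset from the host clock, assuming
    the request and reply spend equal time on the wire (NTP-style)."""
    return t_device - (t_send + t_recv) / 2.0

def correct(timestamp, offset):
    """Map a device-local time stamp onto the shared timeline."""
    return timestamp - offset

# host sends at 10.0 s, device replies "my clock reads 12.1 s",
# host receives the reply at 10.2 s
off = clock_offset(10.0, 12.1, 10.2)
```

After correction, stamps from all devices refer to the same timeline, which is what lets frames be grouped per time stamp.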

After the initialization phase (step S201) is completed, the device 100 for building a three-dimensional video of FIG. 1 enters the image capture phase. In step S210, during the image capture phase, the processing device 120 of FIG. 1 controls the image capture devices 110-1~110-n to capture the image capture signals S110-1~S110-n and receives these signals from the image capture devices 110-1~110-n. Step S220 of FIG. 2 comprises sub-steps S822, S824, S826, and S828. In step S822, the processing device 120 stitches the image data in the image capture signals S110-1~S110-n to generate view-angle image data corresponding to the viewing angles of the image capture devices 110-1~110-n, as described in the embodiments of FIGS. 3 to 5. In step S824, the processing device 120 concatenates the view-angle image data into a surround-view image, as described in the embodiments of FIGS. 3 to 5. In step S826, the processing device 120 of FIG. 1 generates point cloud data and texture data according to the view-angle image data and the surround-view image. In this embodiment, the point cloud data may be generated first from the view-angle image data and the surround-view image, and the texture data may then be generated from the point cloud data. Alternatively, the texture data may be generated by performing triangular meshing on the view-angle image data. In step S828, the processing device 120 of FIG. 1 creates the three-dimensional model corresponding to each time stamp according to the point cloud data and the texture data.
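One ingredient of step S826 — deriving point cloud data from image data and depth data — can be sketched with a pinhole back-projection. The intrinsic parameters (fx, fy, cx, cy) and the flat point-list format are assumptions for illustration; the patent does not specify a camera model.

```python
# Hedged sketch: turning a depth map into point cloud data with a
# pinhole camera model. Each valid depth pixel (u, v, z) is
# back-projected into camera coordinates (x, y, z).

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: 2D list of depth values; fx, fy: focal lengths in pixels;
    cx, cy: principal point. Returns a list of (x, y, z) points,
    skipping pixels with no depth measurement (z <= 0)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:          # no depth measured at this pixel
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

In a full pipeline, the per-device clouds produced this way would then be transformed by the calibrated relative positions (the initialization phase) and merged before meshing and texturing.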

After step S220 is completed, the three-dimensional model corresponding to each time stamp has been created. Therefore, in step S230, the processing device 120 of FIG. 1 integrates these time stamps and their corresponding 3D models into a video stream. Furthermore, in step S840, the processing device 120 of FIG. 1 may also generate skeleton data corresponding to the 3D model according to the point cloud data and the texture data, as described in the embodiment of FIG. 7.
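The integration of step S230 can be sketched as a time-stamp-sorted container of per-frame records, with the optional skeleton data of step S840 carried alongside each model. The record layout is an assumption for illustration, not a format defined by the patent.

```python
# Hypothetical sketch of step S230: integrating time stamps and their
# corresponding 3D models (plus optional skeleton data) into one stream.

def build_stream(models_by_timestamp, skeletons_by_timestamp=None):
    """models_by_timestamp: time stamp -> 3D model.
    skeletons_by_timestamp: optional time stamp -> skeleton data.
    Returns a list of frame records sorted by time stamp."""
    skeletons_by_timestamp = skeletons_by_timestamp or {}
    stream = []
    for ts in sorted(models_by_timestamp):
        stream.append({
            "timestamp": ts,
            "model": models_by_timestamp[ts],
            "skeleton": skeletons_by_timestamp.get(ts),  # None if absent
        })
    return stream
```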

Accordingly, in this embodiment of the invention, a 3D model corresponding to the first time stamp is generated at the first time stamp, and a 3D model corresponding to the next time stamp (that is, the time stamp following the first one on the time axis, which may occur after a specific time interval such as one second or a fraction of a second) is generated at that next time stamp, so that the 3D model corresponding to each time stamp is completed step by step. Since a corresponding 3D model is generated at every time stamp, a dynamic 3D model exists along the time axis, realizing the function of recording a 3D animation.
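The per-time-stamp loop described above requires grouping frames from all devices by their (synchronized) time stamps before one model is built per group. A minimal sketch, assuming a small matching tolerance for residual clock jitter (the tolerance value and data layout are illustrative assumptions):

```python
# Hedged sketch: group frames from multiple devices by time stamp so
# that one 3D model can be built per group. Stamps closer than
# `tolerance` are treated as the same capture instant.

def group_frames(frames, tolerance=0.02):
    """frames: list of (timestamp, device_id, payload) tuples, assumed
    roughly synchronized. Returns time stamp -> list of
    (device_id, payload) pairs."""
    groups = {}
    for ts, dev, payload in sorted(frames):
        # Reuse an existing group key if this stamp is close enough.
        key = next((k for k in groups if abs(k - ts) < tolerance), ts)
        groups.setdefault(key, []).append((dev, payload))
    return groups
```

Each resulting group would then feed the stitching, point cloud, and meshing sub-steps to produce that instant's 3D model.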

In summary, the embodiments of the invention capture real-scene images with a plurality of image capture devices and estimate vector data such as point clouds and textures from those images, thereby building a 3D model for each time point on the time axis and producing a 3D video containing the 3D models. Because the 3D models and the 3D video containing them are created directly from captured real-scene images, the motion of the 3D model along the time axis can be recorded and the time required to build the 3D model is reduced. Furthermore, since the 3D video contains the dynamic motion of the 3D model, it is easier to estimate or generate the skeleton information corresponding to the model, so that both the 3D model and its skeleton can be built from real-scene images alone, lowering the cost of building the 3D model.

100: device for building a three-dimensional video
110-1~110-n: image capture devices
112-1~112-n: time synchronization modules
114-1~114-n: transceiver modules
120: processing device
122: communication module
124: image processing module
126: central processing unit
190: video stream
310: object
430-1~430-5, 530-6~530-8, 630-1, 630-5: image data
440-1, 540-2: view-angle image data
710: point cloud data
720: skeleton data
S110-1~S110-n: image capture signals
S201~S230, S802~S840: steps of the method for building a three-dimensional video

FIG. 1 is a block diagram of a device for building a three-dimensional video according to an embodiment of the invention.
FIG. 2 is a flowchart of a method for building a three-dimensional video according to an embodiment of the invention.
FIG. 3 is a schematic diagram of the arrangement of multiple image capture devices according to an embodiment of the invention.
FIG. 4 and FIG. 5 are schematic diagrams of the image data captured by the image capture devices of FIG. 3.
FIG. 6 is a schematic diagram of the 3D model of an object presented for the position configuration of the image capture devices in part (A) of FIG. 3 according to an embodiment of the invention.
FIG. 7 is a schematic diagram of matching point cloud data with a human skeleton according to an embodiment of the invention.
FIG. 8 is a detailed flowchart of the method for building a three-dimensional video according to an embodiment of the invention.

S205~S230: steps of the method for building a three-dimensional video

Claims (14)

1. A device for building a three-dimensional video, comprising: a plurality of image capture devices, each configured to generate an image capture signal, the image capture signal comprising image data, depth data, and a time stamp, wherein the image capture signals generated by the respective image capture devices are placed on the same timeline by means of the time stamps; and a processing device configured to receive the image capture signals of the respective image capture devices, wherein, in an image capture phase, the plurality of image capture devices capture the image capture signals and the processing device receives the image capture signals from the image capture devices; for each time stamp, the processing device calculates point cloud data and texture data according to the image data and the depth data in the image capture signals, and creates a three-dimensional model according to the point cloud data and the texture data; the processing device integrates the time stamps and the three-dimensional models corresponding to the time stamps into a video stream; and, in an initialization phase, the processing device determines and corrects the relative positions of the plurality of image capture devices by means of the image capture signals captured by the respective image capture devices.

2. The device according to claim 1, wherein the processing device generates skeleton data corresponding to the three-dimensional model according to the point cloud data and the texture data, and the three-dimensional model in the video stream has the corresponding skeleton data.
3. The device according to claim 1, wherein the image capture devices capture images of an object from multiple angles or multiple heights, thereby generating the image capture signals.

4. The device according to claim 1, wherein each image capture device comprises a time synchronization module and a transceiver module; in the initialization phase, the processing device controls the image capture devices to synchronize the time synchronization modules, wherein the time synchronization module is configured to generate the time stamp, and the transceiver module transmits the image capture signal generated by the image capture device to the processing device through a network, a bus, or a communication protocol; and the processing device comprises a communication module, an image processing module, and a central processing unit, wherein the communication module communicates with the transceiver modules and receives the image capture signals, and the image processing module and the central processing unit process the image capture signals.

5. The device according to claim 1, wherein, in the initialization phase, the processing device obtains the image capture signals of the respective image capture devices to obtain initialization background data and corresponding initialization depth data, and the processing device calculates the point cloud data and the texture data corresponding to the image data and the depth data based on the initialization background data and the corresponding initialization depth data.
6. The device according to claim 1, wherein the processing device stitches the image data in the image capture signals to generate view-angle image data corresponding to the viewing angles of the image capture devices, concatenates the view-angle image data into a surround-view image, generates the point cloud data and the texture data according to the view-angle image data and the surround-view image, and thereby creates the three-dimensional model corresponding to each time stamp according to the point cloud data and the texture data.

7. The device according to claim 1, wherein the processing device stitches the image data in the image capture signals to generate view-angle image data corresponding to the viewing angles of the image capture signals, performs triangular meshing on the view-angle image data to generate the texture data, generates the point cloud data according to the image data in the image capture signals, and creates the three-dimensional model corresponding to each time stamp according to the point cloud data.
8. A method for building a three-dimensional video, comprising: arranging a plurality of image capture devices, each configured to generate an image capture signal, the image capture signal comprising image data, depth data, and a time stamp, wherein the image capture signals generated by the respective image capture devices are placed on the same timeline by means of the time stamps; in an image capture phase, obtaining the image capture signals captured by the plurality of image capture devices; for each time stamp, calculating point cloud data and texture data according to the image data and the depth data in the image capture signals, and creating a three-dimensional model according to the point cloud data and the texture data; and integrating the time stamps and the three-dimensional models corresponding to the time stamps into a video stream; wherein each image capture device comprises a time synchronization module, and the method further comprises: in an initialization phase, controlling the image capture devices to synchronize the time synchronization modules, wherein the time synchronization modules are configured to generate the time stamps.

9. The method according to claim 8, further comprising: generating skeleton data corresponding to the three-dimensional model according to the point cloud data and the texture data, wherein the three-dimensional model in the video stream has the corresponding skeleton data.
10. The method according to claim 8, wherein the image capture devices capture images of an object from multiple angles or multiple heights, thereby generating the image capture signals.

11. The method according to claim 8, further comprising: in the initialization phase, determining and correcting the relative positions of the plurality of image capture devices by means of the image capture signals captured by the respective image capture devices.

12. The method according to claim 8, further comprising: in the initialization phase, obtaining the image capture signals of the respective image capture devices to obtain initialization background data and corresponding initialization depth data, wherein the point cloud data and the texture data corresponding to the image data and the depth data are calculated based on the initialization background data and the corresponding initialization depth data.

13. The method according to claim 8, wherein the step of calculating the point cloud data and the texture data according to the image data and the depth data in the image capture signals and creating the three-dimensional model according to the point cloud data and the texture data comprises: stitching the image data in the image capture signals to generate view-angle image data corresponding to the viewing angles of the image capture signals; concatenating the view-angle image data into a surround-view image; generating the point cloud data and the texture data according to the view-angle image data and the surround-view image; and creating the three-dimensional model corresponding to each time stamp according to the point cloud data and the texture data.

14. The method according to claim 8, wherein the step of calculating the point cloud data and the texture data according to the image data and the depth data in the image capture signals and creating the three-dimensional model according to the point cloud data and the texture data comprises: stitching the image data in the image capture signals to generate view-angle image data corresponding to the viewing angles of the image capture signals, and performing triangular meshing on the view-angle image data to generate the texture data; and generating the point cloud data according to the image data in the image capture signals, and creating the three-dimensional model corresponding to each time stamp according to the point cloud data.
TW110146850A 2021-12-15 2021-12-15 Device and method for buliding three-dimensional video TWI798999B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW110146850A TWI798999B (en) 2021-12-15 2021-12-15 Device and method for buliding three-dimensional video
CN202111558062.0A CN116263977A (en) 2021-12-15 2021-12-17 Apparatus and method for creating three-dimensional image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110146850A TWI798999B (en) 2021-12-15 2021-12-15 Device and method for buliding three-dimensional video

Publications (2)

Publication Number Publication Date
TWI798999B true TWI798999B (en) 2023-04-11
TW202327347A TW202327347A (en) 2023-07-01

Family

ID=86723633

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110146850A TWI798999B (en) 2021-12-15 2021-12-15 Device and method for buliding three-dimensional video

Country Status (2)

Country Link
CN (1) CN116263977A (en)
TW (1) TWI798999B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200928570A (en) * 2007-12-28 2009-07-01 Ind Tech Res Inst Method for three-dimension (3D) measurement and an apparatus thereof
TW201837861A (en) * 2017-04-10 2018-10-16 鈺立微電子股份有限公司 Depth processing system
TW202006659A (en) * 2018-07-03 2020-02-01 財團法人工業技術研究院 Method and apparatus for processing patches of point cloud

Also Published As

Publication number Publication date
TW202327347A (en) 2023-07-01
CN116263977A (en) 2023-06-16
