TWI757965B - Deep learning method for augmented reality somatosensory game machine - Google Patents

Deep learning method for augmented reality somatosensory game machine Download PDF

Info

Publication number
TWI757965B
TWI757965B TW109139232A
Authority
TW
Taiwan
Prior art keywords
data
image
generate
mask
unit
Prior art date
Application number
TW109139232A
Other languages
Chinese (zh)
Other versions
TW202219897A (en)
Inventor
姚皓瀚
周乃宏
黃信霖
Original Assignee
鈊象電子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 鈊象電子股份有限公司
Priority to TW109139232A
Application granted
Publication of TWI757965B
Publication of TW202219897A

Landscapes

  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A deep learning method for an augmented reality somatosensory game machine includes the following operations: obtaining, by a processor, a depth map according to a depth image and a first mask, in which the depth image and the first mask are each generated according to an image; inputting, by the processor, the depth map and the image into a neural network model to generate a second mask; and generating, by the processor, a matting image according to the second mask and the image.

Description

Deep Learning Method for Augmented Reality Somatosensory Game Machine

The present disclosure relates to a deep learning method and device for an augmented reality somatosensory game machine, and in particular to a deep learning method and device for an augmented reality somatosensory game machine that uses a deep learning neural network model.

Real-time person matting (background removal) based on deep learning is widely used in current social-media software, and a considerable number of matting tools are available on the market. These tools, however, suffer from problems such as poor accuracy, which leaves the resulting matting image abnormally broken, or heavy computation and long processing time, which make real-time matting impractical on low-performance platforms and can prevent other heavy-load applications from running at the same time, greatly reducing usability.

One aspect of the present disclosure provides a deep learning method for an augmented reality somatosensory game machine, including the following steps: obtaining, by a processor, a depth map according to a depth image and a first mask, where the depth image and the first mask are each generated according to an image; inputting, by the processor, the depth map and the image into a neural network model to generate a second mask; and generating, by the processor, a matting image (an image with the background removed) according to the second mask and the image.

Another aspect of the present disclosure provides a deep learning device for an augmented reality somatosensory game machine, including a processor and a memory. The processor is configured to obtain a depth map according to a depth image and a first mask, to input the depth map and an image into a neural network model to generate a second mask, and to generate a matting image according to the second mask and the image, where the depth image and the first mask are each generated according to the image. The memory is configured to store the neural network model.

The following disclosure provides many different embodiments or examples for implementing different features of the present invention. The elements and arrangements of specific examples are used in the following discussion to simplify the present disclosure. Any example discussed is for illustrative purposes only and does not in any way limit the scope or meaning of the present invention or its examples.

Please refer to FIG. 1. FIG. 1 is a schematic diagram of a deep learning device 100 of an augmented reality somatosensory game machine. As shown in FIG. 1, the deep learning device 100 of the augmented reality somatosensory game machine includes a processor 110 and a memory 130. The processor 110 is coupled to the memory 130. In some embodiments, the deep learning device 100 of the augmented reality somatosensory game machine further includes a camera 150. The camera 150 is coupled to the processor 110.

Please refer to FIG. 2. FIG. 2 is a schematic diagram of a deep learning method 200 of an augmented reality somatosensory game machine according to some embodiments of the present invention. The embodiments of the present invention are not limited thereto.

It should be noted that the deep learning method 200 of the augmented reality somatosensory game machine can be applied to a system having a structure that is the same as or similar to that of the deep learning device 100 of the augmented reality somatosensory game machine in FIG. 1. To simplify the description, the operation method is described below using FIG. 2 as an example; however, the present invention is not limited to the application of FIG. 1.

It should be noted that, in some embodiments, the deep learning method 200 of the augmented reality somatosensory game machine can also be implemented as a computer program stored in a non-transitory computer-readable medium, so that a computer, an electronic device, or the processor 110 of the aforementioned deep learning device 100 of the augmented reality somatosensory game machine in FIG. 1 reads the recording medium and then executes the operation method. The processor may be composed of one or more chips. The non-transitory computer-readable recording medium may be a read-only memory, a flash memory, a floppy disk, a hard disk, an optical disc, a flash drive, a magnetic tape, a database accessible via a network, or any other non-transitory computer-readable recording medium with the same function that can readily occur to those skilled in the art.

In addition, it should be understood that, unless their order is specifically stated, the operations of the deep learning method of the augmented reality somatosensory game machine mentioned in this embodiment may have their order adjusted according to actual needs, and may even be performed simultaneously or partially simultaneously.

Furthermore, in different embodiments, these operations may also be adaptively added, replaced, and/or omitted.

Please refer to FIG. 2. The deep learning method 200 of the augmented reality somatosensory game machine includes the following steps.

In step S210, a depth map is obtained according to a depth image and a first mask. Please also refer to FIG. 1. In some embodiments, step S210 may be performed by the processor 110 in FIG. 1. The detailed operation of step S210 is described below with reference to FIG. 3 to FIG. 6.

In some embodiments, the camera 150 captures an image. The depth image is generated by the NUITRACK software according to the image captured by the camera 150. In some embodiments, the NUITRACK software is stored in the memory 130 and executed by the processor 110.

Please refer to FIG. 3. FIG. 3 is a schematic diagram of an image 300 captured by the camera 150 according to some embodiments of the present invention. It should be noted that the camera 150 captures a continuous sequence of images, and the image 300 shown in FIG. 3 is one frame of that sequence. In some embodiments, the image 300 is an RGB color image.

Please refer to FIG. 4. FIG. 4 is a schematic diagram of a first mask 400 according to some embodiments of the present invention. The first mask 400 shown in FIG. 4 is generated by the NUITRACK software according to the image 300. As can be seen from FIG. 4, the first mask 400 generated by the NUITRACK software contains flaws and holes, so using it directly for person matting gives poor results.

Please refer to FIG. 5. FIG. 5 is a schematic diagram of a depth image 500 according to some embodiments of the present invention. Referring back to FIG. 1, in some embodiments, the camera 150 is a depth camera. The depth image 500 is one frame of the depth images obtained by the camera 150. The depth image 500 corresponds to the image 300 in FIG. 3.

Please refer to FIG. 6. FIG. 6 is a schematic diagram of a depth map 600 according to some embodiments of the present invention. The depth map 600 is generated by the processor 110 in FIG. 1 according to the depth image 500 in FIG. 5 and the first mask 400 in FIG. 4. In some embodiments, the processor 110 in FIG. 1 combines the depth image 500 in FIG. 5 with the first mask 400 in FIG. 4 to generate the depth map 600. After the depth image 500 in FIG. 5 is processed with the first mask 400 in FIG. 4, only the depth data of the target person remains, thereby producing the depth map 600.
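For illustration only (this sketch is not part of the original disclosure), the masking described above can be expressed in a few lines of NumPy; the function and array names are hypothetical, and a simple non-zero test is assumed for binarizing the first mask:

```python
import numpy as np

def make_depth_map(depth_image: np.ndarray, first_mask: np.ndarray) -> np.ndarray:
    """Keep depth readings only where the first mask marks the target person.

    depth_image: H x W depth values obtained from the depth camera.
    first_mask:  H x W segmentation mask (non-zero = person), e.g., produced by NUITRACK.
    """
    person = first_mask > 0                              # binarize the possibly imperfect mask
    return np.where(person, depth_image, 0).astype(np.float32)  # zero out background depth
```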

Please refer back to FIG. 2. In step S230, the depth map and the image are input to a neural network model to generate a second mask. Please also refer to FIG. 1. In some embodiments, step S230 may be performed by the processor 110 in FIG. 1. The detailed operation of step S230 is described below with reference to FIG. 7 and FIG. 8.

Please refer to FIG. 7. FIG. 7 is a schematic diagram of a neural network model 700 according to some embodiments of the present invention. In step S230, the processor 110 in FIG. 1 inputs the image 300 in FIG. 3 and the depth map 600 in FIG. 6 into the neural network model 700 to generate the second mask.

As shown in FIG. 7, the neural network model 700 includes an encoding unit 710 and a decoding unit 730. In some embodiments, in step S230, the encoding unit 710 generates first data D1 and second data D2 according to the depth map 600 and the image 300. In some embodiments, the neural network model 700 is DeepLabv3+.

In detail, in some embodiments, the encoding unit 710 includes a MobileNetV2 unit 712 and an ASPP (Atrous Spatial Pyramid Pooling) unit 714. The MobileNetV2 unit 712 performs a feature-compression operation on the depth map 600 and the image 300 to generate the first data D1. On the other hand, after the MobileNetV2 unit 712 performs the feature-compression operation on the depth map 600 and the image 300, the ASPP unit 714 performs a feature-extraction operation to generate the second data D2.
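As a hedged sketch of such an encoder (the patent only names MobileNetV2 as the feature-compression network; the framework, the 4-channel stem, and the split between low-level and deep features are assumptions), the stock torchvision MobileNetV2 feature extractor could be adapted to take the RGB image concatenated with the depth map:

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class RGBDBackbone(nn.Module):
    """MobileNetV2 feature compression over a 4-channel (RGB + depth map) input."""

    def __init__(self):
        super().__init__()
        features = mobilenet_v2().features
        # Replace the stem so it accepts 4 channels instead of 3 (assumption, not stated in the text).
        old_stem = features[0][0]
        features[0][0] = nn.Conv2d(4, old_stem.out_channels,
                                   kernel_size=3, stride=2, padding=1, bias=False)
        self.low_level = features[:4]    # early layers -> higher-resolution features (first data D1)
        self.high_level = features[4:]   # deeper layers -> compressed features passed to the ASPP unit

    def forward(self, image: torch.Tensor, depth_map: torch.Tensor):
        x = torch.cat([image, depth_map], dim=1)   # N x 4 x H x W
        d1 = self.low_level(x)
        deep = self.high_level(d1)
        return d1, deep
```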

As shown in FIG. 7, after the MobileNetV2 unit 712 performs the feature-compression operation on the depth map 600 and the image 300, the ASPP unit 714 performs a plurality of convolution operations 715A to 715D to generate a plurality of first feature data C1 to C4. In addition, the ASPP unit 714 performs a pooling operation 718 to generate second feature data C5. Next, the ASPP unit 714 combines the first feature data C1 to C4 and the second feature data C5 to generate a data set CC1.

In the ASPP unit 714, the convolution operations 715A to 715D each have a convolution parameter and a rate parameter. The rate parameters of the convolution operations 715A to 715D are coprime with one another.

For example, in some embodiments, the convolution parameter of the convolution operation 715A is 1×1×256. The convolution parameter of the convolution operation 715B is 3×3×256, with a rate parameter of 1. The convolution parameter of the convolution operation 715C is 3×3×256, with a rate parameter of 2. The convolution parameter of the convolution operation 715D is 3×3×256, with a rate parameter of 3. The rate parameters of the convolution operations 715B to 715D are coprime with one another.
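A PyTorch sketch of an ASPP block with exactly these parameters (one 1×1×256 branch, three 3×3×256 atrous branches with rates 1, 2, and 3, and a pooling branch) is shown below; the batch-normalization and activation layers, and the final 1×1 projection over the concatenated data set CC1 described below, are assumptions rather than details stated in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int = 256, rates=(1, 2, 3)):
        super().__init__()
        self.branch1x1 = nn.Sequential(                  # 1x1x256 branch (715A)
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.atrous = nn.ModuleList([                    # 3x3x256 branches with rates 1/2/3 (715B-715D)
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates])
        self.pool = nn.Sequential(                       # image-pooling branch (718)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(                    # 1x1 convolution over the concatenated set CC1
            nn.Conv2d(out_ch * (2 + len(rates)), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [self.branch1x1(x)] + [branch(x) for branch in self.atrous]   # C1 to C4
        pooled = F.interpolate(self.pool(x), size=x.shape[2:],                # C5
                               mode="bilinear", align_corners=False)
        cc1 = torch.cat(feats + [pooled], dim=1)
        return self.project(cc1)                         # second data D2
```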

The ASPP unit uses atrous (dilated) convolution to enlarge the receptive field of the network around each pixel, so that the network does not recognize only linear features; this greatly improves the inference results and yields a more complete mask. A characteristic of atrous convolution is that it samples features at strides determined by the rate parameter, extracting features over a large field of view with only a small amount of convolution. If the rate parameters of different convolution operations share a common factor greater than 1, a gridding effect occurs and the features of some pixels are never extracted. Therefore, making the rate parameters of different convolution operations coprime alleviates the gridding problem.
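The gridding effect can be made concrete with a small, purely illustrative calculation (not from the patent) of which input offsets a stack of dilated 3×3 convolutions can reach along one axis:

```python
from math import gcd
from functools import reduce

def touched_offsets(rates):
    """Offsets (along one axis) reachable by stacking 3x3 atrous convolutions with the given rates."""
    offsets = {0}
    for r in rates:
        offsets = {o + k * r for o in offsets for k in (-1, 0, 1)}
    return sorted(offsets)

print(touched_offsets([2, 4, 6]))  # rates share the factor 2 -> only even offsets, gaps (gridding)
print(touched_offsets([1, 2, 3]))  # coprime rates -> contiguous coverage of offsets
print(reduce(gcd, [1, 2, 3]))      # 1
```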

Next, the encoding unit 710 performs a 1×1 convolution operation on the data set CC1 to generate the second data D2.

After receiving the first data D1 and the second data D2 from the encoding unit 710, the decoding unit 730 combines the first data D1 and the second data D2 to generate a data set CC2, and then generates the second mask according to the data set CC2.

Please refer to FIG. 8. FIG. 8 is a schematic diagram of a second mask 800 according to some embodiments of the present invention. Compared with the first mask 400 in FIG. 4, the second mask 800 is more accurate.

In detail, in some embodiments, the decoding unit 730 performs a 1×1 convolution operation on the first data D1 to generate first feature data D1C. The decoding unit 730 performs an expansion operation with a factor of 4 on the second data D2 to generate second feature data D2C. The decoding unit 730 combines the first feature data D1C and the second feature data D2C to generate the data set CC2. The decoding unit 730 then performs a 3×3 convolution operation on the data set CC2 to generate third feature data D3C, and then performs an expansion operation on the third feature data D3C to generate the second mask.
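A corresponding decoder sketch under the same assumptions is given below; the channel widths, the single-channel sigmoid output, and the reading of the expansion operations as bilinear upsampling (by 4, then back to the input resolution) are interpretations rather than stated details:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, low_level_ch: int, aspp_ch: int = 256, low_proj_ch: int = 48):
        super().__init__()
        self.reduce_d1 = nn.Sequential(                  # 1x1 convolution on first data D1 -> D1C
            nn.Conv2d(low_level_ch, low_proj_ch, 1, bias=False),
            nn.BatchNorm2d(low_proj_ch), nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(                       # 3x3 convolution on the concatenated set CC2
            nn.Conv2d(low_proj_ch + aspp_ch, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1))                        # single-channel mask logits (assumption)

    def forward(self, d1: torch.Tensor, d2: torch.Tensor, out_size) -> torch.Tensor:
        d1c = self.reduce_d1(d1)
        d2c = F.interpolate(d2, size=d1c.shape[2:],      # expansion of D2 (read as x4 upsampling) -> D2C
                            mode="bilinear", align_corners=False)
        cc2 = torch.cat([d1c, d2c], dim=1)
        logits = self.fuse(cc2)                          # third feature data D3C
        mask = F.interpolate(logits, size=out_size,      # expansion back to the image resolution
                             mode="bilinear", align_corners=False)
        return torch.sigmoid(mask)                       # second mask, values in [0, 1]
```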

In some embodiments, the neural network model 700 includes a loss function, which can be expressed as

L = sqrt((α_p − α_g)² + ε²),

where α_p is the mask pixel transparency obtained by inference, α_g is the ground-truth pixel transparency, and ε is a constant that prevents the neural network from having no gradient to update when the transparencies are all 0.
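Assuming the reconstruction of the loss given above, a minimal PyTorch version could look as follows (the value of the constant and the per-pixel averaging are assumptions):

```python
import torch

def alpha_loss(pred_alpha: torch.Tensor, true_alpha: torch.Tensor,
               eps: float = 1e-6) -> torch.Tensor:
    """sqrt((alpha_p - alpha_g)^2 + eps^2), averaged over pixels.

    eps keeps the gradient well defined when predicted and true transparencies are both 0.
    """
    return torch.sqrt((pred_alpha - true_alpha) ** 2 + eps ** 2).mean()
```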

Please refer back to FIG. 2. In step S250, a matting image is generated according to the second mask and the image. Please also refer to FIG. 1. In some embodiments, step S250 may be performed by the processor 110 in FIG. 1. Please refer to FIG. 9. FIG. 9 is a schematic diagram of a matting image 900 according to some embodiments of the present invention. In some embodiments, the matting image 900 is generated by combining the second mask 800 shown in FIG. 8 with the image 300 shown in FIG. 3.
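A minimal compositing sketch for this final step, treating the second mask as a per-pixel alpha value (the array names and the optional replacement background are illustrative, not part of the disclosure):

```python
import numpy as np
from typing import Optional

def apply_mask(image: np.ndarray, second_mask: np.ndarray,
               background: Optional[np.ndarray] = None) -> np.ndarray:
    """Composite the segmented person onto a plain or replacement background.

    image:       H x W x 3 RGB frame.
    second_mask: H x W alpha values in [0, 1] inferred by the neural network model.
    """
    alpha = second_mask[..., None]                       # H x W x 1, broadcast over the RGB channels
    if background is None:
        background = np.zeros_like(image)                # plain black backdrop
    return (alpha * image + (1.0 - alpha) * background).astype(image.dtype)
```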

In some embodiments, the processor 110 may be a server or another device. In some embodiments, the processor 110 may be a server, a circuit, a central processing unit (CPU), a microcontroller (MCU), or another device with equivalent functions capable of storing, computing, reading data, and receiving or transmitting signals or messages. In some embodiments, the camera 150 may be a circuit with image-capture and photography functions or another device with equivalent functions. In some embodiments, the memory 130 may be an element with a storage function or an element with a similar function.

Relatively mature products for person detection and segmentation, such as the NuiTrack SDK, already exist on the market, but the masks they produce often contain various flaws and holes, so using them directly for person matting gives poor results. As can be seen from the above embodiments, the embodiments of the present disclosure provide a deep learning method for an augmented reality somatosensory game machine and a device thereof, in which a neural network takes the broken mask produced by a product such as the NuiTrack SDK as a reference and infers a complete mask, which can then be used to generate a correct matting image of a specific person. In addition, because the network structure of DeepLabv3+ is computationally demanding and each inference takes a long time, replacing the initial feature-extraction backbone of the neural network from Xception_101 with MobileNetV2 greatly reduces the time spent in this stage and increases inference speed, enabling real-time matting of dynamic video.

In summary, in the embodiments of the present disclosure, the mask inferred by the neural network is more accurate than the original segmentation mask, the matting image of the specific person finally obtained is more complete, and the method can still run on a low-performance platform together with heavy-load applications.

In addition, the above examples include exemplary steps in sequence, but these steps need not be performed in the order shown. Performing the steps in a different order is within the contemplation of the present disclosure. Within the spirit and scope of the embodiments of the present disclosure, the steps may be added, replaced, reordered, and/or omitted as appropriate.

Although the present disclosure has been described above by way of embodiments, they are not intended to limit the present disclosure. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure. Therefore, the scope of protection of the present disclosure shall be defined by the appended claims.

100: deep learning device of the augmented reality somatosensory game machine
110: processor
130: memory
150: camera
200: deep learning method of the augmented reality somatosensory game machine
S210 to S250: steps
300: image
400: first mask
500: depth image
600: depth map
700: neural network model
710: encoding unit
730: decoding unit
712: MobileNetV2 unit
714: ASPP unit
D1, D2: data
715A to 715D: convolution operations
718: pooling operation
C1 to C5: feature data
CC1, CC2: data sets
D1C to D3C: feature data
800: second mask
900: matting image

To make the above and other objects, features, advantages, and embodiments of the present disclosure more comprehensible, the accompanying drawings are described as follows:
FIG. 1 is a schematic diagram of a deep learning device of an augmented reality somatosensory game machine;
FIG. 2 is a schematic diagram of a deep learning method of an augmented reality somatosensory game machine according to some embodiments of the present invention;
FIG. 3 is a schematic diagram of an image captured by a camera according to some embodiments of the present invention;
FIG. 4 is a schematic diagram of a first mask according to some embodiments of the present invention;
FIG. 5 is a schematic diagram of a depth image according to some embodiments of the present invention;
FIG. 6 is a schematic diagram of a depth map according to some embodiments of the present invention;
FIG. 7 is a schematic diagram of a neural network model according to some embodiments of the present invention;
FIG. 8 is a schematic diagram of a second mask according to some embodiments of the present invention;
FIG. 9 is a schematic diagram of a matting image according to some embodiments of the present invention.

200: deep learning method of the augmented reality somatosensory game machine
S210 to S250: steps

Claims (7)

1. A deep learning method for an augmented reality somatosensory game machine, comprising: obtaining, by a processor, a depth map according to a depth image and a first mask, wherein the depth image and the first mask are each generated according to an image; inputting, by the processor, the depth map and the image to a neural network model to generate a second mask; and generating, by the processor, a matting image according to the second mask and the image; wherein the neural network model comprises an encoding unit and a decoding unit, the encoding unit comprising a MobileNetV2 unit and an ASPP unit; wherein inputting, by the processor, the depth map and the image to the neural network model to generate the second mask comprises: generating, by the encoding unit, first data and second data according to the depth map and the image; and combining, by the decoding unit, the first data and the second data to generate a first data set, and decoding the first data set to generate the second mask; wherein generating, by the encoding unit, the first data and the second data according to the depth map and the image comprises: generating, by the MobileNetV2 unit, the first data according to the depth map and the image; and generating, by the MobileNetV2 unit and the ASPP unit, the second data according to the depth map and the image.

2. The deep learning method for an augmented reality somatosensory game machine of claim 1, wherein generating, by the MobileNetV2 unit and the ASPP unit, the second data according to the depth map and the image comprises: performing a plurality of convolution operations according to a plurality of convolution parameters and a plurality of rate parameters to generate a plurality of first feature data; performing a pooling operation to generate second feature data; aggregating the first feature data and the second feature data to generate a second data set; and generating the second data according to the second data set.

3. The deep learning method for an augmented reality somatosensory game machine of claim 2, wherein the rate parameters are coprime with one another.

4. The deep learning method for an augmented reality somatosensory game machine of claim 1, wherein combining, by the decoding unit, the first data and the second data to generate the first data set comprises: performing a convolution operation on the first data to generate first feature data; performing an expansion operation on the second data to generate second feature data; and combining the first feature data and the second feature data.
5. The deep learning method for an augmented reality somatosensory game machine of claim 1, wherein decoding the first data set to generate the second mask comprises: performing a convolution operation on the first data set to generate third feature data; and performing an expansion operation on the third feature data to generate the second mask.

6. The deep learning method for an augmented reality somatosensory game machine of claim 1, further comprising: generating the first mask by NUITRACK software; and generating the depth image by a camera.

7. A deep learning device for an augmented reality somatosensory game machine, comprising: a processor, configured to obtain a depth map according to a depth image and a first mask, input the depth map and an image to a neural network model to generate a second mask, and generate a matting image according to the second mask and the image, wherein the depth image and the first mask are each generated according to the image; and a memory, configured to store the neural network model; wherein the neural network model comprises a MobileNetV2 unit and an ASPP unit, and the processor is further configured to generate first data by the MobileNetV2 unit, generate second data by the MobileNetV2 unit and the ASPP unit, and generate the second mask according to the first data and the second data.
TW109139232A 2020-11-10 2020-11-10 Deep learning method for augmented reality somatosensory game machine TWI757965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109139232A TWI757965B (en) 2020-11-10 2020-11-10 Deep learning method for augmented reality somatosensory game machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109139232A TWI757965B (en) 2020-11-10 2020-11-10 Deep learning method for augmented reality somatosensory game machine

Publications (2)

Publication Number Publication Date
TWI757965B true TWI757965B (en) 2022-03-11
TW202219897A TW202219897A (en) 2022-05-16

Family

ID=81710616

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109139232A TWI757965B (en) 2020-11-10 2020-11-10 Deep learning method for augmented reality somatosensory game machine

Country Status (1)

Country Link
TW (1) TWI757965B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491726A (en) * 2017-07-04 2017-12-19 重庆邮电大学 A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks
CN107610041A (en) * 2017-08-16 2018-01-19 南京华捷艾米软件科技有限公司 Video portrait based on 3D body-sensing cameras scratches drawing method and system
CN110188760A (en) * 2019-04-01 2019-08-30 上海卫莎网络科技有限公司 A kind of image processing model training method, image processing method and electronic equipment
CN111354003A (en) * 2020-02-25 2020-06-30 华南农业大学 Pig segmentation method based on depth image
CN111754521A (en) * 2020-06-17 2020-10-09 Oppo广东移动通信有限公司 Image processing method and device, electronic device and storage medium


Also Published As

Publication number Publication date
TW202219897A (en) 2022-05-16

Similar Documents

Publication Publication Date Title
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
JP6613605B2 (en) Method and system for restoring depth value of depth image
CN112308095A (en) Picture preprocessing and model training method and device, server and storage medium
US11176355B2 (en) Facial image processing method and apparatus, electronic device and computer readable storage medium
CN107958446B (en) Information processing apparatus, information processing method, and computer program
CN106997613B (en) 3D model generation from 2D images
CN111047509B (en) Image special effect processing method, device and terminal
CN109493297B (en) Low-quality face image enhancement method, system, equipment and storage medium
CN109934873B (en) Method, device and equipment for acquiring marked image
TWI757965B (en) Deep learning method for augmented reality somatosensory game machine
JP7312026B2 (en) Image processing device, image processing method and program
CN108810319A (en) Image processing apparatus and image processing method
CN108520259B (en) Foreground target extraction method, device, equipment and storage medium
CN111275610A (en) Method and system for processing face aging image
CN114913287B (en) Three-dimensional human body model reconstruction method and system
Li et al. Reference-guided landmark image inpainting with deep feature matching
CN114972587A (en) Expression driving method and device, electronic equipment and readable storage medium
Meskine et al. Blind image deblurring by game theory
US11202000B2 (en) Learning apparatus, image generation apparatus, learning method, image generation method, and program
TWM610750U (en) Deep learning device for augmented reality somatosensory game machine
CN113255456A (en) Non-active living body detection method, device, electronic equipment and storage medium
CN112541436A (en) Concentration degree analysis method and device, electronic equipment and computer storage medium
CN116310408B (en) Method and device for establishing data association between event camera and frame camera
CN114495290B (en) Living body detection method, living body detection device, living body detection equipment and storage medium