TWM596391U - Situation awareness system - Google Patents


Info

Publication number
TWM596391U
Authority
TW
Taiwan
Prior art keywords
augmented reality
server
reality device
category
digital image
Prior art date
Application number
TW108217162U
Other languages
Chinese (zh)
Inventor
蘇愷宏
林永祥
Original Assignee
亞達科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 亞達科技股份有限公司 filed Critical 亞達科技股份有限公司
Priority to TW108217162U priority Critical patent/TWM596391U/en
Publication of TWM596391U publication Critical patent/TWM596391U/en


Landscapes

  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

A situation awareness system includes a server, a first augmented reality (AR) device, and a second AR device. A first digital image is captured by the first AR device. The first AR device receives a class label corresponding to an object to be labeled through a user interface, and uploads the class label and the first digital image to the server, which trains a machine learning model accordingly. A second digital image is captured by the second AR device, and a situation object is detected according to the machine learning model. The second AR device displays a graphic object on a transparent display according to the position of the situation object.

Description

Situation awareness system

The present utility model relates to a system for automatically generating artificial intelligence applications through an augmented reality device, where the applications can be shared with other users.

Most people make wrong decisions because they lack awareness of the situation, and situation awareness depends on personal experience, habits, and talent. Some gifted people can recognize specific situations instinctively, for example by identifying particular objects. Today's technology is sufficient to convert such experience and talent into digital artificial intelligence models through machine learning. However, the heavy labeling work and complex model training involved make artificial intelligence models slow to produce and hard to disseminate. In recent years, thanks to the dramatic improvement in the computing power of graphics processing units (GPUs), artificial intelligence has become the hottest emerging industry. Yet today's artificial intelligence applications operate in isolation: there is no systematic, structured ecosystem, and they remain hard for ordinary people to access.

An embodiment of the present utility model provides a situation awareness system that includes a server, a first augmented reality device, and a second augmented reality device. The first augmented reality device is communicatively connected to the server and includes a first image sensor and a first transparent display; it captures a first digital image through the first image sensor. The second augmented reality device is communicatively connected to the server and includes a second image sensor and a second transparent display. The first augmented reality device receives, through a user interface, a class label for an object to be classified, and uploads the class label and the first digital image to the server. The server trains a machine learning model based on the class label and the first digital image. The second augmented reality device captures a second digital image through the second image sensor; the second augmented reality device or the server detects a situation object in the second digital image according to the machine learning model, and the second augmented reality device displays a corresponding graphic object on the second transparent display according to the position of the situation object.

In some embodiments, the user interface includes a voice interface or a gesture interface.

In some embodiments, the first augmented reality device or the server detects at least one first recommended object in the first digital image. The first augmented reality device displays a bounding box for the first recommended object on the first transparent display, and receives commands from the user through the user interface to adjust the position and size of the bounding box.

In some embodiments, the first augmented reality device or the server detects the first recommended object according to a first convolutional neural network. The first convolutional neural network executes the inference procedure only once to output the positions of multiple bounding boxes and multiple class confidence values, where the sizes of the bounding boxes are fixed.

In some embodiments, the server obtains multiple training images and labeled data for each training image, where the labeled data includes the sizes of training bounding boxes. The server executes an unsupervised clustering algorithm to divide the sizes of the training bounding boxes into multiple groups, and obtains a preset bounding box size from each group. The server uses the preset bounding box sizes in the single-pass inference procedure described above.

In some embodiments, the machine learning model is a second convolutional neural network, and detecting the situation object in the second digital image includes: detecting, according to the first convolutional neural network, a second recommended object in the second digital image and the class confidence value of the second recommended object; and outputting, according to the second convolutional neural network, a situation confidence value for the second recommended object, then multiplying the class confidence value of the second recommended object by the situation confidence value to obtain a result confidence value. If the result confidence value is greater than a threshold, the second recommended object is determined to be a situation object.
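The two-stage confidence check described in this embodiment can be sketched as follows. This is a minimal illustration under assumed names, not the claimed implementation; in particular, the function name and the 0.5 threshold are assumptions.

```python
def is_situation_object(class_conf: float, situation_conf: float,
                        threshold: float = 0.5) -> bool:
    """Multiply the detector's class confidence by the second network's
    situation confidence; the object counts as a situation object when
    the resulting confidence exceeds the threshold."""
    result_conf = class_conf * situation_conf
    return result_conf > threshold

# A face detected with class confidence 0.9 and judged "angry" with 0.8:
print(is_situation_object(0.9, 0.8))  # 0.72 > 0.5 -> True
print(is_situation_object(0.9, 0.4))  # 0.36 <= 0.5 -> False
```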

In some embodiments, the first recommended object belongs to a first class, the class label belongs to a second class, and the second class is subsumed under the first class.

In the system described above, users can build machine learning models through an augmented reality device without writing any code, and share them with other users.

To make the above features and advantages of the present utility model more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

100‧‧‧Situation awareness system

102‧‧‧Server

110, 120‧‧‧Augmented reality devices

111, 121‧‧‧Image sensors

112, 122‧‧‧Transparent displays

130‧‧‧Network

201‧‧‧Creator

210, 220, 230, 232, 240‧‧‧Steps

211‧‧‧Parameters

226‧‧‧Training samples

231‧‧‧Application

221~225‧‧‧Steps

300, 400‧‧‧Digital images

301~304‧‧‧Recommended objects

310‧‧‧Augmented reality scene

311~314‧‧‧Bounding boxes

401‧‧‧Situation object

410‧‧‧Augmented reality scene

411‧‧‧Bounding box

412‧‧‧Text

421~424‧‧‧Icons

502, 504, 506‧‧‧Steps

501‧‧‧Training images

503‧‧‧Preset bounding box sizes

505‧‧‧Enhanced images

507‧‧‧Labels

508‧‧‧Machine learning model

[FIG. 1] is a schematic diagram illustrating a situation awareness system according to an embodiment.

[FIG. 2A] is a schematic diagram illustrating the operation of the situation awareness system according to an embodiment.

[FIG. 2B] is a flowchart illustrating step 220 according to an embodiment.

[FIG. 3A] is a schematic diagram illustrating a digital image captured by an augmented reality device according to an embodiment.

[FIG. 3B] is a schematic diagram illustrating the augmented reality scene seen by a user according to an embodiment.

[FIG. 4A] is a schematic diagram illustrating a digital image captured by an augmented reality device according to an embodiment.

[FIG. 4B] is a schematic diagram illustrating the augmented reality scene seen by a user according to an embodiment.

[FIG. 5] is a flowchart illustrating the training of a machine learning model according to an embodiment.

The terms "first", "second", and so on used herein do not denote any particular order or sequence; they merely distinguish elements or operations described with the same technical terms.

The technology proposed in this disclosure may be called a Situation Awareness Magnifier (SAM). Building on artificial intelligence (AI) technology, SAM uses lightweight smart glasses as a medium so that creators with a high degree of situation awareness can, drawing on their experience, label situations that ordinary people cannot perceive; through automated machine learning in the cloud, these experiences are converted into AI applications (apps). Ordinary users connect to the SAM cloud system through smart glasses to obtain the situation awareness provided by the creators.

FIG. 1 is a schematic diagram illustrating a situation awareness system according to an embodiment. The situation awareness system 100 includes a server 102, an augmented reality (AR) device 110, and an augmented reality device 120, where the augmented reality devices 110 and 120 are communicatively connected to the server 102 through a network 130. In this embodiment the augmented reality devices 110 and 120 are implemented as smart glasses, but in other embodiments they may also be implemented as face shields, transparent tablets, and so on; the present utility model does not limit the size or form of the augmented reality devices 110 and 120. In this embodiment, the user wearing the augmented reality device 110 is called a creator. Through the server 102, the creator can turn his or her own situation awareness ability into an application, thereby sharing that ability with the user of the augmented reality device 120.

The augmented reality device 110 includes an image sensor 111 and a transparent display 112. The augmented reality device 120 includes an image sensor 121 and a transparent display 122. The transparent displays 112 and 122, also called see-through displays, are devices that let the user view external scenes and projected images at the same time, superimposing real and virtual images without blocking the field of view. The transparent displays 112 and 122 may include liquid crystal displays, organic light-emitting diodes, projectors, lenses, light guides, and so on; the present utility model does not limit the components they contain. The image sensors 111 and 121 may include charge-coupled device (CCD) sensors, complementary metal-oxide-semiconductor (CMOS) sensors, or other suitable photosensitive elements. For simplicity, FIG. 1 does not show all components of the augmented reality devices 110 and 120; for example, they may further include a processor, a microphone, a speaker, an inertial measurement unit, a communication module, physical buttons, and so on, and the present utility model is not limited in this respect. The communication module may use a cellular (mobile) network, near-field communication, infrared communication, Bluetooth, Wi-Fi, and so on to connect to the network 130.

FIG. 2A is a schematic diagram illustrating the operation of the situation awareness system according to an embodiment. Referring to FIG. 2A, the creator 201 here is the user of the augmented reality device 110. In step 210, the creator 201 creates a project on the system platform provided by the server 102; this platform may expose interfaces such as web pages, applications, and web services. The project is used to manage the machine learning models the creator 201 builds from his or her own situation awareness ability. The creator 201 can input multiple parameters 211, which are sent to the server 102. The parameters 211 include the classes to be recognized, the image resolution, the type of machine learning model, the various parameters of the machine learning model, and so on. For example, the classes may include a person's emotions, whether a person is lying, species of animals and plants, whether a machine is malfunctioning, whether someone is ill, and so on. The server 102 may provide several preset classes, or the creator 201 may create new ones. In some embodiments, the server 102 may also provide a class tree built by experts that records the subsumption relations between classes; for example, the class "human face" subsumes the class "angry", so the "human face" node is the parent of the "angry" node, and so on. The subsumption relations in the class tree can help the creator label objects in an image. For example, if the creator wants to create an "angry" class, the server 102 can provide a face object detector: when labeling the "angry" class in an image, faces are first detected by this object detector so that the creator can judge whether each one belongs to the "angry" class.

After the project has been created, in step 220 the creator captures digital images through the augmented reality device 110 and labels the objects to be recognized, producing training samples. Specifically, FIG. 2B is a flowchart illustrating step 220 according to an embodiment. Referring to FIG. 2B, in step 221 the augmented reality device 110 first captures a digital image through its own image sensor 111 (such as the digital image 300 shown in FIG. 3A). In this example, the creator wants to label the objects in the digital image 300 that belong to the "angry" class.

In step 222, the augmented reality device 110 or the server 102 detects the recommended objects 301~304 (human faces in this example) in the digital image 300. In some embodiments, the server 102 may provide the object detector for the recommended objects 301~304 to the augmented reality device 110, or the object detector may be built into the augmented reality device 110. In some embodiments, the augmented reality device 110 may instead send the digital image 300 to the server 102, which detects the recommended objects 301~304 and returns the detection results to the augmented reality device 110.

In step 223, graphic objects are displayed according to the detection results. Specifically, the augmented reality device 110 generates corresponding graphic objects (for example, bounding boxes) according to the positions of the recommended objects 301~304 and displays them on the transparent display 112. FIG. 3B shows the augmented reality scene 310 seen by the creator, in which the bounding boxes 311~314 are virtual images while the rest is the real scene; the two are blended together.

In step 224, the bounding box to be labeled is selected. The creator can input one or more commands through a user interface to select the object to be labeled from the recommended objects 301~304. The user interface may be, for example, a voice interface or a gesture interface; that is, the augmented reality device 110 can receive the creator's voice through a microphone and recognize what the creator says, or recognize the creator's gestures through the image sensor 111, to determine the command the creator intends to give. Alternatively, in some embodiments the user interface may be implemented with a mouse, keyboard, writing tablet, physical buttons, or other hardware. In this example the creator selects the bounding box 313 (recommended object 303).

In step 225, a class label for the recommended object is provided. The creator can input one or more commands through the user interface to further classify the bounding box 313 (recommended object 303) into the "angry" class; the class label provided by the creator can be recorded as text, a number, or a binary code. In some embodiments, the creator may also issue commands through the user interface to adjust the positions and sizes of the bounding boxes 311~314 before providing the class label. In some embodiments, if the object the creator wants to label is not within any of the bounding boxes 311~314, the creator can create a new bounding box through the user interface and provide the corresponding class label. The digital image 300, the position and size of the bounding box 313 (or of the newly created bounding box), and the class label together constitute one training sample; steps 222~225 can then be repeated to generate more training samples.
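The training-sample record just described (digital image, bounding-box position and size, class label) can be sketched as a simple data structure. The field names and the (x, y, width, height) box encoding are illustrative assumptions, not the patent's actual format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Box encoded as (x, y, width, height); assumed for illustration.
Box = Tuple[int, int, int, int]

@dataclass
class TrainingSample:
    image: bytes                                     # the captured digital image
    boxes: List[Box] = field(default_factory=list)   # one box per labeled object
    labels: List[str] = field(default_factory=list)  # class label per box, e.g. "angry"

    def add_label(self, box: Box, label: str) -> None:
        """Record one labeled bounding box (as in step 225)."""
        self.boxes.append(box)
        self.labels.append(label)

sample = TrainingSample(image=b"\x89PNG...")
sample.add_label((120, 80, 64, 44), "angry")
print(sample.labels)  # ['angry']
```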

In some embodiments, recommended objects need not be generated, and the augmented reality device 110 receives the class label of an object to be classified directly through the user interface. For example, the augmented reality device 110 may define a specific position in the digital image corresponding to the user's gaze (such as the image center) and take the object at that position (also called the object to be classified). The augmented reality device 110 can determine whether the user's gaze stays on this object for more than a preset time (for example, 5 seconds) and, if so, wait for the user to speak the class label of the object. In this disclosure, a recommended object may also be called an object to be classified; therefore, whether or not recommended objects are generated, producing a class label for an object to be classified through the user interface of the augmented reality device 110 falls within the scope of this disclosure.
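The gaze-dwell rule above (prompt for a spoken label only after the gaze has stayed on the same object longer than a preset time) can be sketched as follows. The class, method names, and clock handling are assumptions for illustration.

```python
DWELL_SECONDS = 5.0  # the preset time from the text

class GazeTracker:
    """Tracks how long the user's gaze has rested on one object."""

    def __init__(self) -> None:
        self._target = None
        self._since = 0.0

    def update(self, target_id: str, now: float) -> bool:
        """Return True once the same target has been watched long enough."""
        if target_id != self._target:
            # gaze moved to a new object: restart the dwell timer
            self._target = target_id
            self._since = now
            return False
        return (now - self._since) > DWELL_SECONDS

tracker = GazeTracker()
tracker.update("obj-1", now=0.0)
print(tracker.update("obj-1", now=6.0))  # True: gaze held > 5 s
print(tracker.update("obj-2", now=6.5))  # False: gaze moved
```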

Referring back to FIG. 2A, the augmented reality device 110 then transmits the collected training samples 226 to the server 102. In step 220, because the system first provides recommended objects, the creator can build class labels quickly. Note that the class of the recommended objects must subsume the class the creator wants to create; for example, the class "human face" subsumes the class "angry". Put another way, if the recommended object belongs to a first class and the class label provided by the creator belongs to a second class, the second class is subsumed under the first class; these classes are only examples, and the present utility model does not limit what the first and second classes are. In addition, because the creator builds class labels through the augmented reality device 110, the creator can move freely indoors or outdoors and generate a new training sample whenever a suitable object is found. Compared with tediously labeling at a computer, the system of this embodiment is far friendlier to creators.

After the augmented reality device 110 transmits the training samples 226 to the server 102, the server 102 can determine whether the number of training samples exceeds a threshold; if so, it proceeds to step 230, and if not, it notifies the creator to keep collecting training samples. In step 230, the server 102 trains a machine learning model from the collected class labels and digital images; this model may be a decision tree, a random forest, a multilayer neural network, a convolutional neural network, a support vector machine, and so on, and the present utility model is not limited in this respect. In some embodiments, the server 102 may also perform some preprocessing on the collected digital images, such as brightness adjustment and denoising; the present utility model does not limit the content of this preprocessing. In some embodiments, the server 102 may also build an application 231 through which users access the trained machine learning model. After building the application 231, the server 102 publishes it on a platform accessible to users.

In some embodiments, before publishing the application 231, the server 102 may first send it to the creator for testing (step 232). After testing, the creator may return to step 220 to collect more training samples, or accept the training results, after which the server 102 publishes the application 231.

In step 240, a user downloads and installs the application 231 through the augmented reality device 120 and can then recognize new situations according to the trained machine learning model. Specifically, the augmented reality device 120 can capture a digital image through its own image sensor 121, such as the digital image 400 of FIG. 4A, and detect the situation object 401 in the digital image 400 (an angry face in this example) according to the trained machine learning model. In some embodiments, the augmented reality device 120 may send the digital image 400 to the server 102, which detects the situation object 401 and returns the detection result to the augmented reality device 120. Alternatively, the augmented reality device 120 may download the trained machine learning model from the server 102 through the application 231 and detect the situation object 401 in the digital image 400 with its own processor.

The augmented reality device 120 also displays a corresponding graphic object on the transparent display 122 according to the position of the situation object 401, as in the augmented reality scene 410 shown in FIG. 4B, where the graphic objects include the bounding box 411 and the text 412, "angry". FIG. 4B is only an example; the graphic objects may also include arbitrary patterns, symbols, numbers, and so on, and the present utility model is not limited in this respect. In some embodiments, multiple icons 421~424 may also be displayed in the augmented reality scene 410; these icons are used to switch to other situation awareness abilities, that is, to detect other objects such as cars, flowers, trees, children, and so on, and the present utility model is not limited in this respect.

In some embodiments, the object detector used in step 220 and the machine learning model trained in step 230 are both convolutional neural networks. Conventional object detection methods slide a window across the entire digital image and, for each window, decide whether it contains the object to be detected; if so, the window is set as a bounding box, and the window size is then adjusted and the image scanned again, so such an algorithm requires many inference passes. In this embodiment, by contrast, multiple bounding boxes are preset, each with a fixed size, and the convolutional neural network executes the inference procedure only once to output the positions of the bounding boxes and the class confidence values. More specifically, each bounding box corresponds to at least 3+N parameters: an X coordinate, a Y coordinate, the probability P(Object) that the box contains an object, and N class confidence values. If there are M bounding boxes in total, the convolutional neural network outputs at least M×(3+N) values, where M and N are positive integers. The N class confidence values correspond to N classes such as dog, cat, human face, and car; multiplying the probability P(Object) by the corresponding class confidence value gives the probability that the box contains that class, which can be expressed as the following equation (1):

P(Ci) = P(Ci | Object) × P(Object)          (1)

where P(Ci | Object) is the class confidence value, Ci is the i-th class, i = 1...N, and P(Ci) is the probability that the bounding box contains class Ci. Note that during training, if the number of objects in an image is smaller than the positive integer M, some bounding boxes contain no object, and the probability P(Object) of those boxes should be 0. In some embodiments, the M×(3+N) values may be output after a fully connected layer or, in other embodiments, after a convolutional layer; those of ordinary skill in the art will understand the architecture of convolutional neural networks, so the details are not repeated here.
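As a rough numerical illustration of equation (1) and the M×(3+N) output layout: the row layout and all numbers below are assumptions for illustration, not an actual network output.

```python
import numpy as np

M, N = 2, 3  # two preset boxes, three classes (illustrative)
# Each row: x, y, P(Object), then the N conditional class confidences.
raw = np.array([
    [0.5, 0.5, 0.9, 0.8, 0.1, 0.1],
    [0.2, 0.7, 0.0, 0.3, 0.4, 0.3],  # empty box: P(Object) = 0
])
p_object = raw[:, 2:3]            # shape (M, 1)
p_class_given_obj = raw[:, 3:]    # shape (M, N)
p_class = p_class_given_obj * p_object  # equation (1) for every box and class

print(p_class.round(2))  # box 0: [0.72, 0.09, 0.09]; box 1: all zeros
```

A box with P(Object) = 0 contributes zero probability for every class, matching the training rule that empty preset boxes should have P(Object) = 0.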

Since the sizes of the aforementioned bounding boxes are fixed, a mechanism is needed to determine those sizes (width and height). FIG. 5 is a flowchart illustrating the training of a machine learning model according to an embodiment. Referring to FIG. 5, each step is executed by the server 102. First, multiple training images 501 and labeled data about the training images are obtained from a database. The labeled data, also called the ground truth, includes the sizes and positions of bounding boxes that have already been labeled (hereinafter, training bounding boxes). In step 502, an unsupervised clustering algorithm is executed to divide the sizes of these training bounding boxes into multiple groups, and a preset bounding box size 503 is obtained from each group, for example by taking the center of each group (i.e. the average) as the preset bounding box size 503. The unsupervised clustering algorithm is, for example, the K-means algorithm, where K may be 5, 6, or any suitable positive integer. Each preset bounding box size 503 can be expressed as a vector [height, width], for example [64, 44], [88, 75], and so on, for a total of K vectors; the present invention does not limit the values of these preset bounding box sizes 503. In step 504, the aforementioned pre-processing may be performed on the training images 501 to obtain enhanced images 505. Next, in step 506, a supervised machine learning algorithm is executed according to the labels 507, the preset bounding box sizes 503, and the enhanced images 505 to obtain the machine learning model 508.
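The clustering of step 502 can be sketched as a plain K-means over (height, width) pairs. The helper name and the sample box sizes below are hypothetical illustrations, not the server's actual implementation:

```python
import random

def kmeans_box_sizes(boxes, k, iters=50, seed=0):
    """Cluster (height, width) pairs and return the K cluster centers
    (group averages) as preset bounding-box sizes, as in step 502."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)             # initial centers: k random boxes
    for _ in range(iters):
        # Assign every box to its nearest center (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for h, w in boxes:
            j = min(range(k),
                    key=lambda c: (h - centers[c][0]) ** 2 + (w - centers[c][1]) ** 2)
            groups[j].append((h, w))
        # Recompute each center as the average of its group.
        centers = [
            (sum(h for h, _ in g) / len(g), sum(w for _, w in g) / len(g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

# Hypothetical training bounding boxes forming two obvious size groups.
small = [(60, 40), (62, 42), (64, 44)]
large = [(86, 73), (88, 75), (90, 77)]
centers = kmeans_box_sizes(small + large, k=2)
print(sorted(centers))   # ≈ [(62.0, 42.0), (88.0, 75.0)]
```

The resulting [height, width] centers play the role of the preset bounding box sizes 503 used in both training and the single-pass inference procedure.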

The preset bounding box sizes 503 are used not only in the training phase but also in the aforementioned single-pass inference procedure. The convolutional neural network of this embodiment predicts the positions of the bounding boxes directly, so there is no need to slide a window across the entire image; accordingly, the inference procedure only needs to be executed once, which reduces inference time.

In the above example, step 220 uses a convolutional neural network that detects human faces, while step 230 trains a convolutional neural network that detects angry faces. In some embodiments, in step 230 the server 102 may forgo the face-detecting convolutional neural network and instead train an "angry face" machine learning model from scratch; that is, the input image is the entire image including the background, labeled with angry faces. Alternatively, in some embodiments an "angry" machine learning model may be trained separately and then combined with the "face" machine learning model. Specifically, the convolutional neural network used in step 220 is called the first convolutional neural network, and the one trained in step 230 is called the second convolutional neural network; the input of the second convolutional neural network is a face image, and its output is a value indicating whether the face is "angry". In step 240, the operation of detecting the situation object may first detect a second recommended object (identical to the situation object 401) in the digital image 400 according to the first convolutional neural network and obtain the category confidence value P(face) of the second recommended object, which, following the description above, can be expressed as the following equation (2).

P(face) = P(face|Object) × P(Object) ... (2)

Then, the situational confidence value P(angry|face) of the second recommended object is output according to the second convolutional neural network, and the category confidence value P(face) of the second recommended object is multiplied by the situational confidence value P(angry|face) to obtain a result confidence value P(angry), expressed as the following equation (3).

P(angry) = P(angry|face) × P(face) ... (3)

If the result confidence value P(angry) is greater than a threshold, the second recommended object is determined to be the situation object. In this embodiment, since the second convolutional neural network judges anger from a face image alone, its complexity is relatively low, which reduces training time. Notably, one reason the training time can be reduced is that the category of the recommended objects used in step 220 covers the category the user wants to label; this reduces not only the time the creator needs for labeling but also the training time.
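Equations (2) and (3) together form a simple two-stage cascade: the first network's category confidence is multiplied by the second network's situational confidence, and the product is compared against a threshold. A minimal sketch (all probability values and the 0.5 threshold are hypothetical):

```python
def detect_situation(p_object, p_face_given_object, p_angry_given_face, threshold=0.5):
    """Cascade a face detector and an 'angry' classifier per equations (2)-(3),
    keeping the detection only if the result confidence exceeds the threshold."""
    p_face = p_face_given_object * p_object    # equation (2): category confidence
    p_angry = p_angry_given_face * p_face      # equation (3): result confidence
    return p_angry, p_angry > threshold

p, is_situation = detect_situation(p_object=0.9,
                                   p_face_given_object=0.95,
                                   p_angry_given_face=0.8)
print(round(p, 3), is_situation)   # 0.684 True
```

Because each stage only multiplies confidences, a low objectness or a weak face score automatically suppresses the final "angry" decision.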

With lightweight smart glasses and an automated AI model-training system, the present invention refines sample quality and shortens labeling work and model-training time. Together with a complete cloud system, ordinary people can obtain this precious cognitive experience and apply it in daily life. Starting from situation awareness shared between creators and users, the present invention builds an ecosystem centered on AI Apps. Automated AI training and recognition services create an entire ecosystem in which ordinary people can also participate in AI. Moreover, with the above system, users do not need to write programs: they simply label states through the smart glasses and the user interface, and the cloud platform performs machine learning automatically, greatly lowering the entry barrier. Besides the general consumer market, this tool can be applied anywhere a large number of objects must be labeled for future machine learning. Personnel only need to label objects through the glasses and upload them to the SAM platform; subsequent training is completed automatically by the platform. The resulting trained models can be further linked and recombined as needed to form multi-functional AI Apps.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the relevant technical field may make slight changes and refinements without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the appended claims.

100: situation awareness system

102: server

110, 120: augmented reality device

111, 121: image sensor

112, 122: transparent display

130: network

Claims (7)

1. A situation awareness system, comprising: a server; a first augmented reality device communicatively connected to the server, the first augmented reality device comprising a first image sensor and a first transparent display and being configured to capture a first digital image through the first image sensor; and a second augmented reality device communicatively connected to the server, the second augmented reality device comprising a second image sensor and a second transparent display, wherein the first augmented reality device receives, through a user interface, a category label about an object to be classified and uploads the category label and the first digital image to the server, wherein the server trains a machine learning model according to the category label and the first digital image, wherein the second augmented reality device captures a second digital image through the second image sensor, the second augmented reality device or the server detects a situation object in the second digital image according to the machine learning model, and the second augmented reality device displays a corresponding graphic object on the second transparent display according to the position of the situation object.

2. The situation awareness system of claim 1, wherein the user interface comprises a voice interface or a gesture interface.

3. The situation awareness system of claim 1, wherein the first augmented reality device or the server is configured to detect at least one first recommended object in the first digital image as the object to be classified, and the first augmented reality device is configured to display a bounding box for the at least one first recommended object on the first transparent display and to receive a command from the user through the user interface to adjust the position and size of the bounding box.

4. The situation awareness system of claim 3, wherein the first augmented reality device or the server detects the at least one first recommended object according to a first convolutional neural network, the first convolutional neural network executing a single-pass inference procedure to output the positions of multiple bounding boxes and multiple category confidence values, wherein the sizes of the bounding boxes are fixed.

5. The situation awareness system of claim 4, wherein the server is configured to obtain multiple training images and labeled data about each of the training images, the labeled data comprising the size of at least one training bounding box, wherein the server is configured to execute an unsupervised clustering algorithm to divide the sizes of the at least one training bounding box of the training images into multiple groups and to obtain a preset bounding box size from each of the groups, and wherein the server uses the preset bounding box sizes in the single-pass inference procedure.

6. The situation awareness system of claim 5, wherein the machine learning model is a second convolutional neural network, and the operation of detecting the situation object in the second digital image comprises: detecting a second recommended object in the second digital image and the category confidence value of the second recommended object according to the first convolutional neural network; and outputting a situational confidence value of the second recommended object according to the second convolutional neural network and multiplying the category confidence value of the second recommended object by the situational confidence value to obtain a result confidence value, wherein if the result confidence value is greater than a threshold, the second recommended object is determined to be the situation object.

7. The situation awareness system of claim 6, wherein the first recommended object belongs to a first category, the category label belongs to a second category, and the second category is covered by the first category.
TW108217162U 2019-12-24 2019-12-24 Situation awareness system

Priority Applications (1)

Application Number: TW108217162U; Priority Date: 2019-12-24; Filing Date: 2019-12-24; Title: Situation awareness system

Publications (1)

Publication Number: TWM596391U; Publication Date: 2020-06-01

Family ID: 72177291

Country Status (1)

Country: TW; Link: TWM596391U (en)
