TWI795823B - Electronic device and its operation method - Google Patents
Electronic device and its operation method
- Publication number
- TWI795823B
- Authority
- TW
- Taiwan
- Prior art keywords
- information
- processor
- electronic device
- image
- gaze
- Prior art date
Landscapes
- User Interface Of Digital Computer (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
Description
The present invention relates to an electronic device and an operating method thereof, and in particular to an electronic device capable of automatically executing tasks given by a user, and an operating method thereof.
E-commerce refers to transaction activities and related service activities conducted over the Internet or through electronic transactions. In e-commerce applications, the user must click function keys to perform the relevant operations, for example clicking a button on a webpage with the mouse to open a specified product page or to add a specified product to the shopping cart. Depending on the goal to be achieved, the user therefore has to move back and forth across the page. If the user clicks the wrong function key, it takes even more time to accomplish the original goal. An alternative is to operate with the assistance of a voice assistant, where the user issues commands by voice. However, the voice assistant may find the utterance ambiguous or misunderstand it, so the task cannot be completed correctly.
Therefore, a solution is needed that provides both convenient operation for the user and correct task execution.
The present invention provides an electronic device and an operating method thereof that offer both convenient operation for the user and correct task execution.
The present invention provides an operating method of an electronic device, where the electronic device includes a display panel, an eye tracking module, and a processor, and the processor is coupled to the display panel and the eye tracking module. The operating method includes: the eye tracking module performs eye tracking according to an eye image of the user to generate gaze image information; the processor generates gaze coordinate information according to the gaze image information and the resolution of the display panel; the processor performs semantic analysis on voice information from the user to generate object information and task information; the processor generates selected object information according to the object information and the gaze coordinate information; and the processor executes a corresponding task action according to the selected object information and the task information.
The present invention provides an electronic device including a display panel, an eye tracking module, a voice input module, and a processor. The eye tracking module generates gaze image information. The voice input module receives voice information. The processor is coupled to the display panel, the voice input module, and the eye tracking module, and is configured to: generate gaze coordinate information according to the gaze image information and the resolution of the display panel; perform semantic analysis on the voice information to generate object information and task information; generate selected object information according to the object information and the gaze coordinate information; and execute a corresponding task action according to the selected object information and the task information.
The present invention further provides an operating method of an electronic device, where the electronic device includes a display panel, an eye tracking module, a voice input module, and a processor. The operating method includes: the display panel plays a video; the processor captures the current frame of the video in response to first voice information received by the voice input module; the processor performs a first semantic analysis on the first voice information to generate first object information and first task information; the eye tracking module performs first eye tracking according to a first eye image to generate first gaze image information; the processor generates first gaze coordinate information according to the first gaze image information and the resolution of the display panel; the processor determines a partial frame of the current frame according to the first gaze coordinate information; the processor determines an object of interest according to the partial frame; the processor generates first selected object information according to the object of interest and the first object information; and the processor executes a corresponding first task action according to the first selected object information and the first task information.
Based on the above, the present invention replaces the mouse cursor with eye tracking and combines it with a voice assistant function to execute task actions according to the results of semantic analysis. This avoids both erroneous clicks by the user and misunderstandings by the voice assistant, providing convenient operation as well as correct task execution.
FIG. 1 is a schematic block diagram of an electronic device according to an embodiment of the present invention. FIG. 2 is a flowchart of an operating method of an electronic device according to an embodiment of the present invention. Referring to FIG. 1 and FIG. 2 together, the electronic device 100 includes a display panel 110, an eye tracking module 120, a voice input module 130, and a processor 140. The display panel 110 displays a frame. The eye tracking module 120 measures the position of the gaze point of the user's eyes, or the movement of the eyeballs relative to the head, according to an eye image of the user, thereby implementing eye tracking. The eye tracking module 120 performs eye tracking to generate gaze image information (step S210). The voice input module 130 receives voice information from the user.
The user's eye image can be captured by a camera module (not shown) of the electronic device 100. The eye image may include a Purkinje image and a pupil image. The Purkinje image is formed by reflection on the corneal surface as a light beam enters the eyeball. The pupil image may be a bright pupil image or a dark pupil image. In one embodiment, the eye tracking module 120 may be an eye tracker that performs tracking using the feature that varies during eye movement (the pupil) and the feature that remains fixed (the Purkinje image). In detail, the eye tracking module 120 alternately emits near-infrared light toward the eye from light sources in different positions, so as to acquire a bright pupil image and a dark pupil image in two adjacent frames. The position of the pupil image can be obtained by differencing the superimposed bright pupil and dark pupil images. The gaze image information includes the position of the Purkinje image, the position of the pupil image, and the relative direction and distance between the two.
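The bright/dark pupil differencing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the module's actual implementation: the frames are plain grayscale grids, and the threshold value and function name are invented for the example.

```python
def locate_pupil(bright, dark, threshold=50.0):
    """Estimate the pupil center from adjacent bright/dark pupil frames.

    Under on-axis IR illumination the pupil appears bright; under off-axis
    illumination it appears dark, so the per-pixel difference peaks there.
    """
    xs, ys = [], []
    for y, (brow, drow) in enumerate(zip(bright, dark)):
        for x, (b, d) in enumerate(zip(brow, drow)):
            if b - d > threshold:        # pixel belongs to the pupil region
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError("no pupil found; try lowering the threshold")
    return sum(xs) / len(xs), sum(ys) / len(ys)   # centroid of the region

# Synthetic 9x9 frames with a "pupil" lit at the center in the bright frame.
bright = [[200.0 if 3 <= x <= 5 and 3 <= y <= 5 else 0.0 for x in range(9)]
          for y in range(9)]
dark = [[0.0] * 9 for _ in range(9)]
print(locate_pupil(bright, dark))  # (4.0, 4.0)
```

A real tracker would additionally locate the Purkinje glint and use the pupil-to-glint vector, as the paragraph above notes; the centroid step is the same idea.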
The processor 140 is coupled to the display panel 110, the eye tracking module 120, and the voice input module 130. Based on the resolution of the display panel 110, the processor 140 generates gaze coordinate information from the gaze image information (step S220). The processor 140 also performs semantic analysis on the voice information to generate corresponding object information and task information (step S230). The processor 140 further generates selected object information according to the object information and the gaze coordinate information (step S240), and executes a corresponding task action according to the selected object information and the task information (step S250).
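Step S220 amounts to scaling a gaze estimate into the panel's pixel space. The sketch below assumes the tracker reports a normalized gaze point in [0, 1] on each axis; the patent's actual mapping is calibrated from the Purkinje/pupil geometry, so this is only an illustration of the resolution-dependent step.

```python
def gaze_to_screen(norm_x, norm_y, width, height):
    """Map a normalized gaze point (0..1 per axis) to pixel coordinates,
    clamped to the visible area of the display panel."""
    px = min(max(int(norm_x * width), 0), width - 1)
    py = min(max(int(norm_y * height), 0), height - 1)
    return px, py

print(gaze_to_screen(0.5, 0.25, 1920, 1080))   # (960, 270)
print(gaze_to_screen(1.2, -0.1, 1920, 1080))   # clamped: (1919, 0)
```

Clamping matters because a gaze estimate can momentarily fall outside the panel while the user still intends the nearest on-screen element.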
FIG. 3 is a flowchart of an operating method of an electronic device according to the first embodiment of the present invention. FIG. 4 is a schematic diagram of a usage scenario of the first embodiment of the present invention. Referring to FIG. 1, FIG. 3, and FIG. 4 together, in step S310 the electronic device 100 displays webpage information, for example a product page, on the display panel 110. In step S320, when the user's gaze falls on a target 401 of the page, the eye tracking module 120 generates gaze image information from the eye image of the user's eyes looking in direction D1. In step S330, the processor 140 generates gaze coordinate information according to the resolution of the display panel 110 and the gaze image information. The gaze coordinate information indicates the specific position on the display area where the user's gaze lands. In step S340, the processor 140 wakes the voice assistant according to the gaze coordinate information, and then branches on whether the voice input module 130 receives voice information from the user (step S350). If no voice information is received, the method returns to step S320. If voice information is received, the processor 140 converts it into text information and performs semantic analysis on the text information to generate object information and task information (step S360).
In detail, a speech recognition module (not shown) can recognize the user's voice information using a speech recognition algorithm, such as speech-to-text (STT) recognition, to generate text information. Semantic analysis can be performed by a semantic analysis module (not shown). For example, the semantic analysis module may use a natural language understanding (NLU) algorithm to produce an analysis result from the text information. The purpose of an NLU algorithm is to let the system read the text information, understand the text and language, and extract information, in support of downstream tasks such as text classification, syntactic analysis, and information retrieval. NLU first identifies the words in the text information and distinguishes their parts of speech. A probabilistic language model can then predict the words likely to follow. Common probabilistic language models include N-gram models, recurrent neural network (RNN) models, and sequence-to-sequence (Seq2Seq) models. In one embodiment, a sequence-to-sequence model with an attention mechanism may also be used. Word embedding is the most common way to train a probabilistic language model: the training words are one-hot encoded and then fed into the model.
Word2Vec (word to vector) is a word embedding model proposed by the Google team. It is trained on a large corpus by machine learning so that vectors represent the meanings of words. Because the semantic information of words is very high-dimensional, principal component analysis (PCA) can be used for dimensionality reduction, discarding the dimensions of the vector that carry little information. Through natural language understanding, intent recognition and entity extraction can be performed on the text information; the result of entity extraction is the object information, and the result of intent recognition is the task information. For example, the user's voice information may be "add this dress to my favorites", "add this dress to the shopping cart", or "search for more information about this dress". Speech recognition and semantic analysis then yield the object information (the dress) and the task information (add to favorites, add to the shopping cart, or search for more information).
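The intent/entity split in the examples above can be illustrated with a deliberately simple keyword matcher. A real system would use a trained NLU model as the text describes; the phrase tables and intent labels below are invented for the illustration.

```python
INTENTS = {  # task information: keyword -> intent label (hypothetical labels)
    "favorite": "add_to_favorites",
    "shopping cart": "add_to_cart",
    "search": "search_info",
}
ENTITIES = ["dress", "shirt", "shoes"]  # object vocabulary for the example

def parse_utterance(text):
    """Return (object information, task information) for a voice command."""
    text = text.lower()
    obj = next((e for e in ENTITIES if e in text), None)
    task = next((label for kw, label in INTENTS.items() if kw in text), None)
    return obj, task

print(parse_utterance("Add this dress to the shopping cart"))
# ('dress', 'add_to_cart')
```

The point of the sketch is the output shape: one entity slot (the object) and one intent slot (the task), which is exactly what steps S360 and S370 consume.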
In this embodiment, speech recognition and semantic analysis can be performed by a cloud server (not shown), for example through cloud services such as LUIS (Language Understanding Intelligent Service) or Google Dialogflow. That is, the processor 140 transmits the received user voice information to the cloud server and receives the final analysis result from it. The invention is not limited to this: in one embodiment, speech recognition is performed locally (on the electronic device 100) while semantic analysis is performed on the cloud server, and in another embodiment both speech recognition and semantic analysis are performed locally.
In step S370, the processor 140 generates selected object information according to the gaze coordinate information and the object information. Taking FIG. 4 as an example, since the user's gaze lands on the target 401, the processor 140 can generate the selected object information from the position of the target 401 on the display area of the display panel 110 (that is, the gaze coordinate information) and the object information (for example, a dress). For instance, the processor 140 can lock onto a piece of picture information in the webpage according to the gaze coordinate information, and generate the selected object information from that picture information and the object information. In short, taking adding a webpage product to the shopping cart as an example, detecting the movement of the user's gaze replaces moving the cursor by hand; when the gaze comes to rest at one spot, the voice assistant function is triggered, and issuing a voice command replaces clicking the left mouse button to add the product to the shopping cart.
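One way to picture step S370 is as a hit-test of the gaze coordinates against the bounding boxes of page elements, filtered by the object type from the utterance. The element list, its `id`/`type` fields, and the box layout here are assumptions for the sketch, not the patent's actual page model.

```python
def select_object(gaze_xy, elements, object_info):
    """Pick the page element whose box contains the gaze point and whose
    type matches the object information from semantic analysis."""
    gx, gy = gaze_xy
    for elem in elements:
        x, y, w, h = elem["box"]
        if x <= gx < x + w and y <= gy < y + h and elem["type"] == object_info:
            return elem["id"]
    return None  # gaze not on a matching element

page = [
    {"id": "img-dress-42", "type": "dress", "box": (100, 200, 300, 400)},
    {"id": "btn-checkout", "type": "button", "box": (500, 50, 120, 40)},
]
print(select_object((250, 380), page, "dress"))  # 'img-dress-42'
```

Filtering by the spoken object type is what lets the combination tolerate a slightly inaccurate gaze point: a gaze that drifts onto a button still selects the dress image when the user said "dress".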
Referring again to FIG. 3, in step S380 the processor 140 executes the corresponding task action according to the selected object information and the task information. Task actions include, but are not limited to: putting the product corresponding to the selected object information into an online shopping cart; adding the URL corresponding to the selected object information to the favorites; searching the Internet for information about the selected object (such as size and color); finding, among multiple shopping websites, the one that carries the corresponding product with the highest relevance; and comparing the prices of the corresponding product across multiple shopping websites to find the one with the lowest price. After the purchase action is executed, the processor 140 generates feedback information (step S390), such as the product price, product information, or the purchase page. In one embodiment, the processor 140 can drive a speaker (not shown) to emit a sound prompting the user that the task is complete. In one embodiment, the processor 140 can control the display panel 110 to display text and/or pictures prompting the user that the task is complete.
FIG. 5 is a flowchart of an operating method of an electronic device according to the second embodiment of the present invention. FIG. 6 is a schematic diagram of a usage scenario of the second embodiment of the present invention. Referring to FIG. 1, FIG. 5, and FIG. 6 together, in step S501 a video is played on the display panel 110. For the details of steps S502 to S505, refer to the description of steps S320 to S350 of the first embodiment shown in FIG. 3; they are not repeated here. In step S506, when the voice input module 130 receives voice information from the user, the processor 140 captures the current frame. Before this, the user needs to pause the video manually so that the processor 140 does not capture a non-target frame. In step S507, the processor 140 determines a partial frame 502 of the current frame according to the gaze coordinate information, where the area of the partial frame 502 is smaller than the current frame, and determines an object of interest from the partial frame 502. In step S508, the processor 140 converts the voice information into text information and performs semantic analysis on it to generate object information and task information. In step S509, the processor 140 generates selected object information according to the object of interest and the object information. For the details of steps S510 and S511, refer to the descriptions of steps S380 and S390 of FIG. 3 respectively; they are not repeated here.
Referring to FIG. 1 and FIG. 6 together, in one operating scenario, when the user pauses the video, gazes at the clothing of a character on screen, and says "where can I buy this piece of clothing", the processor 140 can parse the voice information into object information such as "clothing", "buy", and "where", and task information of "find a website". The processor 140 can determine the object of interest 503 and identify its color, and can judge the sleeve style, neckline style, and garment outline through edge detection and shape analysis. In this way, keywords such as "white", "long-sleeved", "crew neck", and "top" can be obtained. The processor 140 can use these keywords to look up the product sales page with the highest relevance and display it on the display panel 110. In addition, the processor 140 can prompt the user, through at least one of text, pictures, and voice, that "this piece of clothing can be purchased here".
Specifically, when the user's gaze lands on the target 501, the processor 140 determines a partial frame 502 of the current frame according to the gaze coordinate information, and determines the object of interest 503 within the partial frame 502. The processor 140 can expand outward from the gaze coordinate information to produce a partial frame 502 of a predetermined size; in one embodiment the expansion is centered on the user's gaze point. Besides a predetermined size, the extent of the partial frame 502 can also be determined dynamically: in one embodiment, when the processor 140 cannot determine an object of interest 503 from the partial frame 502 (possibly because the partial frame 502 is too small), the processor 140 enlarges the partial frame 502.
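The grow-on-failure loop just described can be sketched as follows. The initial size, growth factor, and detector callback are assumptions for the illustration; the patent specifies only that the partial frame is enlarged when no object of interest is found.

```python
def crop_around(frame_w, frame_h, cx, cy, size):
    """Axis-aligned square crop of `size` centered on the gaze point,
    clamped to the frame bounds. Returns (x0, y0, x1, y1)."""
    half = size // 2
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, frame_w), min(cy + half, frame_h)
    return x0, y0, x1, y1

def find_object_of_interest(frame_w, frame_h, gaze, detect,
                            size=64, grow=2, max_size=1024):
    """Run a detector on a crop around the gaze point, enlarging the crop
    whenever detection fails (the dynamic variant of step S507)."""
    cx, cy = gaze
    while size <= max_size:
        box = crop_around(frame_w, frame_h, cx, cy, size)
        obj = detect(box)          # caller-supplied detector on the crop
        if obj is not None:
            return obj
        size *= grow               # partial frame too small: enlarge it
    return None

# Toy detector: succeeds only once the crop is at least 256 px wide.
detector = lambda box: "shirt" if box[2] - box[0] >= 256 else None
print(find_object_of_interest(1920, 1080, (960, 540), detector))  # 'shirt'
```

Capping the growth (`max_size`) keeps the loop from degenerating into running the detector on the whole frame when nothing near the gaze point is recognizable.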
As for the details of determining the object of interest, in one embodiment the processor 140 determines the object of interest 503 from the pixel at the gaze point corresponding to the gaze coordinate information together with multiple neighboring pixels, where the color information of the pixel corresponding to the gaze coordinate information is similar to the color information of those neighboring pixels. Taking FIG. 6 as an example, the processor 140 can start from the color information of the pixel of the target 501 (for example, white) and find neighboring pixels with similar color information, thereby determining the object of interest 503.
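This neighbor-expansion rule is essentially region growing: a flood fill gated by color similarity to the seed pixel. The grayscale representation, tolerance value, and 4-connectivity below are assumptions for the sketch.

```python
from collections import deque

def grow_region(image, seed, tol=10):
    """Collect the 4-connected pixels whose value is within `tol` of the
    seed pixel's value; the result approximates the object of interest."""
    h, w = len(image), len(image[0])
    sx, sy = seed
    seed_val = image[sy][sx]
    region, queue = {(sx, sy)}, deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < w and 0 <= ny < h and (nx, ny) not in region
                    and abs(image[ny][nx] - seed_val) <= tol):
                region.add((nx, ny))
                queue.append((nx, ny))
    return region

# A 5x5 frame with a bright 2x2 "garment" patch; the gaze lands inside it.
img = [[0] * 5 for _ in range(5)]
for x, y in ((1, 1), (2, 1), (1, 2), (2, 2)):
    img[y][x] = 200
print(sorted(grow_region(img, (1, 1))))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

A production version would compare full RGB distances and possibly smooth the frame first, but the stopping criterion, color similarity to the gazed-at pixel, is the same as in the paragraph above.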
Alternatively, the processor 140 can use deep learning techniques to perform image recognition on the partial frame 502 to determine the object of interest. The recognition types include image classification, classification with localization, object detection, and instance segmentation. Image classification assigns the source image to a class, and each image is judged as exactly one class. Classification with localization additionally marks the position and size of a single object. Object detection further marks the positions and sizes of multiple objects. Instance segmentation labels individual instances, where objects of the same class are distinguished by their respective positions and sizes; it is especially suitable when objects overlap.
FIGs. 7A, 7B, 8A, and 8B are schematic diagrams of image recognition performed on a partial frame using deep learning techniques according to the present invention. Referring to FIG. 1 and FIG. 7A together, in one embodiment the processor 140 can determine the object of interest 702 from the partial frame 701 through image classification. Trained on a large set of labeled images, a deep learning model can classify quickly and accurately. Logistic regression can be adopted as the classification model, with an activation function (for example, a sigmoid activation function) used to estimate the probability of each class, and the class with the highest probability is taken. The processor 140 then determines the object of interest 702 from the classification result. Referring to FIG. 1 and FIG. 7B together, in one embodiment the processor 140 can classify the partial frame 701 and perform localization according to the gaze coordinate information, thereby determining the object of interest 702. Object localization requires predicting the four parameters of a bounding box 703, and the target vector includes a multi-dimensional matrix of bounding box information.
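The classification step above, per-class probabilities through a sigmoid followed by taking the most probable class, can be sketched as follows. The class list and logit values are invented for the example; a real model would produce the logits from convolutional features of the partial frame.

```python
import math

CLASSES = ["top", "trousers", "shoes"]  # hypothetical label set

def sigmoid(z):
    """Sigmoid activation, mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(logits):
    """Turn one logit per class into probabilities and pick the winner,
    as in the logistic-regression classifier described above."""
    probs = [sigmoid(z) for z in logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

label, p = classify([2.0, -1.0, 0.3])
print(label)      # 'top'
print(p > 0.85)   # True, since sigmoid(2.0) is about 0.88
```

Because the sigmoid is monotonic, taking the argmax over probabilities is the same as over raw logits; the probabilities are still useful as a recognition score, as the object-detection paragraph below mentions.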
Referring to FIG. 1 and FIG. 8A together, in one embodiment the processor 140 can determine the object of interest from the partial frame 801 through object detection. Object detection involves bounding boxes and object labels: its target vector adds the four predicted bounding box parameters to the one-dimensional vector encoding the class label. The source image only needs pre-processing before being fed to the deep learning model, which produces recognition results including object position, object size, object class, and recognition score. Unlike image classification and classification with localization, object detection applies classification and localization to multiple objects (such as objects 802 to 805) rather than a single one. Referring to FIG. 1 and FIG. 8B together, in one embodiment the processor 140 can determine the object of interest from the partial frame 801 through instance segmentation. The goal of instance segmentation is to cut out the complete outline of an object, not merely a position and a rough extent. Mask R-CNN (mask region-based convolutional neural network), derived for this purpose, has become the basis of such deep learning models; mask_inception and mask_resnet, for example, are common models at present. When instance segmentation yields multiple objects (such as objects 802 to 805), the processor 140 selects one of them as the object of interest. In this embodiment, the processor 140 may consist of, for example, one or more processing or control circuits such as a central processing unit (CPU), a microcontroller unit (MCU), or a field programmable gate array (FPGA), but the invention is not limited thereto.
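When detection or segmentation returns several candidates, a natural tie-breaker, consistent with the gaze-driven design though not spelled out by the patent, is to keep the instance whose box center is closest to the gaze point. The detection records below are invented for the sketch.

```python
def pick_instance(detections, gaze_xy):
    """Choose one detected object: the instance whose bounding-box center
    lies closest to the user's gaze point."""
    gx, gy = gaze_xy

    def dist2(det):
        x, y, w, h = det["box"]          # (x, y, width, height)
        cx, cy = x + w / 2, y + h / 2
        return (cx - gx) ** 2 + (cy - gy) ** 2   # squared distance suffices

    return min(detections, key=dist2)

dets = [
    {"label": "person", "box": (10, 10, 50, 120)},
    {"label": "top",    "box": (200, 80, 60, 70)},
    {"label": "shoes",  "box": (210, 260, 40, 30)},
]
print(pick_instance(dets, (225, 110))["label"])  # 'top'
```

Other selection rules are equally plausible, for example filtering by the object information from the utterance first and only then falling back to gaze distance.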
The present invention can also execute the steps of the second embodiment first and then the steps of the first embodiment; this serves as a third embodiment. For the details of the first and second embodiments, refer to the descriptions of FIG. 3 and FIG. 5 respectively; they are not repeated here. For example, after the steps shown in FIG. 5 have finished, a product page is opened. The user can then follow the steps of the first embodiment shown in FIG. 3 to quickly and accurately perform a task action on the target product (for example, adding it to the shopping cart) through eye tracking and with the assistance of the voice assistant.
To sum up, the present invention replaces the mouse cursor with eye tracking and combines it with a voice assistant function to execute task actions according to the results of semantic analysis. This avoids erroneous clicks by the user, and also reduces or avoids the voice assistant misjudging ambiguous utterances or misunderstanding them. As a result, task actions can be executed quickly and accurately without interrupting the user's browsing, providing both convenient operation and correct task execution.
100: electronic device
110: display panel
120: eye tracking module
130: voice input module
140: processor
401: target
501: target
502: partial frame
503: object of interest
701: partial frame
702: object of interest
703: bounding box
801: partial frame
802~805: objects
D1: direction
S210~S250, S310~S390, S501~S511: steps

FIG. 1 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
FIG. 2 is a flowchart of an operating method of an electronic device according to an embodiment of the present invention.
FIG. 3 is a flowchart of an operating method of an electronic device according to the first embodiment of the present invention.
FIG. 4 is a schematic diagram of a usage scenario of the first embodiment of the present invention.
FIG. 5 is a flowchart of an operating method of an electronic device according to the second embodiment of the present invention.
FIG. 6 is a schematic diagram of a usage scenario of the second embodiment of the present invention.
FIGs. 7A, 7B, 8A, and 8B are schematic diagrams of image recognition performed on a partial frame using deep learning techniques according to the present invention.

S210~S250: steps
Claims (30)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063045795P | 2020-06-29 | 2020-06-29 | |
US63/045,795 | 2020-06-29 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202201179A (en) | 2022-01-01 |
TWI795823B (en) | 2023-03-11 |
Family
ID=80787861
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW110123573A (TWI795823B) | Electronic device and its operation method | 2020-06-29 | 2021-06-28 |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI795823B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201411413A (en) * | 2012-05-09 | 2014-03-16 | Intel Corp | Eye tracking based selective accentuation of portions of a display |
CN106415445A (en) * | 2014-06-06 | 2017-02-15 | 英特尔公司 | Technologies for viewer attention area estimation |
CN109801137A (en) * | 2019-01-18 | 2019-05-24 | 泰康保险集团股份有限公司 | Commercial articles ordering method, product ordering apparatus, storage medium and electronic equipment |
CN110785735A (en) * | 2017-07-11 | 2020-02-11 | 三星电子株式会社 | Apparatus and method for voice command scenario |
- 2021-06-28: Application TW110123573A filed in TW; granted as TWI795823B (active)
Also Published As
Publication number | Publication date |
---|---|
TW202201179A (en) | 2022-01-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10977515B2 (en) | Image retrieving apparatus, image retrieving method, and setting screen used therefor | |
EP3267362B1 (en) | Machine learning image processing | |
Cheng et al. | An image-to-class dynamic time warping approach for both 3D static and trajectory hand gesture recognition | |
Lu et al. | A video-based automated recommender (VAR) system for garments | |
Kong et al. | Interactive phrases: Semantic descriptionsfor human interaction recognition | |
US20190332872A1 (en) | Information push method, information push device and information push system | |
CN109844767A (en) | Visual search based on image analysis and prediction | |
Jaiswal et al. | An intelligent recommendation system using gaze and emotion detection | |
Alam et al. | Unified learning approach for egocentric hand gesture recognition and fingertip detection | |
US10726358B2 (en) | Identification of individuals and/or times using image analysis | |
JP2019133620A (en) | Coordination retrieval method, computer device and computer program that are based on coordination of multiple objects in image | |
CN101751648A (en) | Online try-on method based on webpage application | |
Fragkiadakis et al. | Signing as input for a dictionary query: Matching signs based on joint positions of the dominant hand | |
Mosayyebi et al. | Gender recognition in masked facial images using EfficientNet and transfer learning approach | |
KR20200140588A (en) | System and method for providing image-based service to sell and buy product | |
TWI795823B (en) | Electronic device and its operation method | |
Dewan et al. | A deep learning pipeline for Indian dance style classification | |
Jain et al. | Gestarlite: An on-device pointing finger based gestural interface for smartphones and video see-through head-mounts | |
CN108629824B (en) | Image generation method and device, electronic equipment and computer readable medium | |
Goel | Shopbot: an image based search application for e-commerce domain | |
Gavrilescu | Recognizing human gestures in videos by modeling the mutual context of body position and hands movement | |
Mao et al. | SC-YOLO: Provide Application-Level Recognition and Perception Capabilities for Smart City Industrial Cyber-Physical System | |
Roxo et al. | Is Gender “In-the-Wild” Inference Really a Solved Problem? | |
WO2023041181A1 (en) | Electronic device and method for determining human height using neural networks | |
Ren et al. | Toward three-dimensional human action recognition using a convolutional neural network with correctness-vigilant regularizer |