TW202213040A - Gesture control method based on image and electronic apparatus using the same - Google Patents

Gesture control method based on image and electronic apparatus using the same

Info

Publication number
TW202213040A
Authority
TW
Taiwan
Prior art keywords
image
coordinate
key point
hand
gesture
Prior art date
Application number
TW109131889A
Other languages
Chinese (zh)
Other versions
TWI757871B (en)
Inventor
吳政澤
李安正
洪英士
Original Assignee
宏碁股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 宏碁股份有限公司
Priority to TW109131889A
Application granted
Publication of TWI757871B
Publication of TW202213040A

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

A gesture control method based on image and an electronic apparatus using the same are provided. An image is displayed via the display screen. A hand image of a user hand is captured through the image capturing device. A gesture performed by the user hand in 3D space is detected by using the hand image, and whether the gesture matches a predetermined control gesture is determined by using the hand image. If so, keypoint detection is performed on the hand image to obtain at least one keypoint coordinate of the user hand. The keypoint coordinate is mapped to at least one 2D screen coordinate on the display screen. An operation is performed on an image object in the image according to the 2D screen coordinate.

Description

Image-based gesture control method and electronic device using the same

The present invention relates to an electronic device, and more particularly to an image-based gesture control method and an electronic device using the same.

In a conventional user input interface, an electronic device is usually controlled with buttons, a keyboard, or a mouse. As technology has advanced, each new generation of user interfaces has become more user-friendly and convenient; the touch interface is one successful example, letting users intuitively tap objects on a screen to control the device. With today's touch-enabled products, a user can operate the device with a stylus or a finger, and the device performs various functions in response to the touch operations. However, as electronic products gain more and more functions, touch operation on the screen itself has gradually become insufficient for users' needs. For instance, touch technology only works when the user touches or nearly touches the touch screen, which directly limits the physical distance between the user and the product. On the other hand, when the screen has no touch capability, the user must operate the device through an additional input device, which is usually less intuitive and less convenient. Moreover, different operating scenarios each call for a different, better-suited input method.

In view of this, the present invention provides an image-based gesture control method and an electronic device using the method, which can make the electronic device more intuitive and convenient to use.

An embodiment of the present invention provides an image-based gesture control method suitable for an electronic device that includes an image capture device and a display screen. The method includes the following steps. An image is displayed on the display screen. A hand image of the user's hand is captured through the image capture device. The hand image is used to detect a gesture performed by the user's hand in three-dimensional space and to determine whether the gesture matches a predetermined control gesture. If so, keypoint detection is performed on the hand image to obtain at least one keypoint coordinate of the user's hand. The at least one keypoint coordinate is mapped to at least one two-dimensional screen coordinate on the display screen. An operation is performed on an image object in the image according to the at least one two-dimensional screen coordinate.

An embodiment of the present invention provides an electronic device that includes an image capture device, a storage device, and a processor. The processor is coupled to the image capture device and the storage device and is configured to perform the following steps. An image is displayed on the display screen. A hand image of the user's hand is captured through the image capture device. The hand image is used to detect a gesture performed by the user's hand in three-dimensional space and to determine whether the gesture matches a predetermined control gesture. If so, keypoint detection is performed on the hand image to obtain at least one keypoint coordinate of the user's hand. The at least one keypoint coordinate is mapped to at least one two-dimensional screen coordinate on the display screen. An operation is performed on an image object in the image according to the at least one two-dimensional screen coordinate.

Based on the above, in the embodiments of the present invention, a user can perform an operation on an image object in the image shown on the display screen through a mid-air gesture, giving the user a more intuitive and convenient experience when using image design software.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where the same reference numerals appear in different drawings, they denote the same or similar elements.

FIG. 1 is a functional block diagram of an electronic device according to an embodiment of the present invention. Referring to FIG. 1, the electronic device 10 includes a display screen 110, a storage device 120, an image capture device 130, and a processor 140. The electronic device 10 may be a notebook computer, a desktop computer, a smartphone, a tablet computer, a game console, or another electronic device with a display function; the type of the electronic device 10 is not limited here.

The display screen 110 may be any type of display screen, such as a liquid crystal display (LCD), a light-emitting diode (LED) display, or an organic light-emitting diode (OLED) display; the invention is not limited in this regard.

The storage device 120 stores data such as files, images, instructions, program code, and software components. It may be, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, a hard disk or similar device, an integrated circuit, or a combination thereof.

The image capture device 130 may include an image sensor with a charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) element, used to capture images in front of the display screen 110 so as to detect the position and type of gestures performed in three-dimensional space by a user in front of the display screen 110. For example, the image capture device 130 may be an RGB color camera, but the invention is not limited thereto.

The processor 140 is coupled to the storage device 120, the image capture device 130, and the display screen 110, and controls the overall operation of the electronic device 10. It may be, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), other similar device, or a combination of these devices. The processor 140 can execute program code, software modules, instructions, and the like recorded in the storage device 120 to implement the gesture control method of the embodiments of the present invention.

FIG. 2 is a flowchart of a gesture control method according to an embodiment of the present invention. Referring to FIG. 2, the method of this embodiment is applicable to the electronic device 10 of the above embodiment. The detailed steps of this embodiment are described below with reference to the elements of the electronic device 10.

In step S201, an image is displayed on the display screen 110. In one embodiment, the electronic device 10 may display photos or other image files on the display screen 110 while the user operates image design software or image playback software; the invention does not limit the file format of the image. In step S202, the processor 140 captures a hand image of the user's hand through the image capture device 130.

In step S203, the processor 140 uses the hand image to detect a gesture performed by the user's hand in three-dimensional space and obtains at least one keypoint coordinate of the user's hand. The user can form various gestures by moving the fingers. In one embodiment, using skin-color detection, edge detection, a machine learning model, or other related computer vision techniques, the processor 140 can recognize the type of hand gesture from the hand image. In one embodiment, the processor 140 can determine from the hand image whether the user's gesture is a fist gesture, a single-finger gesture, a two-finger gesture, or another multi-finger gesture. For example, the processor 140 may determine from the hand image whether the gesture is a single-finger gesture with the index finger extended, or a two-finger gesture with both the index finger and the thumb extended. In this embodiment, the processor 140 first determines whether the gesture matches a predetermined control gesture; if so, the processor 140 performs keypoint detection on the hand image to obtain at least one keypoint coordinate of the user's hand.
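As an illustration of this classification step, the sketch below decides between fist, single-finger, and two-finger gestures from a (21, 3) array of hand keypoints such as the model output described in the next paragraph. The landmark indexing and the extended-finger heuristic are assumptions for illustration, not the patent's method.

```python
import numpy as np

# Landmark indices follow the common 21-point hand model
# (0 = wrist, 4 = thumb tip, 8 = index tip, ..., 20 = pinky tip);
# this indexing is an assumption for illustration.
WRIST = 0
FINGER_TIPS = [4, 8, 12, 16, 20]   # thumb, index, middle, ring, pinky
FINGER_PIPS = [3, 6, 10, 14, 18]   # the joint just below each tip

def extended_fingers(kp: np.ndarray) -> list:
    """kp: (21, 3) keypoint array. A finger counts as extended when its
    tip is farther from the wrist than the joint below it (a crude but
    workable heuristic)."""
    wrist = kp[WRIST]
    return [np.linalg.norm(kp[t] - wrist) > np.linalg.norm(kp[p] - wrist)
            for t, p in zip(FINGER_TIPS, FINGER_PIPS)]

def classify_gesture(kp: np.ndarray) -> str:
    thumb, index, middle, ring, pinky = extended_fingers(kp)
    if index and not (middle or ring or pinky):
        # index only -> single-finger; index + thumb -> two-finger
        return "two_finger" if thumb else "single_finger"
    if not any((thumb, index, middle, ring, pinky)):
        return "fist"
    return "other"
```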

In addition, in one embodiment, the processor 140 may perform keypoint detection on the hand image through a machine learning model to detect multiple hand keypoints of the user's hand and obtain their three-dimensional keypoint coordinates. FIG. 3 is a schematic diagram of the keypoint coordinates of a user's hand according to an embodiment of the present invention. As shown in FIG. 3, the processor 140 can infer the 3D coordinates of 21 hand keypoints from a single hand image Img_f through the machine learning model, and can thereby obtain multiple keypoint coordinates, for example the keypoint coordinate KP1 of the index fingertip and the keypoint coordinate KP2 of the thumb tip.
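The patent does not name a specific model. As one hedged example, the publicly available MediaPipe Hands model also infers 21 hand keypoints from a single image and could stand in for the machine learning model described here; note that it outputs normalized image coordinates with a relative depth, not metric 3D positions.

```python
import cv2
import mediapipe as mp
import numpy as np

# One hand, video mode; MediaPipe is a stand-in for the unnamed model.
hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

def detect_keypoints(frame_bgr: np.ndarray):
    """Return a (21, 3) array of hand keypoints, or None if no hand is
    found. x and y are normalized image coordinates in [0, 1]; z is a
    relative depth, not a metric distance."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm])
```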

In step S204, the processor 140 maps the at least one keypoint coordinate to at least one two-dimensional screen coordinate on the display screen 110. To manipulate image objects on the display screen 110, the processor 140 maps the keypoint coordinates of the user's hand to two-dimensional screen coordinates and performs subsequent operations according to those screen coordinates. Specifically, the processor 140 may first project the three-dimensional keypoint coordinates onto a two-dimensional plane to obtain two-dimensional virtual coordinates, and then normalize these virtual coordinates into two-dimensional screen coordinates in the screen coordinate system. In one embodiment, the processor 140 projects the at least one keypoint coordinate onto a virtual plane between the user's hand and the image capture device 130 to obtain at least one two-dimensional virtual coordinate on the virtual plane. Then, according to the resolution of the display screen and a selected screen range, the processor 140 normalizes the at least one two-dimensional virtual coordinate to obtain the at least one two-dimensional screen coordinate on the display screen 110.

In detail, FIG. 4 is a schematic diagram of generating two-dimensional screen coordinates according to an embodiment of the present invention. Referring to FIG. 4, the three-dimensional keypoint coordinate KP1 of the index fingertip is (X, Y, Z). The processor 140 can project the keypoint coordinate KP1 onto the virtual plane 41 between the user's hand and the camera position C1 of the image capture device 130 to obtain the two-dimensional virtual coordinate PV1 on the virtual plane 41, denoted (x, y). The processor 140 can then normalize the two-dimensional virtual coordinate PV1 according to the screen resolution to produce the two-dimensional screen coordinate PS1 in the screen coordinate system, denoted (x_cur, y_cur).

In one embodiment, based on the principle of similar triangles, the processor 140 can convert the three-dimensional keypoint coordinate KP1 into the two-dimensional virtual coordinate PV1 according to a depth ratio. The processor 140 multiplies the first coordinate component of the keypoint coordinate KP1 by the depth ratio f/Z to obtain the first coordinate component of the two-dimensional virtual coordinate PV1, i.e., x = X*f/Z, and multiplies the second coordinate component of KP1 by the same depth ratio to obtain the second coordinate component of PV1, i.e., y = Y*f/Z. The depth ratio f/Z is the ratio of the preset depth f between the virtual plane 41 and the image capture device 130 to the third coordinate component Z of the keypoint coordinate KP1.
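The projection itself is a one-line pinhole relation; a minimal sketch:

```python
def project_to_virtual_plane(X: float, Y: float, Z: float, f: float = 1.0):
    """Project a 3D keypoint (X, Y, Z) onto the virtual plane at preset
    depth f by similar triangles: x = X*f/Z, y = Y*f/Z."""
    return X * f / Z, Y * f / Z
```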

Then, still referring to FIG. 4, the processor 140 can determine the two-dimensional screen coordinate PS1 from the two-dimensional virtual coordinate PV1 according to equations (1) to (4) below. Here, the top-left corner of the selected screen range is (x_min, y_min) and the bottom-right corner is (x_max, y_max). The size and position of the selected screen range can be set according to actual needs; the invention is not limited in this regard. In one embodiment, when the selected screen range is the full screen, (x_min, y_min) is (0, 0), (x_max, y_max) is (S_width − 1, S_height − 1), and the resolution of the display screen 110 is S_width × S_height.

[Equations (1) to (4) appear only as embedded images in the source (Figure 02_image005, 02_image007, 02_image009, and 02_image011) and are not reproduced here; they normalize the two-dimensional virtual coordinate PV1 = (x, y) into the selected screen range to yield the two-dimensional screen coordinate PS1 = (x_cur, y_cur).] Thereby, the processor 140 can convert one or more keypoint coordinates of the user's hand into one or more two-dimensional screen coordinates within the selected range on the display screen 110.
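Since equations (1) to (4) survive only as images, the sketch below shows one plausible linear normalization consistent with the surrounding description; the calibrated extent of the virtual plane (v_range) is an assumption not specified in the source.

```python
def to_screen(x: float, y: float, v_range: tuple, s_range: tuple):
    """Map a virtual-plane coordinate (x, y) into the selected screen
    rectangle. v_range = (vx_min, vy_min, vx_max, vy_max) is an assumed
    calibrated extent of the virtual plane; s_range = (x_min, y_min,
    x_max, y_max) is the selected screen range, e.g.
    (0, 0, S_width - 1, S_height - 1) for the full screen."""
    vx_min, vy_min, vx_max, vy_max = v_range
    x_min, y_min, x_max, y_max = s_range
    u = (x - vx_min) / (vx_max - vx_min)   # normalize to [0, 1]
    v = (y - vy_min) / (vy_max - vy_min)
    x_cur = x_min + u * (x_max - x_min)
    y_cur = y_min + v * (y_max - y_min)
    # Clamp so the cursor never leaves the selected range.
    return (min(max(x_cur, x_min), x_max),
            min(max(y_cur, y_min), y_max))
```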

Finally, in step S205, the processor 140 performs an operation on an image object in the image according to the at least one two-dimensional screen coordinate. Specifically, the processor 140 can extract image objects from the image through various image analysis techniques; an image object may be, for example, a person, an animal or plant, a vehicle, a daily object, or another recognizable object in the image. The operation may include a selection operation, a drag operation, a zoom operation, or another image editing operation applied to the image object; the invention is not limited in this regard. In one embodiment, the processor 140 can identify the image object selected by the user according to the two-dimensional screen coordinates associated with the user's hand. In one embodiment, the processor 140 can drag the image object from a first position to a second position according to those coordinates. In one embodiment, the processor 140 can enlarge or shrink the image object according to those coordinates. In one embodiment, the processor 140 can apply color processing or other retouching to the image object according to those coordinates. In this way, the user can perform all kinds of operations on image objects in a very intuitive manner, which greatly improves the fluency and convenience of operating image design software. The user is also not constrained by the distance limits of touch operation and can operate from a position farther away from the electronic device 10.

FIG. 5 is a schematic diagram of an application scenario of a gesture control method according to an embodiment of the present invention. Referring to FIG. 5, the user U1 can select the image object obj_1 in the image Img_1 through gesture G1. Specifically, by mapping the keypoint coordinate KP1 of the user's hand to the two-dimensional screen coordinate PS1 on the display screen 110, the electronic device 10 can determine that the user U1 has selected the image object obj_1. After the electronic device 10 confirms the selection, the user U1 can drag the image object obj_1 in the image Img_1 to a folder through gesture G2, so that obj_1 is saved to the folder selected by the user. Alternatively, in other embodiments, the user can drag the image object obj_1 onto another image through gesture G2, so that obj_1 is composited onto that image.

To describe the present invention more clearly, a selection operation and a drag operation on an image object are explained below as examples. FIG. 6 is a flowchart of a gesture control method according to an embodiment of the present invention. Referring to FIG. 6, the method of this embodiment is applicable to the electronic device 10 of the above embodiment, and the detailed steps are described below with reference to the elements of the electronic device 10.

In step S601, an image is displayed on the display screen 110. In step S602, the processor 140 performs a semantic segmentation operation on the image to obtain the object boundaries of the image objects in the image. In detail, through semantic segmentation the processor 140 can classify each pixel in the image as belonging to one or more image objects or to the image background. FIG. 7 is a schematic diagram of performing semantic segmentation on an image according to an embodiment of the present invention. Referring to FIG. 7, in one embodiment, the processor 140 may first perform object detection on the image Img_2 to detect the image objects in it. For example, the processor 140 may run object detection through a machine learning model (such as a CNN model) to recognize the image objects in the image Img_2 and their corresponding object types. After object detection, the processor 140 obtains a bounding box and an object type for each image object, such as the bounding boxes B1–B5 shown in FIG. 7. The processor 140 can then apply semantic segmentation to the image regions framed by the bounding boxes B1–B5, classifying each pixel in the image Img_2 as background or as one of the image objects, thereby obtaining the object boundaries M1–M5 corresponding to the respective image objects.
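As a hedged stand-in for this detect-then-segment pipeline, an off-the-shelf instance segmentation model such as torchvision's Mask R-CNN returns both bounding boxes (the object frames B1–B5) and per-object masks (the object boundaries M1–M5) in one pass; the score threshold below is an assumed parameter.

```python
import torch
import torchvision

# Mask R-CNN does detection and instance segmentation in a single pass.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

@torch.no_grad()
def segment_objects(image_chw: torch.Tensor, score_thresh: float = 0.5):
    """image_chw: float tensor of shape (3, H, W) with values in [0, 1].
    Returns bounding boxes (object frames) and boolean masks (object
    boundaries) for detections above the score threshold."""
    out = model([image_chw])[0]
    keep = out["scores"] > score_thresh
    boxes = out["boxes"][keep]               # (N, 4) boxes, like B1-B5
    masks = out["masks"][keep, 0] > 0.5      # (N, H, W) masks, like M1-M5
    return boxes, masks
```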

In step S603, the processor 140 captures a hand image of the user's hand through the image capture device 130. In step S604, the processor 140 uses the hand image to determine whether the gesture matches a predetermined control gesture. In this embodiment, the predetermined control gestures include a specific single-finger gesture and a specific two-finger gesture; when the gesture matches neither, the processor 140 performs no operation on the image objects. Conversely, if the determination in step S604 is affirmative, then in step S605 the processor 140 performs keypoint detection on the hand image to obtain at least one keypoint coordinate of the user's hand, and in step S606 the processor 140 maps the at least one keypoint coordinate to at least one two-dimensional screen coordinate on the display screen 110. For the details of steps S604 and S605, refer to the description of the foregoing embodiments.

When the gesture matches the specific single-finger gesture, in step S607 the processor 140 determines whether the at least one two-dimensional screen coordinate corresponding to the at least one keypoint coordinate lies within an object boundary. If so, in step S608 the processor 140 performs a selection operation on that image object. Otherwise, if the two-dimensional screen coordinate of the keypoint does not lie within any object boundary, the processor 140 can display a cursor on the display screen 110 at that coordinate to prompt the user.

For example, FIG. 8 is a schematic diagram of performing a selection operation on an image object according to an embodiment of the present invention. Referring to FIG. 8, suppose the display screen 110 displays the image Img_3, and semantic segmentation of Img_3 yields the image objects Obj_1–Obj_4. When the keypoint coordinate KP1_1 of the index fingertip is (X_i, Y_i, Z_i), the processor 140 maps it to the two-dimensional screen coordinate PS1_1. The processor 140 determines that PS1_1 does not lie within the object boundary of any of Obj_1–Obj_4, so it controls the display screen 110 to show a cursor at PS1_1. Later, after the user's hand moves to the right, the keypoint coordinate KP1_2 of the index fingertip is (X_f, Y_f, Z_f), and the processor 140 maps it to the two-dimensional screen coordinate PS1_2. The processor 140 determines that PS1_2 lies within the object boundary of the image object Obj_3, so it performs a selection operation on Obj_3 and can then carry out further operations on Obj_3 according to subsequent gestures. Alternatively, in one embodiment, when the user has already chosen a specific image editing function, the processor 140 can directly apply that editing function to Obj_3 in response to the selection. In one embodiment, the processor 140 can control the display screen 110 to show a thick-border effect around Obj_3, enlarge Obj_3, or apply another visual effect to indicate that Obj_3 has been selected.
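The selection test in steps S607 and S608 reduces to a point-in-mask lookup; a minimal sketch, assuming the object boundary masks are at screen resolution:

```python
import numpy as np

def pick_object(screen_xy: tuple, masks: np.ndarray):
    """Return the index of the first object whose boundary mask contains
    the mapped screen coordinate, or None (in which case a cursor is
    shown instead). masks: (N, H, W) boolean array at screen resolution."""
    x, y = int(round(screen_xy[0])), int(round(screen_xy[1]))
    n, h, w = masks.shape
    if not (0 <= x < w and 0 <= y < h):
        return None
    for i in range(n):
        if masks[i, y, x]:          # note row = y, column = x
            return i
    return None
```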

On the other hand, when the gesture matches the specific two-finger gesture, in step S609 the processor 140 determines whether the distance between the first keypoint coordinate and the second keypoint coordinate is less than a threshold. If so, in step S610 the processor 140 starts a drag operation on the image object. In step S611, in response to the distance between the first and second keypoint coordinates becoming greater than another threshold, the processor 140 ends the drag operation on the image object.

FIG. 9 is a schematic diagram of calculating the distance between the first keypoint and the second keypoint according to an embodiment of the present invention. Referring to FIG. 9, when the gesture matches the specific two-finger gesture, the processor 140 can use the keypoint coordinate KP1 of the index fingertip (the first keypoint coordinate) and the keypoint coordinate KP2 of the thumb tip (the second keypoint coordinate) to determine whether the user intends to drag an image object, and to determine the drag path of the drag operation. As shown in FIG. 9, the processor 140 obtains the distance d between KP1 and KP2 by computing the Euclidean distance between the coordinates (X1, Y1, Z1) and (X2, Y2, Z2), as in equation (5):

d = √((X1 − X2)² + (Y1 − Y2)² + (Z1 − Z2)²)    (5)

FIG. 10 is a schematic diagram of performing a drag operation on an image object according to an embodiment of the present invention. Suppose the user has already selected the image object obj_10. Referring to FIG. 10, when the user's index finger and thumb come close enough together, the distance between the keypoint coordinates KP1_1 and KP2_1 becomes less than the threshold. In response, the processor 140 starts a drag operation on the image object obj_10 located in the folder F1. The user can then move the hand without changing the gesture. After dragging obj_10 to the target position (for example, the on-screen position of the folder F2), the user releases by separating the index finger and thumb. When they are separated far enough, the distance between the keypoint coordinates KP1_2 and KP2_2 becomes greater than another threshold, and in response the processor 140 ends the drag operation on obj_10. In one embodiment, in response to this distance exceeding the other threshold, the processor 140 determines the drag end point of the drag operation according to the two-dimensional screen coordinate corresponding to KP1_2 or KP2_2. In this way, the image object obj_10 can be copied or moved to the folder F2.
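A small state machine captures this two-threshold (hysteresis) scheme; the threshold values below are assumptions, and kp1/kp2 are the index-fingertip and thumb-tip keypoint coordinates:

```python
import numpy as np

PINCH_START = 0.05  # assumed start threshold, in keypoint units
PINCH_END = 0.08    # assumed, larger release threshold (hysteresis)

class DragController:
    """Starts a drag when the index fingertip and thumb tip pinch below
    one threshold and ends it only when they separate past a larger one,
    mirroring the two-threshold scheme described above."""
    def __init__(self):
        self.dragging = False

    def update(self, kp1: np.ndarray, kp2: np.ndarray) -> str:
        d = float(np.linalg.norm(kp1 - kp2))   # Euclidean distance, Eq. (5)
        if not self.dragging and d < PINCH_START:
            self.dragging = True
            return "drag_start"
        if self.dragging and d > PINCH_END:
            self.dragging = False
            return "drag_end"
        return "drag_move" if self.dragging else "idle"
```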

In summary, in the embodiments of the present invention, a user can perform an operation on an image object in the image shown on the display screen through a mid-air gesture. The user can operate on image objects in a very intuitive manner, which greatly improves the fluency and convenience of operating image design software. The user is also not constrained by the distance limits of touch operation and can operate from a position farther away from the electronic device.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary skill in the art may make some changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the appended claims.

10: electronic device
110: display screen
120: storage device
130: image capture device
140: processor
Img_f: hand image
KP1, KP2, KP1_1, KP1_2, KP2_1, KP2_2: keypoint coordinates
41: virtual plane
PV1: two-dimensional virtual coordinate
PS1, PS1_1, PS1_2: two-dimensional screen coordinates
C1: camera position
G1, G2: gestures
Img_1, Img_2, Img_3: images
B1–B5: bounding boxes
M1–M5: object boundaries
F1, F2: folders
obj_1–obj_4, obj_10: image objects
S201–S205, S601–S611: steps

FIG. 1 is a functional block diagram of an electronic device according to an embodiment of the present invention.
FIG. 2 is a flowchart of a gesture control method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the keypoint coordinates of a user's hand according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of generating two-dimensional screen coordinates according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of an application scenario of a gesture control method according to an embodiment of the present invention.
FIG. 6 is a flowchart of a gesture control method according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of performing semantic segmentation on an image according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of performing a selection operation on an image object according to an embodiment of the present invention.
FIG. 9 is a schematic diagram of calculating the distance between the first keypoint and the second keypoint according to an embodiment of the present invention.
FIG. 10 is a schematic diagram of performing a drag operation on an image object according to an embodiment of the present invention.

S201–S205: steps

Claims (10)

1. An image-based gesture control method, suitable for an electronic device including an image capture device and a display screen, the method comprising:
displaying an image on the display screen;
capturing a hand image of a user's hand through the image capture device;
detecting, using the hand image, a gesture performed by the user's hand in three-dimensional space, determining, using the hand image, whether the gesture matches a predetermined control gesture, and if so, performing keypoint detection on the hand image to obtain at least one keypoint coordinate of the user's hand;
mapping the at least one keypoint coordinate to at least one two-dimensional screen coordinate on the display screen; and
performing an operation on an image object in the image according to the at least one two-dimensional screen coordinate.

2. The image-based gesture control method of claim 1, wherein mapping the at least one keypoint coordinate to the at least one two-dimensional screen coordinate on the display screen comprises:
projecting the at least one keypoint coordinate onto a virtual plane between the user's hand and the image capture device to obtain at least one two-dimensional virtual coordinate on the virtual plane; and
normalizing the at least one two-dimensional virtual coordinate according to the resolution of the display screen and a selected screen range to obtain the at least one two-dimensional screen coordinate on the display screen.

3. The image-based gesture control method of claim 1, wherein projecting the at least one keypoint coordinate onto the virtual plane between the user's hand and the image capture device to obtain the at least one two-dimensional virtual coordinate on the virtual plane comprises:
multiplying the first coordinate component of the at least one keypoint coordinate by a depth ratio to obtain the first coordinate component of the at least one two-dimensional virtual coordinate; and
multiplying the second coordinate component of the at least one keypoint coordinate by the depth ratio to obtain the second coordinate component of the at least one two-dimensional virtual coordinate, wherein the depth ratio is the ratio of a preset depth between the virtual plane and the image capture device to the third coordinate component of the at least one keypoint coordinate.

4. The image-based gesture control method of claim 1, wherein performing the operation on the image object in the image according to the at least one two-dimensional screen coordinate comprises:
performing a semantic segmentation operation on the image to obtain an object boundary of the image object in the image;
when the gesture matches a specific single-finger gesture, determining whether the at least one two-dimensional screen coordinate corresponding to the at least one keypoint coordinate lies within the object boundary; and
if so, performing a selection operation on the image object.

5. The image-based gesture control method of claim 1, wherein the at least one keypoint coordinate includes a first keypoint coordinate and a second keypoint coordinate, and performing the operation on the image object in the image according to the at least one two-dimensional screen coordinate comprises:
performing a semantic segmentation operation on the image to obtain an object boundary of the image object in the image;
when the gesture matches a specific two-finger gesture, determining whether the distance between the first keypoint coordinate and the second keypoint coordinate is less than a threshold;
if so, starting a drag operation on the image object; and
in response to the distance between the first keypoint coordinate and the second keypoint coordinate being greater than another threshold, ending the drag operation.

6. An electronic device, comprising:
an image capture device;
a display screen;
a storage device recording a plurality of instructions; and
a processor, coupled to the display screen, the image capture device, and the storage device, configured to:
display an image on the display screen;
capture a hand image of a user's hand through the image capture device;
detect, using the hand image, a gesture performed by the user's hand in three-dimensional space, determine, using the hand image, whether the gesture matches a predetermined control gesture, and if so, perform keypoint detection on the hand image to obtain at least one keypoint coordinate of the user's hand;
map the at least one keypoint coordinate to at least one two-dimensional screen coordinate on the display screen; and
perform an operation on an image object in the image according to the at least one two-dimensional screen coordinate.

7. The electronic device of claim 6, wherein the processor is further configured to:
project the at least one keypoint coordinate onto a virtual plane between the user's hand and the image capture device to obtain at least one two-dimensional virtual coordinate on the virtual plane; and
normalize the at least one two-dimensional virtual coordinate according to the resolution of the display screen and a selected screen range to obtain the at least one two-dimensional screen coordinate on the display screen.

8. The electronic device of claim 6, wherein the processor is further configured to:
multiply the first coordinate component of the at least one keypoint coordinate by a depth ratio to obtain the first coordinate component of the at least one two-dimensional virtual coordinate; and
multiply the second coordinate component of the at least one keypoint coordinate by the depth ratio to obtain the second coordinate component of the at least one two-dimensional virtual coordinate, wherein the depth ratio is the ratio of a preset depth between the virtual plane and the image capture device to the third coordinate component of the at least one keypoint coordinate.

9. The electronic device of claim 6, wherein the processor is further configured to:
perform a semantic segmentation operation on the image to obtain an object boundary of the image object in the image;
when the gesture matches a single-finger gesture, determine whether the at least one two-dimensional screen coordinate corresponding to the at least one keypoint coordinate lies within the object boundary; and
if so, perform a selection operation on the image object.

10. The electronic device of claim 6, wherein the at least one keypoint coordinate includes a first keypoint coordinate and a second keypoint coordinate, and the processor is further configured to:
perform a semantic segmentation operation on the image to obtain an object boundary of the image object in the image;
when the gesture matches a two-finger gesture, determine whether the distance between the first keypoint coordinate and the second keypoint coordinate is less than a threshold;
if so, perform a drag operation on the image object; and
in response to the distance between the first keypoint coordinate and the second keypoint coordinate being greater than another threshold, end the drag operation.
TW109131889A 2020-09-16 2020-09-16 Gesture control method based on image and electronic apparatus using the same TWI757871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109131889A TWI757871B (en) 2020-09-16 2020-09-16 Gesture control method based on image and electronic apparatus using the same


Publications (2)

Publication Number Publication Date
TWI757871B TWI757871B (en) 2022-03-11
TW202213040A 2022-04-01

Family

ID=81710593

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109131889A TWI757871B (en) 2020-09-16 2020-09-16 Gesture control method based on image and electronic apparatus using the same

Country Status (1)

Country Link
TW (1) TWI757871B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8994718B2 (en) * 2010-12-21 2015-03-31 Microsoft Technology Licensing, Llc Skeletal control of three-dimensional virtual world
TWI540461B (en) * 2011-12-05 2016-07-01 緯創資通股份有限公司 Gesture input method and system
TW201545051A (en) * 2014-05-30 2015-12-01 Eminent Electronic Technology Corp Ltd Control method of electronic apparatus
TW201610750A (en) * 2014-09-03 2016-03-16 Liquid3D Solutions Ltd Gesture control system interactive with 3D images
TWM529213U (en) * 2015-08-11 2016-09-21 國立勤益科技大學 Virtual input controller

Also Published As

Publication number Publication date
TWI757871B (en) 2022-03-11
