TWI603270B - Method and apparatus for detecting a person using a handheld device

Info

Publication number
TWI603270B
Authority
TW
Taiwan
Prior art keywords
mouth
image
area
ear
person
Prior art date
Application number
TW103143287A
Other languages
Chinese (zh)
Other versions
TW201621758A (en)
Inventor
林伯聰
許佳微
Original Assignee
由田新技股份有限公司
Priority date
Filing date
Publication date
Application filed by 由田新技股份有限公司
Priority to TW103143287A
Priority to CN201510054967.2A
Publication of TW201621758A
Application granted
Publication of TWI603270B

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Description

Method and device for detecting a person using a handheld device

The present invention relates to image recognition technology, and more particularly to a method and an apparatus that use image recognition to detect whether a person is using a handheld device.

With the rapid development of mobile communication and related technologies, people can use handheld devices such as feature phones and smart phones to make calls, send messages, and even browse the Internet. In addition, advances in semiconductor processes, materials, and mechanical design have made handheld devices thin and light enough for convenient handheld operation. The convenience of handheld devices has gradually made them inseparable from daily life.

However, people often want to handle other matters while operating a handheld device, and splitting attention across several tasks at once makes accidents very likely. For example, using a handheld device while driving a vehicle can lead to traffic accidents through distraction. How to monitor, effectively and in real time, driving behavior or other situations in which using a handheld device is inappropriate, so as to prevent accidents, is therefore a pressing problem in this field.

The invention provides a method and a device for detecting a person using a handheld device. Through image recognition, they determine whether a specific gesture appears beside the person's ear and how the person's mouth is moving, so as to judge accurately and quickly whether the person is using a handheld device.

The invention provides a method for detecting a person using a handheld device, suitable for an electronic device and comprising the following steps. An image of the person is captured. The image is analyzed to obtain a face object. An ear-side search region and a mouth comparison region are determined from the face object. Whether the person is using a handheld device is judged by searching the ear-side search region for a specific gesture while comparing, in the mouth comparison region, whether the detected mouth motion information matches preset mouth information.

In an embodiment of the invention, judging whether the person is using a handheld device by searching the ear-side search region for a specific gesture while comparing the detected mouth motion information against the preset mouth information in the mouth comparison region includes the following steps. Whether a hand object of the person appears in the ear-side search region is detected. Finger feature points of the hand object are obtained, where the finger feature points include a foreground edge and gradient values. The finger feature points are compared to determine whether the specific gesture exists.

In an embodiment of the invention, after judging whether the person is using a handheld device by searching the ear-side search region for a specific gesture while comparing the detected mouth motion information against the preset mouth information in the mouth comparison region, the method further includes the following step. Within a preset time, whether the ratio of the number of handheld events, in which the person is judged to be using a handheld device, to the number of face events, in which a face object is detected, exceeds a ratio threshold is judged, so as to generate a prompt signal.

In an embodiment of the invention, the ear-side search region includes a first ear region and a second ear region, and determining the ear-side search region and the mouth comparison region from the face object includes the following steps. Shoulder features are detected in the image with reference to the face object to obtain a shoulder object. The ear-side search region is determined from the shoulder object and the face object. Interference features in the ear-side search region are filtered out to detect the first ear region and the second ear region.

In an embodiment of the invention, determining the ear-side search region and the mouth comparison region from the face object includes the following steps. Nostril anchor points are identified in the face object. A mouth region is set based on the nostril anchor points. Image processing is performed on the image of the mouth region to identify the person's mouth object. The mouth comparison region is determined within the mouth region according to the mouth object.

In an embodiment of the invention, judging whether the person is using a handheld device by searching the ear-side search region for a specific gesture while comparing, in the mouth comparison region, whether the mouth movements match the preset mouth information includes the following steps. A mouth image is obtained, and mouth features are extracted from it. From the mouth features, the mouth image is classified as an open-mouth image or a closed-mouth image. During a mouth recording time, all closed-mouth and open-mouth images detected in the mouth comparison region are recorded in order and converted into a code sequence. The code sequence is stored as the mouth motion information.

In an embodiment of the invention, judging whether the person is using a handheld device by searching the ear-side search region for a specific gesture while comparing, in the mouth comparison region, whether the mouth movements match the preset mouth information includes the following steps. During a mouth comparison time, the image of the mouth comparison region is compared with template images to generate a mouth-shape code. The mouth-shape code is stored in a code sequence. The code sequence is stored as the mouth motion information.

The invention further provides a device for detecting a person using a handheld device. The device includes an image capturing unit, a storage unit, and a processing unit. The image capturing unit captures an image of the person. The storage unit stores the image and preset mouth information. The processing unit is coupled to the storage unit to obtain the image. The processing unit analyzes the image to obtain a face object, and determines an ear-side search region and a mouth comparison region from the face object. The processing unit then judges whether the person is using a handheld device by searching the ear-side search region for a specific gesture while comparing, in the mouth comparison region, whether the detected mouth motion information matches the preset mouth information.

In an embodiment of the invention, the processing unit detects whether a hand object of the person appears in the ear-side search region and obtains finger feature points of the hand object, where the finger feature points include a foreground edge and gradient values, and the processing unit compares the finger feature points to determine whether the specific gesture exists.

In an embodiment of the invention, within a preset time, the processing unit judges whether the ratio of the number of handheld events, in which the person is using a handheld device, to the number of face events, in which a face object is detected, exceeds a ratio threshold.

In an embodiment of the invention, the ear-side search region includes a first ear region and a second ear region. The processing unit detects shoulder features in the image with reference to the face object to obtain a shoulder object, determines the ear-side search region from the shoulder object and the face object, and filters out interference features in the ear-side search region to detect the first ear region and the second ear region.

In an embodiment of the invention, the processing unit identifies nostril anchor points in the face object, sets a mouth search region based on the nostril anchor points, performs image processing on the image of the mouth search region to identify the person's mouth object, and determines the mouth comparison region from the mouth object.

In an embodiment of the invention, the processing unit obtains a mouth image through the image capturing unit and extracts mouth features from it. From the mouth features, the processing unit classifies the mouth image as an open-mouth image or a closed-mouth image. During the mouth recording time, the processing unit records in order all closed-mouth and open-mouth images detected in the mouth comparison region, converts them into a code sequence, and stores the code sequence as the mouth motion information.

In an embodiment of the invention, during the mouth comparison time, the processing unit compares the image of the mouth comparison region with template images to generate a mouth-shape code, stores the mouth-shape code in a code sequence, and stores the code sequence as the mouth motion information.

Based on the above, embodiments of the invention can use image recognition to monitor whether a specific gesture appears in the person's ear-side search region while simultaneously comparing whether the person's mouth movements match preset mouth information, so as to judge whether the person is using a handheld device. In this way, whether a person is using a handheld device can be determined effectively and accurately.

To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

100‧‧‧device

110‧‧‧image capturing unit

130‧‧‧storage unit

150‧‧‧processing unit

170‧‧‧image processing module

190‧‧‧warning unit

210‧‧‧face detection module

230‧‧‧ear-side region definition module

240‧‧‧mouth region definition module

250‧‧‧gesture determination module

270‧‧‧mouth comparison module

280‧‧‧event count judgment module

290‧‧‧warning module

S310~S370, S410~S490, S510~S590, S610~S670, S710~S790‧‧‧steps

400, 500‧‧‧images

410, 510‧‧‧face objects

440, 450, 530, 540‧‧‧ear-side search regions

520‧‧‧face region

w1, w2, w3, w4‧‧‧widths

h1, h2, h3, h4‧‧‧heights

FIG. 1 is a block diagram of a device for detecting a person using a handheld device according to an embodiment of the invention.

FIG. 2 is a block diagram of an image processing module according to an embodiment of the invention.

FIG. 3 is a flow chart of a method for detecting a person using a handheld device according to an embodiment of the invention.

FIG. 4 is a schematic diagram of a person image according to an embodiment of the invention.

FIG. 5 is a schematic diagram of a person image according to another embodiment of the invention.

FIG. 6 is a flow chart of an example of determining the mouth comparison region according to an embodiment of the invention.

FIG. 7 is a flow chart of an example of recording mouth motion information according to an embodiment of the invention.

FIG. 8 is a flow chart of an example of recording mouth motion information according to another embodiment of the invention.

FIG. 9 is a diagram of an example of detecting a person using a handheld device according to an embodiment of the invention.

When answering a call on a mobile phone, a person usually brings the handheld device up beside the face so that the earpiece faces the ear and the microphone sits close to the mouth. In addition, because usage habits differ, the gesture with which people hold a handheld device may also differ. Accordingly, embodiments of the invention monitor a person through captured images and use image recognition to determine whether a specific gesture appears near the person's ear, so as to judge whether the person is holding a mobile phone. At the same time, embodiments of the invention also judge whether the person's mouth movements match preset mouth information, so as to judge whether the person is speaking. In this way, whether the person is using a handheld device can be determined effectively and accurately. Several embodiments consistent with the spirit of the invention are presented below; those applying these embodiments may adapt them to their needs and are not limited to the following description.

FIG. 1 is a block diagram of a device for detecting a person using a handheld device according to an embodiment of the invention. Referring to FIG. 1, the device 100 includes an image capturing unit 110, a storage unit 130, a processing unit 150, an image processing module 170, and a warning unit 190. In one embodiment, the device 100 is installed in a vehicle, for example, to monitor the driver. In other embodiments, the device 100 may also be used in automated transaction machines such as an automated teller machine (ATM), to judge, for example, whether an operator is making a transfer while answering a handheld device. It should be noted that those applying the embodiments of the invention may install the device 100 in any electronic device or location where monitoring of handheld-device use is needed; the embodiments of the invention impose no limitation.

The image capturing unit 110 may be a camera with a charge coupled device (CCD) lens, a complementary metal oxide semiconductor (CMOS) lens, or an infrared lens. The image capturing unit 110 captures images of the person and stores them in the storage unit 130.

The storage unit 130 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), a similar element, or a combination of the above elements.

The processing unit 150 is coupled to the image capturing unit 110 and the storage unit 130. The processing unit 150 may be a central processing unit (CPU), or a chipset, microprocessor, or micro control unit (MCU) with computing capability. In the embodiments of the invention, the processing unit 150 handles all operations of the device 100. The processing unit 150 can obtain images through the image capturing unit 110, store the images in the storage unit 130, and run image processing procedures on the images.

The image processing module 170 is, for example, code segments written in a computer programming language; the code segments may be stored in the storage unit 130 (or another storage unit), include multiple instructions, and are executed by the processing unit 150. In other embodiments, the image processing module 170 may instead be a hardware component composed of one or more circuits; no limitation is imposed here.

FIG. 2 is a block diagram of the image processing module 170 according to an embodiment of the invention. Referring to FIG. 2, the image processing module 170 includes a face detection module 210, an ear-side region definition module 230, a mouth region definition module 240, a gesture determination module 250, a mouth comparison module 270, an event count judgment module 280, and a warning module 290.

For example, after the image capturing unit 110 captures an image of the person, the face detection module 210 of the image processing module 170 detects facial features in the image to obtain a face object. The ear-side region definition module 230 and the mouth region definition module 240 then determine the ear-side search region and the mouth comparison region, respectively, from the face object. Afterwards, the gesture determination module 250 searches the ear-side search region for a specific gesture, and the mouth comparison module 270 compares the mouth comparison region against the preset mouth information, to judge whether the person is using a handheld device. The event count judgment module 280 judges whether the ratio of the number of handheld events, in which the person is judged to be using a handheld device, to the number of face events, in which a face object is detected, exceeds a ratio threshold. Finally, the warning module 290 alerts the person according to the judgment result. The detailed operation of each module is further described in the embodiments below.

The warning unit 190 is, for example, one or a combination of the following: a display unit such as a liquid crystal display (LCD) or an organic electro-luminescent display (OELD); a light module with at least one light emitting diode (LED); a vibration module with a vibration motor; and a speaker module with a mono or stereo speaker.

It should be noted that, in other embodiments, the image capturing unit 110 further has an illumination element for supplementary lighting when the ambient light is insufficient, to ensure the sharpness of the captured images.

To aid understanding of the invention, the following scenario illustrates how it may be applied. Suppose the device 100 of an embodiment of the invention is installed in a car and a driver sits in the driver's seat (for convenience, this driver is referred to below as the "person"). The image capturing unit 110 of the device 100 can photograph the person, and the captured images may include the person's face, shoulders, or even upper body. The handheld device may be placed anywhere, for example near the gear lever or above the dashboard. Detailed explanations based on this scenario follow in the embodiments below.

FIG. 3 is a flow chart of a method for detecting a person using a handheld device (for example, a feature phone, a smart phone, or another type of mobile phone) according to an embodiment of the invention. Referring to FIG. 3, the method of this embodiment is applicable to the device 100 of FIG. 1 and the image processing module 170 of FIG. 2. The method of the embodiments of the invention is described below in conjunction with the elements and modules of the device 100 and the image processing module 170. Each step of the method may be adjusted according to the implementation and is not limited to what follows.

In step S310, the processing unit 150 captures an image of the person through the image capturing unit 110. For example, the image capturing unit 110 may be set to a capture rate of 30 or 45 frames per second, or may capture only a single image, to photograph the person, and the captured images are stored in the storage unit 130.

In other embodiments, the processing unit 150 may also set a start condition in advance. When the start condition is met, the processing unit 150 enables the image capturing unit 110 to capture images of the person. For example, a sensor (for example, an infrared sensor) may be placed near the image capturing unit 110, and the device 100 uses the infrared sensor to detect whether a person is within the capture range of the image capturing unit 110. If the infrared sensor detects a person in front of the image capturing unit 110 (that is, the start condition is met), the processing unit 150 enables the image capturing unit 110 to start capturing images. Alternatively, a start button may be provided on the device 100, and the processing unit 150 activates the image capturing unit 110 only when the button is pressed. The above are merely examples; the invention is not limited thereto.

In addition, the image processing module 170 may perform background filtering on the captured image sequence, for example by differencing the I-th image with the (I+1)-th image, where I is a positive integer. The image processing module 170 may then convert the background-filtered image to a grayscale image for the subsequent steps.
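
As an illustration only (not part of the patent text), the following minimal sketch performs this frame-differencing step with OpenCV; the function name and the choice of OpenCV calls are assumptions.

```python
import cv2

def filter_background(frame_i, frame_i_plus_1):
    """Difference the I-th and (I+1)-th frames, then convert to grayscale."""
    diff = cv2.absdiff(frame_i, frame_i_plus_1)    # per-pixel difference of BGR frames
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)  # grayscale image for later steps
    return gray
```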

Next, the processing unit 150 runs the image processing procedure on each image of the sequence through the image processing module 170. In step S330, the processing unit 150 analyzes the image through the face detection module 210 to obtain a face object. Specifically, the processing unit 150 analyzes one or more images to obtain facial features (for example, eyes, nose, lips) and then matches those features to find the face object in the image. For example, the storage unit 130 stores a feature database that includes facial feature patterns, and the face detection module 210 obtains the face object by comparison against the patterns in the feature database. For face detection, the embodiments of the invention may use the AdaBoost algorithm or other face detection algorithms (for example, principal component analysis (PCA), independent component analysis (ICA), or face detection based on Haar-like features) to obtain the face object in each image.
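
As one concrete possibility among the algorithms named above, the sketch below runs OpenCV's stock Haar-cascade frontal-face detector. The cascade file ships with the opencv-python distribution, and the parameter values are illustrative assumptions rather than values prescribed by the patent.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_objects(gray_image):
    """Return (x, y, w, h) bounding boxes of face objects in a grayscale image."""
    return cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
```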

In other embodiments, before detecting facial features, the face detection module 210 may first perform background filtering. For example, the processing unit 150 may capture in advance, through the image capturing unit 110, at least one background image containing no person; after an image of the person is obtained, the face detection module 210 can subtract the background image from the image containing the person, thereby filtering out the background. The face detection module 210 may then convert the background-filtered image to a grayscale image and further to a binarized image, in which the facial features are then detected.

In step S350, the processing unit 150, based on the face object, determines at least one ear-side search region and a mouth comparison region through the ear-side region definition module 230 and the mouth region definition module 240, respectively.

In one embodiment, the ear-side search region includes a first ear region and a second ear region. The ear-side region definition module 230 detects shoulder features in the image with reference to the face object to obtain a shoulder object, determines the ear-side search region from the shoulder object and the face object, and filters out interference features in the ear-side search region to detect the first ear region and the second ear region. In this embodiment, the interference features are, for example, at least one of eyeglass interference features and hair interference features.

Specifically, within the region to be detected, the ear-side region definition module 230 can detect whether two horizontally symmetric edge features exist below the face object; when symmetric edge features exist on both sides of the face object, they are treated as shoulders, yielding the shoulder object.

For example, FIG. 4 is a schematic diagram of a person image according to an embodiment of the invention. In this embodiment, it is assumed that the storage unit 130 of the device 100 for detecting a person using a handheld device holds a feature database, which includes samples such as facial feature patterns, shoulder feature patterns, and interference feature patterns. The ear-side region definition module 230 obtains the face object and the shoulder objects by comparing the captured image with the patterns in the feature database.

Referring to FIG. 4, the ear-side region definition module 230 obtains the face object 410 in the image 400 and, from the position of the face object 410, searches downward to obtain the shoulder objects 420 and 430. The ear-side region definition module 230 can then derive the ear-side search regions 440 and 450 from the face object and the shoulder objects. For example, search region 440 takes as its height h1 the distance between the highest point of the face object 410 on the vertical axis and the lowest point of the shoulder object 420 on the vertical axis, and as its width w1 the distance between a reference point of the face object 410 on the horizontal axis (for example, at 1/3 of the face width) and the edge of the shoulder object 420. Similarly, search region 450 takes as its height h2 the distance between the highest point of the face object 410 and the lowest point of the shoulder object 430 on the vertical axis, and as its width w2 the distance between another reference point on the horizontal axis (for example, at 2/3 of the face width) and the edge of the shoulder object 430. Heights h1 and h2, and widths w1 and w2, may be equal or different, depending on the positions of the person's shoulders.
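
A sketch of this geometry, assuming the face and shoulder objects are available as (x, y, width, height) boxes with the y-axis pointing downward; the 1/3 and 2/3 reference points follow the example above, and all names are illustrative.

```python
def ear_search_regions(face, shoulder_l, shoulder_r):
    """Derive regions 440 and 450 of FIG. 4 from face and shoulder boxes."""
    fx, fy, fw, fh = face
    lx, ly, lw, lh = shoulder_l
    rx, ry, rw, rh = shoulder_r
    # Region 440: from the face top down to the left shoulder's lowest point,
    # and from the left shoulder's outer edge across to 1/3 of the face width.
    h1 = (ly + lh) - fy
    w1 = (fx + fw // 3) - lx
    region_440 = (lx, fy, w1, h1)
    # Region 450: mirrored on the other side, using 2/3 of the face width.
    h2 = (ry + rh) - fy
    w2 = (rx + rw) - (fx + 2 * fw // 3)
    region_450 = (fx + 2 * fw // 3, fy, w2, h2)
    return region_440, region_450
```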

Alternatively, the ear-side region definition module 230 may detect whether the person is wearing clothes below the face object and determine the shoulder object from the edge features of the clothes, for example by taking as the shoulder object the horizontally symmetric edge features at the highest points where the leftmost and rightmost ends of the clothes on the horizontal axis extend upward along the vertical axis.

In another embodiment, the ear-side region definition module 230 may determine the ear-side regions from the length or width of the face object alone. For example, FIG. 5 is a schematic diagram of a person image according to another embodiment of the invention. Referring to FIG. 5, the ear-side region definition module 230 obtains the face object 510 in the image 500 and determines a face region 520 from the width between the rightmost and leftmost ends of the face object 510 on the horizontal axis (or the length between its highest and lowest points on the vertical axis). The face region 520 is, for example, a square region of side w3. For the ear-side search region 530, for example: its highest point on the vertical axis lies a height h3 (for example, one quarter of w3) below the highest point of the face region 520; its length on the vertical axis is a height h4 (for example, equal to w3); its rightmost end on the horizontal axis is the leftmost end of the face region 520; and its leftmost end lies a width w4 (for example, 0.7 times w3) horizontally to the left of the leftmost end of the face region 520. The ear-side search region 540 is symmetric to the ear-side search region 530, and its boundaries can be determined by analogy with the ear-side search region 530, so the details are not repeated here. It should be noted that the length ratios among widths w3 and w4 and heights h3 and h4 may be changed according to design requirements; the embodiments of the invention are not limited thereto.
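
The FIG. 5 construction depends only on the face region; below is a sketch under the example ratios quoted above (h3 = w3/4, h4 = w3, w4 = 0.7·w3), with all names illustrative.

```python
def left_ear_search_region(face_left, face_top, w3):
    """Derive region 530 of FIG. 5 from the square face region 520 of side w3."""
    h3 = w3 // 4          # top offset below the face region's highest point
    h4 = w3               # vertical extent of the search region
    w4 = int(0.7 * w3)    # horizontal extent, extending left of the face region
    return (face_left - w4, face_top + h3, w4, h4)  # (x, y, width, height)
```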

In some other embodiments, the ear-side region definition module 230 may also determine only a single ear-side search region: for example, taking as its height the distance between the highest point of the face object 410 of FIG. 4 on the vertical axis and the lowest points of the shoulder objects 420 and 430 on the vertical axis, and as its width the width of the shoulder objects 420 and 430 on the horizontal axis (the shoulder width).

On the other hand, besides determining the ear-side search region from the face object, in another embodiment the mouth region definition module 240 identifies nostril anchor points from the nostril object, sets a mouth region based on the nostril anchor points, performs image processing on the image of the mouth region to identify the person's mouth object, and determines the mouth comparison region within the mouth region according to the mouth object.

For example, FIG. 6 is a flow chart of an example of determining the mouth comparison region according to an embodiment of the invention. The mouth region definition module 240 can set the mouth region based on the nostril position information and, exploiting differences such as the different shades of lip color versus skin and tooth color, obtain an enhanced image by adjusting the contrast within the mouth region (step S610). It further de-noises the enhanced image, for example filtering out noise through a pixel matrix, so that the mouth region definition module 240 obtains a de-noised image that is clearer than the enhanced image (step S630). Next, the mouth region definition module 240 performs edge sharpening according to the degree of contrast between one color and another in the image, to determine the edges in the de-noised image and thereby obtain a sharpened image (step S650). Since the complexity of an image determines the memory it occupies, to improve comparison performance the mouth region definition module 240 further binarizes the sharpened image: for example, it first sets a threshold and divides the pixels of the image into two values, above or below the threshold, obtaining a binarized image (step S670). Finally, the mouth region definition module 240 performs edge sharpening on the binarized image again. At this point the person's lips are quite distinct in the binarized image, and the mouth region definition module 240 can extract the mouth comparison region from the mouth region (step S690).
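
The patent names the steps of FIG. 6 but not the concrete operators. The sketch below stands in common OpenCV operations for each step (histogram equalization for contrast, a median filter for de-noising, a sharpening kernel, and a fixed threshold for binarization); all of these choices, and the threshold value, are assumptions.

```python
import cv2
import numpy as np

def mouth_region_binary(mouth_gray, threshold=90):
    enhanced = cv2.equalizeHist(mouth_gray)        # S610: contrast enhancement
    denoised = cv2.medianBlur(enhanced, 3)         # S630: de-noised image
    sharpen_kernel = np.array([[0, -1, 0],
                               [-1, 5, -1],
                               [0, -1, 0]], dtype=np.float32)
    sharpened = cv2.filter2D(denoised, -1, sharpen_kernel)  # S650: sharpened image
    _, binary = cv2.threshold(sharpened, threshold, 255,
                              cv2.THRESH_BINARY)   # S670: binarized image
    return binary
```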

It should be noted that those applying the embodiments of the invention may decide the ear-side regions and the mouth comparison region according to design requirements, for example adjusting for different people's facial characteristics (such as face width, ear size, or lip width); the invention is not limited in this respect.

In step S370, the processing unit 150 searches the ear-side search region for a specific gesture through the gesture determination module 250 while the mouth comparison module 270 compares, in the mouth comparison region, whether the detected mouth motion information matches the preset mouth information, so as to judge whether the person is using a handheld device.

In one embodiment, the gesture determination module 250 detects whether a hand object of the person appears in the ear-side search region and obtains finger feature points of the hand object, where the finger feature points include a foreground edge and gradient values. The gesture determination module 250 then compares the finger feature points to determine whether the specific gesture exists.

For example, the gesture determination module 250 applies an image subtraction algorithm to the respective ear-side search regions (for example, the region of interest R of FIG. 4) of the current image and a reference image (which may be a previous image, such as the immediately preceding image or the image N frames earlier, or any preset image) to obtain a foreground image. Using the ear-side search region of the reference image, the gesture determination module 250 filters out noise from the foreground image to obtain a foreground edge; alternatively, it may compute the foreground edge with an edge detection algorithm. The gesture determination module 250 can thus judge from the foreground image or the foreground edge whether a hand object appears in the ear-side search region, and it uses finger features (for example, nails, finger joints, finger spacing) to determine the foreground edge of the hand object. The gesture determination module 250 may also use features such as the angle of the wrist or the area of the gesture block in judging the foreground edge.

In addition, the gesture determination module 250 computes a grayscale-value function of the image over the ear-side search region and calculates the gradient values of the hand object from the grayscale changes (for example, step edges or line edges).
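
A sketch combining the two measurements above, the foreground edge from frame differencing and the gradient values from the grayscale region of interest; Canny and Sobel are used here as stand-ins, since the patent does not name specific operators.

```python
import cv2

def hand_feature_points(current_roi_gray, reference_roi_gray):
    """Return (foreground edges, gradient magnitudes) for an ear-side search region."""
    foreground = cv2.absdiff(current_roi_gray, reference_roi_gray)  # foreground image
    foreground = cv2.GaussianBlur(foreground, (5, 5), 0)            # filter out noise
    edges = cv2.Canny(foreground, 50, 150)                          # foreground edge
    gx = cv2.Sobel(current_roi_gray, cv2.CV_32F, 1, 0)              # horizontal change
    gy = cv2.Sobel(current_roi_gray, cv2.CV_32F, 0, 1)              # vertical change
    gradient = cv2.magnitude(gx, gy)                                # gradient values
    return edges, gradient
```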

In some embodiments, the storage unit 130 may store skin color samples of hand objects in advance, and the gesture determination module 250 may further identify the hand object by color. For example, the gesture determination module 250 may determine the skin color value of the hand object from the face object or the skin color samples and search whether the same or similar color values appear in areas of the image other than the face object, so as to identify the hand object. Alternatively, the gesture determination module 250 may determine the edge of the hand object from variations in skin tone within the image.

In this embodiment, the storage unit 130 may also store several gesture patterns corresponding to the specific gesture, each with its own preset foreground edge and preset gradient values. The gesture determination module 250 compares the preset foreground edge and preset gradient values with the foreground edge and gradient values determined by the image recognition above; if they match, it judges that the hand object in the ear-side search region matches the specific gesture. Then, if the features of the specific gesture are detected in the ear-side search region of each image in the sequence captured within a predetermined time (for example, 1 or 2 seconds), it can be confirmed that the person is holding a handheld device.

It should be noted that, in some embodiments, the gesture determination module 250 may decide whether the person is holding a handheld device from the specific gestures detected in the two ear-side regions (for example, the first ear region and the second ear region). For example, when the gesture determination module 250 finds the specific gesture in one of the first ear region and the second ear region, it judges that the person is holding a handheld device; when it finds the specific gesture in the first ear region and the second ear region simultaneously, it likewise judges that the person is holding a handheld device; and when the specific gesture exists in neither the first ear region nor the second ear region, the gesture determination module 250 judges that the person is not holding a handheld device.

While the gesture determination module 250 is searching for the specific gesture, the mouth comparison module 270 may, at the same time or at a specific interval (for example, 1 or 2 seconds), compare whether the mouth motion information detected in the mouth comparison region matches the preset mouth information. The preset mouth information is, for example, a code sequence, an image variation rate, or a pixel variation rate. Within a comparison time (for example, 2 or 5 seconds), the mouth comparison module 270 compares whether the mouth motion information obtained from the image sequence matches the preset code sequence, image variation rate, or pixel variation rate. If the mouth comparison module 270 judges that the mouth motion information matches the preset mouth information, the person is judged to be using a handheld device; otherwise, the person is judged not to be using one.

It should be noted that, before comparing whether the mouth motion information detected in the mouth comparison region matches the preset mouth information, the mouth comparison module 270 first records the mouth motion information for the subsequent comparison, as described in the embodiments below.

In one embodiment, the mouth comparison module 270 obtains a mouth image, extracts mouth features from it, and from the mouth features classifies the mouth image as an open-mouth image or a closed-mouth image. During the mouth recording time, the mouth comparison module 270 records in order all closed-mouth and open-mouth images detected in the mouth comparison region, converts them into a code sequence, and stores the code sequence as the mouth motion information.

For example, FIG. 7 is a flow chart of an example of recording mouth motion information according to an embodiment of the invention. Referring to FIG. 7, the mouth comparison module 270 extracts several mouth features from the mouth comparison region (step S710); for example, the mouth features include an upper lip part and a lower lip part. Specifically, the mouth comparison module 270 can extract the mouth features by finding the left and right boundaries of the mouth region to define the left and right mouth corners; likewise, it finds the contour lines on the upper and lower sides of the mouth comparison region and, via the line connecting the left and right mouth corners, identifies the upper lip part and the lower lip part. Next, the mouth comparison module 270 compares the distance between the upper lip part and the lower lip part with a gap value (for example, 0.5 cm or 1 cm) (step S720) and judges whether the distance between the upper and lower lip parts exceeds the gap value (step S730). If it does, the user's mouth is open, and an open-mouth image is obtained (step S740); otherwise, the mouth comparison module 270 obtains a closed-mouth image (step S750).
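
A sketch of the open/closed decision of steps S720 to S750. Lip positions are assumed here to be pixel rows, whereas the text quotes the gap value in centimeters, so a real system would need a pixels-per-centimeter calibration.

```python
def classify_mouth_image(upper_lip_y, lower_lip_y, gap_value_px):
    """Return "open" (S740) if the lip gap exceeds the gap value, else "closed" (S750)."""
    gap = lower_lip_y - upper_lip_y   # S720: distance between upper and lower lip parts
    return "open" if gap > gap_value_px else "closed"   # S730
```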

The mouth comparison module 270 generates a code from the closed-mouth or open-mouth image and stores the code in the code sequence (for example, in the N-th field, N being a positive integer) (step S760). The code sequence may be a binary code, or encoded in the manner of Morse code. For example, an open-mouth image is defined as 1 and a closed-mouth image as 0; if the person opens the mouth for two unit times and then closes it for two unit times, the code sequence is (1, 1, 0, 0). Next, the mouth comparison module 270 judges whether the mouth recording time has been reached (step S770): for example, it starts a timer at step S710 and judges at step S770 whether the timer has reached the mouth recording time. If the mouth comparison module 270 judges that the timer has not yet reached the mouth recording time, it sets N = N + 1 (step S780) and returns to step S710 to continue classifying the open/closed state of the mouth, and the next code generated is stored in, for example, the next field of the code sequence (for example, the (N+1)-th field). Each N-th field represents one unit time (for example, 200 ms or 500 ms), and the codes stored in the fields represent the order of all open-mouth and closed-mouth images recorded per unit time.
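
A sketch of how the per-unit-time codes of step S760 accumulate into the code sequence, reproducing the (1, 1, 0, 0) example from the text; the function name is illustrative.

```python
def encode_mouth_states(states):
    """Map one "open"/"closed" label per unit time to the binary code sequence."""
    return [1 if state == "open" else 0 for state in states]

# Mouth open for two unit times, then closed for two:
# encode_mouth_states(["open", "open", "closed", "closed"]) -> [1, 1, 0, 0]
```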

It should be noted that, in this example, a delay (for example, 100 ms or 200 ms) may be added to the flow of steps S710 to S780 so that one pass through steps S710 to S780 takes exactly one unit time, making each N-th field represent one unit time. Finally, the mouth comparison module 270 stores the code sequence as the mouth motion information (step S790).

In another embodiment, during the mouth comparison time, the mouth comparison module 270 compares the image of the mouth comparison region with template images to generate a mouth-shape code, stores the mouth-shape code in a code sequence, and stores the code sequence as the mouth motion information.

For example, FIG. 8 is a flow chart of an example of recording mouth motion information according to another embodiment of the invention. In this embodiment, the mouth motion information may also represent a combined sequence of multiple mouth shapes. Referring to FIG. 8, the mouth comparison module 270 compares the image of the mouth comparison region with several template (pattern) images in the storage unit 130 (step S810). A template image may be a distinctive, recognizable mouth motion image or lip pattern, for example the mouth and facial-muscle movements made when pronouncing syllables of the Japanese kana, Chinese phrases such as 「喂」 (hello), 「您好」 (how do you do), 「請說」 (please speak), or 「我是」 (this is), or the English "hello". Each template image has a certain tolerance: even if the mouth shape in the person's face image differs slightly from the template image, the mouth comparison module 270 can still recognize a match as long as the difference falls within the allowable tolerance.

Next, the mouth comparison module 270 judges whether the image of the mouth comparison region matches a template image (step S820). If it matches, the mouth comparison module 270 generates a mouth-shape code and stores it in the code sequence (for example, in the M-th field, M being a positive integer) (step S830). If it does not match, the processing unit 150 sets M = M + 1 (step S840) and returns to step S810. The mouth comparison module 270 then judges whether the mouth comparison time has been reached (step S850): for example, it starts a timer at step S810 and judges at step S850 whether the timer has reached the mouth comparison time. When the timer reaches the mouth comparison time, the mouth comparison module 270 stores the code sequence as the mouth motion information (step S870). If the mouth comparison module 270 judges that the timer has not yet reached the mouth comparison time, it sets M = M + 1 (step S860) and returns to step S810 to continue comparing the image of the mouth comparison region with the template images.
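
A sketch of the template comparison of steps S810 and S820 using normalized cross-correlation as the matcher. The patent only requires a comparison with some tolerance, so the matcher and the 0.8 score floor are assumptions; each template must fit within the mouth region of interest.

```python
import cv2

def match_mouth_template(mouth_roi, templates, min_score=0.8):
    """Return the mouth-shape code of the best-matching template, or None (S820)."""
    best_code, best_score = None, min_score
    for code, template in templates.items():   # e.g. {"hello": hello_img, ...}
        result = cv2.matchTemplate(mouth_roi, template, cv2.TM_CCOEFF_NORMED)
        score = result.max()                   # score at the best match position
        if score > best_score:
            best_code, best_score = code, score
    return best_code
```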

In addition, before the event count judgment module 280 decides whether the person is using a handheld device, misjudgment is further avoided by considering, for example, the number of finger feature points and the number of detected handheld-use events, as described in the embodiments below.

In one embodiment, within a preset time the event count determination module 280 determines whether the ratio of the number of handheld events, in which the person is judged to be using a handheld device, to the number of face events, in which a face object is detected, exceeds a ratio threshold. Specifically, the event count determination module 280 first checks whether the handheld-device detection function has been running for a preset time (for example, 5 or 10 seconds). It then counts, over that detection period, the number of face events N_F in which a face object was detected (event F) and the number of handheld events N_P in which a specific gesture appeared in the ear-side search area while the detected mouth motion information matched the preset mouth information (event P). Next, the event count determination module 280 computes the ratio of the number of handheld events N_P to the number of face events N_F and checks whether this ratio exceeds a threshold (for example, 0.6 or 0.7) to decide whether the person is using a handheld device. In other words, event P can only be detected on frames in which a face object has already been detected.

In another embodiment, within the same preset time the event count determination module 280 further checks whether the number of face events N_F in which a face object was detected exceeds an event count threshold (for example, 10 or 15). Note that when these conditions are not met (for example, the ratio of the number of handheld events N_P to the number of face events N_F does not exceed the ratio threshold, or the number of face events N_F does not exceed the event count threshold), the event count determination module 280 waits until the next preset time arrives and evaluates the conditions again.
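Taken together, the two checks amount to a simple temporal filter, sketched below; the counter names and thresholds are illustrative assumptions consistent with the examples above (a 0.6 ratio and a 10-event minimum).

```python
RATIO_THRESHOLD = 0.6   # assumed ratio threshold from the example above
EVENT_THRESHOLD = 10    # assumed minimum number of face events

def temporal_filter(n_face_events: int, n_handheld_events: int) -> bool:
    """Return True when the observation window supports 'handheld device in use'.

    n_face_events     -- frames in the preset window where a face was detected (N_F)
    n_handheld_events -- frames where gesture and mouth motion both matched (N_P)
    """
    if n_face_events <= EVENT_THRESHOLD:   # too few face events: defer decision
        return False
    return n_handheld_events / n_face_events > RATIO_THRESHOLD
```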

In addition, when the processing unit 150, through the mouth region definition module 240, the mouth comparison module 270, and the event count determination module 280, determines that the person is using a handheld device, it further launches a warning procedure through the warning module 290. Specifically, the processing unit 150 sends a prompt signal to the warning module 290, and the warning module then alerts the person according to that signal. For example, the warning module 290 may display text, video, or images describing the warning through the warning unit 190 (for example, "Caution! Do not use a handheld device while driving!"). The light module of the warning unit 190 may also flash at a specific frequency or emit light of a specific color (for example, red or green). The vibration module of the warning unit 190 may, for instance, use a vibration motor to vibrate at a fixed or varying frequency. Alternatively, the warning module 290 may play an alert tone through the speaker module of the warning unit 190.
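As a sketch only, the prompt signal could be fanned out to the configured output channels of the warning unit; the channel enumeration and function name below are invented for illustration and are not the patent's interface.

```python
from enum import Enum, auto

class AlertChannel(Enum):
    TEXT = auto()       # on-screen warning message
    LIGHT = auto()      # flashing or colored light
    VIBRATION = auto()  # vibration motor
    SOUND = auto()      # speaker prompt tone

def issue_alert(channels=(AlertChannel.TEXT, AlertChannel.SOUND)):
    """Forward the prompt signal to each configured channel of the warning unit."""
    for channel in channels:
        # a real warning unit 190 would drive hardware here
        print(f"[warning unit] activating {channel.name}")
```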

In some embodiments, the preset mouth information stored in advance in the storage unit 130 may consist of preset mouth-shape code sequences composed of permutations of the template images, with each preset code sequence corresponding to a prompt signal. For example, when a user is being held against their will, the person can silently mouth a distress phrase such as 「求救」 (help) without making any sound. The processing unit 150 can then, in a way that is hard for others to notice, make the warning unit 190 generate a distress signal and send it to a security center or similar facility for assistance (the warning module may include a communication module).
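As a sketch, the preset sequences could be kept as a lookup table from code sequences to prompt signals; the sequences and signal names below are invented for illustration only and do not come from the patent.

```python
# Hypothetical mapping: preset mouth-shape code sequence -> prompt signal.
PRESET_SEQUENCES = {
    ("Q", "J"): "DISTRESS",       # e.g. silently mouthing a help phrase
    ("W", "N", "H"): "GREETING",  # e.g. an ordinary phone greeting
}

def match_prompt_signal(code_sequence):
    """Return the prompt signal whose preset sequence occurs in the recording."""
    seq = tuple(code_sequence)
    for preset, signal in PRESET_SEQUENCES.items():
        # a preset counts as matched when it appears as a contiguous run
        if any(seq[i:i + len(preset)] == preset for i in range(len(seq))):
            return signal
    return None
```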

In this way, the device 100 of the embodiments of the invention can use accurate and fast image recognition to detect whether a person is using a handheld device, reducing the rate of accidents caused by answering calls while driving. A further example follows.

FIG. 9 illustrates an example of detecting a person using a handheld device according to an embodiment of the invention. Referring to FIG. 9, the processing unit 150 acquires an image frame through the image capture unit 110 (step S910), and the face detection module 210 determines whether a face appears in the frame (step S920). If the face detection module 210 finds no face in the frame, the flow returns to step S910 to acquire subsequent frames. Conversely, when the face detection module 210 detects a face, the ear-side region definition module 230 and the mouth region definition module 240 determine the ear-side search area and the mouth comparison area, respectively, from the face object (step S930). Next, the gesture determination module 250 detects whether the person's hand object appears in subsequent frames and extracts the foreground edges and gradients of the finger feature points (step S950), uses the skin model stored in advance in the storage unit 130 (for example, a code segment written in a computer programming language) to match the hand object and identify the gesture appearing in the frame (step S960), uses a finger feature point detection module (for example, a code segment written in a computer programming language) to check whether the specific gesture is present in the frame (step S970), and further determines whether the mouth motion in the mouth comparison area matches the preset mouth information. The gesture determination module 250 then checks whether the detection result passes a spatial filter and a temporal filter (step S980). The spatial filter, for example, checks whether the specific gesture appears within the ear-side search area determined from the face object. The temporal filter, for example, checks whether, within 5 seconds, the ratio of the number of handheld events in which phone use was detected to the number of face events in which a face was detected exceeds 0.6. If the detection result fails the spatial or temporal filter, the flow returns to step S910. Otherwise, the warning module 290 issues an alert through the warning unit 190 (step S990).
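Putting the pieces together, the frame loop of FIG. 9 might look like the sketch below. The detector callables (`detect_face`, `locate_regions`, `find_gesture_in`, `mouth_matches_preset`) stand in for the modules named above and are assumptions, as are the 5-second window and the 0.6 threshold reused from the example.

```python
import time

def detection_loop(grab_frame, detect_face, locate_regions,
                   find_gesture_in, mouth_matches_preset, alert,
                   window=5.0, ratio_threshold=0.6):
    """Frame loop mirroring steps S910-S990 of FIG. 9 (sketch only)."""
    n_face, n_handheld = 0, 0
    window_start = time.time()
    while True:
        frame = grab_frame()                           # S910
        face = detect_face(frame)                      # S920
        if face is None:
            continue                                   # no face: next frame
        n_face += 1
        ear_area, mouth_area = locate_regions(face)    # S930
        # spatial filter: the gesture must lie inside the ear-side area
        gesture_ok = find_gesture_in(frame, ear_area)         # S950-S970
        mouth_ok = mouth_matches_preset(frame, mouth_area)
        if gesture_ok and mouth_ok:
            n_handheld += 1
        if time.time() - window_start >= window:       # temporal filter, S980
            if n_face and n_handheld / n_face > ratio_threshold:
                alert()                                # S990
            n_face, n_handheld = 0, 0
            window_start = time.time()
```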

Note that the car-driving scenario above (which could equally involve an aircraft, a ship, and so on) is given to help explain the embodiments; the embodiments of the invention may also be applied to automated transaction devices or other electronic devices or venues that monitor whether a person is using a handheld device.

In summary, the device described in the embodiments of the invention can use image recognition to determine whether a specific gesture is present in the ear-side search area while also determining whether the person's mouth motion matches the preset mouth information. When both determinations are positive, the device of the embodiments can conclude that the person is using a handheld device and can issue a prompt signal to alert them. The embodiments of the invention can thereby monitor, effectively and in real time, driving behavior or other situations where using a handheld device is inappropriate; for example, drivers can be kept more alert, and automated transaction devices can help police units respond quickly to problems such as telephone fraud.

Although the invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

S310~S370‧‧‧Steps

Claims (14)

1. A method for detecting a person using a handheld device, adapted for an electronic device, the method comprising: capturing at least one image of a person; analyzing the at least one image to obtain a face object; determining at least one ear-side search area and a mouth comparison area from the face object; and determining whether the person is using a handheld device by searching the at least one ear-side search area for the presence of a specific gesture while comparing, in the mouth comparison area, whether detected mouth motion information matches preset mouth information, wherein the mouth comparison area is monitored to generate a code sequence, and whether the mouth motion information matches the preset mouth information is determined from the code sequence.

2. The method of claim 1, wherein the step of determining whether the person is using the handheld device by searching the at least one ear-side search area for the presence of the specific gesture while comparing, in the mouth comparison area, whether the detected mouth motion information matches the preset mouth information comprises: detecting whether a hand object of the person appears in the at least one ear-side search area; obtaining at least one finger feature point of the hand object, wherein the at least one finger feature point comprises a foreground edge and at least one gradient value; and comparing the at least one finger feature point to determine whether the specific gesture is present.

3. The method of claim 1, further comprising, after the step of determining whether the person is using the handheld device: determining, within a preset time, whether the ratio of the number of handheld events in which the person is using the handheld device to the number of face events in which the face object is detected exceeds a ratio threshold, so as to generate a prompt signal.

4. The method of claim 1, wherein the at least one ear-side search area comprises a first ear area and a second ear area, and the step of determining the at least one ear-side search area and the mouth comparison area from the face object comprises: detecting a shoulder feature in the at least one image from the face object to obtain a shoulder object; determining the ear-side search area from the shoulder object and the face object; and filtering an interference feature in the ear-side search area to detect the first ear area and the second ear area.
5. The method of claim 1, wherein the step of determining the at least one ear-side search area and the mouth comparison area from the face object comprises: identifying a nostril anchor point in the face object; setting a mouth region based on the nostril anchor point; performing image processing on the image of the mouth region to identify a mouth object of the person; and determining the mouth comparison area within the mouth region from the mouth object.

6. The method of claim 1, wherein the step of determining whether the person is using the handheld device by searching the ear-side search area for the presence of the specific gesture while comparing, in the mouth comparison area, whether the mouth motion matches the preset mouth information comprises: obtaining at least one mouth image, and obtaining a plurality of mouth features from the at least one mouth image; determining, from the mouth features, whether the at least one mouth image is an open-motion image or a closed-motion image; within a mouth recording time, sequentially recording all of the closed-motion images or open-motion images detected in the mouth comparison area and converting them into the code sequence; and storing the code sequence as the mouth motion information.

7. The method of claim 1, wherein the step of determining whether the person is using the handheld device by searching the ear-side search area for the presence of the specific gesture while comparing, in the mouth comparison area, whether the mouth motion matches the preset mouth information comprises: within a mouth comparison time, comparing the image of the mouth comparison area against a plurality of template images to generate a mouth-shape code; storing the mouth-shape code in the code sequence; and storing the code sequence as the mouth motion information.
8. A device for detecting a person using a handheld device, comprising: an image capture unit that captures at least one image of a person; a storage unit that stores the image and preset mouth information; and a processing unit, coupled to the storage unit, that obtains the image, wherein the processing unit analyzes the at least one image to obtain a face object, determines at least one ear-side search area and a mouth comparison area from the face object, and determines whether the person is using a handheld device by searching the at least one ear-side search area for the presence of a specific gesture while comparing, in the mouth comparison area, whether detected mouth motion information matches the preset mouth information, wherein the processing unit monitors the mouth comparison area to generate a code sequence and determines from the code sequence whether the mouth motion information matches the preset mouth information.

9. The device of claim 8, wherein the processing unit detects whether a hand object of the person appears in the at least one ear-side search area and obtains at least one finger feature point of the hand object, wherein the at least one finger feature point comprises a foreground edge and at least one gradient value, and the processing unit compares the at least one finger feature point to determine whether the specific gesture is present.

10. The device of claim 8, wherein, within a preset time, the processing unit determines whether the ratio of the number of handheld events in which the person is using the handheld device to the number of face events in which the face object is detected exceeds a ratio threshold.

11. The device of claim 8, wherein the at least one ear-side search area comprises a first ear area and a second ear area, and the processing unit detects a shoulder feature in the at least one image from the face object to obtain a shoulder object, determines the ear-side search area from the shoulder object and the face object, and filters an interference feature in the ear-side search area to detect the first ear area and the second ear area.

12. The device of claim 8, wherein the processing unit identifies a nostril anchor point in the face object, sets a mouth search region based on the nostril anchor point, performs image processing on the image of the mouth search region to identify a mouth object of the person, and extracts a mouth region in the mouth comparison area according to the mouth object.
13. The device of claim 8, wherein the processing unit obtains at least one mouth image through the image capture unit and obtains a plurality of mouth features from the at least one mouth image; the processing unit determines, from the mouth features, whether the at least one mouth image is an open-motion image or a closed-motion image; and, within a mouth recording time, the processing unit sequentially records all of the closed-motion images or open-motion images detected in the mouth comparison area, converts them into the code sequence, and stores the code sequence as the mouth motion information.

14. The device of claim 8, wherein, within a mouth comparison time, the processing unit compares the image of the mouth comparison area against a plurality of template images to generate a mouth-shape code, stores the mouth-shape code in the code sequence, and stores the code sequence as the mouth motion information.
TW103143287A 2014-12-11 2014-12-11 Method and apparatus for detecting person to use handheld device TWI603270B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW103143287A TWI603270B (en) 2014-12-11 2014-12-11 Method and apparatus for detecting person to use handheld device
CN201510054967.2A CN105989329A (en) 2014-12-11 2015-02-03 Method and device for detecting use of handheld device by person

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW103143287A TWI603270B (en) 2014-12-11 2014-12-11 Method and apparatus for detecting person to use handheld device

Publications (2)

Publication Number Publication Date
TW201621758A TW201621758A (en) 2016-06-16
TWI603270B true TWI603270B (en) 2017-10-21

Family

ID=56755514

Family Applications (1)

Application Number Title Priority Date Filing Date
TW103143287A TWI603270B (en) 2014-12-11 2014-12-11 Method and apparatus for detecting person to use handheld device

Country Status (2)

Country Link
CN (1) CN105989329A (en)
TW (1) TWI603270B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569599B (en) * 2016-10-24 2020-05-01 百度在线网络技术(北京)有限公司 Method and device for automatically seeking help
US10152642B2 (en) 2016-12-16 2018-12-11 Automotive Research & Testing Center Method for detecting driving behavior and system using the same
CN106682601B (en) * 2016-12-16 2019-11-15 华南理工大学 A kind of driver's violation call detection method based on multidimensional information Fusion Features
CN108345819B (en) * 2017-01-23 2020-09-15 杭州海康威视数字技术股份有限公司 Method and device for sending alarm message
CN109064438A (en) * 2017-06-09 2018-12-21 丽宝大数据股份有限公司 skin condition detection method, electronic device and skin condition detection system
TWI687917B (en) * 2018-03-07 2020-03-11 宏碁股份有限公司 Voice system and voice detection method
CN109271929B (en) * 2018-09-14 2020-08-04 北京字节跳动网络技术有限公司 Detection method and device
CN109948450A (en) * 2019-02-22 2019-06-28 深兰科技(上海)有限公司 A kind of user behavior detection method, device and storage medium based on image

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592143A (en) * 2012-01-09 2012-07-18 清华大学 Method for detecting phone holding violation of driver in driving
TW201300260A (en) * 2011-06-20 2013-01-01 Utechzone Co Ltd Vehicle management and monitoring system and method thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402680B (en) * 2010-09-13 2014-07-30 株式会社理光 Hand and indication point positioning method and gesture confirming method in man-machine interactive system
CN102841676A (en) * 2011-06-23 2012-12-26 鸿富锦精密工业(深圳)有限公司 Webpage browsing control system and method
CN103366506A (en) * 2013-06-27 2013-10-23 北京理工大学 Device and method for automatically monitoring telephone call behavior of driver when driving

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201300260A (en) * 2011-06-20 2013-01-01 Utechzone Co Ltd Vehicle management and monitoring system and method thereof
CN102592143A (en) * 2012-01-09 2012-07-18 清华大学 Method for detecting phone holding violation of driver in driving

Also Published As

Publication number Publication date
TW201621758A (en) 2016-06-16
CN105989329A (en) 2016-10-05

Similar Documents

Publication Publication Date Title
TWI603270B (en) Method and apparatus for detecting person to use handheld device
US20230117712A1 (en) Feature density object classification, systems and methods
KR102299847B1 (en) Face verifying method and apparatus
CN108197586B (en) Face recognition method and device
Fathy et al. Face-based active authentication on mobile devices
WO2020062969A1 (en) Action recognition method and device, and driver state analysis method and device
EP2787463B1 (en) Display apparatus for performing user certification and method thereof
CN104408426B (en) Facial image glasses minimizing technology and device
WO2017071065A1 (en) Area recognition method and apparatus
WO2021027537A1 (en) Method and apparatus for taking identification photo, device and storage medium
WO2017071064A1 (en) Area extraction method, and model training method and apparatus
CN110348270B (en) Image object identification method and image object identification system
CN108280418A (en) The deception recognition methods of face image and device
EP2336949B1 (en) Apparatus and method for registering plurality of facial images for face recognition
CN110612530B (en) Method for selecting frames for use in face processing
KR101139963B1 (en) Method and apparatus for preventing driver from driving while drowsy based on detection of driver's pupils
CN103501411A (en) Image shooting method and system
CN108197585A (en) Recognition algorithms and device
TWI520076B (en) Method and apparatus for detecting person to use handheld device
CN106412420B (en) It is a kind of to interact implementation method of taking pictures
CN107091704A (en) Pressure detection method and device
WO2015131571A1 (en) Method and terminal for implementing image sequencing
US9996743B2 (en) Methods, systems, and media for detecting gaze locking
CN105095841A (en) Method and device for generating eyeglasses
US11842745B2 (en) Method, system, and computer-readable medium for purifying voice using depth information

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees