TWI739675B - Image recognition method and apparatus - Google Patents


Info

Publication number
TWI739675B
TWI739675B (application TW109141300A)
Authority
TW
Taiwan
Prior art keywords
score
sight
line
behavior
level
Prior art date
Application number
TW109141300A
Other languages
Chinese (zh)
Other versions
TW202221563A (en)
Inventor
徐立恆
黃騰瑩
Original Assignee
友達光電股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 友達光電股份有限公司
Priority to TW109141300A priority Critical patent/TWI739675B/en
Priority to CN202110509547.4A priority patent/CN113221734A/en
Application granted granted Critical
Publication of TWI739675B publication Critical patent/TWI739675B/en
Publication of TW202221563A publication Critical patent/TW202221563A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18Status alarms
    • G08B21/182Level alarms, e.g. alarms responsive to variables exceeding a threshold
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B7/00Signalling systems according to more than one of groups G08B3/00 - G08B6/00; Personal calling systems according to more than one of groups G08B3/00 - G08B6/00
    • G08B7/06Signalling systems according to more than one of groups G08B3/00 - G08B6/00; Personal calling systems according to more than one of groups G08B3/00 - G08B6/00 using electric transmission, e.g. involving audible and visible signalling through the use of sound and light sources

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Image Analysis (AREA)

Abstract

An image recognition method and apparatus are provided. After an image sequence is obtained, a behavior recognition model analyzes the image sequence to obtain a behavior posture of a user, and a gaze recognition model analyzes the image sequence to obtain a sight-line feature of the user. A first score corresponding to the behavior posture and a second score corresponding to the sight-line feature are then calculated. A risk score is computed from the first score and the second score, and a corresponding warning signal is issued based on the risk score.

Description

Image recognition method and device

The present invention relates to an image processing method and device, and more particularly to an image recognition method and device.

With the development of transportation, improving safety is a direction in which all vehicle manufacturers continuously strive and refine. In smart-vehicle applications, safety is the foremost consideration. To effectively improve driving safety and further prevent accidents, smart-vehicle applications and technologies are receiving increasing attention. Accordingly, how to monitor driving behavior effectively and in real time with smart-vehicle assistance, and thereby reduce accidents caused by human negligence, is a key concern at present.

The present invention provides an image recognition method and device that can improve recognition accuracy.

The image recognition method of the invention includes: obtaining an image sequence; analyzing the image sequence with a behavior recognition model to obtain a behavior posture of a user; calculating a first score corresponding to the behavior posture; analyzing the image sequence with a gaze recognition model to obtain a sight-line feature of the user; calculating a second score corresponding to the sight-line feature; calculating a risk score based on the first score and the second score; and issuing a corresponding warning signal according to the risk score.
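
The claimed sequence of steps can be sketched as a small pipeline. Everything below is a hypothetical illustration: the model callables are stand-ins, and the simple-sum combination of the two partial scores is an assumption, since the claim does not fix the combination rule.

```python
# Hypothetical sketch of the claimed method flow. The callables and the
# simple-sum combination of the two partial scores are assumptions.

def compute_risk_score(first_score, second_score):
    """Risk score from the two partial scores (assumed: simple sum)."""
    return first_score + second_score

def recognize(frames, analyze_behavior, analyze_gaze,
              score_posture, score_gaze, warning_for):
    """One pass of the method; every callable is a stand-in model."""
    posture = analyze_behavior(frames)        # behavior posture of the user
    gaze = analyze_gaze(frames)               # sight-line feature of the user
    first = score_posture(posture)            # first score
    second = score_gaze(gaze)                 # second score
    risk = compute_risk_score(first, second)  # risk score
    return warning_for(risk)                  # corresponding warning signal
```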

In an embodiment of the invention, after the behavior posture of the user is obtained, the method further includes: when the behavior posture is determined to be a distracting behavior, calculating the first score corresponding to the behavior posture by setting the first score to a main score corresponding to the distracting behavior; and, while the behavior posture continues to be determined to be the distracting behavior, adding an auxiliary score corresponding to the main score to the first score each time a first judgment time elapses.

In an embodiment of the invention, after the behavior posture of the user is obtained, the method further includes: when the behavior posture is determined to be a fatigue behavior, setting the first score to a first preset score corresponding to fatigue level 1; when, at fatigue level 1, the behavior posture is still determined to be a fatigue behavior after a first preset time has elapsed, setting the first score to a second preset score corresponding to fatigue level 2, the second preset score being greater than the first preset score; and when, at fatigue level 2, the behavior posture is still determined to be a fatigue behavior after a second preset time has elapsed, setting the first score to a third preset score corresponding to fatigue level 3, the third preset score being greater than the second preset score.

In an embodiment of the invention, after the sight-line feature of the user is obtained, the method further includes: determining, based on the sight-line feature, whether the user is not facing a designated direction; and calculating the second score when it is determined that the user is not facing the designated direction. Calculating the second score includes adding a designated score to the second score each time a second judgment time elapses.

In an embodiment of the invention, after the sight-line feature of the user is obtained, the method further includes: determining, based on the sight-line feature, whether a sight-line fatigue feature is met; when the sight-line feature is determined to meet the sight-line fatigue feature, setting the second score to a first preset score corresponding to fatigue level 1; when, at fatigue level 1, the sight-line feature still meets the sight-line fatigue feature after a first preset time has elapsed, setting the second score to a second preset score corresponding to fatigue level 2, the second preset score being greater than the first preset score; and when, at fatigue level 2, the sight-line feature still meets the sight-line fatigue feature after a second preset time has elapsed, setting the second score to a third preset score corresponding to fatigue level 3, the third preset score being greater than the second preset score.

In an embodiment of the invention, the image recognition method further includes: resetting the first score to zero when the behavior posture is detected to return to a normal state; and resetting the second score to zero when the sight-line direction is detected to return to the designated direction.

In an embodiment of the invention, after the corresponding warning signal is issued according to the risk score, the method further includes: continuing to issue the warning signal until the behavior posture is detected to return to the normal state and the sight-line direction is detected to return to the designated direction.

In an embodiment of the invention, issuing the corresponding warning signal according to the risk score includes: determining, based on the risk score, whether the current danger level is a first, second, third, or fourth warning level; when the current danger level is the first warning level, issuing a flashing-light signal and a short beep; when it is the second warning level, issuing a flashing-light signal and continuously emitting a first long beep; when it is the third warning level, issuing a flashing-light signal and continuously emitting a second long beep; and when it is the fourth warning level, sending a notification signal to a remote control center.
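
The mapping from risk score to warning behavior in this embodiment can be sketched as follows. The four levels and their signals follow the text, but the numeric thresholds separating the levels are not specified there, so the values below are illustrative assumptions.

```python
# Signals per warning level, as described in this embodiment.
SIGNALS = {
    1: ("flashing light", "short beep"),
    2: ("flashing light", "continuous first long beep"),
    3: ("flashing light", "continuous second long beep"),
    4: ("notify remote control center",),
}

def warning_level(risk_score, thresholds=(30, 60, 90)):
    """Map a risk score to warning level 1-4 (thresholds are assumed)."""
    t1, t2, t3 = thresholds
    if risk_score < t1:
        return 1
    if risk_score < t2:
        return 2
    if risk_score < t3:
        return 3
    return 4
```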

In an embodiment of the invention, the behavior recognition model marks depth features of the user in the image sequence, and the gaze recognition model locates a plurality of feature points of the user in the image sequence.

The image recognition device of the invention includes: an image capture device that captures an image sequence; a storage device containing a behavior recognition model, a gaze recognition model, and a risk determination model; and a processor coupled to the image capture device and the storage device. The processor is configured to: obtain the image sequence from the image capture device; analyze the image sequence with the behavior recognition model to obtain a behavior posture of a user; analyze the image sequence with the gaze recognition model to obtain a sight-line feature of the user; and use the risk determination model to calculate a first score corresponding to the behavior posture and a second score corresponding to the sight-line feature, then calculate a risk score based on the first score and the second score, and issue a corresponding warning signal according to the risk score.

Based on the above, the invention uses a behavior recognition model and a gaze recognition model to recognize the behavior posture and the sight-line direction simultaneously, obtaining a more accurate recognition result, and issues corresponding warning signals based on the risk score, thereby alerting both the driver and vehicles approaching from behind.

FIG. 1 is a block diagram of an image recognition device according to an embodiment of the invention. Referring to FIG. 1, the image recognition device 100 includes a processor 110, a storage device 120, and an image capture device 130. The processor 110 is coupled to the storage device 120 and the image capture device 130. In one embodiment, the image recognition device 100 may be installed in a vehicle and connected to the vehicle's center console. In another embodiment, the image recognition device 100 may be integrated into the center console of the vehicle.

The processor 110 is, for example, a central processing unit (CPU), a physics processing unit (PPU), a programmable microprocessor, an embedded control chip, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or another similar device.

The storage device 120 contains a behavior recognition model 121, a gaze recognition model 123, and a risk determination model 125. Each of these models consists of one or more code fragments, which are executed by the processor 110 to achieve the corresponding function.

The image capture device 130 is, for example, a video camera or still camera employing a charge-coupled device (CCD) lens or a complementary metal-oxide-semiconductor (CMOS) lens, and is used to obtain the image sequence. Here, the image sequence may consist of dynamic or static images.

The behavior recognition model 121 mainly handles human behavior, covering body features and actions such as distraction, fatigue, and abnormal driving. In its training phase, the model can be trained on the features of a variety of behavior postures. These postures fall into two broad categories: distracting behaviors and fatigue behaviors. Distracting behaviors include eating or drinking while the vehicle is moving, operating the in-vehicle center console while the vehicle is moving, using a smart mobile device while the vehicle is moving, and taking the hands off the steering wheel while the vehicle is moving. Fatigue behaviors include yawning and dozing off while the vehicle is moving.

In one embodiment, the DensePose model is used to build the behavior recognition model 121. DensePose builds the model from three-dimensional human features, adding depth features beyond traditional two-dimensional images. The behavior recognition model 121 can therefore mark depth features for each human body part of the user in the image sequence, enabling the neural network to infer occluded human features and actions from the body parts visible in the two-dimensional image, which greatly improves both the variety and the accuracy of traditional behavior recognition. Moreover, this architecture can use an ordinary image capture device 130 to obtain two-dimensional images and then build three-dimensional features through the DensePose model; besides improving recognition accuracy, this lowers cost and eases adoption. In the behavior recognition model 121 built on DensePose, the recognition rate for unoccluded images can reach 90%, and the rate for partially occluded images can reach 70% to 80%. In addition, the DensePose-based model can recognize a behavior from a single frame, which shortens recognition time.

The gaze recognition model 123 mainly handles the sight-line direction of the face, and further handles characteristic actions such as dynamic panel movement, distraction, and fatigue recognition. In one embodiment, the OpenPose model is used to build the gaze recognition model 123. The OpenPose model identifies the sight line through vectors. For example, OpenPose locates three feature points on the face and outputs the coordinate axes of the person's features to facilitate subsequent model training. Then, in at least two temporally adjacent frames captured in sequence, the person's sight-line direction is determined by recognizing the deformation ratio of the coplanar triangle formed by the three points.

FIG. 2 is a schematic diagram of feature points identified by the OpenPose model according to an embodiment of the invention. FIG. 2 shows the feature points (N0 to N17) identified on the user through the OpenPose model. The feature points N0 to N17 are the keypoints that form the human skeleton. In this embodiment, the feature points N0, N14, and N15 on the user's face identified by the OpenPose model serve as the sight-line feature. The user's sight-line direction is determined from the deformation ratio of the coplanar triangle formed by N0, N14, and N15 across consecutive frames, together with the positions of N0, N14, and N15 in the different frames.
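
A minimal sketch of the triangle-deformation idea, under the assumption that the "deformation ratio" is measured as the ratio of triangle areas between consecutive frames: the projected N0-N14-N15 triangle shrinks as the face turns away from the camera. The tolerance value is likewise an assumption, not taken from the text.

```python
def triangle_area(p0, p14, p15):
    """Area of the coplanar triangle formed by the face keypoints
    N0, N14, N15 (2D shoelace formula)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p14, p15
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0

def deformation_ratio(prev_pts, curr_pts):
    """Ratio of triangle areas between two consecutive frames; a value
    far from 1 suggests the head has turned away."""
    a_prev = triangle_area(*prev_pts)
    a_curr = triangle_area(*curr_pts)
    return a_curr / a_prev if a_prev else 0.0

def facing_forward(prev_pts, curr_pts, tol=0.25):
    """True if the triangle has not deformed beyond the (assumed)
    tolerance, i.e. the sight line has not shifted."""
    return abs(1.0 - deformation_ratio(prev_pts, curr_pts)) <= tol
```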

In the OpenPose-based gaze recognition model 123, marking the facial feature points N0, N14, and N15 (as the sight-line feature) makes it possible to judge whether the user's sight line has shifted, and further whether the user is distracted, checking for oncoming vehicles on the left or right, checking the rear-view mirror, using an in-vehicle electronic device, driving while fatigued, and so on. Moreover, the processor 110 can switch the screen of the console's display panel or trigger specific console operations according to the sight-line feature. For example, when the gaze recognition model 123 detects that the user is looking toward the vehicle's A-pillar, the processor 110 switches the display panel to show the exterior image beyond the A-pillar. Alternatively, the processor 110 can use the sight-line feature to control specific operations such as music volume or the navigation switch.

In general, eye-movement recognition requires an eye tracker or requires the image capture device 130 to face the user directly; otherwise it cannot be performed. The OpenPose-based gaze recognition model 123, however, can judge the sight-line direction directly from the facial orientation, which greatly increases the freedom of mechanical placement and reduces errors caused by eyeball deviation. With the OpenPose-based gaze recognition model 123, the recognition rate of the sight-line feature can reach 90% whether or not the image capture device 130 is mounted facing the user.

FIG. 3 is a flowchart of an image recognition method according to an embodiment of the invention. Referring to FIG. 1 and FIG. 3 together, in step S305 an image sequence is obtained through the image capture device 130. After the image sequence is obtained, the behavior recognition model 121 and the gaze recognition model 123 can be driven simultaneously to perform steps S310 and S315 respectively. However, the execution order of the behavior recognition model 121 and the gaze recognition model 123 is not limited here.

In step S310, the behavior recognition model 121 analyzes the image sequence to obtain the user's behavior posture. Next, in step S320, the risk determination model 125 calculates the first score corresponding to the behavior posture. Specifically, the behavior recognition model 121 uses a neural-network algorithm to identify whether a user is present in the image sequence and, if so, to judge whether the behavior posture matches a distracting behavior or a fatigue behavior. Only if the behavior posture matches a distracting or fatigue behavior does the risk determination model 125 calculate the first score for it.

Specifically, when the behavior recognition model 121 determines that the behavior posture is a distracting behavior, the risk determination model 125 calculates the corresponding first score as follows: the first score is set to the main score corresponding to the distracting behavior, and while the behavior posture continues to be judged a distracting behavior, an auxiliary score corresponding to the main score is added to the first score each time the first judgment time elapses, until the behavior recognition model 121 detects that the behavior posture has returned to a normal state. In one embodiment, multiple distracting behaviors are defined during the training of the behavior recognition model 121, with different main and auxiliary scores for each, as shown in Table 1.

Table 1
Distracting behavior | Main score | Auxiliary score
Eating or drinking while the vehicle is moving | 10 | 5
Operating the in-vehicle center console while the vehicle is moving | 30 | 10
Using a smart mobile device while the vehicle is moving | 60 | 20
Hands off the steering wheel while the vehicle is moving | 60 | 20

In Table 1, the main and auxiliary scores are set according to how dangerous the distracting behavior is. For example, when the behavior posture is judged to be hands off the steering wheel while the vehicle is moving, the first score is set to 60 points, and another 20 points are added for each second the action continues. That is, the first score for hands off the steering wheel sustained for 5 seconds is 60 (main score) + 5 (duration in seconds) × 20 (auxiliary score) = 160 points. When the behavior posture is detected to return to a normal state, the first score is reset to zero.

The behavior recognition model 121 can also sum the first score when multiple distracting behaviors are detected. For example, referring to Table 1, if the driver is detected eating for 5 seconds while the vehicle is moving (10 + 5 × 5 = 35) and simultaneously operating the in-vehicle center console for 1 second (30 + 10 × 1 = 40), the first score is 35 + 40 = 75 points.
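
The scoring rule of Table 1 together with the two worked examples can be sketched as follows. The 1-second judgment interval follows the examples in the text; the behavior key names are hypothetical labels for the table rows.

```python
# (main score, auxiliary score) per distracting behavior, from Table 1.
# The key names are hypothetical labels for the table rows.
DISTRACTIONS = {
    "eating":          (10, 5),
    "center_console":  (30, 10),
    "smart_device":    (60, 20),
    "hands_off_wheel": (60, 20),
}

def distraction_score(behavior, seconds):
    """First score for one distracting behavior held for `seconds` seconds:
    main score plus one auxiliary score per elapsed 1-second interval."""
    main, aux = DISTRACTIONS[behavior]
    return main + seconds * aux

def total_first_score(observations):
    """Sum over simultaneously detected behaviors, as in the example of
    eating for 5 s plus operating the console for 1 s."""
    return sum(distraction_score(b, s) for b, s in observations)
```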

On the other hand, when the behavior recognition model 121 determines that the behavior posture is a fatigue behavior, the risk determination model 125 calculates the corresponding first score as follows: upon the fatigue determination, the first score is set to the preset score A1 corresponding to fatigue level 1; if, after a preset time B1 at fatigue level 1, the behavior posture is still judged a fatigue behavior, the first score is set to the preset score A2 corresponding to fatigue level 2 (A2 > A1); if, after a preset time B2 at fatigue level 2, the behavior posture is still judged a fatigue behavior, the first score is set to the preset score A3 corresponding to fatigue level 3 (A3 > A2). When the behavior posture is detected to return to a normal state, the first score is reset to zero.

Specifically, the characteristic patterns of fatigue behaviors, including yawning and dozing off while the vehicle is moving, are further defined during the training of the behavior recognition model 121. In addition, multiple fatigue levels are defined, with a different preset score for each. For example, fatigue levels 1 to 3 run from mild to severe fatigue, and the higher the fatigue level, the higher the corresponding preset score.

Assume the preset score A1 for fatigue level 1 is 60 points, A2 for fatigue level 2 is 80 points, and A3 for fatigue level 3 is 100 points, and assume the preset time B1 is 5 seconds and B2 is 3 seconds. When the behavior posture is initially judged a fatigue behavior such as yawning or dozing off while the vehicle is moving, it is regarded as fatigue level 1, so the first score is set to 60 points. If the normal state has not been restored after 5 seconds at fatigue level 1, the fatigue is regarded as having worsened to level 2, and the first score is set to 80 points. If the normal state has still not been restored after a further 3 seconds at fatigue level 2, the first score is set to 100 points. The three fatigue levels here are merely illustrative and are not limiting.
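
Using the numeric example above (60/80/100 points, with level changes after 5 seconds and after a further 3 seconds), the escalation of the first score under sustained fatigue can be sketched as:

```python
def fatigue_first_score(elapsed_seconds, scores=(60, 80, 100), times=(5, 3)):
    """First score under sustained fatigue behavior: level 1 on initial
    detection, level 2 after times[0] seconds, level 3 after a further
    times[1] seconds. Defaults follow the numeric example in the text."""
    if elapsed_seconds < times[0]:
        return scores[0]   # fatigue level 1
    if elapsed_seconds < times[0] + times[1]:
        return scores[1]   # fatigue level 2
    return scores[2]       # fatigue level 3
```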

In addition, in step S315, the gaze recognition model 123 analyzes the image sequence to obtain the user's sight-line feature. Next, in step S325, the risk determination model 125 calculates the second score corresponding to the sight-line feature. Specifically, the gaze recognition model 123 uses a neural-network algorithm to identify whether a user is present in the image sequence and, if so, obtains the user's sight-line feature. Only if the sight line is not facing the designated direction, or the sight-line feature matches the sight-line fatigue feature, does the risk determination model 125 calculate the second score.

Specifically, after the user's sight-line feature is obtained, the gaze recognition model 123 judges from it whether the user is not facing the designated direction, for example judging from the driver's sight-line direction whether the driver is facing straight ahead. Only when the user is judged not to be facing the designated direction does the risk determination model 125 calculate the second score: each time the second judgment time (for example 1 second) elapses, a designated score is added to the second score. For example, when the driver's sight line is judged not to be toward the front (the designated direction), 15 points are added to the second score after 1 second, making the second score 30 points (15 + 15) after 2 seconds, and so on. When the sight-line direction is detected to return to the designated direction, the second score is reset to zero.
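
The per-interval accumulation and reset of the second score can be sketched as a small stateful scorer. The 15-point step follows the example above; the class name is hypothetical.

```python
class GazeAwayScorer:
    """Accumulates the second score while the gaze is off the designated
    direction: +`step` points per judgment interval, reset to zero once
    the gaze returns (step value taken from the 15-point example)."""

    def __init__(self, step=15):
        self.step = step
        self.second_score = 0

    def tick(self, facing_designated_direction):
        """Call once per judgment interval with the current gaze state."""
        if facing_designated_direction:
            self.second_score = 0          # gaze returned: reset to zero
        else:
            self.second_score += self.step
        return self.second_score
```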

On the other hand, when the line-of-sight recognition model 123 determines that the line-of-sight features match the line-of-sight fatigue features, the hazard determination model 125 calculates the corresponding second score as follows. When the line-of-sight features are first determined to match the line-of-sight fatigue features, the second score is set to a preset score C1 corresponding to fatigue level 1. If, after a preset time D1 has elapsed at fatigue level 1, the line-of-sight features still match the line-of-sight fatigue features, the second score is set to a preset score C2 corresponding to fatigue level 2 (C2>C1). If, after a preset time D2 has elapsed at fatigue level 2, the line-of-sight features still match the line-of-sight fatigue features, the second score is set to a preset score C3 corresponding to fatigue level 3 (C3>C2).

Specifically, the feature patterns of the line-of-sight fatigue features are further defined during the training of the line-of-sight recognition model 123. In addition, multiple fatigue levels are defined, each with a different preset score. For example, fatigue levels 1 to 3 may be defined from mild to severe fatigue, with a higher fatigue level corresponding to a higher preset score.

Assume that the preset score C1 for fatigue level 1 is 60 points, the preset score C2 for fatigue level 2 is 80 points, the preset score C3 for fatigue level 3 is 100 points, the preset time D1 is 5 seconds, and the preset time D2 is 3 seconds. When the line-of-sight features are first determined to match the line-of-sight fatigue features, the second score is set to 60 points. If the normal state has not been restored after 5 more seconds, the second score is set to 80 points; if the normal state still has not been restored after a further 3 seconds, the second score is set to 100 points. These values are only examples and are not limiting. When the line-of-sight features return to the normal state, the second score is reset to zero; here, line-of-sight features that do not match the line-of-sight fatigue features are regarded as the normal state.
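The escalation through fatigue levels can be sketched as follows. This is a minimal illustration using the example values C1=60, C2=80, C3=100, D1=5 s, D2=3 s from the paragraph above; the function name, the level-0 "normal" state, and the boundary handling are assumptions, not part of the disclosure.

```python
# Preset scores and dwell times taken from the worked example in the text
FATIGUE_SCORES = {1: 60, 2: 80, 3: 100}   # C1 < C2 < C3
DWELL_TIMES = {1: 5.0, 2: 3.0}            # D1, D2 in seconds

def escalate_fatigue(level, seconds_at_level, still_fatigued):
    """Return the (possibly escalated) fatigue level and its second score.

    level: current fatigue level (0 = normal state, then 1..3)
    seconds_at_level: how long the fatigue feature has persisted at this level
    still_fatigued: whether the line-of-sight features still match the
        line-of-sight fatigue features
    """
    if not still_fatigued:
        return 0, 0                        # back to normal: score resets to zero
    if level == 0:
        return 1, FATIGUE_SCORES[1]        # initial match: level 1, score C1
    if level in DWELL_TIMES and seconds_at_level >= DWELL_TIMES[level]:
        return level + 1, FATIGUE_SCORES[level + 1]  # escalate after D1/D2
    return level, FATIGUE_SCORES[level]    # level 3 (or dwell not yet reached)
```

A caller would invoke this each frame or judgment interval, tracking how long the current level has persisted.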

After the first score and the second score are calculated, in step S330, the hazard determination model 125 calculates a risk score based on the first score and the second score. In one embodiment, the hazard determination model 125 adds the first score and the second score to obtain the risk score. In other embodiments, the risk score may instead be calculated as: risk score = first score × first weight + second score × second weight, where first weight + second weight = 1.
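Both variants of the combination step can be sketched in a few lines (an illustration only; the function name and the choice of a single weight parameter are assumptions):

```python
def risk_score(first_score, second_score, first_weight=None):
    """Risk score from the behavior score and the gaze score.

    If no weight is given, the two scores are simply added (first embodiment).
    Otherwise: risk = first*w1 + second*w2 with w1 + w2 = 1 (other embodiments).
    """
    if first_weight is None:
        return first_score + second_score
    second_weight = 1.0 - first_weight     # weights sum to 1 by construction
    return first_score * first_weight + second_score * second_weight

# Worked example from the text: 100 + 80 = 180
print(risk_score(100, 80))  # -> 180
```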

After the risk score is obtained, in step S335, a corresponding warning signal is issued according to the risk score. The system may further be configured to keep issuing the warning signal until the behavior posture is detected to return to the normal state and the gaze direction is detected to return to the designated direction. The warning may be divided into multiple warning levels, each of which issues a different warning signal.

For example, four warning levels are described below. If the risk score is between 60 and 80, the current danger level is determined to be the first warning level, whose warning signals include a flashing-light signal and a short prompt tone (for example, one or two short beeps). If the risk score is between 80 and 100, the current danger level is the second warning level, whose warning signals include the flashing-light signal and a sustained first long prompt tone. If the risk score is between 100 and 200, the current danger level is the third warning level, whose warning signals include the flashing-light signal and a sustained second long prompt tone (for example, a sharp long tone). If the risk score is greater than 200, the current danger level is the fourth warning level: a notification signal is sent to a remote control center, and the flashing-light signal and the second long prompt tone may also be issued at the same time.
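The threshold mapping above can be sketched as a small lookup function. The thresholds follow the four-level example in the text; how the exact boundary values (60, 80, 100, 200) are assigned to a level is not specified in the text, so the inclusive/exclusive handling here is an assumption:

```python
def warning_level(risk):
    """Map a risk score to the example warning levels (0 = no warning)."""
    if risk > 200:
        return 4  # notify remote control center; flashing light + second long tone
    if risk >= 100:
        return 3  # flashing light + sustained second (sharp) long tone
    if risk >= 80:
        return 2  # flashing light + sustained first long tone
    if risk >= 60:
        return 1  # flashing light + short beep(s)
    return 0      # below the first threshold

# Worked example from the text: risk score 180 -> third warning level
print(warning_level(180))  # -> 3
```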

For example, if the behavior recognition model 121 detects that the driver has exhibited fatigue behavior for 8 seconds (first score: 100 points) while the line-of-sight recognition model 123 detects that the line-of-sight features have matched the line-of-sight fatigue features for 5 seconds (second score: 80 points), the risk score is 180 points. The current danger level is therefore determined to be the third warning level, whose warning signals include the flashing-light signal and the sustained second long prompt tone (for example, a sharp long tone).

In one embodiment, when the risk score is greater than 200, the processor 110 may send the notification signal to the remote control center through a wireless communication technology such as Wi-Fi (wireless fidelity), to facilitate subsequent tracking and correction or to provide immediate rescue. The image sequence captured by the image capturing device 130 may also be transmitted to the remote control center.

In addition to issuing the warning signal, the system may further directly lock the operation of the center console (for example, the navigation screen and music controls).

In summary, the present invention uses a behavior recognition model and a line-of-sight recognition model to recognize the behavior posture and the gaze direction simultaneously, thereby obtaining more accurate recognition results, and issues a corresponding warning signal based on the risk score, which can alert both the driver and vehicles approaching from behind. Furthermore, using depth features and vector analysis to train the behavior recognition model and the line-of-sight recognition model, respectively, improves recognition accuracy.

100: image recognition device; 110: processor; 120: storage device; 121: behavior recognition model; 123: line-of-sight recognition model; 125: hazard determination model; 130: image capturing device; N0~N17: feature points; S305~S335: steps of the image recognition method

FIG. 1 is a block diagram of an image recognition device according to an embodiment of the invention. FIG. 2 is a schematic diagram of feature points identified based on the OpenPose model according to an embodiment of the invention. FIG. 3 is a flowchart of an image recognition method according to an embodiment of the invention.

S305~S335: steps of the image recognition method

Claims (18)

1. An image recognition method, comprising: obtaining an image sequence from an image capturing device; and performing, by an electronic device, steps comprising: inputting the image sequence into a behavior recognition model to obtain a behavior posture of a user; calculating a first score corresponding to the behavior posture; inputting the image sequence into a line-of-sight recognition model to obtain a gaze direction of the user; calculating a second score corresponding to the line-of-sight feature; calculating a risk score based on the first score and the second score; and issuing a corresponding warning signal according to the risk score.

2. The image recognition method according to claim 1, further comprising, after obtaining the behavior posture of the user: in a case where the behavior posture is determined to be a distracted behavior, calculating the first score corresponding to the behavior posture by: setting the first score to a main score corresponding to the distracted behavior; and, while the behavior posture continues to be determined to be the distracted behavior, adding an auxiliary score corresponding to the main score to the first score each time a first judgment time elapses.

3. The image recognition method according to claim 1, further comprising, after obtaining the behavior posture of the user: in a case where the behavior posture is determined to be a fatigue behavior, setting the first score to a first preset score corresponding to a fatigue level of 1; in a case where, after a first preset time has elapsed at the fatigue level of 1, the behavior posture is determined to still be the fatigue behavior, setting the first score to a second preset score corresponding to the fatigue level of 2, wherein the second preset score is greater than the first preset score; and, in a case where, after a second preset time has elapsed at the fatigue level of 2, the behavior posture is determined to still be the fatigue behavior, setting the first score to a third preset score corresponding to the fatigue level of 3, wherein the third preset score is greater than the second preset score.

4. The image recognition method according to claim 1, wherein the step of inputting the image sequence into the line-of-sight recognition model to obtain the gaze direction of the user comprises: using a plurality of feature points on the user's face identified by the line-of-sight recognition model as a line-of-sight feature, and determining the gaze direction based on the line-of-sight feature; and wherein, after the gaze direction of the user is obtained, the method further comprises: determining, based on the line-of-sight feature, whether the user is not facing a designated direction; and, when the user is determined not to be facing the designated direction, calculating the second score by: adding a specified score to the second score each time a second judgment time elapses.

5. The image recognition method according to claim 1, wherein the step of inputting the image sequence into the line-of-sight recognition model to obtain the gaze direction of the user comprises: using a plurality of feature points on the user's face identified by the line-of-sight recognition model as a line-of-sight feature, and determining the gaze direction based on the line-of-sight feature; and wherein, after the gaze direction of the user is obtained, the method further comprises: determining, based on the line-of-sight feature, whether a line-of-sight fatigue feature is matched; when the line-of-sight feature is determined to match the line-of-sight fatigue feature, setting the second score to a first preset score corresponding to a fatigue level of 1; when, after a first preset time has elapsed at the fatigue level of 1, the line-of-sight feature is determined to still match the line-of-sight fatigue feature, setting the second score to a second preset score corresponding to the fatigue level of 2, wherein the second preset score is greater than the first preset score; and, when, after a second preset time has elapsed at the fatigue level of 2, the line-of-sight feature is determined to still match the line-of-sight fatigue feature, setting the second score to a third preset score corresponding to the fatigue level of 3, wherein the third preset score is greater than the second preset score.

6. The image recognition method according to claim 1, further comprising: resetting the first score to zero when the behavior posture is detected to return to a normal state; and resetting the second score to zero when the gaze direction is detected to return to a designated direction.

7. The image recognition method according to claim 1, further comprising, after issuing the corresponding warning signal according to the risk score: continuing to issue the warning signal until the behavior posture is detected to return to a normal state and the gaze direction is detected to return to a designated direction.

8. The image recognition method according to claim 1, wherein the step of issuing the corresponding warning signal according to the risk score comprises: determining, based on the risk score, whether a current danger level is a first warning level, a second warning level, a third warning level, or a fourth warning level; when the current danger level is determined to be the first warning level, issuing a flashing-light signal and a short prompt tone; when the current danger level is determined to be the second warning level, issuing the flashing-light signal and continuously issuing a first long prompt tone; when the current danger level is determined to be the third warning level, issuing the flashing-light signal and continuously issuing a second long prompt tone; and, when the current danger level is determined to be the fourth warning level, sending a notification signal to a remote control center.

9. The image recognition method according to claim 1, wherein depth features of the user in the image sequence are marked by the behavior recognition model, and a plurality of feature points of the user in the image sequence are found by the line-of-sight recognition model.

10. An image recognition device, comprising: an image capturing device that captures an image sequence; a storage device that includes a behavior recognition model, a line-of-sight recognition model, and a hazard determination model; and a processor, coupled to the image capturing device and the storage device, configured to: obtain the image sequence from the image capturing device; input the image sequence into the behavior recognition model to obtain a behavior posture of a user; input the image sequence into the line-of-sight recognition model to obtain a gaze direction of the user; and use the hazard determination model to calculate a first score corresponding to the behavior posture and a second score corresponding to the line-of-sight feature, then calculate a risk score based on the first score and the second score, and issue a corresponding warning signal according to the risk score.

11. The image recognition device according to claim 10, wherein the processor is configured to: in a case where the behavior recognition model determines that the behavior posture is a distracted behavior, calculate the first score corresponding to the behavior posture through the hazard determination model by: setting the first score to a main score corresponding to the distracted behavior; and, while the behavior posture continues to be determined to be the distracted behavior, adding an auxiliary score corresponding to the main score to the first score each time a first judgment time elapses.

12. The image recognition device according to claim 10, wherein the processor is configured to: in a case where the behavior recognition model determines that the behavior posture is a fatigue behavior, set, through the hazard determination model, the first score to a first preset score corresponding to a fatigue level of 1; in a case where, after a first preset time has elapsed at the fatigue level of 1, the behavior recognition model determines that the behavior posture is still the fatigue behavior, set, through the hazard determination model, the first score to a second preset score corresponding to the fatigue level of 2, wherein the second preset score is greater than the first preset score; and, in a case where, after a second preset time has elapsed at the fatigue level of 2, the behavior recognition model determines that the behavior posture is still the fatigue behavior, set, through the hazard determination model, the first score to a third preset score corresponding to the fatigue level of 3, wherein the third preset score is greater than the second preset score.

13. The image recognition device according to claim 10, wherein the processor is configured to: use a plurality of feature points on the user's face identified by the line-of-sight recognition model as a line-of-sight feature, and determine the gaze direction based on the line-of-sight feature; use the line-of-sight recognition model to determine, based on the line-of-sight feature, whether the user is not facing a designated direction; and, when the user is determined not to be facing the designated direction, calculate the second score through the hazard determination model by: adding a specified score to the second score each time a second judgment time elapses.

14. The image recognition device according to claim 10, wherein the processor is configured to: use a plurality of feature points on the user's face identified by the line-of-sight recognition model as a line-of-sight feature, and determine the gaze direction based on the line-of-sight feature; use the line-of-sight recognition model to determine, based on the line-of-sight feature, whether a line-of-sight fatigue feature is matched; when the line-of-sight recognition model determines that the line-of-sight feature matches the line-of-sight fatigue feature, set, through the hazard determination model, the second score to a first preset score corresponding to a fatigue level of 1; when, after a first preset time has elapsed at the fatigue level of 1, the line-of-sight recognition model determines that the line-of-sight feature still matches the line-of-sight fatigue feature, set, through the hazard determination model, the second score to a second preset score corresponding to the fatigue level of 2, wherein the second preset score is greater than the first preset score; and, when, after a second preset time has elapsed at the fatigue level of 2, the line-of-sight recognition model determines that the line-of-sight feature still matches the line-of-sight fatigue feature, set, through the hazard determination model, the second score to a third preset score corresponding to the fatigue level of 3, wherein the third preset score is greater than the second preset score.

15. The image recognition device according to claim 10, wherein the processor is configured to: reset the first score to zero through the hazard determination model when the behavior posture is detected to return to a normal state; and reset the second score to zero through the hazard determination model when the gaze direction is detected to return to a designated direction.

16. The image recognition device according to claim 10, wherein the processor is configured to: continue to issue the warning signal through the hazard determination model until the behavior posture is detected to return to a normal state and the gaze direction is detected to return to a designated direction.

17. The image recognition device according to claim 10, wherein the processor is configured to: determine, through the hazard determination model and based on the risk score, whether a current danger level is a first warning level, a second warning level, a third warning level, or a fourth warning level; when the current danger level is determined to be the first warning level, issue a flashing-light signal and a short prompt tone; when the current danger level is determined to be the second warning level, issue the flashing-light signal and continuously issue a first long prompt tone; when the current danger level is determined to be the third warning level, issue the flashing-light signal and continuously issue a second long prompt tone; and, when the current danger level is determined to be the fourth warning level, send a notification signal to a remote control center.

18. The image recognition device according to claim 10, wherein the processor is configured to: mark depth features of the user in the image sequence through the behavior recognition model, and find a plurality of feature points of the user in the image sequence through the line-of-sight recognition model.
TW109141300A 2020-11-25 2020-11-25 Image recognition method and apparatus TWI739675B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW109141300A TWI739675B (en) 2020-11-25 2020-11-25 Image recognition method and apparatus
CN202110509547.4A CN113221734A (en) 2020-11-25 2021-05-11 Image recognition method and device


Publications (2)

Publication Number Publication Date
TWI739675B true TWI739675B (en) 2021-09-11
TW202221563A TW202221563A (en) 2022-06-01


Country Status (2)

Country Link
CN (1) CN113221734A (en)
TW (1) TWI739675B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201803755A (en) * 2016-07-01 2018-02-01 三星電子股份有限公司 Apparatus and method for a vehicle platform
TW201912476A (en) * 2017-09-01 2019-04-01 元智大學 Method, device, and system for driving assistant based on fuzzy-set-optimized framework

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN109803583A (en) * 2017-08-10 2019-05-24 北京市商汤科技开发有限公司 Driver monitoring method, apparatus and electronic equipment
CN108839658B (en) * 2018-05-07 2021-03-26 威马智慧出行科技(上海)有限公司 Fatigue driving monitoring system and method
CN109413351B (en) * 2018-10-26 2021-07-13 平安科技(深圳)有限公司 Music generation method and device
CN109558865A (en) * 2019-01-22 2019-04-02 郭道宁 A kind of abnormal state detection method to the special caregiver of need based on human body key point
CN111950371B (en) * 2020-07-10 2023-05-19 上海淇毓信息科技有限公司 Fatigue driving early warning method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN113221734A (en) 2021-08-06
TW202221563A (en) 2022-06-01
