TWI831524B

TWI831524B - System and method for abnormal driving behavior detection based on spatial-temporal relationship between objects

Info

Publication number: TWI831524B
Application number: TW111148322A
Authority: TW
Inventors: 張傳旺; 黃譯興; 張雍晨; 王秉濂
Original assignee: 國立勤益科技大學
Priority date: 2022-12-15
Filing date: 2022-12-15
Publication date: 2024-02-01

Abstract

A system and a method for abnormal driving behavior detection based on the spatial-temporal relationship between objects are disclosed. The system comprises: an in-cockpit device for capturing continuous images of the cockpit and presenting the images on a webpage in real time; a system host for receiving the images and determining whether abnormal driving behavior is present in them, so as to generate a voice alert and an abnormal event alert notification; multiple mobile phones for receiving the abnormal event alert notification; and a monitoring screen for playing the in-cockpit images in real time via the webpage and displaying the abnormal event alert notification from the system host in real time. The present invention can effectively determine the occurrence of abnormal driving behavior and issue notifications and alerts.

Description

Dangerous driving behavior detection system and method based on spatio-temporal relationship of objects

The present invention relates to a driving behavior detection system and method, and in particular to a dangerous driving behavior detection system and method based on the spatio-temporal relationship between objects.

Automobiles are among the most important means of transportation in daily life. Media reports and numerous surveys show that driver distraction and fatigue are major causes of accidents. In recent years, traffic accidents caused by driver distraction have occurred again and again, so the development and deployment of safe-driving monitoring systems has received increasing attention. Conventional in-vehicle surveillance cameras can record and play back video, but because they only record the monitored area, they cannot provide an immediate warning or reminder when an abnormal event suddenly occurs. Some existing systems do detect poor driving behavior of the driver in the cockpit, but because of their heavy computation and complex architecture they are limited to detecting facial expressions, and they cannot let third parties other than the driver learn of the dangerous driving behavior in real time.

In view of this, one object of the present invention is to provide a dangerous driving behavior detection system and method based on the spatio-temporal relationship between objects, so as to solve the above long-standing problems.

To achieve the above object, the present invention provides a dangerous driving behavior detection system based on the spatio-temporal relationship between objects, comprising: an in-cockpit device, disposed inside a cockpit and comprising at least a miniature camera device with a network transmission function and a microprocessor with a wireless network transmission function, wherein the miniature camera device has a camera lens for capturing an image of the cockpit, the image being a continuous sequence of frames, and the microprocessor presents the image on a web page using MJPG (Motion JPG) real-time video streaming; a system host, which receives the image and performs a dangerous driving behavior detection (abnormal driving behavior detection) step on the image using a deep learning model architecture, the step comprising defining a plurality of spatial events of a plurality of objects according to the positional relationships and overlaps of the objects in the cockpit, and determining, according to the temporal relationships of the spatial events, whether a dangerous driving behavior appears in the image, wherein if the dangerous driving behavior appears, the system host issues a dangerous event warning notification and transmits the type of the dangerous driving behavior to the microprocessor in the cockpit, so that a voice playback unit in the cockpit produces a corresponding voice warning; a plurality of mobile phones, held respectively by a driver and by a supervisor of the company to which the driver belongs, for receiving the dangerous event warning notification issued by the system host through communication software having a message alert function; and a monitoring screen for playing, in real time via the web page, the image captured by the miniature camera device and for displaying, in real time, the dangerous event warning notification issued by the system host.

The dangerous driving behavior detection step uses the YOLOv5 architecture with Mosaic data augmentation to train a plurality of object detection models for detecting the objects, and determines whether the dangerous driving behavior appears in the image by defining the spatial events of the objects and the temporal relationships of those spatial events.

The system host records the time at which the dangerous driving behavior appears in the image, together with its type, in a database for use in assessing and managing the driver.

The dangerous driving behavior detection step defines the spatial events of the objects according to whether the positional relationships and overlaps of the objects in the cockpit exceed a corresponding first preset value, and determines whether the dangerous driving behavior appears in the image according to whether the duration of the spatial events exceeds a corresponding second preset value.

The objects comprise a plurality of items in the cockpit and a plurality of body features of the driver; the items are food, a seat belt, a steering wheel, and a mobile phone, and the body features are the face, the eyes, and the hands.

To achieve one of the above objects, the present invention provides a dangerous driving behavior detection method based on the spatio-temporal relationship between objects, comprising the following steps: performing an image capturing and real-time streaming step, in which a miniature camera device with a network transmission function and a camera lens captures an image of a cockpit, the image being a continuous sequence of frames, and a microprocessor with a wireless network transmission function presents the image on a web page using MJPG real-time video streaming; performing a dangerous driving behavior detection (abnormal driving behavior detection) step, in which a system host receives the image and performs the detection on the image using a deep learning model architecture, the step comprising defining a plurality of spatial events of a plurality of objects according to the positional relationships and overlaps of the objects in the cockpit, and determining, according to the temporal relationships of the spatial events, whether a dangerous driving behavior appears in the image, wherein if the dangerous driving behavior appears, the system host issues a dangerous event warning notification and transmits the type of the dangerous driving behavior to the microprocessor in the cockpit, so that a voice playback unit in the cockpit produces a corresponding voice warning; performing a communication software alert step, in which a plurality of mobile phones, held respectively by a driver and by a supervisor of the company to which the driver belongs, receive the dangerous event warning notification issued by the system host through communication software having a message alert function; and performing a monitoring step, in which a monitoring screen plays, in real time via the web page, the image captured by the miniature camera device and displays, in real time, the dangerous event warning notification issued by the system host.

In the method, the dangerous driving behavior detection step uses the YOLOv5 architecture with Mosaic data augmentation to train a plurality of object detection models for detecting the objects, and determines whether the dangerous driving behavior appears in the image by defining the spatial events of the objects and the temporal relationships of those spatial events.

The objects comprise a plurality of items in the cockpit and a plurality of body features of the driver.

In summary, the dangerous driving behavior detection system and method based on the spatio-temporal relationship between objects of the present invention have the following advantages:

(1) Feasibility: the system is lightweight, has a simple architecture, and achieves real-time recognition. The YOLOv5 deep learning model is used only for object detection; dangerous driving behaviors are identified from the spatial and temporal relationships of the detected objects, which requires little computation, so recognition can run in real time and the feasibility of the system is greatly improved.

(2) Extensibility: new types of dangerous driving behavior can be added easily, making the recognition capability of the system more comprehensive. Adding a type only requires training an object detection model and defining the spatio-temporal relationship between objects for the event.

(3) Practicality: the system provides real-time monitoring, real-time recognition of dangerous driving behavior, and real-time notification. It can tell the driver to stop a dangerous behavior in the shortest possible time, effectively reducing the incidence of traffic accidents. In addition, the system records each driver's dangerous events in a database, and these records can be used by the company to manage and assess its drivers.

To give the examiner a further understanding of the technical features of the present invention and the technical effects it can achieve, preferred embodiments are described in detail below.

To facilitate understanding of the technical features, content, and advantages of the present invention and the effects it can achieve, the invention is described in detail below with reference to the drawings and in the form of embodiments. The drawings are intended only for illustration and to support the specification; they do not necessarily reflect the true proportions or precise configuration of the invention as implemented, and the proportions and arrangement shown in them should not be read as limiting the scope of the invention in actual practice. For ease of understanding, identical elements in the following embodiments are denoted by identical reference numerals.

In addition, unless otherwise noted, the terms used throughout the specification and the claims have the ordinary meaning of each term as used in this field, in the content disclosed herein, and in its specific context. Certain terms used to describe the invention are discussed below or elsewhere in this specification to give those skilled in the art additional guidance regarding the description of the invention.

The terms "first", "second", "third", and the like, as used herein, do not denote any particular order or sequence and are not intended to limit the invention; they serve only to distinguish components or operations described with the same technical terms.

Furthermore, the terms "comprise", "include", "have", "contain", and the like, as used herein, are open-ended terms meaning including but not limited to.

The dangerous driving behavior detection system based on the spatio-temporal relationship between objects of the present invention comprises four parts. The first part is the camera device and the microprocessor with a wireless network transmission function installed in the cockpit; it captures images of the cockpit and transmits them over a 5G wireless network to the system host of the second part. When the system host detects a dangerous event, it sends the category of the dangerous behavior to the cockpit, where a voice warning is issued to remind the driver to stop the dangerous behavior immediately. The second part is the system host, located in a server room. The system host is mainly responsible for two tasks: object detection and dangerous driving behavior detection (abnormal driving behavior detection).

Before the dangerous driving behavior detection system is deployed, a series of training procedures is carried out so that the system host can identify dangerous driving behaviors from continuous images. Once a dangerous driving behavior is identified, the time and type of the event are recorded in a database, a message is sent through Line Notify to the mobile phones of the driver and of a preconfigured company supervisor, and the event is also sent to the monitoring screen at the company so that the relevant personnel can issue the necessary reminders in time.

The third and fourth parts are, respectively, the mobile phones of the driver and the company supervisor and the monitoring screen at the company. Their function is to receive the dangerous event warning notifications issued by the system host. The monitoring screen at the company can also show, in real time, the video returned from the cockpit, allowing the company to manage its drivers effectively.

Specifically, refer to Figure 1 and Figure 2. Figure 1 is a schematic diagram of the dangerous driving behavior detection system based on the spatio-temporal relationship between objects of the present invention, and Figure 2 is a schematic diagram of the corresponding dangerous driving behavior detection method. The dangerous driving behavior detection system 10 consists mainly of four parts: an in-cockpit device 20, a system host 30, a plurality of mobile phones 40, and a monitoring screen 50.

The in-cockpit device 20 performs an image capturing and real-time streaming step S100. It is disposed inside the cockpit of a vehicle such as a car and comprises at least a miniature camera device 22 with a network transmission function and a microprocessor 24 with a wireless network transmission function. The miniature camera device 22 has a camera lens for capturing an image of the cockpit, the image being a continuous sequence of frames. The microprocessor 24 presents the image on a web page using MJPG real-time video streaming. The web framework is preferably lightweight, concise, and highly extensible, so that, combined with network transmission technology, serving the streaming web page greatly reduces the load that conventional techniques cannot handle. MJPG real-time video streaming is available as an open source project that accesses, over HTTP, a camera compatible with the wireless network transmission technology (that is, the miniature camera device 22), thereby achieving remote video transmission; real-time streaming is obtained by setting the action parameter of the web page to Stream. The in-cockpit device 20 optionally further comprises a voice playback unit 26, which is, for example, an audio output device such as a speaker and is, for example, electrically connected to the microprocessor 24. The voice playback unit 26 receives the voice command transmitted by the microprocessor 24 and produces the corresponding speech.
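As a rough illustration of how MJPG streaming over HTTP can work, the following Python sketch formats JPEG frames as parts of a multipart/x-mixed-replace response, which is a mechanism commonly used for MJPEG over HTTP. The boundary token and the function names are hypothetical and are not taken from this disclosure; the streaming software actually used in the embodiment may differ.

```python
BOUNDARY = b"frame"  # hypothetical boundary token, chosen only for illustration


def response_headers() -> bytes:
    """Return HTTP response headers announcing a multipart MJPEG stream."""
    return (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: multipart/x-mixed-replace; boundary=" + BOUNDARY + b"\r\n"
        b"Cache-Control: no-cache\r\n"
        b"\r\n"
    )


def mjpeg_part(jpeg_bytes: bytes) -> bytes:
    """Wrap one already-encoded JPEG frame as a single part of the stream."""
    return (
        b"--" + BOUNDARY + b"\r\n"
        b"Content-Type: image/jpeg\r\n"
        b"Content-Length: " + str(len(jpeg_bytes)).encode("ascii") + b"\r\n"
        b"\r\n" + jpeg_bytes + b"\r\n"
    )
```

A server would send `response_headers()` once and then write `mjpeg_part(frame)` for every captured frame; the browser replaces the displayed image each time a new part arrives.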

In the dangerous driving behavior detection system 10, the system host 30 performs a dangerous driving behavior detection (abnormal driving behavior detection) step S200 and is located, for example, on the premises of the company to which the driver belongs. The system host 30 receives the image transmitted by the in-cockpit device 20 and performs step S200 on the image using a deep learning model architecture. Step S200 comprises defining a plurality of spatial events of a plurality of objects in the image according to the positional relationships and overlaps of the objects in the cockpit, and determining, according to the temporal relation of the spatial events, whether a dangerous driving behavior appears in the image. If a dangerous driving behavior appears, the system host 30 issues a dangerous event warning notification and transmits the type of the dangerous driving behavior to the microprocessor 24 in the cockpit. The microprocessor 24 then generates a voice command corresponding to that type, so that the voice playback unit 26 in the cockpit produces the corresponding voice warning. The cockpit image consists of a series of consecutive frames. After YOLOv5 object detection, a frame may contain several objects, such as a face, eyes, hands, a seat belt, a mobile phone, and food. For example, as shown in Figure 3, these objects may be object a (left hand), object b (right hand), object c (steering wheel), object d (mobile phone), object e (left eye), object f (right eye), and so on.

Examples of dangerous driving behaviors and the corresponding voice warnings are as follows:

(1) Dozing off (both eyes closed): if the driver's eyes are recognized as closed, the driver is reminded by voice, "Don't fall asleep."

(2) Using a mobile phone: if the driver is recognized as using a mobile phone, the driver is reminded by voice, "Please do not use your mobile phone while driving."

(3) Seat belt not fastened: if the driver is recognized as not wearing a seat belt, that is, no seat belt is detected, the driver is reminded by voice, "Please fasten your seat belt."

(4) Hands off the steering wheel: if the driver is recognized as not holding the steering wheel, that is, the hands are not detected, the driver is reminded by voice, "Please hold the steering wheel firmly."

(5) Driver not detected: if the driver's face cannot be recognized, the driver is reminded by voice, "No driver detected, please look straight ahead."

These are examples only and are not intended to limit the present invention.
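The correspondence between behavior types and voice warnings listed above could be held in a simple lookup table. The following Python sketch is only an illustration: the behavior keys and the function name are hypothetical, and the message strings are English renderings of the examples given in the text.

```python
from typing import Optional

# Hypothetical behavior-type keys mapped to the example voice warnings.
VOICE_ALERTS = {
    "drowsy": "Don't fall asleep",
    "phone_use": "Please do not use your mobile phone while driving",
    "no_seat_belt": "Please fasten your seat belt",
    "hands_off_wheel": "Please hold the steering wheel firmly",
    "no_driver": "No driver detected, please look straight ahead",
}


def voice_alert_for(behavior_type: str) -> Optional[str]:
    """Return the warning text for a behavior type, or None if it is unknown."""
    return VOICE_ALERTS.get(behavior_type)
```

With such a table, the microprocessor in the cockpit would only need to receive the behavior type from the system host in order to select the speech to play.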

In addition, the dangerous driving behavior detection step described above uses, for example, the YOLOv5 architecture with Mosaic data augmentation to train a plurality of object detection models for detecting the objects, and determines whether the dangerous driving behavior appears in the image by defining the spatial events of the objects and the temporal relationships of those spatial events.

The present invention defines many different spatial events according to the positional relationships and overlaps in space of the various objects (for example, object a (left hand), object b (right hand), object c (steering wheel), object d (mobile phone), object e (left eye), and object f (right eye)). For example, if object a (Object-LHand) and object b (Object-RHand) have no overlap at all with object c (Object-Steering Wheel), this is treated as a hands-off-the-steering-wheel event. As another example, if object a (Object-LHand) or object b (Object-RHand) overlaps object d (Object-Mobile Phone), this is treated as an event of operating or answering the phone. The overlap between objects is easily obtained from the overlap of their bounding boxes. As shown in Figure 4, the overlap of two objects is judged by the area of their overlapping region divided by the total area of the two bounding boxes, expressed as a percentage. The threshold percentage can be set as required and may be, for example, any value between 1% and 100%. The present invention also considers the temporal relation of spatial events. Once a spatial event has been identified, its duration is counted in image frames (frame count). When the duration exceeds the establishment time defined for that event (that is, the number of frames of the spatial event exceeds the defined value), the event is judged to be a dangerous driving behavior, for example holding a mobile phone for more than 5 seconds (125 frames), as shown in Figure 4, or keeping the eyes closed for more than 2 seconds (50 frames), as shown in Figure 5. For example, the program keeps a counter for each dangerous driving behavior, and the counter is reset to zero once that behavior has been caught; when the frame count meets the condition, a screenshot is taken and the voice warning is played.
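The overlap test and the two example spatial events can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: boxes are assumed to be (x1, y1, x2, y2) tuples, the object names and the 1% default threshold are hypothetical, and an object that is not detected is assumed to overlap nothing. The ratio follows the literal reading of the text (intersection area divided by the sum of the two box areas), under which two identical boxes give 0.5; if the union of the two boxes were intended as the denominator instead, the measure would be the familiar intersection over union.

```python
from typing import Dict, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed x1 <= x2, y1 <= y2


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by the sum of the two box areas."""
    total = area(a) + area(b)
    return intersection_area(a, b) / total if total > 0 else 0.0


def _overlaps(boxes: Dict[str, Box], first: str, second: str, threshold: float) -> bool:
    # Assumption: a missing detection never overlaps anything.
    if first not in boxes or second not in boxes:
        return False
    return overlap_ratio(boxes[first], boxes[second]) > threshold


def spatial_events(boxes: Dict[str, Box], threshold: float = 0.01) -> Set[str]:
    """Derive the two example spatial events from one frame's detections."""
    events: Set[str] = set()
    hands = ("left_hand", "right_hand")
    if not any(_overlaps(boxes, hand, "steering_wheel", threshold) for hand in hands):
        events.add("hands_off_wheel")  # neither hand overlaps the wheel
    if any(_overlaps(boxes, hand, "mobile_phone", threshold) for hand in hands):
        events.add("phone_use")  # either hand overlaps the phone
    return events
```

Because each frame only requires a few rectangle intersections after detection, this part of the pipeline adds very little computation on top of the object detector.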

The objects described above comprise a plurality of items appearing in the cockpit, such as food, a seat belt, a steering wheel, and a mobile phone, and a plurality of body features of the driver, such as the face, the eyes, and the hands. The dangerous driving behavior detection step defines the spatial events of the objects according to whether the positional relationships and overlaps of the objects in the cockpit exceed a corresponding preset value (for example, a first preset value, which may be any value within the percentage range given above for the overlap), and determines whether the dangerous driving behavior appears in the image according to whether the duration of the spatial events exceeds a corresponding preset value (for example, a second preset value).
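The duration test against the second preset value can be sketched as a per-event frame counter. The following Python sketch makes several assumptions that are not stated in the text: the class and event names are hypothetical, a counter is reset whenever its spatial event is absent from a frame (that is, only consecutive frames are counted), and the counter is also reset after the behavior has been reported. The thresholds of 125 and 50 frames are the examples given in the description.

```python
from typing import Dict, Iterable, List


class EventTimer:
    """Counts consecutive frames per spatial event and reports behaviors."""

    def __init__(self, frames_required: Dict[str, int]) -> None:
        self.frames_required = dict(frames_required)
        self.counts = {name: 0 for name in frames_required}

    def update(self, active_events: Iterable[str]) -> List[str]:
        """Process one frame; return the behaviors established by this frame."""
        active = set(active_events)
        triggered = []
        for name, required in self.frames_required.items():
            if name in active:
                self.counts[name] += 1
                if self.counts[name] > required:  # duration exceeds the preset value
                    triggered.append(name)
                    self.counts[name] = 0  # reset once the behavior has been caught
            else:
                self.counts[name] = 0  # assumption: an interruption restarts the count
        return triggered


# Example thresholds from the description: 125 frames (5 s) and 50 frames (2 s).
timer = EventTimer({"phone_use": 125, "eyes_closed": 50})
```

Calling `timer.update(...)` once per frame with the spatial events found in that frame returns an empty list until an event has persisted for more than its required number of frames.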

In the dangerous driving behavior detection system of the present invention, the plurality of mobile phones 40 are held respectively by the driver and by a supervisor of the company to which the driver belongs. The mobile phones 40 perform a communication software alert step S300, for example by receiving the dangerous event warning notification issued by the system host 30 through communication software having a message alert function. The communication software is, for example, the Line instant messaging platform, and the message alert function is, for example, its Line Notify function. Thus, as soon as a dangerous driving behavior is identified, the detection system sends a picture of the behavior and a warning message through Line Notify, and the mobile phones 40 of the driver and the supervisor display the dangerous event warning notification, as shown in Figure 5. The notification may include, for example, a screenshot of the image corresponding to the dangerous event and a description of the event.
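A text-only notification request of the kind described above might be built as in the following Python sketch. It assumes the publicly documented LINE Notify HTTP endpoint, a bearer access token, and a `message` form field; the function name and the token value are hypothetical. Attaching the screenshot would require a multipart upload, which is omitted here, and the availability of the service should be verified before relying on it.

```python
import urllib.parse
import urllib.request

# Endpoint as documented for LINE Notify; treat it as an assumption to verify.
LINE_NOTIFY_URL = "https://notify-api.line.me/api/notify"


def build_notify_request(token: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a text warning."""
    data = urllib.parse.urlencode({"message": message}).encode("utf-8")
    return urllib.request.Request(
        LINE_NOTIFY_URL,
        data=data,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

The request would be sent with `urllib.request.urlopen(request)`; building and sending are kept separate here so that the construction can be checked without network access.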

In addition, the dangerous driving behavior detection system of the present invention further has a monitoring screen 50 for performing a monitoring step S400. The monitoring screen 50 is located, for example, on the premises of the company to which the driver belongs. It plays, in real time via the web page, the image captured by the miniature camera device 22 and displays, in real time, the dangerous event warning notification issued by the system host 30. The notification may include, for example, a screenshot of the image corresponding to the dangerous event and a description of the event. Moreover, the system host 30 may optionally record, in a database 32, the time at which the dangerous driving behavior appears in the image captured by the miniature camera device 22, together with its type, for use in assessing and managing the driver.
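Recording the time and type of each event in the database 32 could look like the following Python sketch. The description does not specify a database engine or schema; SQLite, the table name, the column names, and the `driver_id` field are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime
from typing import List, Tuple


def open_event_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) events table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dangerous_events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "driver_id TEXT NOT NULL, "
        "behavior_type TEXT NOT NULL, "
        "occurred_at TEXT NOT NULL)"
    )
    return conn


def record_event(conn: sqlite3.Connection, driver_id: str,
                 behavior_type: str, occurred_at: datetime) -> None:
    """Store the time and type of one dangerous driving behavior."""
    conn.execute(
        "INSERT INTO dangerous_events (driver_id, behavior_type, occurred_at) "
        "VALUES (?, ?, ?)",
        (driver_id, behavior_type, occurred_at.isoformat()),
    )
    conn.commit()


def events_for_driver(conn: sqlite3.Connection, driver_id: str) -> List[Tuple[str, str]]:
    """Return (behavior_type, occurred_at) rows for one driver, oldest first."""
    cursor = conn.execute(
        "SELECT behavior_type, occurred_at FROM dangerous_events "
        "WHERE driver_id = ? ORDER BY occurred_at",
        (driver_id,),
    )
    return cursor.fetchall()
```

Querying by driver is what would support the assessment and management use mentioned in the text.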

In one embodiment, in order to detect both the environment and the driver, the present invention trains on the entire frame of each image. The training data is prepared with the front seats of the cockpit as the main reference: the driver's gaze is trained with eyes clearly looking straight ahead as the baseline, the seat belt is trained against different styles of clothing, and the steering wheel must be held on at least two sides.

The YOLOv5 architecture consists mainly of an input stage (Input), a backbone (Backbone), a neck (Neck) network, and an output stage (Head), and comes in four main model sizes: s, m, l, and x. For the input stage, the improvements applied during model training mainly include Mosaic data augmentation, adaptive anchor box calculation, and adaptive image scaling. That is, the input stage uses Mosaic data augmentation, which stitches images together using random scaling, random cropping, and random arrangement, and which works well for detecting small targets.

The backbone extracts image features. It incorporates strategies from other detection algorithms, mainly the Focus structure and the CSP structure. For example, a 640 x 640 x 3 input first passes through a Focus module, which slices the image: the four adjacent pixel positions of each 2 x 2 neighborhood are sampled into four separate sub-images and stacked. The four sub-images are complementary, so no information is lost. The channel count becomes 4 times the original, so the stacked result has 12 channels instead of the original three RGB channels (that is, 320 x 320 x 12). A convolution layer is then applied to this result, yielding a 2x downsampled feature map with no loss of information.

The neck network mixes and combines image features to generate a feature pyramid. The YOLOv5 object detection network used in the present invention adds FPN and PAN structures between the backbone and the final output layer to further improve the diversity and robustness of the features. Specifically, features from different layers are re-aggregated along two paths, a top-down path (FPN) and a bottom-up path (PAN), which strengthens the discriminative structural features and forms a new feature pyramid.

The output stage makes predictions from the image features: it applies anchor boxes and generates the final output vectors containing confidence values and bounding boxes. The anchor box mechanism of the output layer is the same as in YOLOv4; the main improvements are the GIOU_Loss loss function used during training and DIOU_nms for filtering predicted boxes. For example, during training the network outputs predicted boxes on the basis of the anchor boxes, compares them with the ground truth, computes the difference between the two, and then back-propagates to iteratively update the network parameters. In each training run, YOLOv5 automatically computes the best anchor box values.
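
The Focus slicing described above can be illustrated with a small pure-Python sketch. It rearranges an H x W x C image (as nested lists) into an (H/2) x (W/2) x 4C result by stacking the channels of each 2 x 2 neighborhood. The function names and the order in which the four positions are stacked are assumptions for illustration; an actual network performs this on tensors and follows it with a convolution.

```python
def focus_slice(image):
    """Rearrange an H x W x C nested list into (H/2) x (W/2) x 4C.

    Every 2 x 2 pixel neighborhood is collapsed into one output pixel
    whose channels are the concatenated channels of the four pixels,
    so no value is discarded.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Assumed stacking order: top-left, bottom-left, top-right, bottom-right.
            row.append(
                image[y][x] + image[y + 1][x] + image[y][x + 1] + image[y + 1][x + 1]
            )
        out.append(row)
    return out


def focus_output_shape(height, width, channels):
    """Shape after slicing: half the resolution, four times the channels."""
    return height // 2, width // 2, channels * 4


# A 2 x 2 RGB image collapses to a single 12-channel pixel.
tiny = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]]]
sliced = focus_slice(tiny)
```

For the 640 x 640 x 3 input mentioned in the text, `focus_output_shape(640, 640, 3)` gives `(320, 320, 12)`, matching the 12-channel result described above.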

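The GIOU_Loss named above is not spelled out in the specification. As a hedged illustration, the sketch below uses the commonly cited generalized IoU definition, GIoU = IoU - (C - U) / C, where U is the union area of the two boxes and C is the area of the smallest box enclosing both, with the loss taken as 1 - GIoU. The box format `(x1, y1, x2, y2)` and the function names are assumptions.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Area of the smallest axis-aligned box enclosing both inputs.
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclosing - union) / enclosing


def giou_loss(predicted, truth):
    """Loss is 0 for a perfect match and grows as the boxes move apart."""
    return 1.0 - giou(predicted, truth)


perfect = giou_loss((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))
```

Unlike a plain IoU loss, this value keeps changing when the predicted box and the ground truth do not overlap at all, which is why it is useful as a training signal.
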
In summary, the dangerous driving behavior detection system and method based on the spatio-temporal relationship of objects of the present invention have the following advantages:

The above description is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the present invention shall fall within the scope of the appended claims.

10: Dangerous driving behavior detection system

20: In-cockpit equipment

22: Miniature camera device with network transmission function

24: Microprocessor with wireless network transmission function

26: Voice playback unit

30: System host

32: Database

40: Mobile phone

50: Monitoring screen

a, b, c, d, e, f: Objects

S100, S200, S300, S400: Steps

Figure 1 is a schematic diagram of the dangerous driving behavior detection system based on the spatio-temporal relationship of objects according to the present invention.

Figure 2 is a schematic diagram of the dangerous driving behavior detection method based on the spatio-temporal relationship of objects according to the present invention.

Figure 3 is a schematic diagram of the objects in a frame of the image captured by the dangerous driving behavior detection system of the present invention.

Figure 4 is a schematic diagram of the dangerous driving behavior detection system of the present invention identifying the dangerous driving behavior of using a mobile phone.

Figure 5 is a schematic diagram of the dangerous driving behavior detection system of the present invention presenting a dangerous event warning notification through the communication software of a mobile phone.

Claims

A dangerous driving behavior detection system based on the spatio-temporal relationship of objects, comprising: in-cockpit equipment, located inside a cockpit and including at least a miniature camera device with a network transmission function and a microprocessor with a wireless network transmission function, wherein the miniature camera device with the network transmission function has a camera lens for capturing an image of the cockpit interior, the image being a sequence of continuous frames, and the microprocessor with the wireless network transmission function presents the image on a web page using MJPG real-time image streaming technology; a system host that receives the image and uses a deep learning model architecture to perform a dangerous driving behavior detection (abnormal driving behavior detection) step on the image, the dangerous driving behavior detection step including defining a plurality of spatial events of a plurality of objects in the image according to the positional relationships and overlap of the objects in the cockpit, and determining, according to the temporal relationship of the spatial events, whether a dangerous driving behavior appears in the image, wherein, if the dangerous driving behavior appears, the system host issues a dangerous event warning notification and transmits the type of the dangerous driving behavior to the microprocessor with the wireless network transmission function in the cockpit, so that a voice playback unit in the cockpit produces a corresponding voice warning; a plurality of mobile phones, held respectively by a driver and by a supervisor at a company to which the driver belongs, for receiving, through communication software with a message alert function, the dangerous event warning notification issued by the system host; and a monitoring screen for playing in real time, via the web page, the image captured by the miniature camera device with the network transmission function, and for displaying in real time the dangerous event warning notification issued by the system host.

The dangerous driving behavior detection system based on the spatio-temporal relationship of objects as described in claim 1, wherein the dangerous driving behavior detection step uses the YOLOv5 architecture with Mosaic data augmentation to train a plurality of object detection models for detecting the objects, and determines whether the dangerous driving behavior appears in the image by defining the spatial events of the objects and the temporal relationship between the spatial events.

The dangerous driving behavior detection system based on the spatio-temporal relationship of objects as described in claim 1, wherein the system host records, in a database, the time at which the dangerous driving behavior appears in the image and the type of the dangerous driving behavior, for use in assessing and managing the driver.

The dangerous driving behavior detection system based on the spatio-temporal relationship of objects as described in claim 1, wherein the dangerous driving behavior detection step defines the spatial events of the objects according to whether the positional relationships and overlap of the objects in the cockpit exceed corresponding first preset values, and determines whether the dangerous driving behavior appears in the image according to whether the durations of the spatial events exceed corresponding second preset values.

The dangerous driving behavior detection system based on the spatio-temporal relationship of objects as described in claim 1, wherein the objects include a plurality of items in the cockpit and a plurality of human body features of the driver, the items being food, a seat belt, a steering wheel, and a mobile phone, and the human body features being a face, eyes, and hands.

A dangerous driving behavior detection method based on the spatio-temporal relationship of objects, comprising the following steps: performing an image capture and real-time streaming step, which uses a miniature camera device having a camera lens and a network transmission function to capture an image of a cockpit interior, the image being a sequence of continuous frames, and uses a microprocessor with a wireless network transmission function to present the image on a web page using MJPG real-time image streaming technology; performing a dangerous driving behavior detection (abnormal driving behavior detection) step, in which a system host receives the image and uses a deep learning model architecture to perform the dangerous driving behavior detection step on the image, the dangerous driving behavior detection step including defining a plurality of spatial events of a plurality of objects in the image according to the positional relationships and overlap of the objects in the cockpit, and determining, according to the temporal relationship of the spatial events, whether a dangerous driving behavior appears in the image, wherein, if the dangerous driving behavior appears, the system host issues a dangerous event warning notification and transmits the type of the dangerous driving behavior to the microprocessor with the wireless network transmission function in the cockpit, so that a voice playback unit in the cockpit produces a corresponding voice warning; performing a communication software warning step, in which a plurality of mobile phones, held respectively by a driver and by a supervisor at a company to which the driver belongs, receive, through communication software with a message alert function, the dangerous event warning notification issued by the system host; and performing a monitoring step, which uses a monitoring screen to play in real time, via the web page, the image captured by the miniature camera device with the network transmission function, and to display in real time the dangerous event warning notification issued by the system host.

The dangerous driving behavior detection method based on the spatio-temporal relationship of objects as described in claim 6, wherein the dangerous driving behavior detection step uses the YOLOv5 architecture with Mosaic data augmentation to train a plurality of object detection models for detecting the objects, and determines whether the dangerous driving behavior appears in the image by defining the spatial events of the objects and the temporal relationship between the spatial events.

The dangerous driving behavior detection method based on the spatio-temporal relationship of objects as described in claim 6, wherein the system host records, in a database, the time at which the dangerous driving behavior appears in the image and the type of the dangerous driving behavior, for use in assessing and managing the driver.

The dangerous driving behavior detection method based on the spatio-temporal relationship of objects as described in claim 6, wherein the dangerous driving behavior detection step defines the spatial events of the objects according to whether the positional relationships and overlap of the objects in the cockpit exceed corresponding first preset values, and determines whether the dangerous driving behavior appears in the image according to whether the durations of the spatial events exceed corresponding second preset values.

The dangerous driving behavior detection method based on the spatio-temporal relationship of objects as described in claim 6, wherein the objects include a plurality of items in the cockpit and a plurality of human body features of the driver.
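
The first and second preset values recited in the claims above can be illustrated with a small sketch. It is not the claimed implementation. The overlap measure (intersection area divided by the area of the smaller box), the threshold values, the frame-count representation of duration, and the names `overlap_ratio` and `SpatioTemporalRule` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box (assumed measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return (inter_w * inter_h) / smaller


@dataclass
class SpatioTemporalRule:
    overlap_threshold: float        # stands in for a "first preset value"
    duration_threshold_frames: int  # stands in for a "second preset value"
    streak: int = 0                 # consecutive frames with the spatial event

    def update(self, box_a: Optional[Box], box_b: Optional[Box]) -> bool:
        """Feed one frame; return True once the event has lasted long enough."""
        event = (
            box_a is not None
            and box_b is not None
            and overlap_ratio(box_a, box_b) > self.overlap_threshold
        )
        self.streak = self.streak + 1 if event else 0
        return self.streak > self.duration_threshold_frames


# Hypothetical detections: a hand box and a mobile phone box per frame
# (None means the phone was not detected in that frame).
hand: Box = (0, 0, 10, 10)
phone: Box = (5, 5, 15, 15)  # overlap ratio with the hand is 0.25
frames = [phone, phone, None, phone, phone, phone]

rule = SpatioTemporalRule(overlap_threshold=0.2, duration_threshold_frames=2)
alerts = [rule.update(hand, p) for p in frames]
```

In this run the spatial event is interrupted in the third frame, so the streak restarts and only the last frame exceeds the duration threshold. Brief, incidental overlaps therefore do not raise a warning.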