CN111222477A - Vision-based method and device for detecting two hands leaving steering wheel - Google Patents


Info

Publication number: CN111222477A (application CN202010026699.4A; granted as CN111222477B)
Authority: CN (China)
Prior art keywords: steering wheel, picture, network, driver, model
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 戚治舟, 王汉超
Assignee (current and original): Xiamen Ruiwei Information Technology Co., Ltd.


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

The invention provides a vision-based method for detecting that both hands have left the steering wheel, comprising the following steps: collect sample data, label it, and then train and optimize a network on the labelled data to obtain a model; convert the model into an ncnn model; acquire an infrared picture of the driver, preprocess it, input it into the model, and parse the model output to obtain the steering-wheel position; expand the steering-wheel region, select the set ROI, crop it out, preprocess the cropped picture, and input it into the model to judge whether both of the driver's hands have left the steering wheel; if neither hand is on the steering wheel, raise an alarm; otherwise, give no alarm. The invention also provides a corresponding device. The method effectively improves the detection rate of the model, reduces the network input size, and makes the model faster.

Description

Vision-based method and device for detecting two hands leaving steering wheel
Technical Field
The invention relates to the field of computer technology, and in particular to a vision-based method and device for detecting that both hands have left the steering wheel.
Background
When driving, many factors interfere with safe driving. Behaviours that violate traffic regulations and safe-operation rules, such as answering a phone call or smoking while driving, endanger the safety of passengers. By measuring how long the driver's hands stay off the steering wheel, the driver can be warned and such irregular behaviour corrected. Currently there are three main approaches to detecting that the driver's hands are off the steering wheel:
(1) Based on the steering-wheel torque signal: the driver's torque state is estimated from several electric-power-steering signals and compared with a grip-torque threshold to decide whether the driver is gripping the wheel. This method has an advantage in processing speed, but also an obvious defect: it can only use a preset empirical threshold, so its robustness is poor and its range of application is narrow.
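The fixed-threshold weakness of this approach can be made concrete with a minimal sketch; the threshold value below is purely illustrative and not taken from the patent:

```python
def hands_on_by_torque(torque_estimate_nm, grip_threshold_nm=1.5):
    """Naive torque-based check: declare hands-on when the estimated
    driver torque exceeds a preset empirical threshold.

    The 1.5 Nm value is a made-up example; a real system would need
    per-vehicle calibration, which is exactly the robustness problem
    described above (a light grip below the threshold is misread as
    hands-off)."""
    return abs(torque_estimate_nm) >= grip_threshold_nm
```

Any grip gentler than the hard-coded threshold is misclassified, and the threshold cannot adapt across vehicles or drivers, which motivates the vision-based approach below.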
(2) Based on steering-wheel sensors measuring hand pressure or temperature: sensors are mounted around the steering wheel, and whether both hands grip the wheel is judged from temperature or pressure readings. This method is also fast, but the hardware is costly, easily disturbed by external interference, and prone to false alarms.
(3) Machine-vision-based methods: with the development of deep learning, computer-vision technology built on convolutional neural networks has advanced rapidly. The greatest advantage of deep learning is that the convolutional network learns the features needed for the target task, and in many fields its recognition rate already exceeds that of the human eye. However, networks that perform well require substantial computing power, and the limited computing power of embedded hardware makes many such projects difficult to bring to production.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a vision-based method and device for detecting that both hands have left the steering wheel, which effectively improve the detection rate of the model, reduce the network input size, and make the model faster.
In a first aspect, the present invention provides a method comprising:
step 1, collect sample data, label it, and then train and optimize a network on the labelled data to obtain a model;
step 2, convert the model into an ncnn model;
step 3, acquire an infrared picture of the driver, preprocess it, input it into the model, and parse the model output to obtain the steering-wheel position; expand the steering-wheel region, select the set ROI, crop it out, preprocess the cropped picture, and input it into the model to judge whether both of the driver's hands have left the steering wheel; if neither hand is on the steering wheel, raise an alarm; otherwise, give no alarm.
Further, step 1 specifically comprises: collecting infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the web;
labelling the steering wheel in the collected infrared pictures to obtain its position coordinates, expanding the steering-wheel region, selecting the set ROI and cropping that region, then labelling the cropped pictures, marking the hand coordinates of the driver's hands gripping the steering wheel, the hand coordinates of hands not gripping it, and the corresponding category information;
performing network training with the Caffe framework: converting the labelled infrared pictures into lmdb training data for Caffe, selecting the MobileNetV2-YOLOv3 network for both the steering-wheel detection network and the hands-off-wheel recognition network, using the SGD optimization method with a set learning rate and a set amount of training data per batch, and applying data-enhancement operations to pictures of different input sizes; after the two networks have been trained for a set number of iterations and the network loss has stabilized and converged, pruning and optimizing the steering-wheel detection network to finally obtain the trained model.
Further, step 3 specifically comprises:
step 31, when the vehicle starts, acquiring an infrared picture of the driver and preprocessing it; if the steering-wheel position is not yet known, inputting the picture into the steering-wheel detection network, parsing the model output to obtain the steering-wheel position, and going to step 32; if the position is already known, going directly to step 32;
step 32, expanding the steering-wheel region, selecting the set ROI, cropping it out, preprocessing the cropped picture, and inputting it into the hands-off-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the recognition network continuously reports for a set time that neither hand is on the wheel, calling the steering-wheel detection network again: if the steering wheel is detected, raising an alarm, and if it is not detected, giving no alarm; if within the set time the recognition network continuously recognises one or both hands on the wheel, giving no alarm.
In a second aspect, the present invention provides an apparatus comprising:
a training optimization module, which collects sample data, labels it, and then trains and optimizes a network on the labelled data to obtain a model;
a conversion module, which converts the model into an ncnn model;
a detection module, which acquires an infrared picture of the driver, preprocesses it, inputs it into the model, and parses the model output to obtain the steering-wheel position; it then expands the steering-wheel region, selects the set ROI, crops it out, preprocesses the cropped picture, and inputs it into the model to judge whether both of the driver's hands have left the steering wheel; if neither hand is on the steering wheel, it raises an alarm; otherwise, it gives no alarm.
Further, the training optimization module specifically: collects infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the web;
labels the steering wheel in the collected infrared pictures to obtain its position coordinates, expands the steering-wheel region, selects the set ROI and crops that region, then labels the cropped pictures, marking the hand coordinates of the driver's hands gripping the steering wheel, the hand coordinates of hands not gripping it, and the corresponding category information;
and performs network training with the Caffe framework: converts the labelled infrared pictures into lmdb training data for Caffe, selects the MobileNetV2-YOLOv3 network for both the steering-wheel detection network and the hands-off-wheel recognition network, uses the SGD optimization method with a set learning rate and a set amount of training data per batch, and applies data-enhancement operations to pictures of different input sizes; after the two networks have been trained for a set number of iterations and the network loss has stabilized and converged, it prunes and optimizes the steering-wheel detection network to finally obtain the trained model.
Further, the detection module specifically comprises:
a position unit: when the vehicle starts, it acquires an infrared picture of the driver and preprocesses it; if the steering-wheel position is not yet known, it inputs the picture into the steering-wheel detection network, parses the model output to obtain the steering-wheel position, and passes control to the alarm unit; if the position is already known, it passes control to the alarm unit directly;
and an alarm unit: it expands the steering-wheel region, selects the set ROI, crops it out, preprocesses the cropped picture, and inputs it into the hands-off-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the recognition network continuously reports for a set time that neither hand is on the wheel, it calls the steering-wheel detection network again: if the steering wheel is detected, it raises an alarm, and if it is not detected, it gives no alarm; if within the set time the recognition network continuously recognises one or both hands on the wheel, no alarm is given.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
(1) The method directly uses the in-vehicle monitoring camera; whatever the camera angle, the ROI to be examined can be cropped out by the steering-wheel detection algorithm, so no special camera installation is required, the range of application is wide, and the cost is low.
(2) Joint judgment by two network models, the steering-wheel detection model and the hands-off-wheel recognition network, better handles false alarms and external interference: when a person or object blocks the camera, or the camera is poorly placed, no false alarm is raised. Because the hands-off check runs only on the ROI cropped out by the steering-wheel detection model, the detection rate of the model is effectively improved, the network input size is reduced, and the model runs faster.
(3) The scheme combines the lightweight MobileNetV2 backbone with the post-processing of the YOLOv3 network, which has very good detection performance. Although two networks must cooperate, speed is barely affected: the steering-wheel detector is called only rarely, so its cost is negligible, and the hands-off-wheel recognition network has a small input and runs in 40-60 ms on an ARM device. At this speed the method detects better and is more robust than comparable algorithms. In this scheme 328,904 pictures were collected in total, with a test set of 25,592; the accuracy of steering-wheel detection exceeds 99%, and the accuracy of hands-off-wheel recognition exceeds 95%.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
The invention will be further described with reference to the following embodiments and the accompanying drawings.
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a sample annotation view of the present invention;
FIG. 3 is a flowchart and steps for online model deployment according to the present invention.
Detailed Description
The technical scheme in the embodiment of the application has the following general idea:
(1) The technical innovation lies in accurate localisation of the steering-wheel position, low hardware computing requirements, and fast steering-wheel detection.
(2) The detected steering-wheel region is expanded and an ROI of interest is selected and input into a convolutional network, which recognises whether both of the driver's hands have left the steering wheel; the network simultaneously outputs the positions of the two hands and judges the state of each hand (gripping the wheel or not).
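The region expansion in point (2) can be sketched as follows; the patent does not state the expansion factor, so the `scale` default here is an assumption:

```python
def expand_roi(wheel_box, img_w, img_h, scale=1.3):
    """Expand the detected steering-wheel box (x, y, w, h) by `scale`
    around its centre and clamp it to the image; the result is the ROI
    that is cropped out and fed to the hands-off recognition network.

    `scale=1.3` is an illustrative value, not given in the patent."""
    x, y, w, h = wheel_box
    cx, cy = x + w / 2.0, y + h / 2.0          # box centre
    nw, nh = w * scale, h * scale              # expanded size
    x0 = max(0, int(cx - nw / 2.0))            # clamp to image bounds
    y0 = max(0, int(cy - nh / 2.0))
    x1 = min(img_w, int(cx + nw / 2.0))
    y1 = min(img_h, int(cy + nh / 2.0))
    return x0, y0, x1 - x0, y1 - y0
```

Expanding before cropping keeps the hands visible even when they rest just outside the wheel rim, while still giving the recognition network a much smaller input than the full frame.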
(3) To overcome false alarms in which a person or object blocks the steering wheel while the driver is actually gripping it, multiple frames over a continuous period are used to judge whether the driver grips the wheel, steering-wheel detection is used to judge whether the wheel is occluded, and the recognition results of the multiple frames are fused, thoroughly eliminating such false alarms.
The scheme has two parts. The first is model training, covering data collection, data-format conversion, network selection, model training and network optimization; the second is online service deployment, covering model conversion and the writing of preprocessing, network-output parsing and related code for the mobile-side framework.
First, the detailed steps and flow of deep-learning model training (as shown in FIG. 1):
(1) Data collection: infrared pictures of the driver are collected through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the web, covering a variety of scenes (day, night, strong light, dim light, backlight, and so on).
(2) Labelling the sample data (see FIG. 2): the steering wheel is labelled in the collected infrared pictures to obtain its position coordinates; since the steering-wheel position of a given vehicle does not change, it only needs to be labelled once. The steering-wheel region is then expanded, the ROI of interest is selected and cropped out, and the cropped pictures are labelled: the hand coordinates and category information of hands gripping the steering wheel and of hands not gripping it.
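The labels described above amount to, per cropped ROI picture, a set of hand boxes each carrying a grip/no-grip class. A possible record shape is sketched below; the field names and values are illustrative, as the patent does not specify a label format:

```python
# Hypothetical annotation record for one cropped ROI picture.
annotation = {
    "image": "roi_000123.png",  # cropped ROI picture (made-up name)
    "hands": [
        # (x, y, w, h) in ROI pixels, plus the per-hand class label
        {"box": (40, 30, 40, 44), "label": "gripping"},
        {"box": (150, 32, 42, 46), "label": "not_gripping"},
    ],
}

def count_hands_on_wheel(record):
    """Count the labelled hands in one record that grip the wheel."""
    return sum(1 for h in record["hands"] if h["label"] == "gripping")
```

A record like this maps directly onto a two-class detection target (hand-gripping vs. hand-not-gripping), which is what the hands-off-wheel recognition network is trained to output.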
(3) Network training and optimization: the data collection and labelling above yield the training data. We use the Caffe framework for network training, so the pictures and label data must be converted into lmdb training data for Caffe. Next comes network design: because the algorithm must run on an embedded (ARM) device, the lightweight MobileNetV2 is chosen as the backbone. The MobileNet series was designed by Google specifically for mobile devices and greatly reduces the computation of the network, so choosing it as the backbone improves performance. For the network's post-processing, SSD post-processing and YOLOv3 post-processing were compared; tests showed that YOLOv3's post-processing detects small targets better, so the MobileNetV2-YOLOv3 network was finally selected for both steering-wheel detection and hands-off-wheel recognition. Training begins once the networks and the lmdb training data are ready. The SGD optimization method is used with a learning rate of 0.001 and a batch_size of 128 for each network, and data-enhancement operations such as random scaling are applied to pictures of different input sizes to improve the robustness of the networks. After 150,000 iterations the network loss stabilizes and converges; the steering-wheel detection network is then pruned and optimized, giving the final trained model. After training, tests on a set of about 30,000 images show that the accuracy of the steering-wheel detection network exceeds 0.99 and that of the hands-off-wheel recognition network exceeds 0.95.
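The "data enhancement on pictures with different input sizes" can be read as YOLO-style multi-scale training; a minimal sketch follows, where the base size, stride and range are assumptions rather than values from the patent:

```python
import random

def sample_input_size(base=352, stride=32, k=3, rng=random):
    """Pick a random square network input size, a multiple of the
    network stride around `base`, so the detector sees the steering
    wheel at varying scales during training (YOLO-style multi-scale).

    base/stride/k are illustrative, not taken from the patent."""
    return base + stride * rng.randint(-k, k)

# One size is typically drawn per batch; every picture in the batch is
# then resized (and its boxes rescaled) to that size before training.
```

Varying the input size this way exposes the network to the wheel and hands at different apparent scales, which is what makes the later small-ROI inference robust.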
Because the hands-off-wheel network examines only the ROI of the steering-wheel region, its input is small and its runtime short: 40-60 ms on ARM, which meets the requirement well. The steering-wheel detection network, however, must run on the full image; its input is large and it takes about 400 ms. That network is therefore pruned: pruning mainly reduces the outputs of some network layers and removes redundant features extracted by the network. Given the size of the detection target, one upsampling layer was judged to contribute little and was deleted; the network's runtime dropped from the original 400 ms to 100 ms while accuracy remained above 0.99.
Second, the steps and flow of online model deployment (as shown in FIG. 3):
(1) Model conversion: since training used the Caffe framework, deployment on the mobile side requires porting the model to a mobile-side framework. Good domestic options include Tencent's neural-network forward-inference framework ncnn and Alibaba's deep-neural-network inference engine MNN. Because ncnn is more widely used, the Caffe model is converted into an ncnn model.
(2) Writing the preprocessing and network-parsing code: an infrared picture of the driver is acquired through the camera, preprocessed, and input into the steering-wheel detection network; the model output is parsed to obtain the steering-wheel position (the output is in relative units, so the real coordinates on the picture must be restored using the picture size). Because the steering-wheel coordinates of a given vehicle do not change, the detection network needs to be called only once. The steering-wheel region is expanded, the ROI of interest is selected and cropped out, the cropped picture is preprocessed, and it is input into the hands-off-wheel network. If, for one second, the network continuously recognises that neither of the driver's hands is on the wheel, an alarm is prepared; before alarming, the steering-wheel detection network is called again. If the steering wheel is detected, a voice alarm prompts the driver to place both hands on the wheel for safe driving; if it is not detected, an object is blocking the steering wheel and no alarm is needed.
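The parsing and alarm logic just described can be sketched as follows. The frame rate (and hence the window length behind the one-second check) is an assumption, and the model-inference calls themselves are left abstract as boolean inputs:

```python
from collections import deque

def to_absolute(rel_box, img_w, img_h):
    """Scale an (x, y, w, h) box from the model's relative [0, 1]
    units back to pixel coordinates on the original picture."""
    x, y, w, h = rel_box
    return int(x * img_w), int(y * img_h), int(w * img_w), int(h * img_h)

class HandsOffMonitor:
    """Prepare an alarm only when every frame in the last `window`
    frames reports both hands off the wheel AND a fresh run of the
    steering-wheel detector still finds the wheel, so an occluding
    object cannot trigger a false alarm."""

    def __init__(self, window=25):  # ~1 s at an assumed 25 fps
        self.history = deque(maxlen=window)

    def update(self, hands_off, wheel_still_visible):
        self.history.append(bool(hands_off))
        sustained = (len(self.history) == self.history.maxlen
                     and all(self.history))
        return sustained and wheel_still_visible
```

Per frame, `hands_off` would come from the hands-off-wheel network on the cropped ROI and `wheel_still_visible` from the re-invoked steering-wheel detector; a `True` return corresponds to issuing the voice alarm.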
Example one
The present embodiment provides a method, comprising:
step 1, collecting infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the web;
labelling the steering wheel in the collected infrared pictures to obtain its position coordinates, expanding the steering-wheel region, selecting the set ROI and cropping that region, then labelling the cropped pictures, marking the hand coordinates of the driver's hands gripping the steering wheel, the hand coordinates of hands not gripping it, and the corresponding category information;
performing network training with the Caffe framework: converting the labelled infrared pictures into lmdb training data for Caffe, selecting the MobileNetV2-YOLOv3 network for both the steering-wheel detection network and the hands-off-wheel recognition network, using the SGD optimization method with a set learning rate and a set amount of training data per batch, and applying data-enhancement operations to pictures of different input sizes; after the two networks have been trained for a set number of iterations and the network loss has stabilized and converged, pruning and optimizing the steering-wheel detection network to finally obtain a trained model;
step 2, converting the model into an ncnn model;
step 3, acquiring an infrared picture of the driver, preprocessing it, inputting it into the model, and parsing the model output to obtain the steering-wheel position; expanding the steering-wheel region, selecting the set ROI, cropping it out, preprocessing the cropped picture, and inputting it into the model to judge whether both of the driver's hands have left the steering wheel; if neither hand is on the steering wheel, raising an alarm; otherwise, giving no alarm.
Step 3 specifically comprises:
step 31, when the vehicle starts, acquiring an infrared picture of the driver and preprocessing it; if the steering-wheel position is not yet known, inputting the picture into the steering-wheel detection network, parsing the model output to obtain the steering-wheel position, and going to step 32; if the position is already known, going directly to step 32;
step 32, expanding the steering-wheel region, selecting the set ROI, cropping it out, preprocessing the cropped picture, and inputting it into the hands-off-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the recognition network continuously reports for a set time that neither hand is on the wheel, calling the steering-wheel detection network again: if the steering wheel is detected, raising an alarm, and if it is not detected, giving no alarm; if within the set time the recognition network continuously recognises one or both hands on the wheel, giving no alarm.
Based on the same inventive concept, the present application also provides a device corresponding to the method of the first embodiment, as detailed in the second embodiment.
Example two
In this embodiment, an apparatus is provided, comprising:
a training optimization module, which collects infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the web;
labels the steering wheel in the collected infrared pictures to obtain its position coordinates, expands the steering-wheel region, selects the set ROI and crops that region, then labels the cropped pictures, marking the hand coordinates of the driver's hands gripping the steering wheel, the hand coordinates of hands not gripping it, and the corresponding category information;
and performs network training with the Caffe framework: converts the labelled infrared pictures into lmdb training data for Caffe, selects the MobileNetV2-YOLOv3 network for both the steering-wheel detection network and the hands-off-wheel recognition network, uses the SGD optimization method with a set learning rate and a set amount of training data per batch, applies data-enhancement operations to pictures of different input sizes, and, after the two networks have been trained for a set number of iterations and the network loss has stabilized and converged, prunes and optimizes the steering-wheel detection network to finally obtain a trained model;
a conversion module, which converts the model into an ncnn model;
and a detection module, which acquires an infrared picture of the driver, preprocesses it, inputs it into the model, and parses the model output to obtain the steering-wheel position; it then expands the steering-wheel region, selects the set ROI, crops it out, preprocesses the cropped picture, and inputs it into the model to judge whether both of the driver's hands have left the steering wheel; if neither hand is on the steering wheel, it raises an alarm; otherwise, it gives no alarm.
The detection module specifically comprises:
a position unit: when the vehicle starts, it acquires an infrared picture of the driver and preprocesses it; if the steering-wheel position is not yet known, it inputs the picture into the steering-wheel detection network, parses the model output to obtain the steering-wheel position, and passes control to the alarm unit; if the position is already known, it passes control to the alarm unit directly;
and an alarm unit: it expands the steering-wheel region, selects the set ROI, crops it out, preprocesses the cropped picture, and inputs it into the hands-off-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the recognition network continuously reports for a set time that neither hand is on the wheel, it calls the steering-wheel detection network again: if the steering wheel is detected, it raises an alarm, and if it is not detected, it gives no alarm; if within the set time the recognition network continuously recognises one or both hands on the wheel, no alarm is given.
Since the apparatus described in the second embodiment implements the method of the first embodiment, a person skilled in the art can understand its specific structure and variants from the method described there, so the details are not repeated here. All apparatus used to implement the method of the first embodiment falls within the scope of protection of the present invention.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (6)

1. A vision-based method for detecting that both hands have left the steering wheel, characterized in that it comprises the following steps:
step 1, collecting sample data, labeling the sample data, and then performing network training and optimization using the labeled sample data to obtain a model;
step 2, converting the model into a model under ncnn;
step 3, acquiring an infrared picture of the driver, processing the picture, inputting the processed picture into the model, and parsing the model result to obtain the steering wheel position; expanding the steering wheel area to select a set ROI, cropping out the ROI, processing the cropped picture, and inputting it into the model to judge whether both of the driver's hands have left the steering wheel; giving an alarm if neither of the driver's hands is on the steering wheel, and otherwise giving no alarm.
2. The vision-based method for detecting that both hands have left the steering wheel according to claim 1, characterized in that step 1 specifically comprises: collecting infrared pictures of the driver through an in-vehicle monitoring camera, the infrared pictures including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering wheel pictures crawled from the Internet;
labeling the steering wheel in the collected infrared pictures to obtain the steering wheel position coordinates; expanding the steering wheel area to select a set ROI and cropping this area; then labeling the cropped pictures with the hand coordinates of a driver gripping the steering wheel, the hand coordinates of a driver not gripping the steering wheel, and the corresponding category information;
the method comprises the steps of performing network training by using a caffe frame, converting marked infrared pictures into lmdb training data under the caffe, selecting a MobileNet v2-yolov3 network for a steering wheel detection network and a double-hand leaving steering wheel recognition network, setting learning rate and training data quantity in each time by adopting an SGD optimization learning method, performing data enhancement operation on pictures with different input sizes, stabilizing and converging network loss values after the steering wheel detection network and the double-hand leaving steering wheel recognition network are trained for a set number of times, pruning and optimizing the steering wheel detection network, and finally obtaining a trained model.
3. The vision-based method for detecting that both hands have left the steering wheel according to claim 2, characterized in that step 3 specifically comprises:
step 31, after the vehicle is started, acquiring an infrared picture of the driver and processing the picture; if no steering wheel position exists yet, inputting the picture into the steering wheel detection network, parsing the model result to obtain the steering wheel position, and entering step 32; if a steering wheel position already exists, entering step 32 directly;
step 32, expanding the steering wheel area to select a set ROI, cropping out the ROI, processing the cropped picture, and inputting it into the hands-off-steering-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the recognition network continuously recognizes, within a set time, that neither of the driver's hands is on the steering wheel, calling the steering wheel detection network again: if the steering wheel is detected, giving an alarm, and if it is not detected, giving no alarm; if the recognition network recognizes within the set time that the driver has one or both hands on the steering wheel, giving no alarm.
4. A vision-based apparatus for detecting that both hands have left the steering wheel, characterized in that it comprises:
the training optimization module, which collects sample data, labels the sample data, and then performs network training and optimization using the labeled sample data to obtain a model;
the conversion module is used for converting the model into a model under ncnn;
the detection module, which is used for acquiring an infrared picture of the driver, processing the picture, inputting the processed picture into the model, and parsing the model result to obtain the steering wheel position; expanding the steering wheel area to select a set ROI, cropping out the ROI, processing the cropped picture, and inputting it into the model to judge whether both of the driver's hands have left the steering wheel; giving an alarm if neither of the driver's hands is on the steering wheel, and otherwise giving no alarm.
5. The vision-based apparatus for detecting that both hands have left the steering wheel according to claim 4, characterized in that the training optimization module is specifically configured for: collecting infrared pictures of the driver through an in-vehicle monitoring camera, the infrared pictures including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering wheel pictures crawled from the Internet;
labeling the steering wheel in the collected infrared pictures to obtain the steering wheel position coordinates; expanding the steering wheel area to select a set ROI and cropping this area; then labeling the cropped pictures with the hand coordinates of a driver gripping the steering wheel, the hand coordinates of a driver not gripping the steering wheel, and the corresponding category information;
the method comprises the steps of performing network training by using a caffe frame, converting marked infrared pictures into lmdb training data under the caffe, selecting a MobileNet v2-yolov3 network for a steering wheel detection network and a double-hand leaving steering wheel recognition network, setting learning rate and training data quantity in each time by adopting an SGD optimization learning method, performing data enhancement operation on pictures with different input sizes, stabilizing and converging network loss values after the steering wheel detection network and the double-hand leaving steering wheel recognition network are trained for a set number of times, pruning and optimizing the steering wheel detection network, and finally obtaining a trained model.
6. The vision-based apparatus for detecting that both hands have left the steering wheel according to claim 5, characterized in that the detection module specifically comprises:
the position unit, which is used for acquiring an infrared picture of the driver after the vehicle is started and processing the picture; if no steering wheel position exists yet, inputting the picture into the steering wheel detection network, parsing the model result to obtain the steering wheel position, and entering the alarm unit; if a steering wheel position already exists, entering the alarm unit directly;
the alarm unit, which is used for expanding the steering wheel area to select a set ROI, cropping out the ROI, processing the cropped picture, and inputting it into the hands-off-steering-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the recognition network continuously recognizes, within a set time, that neither of the driver's hands is on the steering wheel, calling the steering wheel detection network again: if the steering wheel is detected, giving an alarm, and if it is not detected, giving no alarm; if the recognition network recognizes within the set time that the driver has one or both hands on the steering wheel, giving no alarm.
CN202010026699.4A 2020-01-10 2020-01-10 Vision-based method and device for detecting departure of hands from steering wheel Active CN111222477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010026699.4A CN111222477B (en) 2020-01-10 2020-01-10 Vision-based method and device for detecting departure of hands from steering wheel


Publications (2)

Publication Number Publication Date
CN111222477A true CN111222477A (en) 2020-06-02
CN111222477B CN111222477B (en) 2023-05-30

Family

ID=70828361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010026699.4A Active CN111222477B (en) 2020-01-10 2020-01-10 Vision-based method and device for detecting departure of hands from steering wheel

Country Status (1)

Country Link
CN (1) CN111222477B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347891A (en) * 2020-10-30 2021-02-09 南京佑驾科技有限公司 Detection algorithm for water drinking state in cabin based on vision
CN112580627A * 2020-12-16 2021-03-30 中国科学院软件研究所 YOLOv3 target detection method based on domestic intelligent chip K210 and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013013487A1 (en) * 2011-07-26 2013-01-31 华南理工大学 Device and method for monitoring driving behaviors of driver based on video detection
CN107679539A (en) * 2017-09-18 2018-02-09 浙江大学 A kind of single convolutional neural networks local message wild based on local sensing and global information integration method
CN110084803A (en) * 2019-04-29 2019-08-02 南京星程智能科技有限公司 Eye fundus image method for evaluating quality based on human visual system
CN110135398A (en) * 2019-05-28 2019-08-16 厦门瑞为信息技术有限公司 Both hands off-direction disk detection method based on computer vision
CN110222596A (en) * 2019-05-20 2019-09-10 浙江零跑科技有限公司 A kind of driving behavior analysis anti-cheating method of view-based access control model
CN110633701A (en) * 2019-10-23 2019-12-31 德瑞姆创新科技(深圳)有限公司 Driver call detection method and system based on computer vision technology



Also Published As

Publication number Publication date
CN111222477B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
US11380232B2 (en) Display screen quality detection method, apparatus, electronic device and storage medium
CN109784150B (en) Video driver behavior identification method based on multitasking space-time convolutional neural network
CN109977921B (en) Method for detecting hidden danger of power transmission line
CN106815574B (en) Method and device for establishing detection model and detecting behavior of connecting and calling mobile phone
CN110796018B (en) Hand motion recognition method based on depth image and color image
CN113903081A (en) Visual identification artificial intelligence alarm method and device for images of hydraulic power plant
CN111047565A (en) Method, storage medium and equipment for forest cloud image segmentation
CN111222477A (en) Vision-based method and device for detecting two hands leaving steering wheel
CN111815576B (en) Method, device, equipment and storage medium for detecting corrosion condition of metal part
CN116259002A (en) Human body dangerous behavior analysis method based on video
CN112434827A (en) Safety protection identification unit in 5T fortune dimension
CN112949457A (en) Maintenance method, device and system based on augmented reality technology
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN109241893B (en) Road selection method and device based on artificial intelligence technology and readable storage medium
CN114419493A (en) Image annotation method and device, electronic equipment and storage medium
CN112597996B (en) Method for detecting traffic sign significance in natural scene based on task driving
CN114155492A (en) High-altitude operation safety belt hanging rope high-hanging low-hanging use identification method and device and electronic equipment
CN116403162B (en) Airport scene target behavior recognition method and system and electronic equipment
CN116935361A (en) Deep learning-based driver distraction behavior detection method
CN117372944A (en) Construction site bare soil monitoring method and terminal
CN116580232A (en) Automatic image labeling method and system and electronic equipment
CN114821486B (en) Personnel identification method in power operation scene
CN114694090A (en) Campus abnormal behavior detection method based on improved PBAS algorithm and YOLOv5
CN113887431A (en) AI-based detection method for identifying person without gloves in kitchen scene
CN112967335A (en) Bubble size monitoring method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant