CN110008834B - Steering wheel intervention detection and statistics method based on vision - Google Patents

Steering wheel intervention detection and statistics method based on vision

Info

Publication number
CN110008834B
CN110008834B (application CN201910150734.0A)
Authority
CN
China
Prior art keywords
intervention
steering wheel
detection
network structure
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910150734.0A
Other languages
Chinese (zh)
Other versions
CN110008834A (en)
Inventor
程球
张雪莲
毛泉涌
文凌艳
周明政
赵云
胡芳芳
王军
谢兰青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETHIK Group Ltd
Original Assignee
CETHIK Group Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETHIK Group Ltd filed Critical CETHIK Group Ltd
Priority to CN201910150734.0A priority Critical patent/CN110008834B/en
Publication of CN110008834A publication Critical patent/CN110008834A/en
Application granted granted Critical
Publication of CN110008834B publication Critical patent/CN110008834B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vision-based steering wheel intervention detection and statistics method, which comprises the following steps: constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure with sample images; using the integrated network structure, taking a single frame image to be detected as its input and the steering wheel as the detection target, and judging whether the steering wheel is intervened in the current single frame image according to the intervention attribute information output by the integrated network structure, to obtain an intervention judgment result; and processing the intervention judgment results within a preset time by a density statistical method to obtain the start and end time points and the duration of each steering wheel intervention. The method depends on little hardware, is easy to implement, eliminates false detections and missed detections during detection, yields highly accurate detection results, and produces statistics that present the results more intuitively.

Description

Steering wheel intervention detection and statistics method based on vision
Technical Field
The invention belongs to the field of steering wheel intervention detection, and particularly relates to a steering wheel intervention detection and statistical method based on vision.
Background
Hand detection and gesture recognition are important and promising research problems in human-computer interaction and robotics, with important applications in sports, security, and safe driving.
In unmanned (driverless) competitions, the number of times the driver intervenes on the steering wheel and the duration of each intervention are two important indexes for evaluating the quality of an unmanned driving system. The driver intervenes on the steering wheel by touching it with the hands, so steering wheel intervention detection can be converted into detecting hand contact with the steering wheel. Most steering wheel touch detection systems rely on sensors (such as pressure sensors): for example, the patent document with publication number CN 105143015B places sensors on the steering wheel and determines whether the driver's hands touch the steering wheel from the different signals generated by the sensors; this kind of steering wheel touch detection technology is widely used in various Advanced Driver Assistance Systems (ADAS). In fact, besides being collected by dedicated sensors, the touch signal can also be obtained from video images and judged with logic similar to human vision. With the vision-based method, only one camera is needed and no complex sensors have to be arranged on the steering wheel, which greatly simplifies the spatial design of the steering wheel and the cab and lowers the cost.
Disclosure of Invention
The invention aims to provide a vision-based steering wheel intervention detection and statistical method which depends on little hardware, is easy to implement, eliminates false detections and missed detections during detection, yields highly accurate detection results, and produces statistics that present the results more intuitively.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a vision-based steering wheel intervention detection and statistics method, comprising:
constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure by using a sample image;
adopting the integrated network structure, taking a single frame image to be detected as the input of the integrated network structure, taking the steering wheel as the detection target, and judging whether the steering wheel is intervened in the current single frame image according to the intervention attribute information output by the integrated network structure, to obtain an intervention judgment result;
processing the intervention judgment results within a preset time by adopting a density statistical method, to obtain the start and end time points and the duration of each steering wheel intervention;
wherein the processing of the intervention judgment results within the preset time by the density statistical method comprises the following steps:
setting the fusion width to N frames; for a given frame f_t in the intervention judgment results, counting the N frames before f_t, the N frames after f_t and frame f_t itself, so that the total number of frames is 2N+1; counting the number n_t of frames in the intervention state among these frames; and calculating the intervention density d_t within the 2N+1 frame window as
d_t = n_t / (2N + 1);
if d_t ≥ 0.5, the current 2N+1 frame window is defined as being in the intervention state; if d_t < 0.5, the current 2N+1 frame window is defined as being in the non-intervention state;
wherein the obtaining of the start and end time points and the duration of the steering wheel intervention comprises: within the time period from d_{t-1} to d_{t+T}:
if d_{t-1} < 0.5 and d_t ≥ 0.5, the time point corresponding to d_t is the start time point of the current intervention state;
if d_{t+T-1} ≥ 0.5, d_{t+T} < 0.5, and the intervention densities from d_t to d_{t+T-1} are all ≥ 0.5, the time point corresponding to d_{t+T} is the end time point of the current intervention state;
and the duration of the current intervention state is obtained as T; wherein T represents a time period; d_t is the intervention density within the current 2N+1 frame window; d_{t-1} is the intervention density of the 2N+1 frame window one frame before d_t; d_{t+T} is the intervention density of the 2N+1 frame window T after d_t; and d_{t+T-1} is the intervention density of the 2N+1 frame window one frame before d_{t+T}.
Preferably, the building of the integrated network structure including steering wheel detection and intervention attribute identification includes:
constructing a basic network structure, wherein the basic network structure comprises 9 convolutional layers and 5 maximum pooling layers;
and setting a plurality of candidate windows in each candidate area on the last layer of feature map of the basic network structure, wherein each candidate window comprises coordinate information of a steering wheel circumscribed rectangle, target judgment information, target category probability information and the intervention attribute information to form an integrated network structure.
Preferably, the training of the integrated network structure by using the sample image includes:
acquiring a driving video, and extracting 1 frame from every N frames in the driving video for storage;
and labeling the stored images to obtain sample images, wherein the labeled content comprises: the coordinates of the steering wheel circumscribed rectangle, the target category, and whether the steering wheel is intervened;
randomly dividing sample images to obtain a test set and a training set;
and training the integrated network structure by using the sample images in the training set until the integrated network structure reaches a preset condition by testing the sample images in the testing set.
Preferably, the basic network structure passes through a convolutional layer C1, a maximum pooling layer M1, a convolutional layer C2, a maximum pooling layer M2, a convolutional layer C3, a maximum pooling layer M3, a convolutional layer C4, a maximum pooling layer M4, a convolutional layer C5, a maximum pooling layer M5, a convolutional layer C6, a convolutional layer C7, a convolutional layer C8 and a convolutional layer C9 in sequence from the input layer I.
Preferably, the calculation of the loss function of the intervention attribute information comprises: l_i = (y_{p_i} - p_i)^2; wherein l_i represents the loss function; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value;
the gradient calculation of the intervention attribute information comprises:
δ = 2(y_{p_i} - p_i) when p_i = 1, and δ = 1.1 × 2(y_{p_i} - p_i) when p_i = 0;
wherein δ represents the gradient; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value.
Preferably, the value of N is 5-10.
The vision-based steering wheel intervention detection and statistical method provided by the invention designs an end-to-end deep learning network that integrates the target detection and attribute identification tasks, merging hand and steering wheel detection into one network. This avoids relying on several networks or rule-based strategies for real-time detection of hands touching the steering wheel, simplifies the detection method and improves detection efficiency. In addition, a time-sequence-based video intervention density statistical algorithm is applied to steering wheel intervention detection, eliminating false detections and missed detections in the detection process and estimating the start time point, end time point and duration of each intervention.
Drawings
FIG. 1 is a block diagram of a method for vision-based steering wheel intervention detection and statistics in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of one embodiment of a vision-based steering wheel intervention detection and statistics method of the present invention;
FIG. 3 is a schematic diagram of an application scenario of the steering wheel intervention detection of the present invention;
FIG. 4 is a schematic diagram of an image labeled according to the present invention;
FIG. 5 is a schematic diagram of an embodiment of an integrated network architecture of the present invention;
FIG. 6 is a diagram illustrating the effect of detecting steering wheel intervention in a single frame image according to the present invention;
FIG. 7 is an intervention timing diagram of the intervention detection output of the present invention in an ideal state;
FIG. 8 is an intervention timing diagram of the intervention detection output of the present invention in an actual state;
FIG. 9 is a schematic illustration of the intervention density calculation of the present invention;
FIG. 10 is a graph of intervention density calculated according to the present invention from the intervention timing diagram of FIG. 8;
FIG. 11 is another intervention density map of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
As shown in fig. 1, the present embodiment provides a method for detecting and counting steering wheel intervention based on vision, which mainly includes two stages:
the first stage is steering wheel intervention detection based on deep learning: a single frame image is used as input, the steering wheel is used as the detection target, a target box containing the steering wheel is detected with a target detection algorithm, the feature vector corresponding to the target box on the final feature map is reused, and whether the steering wheel is intervened in this frame image is judged;
and the second stage is based on the intervention density statistics of the image sequence, and the interference caused by false detection and missed detection in the first stage is eliminated by using a density statistical algorithm, so that the starting and stopping time point and the intervention duration of each intervention are obtained.
As shown in fig. 2, the steering wheel intervention detection and statistics method based on vision of the present embodiment is mainly divided into the following 4 steps.
S1, building a monitoring system in the cab and collecting video images of the application scene.
For acquiring driving videos, the application-scene video images mainly consist of two parts: first, real driver monitoring videos obtained during unmanned competitions; and second, driver monitoring videos of the unmanned competition process simulated with several private cars. The sample data set comprises these two parts because real unmanned competition monitoring video resources are limited, the monitoring view angle and in-car environment are uniform, and the steering wheel is in the non-intervention state for most of the time, which causes serious sample imbalance. The monitoring videos simulated with private cars mainly make up for the deficiencies of the real competition scenes and increase sample diversity.
First, several monitoring view angles are adjusted and multiple vehicles are used to increase angle and environment diversity; second, the steering wheel is in the intervened state most of the time when a private car is driven (it is not intervened only for the small portion of time spent parked), which reduces the sample imbalance problem; third, during simulation, the driver deliberately imitates various driving postures and habits, such as holding the wheel with both hands, holding it with one hand, rubbing it with the palm, or hooking it with the fingers, which increases the diversity of intervention-state samples; finally, various external natural environments are simulated, including dim illumination when passing through a tunnel and strong exposure on sunny days. An example application scene is shown in fig. 3.
Comprehensive driving videos are thus acquired from these rich scenes to facilitate later testing and training.
S2, manually labeling the images, wherein the labeled content of each image comprises: the coordinates of the circumscribed rectangle of the steering wheel (e.g., the coordinates of the upper-left and lower-right corners of the rectangle) and whether the steering wheel is touched by a hand (e.g., 0 means intervened, 1 means not intervened).
The video is split frame by frame and 1 frame out of every N frames is extracted and stored as a sample; in this embodiment N is 25. Invalid pictures caused by blur, darkness, over-exposure and the like are then removed, and the remaining pictures are labeled.
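As an illustration only, a minimal sketch of this frame-sampling step might look as follows (Python with OpenCV is assumed; the sampling interval N = 25 follows the embodiment, while the function name and file paths are hypothetical):

import cv2, os

def extract_samples(video_path, out_dir, n=25):
    # Save 1 frame out of every n frames of the driving video (assumed sampling step).
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % n == 0:  # keep 1 frame every n frames
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# hypothetical usage: extract_samples("race_drive.mp4", "samples/", n=25)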
Under normal conditions each picture contains at most one steering wheel, and the labeled content consists of the coordinates (xmin, ymin, xmax, ymax) of the upper-left and lower-right corners of the circumscribed rectangle of the steering wheel target, the target category c, and a flag i indicating whether the steering wheel is intervened. Since the steering wheel is the only target category, c is always 0 (class ids start from 0); for the intervention flag bit, 0 indicates intervention and 1 indicates non-intervention.
During training, the target box coordinates are converted into centre-point coordinates plus width and height (x, y, w, h) and divided by the width and height of the original image for normalization, so the final label has the form [c, x, y, w, h, i]. A schematic diagram of a labeled picture is shown in fig. 4; normally the label is drawn in color directly on the picture, but in this embodiment, to show it clearly, the label is separated from the picture and written above it.
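A small sketch of this label conversion is given below; it assumes the annotation provides the corner coordinates (xmin, ymin, xmax, ymax), the category c and the intervention flag i described above, and normalizes by the image width and height to produce the [c, x, y, w, h, i] form (the helper name is hypothetical):

def to_training_label(xmin, ymin, xmax, ymax, c, i, img_w, img_h):
    # Convert corner coordinates to the normalized centre/width/height label [c, x, y, w, h, i].
    x = (xmin + xmax) / 2.0 / img_w   # normalized centre x
    y = (ymin + ymax) / 2.0 / img_h   # normalized centre y
    w = (xmax - xmin) / img_w         # normalized width
    h = (ymax - ymin) / img_h         # normalized height
    return [c, x, y, w, h, i]

# e.g. to_training_label(300, 400, 900, 800, c=0, i=0, img_w=1280, img_h=720)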
The final formed valid data set in this embodiment includes 45798 pictures, where 5798 pictures are randomly selected as the test set, and 40000 pictures are the training sets.
S3, constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure by using the sample images to obtain the network structure capable of accurately judging whether the steering wheel is touched by a hand in each frame of image.
The main purpose of this stage is to determine whether the steering wheel is intervened in a single frame image. Technically this is essentially a binary image classification problem, and deep-learning-based image classification currently achieves high accuracy. However, a classification network requires the main object in the input picture to occupy most of the pixels of the whole image, because its principle is to classify using features extracted from the whole image after several convolution and pooling operations; if the main object occupies only a small part of the image, its features are easily lost or submerged by the background during extraction, and it is difficult to reach an ideal classification accuracy.
Unlike a general classification task, the classification target of this embodiment is not a particular object but the three-dimensional spatial relationship between two objects (the hand and the steering wheel). This spatial relationship appears in many forms on a two-dimensional image, the useful pixels describing it are very few, and subtle differences are easily confused.
In view of the particularity of the application scenario, this embodiment does not adopt a standard classification network; instead, it borrows the idea of target detection and treats the intervention relationship as an attribute of each target, outputting it together with information such as the position coordinates of the target box, the target confidence and the target category. This has two advantages. First, the target detection task focuses feature extraction on the important targets in the image, and the intervention relationship is judged only from the local features indicated by the target box rather than from the whole image, which reduces background interference and improves detection accuracy. Second, treating the intervention relationship as an attribute of each target, rather than cropping local features and judging them with a cascaded classification network, fully reuses features, avoids the design and training of a complex network, and makes the detection process closer to real time.
There are two key target classes in the detection: the hand and the steering wheel. The intuitive idea for detecting intervention is to detect the position coordinates of the hand and the steering wheel separately and then judge intervention from those coordinates; the detection stage would then reduce to a conventional hand and steering wheel detection task. However, inferring the three-dimensional spatial relationship from two-dimensional coordinates is extremely difficult, so this scheme is hard to implement. The other idea is to directly detect whether the hand intervenes on the steering wheel, or whether the steering wheel is intervened by the hand, handing the inference of the intervention relationship to the deep neural network for automatic learning instead of using hand-crafted rules. This embodiment adopts the second idea and simplifies it further: only the steering wheel is detected, and whether it is intervened is output as an attribute together with the target detection information.
S31, as shown in fig. 5, constructing an integrated network structure including steering wheel detection and intervention attribute identification, including:
an infrastructure network structure is constructed that includes 9 convolutional layers and 5 max pooling layers. Specifically, the infrastructure network structure passes through a convolutional layer C1, a maximum channelization layer M1, a convolutional layer C2, a maximum channelization layer M2, a convolutional layer C3, a maximum channelization layer M3, a convolutional layer C4, a maximum channelization layer M4, a convolutional layer C5, a maximum channelization layer M5, a convolutional layer C6, a convolutional layer C7, a convolutional layer C8, and a convolutional layer C9 in sequence from an input layer I.
Network parameters of the convolutional layer are represented in the form of [ k _ size, k _ size, channels, stride ], wherein k _ size is the size of a convolutional kernel, channels are the number of output characteristic channels, and stride is a step size; network parameters of the pooling layer are represented by [ k _ size, k _ size and stride ], wherein the k _ size is the size of a pooling kernel, and stride is a step size; the input parameters of each layer adopt [ resolution, resolution, channel ], where resolution is the resolution of the image and channel is the number of channels. Specifically, the infrastructure network structure is shown in table 1.
TABLE 1 infrastructure network architecture
Network layer type Network parameters Input parameters
Convolutional layer C1 [3,3,16,1] [416,416,3]
Maximum pooling layer M1 [2,2,2] [416,416,3]
Convolutional layer C2 [3,3,32,1] [208,208,16]
Maximum pooling layer M2 [2,2,2] [208,208,16]
Convolutional layer C3 [3,3,64,1] [104,104,32]
Maximum pooling layer M3 [2,2,2] [104,104,32]
Convolutional layer C4 [3,3,128,1] [52,52,64]
Maximum pooling layer M4 [2,2,2] [52,52,64]
Convolutional layer C5 [3,3,256,1] [26,26,128]
Maximum pooling layer M5 [2,2,2] [26,26,128]
Convolutional layer C6 [3,3,512,1] [13,13,256]
Convolutional layer C7 [3,3,1024,1] [13,13,256]
Convolutional layer C8 [3,3,1024,1] [13,13,256]
Convolutional layer C9 [1,1,35,1] [13,13,256]
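For illustration, the layer sequence of Table 1 could be sketched in PyTorch as follows (PyTorch itself is an assumption of this sketch, as are the activation function and 'same' padding, which the table does not specify; the kernel sizes, output channels and strides follow the "network parameters" column):

import torch.nn as nn

def conv(in_ch, out_ch, k):
    # 3x3 (or 1x1) convolution with 'same' padding followed by leaky ReLU;
    # the activation choice is an assumption, the shapes follow Table 1.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, stride=1, padding=k // 2),
                         nn.LeakyReLU(0.1))

base_network = nn.Sequential(
    conv(3, 16, 3),    nn.MaxPool2d(2, 2),   # C1, M1  (416 -> 208)
    conv(16, 32, 3),   nn.MaxPool2d(2, 2),   # C2, M2  (208 -> 104)
    conv(32, 64, 3),   nn.MaxPool2d(2, 2),   # C3, M3  (104 -> 52)
    conv(64, 128, 3),  nn.MaxPool2d(2, 2),   # C4, M4  (52 -> 26)
    conv(128, 256, 3), nn.MaxPool2d(2, 2),   # C5, M5  (26 -> 13)
    conv(256, 512, 3),                       # C6
    conv(512, 1024, 3),                      # C7
    conv(1024, 1024, 3),                     # C8
    nn.Conv2d(1024, 35, 1),                  # C9: 5 anchors x (4 + 1 + 1 + 1) = 35 channels
)
# a 416x416x3 input yields a 13x13x35 output feature map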
In the standard tiny-yolov2 target detection algorithm, 5 anchor boxes are arranged at each anchor on the last feature map of the convolution; each anchor box comprises 4 target box coordinates (x, y, w, h), 1 piece of target judgment information P_o and the probability information P_c of each category (only 1 here, because the steering wheel is the only target category), so the number of feature channels of the last feature map in standard target detection is 5 × (4 + 1 + 1) = 30.
However, unlike the standard target detection algorithm, the steering wheel intervention detection of this embodiment adds one more piece of intervention attribute information P_i. In this embodiment, 5 candidate windows (anchor boxes) are set at each candidate region (anchor) on the last feature map of the basic network structure, and each candidate window comprises the coordinate information of the steering wheel circumscribed rectangle (x, y, w, h), the target judgment information (P_o), the target category probability information (P_c) and the intervention attribute information (P_i), forming an integrated network structure.
The intervention attribute information P_i is placed after the target category probability information P_c and describes the attribute of the target in the box, so the output of the last convolutional layer C9 of the integrated network structure of this embodiment is [13,13,35], i.e., the number of feature channels of the last feature map is 5 × (4 + 1 + 1 + 1) = 35.
Therefore, an integrated network structure for steering wheel intervention detection and intervention attribute identification is formed, the integrated network structure is adopted, a single-frame image to be detected is used as the input of the integrated network structure, the steering wheel is used as a detection target, whether the steering wheel is intervened on the current single-frame image is judged according to the intervention attribute information output by the integrated network structure, and an intervention judgment result can be obtained.
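As a sketch of how such an output could be consumed, the code below splits one 35-channel cell of the final 13×13 feature map into its 5 candidate windows; the slot ordering (x, y, w, h, P_o, P_c, P_i) within each window is an assumption, and the 0.5 decision threshold follows the rule stated later in the description:

import numpy as np

def decode_cell(cell_vec, num_anchors=5, attrs=7):
    # Split a 35-dim cell vector into 5 anchor-box predictions of
    # (x, y, w, h, P_o, P_c, P_i); the slot ordering is assumed.
    boxes = np.asarray(cell_vec).reshape(num_anchors, attrs)
    results = []
    for x, y, w, h, p_o, p_c, p_i in boxes:
        results.append({
            "box": (x, y, w, h),       # steering wheel bounding box
            "objectness": p_o,         # target judgment information
            "class_prob": p_c,         # target category probability
            "intervened": p_i <= 0.5,  # intervention attribute: <= 0.5 means intervened
        })
    return results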
Since the attribute in this embodiment only covers whether the steering wheel is intervened, the network structure differs from a standard target detection network structure by only one intervention attribute bit; in fact, the attribute bits can be extended as needed, even to tens or hundreds of bits. A general attribute recognition system usually cascades a target detection network with a classification network: it first detects the coordinates of the target of interest in the image with a target detection algorithm, then crops the local region containing the target from the original image according to those coordinates, and finally feeds the local region into a classification network that outputs the target attribute.
The target detection and attribute identification integrated network structure provided by the embodiment combines the detection task and the attribute identification task into one network, fully multiplexes the detection network characteristic information, and avoids the problems of serious time consumption, serious memory consumption and the like of a cascade network. The network structure provided by the embodiment still keeps the characteristic of end-to-end training, and the trouble of step-by-step training of the cascade network is reduced. The target detection and attribute identification integrated network designed in the embodiment is described in detail by taking the application to steering wheel intervention detection as an example, but the idea can be extended to other intelligent visual applications, such as target tracking, target positioning and the like.
S32, designing a loss function.
The design of the loss function and the gradient calculation of the standard target detection output (x, y, w, h, P_o, P_c) are the same as those in the prior art, for example, as described in the yolov2 paper, and will not be described herein again. The design and calculation of the loss function of the intervention attribute information output P_i added in the integrated network structure of the present embodiment are further described by specific formulas below.
From the sample labeling form it can be seen that the output P_i is in fact a binary classification; linear regression is used for the classification and the mean squared error describes the loss. The loss function is calculated as l_i = (y_{p_i} - p_i)^2, where l_i represents the loss function, y_{p_i} represents the output value of the intervention attribute information, and p_i represents the true value of the intervention attribute information.
To alleviate the imbalance between intervention-state and non-intervention-state samples during gradient calculation, the gradient produced by the rarer samples is multiplied by a coefficient to increase its influence on the network weight update. Because there are fewer intervention-state samples, if the labeled intervention flag p_i is 0, i.e. the steering wheel in the picture is in the intervened state, the calculated gradient is multiplied by 1.1. The gradient is thus designed as:
δ = 2(y_{p_i} - p_i) when p_i = 1, and δ = 1.1 × 2(y_{p_i} - p_i) when p_i = 0;
where δ represents the gradient, y_{p_i} represents the output value of the intervention attribute information, and p_i represents the true value of the intervention attribute information.
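A short numerical sketch of this weighted loss and gradient might look as follows (the factor 2 comes from differentiating the squared error; the 1.1 up-weighting for intervention-state samples follows the text above, and the function names are illustrative only):

def intervention_loss(y_pred, p_true):
    # Mean-squared-error loss for the intervention attribute P_i.
    return (y_pred - p_true) ** 2

def intervention_gradient(y_pred, p_true):
    # Gradient of the loss w.r.t. the network output; samples labelled as
    # intervened (p_true == 0) are up-weighted by 1.1 to ease class imbalance.
    grad = 2.0 * (y_pred - p_true)
    if p_true == 0:  # intervention state: the rarer class in the real race videos
        grad *= 1.1
    return grad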
S33, training and testing the integrated network structure.
During training, all pictures are resized to 416×416 resolution before being fed into the network; the total number of iterations max_batchs is 45000, the batch size is 64, the weight decay factor lambda is 0.0005, and back-propagation uses the momentum method with momentum factor v = 0.9. The learning rate follows a piecewise constant decay: the initial learning rate is 0.0001, it is raised to 0.001 after 100 iterations, and thereafter it is multiplied by a decay factor of 0.1 every 10000 iterations.
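For illustration, the piecewise-constant learning-rate schedule described above could be expressed as the following sketch (whether the 10000-iteration decay intervals are counted from the end of the warm-up is not stated in the text and is an assumption here):

def learning_rate(iteration):
    # Piecewise-constant schedule: warm-up at 1e-4, raised to 1e-3 after 100
    # iterations, then multiplied by 0.1 every 10000 iterations (45000 total).
    if iteration < 100:
        return 0.0001
    lr = 0.001
    lr *= 0.1 ** ((iteration - 100) // 10000)  # decay step count is an assumption
    return lr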
In this embodiment the training set size is 40000 and the test set size is 5798. After 40000 rounds of training, the average detection precision (AP) of the steering wheel target is 99%, the intervention detection accuracy is 90%, and the detection speed on a GTX1070 is 6 ms/frame. The detection effect is shown schematically in fig. 6, where normal indicates that the car is in the normal automatic driving state without intervention, i.e. the steering wheel is not touched by a hand, and unormal indicates an intervention, i.e. the steering wheel is touched by a hand and the car is being driven manually.
Whether the current frame is in the intervention state is judged from the intervention attribute output: when the output value y_{p_i} of the intervention attribute information is greater than 0.5 there is no intervention, and the closer the output value is to 1 the higher the confidence of the non-intervention prediction; when y_{p_i} ≤ 0.5 there is an intervention, and the closer the output value is to 0 the higher the confidence of the intervention.
S4, processing the intervention judgment results within a preset time by a density statistical method to obtain the start and end time points and the duration of each steering wheel intervention.
After each frame of image obtained from the driving video is successfully identified, in order to further improve the identification accuracy and visually present whether the video has an intervention state, the embodiment further finds the starting time point and the ending time point of the steering wheel being intervened on the time sequence, so as to calculate the duration of each intervention and count the intervention times of the whole video.
The ideal recognition result of whether the steering wheel is intervened is shown in fig. 7: the intervention detection accuracy is 100%, the time-sequence analysis is simple, and the intervention start points and durations can easily be counted just by checking whether the intervention state occurs. In practice, however, intervention detection usually looks like fig. 8: because steering wheel detection and intervention detection always contain errors (missed detections and false detections), even at 99% accuracy one false detection occurs on average every 100 frames, i.e. roughly every 4 s on the time sequence. Clearly, relying directly on the result output by the intervention detection network makes it difficult to correctly analyze the intervention start and end time points and the intervention duration.
To solve the above problem, in this embodiment the processing of the intervention judgment results within the preset time by the density statistical method specifically includes, as shown in fig. 9:
setting the fusion width to N frames; for a given frame f_t in the intervention judgment results, counting the N frames before f_t, the N frames after f_t and frame f_t itself, so that the total number of frames is 2N+1; counting the number n_t of frames in the intervention state among these frames; and calculating the intervention density d_t within the 2N+1 frame window as
d_t = n_t / (2N + 1);
if d_t ≥ 0.5, the current 2N+1 frame window is defined as being in the intervention state; if d_t < 0.5, the current 2N+1 frame window is defined as being in the non-intervention state.
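A minimal sketch of this density computation is given below (per-frame results are assumed to be encoded as 1 for an intervention frame and 0 otherwise; how the first and last N frames of the sequence are handled is an assumption of the sketch):

def intervention_density(frames, t, n):
    # d_t: fraction of intervention frames in the 2N+1 window centred on frame t.
    # `frames` is a list of per-frame results (1 = intervened, 0 = not intervened).
    window = frames[max(0, t - n): t + n + 1]  # N frames before, frame t, N frames after
    return sum(window) / (2 * n + 1)

def window_state(frames, t, n):
    # Intervention state of the 2N+1 window: True if d_t >= 0.5.
    return intervention_density(frames, t, n) >= 0.5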
As shown in fig. 10, the intervention density graph obtained after the intervention density calculation is performed on fig. 8 shows that although a great amount of missed detections and false detections exist in the intervention time sequence output by the detection network, the intervention times can still be accurately counted after the intervention density calculation. The intervention state and the non-intervention state presented in the graph are obviously demarcated, and the condition whether the steering wheel is intervened or not can be clearly and accurately obtained from the intervention density graph.
After determining whether the steering wheel is in the intervention state within each 2N+1 frame window in the above manner, the start and end time points and the duration of each steering wheel intervention can be obtained, as shown in fig. 11, specifically: within the time period from d_{t-1} to d_{t+T}:
if d_{t-1} < 0.5 and d_t ≥ 0.5, the time point corresponding to d_t is the start time point of the current intervention state;
if d_{t+T-1} ≥ 0.5, d_{t+T} < 0.5, and the intervention densities from d_t to d_{t+T-1} are all ≥ 0.5, the time point corresponding to d_{t+T} is the end time point of the current intervention state;
and the duration of the current intervention state is obtained as T; where T represents a time period and T > 0; d_t is the intervention density within the current 2N+1 frame window; d_{t-1} is the intervention density of the 2N+1 frame window one frame before d_t; d_{t+T} is the intervention density of the 2N+1 frame window T after d_t; and d_{t+T-1} is the intervention density of the 2N+1 frame window one frame before d_{t+T}.
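For illustration, the start point, end point and duration T of each intervention could be read off the density sequence as in the following sketch (indices are frame indices; converting them to time points via the frame rate is left to the caller, and handling of an intervention still running at the end of the video is an assumption):

def intervention_intervals(densities):
    # Return (start, end, duration) triples: an interval starts where the density
    # rises to >= 0.5 and ends at the first frame where it falls below 0.5.
    intervals, start = [], None
    for t, d in enumerate(densities):
        if d >= 0.5 and start is None:
            start = t                                  # d_{t-1} < 0.5 and d_t >= 0.5: start point
        elif d < 0.5 and start is not None:
            intervals.append((start, t, t - start))    # d_{t+T} < 0.5: end point, duration T
            start = None
    if start is not None:                              # intervention still running at video end
        intervals.append((start, len(densities), len(densities) - start))
    return intervals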
It should be noted that, since the intervention density statistics requires N frames of intervention detection results before and after the time sequence, the start and end time points of the intervention are delayed by N frames from the real point. Usually, the value of N is between 5 and 10, i.e. between 0.2s and 0.4s, and the delay is acceptable in the statistics of the embodiment.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (6)

1. A vision-based steering wheel intervention detection and statistics method is characterized in that the vision-based steering wheel intervention detection and statistics method comprises the following steps:
constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure by using a sample image;
adopting the integrated network structure, taking a single frame image to be detected as the input of the integrated network structure, taking the steering wheel as the detection target, and judging whether the steering wheel is intervened in the current single frame image according to the intervention attribute information output by the integrated network structure, to obtain an intervention judgment result;
processing the intervention judgment results within a preset time by adopting a density statistical method, to obtain the start and end time points and the duration of each steering wheel intervention;
wherein the processing of the intervention judgment results within the preset time by the density statistical method comprises the following steps:
setting the fusion width to N frames; for a given frame f_t in the intervention judgment results, counting the N frames before f_t, the N frames after f_t and frame f_t itself, so that the total number of frames is 2N+1; counting the number n_t of frames in the intervention state among these frames; and calculating the intervention density d_t within the 2N+1 frame window as
d_t = n_t / (2N + 1);
if d_t ≥ 0.5, the current 2N+1 frame window is defined as being in the intervention state; if d_t < 0.5, the current 2N+1 frame window is defined as being in the non-intervention state;
wherein the obtaining of the start and end time points and the duration of the steering wheel intervention comprises: within the time period from d_{t-1} to d_{t+T}:
if d_{t-1} < 0.5 and d_t ≥ 0.5, the time point corresponding to d_t is the start time point of the current intervention state;
if d_{t+T-1} ≥ 0.5, d_{t+T} < 0.5, and the intervention densities from d_t to d_{t+T-1} are all ≥ 0.5, the time point corresponding to d_{t+T} is the end time point of the current intervention state;
and the duration of the current intervention state is obtained as T; wherein T represents a time period; d_t is the intervention density within the current 2N+1 frame window; d_{t-1} is the intervention density of the 2N+1 frame window one frame before d_t; d_{t+T} is the intervention density of the 2N+1 frame window T after d_t; and d_{t+T-1} is the intervention density of the 2N+1 frame window one frame before d_{t+T}.
2. The vision-based steering wheel intervention detection and statistics method of claim 1, wherein the constructing an integrated network structure containing steering wheel detection and intervention attribute identification comprises:
constructing a basic network structure, wherein the basic network structure comprises 9 convolutional layers and 5 maximum pooling layers;
and setting a plurality of candidate windows in each candidate area on the last layer of feature map of the basic network structure, wherein each candidate window comprises coordinate information of a steering wheel circumscribed rectangle, target judgment information, target category probability information and the intervention attribute information to form an integrated network structure.
3. The vision-based steering wheel intervention detection and statistics method of claim 2, wherein the training of the integrated network structure using sample images comprises:
acquiring a driving video, and extracting 1 frame from every N frames in the driving video for storage;
and labeling the stored images to obtain sample images, wherein the labeled content comprises: the coordinates of the steering wheel circumscribed rectangle, the target category, and whether the steering wheel is intervened;
randomly dividing sample images to obtain a test set and a training set;
and training the integrated network structure by using the sample images in the training set until the integrated network structure reaches a preset condition by testing the sample images in the testing set.
4. The vision-based steering wheel intervention detection and statistics method of claim 2, wherein the infrastructure network structure passes sequentially from input layer I through convolutional layer C1, maximum pooling layer M1, convolutional layer C2, maximum pooling layer M2, convolutional layer C3, maximum pooling layer M3, convolutional layer C4, maximum pooling layer M4, convolutional layer C5, maximum pooling layer M5, convolutional layer C6, convolutional layer C7, convolutional layer C8, and convolutional layer C9.
5. The vision-based steering wheel intervention detection and statistics method of claim 1, wherein the calculation of the loss function of the intervention attribute information comprises: l_i = (y_{p_i} - p_i)^2; wherein l_i represents the loss function; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value;
the gradient calculation of the intervention attribute information comprises:
δ = 2(y_{p_i} - p_i) when p_i = 1, and δ = 1.1 × 2(y_{p_i} - p_i) when p_i = 0;
wherein δ represents the gradient; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value.
6. The vision-based steering wheel intervention detection and statistics method of claim 1, wherein the value of N is 5-10.
CN201910150734.0A 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision Active CN110008834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910150734.0A CN110008834B (en) 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910150734.0A CN110008834B (en) 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision

Publications (2)

Publication Number Publication Date
CN110008834A (en) 2019-07-12
CN110008834B (en) 2021-04-06

Family

ID=67166379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910150734.0A Active CN110008834B (en) 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision

Country Status (1)

Country Link
CN (1) CN110008834B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310841B (en) * 2020-02-24 2023-06-20 中南大学湘雅医院 Medical image classification method, medical image classification device, medical image classification apparatus, medical image classification computer device, and medical image classification storage medium
CN114360321B (en) * 2021-11-09 2023-04-07 易显智能科技有限责任公司 Hand action sensing system, training system and training method for motor vehicle driver
CN118107605B (en) * 2024-04-30 2024-08-02 润芯微科技(江苏)有限公司 Vehicle control method and system based on steering wheel gesture interaction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944341A (en) * 2017-10-27 2018-04-20 荆门程远电子科技有限公司 Driver based on traffic monitoring image does not fasten the safety belt automatic checkout system

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102547139A (en) * 2010-12-30 2012-07-04 北京新岸线网络技术有限公司 Method for splitting news video program, and method and system for cataloging news videos
CN102324016B (en) * 2011-05-27 2013-06-05 北京东方奔腾信息技术有限公司 Statistical method for high-density crowd flow
CN102263937B (en) * 2011-07-26 2013-07-24 华南理工大学 Driver's driving behavior monitoring device and monitoring method based on video detection
CN102289660B (en) * 2011-07-26 2013-07-03 华南理工大学 Method for detecting illegal driving behavior based on hand gesture tracking
JP5784061B2 (en) * 2013-03-27 2015-09-24 本田技研工業株式会社 Input device, input method, and input program
CN104078039A (en) * 2013-03-27 2014-10-01 广东工业大学 Voice recognition system of domestic service robot on basis of hidden Markov model
DE102013211052B3 (en) * 2013-06-13 2014-12-18 Ford Global Technologies, Llc Detecting a state of a hand-steering wheel touch by means of an observer
CN104092988A (en) * 2014-07-10 2014-10-08 深圳市中控生物识别技术有限公司 Method, device and system for managing passenger flow in public place
CN104207791B (en) * 2014-08-26 2017-02-15 江南大学 Fatigue driving detection method
CN105488957B (en) * 2015-12-15 2018-06-12 小米科技有限责任公司 Method for detecting fatigue driving and device
CN105513354A (en) * 2015-12-22 2016-04-20 电子科技大学 Video-based urban road traffic jam detecting system
CN106372584B (en) * 2016-08-26 2019-06-11 浙江银江研究院有限公司 A kind of video image mosaic detection method
CN106845344B (en) * 2016-12-15 2019-10-25 重庆凯泽科技股份有限公司 Demographics' method and device
CN107274678B (en) * 2017-08-14 2019-05-03 河北工业大学 A kind of night vehicle flowrate and model recognizing method based on Kinect
CN107479044B (en) * 2017-08-23 2020-04-28 西安电子工程研究所 Adaptive track starting method based on point track density real-time statistics
CN108399388A (en) * 2018-02-28 2018-08-14 福州大学 A kind of middle-high density crowd quantity statistics method
CN108647617A (en) * 2018-05-02 2018-10-12 深圳市唯特视科技有限公司 A kind of positioning of driver's hand and grasping analysis method based on convolutional neural networks
CN109151501B (en) * 2018-10-09 2021-06-08 北京周同科技有限公司 Video key frame extraction method and device, terminal equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944341A (en) * 2017-10-27 2018-04-20 荆门程远电子科技有限公司 Driver based on traffic monitoring image does not fasten the safety belt automatic checkout system

Also Published As

Publication number Publication date
CN110008834A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN108932500B (en) A kind of dynamic gesture identification method and system based on deep neural network
CN110363140B (en) Human body action real-time identification method based on infrared image
CN110097044B (en) One-stage license plate detection and identification method based on deep learning
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN106886216B (en) Robot automatic tracking method and system based on RGBD face detection
CN101447082B (en) Detection method of moving target on a real-time basis
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN103530600B (en) Licence plate recognition method under complex illumination and system
CN112784810B (en) Gesture recognition method, gesture recognition device, computer equipment and storage medium
CN110008834B (en) Steering wheel intervention detection and statistics method based on vision
CN111767878B (en) Deep learning-based traffic sign detection method and system in embedded device
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN111104903A (en) Depth perception traffic scene multi-target detection method and system
CN110119726A (en) A kind of vehicle brand multi-angle recognition methods based on YOLOv3 model
CN110298297A (en) Flame identification method and device
CN110334703B (en) Ship detection and identification method in day and night image
CN112906550B (en) Static gesture recognition method based on watershed transformation
US20230394829A1 (en) Methods, systems, and computer-readable storage mediums for detecting a state of a signal light
CN114049572A (en) Detection method for identifying small target
CN106023249A (en) Moving object detection method based on local binary similarity pattern
CN114973207A (en) Road sign identification method based on target detection
CN106056078A (en) Crowd density estimation method based on multi-feature regression ensemble learning
CN111259736B (en) Real-time pedestrian detection method based on deep learning in complex environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant