CN110008834B - Steering wheel intervention detection and statistics method based on vision - Google Patents

Steering wheel intervention detection and statistics method based on vision

Info

Publication number
CN110008834B
CN110008834B (application CN201910150734.0A)
Authority
CN
China
Prior art keywords
intervention
steering wheel
detection
network structure
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910150734.0A
Other languages
Chinese (zh)
Other versions
CN110008834A (en)
Inventor
程球
张雪莲
毛泉涌
文凌艳
周明政
赵云
胡芳芳
王军
谢兰青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETHIK Group Ltd
Original Assignee
CETHIK Group Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETHIK Group Ltd filed Critical CETHIK Group Ltd
Priority to CN201910150734.0A priority Critical patent/CN110008834B/en
Publication of CN110008834A publication Critical patent/CN110008834A/en
Application granted granted Critical
Publication of CN110008834B publication Critical patent/CN110008834B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vision-based steering wheel intervention detection and statistics method, which comprises the following steps: constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure with sample images; using the integrated network structure, taking a single frame image to be detected as its input and the steering wheel as the detection target, and judging whether the steering wheel is intervened in the current single frame image according to the intervention attribute information output by the integrated network structure, to obtain an intervention judgment result; and processing the intervention judgment results within a preset time by a density statistical method to obtain the start and end time points and the duration of each steering wheel intervention. The method depends on little hardware, is easy to implement, eliminates false detections and missed detections during detection, yields highly accurate detection results, and produces statistics that present the results more intuitively.

Description

Steering wheel intervention detection and statistics method based on vision
Technical Field
The invention belongs to the field of steering wheel intervention detection, and particularly relates to a steering wheel intervention detection and statistical method based on vision.
Background
Hand detection and gesture recognition are important and promising research problems in human-computer interaction and robotics, with important applications in sports, security, and safe driving.
In unmanned (driverless) competitions, the number of times the driver intervenes on the steering wheel and the duration of each intervention are two important indexes for evaluating the quality of an unmanned driving system. The driver intervenes on the steering wheel by touching it with the hands, so steering wheel intervention detection can be converted into detecting hand contact with the steering wheel. Most steering wheel touch detection systems rely on sensors (such as pressure sensors): for example, the patent document with publication number CN 105143015B places sensors on the steering wheel and determines whether the driver's hands touch the steering wheel from the different signals generated by the sensors; this kind of steering wheel touch detection technology is widely used in various Advanced Driver Assistance Systems (ADAS). In fact, besides being collected by dedicated sensors, the touch signal can also be obtained from video images and judged with logic similar to human vision. With the vision-based method, only one camera is needed and no complex sensors have to be arranged on the steering wheel, which greatly simplifies the spatial design of the steering wheel and the cab and lowers the cost.
Disclosure of Invention
The invention aims to provide a vision-based steering wheel intervention detection and statistical method which depends on little hardware, is easy to implement, eliminates false detections and missed detections during detection, yields highly accurate detection results, and produces statistics that present the results more intuitively.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a vision-based steering wheel intervention detection and statistics method, comprising:
constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure by using a sample image;
adopting the integrated network structure, taking a single frame image to be detected as the input of the integrated network structure, taking the steering wheel as the detection target, and judging whether the steering wheel is intervened in the current single frame image according to the intervention attribute information output by the integrated network structure, to obtain an intervention judgment result;
processing the intervention judgment results within a preset time by adopting a density statistical method, to obtain the start and end time points and the duration of each steering wheel intervention;
wherein the processing of the intervention judgment results within the preset time by the density statistical method comprises the following steps:
setting the fusion width to N frames; for a given frame f_t in the intervention judgment results, counting the N frames before f_t, the N frames after f_t and frame f_t itself, so that the total number of frames is 2N+1; counting the number n_t of frames in the intervention state among these frames; and calculating the intervention density d_t within the 2N+1 frame window as
d_t = n_t / (2N + 1);
if d_t ≥ 0.5, the current 2N+1 frame window is defined as being in the intervention state; if d_t < 0.5, the current 2N+1 frame window is defined as being in the non-intervention state;
wherein the obtaining of the start and end time points and the duration of the steering wheel intervention comprises: within the time period from d_{t-1} to d_{t+T}:
if d_{t-1} < 0.5 and d_t ≥ 0.5, the time point corresponding to d_t is the start time point of the current intervention state;
if d_{t+T-1} ≥ 0.5, d_{t+T} < 0.5, and the intervention densities from d_t to d_{t+T-1} are all ≥ 0.5, the time point corresponding to d_{t+T} is the end time point of the current intervention state;
and the duration of the current intervention state is obtained as T; wherein T represents a time period; d_t is the intervention density within the current 2N+1 frame window; d_{t-1} is the intervention density of the 2N+1 frame window one frame before d_t; d_{t+T} is the intervention density of the 2N+1 frame window T after d_t; and d_{t+T-1} is the intervention density of the 2N+1 frame window one frame before d_{t+T}.
Preferably, the building of the integrated network structure including steering wheel detection and intervention attribute identification includes:
constructing a basic network structure, wherein the basic network structure comprises 9 convolutional layers and 5 maximum pooling layers;
and setting a plurality of candidate windows in each candidate area on the last layer of feature map of the basic network structure, wherein each candidate window comprises coordinate information of a steering wheel circumscribed rectangle, target judgment information, target category probability information and the intervention attribute information to form an integrated network structure.
Preferably, the training of the integrated network structure by using the sample image includes:
acquiring a driving video, and extracting 1 frame from every N frames in the driving video for storage;
and labeling the stored images to obtain sample images, wherein the labeled content comprises: the coordinates of the steering wheel circumscribed rectangle, the target category, and whether the steering wheel is intervened;
randomly dividing sample images to obtain a test set and a training set;
and training the integrated network structure by using the sample images in the training set until the integrated network structure reaches a preset condition by testing the sample images in the testing set.
Preferably, the basic network structure passes through a convolutional layer C1, a maximum pooling layer M1, a convolutional layer C2, a maximum pooling layer M2, a convolutional layer C3, a maximum pooling layer M3, a convolutional layer C4, a maximum pooling layer M4, a convolutional layer C5, a maximum pooling layer M5, a convolutional layer C6, a convolutional layer C7, a convolutional layer C8 and a convolutional layer C9 in sequence from the input layer I.
Preferably, the calculation of the loss function of the intervention attribute information comprises: l_i = (y_{p_i} - p_i)^2; wherein l_i represents the loss function; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value;
the gradient calculation of the intervention attribute information comprises:
δ = 2(y_{p_i} - p_i) when p_i = 1, and δ = 1.1 × 2(y_{p_i} - p_i) when p_i = 0;
wherein δ represents the gradient; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value.
Preferably, the value of N is 5-10.
The vision-based steering wheel intervention detection and statistical method provided by the invention designs an end-to-end deep learning network that integrates the target detection and attribute identification tasks, merging hand and steering wheel detection into one network. This avoids relying on several networks or rule-based strategies for real-time detection of hands touching the steering wheel, simplifies the detection method and improves detection efficiency. In addition, a time-sequence-based video intervention density statistical algorithm is applied to steering wheel intervention detection, eliminating false detections and missed detections in the detection process and estimating the start time point, end time point and duration of each intervention.
Drawings
FIG. 1 is a block diagram of a method for vision-based steering wheel intervention detection and statistics in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of one embodiment of a vision-based steering wheel intervention detection and statistics method of the present invention;
FIG. 3 is a schematic diagram of an application scenario of the steering wheel intervention detection of the present invention;
FIG. 4 is a schematic diagram of an image labeled according to the present invention;
FIG. 5 is a schematic diagram of an embodiment of an integrated network architecture of the present invention;
FIG. 6 is a diagram illustrating the effect of detecting steering wheel intervention in a single frame image according to the present invention;
FIG. 7 is an intervention timing diagram of the intervention detection output of the present invention in an ideal state;
FIG. 8 is an intervention timing diagram of the intervention detection output of the present invention in an actual state;
FIG. 9 is a schematic illustration of the intervention density calculation of the present invention;
FIG. 10 is a graph of intervention density calculated according to the present invention from the intervention timing diagram of FIG. 8;
FIG. 11 is another intervention density map of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
As shown in fig. 1, the present embodiment provides a method for detecting and counting steering wheel intervention based on vision, which mainly includes two stages:
the first stage is steering wheel intervention detection based on deep learning: a single frame image is used as input, the steering wheel is used as the detection target, a target box containing the steering wheel is detected with a target detection algorithm, the feature vector corresponding to the target box on the final feature map is reused, and whether the steering wheel is intervened in this frame image is judged;
and the second stage is based on the intervention density statistics of the image sequence, and the interference caused by false detection and missed detection in the first stage is eliminated by using a density statistical algorithm, so that the starting and stopping time point and the intervention duration of each intervention are obtained.
As shown in fig. 2, the steering wheel intervention detection and statistics method based on vision of the present embodiment is mainly divided into the following 4 steps.
S1, building a monitoring system in the cab and collecting video images of the application scene.
For acquiring driving videos, the application-scene video images mainly consist of two parts: first, real driver monitoring videos obtained during unmanned competitions; and second, driver monitoring videos of the unmanned competition process simulated with several private cars. The sample data set comprises these two parts because real unmanned competition monitoring video resources are limited, the monitoring view angle and in-car environment are uniform, and the steering wheel is in the non-intervention state for most of the time, which causes serious sample imbalance. The monitoring videos simulated with private cars mainly make up for the deficiencies of the real competition scenes and increase sample diversity.
First, several monitoring view angles are adjusted and multiple vehicles are used to increase angle and environment diversity; second, the steering wheel is in the intervened state most of the time when a private car is driven (it is not intervened only for the small portion of time spent parked), which reduces the sample imbalance problem; third, during simulation, the driver deliberately imitates various driving postures and habits, such as holding the wheel with both hands, holding it with one hand, rubbing it with the palm, or hooking it with the fingers, which increases the diversity of intervention-state samples; finally, various external natural environments are simulated, including dim illumination when passing through a tunnel and strong exposure on sunny days. An example application scene is shown in fig. 3.
Comprehensive driving videos are thus acquired from these rich scenes to facilitate later testing and training.
S2, manually labeling the images, wherein the labeled content of each image comprises: the coordinates of the circumscribed rectangle of the steering wheel (e.g., the coordinates of the upper-left and lower-right corners of the rectangle) and whether the steering wheel is touched by a hand (e.g., 0 means intervened, 1 means not intervened).
The video is split frame by frame and 1 frame out of every N frames is extracted and stored as a sample; in this embodiment N is 25. Invalid pictures caused by blur, darkness, over-exposure and the like are then removed, and the remaining pictures are labeled.
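As an illustration only, a minimal sketch of this frame-sampling step might look as follows (Python with OpenCV is assumed; the sampling interval N = 25 follows the embodiment, while the function name and file paths are hypothetical):

import cv2, os

def extract_samples(video_path, out_dir, n=25):
    # Save 1 frame out of every n frames of the driving video (assumed sampling step).
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % n == 0:  # keep 1 frame every n frames
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# hypothetical usage: extract_samples("race_drive.mp4", "samples/", n=25)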
Under normal conditions each picture contains at most one steering wheel, and the labeled content consists of the coordinates (xmin, ymin, xmax, ymax) of the upper-left and lower-right corners of the circumscribed rectangle of the steering wheel target, the target category c, and a flag i indicating whether the steering wheel is intervened. Since the steering wheel is the only target category, c is always 0 (class ids start from 0); for the intervention flag bit, 0 indicates intervention and 1 indicates non-intervention.
During training, the target box coordinates are converted into centre-point coordinates plus width and height (x, y, w, h) and divided by the width and height of the original image for normalization, so the final label has the form [c, x, y, w, h, i]. A schematic diagram of a labeled picture is shown in fig. 4; normally the label is drawn in color directly on the picture, but in this embodiment, to show it clearly, the label is separated from the picture and written above it.
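A small sketch of this label conversion is given below; it assumes the annotation provides the corner coordinates (xmin, ymin, xmax, ymax), the category c and the intervention flag i described above, and normalizes by the image width and height to produce the [c, x, y, w, h, i] form (the helper name is hypothetical):

def to_training_label(xmin, ymin, xmax, ymax, c, i, img_w, img_h):
    # Convert corner coordinates to the normalized centre/width/height label [c, x, y, w, h, i].
    x = (xmin + xmax) / 2.0 / img_w   # normalized centre x
    y = (ymin + ymax) / 2.0 / img_h   # normalized centre y
    w = (xmax - xmin) / img_w         # normalized width
    h = (ymax - ymin) / img_h         # normalized height
    return [c, x, y, w, h, i]

# e.g. to_training_label(300, 400, 900, 800, c=0, i=0, img_w=1280, img_h=720)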
The final formed valid data set in this embodiment includes 45798 pictures, where 5798 pictures are randomly selected as the test set, and 40000 pictures are the training sets.
S3, constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure by using the sample images to obtain the network structure capable of accurately judging whether the steering wheel is touched by a hand in each frame of image.
The main purpose of this stage is to determine whether the steering wheel is intervened in a single frame image. Technically this is essentially a binary image classification problem, and deep-learning-based image classification currently achieves high accuracy. However, a classification network requires the main object in the input picture to occupy most of the pixels of the whole image, because its principle is to classify using features extracted from the whole image after several convolution and pooling operations; if the main object occupies only a small part of the image, its features are easily lost or submerged by the background during extraction, and it is difficult to reach an ideal classification accuracy.
Unlike a general classification task, the classification target of this embodiment is not a particular object but the three-dimensional spatial relationship between two objects (the hand and the steering wheel). This spatial relationship appears in many forms on a two-dimensional image, the useful pixels describing it are very few, and subtle differences are easily confused.
In view of the particularity of the application scenario, this embodiment does not adopt a standard classification network; instead, it borrows the idea of target detection and treats the intervention relationship as an attribute of each target, outputting it together with information such as the position coordinates of the target box, the target confidence and the target category. This has two advantages. First, the target detection task focuses feature extraction on the important targets in the image, and the intervention relationship is judged only from the local features indicated by the target box rather than from the whole image, which reduces background interference and improves detection accuracy. Second, treating the intervention relationship as an attribute of each target, rather than cropping local features and judging them with a cascaded classification network, fully reuses features, avoids the design and training of a complex network, and makes the detection process closer to real time.
There are two key target classes in the detection: the hand and the steering wheel. The intuitive idea for detecting intervention is to detect the position coordinates of the hand and the steering wheel separately and then judge intervention from those coordinates; the detection stage would then reduce to a conventional hand and steering wheel detection task. However, inferring the three-dimensional spatial relationship from two-dimensional coordinates is extremely difficult, so this scheme is hard to implement. The other idea is to directly detect whether the hand intervenes on the steering wheel, or whether the steering wheel is intervened by the hand, handing the inference of the intervention relationship to the deep neural network for automatic learning instead of using hand-crafted rules. This embodiment adopts the second idea and simplifies it further: only the steering wheel is detected, and whether it is intervened is output as an attribute together with the target detection information.
S31, as shown in fig. 5, constructing an integrated network structure including steering wheel detection and intervention attribute identification, including:
an infrastructure network structure is constructed that includes 9 convolutional layers and 5 max pooling layers. Specifically, the infrastructure network structure passes through a convolutional layer C1, a maximum channelization layer M1, a convolutional layer C2, a maximum channelization layer M2, a convolutional layer C3, a maximum channelization layer M3, a convolutional layer C4, a maximum channelization layer M4, a convolutional layer C5, a maximum channelization layer M5, a convolutional layer C6, a convolutional layer C7, a convolutional layer C8, and a convolutional layer C9 in sequence from an input layer I.
Network parameters of the convolutional layer are represented in the form of [ k _ size, k _ size, channels, stride ], wherein k _ size is the size of a convolutional kernel, channels are the number of output characteristic channels, and stride is a step size; network parameters of the pooling layer are represented by [ k _ size, k _ size and stride ], wherein the k _ size is the size of a pooling kernel, and stride is a step size; the input parameters of each layer adopt [ resolution, resolution, channel ], where resolution is the resolution of the image and channel is the number of channels. Specifically, the infrastructure network structure is shown in table 1.
TABLE 1 infrastructure network architecture
Network layer type Network parameters Input parameters
Convolutional layer C1 [3,3,16,1] [416,416,3]
Maximum pooling layer M1 [2,2,2] [416,416,3]
Convolutional layer C2 [3,3,32,1] [208,208,16]
Maximum pooling layer M2 [2,2,2] [208,208,16]
Convolutional layer C3 [3,3,64,1] [104,104,32]
Maximum pooling layer M3 [2,2,2] [104,104,32]
Convolutional layer C4 [3,3,128,1] [52,52,64]
Maximum pooling layer M4 [2,2,2] [52,52,64]
Convolutional layer C5 [3,3,256,1] [26,26,128]
Maximum pooling layer M5 [2,2,2] [26,26,128]
Convolutional layer C6 [3,3,512,1] [13,13,256]
Convolutional layer C7 [3,3,1024,1] [13,13,256]
Convolutional layer C8 [3,3,1024,1] [13,13,256]
Convolutional layer C9 [1,1,35,1] [13,13,256]
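For illustration, the layer sequence of Table 1 could be sketched in PyTorch as follows (PyTorch itself is an assumption of this sketch, as are the activation function and 'same' padding, which the table does not specify; the kernel sizes, output channels and strides follow the "network parameters" column):

import torch.nn as nn

def conv(in_ch, out_ch, k):
    # 3x3 (or 1x1) convolution with 'same' padding followed by leaky ReLU;
    # the activation choice is an assumption, the shapes follow Table 1.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, stride=1, padding=k // 2),
                         nn.LeakyReLU(0.1))

base_network = nn.Sequential(
    conv(3, 16, 3),    nn.MaxPool2d(2, 2),   # C1, M1  (416 -> 208)
    conv(16, 32, 3),   nn.MaxPool2d(2, 2),   # C2, M2  (208 -> 104)
    conv(32, 64, 3),   nn.MaxPool2d(2, 2),   # C3, M3  (104 -> 52)
    conv(64, 128, 3),  nn.MaxPool2d(2, 2),   # C4, M4  (52 -> 26)
    conv(128, 256, 3), nn.MaxPool2d(2, 2),   # C5, M5  (26 -> 13)
    conv(256, 512, 3),                       # C6
    conv(512, 1024, 3),                      # C7
    conv(1024, 1024, 3),                     # C8
    nn.Conv2d(1024, 35, 1),                  # C9: 5 anchors x (4 + 1 + 1 + 1) = 35 channels
)
# a 416x416x3 input yields a 13x13x35 output feature map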
In the standard tiny-yolov2 target detection algorithm, 5 anchor boxes are arranged at each anchor on the last feature map of the convolution; each anchor box comprises 4 target box coordinates (x, y, w, h), 1 piece of target judgment information P_o and the probability information P_c of each category (only 1 here, because the steering wheel is the only target category), so the number of feature channels of the last feature map in standard target detection is 5 × (4 + 1 + 1) = 30.
However, unlike the standard target detection algorithm, the steering wheel intervention detection of this embodiment adds one more piece of intervention attribute information P_i. In this embodiment, 5 candidate windows (anchor boxes) are set at each candidate region (anchor) on the last feature map of the basic network structure, and each candidate window comprises the coordinate information of the steering wheel circumscribed rectangle (x, y, w, h), the target judgment information (P_o), the target category probability information (P_c) and the intervention attribute information (P_i), forming an integrated network structure.
The intervention attribute information P_i is placed after the target category probability information P_c and describes the attribute of the target in the box, so the output of the last convolutional layer C9 of the integrated network structure of this embodiment is [13,13,35], i.e., the number of feature channels of the last feature map is 5 × (4 + 1 + 1 + 1) = 35.
Therefore, an integrated network structure for steering wheel intervention detection and intervention attribute identification is formed, the integrated network structure is adopted, a single-frame image to be detected is used as the input of the integrated network structure, the steering wheel is used as a detection target, whether the steering wheel is intervened on the current single-frame image is judged according to the intervention attribute information output by the integrated network structure, and an intervention judgment result can be obtained.
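As a sketch of how such an output could be consumed, the code below splits one 35-channel cell of the final 13×13 feature map into its 5 candidate windows; the slot ordering (x, y, w, h, P_o, P_c, P_i) within each window is an assumption, and the 0.5 decision threshold follows the rule stated later in the description:

import numpy as np

def decode_cell(cell_vec, num_anchors=5, attrs=7):
    # Split a 35-dim cell vector into 5 anchor-box predictions of
    # (x, y, w, h, P_o, P_c, P_i); the slot ordering is assumed.
    boxes = np.asarray(cell_vec).reshape(num_anchors, attrs)
    results = []
    for x, y, w, h, p_o, p_c, p_i in boxes:
        results.append({
            "box": (x, y, w, h),       # steering wheel bounding box
            "objectness": p_o,         # target judgment information
            "class_prob": p_c,         # target category probability
            "intervened": p_i <= 0.5,  # intervention attribute: <= 0.5 means intervened
        })
    return results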
Since the attribute in this embodiment only covers whether the steering wheel is intervened, the network structure differs from a standard target detection network structure by only one intervention attribute bit; in fact, the attribute bits can be extended as needed, even to tens or hundreds of bits. A general attribute recognition system usually cascades a target detection network with a classification network: it first detects the coordinates of the target of interest in the image with a target detection algorithm, then crops the local region containing the target from the original image according to those coordinates, and finally feeds the local region into a classification network that outputs the target attribute.
The target detection and attribute identification integrated network structure provided by the embodiment combines the detection task and the attribute identification task into one network, fully multiplexes the detection network characteristic information, and avoids the problems of serious time consumption, serious memory consumption and the like of a cascade network. The network structure provided by the embodiment still keeps the characteristic of end-to-end training, and the trouble of step-by-step training of the cascade network is reduced. The target detection and attribute identification integrated network designed in the embodiment is described in detail by taking the application to steering wheel intervention detection as an example, but the idea can be extended to other intelligent visual applications, such as target tracking, target positioning and the like.
S32, designing a loss function.
The design of the loss function and the gradient calculation of the standard target detection output (x, y, w, h, P_o, P_c) are the same as those in the prior art, for example, as described in the yolov2 paper, and will not be described herein again. The design and calculation of the loss function of the intervention attribute information output P_i added in the integrated network structure of the present embodiment are further described by specific formulas below.
From the sample labeling form it can be seen that the output P_i is in fact a binary classification; linear regression is used for the classification and the mean squared error describes the loss. The loss function is calculated as l_i = (y_{p_i} - p_i)^2, where l_i represents the loss function, y_{p_i} represents the output value of the intervention attribute information, and p_i represents the true value of the intervention attribute information.
To alleviate the imbalance between intervention-state and non-intervention-state samples during gradient calculation, the gradient produced by the rarer samples is multiplied by a coefficient to increase its influence on the network weight update. Because there are fewer intervention-state samples, if the labeled intervention flag p_i is 0, i.e. the steering wheel in the picture is in the intervened state, the calculated gradient is multiplied by 1.1. The gradient is thus designed as:
δ = 2(y_{p_i} - p_i) when p_i = 1, and δ = 1.1 × 2(y_{p_i} - p_i) when p_i = 0;
where δ represents the gradient, y_{p_i} represents the output value of the intervention attribute information, and p_i represents the true value of the intervention attribute information.
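A short numerical sketch of this weighted loss and gradient might look as follows (the factor 2 comes from differentiating the squared error; the 1.1 up-weighting for intervention-state samples follows the text above, and the function names are illustrative only):

def intervention_loss(y_pred, p_true):
    # Mean-squared-error loss for the intervention attribute P_i.
    return (y_pred - p_true) ** 2

def intervention_gradient(y_pred, p_true):
    # Gradient of the loss w.r.t. the network output; samples labelled as
    # intervened (p_true == 0) are up-weighted by 1.1 to ease class imbalance.
    grad = 2.0 * (y_pred - p_true)
    if p_true == 0:  # intervention state: the rarer class in the real race videos
        grad *= 1.1
    return grad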
S33, training and testing the integrated network structure.
During training, all pictures are resized to 416×416 resolution before being fed into the network; the total number of iterations max_batchs is 45000, the batch size is 64, the weight decay factor lambda is 0.0005, and back-propagation uses the momentum method with momentum factor v = 0.9. The learning rate follows a piecewise constant decay: the initial learning rate is 0.0001, it is raised to 0.001 after 100 iterations, and thereafter it is multiplied by a decay factor of 0.1 every 10000 iterations.
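For illustration, the piecewise-constant learning-rate schedule described above could be expressed as the following sketch (whether the 10000-iteration decay intervals are counted from the end of the warm-up is not stated in the text and is an assumption here):

def learning_rate(iteration):
    # Piecewise-constant schedule: warm-up at 1e-4, raised to 1e-3 after 100
    # iterations, then multiplied by 0.1 every 10000 iterations (45000 total).
    if iteration < 100:
        return 0.0001
    lr = 0.001
    lr *= 0.1 ** ((iteration - 100) // 10000)  # decay step count is an assumption
    return lr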
In this embodiment the training set size is 40000 and the test set size is 5798. After 40000 rounds of training, the average detection precision (AP) of the steering wheel target is 99%, the intervention detection accuracy is 90%, and the detection speed on a GTX1070 is 6 ms/frame. The detection effect is shown schematically in fig. 6, where normal indicates that the car is in the normal automatic driving state without intervention, i.e. the steering wheel is not touched by a hand, and unormal indicates an intervention, i.e. the steering wheel is touched by a hand and the car is being driven manually.
Whether the current frame is in the intervention state is judged from the intervention attribute output: when the output value y_{p_i} of the intervention attribute information is greater than 0.5 there is no intervention, and the closer the output value is to 1 the higher the confidence of the non-intervention prediction; when y_{p_i} ≤ 0.5 there is an intervention, and the closer the output value is to 0 the higher the confidence of the intervention.
S4, processing the intervention judgment results within a preset time by a density statistical method to obtain the start and end time points and the duration of each steering wheel intervention.
After each frame of image obtained from the driving video is successfully identified, in order to further improve the identification accuracy and visually present whether the video has an intervention state, the embodiment further finds the starting time point and the ending time point of the steering wheel being intervened on the time sequence, so as to calculate the duration of each intervention and count the intervention times of the whole video.
The ideal recognition result of whether the steering wheel is intervened is shown in fig. 7: the intervention detection accuracy is 100%, the time-sequence analysis is simple, and the intervention start points and durations can easily be counted just by checking whether the intervention state occurs. In practice, however, intervention detection usually looks like fig. 8: because steering wheel detection and intervention detection always contain errors (missed detections and false detections), even at 99% accuracy one false detection occurs on average every 100 frames, i.e. roughly every 4 s on the time sequence. Clearly, relying directly on the result output by the intervention detection network makes it difficult to correctly analyze the intervention start and end time points and the intervention duration.
To solve the above problem, in this embodiment the processing of the intervention judgment results within the preset time by the density statistical method specifically includes, as shown in fig. 9:
setting the fusion width to N frames; for a given frame f_t in the intervention judgment results, counting the N frames before f_t, the N frames after f_t and frame f_t itself, so that the total number of frames is 2N+1; counting the number n_t of frames in the intervention state among these frames; and calculating the intervention density d_t within the 2N+1 frame window as
d_t = n_t / (2N + 1);
if d_t ≥ 0.5, the current 2N+1 frame window is defined as being in the intervention state; if d_t < 0.5, the current 2N+1 frame window is defined as being in the non-intervention state.
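A minimal sketch of this density computation is given below (per-frame results are assumed to be encoded as 1 for an intervention frame and 0 otherwise; how the first and last N frames of the sequence are handled is an assumption of the sketch):

def intervention_density(frames, t, n):
    # d_t: fraction of intervention frames in the 2N+1 window centred on frame t.
    # `frames` is a list of per-frame results (1 = intervened, 0 = not intervened).
    window = frames[max(0, t - n): t + n + 1]  # N frames before, frame t, N frames after
    return sum(window) / (2 * n + 1)

def window_state(frames, t, n):
    # Intervention state of the 2N+1 window: True if d_t >= 0.5.
    return intervention_density(frames, t, n) >= 0.5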
As shown in fig. 10, the intervention density graph obtained after the intervention density calculation is performed on fig. 8 shows that although a great amount of missed detections and false detections exist in the intervention time sequence output by the detection network, the intervention times can still be accurately counted after the intervention density calculation. The intervention state and the non-intervention state presented in the graph are obviously demarcated, and the condition whether the steering wheel is intervened or not can be clearly and accurately obtained from the intervention density graph.
After determining whether the steering wheel is in the intervention state within each 2N+1 frame window in the above manner, the start and end time points and the duration of each steering wheel intervention can be obtained, as shown in fig. 11, specifically: within the time period from d_{t-1} to d_{t+T}:
if d_{t-1} < 0.5 and d_t ≥ 0.5, the time point corresponding to d_t is the start time point of the current intervention state;
if d_{t+T-1} ≥ 0.5, d_{t+T} < 0.5, and the intervention densities from d_t to d_{t+T-1} are all ≥ 0.5, the time point corresponding to d_{t+T} is the end time point of the current intervention state;
and the duration of the current intervention state is obtained as T; where T represents a time period and T > 0; d_t is the intervention density within the current 2N+1 frame window; d_{t-1} is the intervention density of the 2N+1 frame window one frame before d_t; d_{t+T} is the intervention density of the 2N+1 frame window T after d_t; and d_{t+T-1} is the intervention density of the 2N+1 frame window one frame before d_{t+T}.
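For illustration, the start point, end point and duration T of each intervention could be read off the density sequence as in the following sketch (indices are frame indices; converting them to time points via the frame rate is left to the caller, and handling of an intervention still running at the end of the video is an assumption):

def intervention_intervals(densities):
    # Return (start, end, duration) triples: an interval starts where the density
    # rises to >= 0.5 and ends at the first frame where it falls below 0.5.
    intervals, start = [], None
    for t, d in enumerate(densities):
        if d >= 0.5 and start is None:
            start = t                                  # d_{t-1} < 0.5 and d_t >= 0.5: start point
        elif d < 0.5 and start is not None:
            intervals.append((start, t, t - start))    # d_{t+T} < 0.5: end point, duration T
            start = None
    if start is not None:                              # intervention still running at video end
        intervals.append((start, len(densities), len(densities) - start))
    return intervals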
It should be noted that, since the intervention density statistics requires N frames of intervention detection results before and after the time sequence, the start and end time points of the intervention are delayed by N frames from the real point. Usually, the value of N is between 5 and 10, i.e. between 0.2s and 0.4s, and the delay is acceptable in the statistics of the embodiment.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (6)

1. A vision-based steering wheel intervention detection and statistics method is characterized in that the vision-based steering wheel intervention detection and statistics method comprises the following steps:
constructing an integrated network structure containing steering wheel detection and intervention attribute identification, and training the integrated network structure by using a sample image;
adopting the integrated network structure, taking a single frame image to be detected as the input of the integrated network structure, taking the steering wheel as the detection target, and judging whether the steering wheel is intervened in the current single frame image according to the intervention attribute information output by the integrated network structure, to obtain an intervention judgment result;
processing the intervention judgment results within a preset time by adopting a density statistical method, to obtain the start and end time points and the duration of each steering wheel intervention;
wherein the processing of the intervention judgment results within the preset time by the density statistical method comprises the following steps:
setting the fusion width to N frames; for a given frame f_t in the intervention judgment results, counting the N frames before f_t, the N frames after f_t and frame f_t itself, so that the total number of frames is 2N+1; counting the number n_t of frames in the intervention state among these frames; and calculating the intervention density d_t within the 2N+1 frame window as
d_t = n_t / (2N + 1);
if d_t ≥ 0.5, the current 2N+1 frame window is defined as being in the intervention state; if d_t < 0.5, the current 2N+1 frame window is defined as being in the non-intervention state;
wherein the obtaining of the start and end time points and the duration of the steering wheel intervention comprises: within the time period from d_{t-1} to d_{t+T}:
if d_{t-1} < 0.5 and d_t ≥ 0.5, the time point corresponding to d_t is the start time point of the current intervention state;
if d_{t+T-1} ≥ 0.5, d_{t+T} < 0.5, and the intervention densities from d_t to d_{t+T-1} are all ≥ 0.5, the time point corresponding to d_{t+T} is the end time point of the current intervention state;
and the duration of the current intervention state is obtained as T; wherein T represents a time period; d_t is the intervention density within the current 2N+1 frame window; d_{t-1} is the intervention density of the 2N+1 frame window one frame before d_t; d_{t+T} is the intervention density of the 2N+1 frame window T after d_t; and d_{t+T-1} is the intervention density of the 2N+1 frame window one frame before d_{t+T}.
2. The vision-based steering wheel intervention detection and statistics method of claim 1, wherein the constructing an integrated network structure containing steering wheel detection and intervention attribute identification comprises:
constructing a basic network structure, wherein the basic network structure comprises 9 convolutional layers and 5 maximum pooling layers;
and setting a plurality of candidate windows in each candidate area on the last layer of feature map of the basic network structure, wherein each candidate window comprises coordinate information of a steering wheel circumscribed rectangle, target judgment information, target category probability information and the intervention attribute information to form an integrated network structure.
3. The vision-based steering wheel intervention detection and statistics method of claim 2, wherein the training of the integrated network structure using sample images comprises:
acquiring a driving video, and extracting 1 frame from every N frames in the driving video for storage;
and labeling the stored images to obtain sample images, wherein the labeled content comprises: the coordinates of the steering wheel circumscribed rectangle, the target category, and whether the steering wheel is intervened;
randomly dividing sample images to obtain a test set and a training set;
and training the integrated network structure by using the sample images in the training set until the integrated network structure reaches a preset condition by testing the sample images in the testing set.
4. The vision-based steering wheel intervention detection and statistics method of claim 2, wherein the infrastructure network structure passes sequentially from input layer I through convolutional layer C1, maximum pooling layer M1, convolutional layer C2, maximum pooling layer M2, convolutional layer C3, maximum pooling layer M3, convolutional layer C4, maximum pooling layer M4, convolutional layer C5, maximum pooling layer M5, convolutional layer C6, convolutional layer C7, convolutional layer C8, and convolutional layer C9.
5. The vision-based steering wheel intervention detection and statistics method of claim 1, wherein the calculation of the loss function of the intervention attribute information comprises: l_i = (y_{p_i} - p_i)^2; wherein l_i represents the loss function; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value;
the gradient calculation of the intervention attribute information comprises:
δ = 2(y_{p_i} - p_i) when p_i = 1, and δ = 1.1 × 2(y_{p_i} - p_i) when p_i = 0;
wherein δ represents the gradient; y_{p_i} represents the output value of the intervention attribute information; and p_i represents the true value.
6. The vision-based steering wheel intervention detection and statistics method of claim 1, wherein the value of N is 5-10.
CN201910150734.0A 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision Active CN110008834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910150734.0A CN110008834B (en) 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910150734.0A CN110008834B (en) 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision

Publications (2)

Publication Number Publication Date
CN110008834A (en) 2019-07-12
CN110008834B (en) 2021-04-06

Family

ID=67166379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910150734.0A Active CN110008834B (en) 2019-02-28 2019-02-28 Steering wheel intervention detection and statistics method based on vision

Country Status (1)

Country Link
CN (1) CN110008834B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310841B (en) * 2020-02-24 2023-06-20 中南大学湘雅医院 Medical image classification method, medical image classification device, medical image classification apparatus, medical image classification computer device, and medical image classification storage medium
CN114360321B (en) * 2021-11-09 2023-04-07 易显智能科技有限责任公司 Hand action sensing system, training system and training method for motor vehicle driver
CN118107605B (en) * 2024-04-30 2024-08-02 润芯微科技(江苏)有限公司 Vehicle control method and system based on steering wheel gesture interaction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944341A (en) * 2017-10-27 2018-04-20 荆门程远电子科技有限公司 Driver based on traffic monitoring image does not fasten the safety belt automatic checkout system

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102547139A (en) * 2010-12-30 2012-07-04 北京新岸线网络技术有限公司 Method for splitting news video program, and method and system for cataloging news videos
CN102324016B (en) * 2011-05-27 2013-06-05 北京东方奔腾信息技术有限公司 Statistical method for high-density crowd flow
CN102263937B (en) * 2011-07-26 2013-07-24 华南理工大学 Driver's driving behavior monitoring device and monitoring method based on video detection
CN102289660B (en) * 2011-07-26 2013-07-03 华南理工大学 Method for detecting illegal driving behavior based on hand gesture tracking
JP5784061B2 (en) * 2013-03-27 2015-09-24 本田技研工業株式会社 Input device, input method, and input program
CN104078039A (en) * 2013-03-27 2014-10-01 广东工业大学 Voice recognition system of domestic service robot on basis of hidden Markov model
DE102013211052B3 (en) * 2013-06-13 2014-12-18 Ford Global Technologies, Llc Detecting a state of a hand-steering wheel touch by means of an observer
CN104092988A (en) * 2014-07-10 2014-10-08 深圳市中控生物识别技术有限公司 Method, device and system for managing passenger flow in public place
CN104207791B (en) * 2014-08-26 2017-02-15 江南大学 Fatigue driving detection method
CN105488957B (en) * 2015-12-15 2018-06-12 小米科技有限责任公司 Method for detecting fatigue driving and device
CN105513354A (en) * 2015-12-22 2016-04-20 电子科技大学 Video-based urban road traffic jam detecting system
CN106372584B (en) * 2016-08-26 2019-06-11 浙江银江研究院有限公司 A kind of video image mosaic detection method
CN106845344B (en) * 2016-12-15 2019-10-25 重庆凯泽科技股份有限公司 Demographics' method and device
CN107274678B (en) * 2017-08-14 2019-05-03 河北工业大学 A kind of night vehicle flowrate and model recognizing method based on Kinect
CN107479044B (en) * 2017-08-23 2020-04-28 西安电子工程研究所 Adaptive track starting method based on point track density real-time statistics
CN108399388A (en) * 2018-02-28 2018-08-14 福州大学 A kind of middle-high density crowd quantity statistics method
CN108647617A (en) * 2018-05-02 2018-10-12 深圳市唯特视科技有限公司 A kind of positioning of driver's hand and grasping analysis method based on convolutional neural networks
CN109151501B (en) * 2018-10-09 2021-06-08 北京周同科技有限公司 Video key frame extraction method and device, terminal equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944341A (en) * 2017-10-27 2018-04-20 荆门程远电子科技有限公司 Driver based on traffic monitoring image does not fasten the safety belt automatic checkout system

Also Published As

Publication number Publication date
CN110008834A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN108932500B (en) A kind of dynamic gesture identification method and system based on deep neural network
CN110363140B (en) Human body action real-time identification method based on infrared image
CN110097044B (en) One-stage license plate detection and identification method based on deep learning
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN106886216B (en) Robot automatic tracking method and system based on RGBD face detection
CN101447082B (en) Detection method of moving target on a real-time basis
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN103530600B (en) Licence plate recognition method under complex illumination and system
CN112784810B (en) Gesture recognition method, gesture recognition device, computer equipment and storage medium
CN110008834B (en) Steering wheel intervention detection and statistics method based on vision
CN111767878B (en) Deep learning-based traffic sign detection method and system in embedded device
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN111104903A (en) Depth perception traffic scene multi-target detection method and system
CN110119726A (en) A kind of vehicle brand multi-angle recognition methods based on YOLOv3 model
CN110298297A (en) Flame identification method and device
CN110334703B (en) Ship detection and identification method in day and night image
CN112906550B (en) Static gesture recognition method based on watershed transformation
US20230394829A1 (en) Methods, systems, and computer-readable storage mediums for detecting a state of a signal light
CN114049572A (en) Detection method for identifying small target
CN106023249A (en) Moving object detection method based on local binary similarity pattern
CN114973207A (en) Road sign identification method based on target detection
CN106056078A (en) Crowd density estimation method based on multi-feature regression ensemble learning
CN111259736B (en) Real-time pedestrian detection method based on deep learning in complex environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant