CN109871799A - Deep-learning-based method for detecting a driver's mobile-phone use - Google Patents
Deep-learning-based method for detecting a driver's mobile-phone use
- Publication number: CN109871799A (application CN201910106254.4A)
- Authority: CN (China)
- Prior art keywords: mobile phone, bounding box, dynamic object, object region, hand
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y02D30/70 — Reducing energy consumption in wireless communication networks
Abstract
The invention discloses a deep-learning-based method for detecting a driver's mobile-phone use. On the basis of deep-learning object detection, it preprocesses the captured video and optimizes the convolutional neural network. Video of the driver using a phone in the cab is captured and converted into frame images, the video is processed with dynamic tracking, and mutually interfering target regions are trained and detected separately. The advantages are a substantial improvement in the real-time performance and accuracy of deep-learning object detection, a large reduction in computation time, and a much higher feature-extraction accuracy for the hand and the phone.
Description
Technical field
The present invention relates to a method for detecting a driver's mobile-phone use, and in particular to a deep-learning-based method for detecting a driver's mobile-phone use.
Background art
The mobile phone, one of the great inventions of the 20th century, has made communication easy and convenient, but it has also gradually introduced many negative effects into our lives. In traffic safety, for example, drivers often use their phones on sparsely travelled road sections or while waiting at traffic lights. According to estimates, a driver using a phone reacts to an emergency about 30% more slowly than a drunk driver, and the probability of a traffic accident while using a phone is roughly four times that of normal driving. The safety risk of using a phone while driving is therefore considerable.
" People's Republic of China Road Traffic Safety Law Implementation Regulations " the 62nd article of Section 3 and 123 command of the Ministry of Public Security
Middle clear stipulaties can fine to the driver in driving procedure using mobile phone 200 yuan, detain 2 points of punishment.But in reality
This punishment measure is performed practically no function substantially in the traffic administration of border.Main cause is that driving plays the illegal activities of mobile phone by artificial
Management cost is too big, and cannot still be identified well by camera.So if to manage driver plays mobile phone row when driving
For it is very necessary for detecting driver with the presence or absence of mobile phone phenomenon is played by target detection technique.
At present there is little research on detecting a driver's phone use. Existing approaches to monitoring phone use while driving fall into three classes. The first is manual enforcement: traffic police and other officers judge by eye whether a driver is using a phone. The second is detection based on the phone's signal: a signal receiver installed on the vehicle or in a roadside monitoring zone checks whether a moving vehicle emits a phone signal, and if so a camera photographs the driver for evidence. The third is camera-based detection: a camera mounted inside the vehicle or in a monitoring zone the vehicle passes through records video or images of the driver, which are analyzed with computer-vision techniques to detect phone use. Because manual monitoring is inefficient, cannot collect evidence, and has significant limitations, we analyze only the signal-based and camera-based approaches.
(1) Detection methods based on the mobile-phone signal
Rodríguez-Ascariz et al. proposed an automatic electronic system for detecting driver phone use: radio-frequency circuitry with two antennas inside the vehicle captures the power emitted when the driver uses a phone, and a signal-analysis algorithm identifies when the phone is in use. Zhi Lukui et al. invented a phone-signal shielding device that integrates tightly with the vehicle's existing sensing and braking subsystems and with microwave devices such as Bluetooth, infrared, and radar; without affecting safe driving, it shields phone signals within a range of about 0.5 square metres around the driver. This approach cuts off phone use at the source, but the driver cannot use the phone at a critical moment, which poses a serious safety risk. Bo C et al. designed and implemented the TEXIVE method, which uses the inertial sensors of the driver's own smartphone to detect the irregular and fine-grained movements of phone use while driving; the method can also distinguish driver from passenger, reducing interference with detection. Leem S et al. proposed monitoring driver-related anomalies such as vital signs (breathing, heart rate) and the phone signal with an impulse-radio ultra-wideband (IR-UWB) radar; the system can detect the driver's phone use even with various movements and changing background objects in the car. Such signal-based methods usually require installing sensors, are costly, and are prone to false detections, so they have limited social utility.
(2) Detection methods based on computer vision
Tsinghua University proposed a method for recognizing phone-call behavior. A camera installed in the vehicle monitors the driver's face in real time; the detected face region is expanded by half its size on each side, and model features indicating a hand-held phone are computed to judge whether the driver is making a call. Wang D et al. proposed a phone-use detection method based on an in-vehicle camera mounted on the windshield: a motion-analysis algorithm decomposes phone activity into three actions, and an AoG graph represents the hierarchical structure of the activity and the temporal relations between actions, from which phone use is inferred. Torres R et al. proposed detecting the mobile phone with deep-learning object detection: captured driver video is fed into a convolutional neural network trained on a large number of images to extract phone features, enabling detection and classification of a phone in the cab and hence analysis of phone use. Computer-vision-based detection has the advantages of low-cost camera equipment and contactless operation, and is currently the most promising approach. However, traditional object detection, which combines hand-crafted features with a classifier, performs poorly on non-rigid objects, while deep-learning object detection can handle non-rigid objects but still needs improvement in both accuracy and real-time performance.
Summary of the invention
The technical problem to be solved by the invention is to provide a deep-learning-based method for detecting a driver's mobile-phone use with high detection accuracy and strong real-time performance.
The technical scheme adopted by the invention to solve the above technical problem is a deep-learning-based method for detecting a driver's mobile-phone use, comprising:

Capturing video samples of the driver using a phone in the cab and converting them into frame images; subtracting corresponding pixel values to obtain a difference image; binarizing the difference image with a set threshold; filtering, dilating, eroding, and otherwise processing the binarized dynamic-object image to remove isolated noise points and obtain the dynamic object region; then computing horizontal and vertical projections of the dynamic object region to find its four critical points (top, bottom, left, right) and segment it, yielding the dynamic-object-region bounding box;
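The frame-differencing and projection step above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions: the filtering/dilation/erosion step is replaced by a crude isolated-pixel removal, and the function name is ours, not the patent's.

```python
import numpy as np

def motion_bounding_box(prev_frame, curr_frame, threshold=50):
    """Coarse localization of the dynamic object region by frame differencing.

    prev_frame / curr_frame: 2-D uint8 grayscale frames of equal shape.
    Returns (top, bottom, left, right) of the moving region, or None.
    """
    # 1. Difference image: subtract corresponding pixel values.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    # 2. Binarize with the set threshold (45~60 in the text).
    mask = (diff > threshold).astype(np.uint8)
    # 3. Crude stand-in for the filter/dilate/erode step: drop isolated
    #    foreground pixels that have no 4-connected foreground neighbour.
    p = np.pad(mask, 1)
    nb = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    mask = mask & (nb > 0)
    # 4. Horizontal/vertical projections give the four critical points.
    rows, cols = mask.sum(axis=1), mask.sum(axis=0)
    if rows.sum() == 0:
        return None
    ys, xs = np.nonzero(rows)[0], np.nonzero(cols)[0]
    return int(ys[0]), int(ys[-1]), int(xs[0]), int(xs[-1])
```

Cropping the frame to the returned box gives the candidate region that is later normalized and classified, avoiding candidate extraction over the whole image.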
Coarsely localizing the dynamic object by applying a normalized warping operation to the obtained dynamic-object-region bounding boxes, rescaling every box to a square of uniform resolution;
Feeding the normalized dynamic-object-region bounding boxes into the trained convolutional neural network; classifying each box with the classifier to distinguish the two target classes, hand and phone; assigning hand and phone labels and obtaining hand and phone confidence scores; removing interfering targets according to a confidence threshold; then localizing each box with the bounding-box regressor to obtain the hand and phone dynamic-object-region bounding boxes;
If the distance between the hand bounding box and the phone bounding box is less than or equal to the sum of the radius of the hand bounding box and the radius of the phone bounding box, the two boxes are judged to overlap and the driver is judged to be using the phone; if the distance is greater than the sum of the two radii, the boxes are judged not to overlap and the driver is judged not to be using the phone.
The threshold for binarizing the difference image is 45~60. Under constant ambient brightness, pixels whose corresponding values change by less than the threshold between consecutive frames are labelled background pixels; pixels whose corresponding values change by more than the threshold are labelled object pixels.
The confidence threshold value is 0.88~0.92, preferably 0.9.
The training steps of the convolutional neural network are as follows: 15,000 phone-use pictures are collected from the internet and made into a positive sample set containing three kinds of target pictures (hand only, phone only, and hand and phone together); 20,000 pictures of the complex backgrounds around hands and phones are collected from the internet and made into a negative sample set;

The collected data set is annotated to produce the hand and phone labels, and converted into a computer-operable format;

The converted hand and phone data are fed into two independent neural networks for training. The hand region and background are annotated first and fed into the hand network for training, followed by hand-classifier classification and regressor localization, which output a confidence score and a bounding box. At the same time, the phone region and background are annotated and fed into the phone network for training, followed by phone-classifier classification and regressor localization, which output a confidence score and a bounding box;

After training, the training data of the two independent networks are merged to obtain the trained convolutional neural network.
The neural network is an optimized neural network; the specific optimization method is as follows:

1) Optimizing the number of convolution kernels

(1) Convert each feature map into a computable matrix, the grey-level co-occurrence matrix: take an arbitrary point (x, y) in the image and a point (x+a, y+b) offset from it, and let the pair of grey values at these two points be (i, j). Moving the point (x, y) over the entire image yields all values of (i, j); if the number of grey levels is k, there are k² possible values of (i, j). Traverse the whole image, count the occurrences of each (i, j), arrange the counts in a square matrix, and normalize them into occurrence probabilities p(i, j); the square matrix of p(i, j) is the grey-level co-occurrence matrix;

(2) Compute the entropy of the grey-level co-occurrence matrix, which satisfies

Ent = -Σᵢ Σⱼ p(i, j) · log p(i, j)

where p(i, j) is the normalized grey-level co-occurrence matrix and Ent is the greyscale-image entropy;

(3) Sort the computed entropies by size and set the proportion of convolution kernels to delete:

D_E = N_E × Threshold

where D_E is the number of convolution kernels to delete and N_E is the number of convolution kernels in the current layer. The deletion threshold Threshold is set to 10%, retaining 90% of the convolution kernels.
Compared with the prior art, the invention has the following advantages. On the basis of deep-learning object detection, the captured video is preprocessed and the convolutional neural network is optimized, which substantially improves the real-time performance and accuracy of deep-learning detection, and good experimental results are achieved. Video samples of the driver using a phone in the cab are captured and converted into frame images, and the video is processed with dynamic tracking: coarse target localization is performed on dynamically anomalous regions and the resulting candidate regions are normalized. Coarse localization of the dynamic regions avoids extracting candidate regions from the whole image, greatly reducing computation time, and normalization eases subsequent feature extraction by the convolutional neural network. The normalized candidate regions are fed into the trained convolutional neural network; unlike a traditional CNN that classifies multiple target regions with one network, mutually interfering target regions are trained and detected separately, which greatly improves the feature-extraction accuracy for hand and phone, and removing ineffective convolution kernels from the convolutional layers greatly improves real-time detection while maintaining accuracy. By labelling the hand and the phone, displaying confidence scores, and outputting regression bounding boxes, and then setting a distance threshold between the two, phone use can be judged accurately.

The convolutional neural network is optimized in two respects: first, the number of convolution kernels in the convolutional layers is effectively pruned, reducing computation time; second, each target is trained with its own network, improving detection accuracy.

The experimental data in Table 1 show that, when networks with different numbers of convolution kernels are trained on the same sample database, the complete Alexnet achieves 95.7% accuracy while the model retaining 90% of the convolution kernels still achieves 93.3%. The accuracy drops only about two percentage points relative to the complete model, but the model's computation time falls by nearly 40% and its size shrinks from 212 MB to 34 MB, a reduction of more than 6 times, striking a good balance between accuracy and real-time performance.
Table 1
Training two convolutional neural networks separately has three main advantages: first, the hand and the phone do not lower each other's accuracy by occluding each other during detection; second, the relationship between the two is more easily determined from the two output bounding boxes; third, since the two networks are independent, they can be run simultaneously on two computers when the hardware is relatively weak, with the results aggregated on one computer, reducing the network's computation time.

In addition, we trained one traditional single convolutional neural network for comparison with the optimized method of the invention. The samples undergo a single pass: the hand and the phone are annotated in the image simultaneously and fed into one convolutional neural network for training; the classifier then classifies hand and phone, the regressor localizes them, and their confidence scores and bounding boxes are output.
A confusion matrix is used to assess the classification accuracy of the separately trained networks of the invention and of the traditional single network. A confusion matrix is an error matrix commonly used to visually assess the performance of a supervised learning algorithm; its size is n_classes × n_classes, where n_classes is the number of classes, here three: hand, phone, and background.

First, background is assigned index 1 in the confusion matrix, hand index 2, and phone index 3.

Second, 200 test samples are randomly selected and fed into the single network and the separately trained networks for confusion-matrix analysis.

Finally, comparing the confusion-matrix values shows that the single network is seriously disturbed by overlap between the hand and phone target regions and often misclassifies hand and phone as background, giving low accuracy; with separate training, the overlap interference between hand and phone no longer occurs, and the overall recognition rate improves by about 7% over the single network.
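The confusion-matrix evaluation above can be sketched as follows, using the class indexing from the text (1 = background, 2 = hand, 3 = phone). The function names and the example labels are our illustration, not the patent's code.

```python
import numpy as np

# Class indices as given in the text.
LABELS = {1: "background", 2: "hand", 3: "mobile phone"}

def confusion_matrix(y_true, y_pred, n_classes=3):
    """n_classes x n_classes error matrix; row = true class, column = predicted."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t - 1, p - 1] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    return cm.trace() / cm.sum()
```

Running both the single network's predictions and the separately trained networks' predictions through this matrix makes the hand/phone-to-background misclassifications of the single network directly visible as off-diagonal mass in the background column.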
As shown in Table 2, compared with the R-CNN model, our algorithm still follows the traditional object-detection pipeline: candidate target objects are found first, features are extracted, and a classifier and regressor perform detection and identification. The difference is that the candidate-region extraction method is changed and the convolution kernels of the network are optimized, so the time cost drops by a factor of about 47; the algorithm is nearly 20 times faster than Fast R-CNN and about 4 times faster than Faster R-CNN. In this test, our algorithm outperforms the currently mainstream Faster R-CNN model in both accuracy and real-time performance. Moreover, Faster R-CNN's candidate-region proposal network (RPN) runs on a GPU, while our dynamic-tracking algorithm runs on a CPU and so demands less of the hardware. The algorithm is also highly tolerant of lighting, background, and body posture, giving it strong practical applicability.

Table 2: model comparison results
Description of drawings
Fig. 1 is the target-region model diagram used during detection in the embodiment of the invention.
Specific embodiment
The present invention will be described in further detail below with reference to the embodiments and the drawings.
Embodiment: a deep-learning-based method for detecting a driver's mobile-phone use, comprising:

Capturing video samples of the driver using a phone in the cab and converting them into frame images; subtracting corresponding pixel values to obtain a difference image; binarizing the difference image with a set threshold of 45~60. When the ambient brightness changes little, pixels whose corresponding values change by less than the preset threshold are taken to be background pixels, while image regions whose pixel values change greatly are taken to be caused by a moving object and are marked as foreground pixels. The binarized dynamic-object image is filtered, dilated, eroded, and so on to remove isolated noise points and obtain the dynamic object region; the dynamic object region is then projected horizontally and vertically to find its four critical points and segment it, yielding the dynamic-object-region bounding box;
Coarsely localizing the dynamic object by applying a normalized warping operation to the obtained dynamic-object-region bounding boxes, rescaling every box to a square of uniform resolution. The purpose is twofold: first, many classifiers accept only fixed-size input images, and the classifier can handle the warped image without loss of classification accuracy; second, it simplifies subsequent processing by the neural network model and reduces computation. In this embodiment the normalized image resolution is 227×227.
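A minimal sketch of this normalization step, under our own assumptions: nearest-neighbour sampling is used here for simplicity, whereas a production system would more likely use bilinear interpolation; the function name is ours.

```python
import numpy as np

def normalize_region(region, size=227):
    """Rescale a cropped bounding-box image to a fixed square resolution
    (227x227 in the embodiment) by nearest-neighbour sampling."""
    h, w = region.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return region[rows][:, cols]
```

Every candidate region, whatever its original aspect ratio, then enters the classifier at the same 227×227 resolution.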
Feeding the normalized dynamic-object-region bounding boxes into the trained convolutional neural network; classifying each box with the classifier to distinguish the two target classes, hand and phone; assigning hand and phone labels and obtaining hand and phone confidence scores; removing interfering targets according to a confidence threshold of 0.88~0.92, preferably 0.9; then localizing each box with the bounding-box regressor to obtain the hand and phone dynamic-object-region bounding boxes;
The training steps of the convolutional neural network are as follows: 15,000 phone-use pictures are collected from the internet and made into a positive sample set containing three kinds of target pictures (hand only, phone only, and hand and phone together). These positive samples cover complex backgrounds, varied illumination, angles, and resolutions, ensuring sample diversity; note that since the system mainly runs in a driving environment, more phone-use samples taken while driving should be collected. 20,000 pictures of the complex backgrounds around hands and phones are collected from the internet and made into a negative sample set; the negative samples are mostly complex backgrounds around hands and phones, and since the system likewise operates in a driving environment, interference from the instrument panel and from objects around and resembling the phone should be considered when choosing them;

The collected data set is annotated with the ImageLabeler tool bundled with MATLAB: the target name to be annotated is entered in the annotation box to produce the hand and phone labels; for example, the label of the hand region can be set to "hand". The annotated image data are converted into a computer-operable table format for subsequent unified computation;

The converted hand and phone data are fed into two independent neural networks for training. The hand region and background are annotated first and fed into the hand network for training, followed by hand-classifier classification and regressor localization, which output a confidence score and a bounding box. At the same time, the phone region and background are annotated and fed into the phone network for training, followed by phone-classifier classification and regressor localization, which output a confidence score and a bounding box;

After training, the training data of the two independent networks are merged to obtain the trained convolutional neural network.
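The separate-training structure above can be illustrated schematically. The real detectors are convolutional networks; here tiny logistic-regression models on synthetic 2-D features stand in for them, purely to show that each target is trained against background independently of the other. All names, features, and data in this sketch are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_binary(X, y, lr=0.5, epochs=300):
    """Tiny logistic-regression stand-in for one independent detector network."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid scores
        g = p - y                                 # gradient of the log loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def confidence(net, x):
    """Confidence score of one detector for a feature vector x."""
    w, b = net
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Synthetic 2-D "features" for background, hand, and phone samples.
bg    = rng.normal([0.0, 0.0], 0.3, (50, 2))
hand  = rng.normal([2.0, 0.0], 0.3, (50, 2))
phone = rng.normal([0.0, 2.0], 0.3, (50, 2))
labels = np.r_[np.zeros(50), np.ones(50)]

# Each target gets its own network, trained only against background,
# so hand and phone never interfere with each other during training.
hand_net  = train_binary(np.vstack([bg, hand]), labels)
phone_net = train_binary(np.vstack([bg, phone]), labels)
```

At detection time each candidate region is scored by both networks, and the two confidence scores and bounding boxes are combined afterwards, mirroring the merge step described above.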
Traditional convolutional neural networks often classify multiple target objects with a single network. Although this is convenient, once the detected targets overlap they interfere with each other during classification and accuracy suffers greatly; training each target with its own network therefore greatly improves detection accuracy.

The neural network used for training is an optimized network, the main optimization being the number of convolution kernels. Each convolutional layer in the Alexnet network contains tens or even hundreds of convolution-kernel filters, and the purpose of this optimization is to examine how this large number of filters affects the overall performance of the model. Since the network cannot be analyzed intuitively and effectively from the convolution kernels directly, we analyze the quality of each kernel through the convolution feature map it generates.

Taking the first convolutional layer as an example, it uses 96 filters, each an 11×11 two-dimensional matrix that judges whether a sub-matrix of the image matches the form of the convolution filter; where the form matches, the response is larger than its surroundings, producing the feature map. Evidently, not all convolution kernels contribute to feature extraction, especially for a specific target, so we propose a method of ranking kernels by the influence of their feature maps. The specific optimization steps are as follows:
(1) Convert each feature map into a computable matrix, the grey-level co-occurrence matrix: in a two-dimensional coordinate system xOy, take an arbitrary point (x, y) in the image and a point (x+a, y+b) offset from it, and let the pair of grey values at these two points be (i, j). Moving the point (x, y) over the entire image yields all values of (i, j); if the number of grey levels is k, there are k² possible values of (i, j). Traverse the whole image, count the occurrences of each (i, j), arrange the counts in a square matrix, and normalize them into occurrence probabilities p(i, j); the square matrix of p(i, j) is the grey-level co-occurrence matrix;
(2) Compute the entropy of the grey-level co-occurrence matrix, which satisfies

Ent = -Σᵢ Σⱼ p(i, j) · log p(i, j)

where p(i, j) is the normalized grey-level co-occurrence matrix and Ent is the greyscale-image entropy;
(3) Compute the entropy of the grey-level co-occurrence matrix of the feature map generated by each convolution kernel, and sort the entropies by size. The larger the entropy, the more complex the image and the higher the kernel's contribution to the model. Since a convolutional neural network contains hundreds or thousands of convolution kernels, and each layer's input depends on the previous layer's output, visual inspection is clearly infeasible. If too many convolution kernels are retained, accuracy is guaranteed but real-time performance is not; if too many are pruned, detection becomes faster but useful kernels may be deleted, lowering the model's accuracy and losing more than is gained. To grasp the balance between accuracy and real-time performance effectively, the proportion of convolution kernels to delete is set as

D_E = N_E × Threshold

where D_E is the number of convolution kernels to delete and N_E is the number of kernels in the current layer. Taking the 96 kernels of Alexnet's first layer as an example, setting the deletion threshold Threshold to 10% deletes the kernels ranked 1~9 from the bottom of the contribution ranking. Using this calculation, the threshold can be adjusted as needed to tune the number of kernel filters in each convolutional layer: reduce the threshold for better recognition, or increase it for better real-time performance, and find a suitable threshold that balances model accuracy and real-time performance;
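Steps (1)–(3) can be sketched as follows: GLCM entropy of a feature map with a fixed offset (a, b) = (0, 1), then selection of the lowest-entropy kernels for deletion. The quantization to 8 grey levels and the function names are our assumptions, not the patent's.

```python
import numpy as np

def glcm_entropy(fmap, levels=8):
    """Entropy Ent = -sum_ij p(i,j) log p(i,j) of the grey-level
    co-occurrence matrix of one feature map, offset (a, b) = (0, 1)."""
    m = fmap.max()
    q = (fmap / m * (levels - 1)).astype(int) if m > 0 else np.zeros(fmap.shape, int)
    i, j = q[:, :-1].ravel(), q[:, 1:].ravel()   # grey-value pairs (i, j)
    counts = np.zeros((levels, levels))
    np.add.at(counts, (i, j), 1)                 # k x k co-occurrence counts
    p = counts / counts.sum()                    # normalize to probabilities
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

def kernels_to_delete(entropies, threshold=0.10):
    """D_E = N_E * Threshold: indices of the lowest-entropy (least
    informative) kernels, to be pruned from the current layer."""
    n_del = int(len(entropies) * threshold)
    return sorted(np.argsort(entropies)[:n_del].tolist())
```

A flat feature map yields zero entropy (a single co-occurrence cell with probability 1) and so is pruned first, which matches the intuition that a kernel producing no structure contributes little to the model.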
When both a hand region and a phone region are detected, it must be judged whether the two regions overlap, excluding the possibility that they exist separately and far apart. Because hand posture and phone size vary, the bounding-box sizes are not fixed. We therefore set a distance threshold between the two regions to decide whether the hand region and the phone region overlap, and hence whether the driver is using the phone. The main steps are:

First, read the four vertex coordinates a1, a2, a3, a4 of the hand bounding box and the four vertex coordinates b1, b2, b3, b4 of the phone bounding box; the target-region model is shown in Fig. 1.

Second, compute the centre coordinates c1 and c2 of the two regions, satisfying formula 3.

Third, compute the distance d between the two region centres, satisfying formula 4.

Finally, compute the radii r1 and r2 of the two rectangular bounding boxes, satisfying formula 5.
If the distance between the hand's dynamic-target-region bounding box and the phone's dynamic-target-region bounding box is less than or equal to the sum of their radii, d ≤ r1 + r2, the two classes of bounding boxes are judged to overlap and the driver is judged to be playing with a mobile phone; if the distance is greater than the sum of the radii, d > r1 + r2, the two classes of bounding boxes are judged not to overlap and the driver is not playing with a mobile phone.
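The four-step overlap test can be sketched as follows, assuming the "radius" of a rectangular bounding box (formula 5) means half its diagonal; the corner format and helper names are illustrative:

```python
import math

def boxes_overlap(hand_box, phone_box):
    """Each box is (x1, y1, x2, y2). Approximate it by a circle:
    centre = box centre (formula 3), radius = half the diagonal
    (formula 5); overlap is declared when d <= r1 + r2 (formula 4)."""
    def centre_radius(box):
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        r = math.hypot(x2 - x1, y2 - y1) / 2.0
        return cx, cy, r

    c1x, c1y, r1 = centre_radius(hand_box)
    c2x, c2y, r2 = centre_radius(phone_box)
    d = math.hypot(c1x - c2x, c1y - c2y)
    return d <= r1 + r2  # True -> regions overlap -> phone-playing behaviour

print(boxes_overlap((0, 0, 10, 10), (8, 8, 18, 18)))      # -> True
print(boxes_overlap((0, 0, 4, 4), (100, 100, 104, 104)))  # -> False
```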
Claims (6)
1. A detection method of a driver's mobile-phone-playing behaviour based on deep learning, characterised in that the specific detection method is as follows:
collecting video samples of the driver playing with a mobile phone in the cab and converting them into frame images; subtracting corresponding pixel values to obtain a difference image; binarising the difference image with a set threshold, then applying filtering, dilation and erosion operations to the binarised dynamic-target image to remove isolated noise points and obtain a dynamic target region; performing horizontal and vertical projection on the obtained dynamic target region to find the four critical points (top, bottom, left and right) for segmentation, obtaining a dynamic-target-region bounding box;
coarsely locating the dynamic target by applying a normalising warp operation to the obtained dynamic-target-region bounding box, unifying the bounding boxes into squares of consistent resolution;
inputting the normalised dynamic-target-region bounding boxes into the trained convolutional neural network; performing target classification on the input bounding boxes with a classifier to distinguish the two target classes, hand and mobile phone, assigning hand and phone labels and obtaining hand and phone confidences; removing interfering targets according to a confidence threshold; then performing region-frame localisation on the bounding boxes with a bounding-box regressor, obtaining the two classes of dynamic-target-region bounding boxes, hand and mobile phone;
if the distance between the hand's dynamic-target-region bounding box and the phone's dynamic-target-region bounding box is less than or equal to the sum of the radius of the hand's bounding box and the radius of the phone's bounding box, judging that the two classes of bounding boxes overlap and that the driver is playing with a mobile phone; if the distance is greater than that sum, judging that the two classes of bounding boxes do not overlap and that the driver is not playing with a mobile phone.
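The frame-differencing and projection step of claim 1 can be sketched minimally as follows. This is an illustrative NumPy version that omits the filtering, dilation and erosion stages; the threshold value 50 is just a sample from the 45 to 60 range of claim 2:

```python
import numpy as np

def dynamic_region_bbox(prev_frame, frame, threshold=50):
    """Frame differencing: pixels whose absolute change exceeds the
    threshold are foreground; the bounding box is taken from the extreme
    foreground coordinates (the horizontal/vertical projection extremes)."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > threshold                  # binarisation of the difference
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None                          # no moving target detected
    # four critical points of the dynamic region: left, top, right, bottom
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

a = np.zeros((8, 8), dtype=np.uint8)
b = a.copy()
b[2:5, 3:6] = 200                            # a small moving blob
print(dynamic_region_bbox(a, b))             # -> (3, 2, 5, 4)
```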
2. The detection method of a driver's mobile-phone-playing behaviour based on deep learning according to claim 1, characterised in that the threshold for binarising the difference image is 45 to 60; under identical ambient brightness, if the change in corresponding pixel values between adjacent frame images is less than the threshold, the pixels in those regions are marked as background pixels; if the change in corresponding pixel values between adjacent frame images is greater than the threshold, the pixels in those regions are marked as target pixels.
3. The detection method of a driver's mobile-phone-playing behaviour based on deep learning according to claim 1, characterised in that the confidence threshold is 0.88 to 0.92.
4. The detection method of a driver's mobile-phone-playing behaviour based on deep learning according to claim 1, characterised in that the confidence threshold is 0.9.
5. The detection method of a driver's mobile-phone-playing behaviour based on deep learning according to claim 1, characterised in that the training steps of the convolutional neural network are: collecting 15000 phone-playing pictures from the Internet to make a positive sample set comprising three classes of target pictures: hand only, phone only, and hand and phone together; collecting 20000 pictures of complex backgrounds around hands and phones from the Internet to make a negative sample set;
annotating the collected data set, producing the hand and phone labels, and converting them into a format the computer can operate on;
inputting the converted hand and phone data into two independent neural networks for training: first annotating the hand regions and background and feeding them into the hand network for training, then performing hand-classifier classification and regressor localisation to give confidence scores and bounding boxes; at the same time annotating the phone regions and background and feeding them into the phone network for training, then performing phone-classifier classification and regressor localisation to give confidences and bounding boxes;
after training is complete, merging the training data of the two independent neural networks to obtain the trained convolutional neural network.
6. The detection method of a driver's mobile-phone-playing behaviour based on deep learning according to claim 1, characterised in that the neural network is an optimised neural network, the specific optimisation method being as follows:
1) optimising the number of convolution kernels:
(1) converting the feature map into a computable matrix, namely a grey-level co-occurrence matrix: take an arbitrary point (x, y) in the image and a point (x+a, y+b) offset from it, and let the grey values of this pair of points be (i, j); moving the point (x, y) over the entire image yields all values of (i, j); if the number of grey levels is k, then (i, j) can take k² values; traverse the whole image, count the number of occurrences of each (i, j), arrange the counts in a square matrix, and normalise them into occurrence probabilities p(i, j); the square matrix formed by p(i, j) is the grey-level co-occurrence matrix;
(2) computing the entropy of the grey-level co-occurrence matrix, which satisfies the formula:
Ent = -Σ_i Σ_j p(i, j) log p(i, j)
where p(i, j) is the normalised grey-level co-occurrence matrix and Ent is the grey-image entropy;
(3) sorting the computed entropies by size and setting the proportion threshold of convolution kernels to delete, as given by the formula:
Threshold = D_E / N_E
where D_E is the number of convolution kernels to delete and N_E is the number of convolution kernels in the current layer; the deletion threshold Threshold is set to 10%, retaining 90% of the convolution kernels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910106254.4A CN109871799B (en) | 2019-02-02 | 2019-02-02 | Method for detecting mobile phone playing behavior of driver based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109871799A true CN109871799A (en) | 2019-06-11 |
CN109871799B CN109871799B (en) | 2023-03-24 |
Family
ID=66918614
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287906A (en) * | 2019-06-26 | 2019-09-27 | 四川长虹电器股份有限公司 | Method and system based on image/video detection people " playing mobile phone " |
CN111461020A (en) * | 2020-04-01 | 2020-07-28 | 浙江大华技术股份有限公司 | Method and device for identifying behaviors of insecure mobile phone and related storage medium |
CN111462180A (en) * | 2020-03-30 | 2020-07-28 | 西安电子科技大学 | Object tracking method based on AND-OR graph AOG |
CN111913857A (en) * | 2020-07-08 | 2020-11-10 | 浙江大华技术股份有限公司 | Method and device for detecting operation behavior of intelligent equipment |
CN111931587A (en) * | 2020-07-15 | 2020-11-13 | 重庆邮电大学 | Video anomaly detection method based on interpretable space-time self-encoder |
CN112686188A (en) * | 2021-01-05 | 2021-04-20 | 西安理工大学 | Front windshield and driver region positioning method based on deep learning method |
CN112926510A (en) * | 2021-03-25 | 2021-06-08 | 深圳市商汤科技有限公司 | Abnormal driving behavior recognition method and device, electronic equipment and storage medium |
CN113139577A (en) * | 2021-03-22 | 2021-07-20 | 广东省科学院智能制造研究所 | Deep learning image classification method and system based on deformable convolution network |
CN114253395A (en) * | 2021-11-11 | 2022-03-29 | 易视腾科技股份有限公司 | Gesture recognition system for television control and recognition method thereof |
WO2024001617A1 (en) * | 2022-06-30 | 2024-01-04 | 京东方科技集团股份有限公司 | Method and apparatus for identifying behavior of playing with mobile phone |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013091370A1 (en) * | 2011-12-22 | 2013-06-27 | 中国科学院自动化研究所 | Human body part detection method based on parallel statistics learning of 3d depth image information |
CN106611169A (en) * | 2016-12-31 | 2017-05-03 | 中国科学技术大学 | Dangerous driving behavior real-time detection method based on deep learning |
CN108509902A (en) * | 2018-03-30 | 2018-09-07 | 湖北文理学院 | A kind of hand-held telephone relation behavioral value method during driver drives vehicle |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |