AU2020103494A4 - Handheld call detection method based on lightweight target detection network - Google Patents

Handheld call detection method based on lightweight target detection network

Info

Publication number
AU2020103494A4
Authority
AU
Australia
Prior art keywords
depth
convolution layer
detection
model
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020103494A
Inventor
Zhongxin Zhang
Zuopeng Zhao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to AU2020103494A priority Critical patent/AU2020103494A4/en
Application granted granted Critical
Publication of AU2020103494A4 publication Critical patent/AU2020103494A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a handheld call detection method based on a lightweight target detection network, which comprises the following steps: S1, acquiring a driver image data set and labeling it to obtain a sample image data set; S2, constructing a handheld call detection model based on an LMS-DN network and training the model with the sample image data set; S3, conducting a performance test on the trained handheld call detection model against the indexes of detection precision, detection efficiency and model size, and repeating step S2 to optimize and retrain the model whenever the test result is lower than a preset threshold; and S4, inputting the driver image acquired in real time into the optimized handheld call detection model to obtain the driver handheld call detection result. The invention effectively improves detection precision and detection efficiency, reduces the model size, has strong anti-interference capability, is suitable for embedded equipment, can overcome the influence of strong light and weak light, and can complete real-time target detection with high accuracy in scenes with obstacle interference.

Description

Figure 1 (drawing sheet 1/8):
S1. Acquiring a driver image data set, and labeling the driver image data set to obtain a sample image data set;
S2. Constructing a handheld call detection model based on an LMS-DN network, and training the handheld call detection model through the sample image data set obtained in step S1 to obtain a trained handheld call detection model;
S3. Conducting a performance test on the trained handheld call detection model based on the indexes of the detection precision, the detection efficiency and the model size; if the performance test result is lower than a preset threshold, repeating step S2 to optimize and train the model until the performance result reaches the preset threshold;
S4. Inputting the driver image acquired in real time into the optimized handheld call detection model in step S3 to obtain the detection result of the driver handheld call.
Handheld call detection method based on lightweight target detection network
TECHNICAL FIELD
[01] The invention relates to the technical field of target detection, in particular to a handheld call detection method based on a lightweight target detection network.
BACKGROUND
[02] In order to avoid traffic accidents, detection algorithms for dangerous driver behaviors such as handheld phone calls, and their application in embedded environments, have been widely studied. However, most research on target detection networks aims to improve accuracy while ignoring model size, computational cost and parameter count. In 2014, Girshick et al. proposed the region-based convolutional neural network R-CNN, which used region proposals to detect objects. In 2015, the improved versions Fast R-CNN and Faster R-CNN were proposed to realize end-to-end detection of targets. Both Fast R-CNN and Faster R-CNN are two-stage algorithms whose accuracy is higher than that of traditional algorithms, but their detection speed is slow and cannot meet real-time requirements. Mask R-CNN is also a two-stage approach, in which ROIAlign is adopted instead of ROIPool to reduce quantization error. In 2016, Redmon et al. proposed YOLO, followed by YOLO9000 (YOLOv2) and then YOLOv3, which greatly improved the detection of small objects compared with YOLO9000. The YOLO series merges the two-stage tasks of generating and classifying candidate boxes in Faster R-CNN into a single stage, which greatly improves detection speed; however, YOLOv3 still has a large number of parameters and its detection speed remains limited. To run a target detection network on mobile equipment, the network must be both accurate enough and fast. In addition, the SSD algorithm proposed by Wei Liu et al. performs regression detection over the whole image with improved speed, but its accuracy on small targets is greatly reduced. YOLO, SSD and their derived networks are representative one-stage networks: they realize end-to-end training, use a single convolutional neural network to directly predict the types and positions of different targets, and improve detection speed at the cost of some accuracy.
[03] Tiny-YOLO was proposed in 2017 and is widely used due to its high speed and low memory consumption, but it is still difficult to achieve real-time performance on a device without a GPU (Graphics Processing Unit). In the same year, Andrew G. Howard et al. proposed MobileNet for mobile and embedded vision applications. In 2018, the MobileNet-SSD network, derived from VGG-SSD, was proposed, which greatly reduced the number of parameters and at the same time greatly improved detection speed; however, the risk of missed and false detections of small objects remained high, so real-time and accurate handheld call detection could not be realized on embedded equipment. Therefore, it is necessary to provide a handheld call detection method based on a lightweight target detection network so as to enhance the real-time performance and the accuracy of handheld call detection on embedded equipment.
SUMMARY
[04] The invention aims to provide a handheld call detection method based on a lightweight target detection network, so as to solve the technical problems in the prior art, effectively improve detection precision and detection efficiency, reduce the model size, have strong anti-interference capability, be suitable for embedded equipment, overcome the influence of strong light and weak light, and complete real-time target detection with high accuracy in a scene with certain obstacle interference.
[05] To achieve the above objectives, the invention provides the following scheme: the invention provides a handheld call test method based on a lightweight target detection network, which comprises the following steps:
[06] S1. Acquiring a driver image data set, and labeling the driver image data set to obtain the sample image data set;
[07] S2. Constructing a handheld call detection model based on an LMS-DN network, and training the handheld call detection model through the sample image data set obtained in step S1 to obtain a trained handheld call detection model;
[08] S3. Conducting a performance test on the trained handheld call detection model based on the indexes of the detection precision, the detection efficiency and the model size, wherein if the performance test result is lower than a preset threshold, repeating step S2 to optimize and train the model until the performance result reaches the preset threshold;
[09] S4. Inputting the driver image acquired in real time into the optimized handheld call detection model in step S3 to obtain the result of the driver handheld call detection.
[010] Preferably, in step S2, the LMS-DN network is divided into two parts: the first part is the basic classification network Mobilenet-I, and the second part is the SSDLite network.
[011] Preferably, the Mobilenet-I network comprises a Conv convolution layer with a depth of 3x3, a SinConv convolution layer with a depth of 3x3, a BnConv3 convolution layer with a depth of 3x3, a BnConv3 convolution layer with a depth of 5x5, two BnConv6 convolution layers with a depth of 3x3, a BnConv6 convolution layer with a depth of 5x5, a BnConv6 convolution layer with a depth of 3x3, an FC full connection layer and a pooling layer, which are sequentially connected.
[012] Preferably, the BnConv3 convolution layer of a depth of 3x3 includes a Conv convolution layer of a depth of 1x1, a DwiseConv layer of a depth of 3x3, and a Conv convolution layer of a depth of 1x1, which are sequentially connected.
[013] Preferably, the BnConv3 convolution layer of a depth of 5x5 includes a Conv convolution layer of a depth of 1x1, a DwiseConv layer of a depth of 5x5, and a Conv convolution layer of a depth of 1x1, which are sequentially connected.
[014] Preferably, the BnConv6 convolution layer of a depth of 3x3 includes a Conv convolution layer of a depth of 1x1, a DwiseConv layer of a depth of 3x3, and a Conv convolution layer of a depth of 1x1, which are sequentially connected.
[015] Preferably, the BnConv6 convolution layer of a depth of 5x5 includes a Conv convolution layer of a depth of 1x1, a DwiseConv layer of a depth of 5x5, and a Conv convolution layer of a depth of 1x1, which are sequentially connected.
[016] Preferably, the SinConv convolution layer with a depth of 3x3 comprises two branch structures, each comprising a DWConv convolution layer with a depth of 3x3 and a Conv convolution layer with a depth of 1x1 which are sequentially connected; the two branches are merged into one path of signals through a Concat function.
[017] Preferably, the SSDLite network includes a prediction layer that employs depth separable convolution.
[018] Preferably, the detection accuracy is measured through the accuracy rate, recall rate, precision rate and mean average precision (mAP); the detection efficiency is measured through the number of frames detected per second; and the model size is measured in megabytes (MB).
[019] The invention discloses the following technical effects:
[020] Aiming at a driver active safety prevention and control system, a handheld call detection model is constructed on the basis of the lightweight network LMS-DN, which combines an improved Mobilenet-I with an improved SSDLite network. The experimental results show that the handheld call detection model achieves higher detection precision and detection efficiency on small targets such as mobile phones, with a smaller model size and strong anti-interference capability. The model is suitable for embedded equipment, can overcome the influence of strong light and weak light, and can realize real-time detection of targets with high accuracy in scenes with certain obstacle interference.
BRIEF DESCRIPTION OF THE FIGURES
[040] In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the figures required by the embodiments are briefly described below. Obviously, the figures in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other figures from them without inventive labor.
[041] Figure 1 is a flowchart of a handheld call detection method based on a lightweight target detection network according to the present invention;
[042] Figure 2 is a schematic diagram of the Mobilenet-I network structure, consisting of the overall structure of the Mobilenet-I network (Figure 2(a)), the BnConv3 layer with a depth of 3x3 (Figure 2(b)), the BnConv3 convolution layer with a depth of 5x5 (Figure 2(c)), the BnConv6 convolution layer with a depth of 3x3 (Figure 2(d)), the BnConv6 convolution layer with a depth of 5x5 (Figure 2(e)), and the SinConv convolution layer with a depth of 3x3 (Figure 2(f));
[043] Figure 3 is a schematic diagram showing the overall structure of the LMS-DN network according to the present invention;
[044] Figure 4 is a schematic diagram of an SSD network structure according to the present invention;
[045] Figure 5 is an image of a SafeImgs data set according to the example in the present invention;
[046] Figure 6 is a diagram showing detection effects of the MobileNetV2-SSDLite and the LMS-DN network on the KITTI data set according to the example in the present invention, wherein Figure 6(a) shows the detection result under the MobileNetV2-SSDLite, and Figure 6(b) shows the detection result under the LMS-DN;
[047] Figure 7 shows detection results of the LMS-DN network and the MobileNetV2-SSDLite network under different thresholds according to the example in the present invention.
DESCRIPTION OF THE INVENTION
[048] The technical solutions in the embodiments of the present invention will be clearly and fully described below with reference to the accompanying figures. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained from the embodiments of the present invention by one of ordinary skill in the art without creative labor are within the scope of the present invention.
[049] In order to make the above objects, features and advantages of the present invention be more clear and understandable, the present invention will now be described in further detail with reference to the accompanying figures and specific embodiments thereof.
[050] Referring to Figure 1, the embodiment provides a handheld call detection method based on a lightweight target detection network, which specifically comprises the following steps:
[051] S1. Acquiring a driver image data set, and labeling the driver image data set to obtain the sample image data set;
[052] In the embodiment, the sample image data set is obtained from the SafeImgs data set, whose images all come from a driver video monitoring platform, as shown in Figure 5. The SafeImgs data set contains 30 videos of drivers making handheld phone calls while driving, from which a subset of driving videos is selected as the verification set. OpenCV is used to extract driver face images from the collected videos at 10 frames per second, yielding 5500 images for the training data set and 550 images for the verification data set. The images in the training and verification data sets are labeled with the labelImg annotation tool; after the samples are labeled, an xml file is generated for each sample.
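By way of illustration, the frame sampling described above can be sketched as follows. This is a minimal Python example with OpenCV; the video path, output folder and file naming are hypothetical placeholders, not details taken from the embodiment:

```python
import os
import cv2  # OpenCV, used in the embodiment to sample frames from driver videos

def extract_frames(video_path, out_dir, target_fps=10):
    """Sample frames from a driver video at roughly target_fps frames per second."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS is unreadable
    step = max(int(round(native_fps / target_fps)), 1)  # keep every `step`-th frame
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Hypothetical usage: sample one driver video into a training-image folder.
# extract_frames("driver_call_01.mp4", "dataset/train_images", target_fps=10)
```

The sampled images would then be annotated with labelImg, which writes one Pascal-VOC-style xml file per image, matching the labeling step described above.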
[053] S2. Constructing a handheld call detection model based on the LMS-DN network, and training the handheld call detection model through the sample image data set obtained in step S1 to obtain a trained handheld call detection model;
[054] The LMS-DN network is divided into two parts: the first part is the basic classification network Mobilenet-I, and the second part is the SSDLite network.
[055] The Mobilenet-I network is an improved version of the MobilenetV2 network: it enlarges the receptive field by drawing on the multi-branch feature of the Inception structure, incorporates separable convolutions with a depth of 5x5, and adjusts the overall structure of the MobilenetV2. The Mobilenet-I network includes: a Conv convolution layer with a depth of 3x3, a SinConv convolution layer with a depth of 3x3, a BnConv3 convolution layer with a depth of 3x3, a BnConv3 convolution layer with a depth of 5x5, two BnConv6 convolution layers with a depth of 3x3, a BnConv6 convolution layer with a depth of 5x5, a BnConv6 convolution layer with a depth of 3x3, an FC fully connected layer, and a pooling layer, which are sequentially connected, as shown in Figure 2(a). Among them, the BnConv3 convolution layer with a depth of 3x3 includes a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 3x3, and a Conv convolution layer with a depth of 1x1, which are sequentially connected, as shown in Figure 2(b); the BnConv3 convolution layer with a depth of 5x5 includes a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 5x5, and a Conv convolution layer with a depth of 1x1, which are sequentially connected, as shown in Figure 2(c); the BnConv6 convolution layer with a depth of 3x3 includes a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 3x3, and a Conv convolution layer with a depth of 1x1, which are sequentially connected, as shown in Figure 2(d); the BnConv6 convolution layer with a depth of 5x5 includes a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 5x5, and a Conv convolution layer with a depth of 1x1, which are sequentially connected, as shown in Figure 2(e); the SinConv convolution layer with a depth of 3x3 comprises two branch structures, each comprising a DWConv convolution layer with a depth of 3x3 and a Conv convolution layer with a depth of 1x1 which are sequentially connected; the two branches are merged into one path of signals through a Concat function, as shown in Figure 2(f).
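For illustration only, the block structures described above can be sketched in PyTorch-style code as follows. The embodiment itself is trained under the Caffe framework, and the channel counts, expansion factors and activation choices below are assumptions rather than the disclosed configuration:

```python
import torch
import torch.nn as nn

class BnConv(nn.Module):
    """Bottleneck block: 1x1 Conv -> kxk depthwise Conv -> 1x1 Conv (k = 3 or 5)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, expansion=3):
        super().__init__()
        mid = in_ch * expansion  # expansion factor 3 or 6 (BnConv3 / BnConv6)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),                      # 1x1 pointwise
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, kernel_size, padding=kernel_size // 2,
                      groups=mid, bias=False),                          # kxk depthwise
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),                      # 1x1 pointwise
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)

class SinConv(nn.Module):
    """Two parallel branches (3x3 depthwise -> 1x1 conv), merged by Concat."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
                nn.Conv2d(in_ch, branch_ch, 1, bias=False),
            )
        self.b1, self.b2 = branch(), branch()

    def forward(self, x):
        # Concatenate the two branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b2(x)], dim=1)
```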
[056] The Mobilenet-I network contains separable convolutions with a depth of 5x5, while existing networks usually use only 3x3 kernels. For depthwise separable convolutions, one 5x5 kernel can be cheaper than two stacked 3x3 kernels. Formally, given the input shape (H, W, M) and output shape (H, W, N), where H and W are the height and width of the feature map and M and N are the input and output depths respectively, the computational costs of the separable convolutions with depths of 5x5 and 3x3 are:
[057] C_5x5 = H * W * M * (25 + N)
[058] C_3x3 = H * W * M * (9 + N)
[059] C_5x5 < 2 * C_3x3, if N > 7
[060] where C_5x5 and C_3x3 are the computational costs of the separable convolutions with depths of 5x5 and 3x3, respectively.
[061] For the same effective receptive field, when the output depth N > 7, the computation of one 5x5 convolution kernel is less than that of two 3x3 convolution kernels. The final pooling layer of Mobilenet-I is removed and an auxiliary convolution layer is added to connect the base network and the SSDLite network, forming the overall structure of the LMS-DN network, as shown in Figure 3.
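The cost formulas above can be checked numerically with a short sketch; the feature map and channel sizes are arbitrary illustrative values:

```python
def sep_conv_cost(h, w, m, n, k):
    """Cost of a depthwise-separable convolution: depthwise (k*k) plus pointwise (n)."""
    return h * w * m * (k * k + n)

# One 5x5 separable layer vs. two stacked 3x3 separable layers
# covering the same effective receptive field.
h, w, m = 32, 32, 64
for n in (4, 7, 8, 64):
    c5 = sep_conv_cost(h, w, m, n, 5)        # H*W*M*(25+N)
    c3x2 = 2 * sep_conv_cost(h, w, m, n, 3)  # 2*H*W*M*(9+N)
    print(n, c5 < c3x2)  # prints True exactly when N > 7
```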
[062] A standard SSD network is used as the basis of the SSDLite, and a prediction layer is added on top of it. The added prediction layer adopts depthwise separable convolution, and the feature maps produced by several different convolution layers are fused for detection.
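A sketch of such a depthwise separable prediction layer is given below, continuing the PyTorch-style illustration above; the anchor and class counts are placeholders, not the disclosed configuration:

```python
import torch.nn as nn

def ssdlite_head(in_ch, num_anchors, num_classes):
    """SSDLite-style prediction layer: a 3x3 depthwise conv followed by a 1x1
    pointwise conv, replacing the single dense 3x3 conv of the standard SSD head."""
    out_ch = num_anchors * (num_classes + 4)  # class scores + 4 box offsets per anchor
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),  # depthwise
        nn.BatchNorm2d(in_ch), nn.ReLU6(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1),                                      # pointwise
    )
```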
[063] The SSD is a target detection network that directly predicts target types and locations. The SSD model starts with a standard image classification architecture, called the base network, to which additional layers are appended; the feature maps output by six different convolution layers are fused for detection. The SSD is shown in Figure 4. The base network of the SSD is VGG16, and the network structure is constructed by converting two fully connected layers into convolution layers and adding four further convolution layers. The six convolution layers participating in the feature map fusion generate a certain number of default boxes, whose scale S_k is calculated by the following equation:
[064] S_k = S_min + ((S_max - S_min) / (m - 1)) * (k - 1), k ∈ [1, m]
[065] where S_min represents the minimum scale of the default boxes and equals 0.2; S_max represents the maximum scale and equals 0.95; k denotes the k-th default box; m represents the number of default boxes; and each default box is further adjusted by the aspect ratio a.
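Expressed as code, the default box scale schedule reads as follows; this is a direct transcription of the equation above, using the stated values S_min = 0.2 and S_max = 0.95:

```python
def default_box_scale(k, m, s_min=0.2, s_max=0.95):
    """Scale of the k-th default box (k in [1, m]) across m fused feature layers."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)

# Scales for the six feature layers participating in the fusion.
print([round(default_box_scale(k, 6), 3) for k in range(1, 7)])
# -> [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
```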
[066] When the SSD calculates the loss function, the total loss is obtained as the sum of the localization loss and the confidence loss.
[067] According to the feature extraction mechanism for small targets and the characteristics of different convolution layers, the LMS-DN network additionally extracts the features of two dedicated convolution layers for detection, which is very effective for small target objects such as mobile phones.
[068] S3. Conducting a performance test on the trained handheld call detection model based on the indexes of the detection precision, the detection efficiency and the model size, wherein if the performance test result is lower than a preset threshold, step S2 is repeated to optimize and train the model until the performance result reaches the preset threshold. A target detection model has several evaluation standards; according to the emphasis of different standards, the embodiment evaluates the trained handheld call detection model by the detection accuracy, the detection efficiency and the model size. The detection accuracy is evaluated by the accuracy rate, recall rate, precision rate and mean average precision (mAP), as given by the following formulas:
[069] accuracy = (1 - a/m) * 100%
[070] precision = TP / (TP + FP) * 100%
[071] recall = TP / (TP + FN) * 100%
[072] mAP = (1 / |Q_R|) * Σ_{q ∈ Q_R} AP(q)
[073] where a is the number of misclassified samples, m is the total number of samples, TP is the number of positive samples correctly identified as positive, FP is the number of negative samples misidentified as positive, FN is the number of positive samples misidentified as negative, and Q_R is the set of categories, with |Q_R| their total number. The detection efficiency is evaluated by the number of frames detected per second (FPS), and the model size is evaluated in megabytes (MB). These performance indexes are weighed through experiments to determine which algorithm is more suitable for embedded deployment.
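For illustration, these metrics can be computed from the counts defined above as follows; this is a minimal sketch, and the per-class AP(q) values would come from the detector's precision-recall curves, which are not computed here:

```python
def accuracy(misclassified, total):
    """accuracy = (1 - a/m) * 100%"""
    return (1 - misclassified / total) * 100.0

def precision(tp, fp):
    """precision = TP / (TP + FP) * 100%"""
    return tp / (tp + fp) * 100.0

def recall(tp, fn):
    """recall = TP / (TP + FN) * 100%"""
    return tp / (tp + fn) * 100.0

def mean_average_precision(ap_per_class):
    """mAP: mean of the average precision AP(q) over all categories q."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Illustrative values only:
print(precision(tp=90, fp=10), recall(tp=90, fn=20))
print(mean_average_precision({"phone": 0.52, "face": 0.47}))
```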
[074] S4. Inputting the driver image acquired in real time into the optimized handheld call detection model in step S3 to obtain the result of the driver handheld call detection.
[075] To further verify the validity of the handheld call detection model of the invention, an experimental platform equipped with an Intel Core i5-7200U processor, an NVIDIA GTX 1080 with 8 GB of video memory, an Ubuntu 16.04 software environment and the Caffe deep learning framework is used in the embodiment. The handheld call detection model based on the LMS-DN network of the invention is compared with existing detection models in the following five aspects:
[076] Scheme 1: Experiments were conducted on the KITTI data set by combining the MobileNet, MobileNetV2, and MobileNet-I base networks with the SSD architecture, respectively, and the results are shown in Table 1:
[077] Table 1
Network         | Data set | Model Size/MB | Mean accuracy/mAP (%)
MobileNet-SSD   | KITTI    | 25.1          | 46.8
MobileNetV2-SSD | KITTI    | 21.8          | 47.0
MobileNet-I-SSD | KITTI    | 21.5          | 48.3
[078] As can be seen from Table 1, comparing the first two rows, when all detection networks use the SSD, taking MobileNetV2 as the base network reduces the model size by 3.3 MB while the detection accuracy is essentially unaffected. Comparing the second and third rows, MobileNet-I as the base network introduces the SinConv convolution unit by reference to the Inception structure and uses more separable convolutions with a depth of 5x5 than MobileNetV2, yet the model size is slightly reduced, confirming that once the network depth reaches a certain degree, the computational cost of one 5x5 convolution kernel is smaller than that of two 3x3 convolution kernels. Moreover, as a base network, MobileNet-I achieves higher detection accuracy than MobileNetV2, with the mAP increased by 1.3%, which shows that MobileNet-I is also superior to MobileNetV2 in feature extraction, giving a smaller network model and higher detection accuracy.
Scheme 2: The LMS-DN network of the invention and two popular lightweight target detection networks, MobileNet-SSD and MobileNetV2-SSDLite, are tested on the VOC, KITTI and Safe_Imgs data sets.
[079] The experimental results on the VOC and KITTI data sets are shown in Tables 2 and 3, respectively;
[080] Table 2
Network             | Data set | Model Size/MB | Mean accuracy/mAP (%)
MobileNet-SSD       | VOC0712  | 23.3          | 72.3
MobileNetV2-SSDLite | VOC0712  | 19.7          | 72.6
LMS-DN              | VOC0712  | 20.5          | 76.2
[081] Table 3
Network             | Data set | Model Size/MB | FPS | Mean accuracy/mAP (%)
MobileNet-SSD       | KITTI    | 25.1          | 53  | 46.8
MobileNetV2-SSDLite | KITTI    | 21.6          | 59  | 47.1
LMS-DN              | KITTI    | 22.5          | 58  | 49.7
[082] From Tables 2 and 3, compared with MobileNetV2-SSDLite, the LMS-DN network of the invention improves the mAP by 3.6% on VOC0712 and 2.6% on KITTI while the FPS is only slightly reduced. The detection effects of MobileNetV2-SSDLite and the LMS-DN network on the KITTI data set are shown in Figure 6, where Figure 6(a) is the detection result under MobileNetV2-SSDLite and Figure 6(b) is the detection result under LMS-DN. As can be seen from Figure 6, under MobileNetV2-SSDLite several smaller cars are not detected, whereas the LMS-DN network, with its further improvement on SSDLite, greatly improves the detection of small target objects and can detect target vehicles that are far away. Therefore, the improvement to SSDLite enables LMS-DN to detect more small target objects than MobileNetV2-SSDLite.
[083] The experimental results on the SafeImgs data set are shown in Table 4;
[084] Table 4
Network             | Model Size/MB | FPS | Accuracy rate (%) | Precision rate (%) | Recall rate (%)
MobileNet-SSD       | 18.6          | 59  | 81.3              | 90.6               | 80.6
MobileNetV2-SSDLite | 17.5          | 65  | 82.7              | 92.8               | 83.5
LMS-DN              | 17.9          | 63  | 86.2              | 96.3               | 85.2
[085] As can be seen from Table 4, the LMS-DN network of the present invention reaches an accuracy of 86.2% at the cost of only a slightly larger model, which is 4.9% higher than MobileNet-SSD and 3.5% higher than MobileNetV2-SSDLite. At the same time, both the precision rate and the recall rate are improved to different degrees.
[086] Scheme 3: The LMS-DN network is compared with the conventional MobileNetV2-SSDLite network on the SafeImgs data set under the same experimental conditions and different detection accuracy thresholds, and the comparison results are shown in Figure 7. When the threshold is gradually increased from 0.25 to 0.55, the detection performance of MobileNetV2-SSDLite decreases markedly; for LMS-DN, however, the accuracy is still 79.60% at a threshold of 0.55, showing that LMS-DN has good anti-interference capability.
[087] Scheme 4: To further verify the performance of the handheld call detection model on the embedded development board NVIDIA Jetson TX2, the LMS-DN network is compared with the conventional VGG16-SSD, MobileNet-SSD and MobileNetV2-SSDLite on the NVIDIA Jetson TX2, and the experimental results are shown in Table 5:
[088] Table 5
Network             | Average time (ms)
VGG16-SSD           | 246
MobileNet-SSD       | 58
MobileNetV2-SSDLite | 50
LMS-DN              | 56
[089] As can be seen from Table 5, although the average detection time of LMS-DN is only 6 ms longer than that of MobileNetV2-SSDLite, it still satisfies the real-time detection requirement of mobile devices, and compared with the other two models, LMS-DN realizes higher detection accuracy at almost the same speed. Therefore, the LMS-DN model of the invention is better suited to being ported to an embedded development board.
[090] Scheme 5: To further verify the reliability, the handheld call detection model of the present invention is tested on the NVIDIA Jetson TX2 with pictures taken under different illumination and obstacle occlusion conditions.
[091] The detection results under different illumination conditions are shown in Table 6:
[092] Table 6
Condition    | Accuracy (%) | Number of test pictures | Average detection time (ms)
Normal       | 89.2         | 80                      | 58
Strong light | 72.5         | 80                      | 59
Weak light   | 77.1         | 80                      | 57
[093] As can be seen from Table 6, the detection accuracy of the LMS-DN network of the present invention remains at a high level under different illumination intensities.
[094] The obstacle occlusion tests on the NVIDIA Jetson TX2 are shown in Table 7:
[095] Table 7
Condition         | Accuracy rate (%) | Number of test pictures | Average detection time (ms)
Normal            | 89.2              | 80                      | 58
Partially blocked | 70.8              | 80                      | 59
[096] As can be seen from Table 7, the LMS-DN network can still accurately detect the target mobile phone even when most of the phone is blocked by the palm. In summary, the handheld call detection model constructed on the LMS-DN network not only overcomes the influence of strong light and weak light, but also realizes real-time detection of the target with high accuracy in scenes with certain obstacle interference.
[097] Although the invention has been described with reference to specific examples, it will be appreciated by those skilled in the art that the invention may be embodied in many other forms, in keeping with the broad principles and the spirit of the invention described herein.
[098] The present invention and the described embodiments specifically include the best method known to the applicant of performing the invention. The present invention and the described preferred embodiments specifically include at least one feature that is industrially applicable.

Claims (10)

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:
1. A handheld call detection method based on a lightweight target detection network, characterized by comprising the following steps:
S1. Acquiring a driver image data set, and labeling the driver image data set to obtain the sample image data set;
S2. Constructing a handheld call detection model based on an LMS-DN network, and training the handheld call detection model through the sample image data set obtained in step S1 to obtain a trained handheld call detection model;
S3. Conducting a performance test on the trained handheld call detection model based on the indexes of the detection precision, the detection efficiency and the model size, wherein if the performance test result is lower than a preset threshold, repeating step S2 to optimize and train the model until the performance result reaches the preset threshold;
S4. Inputting the driver image acquired in real time into the optimized handheld call detection model in step S3 to obtain the result of the driver handheld call detection.
2. The handheld call detection method based on the lightweight target detection network according to claim 1, wherein in step S2, the LMS-DN network is divided into two parts: the first part is a basic classification network Mobilenet-I, and the second part is an SSDLite network.
3. The handheld call detection method based on the lightweight target detection network according to claim 2, wherein the Mobilenet-I network comprises a Conv convolution layer of a depth of 3x3, a SinConv convolution layer of a depth of 3x3, a BnConv3 convolution layer of a depth of 3x3, a BnConv3 convolution layer of a depth of 5x5, two BnConv6 convolution layers of a depth of 3x3, a BnConv6 convolution layer of a depth of 5x5, a BnConv6 convolution layer of a depth of 3x3, an FC full connection layer and a pooling layer which are sequentially connected.
4. The handheld call detection method based on the lightweight target detection network according to claim 3, wherein the BnConv3 convolution layer with a depth of 3x3 comprises a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 3x3 and a Conv convolution layer with a depth of 1x1 which are sequentially connected.
5. The handheld call detection method based on the lightweight target detection network according to claim 3, wherein the BnConv3 convolution layer with a depth of 5x5 comprises a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 5x5, and a Conv convolution layer with a depth of 1x1 which are sequentially connected.
6. The handheld call detection method based on the lightweight target detection network according to claim 3, wherein the BnConv6 convolution layer with a depth of 3x3 comprises a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 3x3, and a Conv convolution layer with a depth of 1x1 which are sequentially connected.
7. The handheld call detection method based on the lightweight target detection network according to claim 3, wherein the BnConv6 convolution layer with a depth of 5x5 comprises a Conv convolution layer with a depth of 1x1, a DwiseConv layer with a depth of 5x5, and a Conv convolution layer with a depth of 1x1 which are sequentially connected.
8. The handheld call detection method based on the lightweight target detection network according to claim 3, wherein the SinConv convolution layer with a depth of 3x3 comprises two branch structures, each comprising a DWConv convolution layer with a depth of 3x3 and a Conv convolution layer with a depth of 1x1 which are sequentially connected; the two branches are merged into one path of signals through a Concat function.
9. The handheld call detection method based on a lightweight target detection network according to claim 2, wherein the SSDLite network includes a prediction layer that employs depth separable convolution.
10. The handheld call detection method based on the lightweight target detection network according to claim 1, wherein the detection accuracy is measured through the accuracy rate, recall rate, precision rate and mean average precision (mAP); the detection efficiency is measured through the number of frames detected per second; and the model size is measured in megabytes (MB).
AU2020103494A 2020-11-17 2020-11-17 Handheld call detection method based on lightweight target detection network Ceased AU2020103494A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020103494A AU2020103494A4 (en) 2020-11-17 2020-11-17 Handheld call detection method based on lightweight target detection network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020103494A AU2020103494A4 (en) 2020-11-17 2020-11-17 Handheld call detection method based on lightweight target detection network

Publications (1)

Publication Number Publication Date
AU2020103494A4 (en)

Family

ID=74192070

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020103494A Ceased AU2020103494A4 (en) 2020-11-17 2020-11-17 Handheld call detection method based on lightweight target detection network

Country Status (1)

Country Link
AU (1) AU2020103494A4 (en)


Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836657A (en) * 2021-02-08 2021-05-25 中国电子科技集团公司第三十八研究所 Pedestrian detection method and system based on lightweight YOLOv3
CN113313679A (en) * 2021-05-21 2021-08-27 浙江大学 Bearing surface defect detection method based on multi-source domain depth migration multi-light source integration
CN113569667B (en) * 2021-07-09 2024-03-08 武汉理工大学 Inland ship target identification method and system based on lightweight neural network model
CN113569667A (en) * 2021-07-09 2021-10-29 武汉理工大学 Inland ship target identification method and system based on lightweight neural network model
CN113591648A (en) * 2021-07-22 2021-11-02 北京工业大学 Method, system, device and medium for detecting real-time image target without anchor point
CN113591648B (en) * 2021-07-22 2024-06-28 北京工业大学 Anchor-point-free real-time image target detection method, system, equipment and medium
CN114419473B (en) * 2021-11-17 2024-04-16 中国电子科技集团公司第三十八研究所 Deep learning real-time target detection method based on embedded equipment
CN114419473A (en) * 2021-11-17 2022-04-29 中国电子科技集团公司第三十八研究所 Deep learning real-time target detection method based on embedded equipment
CN114120093B (en) * 2021-12-01 2024-04-16 安徽理工大学 Coal gangue target detection method based on improved YOLOv algorithm
CN114120093A (en) * 2021-12-01 2022-03-01 安徽理工大学 Coal gangue target detection method based on improved YOLOv5 algorithm
CN114360736B (en) * 2021-12-10 2024-07-05 三峡大学 COVID-19 identification method based on multi-information sample class self-adaptive classification network
CN114360736A (en) * 2021-12-10 2022-04-15 三峡大学 COVID-19 identification method based on multi-information sample class self-adaptive classification network
CN114495060B (en) * 2022-01-25 2024-03-26 青岛海信网络科技股份有限公司 Road traffic marking recognition method and device
CN114495060A (en) * 2022-01-25 2022-05-13 青岛海信网络科技股份有限公司 Road traffic marking identification method and device
CN114612825A (en) * 2022-03-09 2022-06-10 云南大学 Target detection method based on edge equipment
CN114612825B (en) * 2022-03-09 2024-03-19 云南大学 Target detection method based on edge equipment
CN114898171A (en) * 2022-04-07 2022-08-12 中国科学院光电技术研究所 Real-time target detection method suitable for embedded platform
CN114898171B (en) * 2022-04-07 2023-09-22 中国科学院光电技术研究所 Real-time target detection method suitable for embedded platform
CN115099297B (en) * 2022-04-25 2024-06-18 安徽农业大学 Soybean plant phenotype data statistical method based on improved YOLO v5 model
CN115099297A (en) * 2022-04-25 2022-09-23 安徽农业大学 Soybean plant phenotype data statistical method based on improved YOLO v5 model
CN115049906B (en) * 2022-06-17 2024-07-05 北京理工大学 Precision-keeping SAR ship detection method based on lightweight trunk
CN115049906A (en) * 2022-06-17 2022-09-13 北京理工大学 Precision-preserving SAR ship detection method based on lightweight trunk
CN117315550B (en) * 2023-11-29 2024-02-23 南京市特种设备安全监督检验研究院 Detection method for dangerous behavior of escalator passengers
CN117315550A (en) * 2023-11-29 2023-12-29 南京市特种设备安全监督检验研究院 Detection method for dangerous behavior of escalator passengers

Similar Documents

Publication Publication Date Title
AU2020103494A4 (en) Handheld call detection method based on lightweight target detection network
US11887064B2 (en) Deep learning-based system and method for automatically determining degree of damage to each area of vehicle
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN111814755A (en) Multi-frame image pedestrian detection method and device for night motion scene
CN111476099B (en) Target detection method, target detection device and terminal equipment
CN111582253B (en) Event trigger-based license plate tracking and identifying method
Geetha et al. Detection and estimation of the extent of flood from crowd sourced images
Tang et al. Multiple-kernel adaptive segmentation and tracking (MAST) for robust object tracking
CN113420819A (en) Lightweight underwater target detection method based on CenterNet
CN111079518A (en) Fall-down abnormal behavior identification method based on scene of law enforcement and case handling area
CN114898326A (en) Method, system and equipment for detecting reverse running of one-way vehicle based on deep learning
CN113255580A (en) Method and device for identifying sprinkled objects and vehicle sprinkling and leaking
CN114267082A (en) Bridge side falling behavior identification method based on deep understanding
CN107045630B (en) RGBD-based pedestrian detection and identity recognition method and system
CN111354016A (en) Unmanned aerial vehicle ship tracking method and system based on deep learning and difference value hashing
CN113191270B (en) Method and device for detecting throwing event, electronic equipment and storage medium
CN111709377B (en) Feature extraction method, target re-identification method and device and electronic equipment
CN111652907B (en) Multi-target tracking method and device based on data association and electronic equipment
CN116721288A (en) Helmet detection method and system based on YOLOv5
Yu et al. An Algorithm for Target Detection of Engineering Vehicles Based on Improved CenterNet.
CN116797789A (en) Scene semantic segmentation method based on attention architecture
JP2014048702A (en) Image recognition device, image recognition method, and image recognition program
CN111723614A (en) Traffic signal lamp identification method and device
CN115690732A (en) Multi-target pedestrian tracking method based on fine-grained feature extraction
KR102283053B1 (en) Real-Time Multi-Class Multi-Object Tracking Method Using Image Based Object Detection Information

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry