CN112633086B - Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet

Publication number
CN112633086B
CN112633086B CN202011427301.4A CN202011427301A CN112633086B CN 112633086 B CN112633086 B CN 112633086B CN 202011427301 A CN202011427301 A CN 202011427301A CN 112633086 B CN112633086 B CN 112633086B
Authority
CN
China
Prior art keywords
pedestrian
efficientdet
detection
segmentation
multitasking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011427301.4A
Other languages
Chinese (zh)
Other versions
CN112633086A (en)
Inventor
张建龙
何建辉
李桥
王斌
郭鑫宇
刘池帅
崔梦莹
时国强
余鑫城
方光祖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202011427301.4A
Publication of CN112633086A
Application granted
Publication of CN112633086B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143 Sensing or illuminating at different wavelengths
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of pedestrian detection in near-infrared images, and discloses a near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet. A near-infrared pedestrian detection data set is used to obtain the distribution of pedestrian activity areas in different scenes. EfficientDet-D0 is adopted as the base detection network; a semantic segmentation branch is built on the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas, and segmentation performance is enhanced with atrous spatial pyramid pooling (ASPP) and the convolutional block attention module (CBAM). Based on the multitasking pedestrian detection model, the pedestrian detection results are post-processed with the predicted pedestrian activity area to obtain the final pedestrian detection result and pedestrian activity area result. The invention achieves higher detection performance, reduces false positive samples in the results, and is significant for pedestrian activity monitoring in night-time surveillance scenes.

Description

Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet
Technical Field
The invention belongs to the technical field of near-infrared image pedestrian detection, and particularly relates to a near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet.
Background
Research on pedestrian detection began in the 1990s and has moved from early methods based on hand-crafted features, such as HOG features, Haar wavelet features and edgelet features, to current methods based on deep learning features, in which features are extracted with convolutional neural network models such as ResNet and VGG. With the development of technology, pedestrian detection has advanced rapidly. Pedestrian detection is a widely applied branch of computer vision and plays a very important role in fields such as intelligent security, autonomous driving and robotics.
Near-infrared pedestrian detection is a technology that uses near-infrared images to locate pedestrians, and it is widely applied in intelligent security and autonomous driving. In night scenes, visible-light imaging cannot achieve good image quality. Compared with conventional visible-light detection, infrared pedestrian detection has the following advantages: (1) imaging quality remains good under weak light, so pedestrian features can be separated from the background easily; (2) infrared imaging reduces interference from background color. Owing to these advantages, infrared pedestrian detection has recently expanded into many fields, and it has performed notably well in intelligent video surveillance systems, in-vehicle driver assistance, military early warning and similar fields. However, compared with visible-light imaging, infrared imaging also has shortcomings: texture and contour features are less rich and the color information is monotonous, which makes recognition and detection more difficult. These shortcomings do not outweigh the merits: infrared pedestrian detection is significant in many respects, but it also poses serious challenges. Designing an efficient and robust infrared pedestrian detection algorithm that addresses the existing problems is therefore of far-reaching significance.
Pedestrian detection in near-infrared images is essentially a classification-plus-regression problem: bounding-box prediction is treated as a regression task, and separating pedestrians from the background is treated as a classification task. Over the past decade, researchers in China and abroad have contributed a great deal to the study of pedestrian detection methods. In terms of feature extraction, these methods fall into two categories, and the field has gradually moved from methods based on hand-crafted features to methods based on deep learning features.
Early pedestrian detection methods extracted features mainly with manually designed feature operators. The most representative is the Histogram of Oriented Gradients (HOG) feature proposed by Dalal in 2005, a milestone in the history of pedestrian detection that laid the foundation for effective feature extraction; combined with a Support Vector Machine (SVM) classifier, it achieved good detection results. Other methods based on hand-crafted features include those based on Haar wavelet features and on edgelet features.
Traditional infrared pedestrian detection methods based on hand-crafted feature extraction have the following drawbacks: (1) hand-crafted features are difficult to design, and their effectiveness cannot be guaranteed; (2) hand-crafted features are shallow, so they cope poorly with pedestrian detection against complex backgrounds, and detection performance is low.
With the rapid development of deep learning theory and technology, researchers have tried to solve pedestrian detection on infrared images with deep learning. One line of work analyzed the object detection performance of R-CNN/Faster R-CNN on visible-light images and constructed a network model for visible-spectrum and infrared images, whose miss rate on the VS and NIR data sets is far lower than that of RPN_BF and HOG+SVM. Wang Dianwei et al. addressed the low accuracy and high miss rate of YOLOv3 when detecting pedestrians in infrared video images; their results show that the improved YOLOv3 algorithm reaches an accuracy of 90.63% in infrared pedestrian detection, clearly better than Faster R-CNN and the original YOLOv3, and that the improved network also detects more targets, which lowers the miss rate.
Although deep learning achieves higher detection accuracy on near-infrared pedestrian detection, shortcomings remain: (1) because the model lacks an understanding of the overall semantics of the image, and because of noise and other interference, the detection results contain false detections that contradict common sense; (2) because the image information is monotonous, effective pedestrian features may not be extracted.
In summary, the problems of the prior art are as follows. Because an end-to-end convolutional neural network lacks an understanding of the overall semantic information, many false detections that violate common sense can occur. Because near-infrared images have monotonous color and blurred contours, how to extract the features of pedestrian targets from them effectively is one of the main open problems.
The difficulty of solving these technical problems lies in reducing semantic errors in the detection results and extracting pedestrian features from near-infrared images effectively without degrading detection performance.
The significance of solving these technical problems: pedestrian detection algorithms for near-infrared images are widely applied in intelligent security, autonomous driving and robotics, and obtaining high-precision detection results in the shortest possible time is of great importance to these fields.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet.
The invention is realized as follows. The near-infrared pedestrian monitoring method based on multitasking EfficientDet comprises the following steps:
the near-infrared pedestrian detection training data set is used to obtain the distribution of pedestrian activity areas in different scenes, which is used to train the segmentation branch of the model;
EfficientDet-D0 is adopted as the base detection network, and a semantic segmentation branch is built on the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas; segmentation performance is enhanced by an atrous spatial pyramid pooling (ASPP) module and an attention module; object detection and semantic segmentation share low-level features, so multi-task learning improves the generalization ability of the model; and the predicted pedestrian activity area is further used to post-process the detection results and improve performance;
based on the multitasking EfficientDet-D0 model, the pedestrian detection results are post-processed with the predicted pedestrian activity area, which reduces false positives (FP) in the predictions and yields the final pedestrian detection result and pedestrian activity area result.
Further, the near-infrared pedestrian monitoring method based on multitasking EfficientDet specifically comprises the following steps:
step one, obtaining the distribution of pedestrian activity areas in different scenes from the near-infrared pedestrian detection training data set;
step two, adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch on the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas, and adding an atrous spatial pyramid pooling (ASPP) module and an attention module to enhance segmentation performance;
step three, training the improved multitasking EfficientDet-D0 model, and improving the generalization ability of the model for pedestrian targets through multi-task learning;
step four, first inputting k images and testing to obtain the pedestrian activity area prediction for the scene, then inputting a near-infrared image for testing to obtain the pedestrian detection result, and filtering false positive samples out of the pedestrian detection result with the predicted pedestrian activity area.
Further, step one includes:
a) the training image set is a single-channel near-infrared image data set captured in surveillance scenes;
b) traversing all images of each scene in the training data set, obtaining the center point coordinates of the pedestrian targets from the annotation data, and obtaining a mask image of the pedestrian area from the center point coordinates of all pedestrian targets in the scene; performing this operation for every scene gives the pedestrian distribution area of each scene.
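As an illustration only, step one can be sketched in plain Python. The text does not state how the mask is derived from the center points, so the sketch assumes that each center marks a small square neighbourhood; the function names, the `radius` parameter and the box format `(x_min, y_min, x_max, y_max)` are hypothetical.

```python
def box_center(box):
    """Center (x, y) of an annotated box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def scene_activity_mask(boxes_per_image, height, width, radius=2):
    """Build a 0/1 mask for one scene from all pedestrian center points.

    Assumption: each center marks a square of side 2*radius+1 pixels; the
    source does not say how the mask is derived from the center points.
    """
    mask = [[0] * width for _ in range(height)]
    for boxes in boxes_per_image:      # traverse every image of the scene
        for box in boxes:              # every annotated pedestrian
            cx, cy = box_center(box)
            cx, cy = int(cx), int(cy)
            for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
                for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                    mask[y][x] = 1
    return mask


def all_scene_masks(annotations, height, width, radius=2):
    """annotations maps a scene id to a list of per-image box lists."""
    return {scene: scene_activity_mask(images, height, width, radius)
            for scene, images in annotations.items()}
```

The resulting per-scene masks would serve as the training targets of the segmentation branch.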
Further, the semantic segmentation branch of step two is constructed as follows. EfficientDet-D0 is adopted as the base detection network, and the semantic segmentation branch is built on the P2, P3, P4 and P5 layers of its backbone network, whose feature channel numbers are (24, 40, 112, 320). P5 is passed through the atrous spatial pyramid pooling (ASPP) module, up-sampled by a factor of 2, and concatenated with P4 along the channel dimension; the result passes through a convolutional block attention module (CBAM), its number of channels is reduced to 128 by a convolution layer, a BN layer and an activation layer, and a further CBAM gives the feature map O1. O1 is up-sampled by a factor of 2 and concatenated with P3 along the channel dimension; after a CBAM, a convolution layer reduces the number of channels to 64, and a further CBAM gives the feature map O2. O2 is up-sampled by a factor of 2 and concatenated with P2 along the channel dimension; after a CBAM, a convolution layer reduces the number of channels to 32, and a further CBAM gives the feature map O3. O3 is reduced to 1 channel through a BN layer and an activation layer to give the output O4. During training, the model loss is computed directly on O4; in the test stage, O4 is up-sampled by a factor of 4 to obtain a segmentation result at the original image size.
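The tensor shapes implied by this branch can be traced with a short Python sketch. It tracks only (channels, height, width) and performs no real convolution. Three things are assumptions rather than statements of the text: that P2 to P5 have strides 4, 8, 16 and 32 (consistent with the final 4x up-sampling), that the ASPP module keeps 320 channels, and that the 768 x 512 input is width x height.

```python
BACKBONE_CHANNELS = {"P2": 24, "P3": 40, "P4": 112, "P5": 320}   # from the text
BACKBONE_STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}        # assumed strides


def backbone_shapes(height, width):
    """(channels, height, width) of each backbone layer for a given input size."""
    return {name: (BACKBONE_CHANNELS[name], height // s, width // s)
            for name, s in BACKBONE_STRIDES.items()}


def upsample(shape, factor):
    c, h, w = shape
    return (c, h * factor, w * factor)


def concat(a, b):
    """Channel concatenation; spatial sizes must already agree."""
    assert a[1:] == b[1:], "spatial sizes must match before concatenation"
    return (a[0] + b[0], a[1], a[2])


def reduce_channels(shape, out_channels):
    """Stands in for CBAM -> convolution -> CBAM; attention keeps the shape."""
    return (out_channels, shape[1], shape[2])


def trace_segmentation_branch(height=512, width=768, aspp_channels=320):
    p = backbone_shapes(height, width)
    aspp = (aspp_channels, p["P5"][1], p["P5"][2])   # ASPP output (assumed width)
    o1 = reduce_channels(concat(upsample(aspp, 2), p["P4"]), 128)
    o2 = reduce_channels(concat(upsample(o1, 2), p["P3"]), 64)
    o3 = reduce_channels(concat(upsample(o2, 2), p["P2"]), 32)
    o4 = reduce_channels(o3, 1)                      # single-channel output
    return {"O1": o1, "O2": o2, "O3": o3, "O4": o4, "full": upsample(o4, 4)}
```

Under these assumptions the trace confirms that each 2x up-sampling step lands on the spatial size of the next shallower backbone layer, and that the 4x up-sampling of O4 restores the input resolution.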
Further, the third step includes:
a) the segmentation branch is embedded into the detection network and shares low-level features with the object detection network;
b) an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM) are used in the segmentation branch to improve the segmentation performance of the model, and different learning rates are used for the segmentation branch and the detection branch to reduce overfitting;
c) Dice Loss is used as the loss function of the segmentation branch, and is defined as:
DiceLoss=1-2|X∩Y|/(|X|+|Y|);
where X is the predicted mask, Y is the annotated mask of the pedestrian area, |X∩Y| is the number of elements in the intersection of X and Y, and |X|+|Y| is the sum of the numbers of elements in X and Y;
d) Focal Loss is used as the classification loss function, and is defined as:
FL=-alpha*(1-y')^gamma*log(y') when y=1, and FL=-(1-alpha)*y'^gamma*log(1-y') when y=0;
where y is the label, y' is the model prediction, gamma is a hyperparameter that adjusts the weight of hard samples and is set to 2.0, and alpha is a hyperparameter that adjusts the balance between positive and negative samples and is set to 0.05;
e) Smooth L1 Loss is used as the loss function for bounding-box regression:
SmoothL1=0.5*(y-y')^2 when |y-y'|<1, and SmoothL1=|y-y'|-0.5 otherwise;
where y is the label and y' is the model prediction;
f) training parameter settings:
learning rate: 3e-4;
learning rate schedule: cosine decay;
batch size: 16;
input image size: 768 x 512;
optimizer: AdamW, for fast convergence of the network;
because the segmentation of pedestrian areas has less training data than the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network; and the ratio of the classification loss, the regression loss and the segmentation loss is set to 10:10:1.
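The loss terms and training settings of step three can be sketched in plain Python for scalar or flat-list inputs. This is a minimal sketch using the standard textbook forms of Dice Loss, Focal Loss and Smooth L1 Loss, not necessarily the exact expressions of the invention. The smoothing term `eps`, the clipping of predictions, the `min_lr` floor of the cosine schedule and all function names are assumptions.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """1 - 2|X∩Y| / (|X| + |Y|) on flat lists; eps (assumed) avoids 0/0."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def focal_loss(y_pred, y, alpha=0.05, gamma=2.0):
    """Binary focal loss for one prediction, with alpha and gamma from the text."""
    p = min(max(y_pred, 1e-7), 1.0 - 1e-7)   # clip to keep log() finite
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def smooth_l1(y_pred, y):
    """Quadratic for small errors, linear for large ones."""
    d = abs(y - y_pred)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def total_loss(cls_loss, reg_loss, seg_loss, weights=(10.0, 10.0, 1.0)):
    """Weighted sum with the 10:10:1 ratio given in the text."""
    return weights[0] * cls_loss + weights[1] * reg_loss + weights[2] * seg_loss


def cosine_lr(step, total_steps, base_lr=3e-4, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps."""
    cos = math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cos)


def branch_learning_rates(step, total_steps, seg_scale=0.01):
    """Detection branch uses the normal rate, segmentation branch 0.01x of it."""
    lr = cosine_lr(step, total_steps)
    return {"detection": lr, "segmentation": seg_scale * lr}
```

In a deep learning framework the two learning rates would typically be applied as separate parameter groups of the AdamW optimizer; that wiring is framework-specific and is not shown here.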
Further, step four includes: first, the pedestrian activity area is predicted from k images. The k images are input, and the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map, where the sigmoid function is defined as:
S(x)=1/(1+e^(-x));
the output is processed as follows:
D=S(D');
a threshold T is then used to binarize the prediction, as follows:
D(x,y)=1 if D(x,y)>T, and D(x,y)=0 otherwise;
the activity area predicted by the segmentation branch from the k images is used as prior information, and targets in the final detection result whose bounding-box center points do not lie in the predicted activity area are filtered out, which reduces false positive samples; the processing is as follows:
score(x,y)=D(x,y)*score(x,y);
where score(x,y) is the confidence score of the target at point (x,y).
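The post-processing of step four can be illustrated with the following Python sketch. The text does not say how the k segmentation outputs are combined or what value T takes, so the sketch assumes that the k probability maps are averaged and that T = 0.5; it also assumes the maps are already at full image resolution and that detections are tuples `(x_min, y_min, x_max, y_max, score)`. All names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable S(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def activity_region(logit_maps, threshold=0.5):
    """Binarize the mean sigmoid probability of k maps (averaging is assumed)."""
    k = len(logit_maps)
    height, width = len(logit_maps[0]), len(logit_maps[0][0])
    region = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            prob = sum(sigmoid(m[y][x]) for m in logit_maps) / k
            region[y][x] = 1 if prob > threshold else 0
    return region


def filter_detections(detections, region):
    """Apply score = D(x, y) * score at each box center and drop zero scores."""
    kept = []
    for x_min, y_min, x_max, y_max, score in detections:
        cx = int((x_min + x_max) / 2)
        cy = int((y_min + y_max) / 2)
        inside = 0 <= cy < len(region) and 0 <= cx < len(region[0])
        d = region[cy][cx] if inside else 0
        new_score = d * score
        if new_score > 0:
            kept.append((x_min, y_min, x_max, y_max, new_score))
    return kept
```

In a fixed scene `activity_region` would be computed once and reused for every subsequent frame, so only `filter_detections` runs per image.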
A further object of the present invention is to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
the near-infrared pedestrian monitoring method based on multitasking EfficientDet, comprising the following steps:
obtaining the distribution of pedestrian activity areas in different scenes from the near-infrared pedestrian detection training data set;
adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch on the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing low-level features between object detection and semantic segmentation to improve the generalization ability of the model;
based on the multitasking EfficientDet-D0 model, post-processing the pedestrian detection results with the predicted pedestrian activity area to obtain the final pedestrian detection result and pedestrian activity area result.
Another object of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
the near-infrared pedestrian monitoring method based on multitasking EfficientDet, comprising the following steps:
obtaining the distribution of pedestrian activity areas in different scenes from the near-infrared pedestrian detection training data set;
adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch on the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing low-level features between object detection and semantic segmentation to improve the generalization ability of the model;
based on the multitasking EfficientDet-D0 model, post-processing the pedestrian detection results with the predicted pedestrian activity area to obtain the final pedestrian detection result and pedestrian activity area result.
A further object of the present invention is to provide an information data processing terminal for implementing the near-infrared pedestrian monitoring method based on multitasking EfficientDet.
Another object of the present invention is to provide a near-infrared pedestrian monitoring system based on multitasking EfficientDet for implementing the near-infrared pedestrian monitoring method based on multitasking EfficientDet, the system comprising:
a pedestrian activity area distribution acquisition module, used for obtaining the distribution of pedestrian activity areas in different scenes from the near-infrared pedestrian detection training data set;
a segmentation module sharing low-level features, used for adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch on the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas, enhancing segmentation performance with the atrous spatial pyramid pooling (ASPP) module and the attention module, and sharing low-level features between object detection and semantic segmentation;
a pedestrian detection result and pedestrian activity area result output module, used for post-processing the pedestrian detection results with the predicted pedestrian activity area, based on the multitasking EfficientDet-D0 model, to obtain the final pedestrian detection result and pedestrian activity area result.
Combining all the above technical solutions, the advantages and positive effects of the invention are as follows. The invention is mainly applied to pedestrian detection in near-infrared images. It addresses semantically unreasonable errors in the prediction results and reduces the false positive rate while maintaining a high AP; it also lets multiple tasks share one feature extraction network through multi-task learning, which effectively improves the generalization ability of the model. The invention decomposes the object detection problem into a binary semantic segmentation problem and an ordinary object detection problem, and provides a network that performs pixel-level semantic segmentation for pedestrian activity area prediction together with pedestrian position detection: an atrous spatial pyramid pooling (ASPP) module augmented with the convolutional block attention module (CBAM) is combined with up-sampling to form the pedestrian activity area prediction branch, multi-task learning improves the generalization ability of the model, and false positive samples in the detection results are filtered out with the pedestrian activity area, which further improves pedestrian detection performance.
In the invention, semantic segmentation and object detection are combined in the pedestrian detection process. Pedestrian detection produces false positive samples that are inconsistent with semantic information, for example pedestrian targets predicted on trees or in other areas where their presence violates common sense. Such false positives can be filtered out by predicting the pedestrian area, and multi-task learning improves the generalization ability of the model, so more accurate pedestrian detection results are obtained.
The method integrates the semantic segmentation network and the object detection network so that they share the feature extraction network, and a single model performs both semantic segmentation and object detection. This greatly reduces training cost, greatly improves the generalization ability of the model, and reduces overfitting to the detection data. Through multi-task learning and post-processing of the detection results with the pedestrian activity area, the method obtains higher detection performance and better generalization ability.
Compared with the prior art, the invention has the following advantages:
(1) a pedestrian activity area segmentation branch is introduced into the object detection model, and multi-task learning improves the generalization ability of the model;
(2) the pedestrian activity area prediction is obtained from the additional segmentation branch and is used to post-process the pedestrian detection results, which improves detection performance;
(3) in a fixed surveillance scene, the pedestrian activity area is computed in advance and reused in subsequent processing, which avoids the time cost of predicting the activity area for every frame, so the frame rate stays at a real-time monitoring level.
TABLE 1 comparison of the invention with other target detection methods
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of the near-infrared pedestrian monitoring method based on multitasking EfficientDet according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the near-infrared pedestrian monitoring system based on multitasking EfficientDet according to an embodiment of the present invention;
in Fig. 2: 1. pedestrian activity area distribution acquisition module; 2. segmentation and detection module sharing low-level features; 3. pedestrian detection result and pedestrian activity area result output module.
Fig. 3 is a flowchart of an implementation of the near-infrared pedestrian monitoring method based on multitasking EfficientDet according to an embodiment of the present invention.
Fig. 4 is a network architecture diagram of the multitasking EfficientDet-D0 according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
To address the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on a multitasking EfficientDet, described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the near-infrared pedestrian monitoring method based on a multitasking EfficientDet provided by the invention comprises the following steps:
S101: obtain the pedestrian activity area distribution in a fixed scene from the training data set; take near-infrared images as input, replicate the single channel of each input image, and divide the data into a training set and a test set by random division;
S102: train the improved multitasking EfficientDet-D0 model, assigning a smaller learning rate to the segmentation branch to prevent overfitting;
S103: obtain the pedestrian activity area prediction by inputting k images, then input a single near-infrared image for testing, and obtain the final detection result from the pedestrian detection result and the pedestrian activity area prediction.
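The data preparation in S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the single near-infrared channel is repeated three times (presumably to match the three-channel input that an EfficientNet-style backbone normally expects) and a channels-last nested-list layout. The names `replicate_channel`, `random_split`, `test_fraction` and `seed` are hypothetical.

```python
import random

def replicate_channel(gray, copies=3):
    """Turn an H x W single-channel image into H x W x copies by repeating each pixel."""
    return [[[px] * copies for px in row] for row in gray]

def random_split(items, test_fraction, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Tiny 2 x 2 "near-infrared image" and ten image ids (assumed values).
nir = [[0, 128], [255, 64]]
rgb_like = replicate_channel(nir)
train, test = random_split(range(10), test_fraction=0.2)
```

A fixed seed keeps the split reproducible between runs.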
Those skilled in the art may also implement the near-infrared pedestrian monitoring method based on a multitasking EfficientDet with other steps; the method shown in Fig. 1 is only one specific embodiment.
As shown in Fig. 2, the near-infrared pedestrian monitoring system based on a multitasking EfficientDet provided by the invention includes:
the pedestrian activity area distribution acquisition module 1, used for obtaining the pedestrian activity area distribution in different scenes from a near-infrared image pedestrian detection training data set;
the segmentation and detection module 2 sharing bottom-layer features, used for adopting EfficientDet-D0 as the basic detection network, constructing a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing bottom-layer features between target detection and semantic segmentation;
the pedestrian detection result and pedestrian activity area result output module 3, used for post-processing the pedestrian target detection result with the predicted pedestrian activity area, based on the multitasking EfficientDet-D0 model, to obtain the final pedestrian detection result and the final pedestrian activity area result.
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The invention uses a near-infrared image pedestrian detection data set to obtain the pedestrian activity area distribution in different scenes. EfficientDet-D0 is adopted as the basic detection network, and a semantic segmentation branch is constructed from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas. Segmentation performance is enhanced by an atrous spatial pyramid pooling (ASPP) module and a feature attention module, and target detection and semantic segmentation share bottom-layer features, which improves the generalization capability of the model. Based on the multitasking pedestrian detection model, the pedestrian target detection result is post-processed with the predicted pedestrian activity area to obtain the final pedestrian detection result and pedestrian activity area result. Through the combined action of the segmentation branch and the target detection branch, the invention effectively reduces the prediction of false positive (FP) samples and achieves higher detection performance on pedestrian targets in near-infrared images, which is of significance for pedestrian activity monitoring in night-time surveillance scenes.
As shown in Fig. 3, the near-infrared pedestrian monitoring method based on a multitasking EfficientDet provided by the embodiment of the invention specifically includes the following steps:
Step one: the image training set used is a single-channel near-infrared image data set in a monitoring scene. All images in each scene of the training data set are traversed, the center point coordinates of the pedestrian targets are obtained from the annotation data, and a mask image of the pedestrian area is obtained from the center point coordinates of all pedestrian targets in the scene. This operation is performed for each scene to obtain the pedestrian distribution area of each scene.
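Step one can be sketched as follows. The text does not say how the center points are turned into a mask, so this sketch assumes that each center marks a small square patch; the helper names (`box_center`, `activity_mask`), the `radius` parameter and the `(x1, y1, x2, y2)` box format are all hypothetical.

```python
def box_center(box):
    """Center point (cx, cy) of an (x1, y1, x2, y2) annotation box, in integer pixels."""
    x1, y1, x2, y2 = box
    return (x1 + x2) // 2, (y1 + y2) // 2

def activity_mask(centers, width, height, radius=1):
    """Binary mask with a (2*radius+1)-pixel square set to 1 around every center."""
    mask = [[0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                mask[y][x] = 1
    return mask

# All annotation boxes of one scene, pooled over every image of that scene (assumed values).
boxes = [(0, 0, 2, 2), (5, 3, 7, 5)]
mask = activity_mask([box_center(b) for b in boxes], width=8, height=6, radius=1)
```

Running the same function once per scene yields one activity mask per scene, which then serves as the label for the segmentation branch.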
Step two: EfficientDet-D0 is adopted as the basic detection network, a semantic segmentation branch is constructed from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, and an atrous spatial pyramid pooling (ASPP) module and an attention module are added to enhance segmentation performance.
2a) The segmentation branch is designed as follows; it improves segmentation performance mainly through the ASPP module and the convolutional block attention module (CBAM):
The invention uses the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to construct the semantic segmentation branch; the corresponding numbers of feature channels are (24, 40, 112, 320). P5 passes through the ASPP module, is up-sampled by a factor of 2 and is concatenated with P4 along the channel dimension. After a CBAM, the number of channels is reduced to 128 by a convolution layer, a BN layer and an activation layer, and the feature map O1 is then obtained through a CBAM. O1 is up-sampled by a factor of 2 and concatenated with P3 along the channel dimension; after a CBAM, the number of channels is reduced to 64 by a convolution layer, and the feature map O2 is obtained through a CBAM. O2 is up-sampled by a factor of 2 and concatenated with P2 along the channel dimension; after a CBAM, the number of channels is reduced to 32 by a convolution layer, a BN layer and an activation layer, and the feature map O3 is obtained through a CBAM. O3 is then reduced to 1 channel through the BN layer and the activation layer to obtain the output O4. During training, the model loss is computed directly on O4; in the test stage, O4 is up-sampled by a factor of 4 to obtain a segmentation result at the original image size.
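The channel and resolution bookkeeping of the segmentation branch can be traced with a short sketch. It assumes that P2 to P5 have strides 4, 8, 16 and 32, that the 768 x 512 input is width x height, and that the ASPP module outputs 256 channels; the text does not state the ASPP width, so `aspp_channels` is a hypothetical parameter. Only shapes are computed here; no network layers are implemented.

```python
def trace_segmentation_branch(width=768, height=512, aspp_channels=256):
    """Return {stage: (channels, height, width)} for each stage of the branch."""
    stride = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}    # assumed backbone strides
    chans = {"P2": 24, "P3": 40, "P4": 112, "P5": 320}  # channel counts from the text
    shapes = {}
    c, s = aspp_channels, stride["P5"]  # ASPP keeps the spatial size of P5
    for skip, out_c, name in (("P4", 128, "O1"), ("P3", 64, "O2"), ("P2", 32, "O3")):
        s //= 2                          # 2x up-sampling halves the stride
        assert s == stride[skip]         # must now match the skip feature's resolution
        c += chans[skip]                 # channel concatenation with the skip feature
        shapes[name + "_concat"] = (c, height // s, width // s)
        c = out_c                        # CBAM, channel reduction, CBAM
        shapes[name] = (c, height // s, width // s)
    shapes["O4"] = (1, height // s, width // s)            # single-channel output
    shapes["mask"] = (1, height // s * 4, width // s * 4)  # 4x up-sampling at test time
    return shapes

shapes = trace_segmentation_branch()
```

Under these assumptions O4 has a quarter of the input resolution, which is why a 4x up-sampling restores the original image size in the test stage.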
Step three: the improved multitasking EfficientDet-D0 model is trained, and the generalization capability of the model for pedestrian targets is improved through multi-task learning.
3a) The segmentation branch is embedded in the detection network and shares bottom-layer features with the target detection network.
3b) The ASPP module and the convolutional block attention module (CBAM) are used in the segmentation branch to improve the segmentation performance of the model, and different learning rates are used for the segmentation branch and the detection branch to reduce overfitting.
3c) Dice Loss is used as the loss function of the segmentation branch. Dice Loss is defined as follows:
$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$
where X is the predicted mask, Y is the annotated mask of the pedestrian region, |X ∩ Y| is the intersection of X and Y, and |X| + |Y| is the sum of the numbers of elements in X and Y.
3d) Focal Loss is used as the classification loss function. Focal Loss is defined as follows:
$$L_{FL} = \begin{cases} -\alpha\,(1 - y')^{\gamma}\,\log y', & y = 1 \\ -(1 - \alpha)\,y'^{\gamma}\,\log(1 - y'), & y = 0 \end{cases}$$
where y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weight of hard samples and is set to 2.0, and α is a hyperparameter that adjusts the ratio of positive to negative samples and is set to 0.05.
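3e) SmoothL1 Loss is used as the loss function for bounding-box regression:
$$L_{SmoothL1} = \begin{cases} 0.5\,(y - y')^{2}, & |y - y'| < 1 \\ |y - y'| - 0.5, & \text{otherwise} \end{cases}$$
where y is the label and y' is the model prediction.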
3f) Training parameter settings:
learning rate: 3e-4;
learning rate decay schedule: cosine decay;
batch size: 16;
input image size: 768 x 512;
optimizer: AdamW, for rapid convergence of the network.
Because there is less data for pedestrian region segmentation than for the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the base learning rate of the network, and the ratio of classification loss, regression loss and segmentation loss is set to 10:10:1.
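The three losses and the training settings of step three can be sketched in plain Python on scalar or flat-list inputs. This is an illustration under stated assumptions, not the patented implementation: the textbook forms of Dice, Focal and SmoothL1 loss are used, the weights 10, 10 and 1 are taken literally from the stated ratio, and the cosine schedule decays to a hypothetical `min_lr` of zero with no warm-up. All function names and the `eps` smoothing term are assumptions.

```python
import math

def dice_loss(pred, target, eps=1e-6):
    """1 - 2|X∩Y| / (|X| + |Y|) on flat lists of mask values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def focal_loss(y, p, alpha=0.05, gamma=2.0):
    """Binary focal loss for one label y in {0, 1} and predicted probability p."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # keep log() finite
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)

def smooth_l1(y, p):
    """Quadratic for small errors, linear for large ones."""
    d = abs(y - p)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def total_loss(cls, reg, seg, weights=(10.0, 10.0, 1.0)):
    """Weighted sum with classification : regression : segmentation = 10 : 10 : 1."""
    return weights[0] * cls + weights[1] * reg + weights[2] * seg

def cosine_lr(step, total_steps, base_lr=3e-4, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos

def branch_lrs(step, total_steps, seg_scale=0.01):
    """Detection uses the scheduled rate; segmentation uses 0.01 times that rate."""
    lr = cosine_lr(step, total_steps)
    return {"detection": lr, "segmentation": lr * seg_scale}
```

In a framework such as PyTorch, the two learning rates would typically be applied through separate optimizer parameter groups for the detection and segmentation parameters.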
Step four: k images are first input and tested to obtain the pedestrian activity area prediction for the scene; a near-infrared image is then input for testing to obtain the pedestrian target detection result, and false positive samples in pedestrian target detection are filtered using the pedestrian activity area prediction. The specific flow is as follows:
First, the pedestrian activity area is predicted from k images (k = 10). The k images are input, and the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map. The sigmoid function is defined as:
$$S(x) = \frac{1}{1 + e^{-x}}$$
The output is processed as follows:
D=S(D');
where D' is the output of the segmentation branch and D is the resulting probability map. A threshold T is then taken to binarize the prediction:
$$D(x, y) = \begin{cases} 1, & D(x, y) \ge T \\ 0, & \text{otherwise} \end{cases}$$
The activity area predicted by the segmentation branch from the k images is used as prior information, and targets whose bounding-box center points are not in the predicted activity area are filtered out of the final detection result, thereby reducing false positive samples. The processing is as follows:
score(x,y)=D(x,y)*score(x,y);
where score(x, y) is the confidence score of the target at the point (x, y).
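The post-processing of step four can be sketched as follows. The text does not specify how the k segmentation outputs are combined or what value T takes, so this sketch assumes that the k probability maps are averaged and that T = 0.5; the function names and the `((x1, y1, x2, y2), score)` detection format are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def activity_area(logit_maps, threshold=0.5):
    """Average the sigmoid of k logit maps, then binarize with the threshold T."""
    k = len(logit_maps)
    h, w = len(logit_maps[0]), len(logit_maps[0][0])
    area = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mean_p = sum(sigmoid(m[y][x]) for m in logit_maps) / k
            area[y][x] = 1 if mean_p >= threshold else 0
    return area

def filter_detections(detections, area):
    """Apply score = D * score at each box center and drop boxes whose score becomes 0."""
    kept = []
    for (x1, y1, x2, y2), score in detections:
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        inside = 0 <= cy < len(area) and 0 <= cx < len(area[0])
        new_score = score * (area[cy][cx] if inside else 0)
        if new_score > 0:
            kept.append(((x1, y1, x2, y2), new_score))
    return kept

# Three identical 2 x 2 logit maps: the left column is active, the right is not.
logits = [[[4.0, -4.0], [4.0, -4.0]]] * 3
area = activity_area(logits)
dets = [((0, 0, 1, 1), 0.9), ((1, 0, 2, 1), 0.8)]
kept = filter_detections(dets, area)
```

Because D is binary, the multiplication leaves the scores of boxes centered inside the activity area unchanged and sets all others to zero.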
The technical effects of the present invention will be described in detail with reference to simulation.
1. Simulation conditions
The simulation experiment of the invention was carried out with PyCharm on a PC with an Intel(R) Core(TM) i7-7820X CPU at 3.60 GHz, 32.00 GB of RAM, two 2080Ti GPUs and the Ubuntu 18.0 operating system.
2. Simulation experiment contents
The model was trained and tested on a data set acquired from 10 scenes, with an original image resolution of 2560 x 1440 scaled to 768 x 512. Scenes A and B are used as the test set and the other scenes as the training set; the test set contains 3380 images and the training set contains 13072 images.
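Scaling from 2560 x 1440 to 768 x 512 changes the aspect ratio, so the horizontal and vertical scale factors differ. The sketch below works through that arithmetic and the test-set share. It assumes that annotation boxes are rescaled with the same per-axis factors as the image, which the text does not state; `scale_box` is a hypothetical helper.

```python
def scale_box(box, src=(2560, 1440), dst=(768, 512)):
    """Rescale an (x1, y1, x2, y2) box from src (width, height) to dst (width, height)."""
    sx, sy = dst[0] / src[0], dst[1] / src[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

sx = 768 / 2560   # horizontal factor, 0.3
sy = 512 / 1440   # vertical factor, about 0.356
full_frame = scale_box((0, 0, 2560, 1440))
test_share = 3380 / (3380 + 13072)   # fraction of all images held out for testing
```

With these numbers about one fifth of the 16452 images belong to the two test scenes.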
3. Simulation experiment results and analysis
Table 1 compares the method of the invention with the original Cascade RCNN, the original EfficientDet-D0, and EfficientDet-D0 after anchor clustering.
Table 1. Comparison of the invention with other target detection methods
As can be seen from Table 1, through multi-task learning the method of the invention maintains high detection accuracy while maintaining a high frame rate, and improves detection performance by 1.5 points over the original EfficientDet-D0 + anchor clustering model. Compared with the other methods, it reduces the number of false positive samples and constrains the region in which positive samples can exist, thereby improving detection performance. In summary, the invention uses a multi-task detection model to improve detection performance, reduce the number of false positive samples and obtain higher detection accuracy, which is of significance for research on intelligent security, automatic driving and the like.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, for example provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of such hardware circuitry and software, such as firmware.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto. Any modification, equivalent substitution or improvement made by a person skilled in the art within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. The near-infrared pedestrian monitoring method based on the multitasking EfficientDet is characterized by specifically comprising the following steps of:
step one: the image training set used is a single-channel near-infrared image data set in a monitoring scene; all images in each scene of the training data set are traversed to obtain the pedestrian distribution area of each scene;
step two: EfficientDet-D0 is adopted as the basic detection network, a semantic segmentation branch is constructed from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, and an atrous spatial pyramid pooling (ASPP) module and an attention module are added to enhance segmentation performance;
step three: the improved multitasking EfficientDet-D0 model is trained, and the generalization capability of the model for pedestrian targets is improved through multi-task learning; the segmentation branch is embedded in the basic detection network and shares bottom-layer features with the target detection network;
step four: k images are input and tested to obtain the pedestrian activity area prediction for the scene, a near-infrared image is input for testing to obtain the pedestrian target detection result, and false positive samples in pedestrian target detection are filtered using the pedestrian activity area prediction; step four comprises: first, predicting the pedestrian activity area from k images, wherein the k images are input and the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map, the sigmoid function being defined as:
$$S(x) = \frac{1}{1 + e^{-x}}$$
the output is processed as follows:
$$D = S(D')$$
a threshold T is then taken to binarize the prediction, the processing being as follows:
$$D(x, y) = \begin{cases} 1, & D(x, y) \ge T \\ 0, & \text{otherwise} \end{cases}$$
the activity area predicted by the segmentation branch from the k images is used as prior information, and targets whose bounding-box center points are not in the predicted activity area are filtered out of the final detection result, thereby reducing false positive samples, the processing being as follows:
$$score(x, y) = D(x, y) \cdot score(x, y)$$
wherein score(x, y) is the confidence score of the target at the point (x, y);
the semantic segmentation branch of step two comprises: adopting EfficientDet-D0 as the basic detection network and constructing the semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network, the corresponding numbers of feature channels being (24, 40, 112, 320); P5 passes through the atrous spatial pyramid pooling (ASPP) module, is up-sampled by a factor of 2 and is concatenated with P4 along the channel dimension; after a convolutional block attention module (CBAM), the number of channels is reduced to 128 through a convolution layer, a BN layer and an activation layer, and the feature map O1 is obtained through a CBAM; the feature map O1 is up-sampled by a factor of 2 and concatenated with P3 along the channel dimension; after a CBAM, the number of channels is reduced to 64 through a convolution layer, and the feature map O2 is obtained through a CBAM; the feature map O2 is up-sampled by a factor of 2 and concatenated with P2 along the channel dimension; after a CBAM, the number of channels is reduced to 32 through a convolution layer, and the feature map O3 is obtained through a CBAM; O3 is reduced to 1 channel through a BN layer and an activation layer to obtain the output O4; the model loss is computed directly on O4; in the test stage, O4 is up-sampled by a factor of 4 to obtain a segmentation result at the original image size.
2. The near-infrared pedestrian monitoring method based on a multitasking EfficientDet of claim 1, wherein step one comprises:
a) the image training set used is a single-channel near-infrared image data set in a monitoring scene;
b) all images in each scene of the training data set are traversed, the center point coordinates of the pedestrian targets are obtained from the annotation data, a mask image of the pedestrian area is obtained from the center point coordinates of all pedestrian targets in the scene, and this operation is performed for each scene to obtain the pedestrian distribution area of each scene.
3. The near-infrared pedestrian monitoring method based on a multitasking EfficientDet of claim 1, wherein step three comprises:
a) the segmentation branch is embedded in the detection network and shares bottom-layer features with the target detection network;
b) the atrous spatial pyramid pooling (ASPP) module and the convolutional block attention module (CBAM) are used in the segmentation branch to improve the segmentation performance of the model, and different learning rates are used for the segmentation branch and the detection branch to reduce overfitting;
c) Dice Loss is used as the loss function of the segmentation branch, Dice Loss being defined as follows:
$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$
wherein X is the predicted mask, Y is the annotated mask of the pedestrian region, |X ∩ Y| is the intersection of X and Y, and |X| + |Y| is the sum of the numbers of elements in X and Y;
d) Focal Loss is used as the classification loss function, Focal Loss being defined as follows:
$$L_{FL} = \begin{cases} -\alpha\,(1 - y')^{\gamma}\,\log y', & y = 1 \\ -(1 - \alpha)\,y'^{\gamma}\,\log(1 - y'), & y = 0 \end{cases}$$
wherein y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weight of hard samples and is set to 2.0, and α is a hyperparameter that adjusts the ratio of positive to negative samples and is set to 0.05;
e) SmoothL1 Loss is used as the loss function for bounding-box regression:
$$L_{SmoothL1} = \begin{cases} 0.5\,(y - y')^{2}, & |y - y'| < 1 \\ |y - y'| - 0.5, & \text{otherwise} \end{cases}$$
wherein y is the label and y' is the model prediction;
f) training parameter settings:
learning rate: 3e-4;
learning rate decay schedule: cosine decay;
batch size: 16;
input image size: 768 x 512;
optimizer: AdamW, for rapid convergence of the network;
because there is less data for pedestrian region segmentation than for the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the base learning rate of the network, and the ratio of classification loss, regression loss and segmentation loss is set to 10:10:1.
4. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the near-infrared pedestrian monitoring method based on a multitasking EfficientDet as claimed in any one of claims 1 to 3.
5. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the near-infrared pedestrian monitoring method based on a multitasking EfficientDet of any one of claims 1 to 3.
6. An information data processing terminal, characterized in that the information data processing terminal is used for implementing the near infrared pedestrian monitoring method based on the multitasking EfficientDet according to any one of claims 1 to 3.
7. A near infrared pedestrian monitoring system based on a multitasking EfficientDet implementing the near infrared pedestrian monitoring method based on a multitasking EfficientDet of any one of claims 1 to 3, characterized in that the near infrared pedestrian monitoring system based on a multitasking EfficientDet comprises:
the pedestrian activity area distribution acquisition module, used for obtaining the pedestrian activity area distribution in different scenes from a near-infrared image pedestrian detection training data set;
the segmentation and detection module, used for adopting EfficientDet-D0 as the basic detection network, constructing a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance through the atrous spatial pyramid pooling (ASPP) module and the attention module, and sharing bottom-layer features between target detection and semantic segmentation;
and the pedestrian detection result and pedestrian activity area result output module is used for carrying out post-processing on the pedestrian target detection result through the predicted pedestrian activity area based on the multitasking EfficientDet-D0 model to obtain a final pedestrian detection result and a pedestrian activity area result.
CN202011427301.4A 2020-12-09 2020-12-09 Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet Active CN112633086B (en)

Publications (2)

Publication Number Publication Date
CN112633086A (en) 2021-04-09
CN112633086B (en) 2024-01-26

Family

ID=75308801

Country Status (1)

Country Link
CN (1) CN112633086B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant