CN112733749A - Real-time pedestrian detection method integrating attention mechanism - Google Patents

Real-time pedestrian detection method integrating attention mechanism

Info

Publication number
CN112733749A
CN112733749A CN202110049426.6A CN202110049426A
Authority
CN
China
Prior art keywords
network
pedestrian
detection
attention
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110049426.6A
Other languages
Chinese (zh)
Other versions
CN112733749B (en)
Inventor
冯宇平
管玉宇
刘宁
杨旭睿
赵文仓
王明甲
刘雪峰
秦浩华
王兆辉
赵德钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Science and Technology
Original Assignee
Qingdao University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Science and Technology filed Critical Qingdao University of Science and Technology
Priority to CN202110049426.6A priority Critical patent/CN112733749B/en
Publication of CN112733749A publication Critical patent/CN112733749A/en
Application granted granted Critical
Publication of CN112733749B publication Critical patent/CN112733749B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention relates to a real-time pedestrian detection method integrating an attention mechanism, and belongs to the field of target detection. To improve the accuracy of the Tiny YOLOV3 target detection algorithm on the pedestrian detection task, the invention studies and improves the algorithm. The method first deepens the feature extraction network of Tiny YOLOV3 to strengthen its feature extraction capability; a channel-domain attention mechanism is then added to each of the two detection scales of the prediction network, assigning different weights to different channels of the feature map and guiding the network to focus on the visible regions of pedestrians; finally, the activation function and the loss function are improved, and the initial candidate boxes are reselected with the K-means clustering algorithm. The invention improves pedestrian detection accuracy while keeping a high detection speed, meeting real-time operation requirements.

Description

Real-time pedestrian detection method integrating attention mechanism
Technical Field
The invention relates to a real-time pedestrian detection method integrating an attention mechanism, and belongs to the technical field of target detection.
Background
With the development of science and technology, pedestrian detection is applied ever more widely in daily life and industrial production. Because images containing pedestrians have complex backgrounds and are affected by pose, clothing and occlusion, pedestrian detection is considerably more difficult; moreover, a practical pedestrian detection system requires both high accuracy and strong real-time performance, so research on pedestrian detection has very important practical significance.
Conventional pedestrian detection algorithms typically rely on hand-crafted feature extraction and classification. For example, the journal paper "A pedestrian detection model with local features" trains Adaboost classifiers with Haar features for different body parts and detects pedestrians with a support vector machine. The journal paper "Pedestrian detection by improving features and GPU" uses SILTP texture features and histograms of oriented gradients to extract features from different body parts and accelerates detection on a GPU (Graphics Processing Unit). With the growth of computing power, target detection algorithms based on convolutional neural networks have been proposed in succession. Commonly used methods today include the two-stage R-CNN family and the one-stage SSD and YOLO families. Two-stage detection algorithms generate candidate regions with selective search or a region proposal network and then predict the class and position of the target, which improves detection accuracy; however, because candidate-region generation and the detection network run separately, real-time target detection is difficult to achieve. One-stage detection algorithms regress the class and position of the target directly and detect quickly. Many researchers are currently working on pedestrian detection. For example, the paper "Learning efficient single-stage pedestrian detectors by asymptotic localization fitting" proposes a progressive localization fitting module that localizes pedestrians step by step across multiple scales and improves detection accuracy. The paper "Dense connection and spatial pyramid pooling based YOLO for object detection" improves the feature extraction network of YOLOV2, proposing a YOLO target detection algorithm based on dense connections and a spatial pyramid pooling structure that balances detection accuracy and speed. The paper "Pedestrian object detection with fusion of visual attention mechanism and semantic fusion" uses a visual attention mechanism and Laplacian pyramid fusion to obtain pedestrian saliency maps, achieving 92.78% detection accuracy on the INRIA dataset. These methods effectively improve pedestrian detection, but are not well suited to practical scenes: scenes with strict real-time requirements demand both high detection accuracy and high detection speed.
The YOLOV3 algorithm effectively improves detection accuracy through the structural design of the Feature Pyramid Network (FPN) and residual networks. However, its network structure is complex and the model is large, making real-time operation on embedded devices difficult. Tiny YOLOV3 is a simplified version of YOLOV3: its network structure is simple, its model is small and detection is fast, but detection accuracy is low. Moreover, Tiny YOLOV3 uses the FPN design to fuse the feature maps of its two detection scales, but this fusion merely concatenates the features of different channels and cannot reflect the relative importance of the feature-map channels. Addressing these problems, the invention optimizes and improves the Tiny YOLOV3 algorithm. First, the backbone network is deepened with 3×3 convolutions to strengthen the feature extraction capability of the network; then 1×1 convolutions reduce the dimensionality of the feature maps, cutting the number of model parameters and enabling cross-channel information interaction; next, a lightweight channel-domain attention mechanism is introduced into the two prediction branches, fusing information of different scales, assigning different weights to different channels of the feature map and guiding the network to focus on pedestrian regions; finally, the bounding-box regression loss function and the activation function are optimized, and the initial candidate boxes are reselected with the K-means clustering algorithm. Experimental results show that the improved Tiny YOLOV3 achieves higher pedestrian detection accuracy with fast detection, fewer model parameters and a small model size, making it suitable for real-time and embedded applications.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a real-time pedestrian detection method integrating an attention mechanism that optimizes and improves the Tiny YOLOV3 algorithm: first, the backbone network is deepened with 3×3 convolutions to strengthen the feature extraction capability of the network; then 1×1 convolutions reduce the dimensionality of the feature maps, cutting the number of model parameters and enabling cross-channel information interaction; next, a lightweight channel-domain attention mechanism is introduced into the two prediction branches, fusing information of different scales, assigning different weights to different channels of the feature map and guiding the network to focus on pedestrian regions; finally, the bounding-box regression loss function and the activation function are optimized, and the initial candidate boxes are reselected with the K-means clustering algorithm. Experimental results show that the improved Tiny YOLOV3 achieves higher pedestrian detection accuracy with fast detection, fewer model parameters and a small model size, making it suitable for real-time and embedded applications.
The invention discloses a real-time pedestrian detection method integrating an attention mechanism, which comprises the following steps:
S1: selecting the Tiny YOLOV3 algorithm, comprising the following steps:
S11: first, dividing the image into S×S grids, each grid predicting B bounding boxes, a confidence score and C class probabilities, wherein the confidence formula is:
C = P(object) × IOU_pred^truth (1)
wherein P(object) is the probability that an object exists in the grid and IOU_pred^truth is the intersection-over-union of the prediction box and the ground-truth box;
S12: the feature extraction network of Tiny YOLOV3 consists of 7 convolutional layers and 6 max-pooling layers; the multi-scale detection of YOLOV3 is simplified, and prediction outputs are produced on feature maps at the two detection scales 26×26 and 13×13;
S2: deepening the feature extraction network, comprising the following steps:
S21: first, expanding the number of channels to 2 times that of the previous layer with a 3×3 convolution to extract high-dimensional features;
S22: then compressing the number of channels to half of the original with a 1×1 convolution, reducing the channel dimension, cutting the amount of computation and enabling cross-channel information interaction;
S23: finally expanding the channels with a 3×3 convolution to restore the original channel dimension;
S3: the prediction network fused with channel attention: an attention mechanism is introduced into the prediction network of Tiny YOLOV3 to fuse information of different scales, assign different weights to the feature channels, guide the network to focus on pedestrian features and reduce the influence of interfering information, thereby improving detection accuracy, comprising the following steps:
S31: introducing ECA-Net, a lightweight channel-domain attention mechanism without dimensionality reduction; the input feature map X ∈ R^(H×W×C) has C feature channels;
S32: compressing the global spatial information through global average pooling, i.e., compressing over the spatial dimensions H×W to obtain 1×1 weight information, wherein the global average pooling formula is:
y = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_{ij} (2)
wherein y is the weight obtained after compression and H×W is the spatial dimension;
S33: to let the network automatically learn the attention weights of different channels, completing cross-channel information interaction with a one-dimensional convolution, the kernel size k of which is adaptively determined as a function of the channel dimension C:
k = ψ(C) = |log2(C)/γ + b/γ|_odd (3)
wherein |t|_odd denotes the odd number nearest to t, and γ and b are preset constants;
S34: applying the one-dimensional convolution with the obtained kernel size and using a Sigmoid to obtain the weight of each channel:
ω_c = σ(C1D_k(y)) (4)
wherein σ is the Sigmoid activation function and ω_c is the generated channel attention weight with dimensions 1×1×C;
S35: then weighting the input feature map by the attention weights to express the importance of each feature-map channel:
X_c = ω_c ⊗ X (5)
wherein ⊗ denotes element-by-element multiplication and X_c denotes the output of the attention mechanism;
S4: improving the loss function and the activation function, comprising the following steps:
S41: during training, the loss function of Tiny YOLOV3 divides into three parts, namely bounding-box regression loss, confidence loss and classification loss, and the total loss can be expressed by equation (6):
Loss = Σ_{i=1}^{2} (Loss_box^i + Loss_conf^i + Loss_cls^i) (6)
wherein i indexes the detection scale;
S42: with the generalized intersection-over-union GIOU as the regression loss, IOU and GIOU being defined as follows:
IOU = |B ∩ B_gt| / |B ∪ B_gt| (7)
GIOU = IOU - |C \ (B ∪ B_gt)| / |C| (8)
wherein B denotes the prediction box, B_gt the ground-truth box, and C the minimum enclosing region containing both the ground-truth box and the prediction box;
S43: the activation function is an important unit of a convolutional neural network, introducing nonlinear factors so that the model is no longer a simple linear mapping and the network learns better; the improved feature extraction network adopts the Mish activation function.
Preferably, in step S1, the YOLO series algorithms are one-stage target detection algorithms based on convolutional neural networks, and Tiny YOLOV3 is a simplified version of YOLOV3.
Preferably, in step S2, the feature extraction network of Tiny YOLOV3 is shallow, deep features are difficult to extract, and accuracy in pedestrian target detection is low; to avoid excessive computation, drawing on the idea of densely connected networks, a convolutional layer with kernel size 1×1 is introduced before each added 3×3 convolutional layer, reducing the channel dimension so as to reduce the computation of the network.
Preferably, in step S3, in actual pedestrian detection scenes, interference from background information and the presence of occlusion affect the network's extraction of pedestrian features and hence pedestrian detection accuracy; the prediction network of Tiny YOLOV3 fuses feature maps of two scales, but this fusion merely concatenates the features along the channel dimension and cannot reflect the importance of pedestrian features on certain channels.
Preferably, in step S32, the convolutional neural network can only learn the local receptive field, and cannot utilize the context information outside the region.
Preferably, in step S12, the feature maps output at the two detection scales have sizes 13×13 and 26×26 respectively; that is, the input image is divided into 13×13 and 26×26 grids that detect pedestrians at long range and short range respectively, with each grid cell's outputs arranged along the channels of the feature map.
Preferably, in step S12, each grid cell is preset with 3 preselected boxes, which are continuously adjusted during training so that the optimal preselected box is selected as the output result; the different channels represent the output parameters of each grid cell; taking the 13×13 feature map as an example, the parameters comprise the center coordinates (bx, by) of the prediction box, its width and height (bw, bh), its confidence score p0 and the prediction score s for the pedestrian; each grid cell contains 3 prediction boxes and each box carries these 6 parameters, so the channel dimension of the output feature maps is 18.
Preferably, in step S12, an ECA attention module is added to the prediction branch that outputs the 13×13 feature map; the feature map passing through the attention module is upsampled and concatenated with the 26×26 feature map to output a 384-channel feature map, whose weights are then redistributed by another ECA attention module, so that the two final output layers pay more attention to pedestrian information, effectively reducing the influence of interfering information and occlusion.
The beneficial effects of the invention are as follows: the real-time pedestrian detection method integrating an attention mechanism studies and improves the Tiny YOLOV3 target detection algorithm to raise its accuracy on the pedestrian detection task. First, the feature extraction network of Tiny YOLOV3 is deepened to strengthen feature extraction; then a channel-domain attention mechanism is added to each of the two detection scales of the prediction network, assigning different weights to different channels of the feature map and guiding the network to focus on the visible regions of pedestrians; finally, the activation function and the loss function are improved and the initial candidate boxes are reselected with the K-means clustering algorithm. Experimental results show that the accuracy of the improved Tiny YOLOV3 algorithm reaches 77% on the VOC2007 pedestrian subset, 8.5% higher than Tiny YOLOV3, and 92.7% on the INRIA dataset, an improvement of 2.5%, with running speeds of 92.6 and 31.2 frames per second respectively. The invention improves pedestrian detection accuracy while keeping a high detection speed, meeting real-time operation requirements.
Drawings
FIG. 1 shows the structure of the Tiny YOLOV3 model.
FIG. 2 shows the structure of the improved Tiny YOLOV3 model.
FIG. 3 shows the structure of the ECA module.
FIG. 4 shows the structure of the prediction layer.
FIG. 5 plots the LeakyRelu and Mish activation functions.
FIGS. 6(a)-6(b) plot the AP curves on the different datasets.
FIGS. 7(a)-7(c) show detection results of Tiny YOLOV3.
FIGS. 8(a)-8(c) show detection results of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
the optimization of the Tiny YOLOV3 algorithm is improved. Firstly, deepening a backbone network by adopting a 3 multiplied by 3 convolution, and enhancing the feature extraction capability of the network; then, carrying out dimension reduction on the feature map by adopting 1 × 1 convolution, reducing the model parameter quantity, and realizing cross-channel information interaction; then, introducing a lightweight channel domain attention mechanism into the two prediction networks, fusing information of different scales by using the attention mechanism, giving different weights to different channels of the characteristic diagram, and guiding the networks to pay attention to pedestrian areas; and finally, optimizing a regression loss function and an activation function of the bounding box, and reselecting the initial candidate box by adopting a K-means clustering algorithm. Experimental results show that the improved Tiny Yoloov 3 has higher pedestrian detection precision, higher detection speed, less model parameters, small volume and suitability for real-time and embedded application.
S1: Selecting the Tiny YOLOV3 algorithm:
the YOLO series of algorithms are one-stage target detection algorithms based on convolutional neural networks. The algorithm divides an image into S multiplied by S grids, each grid predicts B bounding boxes, confidence coefficients and C class probabilities, and the confidence coefficient formula is as follows:
Figure BDA0002898457710000061
wherein P (object) is the existence probability of the object in the grid,
Figure BDA0002898457710000062
the intersection ratio of the prediction frame and the real frame is obtained.
Tiny YOLOV3 is a simplified version of YOLOV3. Compared with the complex network structure of YOLOV3, Tiny YOLOV3 reduces the feature extraction network to 7 convolutional layers and 6 max-pooling (Maxpool) layers, shrinking the model, and simplifies the multi-scale detection of YOLOV3, predicting on feature maps at the two detection scales 26×26 and 13×13; the network structure is shown in fig. 1.
S2: Deepening the feature extraction network:
The feature extraction network of Tiny YOLOV3 is shallow, deep features are difficult to extract, and accuracy in pedestrian target detection is low. The invention therefore deepens the feature extraction network, adding 4 convolutional layers with kernel size 3×3 to the original network to strengthen feature extraction and improve detection accuracy. Although adding convolutional layers can improve pedestrian detection accuracy, the number of model parameters grows sharply as layers stack up, greatly increasing computation and memory usage.
To avoid excessive computation, the invention draws on the idea of densely connected networks and introduces a convolutional layer with kernel size 1×1 before each added 3×3 convolutional layer, reducing the channel dimension and thereby the network's computation. Specifically, a 3×3 convolution first expands the number of channels to 2 times that of the previous layer to extract high-dimensional features; a 1×1 convolution then compresses the number of channels to half, reducing the channel dimension, cutting computation and enabling cross-channel information interaction; finally a 3×3 convolution expands the channels again to restore the original channel dimension. The improved model structure is shown in fig. 2, where the left dashed box is the improved feature extraction network.
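A minimal PyTorch sketch of one such expand-compress-restore unit follows; the Conv-BN-LeakyRelu ordering, bias settings and the helper name deepen_block are illustrative assumptions, since the patent fixes only the kernel sizes and channel ratios (and later replaces the activation with Mish, see S4):

```python
import torch.nn as nn

def deepen_block(c_in):
    """One added unit: 3x3 conv doubles the channels, 1x1 conv halves them
    for cross-channel interaction, 3x3 conv expands them again."""
    c_mid = 2 * c_in
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 3, padding=1, bias=False),  # expand to 2x channels
        nn.BatchNorm2d(c_mid),
        nn.LeakyReLU(0.1),
        nn.Conv2d(c_mid, c_in, 1, bias=False),             # 1x1 compress to half
        nn.BatchNorm2d(c_in),
        nn.LeakyReLU(0.1),
        nn.Conv2d(c_in, c_mid, 3, padding=1, bias=False),  # restore expanded dim
        nn.BatchNorm2d(c_mid),
        nn.LeakyReLU(0.1),
    )
```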
S3: Prediction network fused with channel attention:
In actual pedestrian detection scenes, interference from background information and the presence of occlusion affect the network's extraction of pedestrian features and hence pedestrian detection accuracy. The prediction network of Tiny YOLOV3 fuses feature maps of two scales, but this fusion merely concatenates (concat) the features along the channel dimension and cannot reflect the importance of pedestrian features on certain channels. The invention therefore introduces an attention mechanism into the prediction network of Tiny YOLOV3, using it to fuse information of different scales, assign different weights to the feature channels, guide the network to focus on pedestrian features and reduce the influence of interfering information, thereby improving detection accuracy; the right dashed box in fig. 2 is the improved prediction network. To let the network automatically learn the weights of the feature channels, the invention introduces the Efficient Channel Attention network (ECA-Net), a lightweight channel-domain attention mechanism without dimensionality reduction, as shown in FIG. 3.
In FIG. 3, the input feature map X ∈ R^(H×W×C) has C feature channels. In general, a convolutional neural network can only learn local receptive fields and cannot utilize context information outside the region. Therefore, the global spatial information is compressed by global average pooling, i.e., compressed over the spatial dimensions H×W to obtain 1×1 weight information; the global average pooling formula is:
y = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_{ij} (2)
where y is the weight obtained after compression and H×W is the spatial dimension.
To let the network automatically learn the attention weights of different channels, cross-channel information interaction is completed with a one-dimensional convolution. The kernel size k of the one-dimensional convolution is adaptively determined as a function of the channel dimension C:
k = ψ(C) = |log2(C)/γ + b/γ|_odd (3)
where |t|_odd denotes the odd number nearest to t, and γ and b are preset constants.
the resulting convolution kernel is used for one-dimensional convolution and Sigmoid is used to obtain the weight of each channel. The formula is as follows:
ωc=σ(C1Dk(y)) (4)
where σ is the Sigmoid activation function, ωcIs the generated channel attention weight with dimensions of 1 × 1 × C. Then, the attention weight and the input feature map are weighted to realize the importance expression of the feature map channel, and the weighting formula is as follows:
Figure BDA0002898457710000073
wherein the content of the first and second substances,
Figure BDA0002898457710000074
denotes element-by-element multiplication, XcIndicating the output result by the attention mechanism.
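For reference, a minimal PyTorch sketch of the ECA module of equations (2)-(5) might look as follows; the constants gamma=2 and b=1 follow the published ECA-Net paper and are assumptions here, as the patent leaves them unspecified:

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention sketch following Eqs. (2)-(5)."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                      # Eq. (3): nearest odd size
        self.pool = nn.AdaptiveAvgPool2d(1)            # Eq. (2): global avg pooling
        self.conv = nn.Conv1d(1, 1, kernel_size=k,
                              padding=k // 2, bias=False)  # 1-D conv across channels
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                              # x: (N, C, H, W)
        y = self.pool(x)                               # (N, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2)) # (N, 1, C)
        y = y.transpose(-1, -2).unsqueeze(-1)          # back to (N, C, 1, 1)
        w = self.sigmoid(y)                            # Eq. (4): channel weights
        return x * w                                   # Eq. (5): channel-wise weighting
```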
As shown in fig. 4, the feature maps output at the two detection scales have sizes 13×13 and 26×26 respectively; that is, the input image is divided into 13×13 and 26×26 grids that detect pedestrians at long range and short range respectively, with each grid cell's outputs arranged along the channels of the feature map. Each grid cell is preset with 3 preselected boxes, which are continuously adjusted during training so that the best one is selected as the output result. The different channels represent the output parameters of each grid cell; taking the 13×13 feature map as an example, the parameters of each box comprise the center coordinates (b_x, b_y) of the prediction box, its width and height (b_w, b_h), its confidence score p_0 and the prediction score s for the pedestrian. Each grid cell contains 3 prediction boxes and each box carries these 6 parameters, so the channel dimension of the output feature maps is 18. The invention combines the ECA attention module with the prediction network of Tiny YOLOV3, adding it to each of the two detection scales. An ECA attention module is added to the prediction branch that outputs the 13×13 feature map; the feature map passing through the attention module is upsampled and concatenated with the 26×26 feature map to output a 384-channel feature map, whose weights are then redistributed by another ECA attention module, so that the two final output layers pay more attention to pedestrian information, effectively reducing the influence of interfering information and occlusion.
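Under the same caveat, the fused prediction path could be sketched as below, reusing the ECA class from the sketch above; the branch channel counts (256 each) and the 1×1 reduction to 128 are assumptions chosen only so that the concatenation yields the 384 channels stated in the text:

```python
import torch
import torch.nn as nn

class AttentionFusedHead(nn.Module):
    """Sketch of the attention-fused prediction path: ECA on the 13x13 branch,
    1x1 reduction and upsampling, concatenation with the 26x26 branch into
    384 channels, then a second ECA before the output layers."""
    def __init__(self):
        super().__init__()
        self.eca13 = ECA(256)                 # attention on the coarse scale
        self.reduce = nn.Conv2d(256, 128, 1)  # assumed 1x1 reduce before upsample
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.eca26 = ECA(384)                 # attention after fusion (128 + 256)
        self.head13 = nn.Conv2d(256, 18, 1)   # 3 boxes x 6 params = 18 channels
        self.head26 = nn.Conv2d(384, 18, 1)

    def forward(self, f13, f26):              # f13: (N,256,13,13), f26: (N,256,26,26)
        a13 = self.eca13(f13)
        fused = torch.cat([self.up(self.reduce(a13)), f26], dim=1)  # (N,384,26,26)
        return self.head13(a13), self.head26(self.eca26(fused))
```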
S4: Improving the loss function and the activation function:
During training, the loss function of Tiny YOLOV3 divides into three parts, namely bounding-box regression loss, confidence loss and classification loss; the total loss can be expressed by equation (6):
Loss = Σ_{i=1}^{2} (Loss_box^i + Loss_conf^i + Loss_cls^i) (6)
where i indexes the detection scale.
Pedestrian localization usually depends on accurate bounding-box regression; to improve localization and detection accuracy, the bounding-box regression loss is optimized. The invention adopts the Generalized Intersection Over Union (GIOU) as the regression loss, for two reasons: first, the Intersection Over Union (IOU) cannot be evaluated when the ground-truth box and the prediction box do not intersect; second, the IOU cannot accurately reflect how the two boxes overlap. IOU and GIOU are defined as follows:
IOU = |B ∩ B_gt| / |B ∪ B_gt| (7)
GIOU = IOU - |C \ (B ∪ B_gt)| / |C| (8)
where B denotes the prediction box, B_gt the ground-truth box, and C the minimum enclosing region containing both the ground-truth box and the prediction box.
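A plain-Python sketch of equations (7)-(8) follows; corner-format boxes, the helper name giou and the epsilon guard are illustrative, not taken from the patent:

```python
def giou(box_a, box_b, eps=1e-9):
    """GIOU per Eqs. (7)-(8); boxes are (x1, y1, x2, y2) corner tuples."""
    # intersection area
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / (union + eps)                     # Eq. (7)
    # smallest enclosing region C
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    c_area = cw * ch
    return iou - (c_area - union) / (c_area + eps)  # Eq. (8)
```

In training, the regression loss would then be taken as 1 - GIOU, so that perfectly overlapping boxes incur zero loss.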
The activation function is an important unit of a convolutional neural network: it introduces nonlinear factors so that the model is no longer a simple linear mapping and the network can learn better; activation functions have evolved quickly as network models have matured. The feature extraction network of Tiny YOLOV3 adopts the LeakyRelu activation function; the invention replaces it with the Mish activation function, whose curve is shown in fig. 5. The Mish activation function is smoother, helping the network learn pedestrian information better, and it lets a small negative gradient flow through so that information is not cut off, yielding better accuracy and generalization.
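For reference, Mish has the closed form Mish(x) = x · tanh(softplus(x)); a one-line PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def mish(x):
    """Mish(x) = x * tanh(softplus(x)): smooth everywhere and lets a small
    negative gradient through, unlike LeakyRelu's fixed negative slope."""
    return x * torch.tanh(F.softplus(x))
```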
Example 2:
the experimental environment configuration of the present invention is shown in table 1. The experiment is written in python 3.6 language, and the deep learning frame is Pytrch 1.4. Training batch set to 300, mini-batch set to 16, initial learning rate of 0.01, weight attenuation coefficient of 0.0005, momentum coefficient of 0.9. The invention adopts a multi-scale training mode, and images of each batch are randomly selected in (320, 352, 384, 416, 448, 480, 512, 544, 576, 608 and 640) so as to improve the generalization capability of the model.
Table 1 experimental environment configuration
The experimental datasets are VOC2007 and INRIA. The VOC2007 dataset contains 20 target classes, 9963 images in total. The invention extracts all 4015 pedestrian images from the VOC2007 dataset; their backgrounds are complex, pedestrian poses vary widely and occlusion of different degrees exists, which strengthens the generalization ability of the trained model. This dataset is split 8:2 into training and test sets. Most pedestrians in the INRIA dataset are standing and close to real road scenes; its training and test sets are already divided. The numbers of pedestrian images in the datasets are shown in Table 2.
TABLE 2 Number of pedestrian images in each dataset
Results and analysis of the experiments
To evaluate the effectiveness of the improved algorithm, YOLOV3, Tiny YOLOV3 and the present invention were each trained and tested on the VOC2007 and INRIA datasets. Before training, to make the preselected boxes fit pedestrians better, the K-means clustering algorithm reselects the initial preselected boxes, yielding 6 box sizes: (38,97), (81,202), (126,386) correspond to the 13×13 prediction layer, and (203,271), (251,473), (448,521) correspond to the 26×26 prediction layer.
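A NumPy sketch of such anchor clustering, assuming the common YOLO variant that uses 1 - IoU of co-centered boxes as the distance (the patent does not state its exact distance measure):

```python
import numpy as np

def kmeans_anchors(wh, k=6, iters=100, seed=0):
    """Cluster ground-truth box (width, height) pairs into k anchor sizes.
    wh: array of shape (num_boxes, 2)."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # IoU of co-centered boxes as similarity; assign to most similar center
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0]) *
                 np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = (wh[:, None, 0] * wh[:, None, 1] +
                 centers[None, :, 0] * centers[None, :, 1] - inter)
        assign = np.argmax(inter / union, axis=1)
        centers = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                            else centers[i] for i in range(k)])
    return centers[np.argsort(centers.prod(axis=1))]  # sorted by box area
```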
The evaluation metrics comprise Precision and Recall; the comprehensive metric Average Precision (AP) measures the accuracy of the detection algorithm, and Frames Per Second (FPS) measures the detection speed. To obtain the best trained model, the test set is evaluated after each training batch completes and the model with the highest AP is saved. Figs. 6(a) and 6(b) show the accuracy curves of the invention trained on the VOC2007 and INRIA datasets, respectively.
Table 3 shows the model sizes and parameter counts of the different algorithms. The model size of the invention is 39.8 MB, only 6.6 MB larger than Tiny YOLOV3 and far smaller than YOLOV3, so the invention retains a clear advantage in model size and parameter count.
TABLE 3 model sizes and parameter quantities for the algorithms
Table 4 shows the training and test results of each algorithm on the two datasets; the precision and recall of the invention are improved over Tiny YOLOV3. Pedestrian detection accuracy on the VOC dataset is 77%, 8.5% higher than Tiny YOLOV3; although this does not reach the accuracy of YOLOV3, the detection speed reaches 92.6 frames per second, 77.1% faster than YOLOV3. Accuracy on the INRIA dataset is 92.7%, 2.5% higher than Tiny YOLOV3 and only 0.2% below the YOLOV3 algorithm, comparable to the accuracy reported in the literature but superior in detection speed; the detection speed of the invention reaches 31.2 frames per second, meeting real-time detection requirements.
TABLE 4 comparison of the results of the experiments for each algorithm
FIGS. 7 and 8 compare the detection results of Tiny YOLOV3 and of the present invention. Two pedestrian targets are missed in FIG. 7(a), while no pedestrian is missed in FIG. 8(a); FIGS. 7(b) and 8(b) show pedestrian detection in a crowded scene, where the missed detections of Tiny YOLOV3 are more serious and the invention improves markedly; FIG. 7(c) misses small pedestrian targets on the left, with no missed detection in FIG. 8(c). The invention achieves a better pedestrian detection effect, performing well in crowded scenes and on small targets, which shows good generalization ability and accurate pedestrian detection.
On the basis of Tiny YOLOV3, the invention proposes a pedestrian detection algorithm fused with an attention mechanism: deepening the network improves the extraction of pedestrian features, while 1×1 convolutions reduce the parameter count and model size, preserving detection speed. Meanwhile, a lightweight channel attention mechanism without dimensionality reduction is introduced into the prediction network to redistribute the weights of different channels, making the model attend more to pedestrian information. In addition, optimizing the bounding-box regression loss function and the activation function further improves detection accuracy. Detection accuracies of 77% and 92.7% are obtained on the VOC2007 pedestrian subset and the INRIA dataset, with precision and recall improved over Tiny YOLOV3, and detection speeds reach 92.6 and 31.2 frames per second respectively, showing that the model is robust across datasets and meets real-time detection requirements. The invention keeps a speed advantage while maintaining high detection accuracy; however, when pedestrian poses vary greatly or occlusion is severe, its accuracy still falls short of complex large-scale networks, and future work will further improve detection accuracy while preserving real-time detection.
The invention can be widely applied in target detection scenarios.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A real-time pedestrian detection method fused with an attention mechanism is characterized by comprising the following steps:
S1: selecting the Tiny YOLOV3 algorithm, comprising the following steps:
S11: first, dividing the image into S×S grids, each grid predicting B bounding boxes, a confidence score and C class probabilities, wherein the confidence formula is:
C = P(object) × IOU_pred^truth (1)
wherein P(object) is the probability that an object exists in the grid and IOU_pred^truth is the intersection-over-union of the prediction box and the ground-truth box;
S12: the feature extraction network of Tiny YOLOV3 consists of 7 convolutional layers and 6 max-pooling layers; the multi-scale detection of YOLOV3 is simplified, and prediction outputs are produced on feature maps at the two detection scales 26×26 and 13×13;
S2: deepening the feature extraction network, comprising the following steps:
S21: first, expanding the number of channels to 2 times that of the previous layer with a 3×3 convolution to extract high-dimensional features;
S22: then compressing the number of channels to half of the original with a 1×1 convolution, reducing the channel dimension, cutting the amount of computation and enabling cross-channel information interaction;
S23: finally expanding the channels with a 3×3 convolution to restore the original channel dimension;
S3: the prediction network fused with channel attention: an attention mechanism is introduced into the prediction network of Tiny YOLOV3 to fuse information of different scales, assign different weights to the feature channels, guide the network to focus on pedestrian features and reduce the influence of interfering information, thereby improving detection accuracy, comprising the following steps:
S31: introducing ECA-Net, a lightweight channel-domain attention mechanism without dimensionality reduction; the input feature map X ∈ R^(H×W×C) has C feature channels;
S32: compressing the global spatial information through global average pooling, i.e., compressing over the spatial dimensions H×W to obtain 1×1 weight information, wherein the global average pooling formula is:
y = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_{ij} (2)
wherein y is the weight obtained after compression and H×W is the spatial dimension;
S33: to let the network automatically learn the attention weights of different channels, completing cross-channel information interaction with a one-dimensional convolution, the kernel size k of which is adaptively determined as a function of the channel dimension C:
k = ψ(C) = |log2(C)/γ + b/γ|_odd (3)
wherein |t|_odd denotes the odd number nearest to t, and γ and b are preset constants;
S34: applying the one-dimensional convolution with the obtained kernel size and using a Sigmoid to obtain the weight of each channel:
ω_c = σ(C1D_k(y)) (4)
wherein σ is the Sigmoid activation function and ω_c is the generated channel attention weight with dimensions 1×1×C;
S35: then weighting the input feature map by the attention weights to express the importance of each feature-map channel:
X_c = ω_c ⊗ X (5)
wherein ⊗ denotes element-by-element multiplication and X_c denotes the output of the attention mechanism;
S4: improving the loss function and the activation function, comprising the following steps:
S41: during training, the loss function of Tiny YOLOV3 divides into three parts, namely bounding-box regression loss, confidence loss and classification loss, and the total loss can be expressed by equation (6):
Loss = Σ_{i=1}^{2} (Loss_box^i + Loss_conf^i + Loss_cls^i) (6)
wherein i indexes the detection scale;
S42: with the generalized intersection-over-union GIOU as the regression loss, IOU and GIOU being defined as follows:
IOU = |B ∩ B_gt| / |B ∪ B_gt| (7)
GIOU = IOU - |C \ (B ∪ B_gt)| / |C| (8)
wherein B denotes the prediction box, B_gt the ground-truth box, and C the minimum enclosing region containing both the ground-truth box and the prediction box;
S43: the activation function is an important unit of a convolutional neural network, introducing nonlinear factors so that the model is no longer a simple linear mapping and the network learns better; the improved feature extraction network adopts the Mish activation function.
2. The real-time pedestrian detection method fused with an attention mechanism according to claim 1, wherein in step S1, the YOLO series algorithms are one-stage target detection algorithms based on convolutional neural networks, and Tiny YOLOV3 is a simplified version of YOLOV3.
3. The real-time pedestrian detection method fused with an attention mechanism according to claim 1, wherein in step S2, the feature extraction network of Tiny YOLOV3 is shallow, deep features are difficult to extract, and accuracy in pedestrian target detection is low; to avoid excessive computation, drawing on the idea of densely connected networks, a convolutional layer with kernel size 1×1 is introduced before each added 3×3 convolutional layer, reducing the channel dimension so as to reduce the computation of the network.
4. The real-time pedestrian detection method fused with an attention mechanism according to claim 1, wherein in step S3, in actual pedestrian detection scenes, interference from background information and the presence of occlusion affect the network's extraction of pedestrian features and hence pedestrian detection accuracy; the prediction network of Tiny YOLOV3 fuses feature maps of two scales, but this fusion merely concatenates the features along the channel dimension and cannot reflect the importance of pedestrian features on certain channels.
5. The real-time pedestrian detection method fused with an attention mechanism according to claim 1, wherein in step S32, the convolutional neural network can only learn local receptive fields and cannot utilize context information outside the region.
6. The real-time pedestrian detection method fused with an attention mechanism according to claim 1, wherein in step S12, the feature maps output at the two detection scales have sizes 13×13 and 26×26 respectively, i.e., the input image is divided into 13×13 and 26×26 grids that detect pedestrians at long range and short range respectively, with each grid cell's outputs arranged along the channels of the feature map.
7. The real-time pedestrian detection method fused with an attention mechanism according to claim 6, wherein in step S12, each grid cell is preset with 3 preselected boxes, which are continuously adjusted during training so that the optimal preselected box is selected as the output result; the different channels represent the output parameters of each grid cell; taking the 13×13 feature map as an example, the parameters comprise the center coordinates (bx, by) of the prediction box, its width and height (bw, bh), its confidence score p0 and the prediction score s for the pedestrian; each grid cell contains 3 prediction boxes and each box carries these 6 parameters, so the channel dimension of the output feature maps is 18.
8. The real-time pedestrian detection method fused with an attention mechanism according to claim 7, wherein in step S12, an ECA attention module is added to the prediction branch that outputs the 13×13 feature map; the feature map passing through the attention module is upsampled and concatenated with the 26×26 feature map to output a 384-channel feature map, whose weights are then redistributed by another ECA attention module, so that the two final output layers pay more attention to pedestrian information, effectively reducing the influence of interfering information and occlusion.
CN202110049426.6A 2021-01-14 2021-01-14 Real-time pedestrian detection method integrating attention mechanism Active CN112733749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110049426.6A CN112733749B (en) 2021-01-14 2021-01-14 Real-time pedestrian detection method integrating attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110049426.6A CN112733749B (en) 2021-01-14 2021-01-14 Real-time pedestrian detection method integrating attention mechanism

Publications (2)

Publication Number Publication Date
CN112733749A true CN112733749A (en) 2021-04-30
CN112733749B CN112733749B (en) 2022-04-12

Family

ID=75593101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110049426.6A Active CN112733749B (en) 2021-01-14 2021-01-14 Real-time pedestrian detection method integrating attention mechanism

Country Status (1)

Country Link
CN (1) CN112733749B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282088A (en) * 2021-05-21 2021-08-20 潍柴动力股份有限公司 Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle
CN113327243A (en) * 2021-06-24 2021-08-31 浙江理工大学 PAD light guide plate defect visualization detection method based on AYOLOv3-Tiny new framework
CN113496260A (en) * 2021-07-06 2021-10-12 浙江大学 Grain depot worker non-standard operation detection method based on improved YOLOv3 algorithm
CN113516080A (en) * 2021-07-16 2021-10-19 上海高德威智能交通系统有限公司 Behavior detection method and device
CN113538347A (en) * 2021-06-29 2021-10-22 中国电子科技集团公司电子科学研究院 Image detection method and system based on efficient bidirectional path aggregation attention network
CN113537013A (en) * 2021-07-06 2021-10-22 哈尔滨理工大学 Multi-scale self-attention feature fusion pedestrian detection method
CN113705478A (en) * 2021-08-31 2021-11-26 中国林业科学研究院资源信息研究所 Improved YOLOv 5-based mangrove forest single tree target detection method
CN113989624A (en) * 2021-12-08 2022-01-28 北京环境特性研究所 Infrared low-slow small target detection method and device, computing equipment and storage medium
CN114067186A (en) * 2021-09-26 2022-02-18 北京建筑大学 Pedestrian detection method and device, electronic equipment and storage medium
CN114092820A (en) * 2022-01-20 2022-02-25 城云科技(中国)有限公司 Target detection method and moving target tracking method applying same
CN114373118A (en) * 2021-12-30 2022-04-19 华南理工大学 Underwater target detection method based on improved YOLOV4
CN114724012A (en) * 2022-06-10 2022-07-08 天津大学 Tropical unstable wave early warning method and device based on spatio-temporal cross-scale attention fusion
CN115063691A (en) * 2022-07-04 2022-09-16 西安邮电大学 Small target detection method based on feature enhancement under complex scene
CN115273154A (en) * 2022-09-26 2022-11-01 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium
CN115424230A (en) * 2022-09-23 2022-12-02 哈尔滨市科佳通用机电股份有限公司 Fault detection method for vehicle door pulley out-of-track, storage medium and equipment
CN115439765A (en) * 2022-09-17 2022-12-06 艾迪恩(山东)科技有限公司 Marine plastic garbage rotation detection method based on machine learning unmanned aerial vehicle visual angle
CN115908952A (en) * 2023-01-07 2023-04-04 石家庄铁道大学 High-speed rail tunnel fixture detection method based on improved YOLOv5 algorithm
CN117649633A (en) * 2024-01-30 2024-03-05 武汉纺织大学 Pavement pothole detection method for highway inspection
CN115063691B (en) * 2022-07-04 2024-04-12 西安邮电大学 Feature enhancement-based small target detection method in complex scene

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619309A (en) * 2019-09-19 2019-12-27 天津天地基业科技有限公司 Embedded platform face detection method based on octave convolution sum YOLOv3
CN111079584A (en) * 2019-12-03 2020-04-28 东华大学 Rapid vehicle detection method based on improved YOLOv3
CN111681240A (en) * 2020-07-07 2020-09-18 福州大学 Bridge surface crack detection method based on YOLO v3 and attention mechanism
CN111767882A (en) * 2020-07-06 2020-10-13 江南大学 Multi-mode pedestrian detection method based on improved YOLO model
CN112070713A (en) * 2020-07-03 2020-12-11 中山大学 Multi-scale target detection method introducing attention mechanism
CN112101434A (en) * 2020-09-04 2020-12-18 河南大学 Infrared image weak and small target detection method based on improved YOLO v3

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619309A (en) * 2019-09-19 2019-12-27 天津天地基业科技有限公司 Embedded platform face detection method based on octave convolution sum YOLOv3
CN111079584A (en) * 2019-12-03 2020-04-28 东华大学 Rapid vehicle detection method based on improved YOLOv3
CN112070713A (en) * 2020-07-03 2020-12-11 中山大学 Multi-scale target detection method introducing attention mechanism
CN111767882A (en) * 2020-07-06 2020-10-13 江南大学 Multi-mode pedestrian detection method based on improved YOLO model
CN111681240A (en) * 2020-07-07 2020-09-18 福州大学 Bridge surface crack detection method based on YOLO v3 and attention mechanism
CN112101434A (en) * 2020-09-04 2020-12-18 河南大学 Infrared image weak and small target detection method based on improved YOLO v3

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
QILONG WANG ET AL.: "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
XIAOLAN WANG ET AL.: "Data-Driven Based Tiny-YOLOv3 Method for Front Vehicle Detection Inducing SPP-Net", Special Section on Intelligent Logistics Based on Big Data *
周志锋 et al.: "Improved target detection based on the YOLO V3 framework", Electronic Measurement Technology *
成玉荣 et al.: "People counting method based on improved Tiny-YOLOv3", Science and Technology Innovation Herald *
王艺皓: "Mask-wearing detection algorithm based on improved YOLOv3 in complex scenes", Computer Engineering *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282088A (en) * 2021-05-21 2021-08-20 潍柴动力股份有限公司 Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle
CN113327243A (en) * 2021-06-24 2021-08-31 浙江理工大学 PAD light guide plate defect visualization detection method based on AYOLOv3-Tiny new framework
CN113327243B (en) * 2021-06-24 2024-01-23 浙江理工大学 PAD light guide plate defect visual detection method based on Ayolov3-Tiny new framework
CN113538347A (en) * 2021-06-29 2021-10-22 中国电子科技集团公司电子科学研究院 Image detection method and system based on efficient bidirectional path aggregation attention network
CN113538347B (en) * 2021-06-29 2023-10-27 中国电子科技集团公司电子科学研究院 Image detection method and system based on efficient bidirectional path aggregation attention network
CN113496260B (en) * 2021-07-06 2024-01-30 浙江大学 Grain depot personnel non-standard operation detection method based on improved YOLOv3 algorithm
CN113496260A (en) * 2021-07-06 2021-10-12 浙江大学 Grain depot worker non-standard operation detection method based on improved YOLOv3 algorithm
CN113537013A (en) * 2021-07-06 2021-10-22 哈尔滨理工大学 Multi-scale self-attention feature fusion pedestrian detection method
CN113516080A (en) * 2021-07-16 2021-10-19 上海高德威智能交通系统有限公司 Behavior detection method and device
CN113705478A (en) * 2021-08-31 2021-11-26 中国林业科学研究院资源信息研究所 Improved YOLOv 5-based mangrove forest single tree target detection method
CN113705478B (en) * 2021-08-31 2024-02-27 中国林业科学研究院资源信息研究所 Mangrove single wood target detection method based on improved YOLOv5
CN114067186B (en) * 2021-09-26 2024-04-16 北京建筑大学 Pedestrian detection method and device, electronic equipment and storage medium
CN114067186A (en) * 2021-09-26 2022-02-18 北京建筑大学 Pedestrian detection method and device, electronic equipment and storage medium
CN113989624A (en) * 2021-12-08 2022-01-28 北京环境特性研究所 Infrared low-slow small target detection method and device, computing equipment and storage medium
CN114373118A (en) * 2021-12-30 2022-04-19 华南理工大学 Underwater target detection method based on improved YOLOV4
CN114373118B (en) * 2021-12-30 2024-04-05 华南理工大学 Underwater target detection method based on improved YOLOV4
CN114092820A (en) * 2022-01-20 2022-02-25 城云科技(中国)有限公司 Target detection method and moving target tracking method applying same
CN114092820B (en) * 2022-01-20 2022-04-22 城云科技(中国)有限公司 Target detection method and moving target tracking method applying same
CN114724012A (en) * 2022-06-10 2022-07-08 天津大学 Tropical unstable wave early warning method and device based on spatio-temporal cross-scale attention fusion
CN114724012B (en) * 2022-06-10 2022-08-23 天津大学 Tropical unstable wave early warning method and device based on space-time cross-scale attention fusion
CN115063691B (en) * 2022-07-04 2024-04-12 西安邮电大学 Feature enhancement-based small target detection method in complex scene
CN115063691A (en) * 2022-07-04 2022-09-16 西安邮电大学 Small target detection method based on feature enhancement under complex scene
CN115439765A (en) * 2022-09-17 2022-12-06 艾迪恩(山东)科技有限公司 Marine plastic garbage rotation detection method based on machine learning unmanned aerial vehicle visual angle
CN115439765B (en) * 2022-09-17 2024-02-02 艾迪恩(山东)科技有限公司 Marine plastic garbage rotation detection method based on machine learning unmanned aerial vehicle visual angle
CN115424230B (en) * 2022-09-23 2023-06-06 哈尔滨市科佳通用机电股份有限公司 Method for detecting failure of vehicle door pulley derailment track, storage medium and device
CN115424230A (en) * 2022-09-23 2022-12-02 哈尔滨市科佳通用机电股份有限公司 Fault detection method for vehicle door pulley out-of-track, storage medium and equipment
CN115273154A (en) * 2022-09-26 2022-11-01 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium
CN115908952A (en) * 2023-01-07 2023-04-04 石家庄铁道大学 High-speed rail tunnel fixture detection method based on improved YOLOv5 algorithm
CN115908952B (en) * 2023-01-07 2023-05-19 石家庄铁道大学 High-speed railway tunnel fixture detection method based on improved YOLOv5 algorithm
CN117649633A (en) * 2024-01-30 2024-03-05 武汉纺织大学 Pavement pothole detection method for highway inspection

Also Published As

Publication number Publication date
CN112733749B (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN112733749B (en) Real-time pedestrian detection method integrating attention mechanism
Huang et al. Multi-scale feature fusion convolutional neural network for indoor small target detection
Tian et al. A dual neural network for object detection in UAV images
CN109934285B (en) Deep learning-based image classification neural network compression model system
Wang et al. Object detection using clustering algorithm adaptive searching regions in aerial images
CN108921198A (en) commodity image classification method, server and system based on deep learning
CN112541532B (en) Target detection method based on dense connection structure
CN112070713A (en) Multi-scale target detection method introducing attention mechanism
Tian et al. Small object detection via dual inspection mechanism for UAV visual images
Yuan Face detection and recognition based on visual attention mechanism guidance model in unrestricted posture
CN113449573A (en) Dynamic gesture recognition method and device
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
Raparthi et al. Machine Learning Based Deep Cloud Model to Enhance Robustness and Noise Interference
Zhao et al. Fire smoke detection based on target-awareness and depthwise convolutions
CN115294563A (en) 3D point cloud analysis method and device based on Transformer and capable of enhancing local semantic learning ability
CN115375781A (en) Data processing method and device
CN115222998A (en) Image classification method
Fan et al. A novel sonar target detection and classification algorithm
Hu et al. Supervised multi-scale attention-guided ship detection in optical remote sensing images
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN111582057A (en) Face verification method based on local receptive field
CN108257148B (en) Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking
Zhao et al. Multi-scale attention-based feature pyramid networks for object detection
Thirumaladevi et al. Multilayer feature fusion using covariance for remote sensing scene classification
Xiao et al. Optimization methods of video images processing for mobile object recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant