CN112507861A - Pedestrian detection method based on multilayer convolution feature fusion
- Publication number: CN112507861A (application number CN202011409937.6A)
- Authority: CN (China)
- Prior art keywords: convolution, feature, network, downsampling, scale
- Prior art date: 2020-12-04
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
- G06F 18/23213 — Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
- G06F 18/253 — Fusion techniques of extracted features
- G06N 3/045 — Combinations of networks
- G06N 3/08 — Learning methods
Abstract
The invention discloses a pedestrian detection method based on multi-layer convolution feature fusion. By restructuring the residual network, a new feature extraction network, Darknet-61, is constructed that is capable of six downsampling operations; with this network, the output of the YOLO output layer of the YOLOv3 algorithm is increased from 3 layers to 5 layers. Target candidate boxes are obtained by the k-means algorithm on the basis of the YOLOv3 algorithm with 5-layer output, and the current optimal candidate box among the target candidate boxes is further processed by the NMS (non-maximum suppression) method. By improving the Darknet-53 feature extraction network, the invention introduces four residual modules and a convolution layer, increases the number of downsampling operations, outputs a 7 × 7 feature map, enhances the characterization capability of low-level features, and improves the accuracy of large-scale pedestrian detection.
Description
Technical Field
The invention relates to the field of pedestrian detection, in particular to a pedestrian detection method based on multilayer convolution feature fusion.
Background
Pedestrian detection is a core technology of intelligent equipment. It allows machines to acquire surrounding video or image information and, using the powerful analysis capability of computers, to process the acquired information visually, giving them the ability to observe and analyze complex objects such as people and helping people to complete various recognition and detection tasks.
Pedestrian detection is an important research direction in the field of computer vision. A computer identifies whether pedestrians are present in images or video clips and, if so, further detects their specific positions so that the position coordinates of the pedestrians are accurately calibrated. It is widely applied in fields such as manufacturing, the military, and medical treatment.
Pedestrians are among the most difficult objects to detect in target detection. Traditional pedestrian detection algorithms based on hand-crafted feature extraction cannot achieve a major breakthrough in detection accuracy: backgrounds in real environments are complex and varied and interfere with detection in different scenes, and the variability of pedestrian poses and differences in scale make it hard to characterize pedestrians with traditional features, so the miss rate increases in practical detection. The most serious issue is occlusion, which complicates sample collection and inevitably challenges detection accuracy.
In order to better solve these problems, the invention provides a pedestrian detection method based on multi-layer convolution feature fusion. After the second downsampling in the feature extraction network, the information obtained by connecting 2 residual modules is fused with the information of the lower-layer network, and a 112 × 112 feature map is output. At the same time, 4 residual modules and one downsampling convolution layer are added to perform the 6th downsampling; the downsampled features produce convolution features through the residual modules, and finally 7 × 7 feature maps are output through the convolution modules. The output of the YOLO layer is thus changed from the original three scale feature maps to five scale feature maps, which enhances the detection accuracy for both large and small pedestrian targets and improves the feature description capability of the network. The improved network achieves higher accuracy and a lower false alarm rate while maintaining the robustness of the original algorithm.
Disclosure of Invention
The invention provides a pedestrian detection method based on multi-layer convolution feature fusion, aiming to solve the problem of missed detections when pedestrians of different scales are detected in the prior art.
The pedestrian detection method based on multi-layer convolution feature fusion of the invention comprises the following steps:
Step 1: constructing the residual network of the feature extraction network Darknet, and merging the parameters of the BN layer in the residual network basic unit into its convolution layer; constructing a feature extraction network from the constructed residual network, denoted the feature extraction network Darknet-61;
Step 2: constructing a feature pyramid network, and fusing the 5 convolution features of the image obtained by the feature extraction network Darknet-61 through 6 downsampling operations with the 7 × 7, 14 × 14, 28 × 28 and 56 × 56 scale information output by YOLO, so that the YOLO output layer in the YOLOv3 algorithm outputs feature maps of 5 scales, the 5 scales comprising: 7 × 7, 14 × 14, 28 × 28, 56 × 56 and 112 × 112;
Step 3: optimizing the YOLOv3 algorithm according to the feature extraction network Darknet-61 and the YOLO output layer;
Step 4: obtaining a plurality of target candidate boxes on the feature maps of 5 scales output by the optimized YOLOv3 algorithm using the k-means algorithm;
Step 5: among the plurality of target candidate boxes on the feature maps, selecting the target candidate box with the largest IOU by applying the NMS (non-maximum suppression) method, and predicting the pedestrian target according to the selected target candidate box.
Further, in the step 1, merging the parameters of the BN layer in the residual network basic unit into the convolutional layer thereof, specifically:
W_merged = γ·W / √(σ² + ε)
B_merged = γ·(B − μ) / √(σ² + ε) + β
where W_merged is the merged convolution weight; W is the convolution weight; B_merged is the merged convolution bias; B is the convolution bias; μ is the mean of the BN layer; σ² is the variance of the BN layer; γ is the scaling factor; β is the offset; ε is a small constant.
Further, in step 2, the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, specifically including the following steps:
Step A21: using an image of size 448 × 448 as the network input of Darknet-61 and performing the first downsampling;
Step A22: performing the second downsampling, performing feature extraction on the second downsampling result using the 2 residual networks constructed in step 1, and outputting a first convolution feature of 112 × 128;
Step A23: performing the third downsampling, performing feature extraction on the third downsampling result using the 8 residual networks constructed in step 1, and outputting a second convolution feature of 56 × 256;
Step A24: performing the fourth downsampling, performing feature extraction on the fourth downsampling result using a convolution with 512 channels, and outputting a third convolution feature of 28 × 512;
Step A25: performing the fifth downsampling, performing feature extraction on the fifth downsampling result using the 4 residual networks constructed in step 1, and outputting a fourth convolution feature of 14 × 1024;
Step A26: performing the sixth downsampling, performing feature extraction on the sixth downsampling result using the 4 residual networks constructed in step 1, and outputting a fifth convolution feature of 7 × 2028.
Further, the specific steps of step 2 are as follows:
Step B21: the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, and a feature map of 7 × 7 scale is obtained by convolving the fifth convolution feature;
a feature pyramid network is constructed, and the feature map of 7 × 7 scale is fused with the fourth convolution feature through the feature pyramid network to obtain a feature map of 14 × 14 scale;
Step B22: the feature map of 14 × 14 scale is fused with the third convolution feature through the feature pyramid network to obtain a feature map of 28 × 28 scale;
Step B23: the feature map of 28 × 28 scale is fused with the second convolution feature through the feature pyramid network to obtain a feature map of 56 × 56 scale;
Step B24: the feature map of 56 × 56 scale is fused with the first convolution feature through the feature pyramid network to obtain a feature map of 112 × 112 scale.
Further, the number of target candidate boxes acquired in step 4 is 3.
The method has the following advantages:
1. By improving the Darknet-53 feature extraction network, four residual networks and convolution layers are introduced, the number of downsampling operations is increased, and a 7 × 7 feature map is output, which enhances the characterization capability of low-level features and improves the accuracy of large-scale pedestrian detection;
2. In the feature extraction network, the information obtained by the 2 residual modules used after the second downsampling is fused with the information of the lower network, and a 112 × 112 feature map is output, which improves the accuracy of small-scale pedestrian detection;
3. The FPN is used to fully fuse the deep and shallow feature information of the image, and the output of the YOLO layer is increased from the original three scale feature maps to five scale feature maps, which enhances the detection of large and small pedestrian targets and of mutually occluded pedestrian targets and improves the robustness of pedestrian detection;
4. The parameters of the BN layer in the residual network basic unit are merged into its convolution layer, which reduces the amount of computation and improves the detection speed.
With the detection speed kept sufficient for real-time operation, the detection accuracy is effectively improved, especially for small targets.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a structure of the feature extraction network Darknet-61 of the present invention;
FIG. 3 is a network structure of YOLOv3 according to the present invention;
fig. 4 is a schematic structural diagram of an FPN module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of the pedestrian detection method based on multi-layer convolution feature fusion according to the present invention, Fig. 2 shows the structure of the feature extraction network Darknet-61 of the present invention, Fig. 3 shows the network structure of YOLOv3 according to the present invention, and Fig. 4 is a schematic structural diagram of the FPN module according to an embodiment of the present invention. With reference to Figs. 1 to 4, the steps of a specific embodiment of the present invention are:
Step 1: constructing the residual network of the feature extraction network Darknet, and merging the parameters of the BN layer in the residual network basic unit into its convolution layer; constructing a feature extraction network from the constructed residual network, denoted the feature extraction network Darknet-61;
in step 1, parameters of a BN layer in a residual network basic unit are merged into a convolutional layer thereof, which specifically comprises:
W_merged = γ·W / √(σ² + ε)
B_merged = γ·(B − μ) / √(σ² + ε) + β
where W_merged is the merged convolution weight; W is the convolution weight; B_merged is the merged convolution bias; B is the convolution bias; μ is the mean of the BN layer; σ² is the variance of the BN layer; γ is the scaling factor; β is the offset; ε is a small constant.
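For illustration only, the above merge can be computed per convolution layer as in the following sketch; the function name, the array shapes and the value of ε are assumptions introduced for the example and are not part of the claimed method.

```python
import numpy as np

def fold_bn_into_conv(W, B, gamma, beta, mu, var, eps=1e-5):
    """Fold BN parameters into the preceding convolution.

    W: conv weights of shape (out_channels, in_channels, k, k)
    B: conv bias of shape (out_channels,)
    gamma, beta, mu, var: per-channel BN scale, offset, mean and variance
    Returns (W_merged, B_merged) such that
    conv(x, W_merged) + B_merged == BN(conv(x, W) + B).
    """
    scale = gamma / np.sqrt(var + eps)            # per-channel factor gamma / sqrt(sigma^2 + eps)
    W_merged = W * scale[:, None, None, None]     # scale every output filter
    B_merged = (B - mu) * scale + beta            # shift the bias accordingly
    return W_merged, B_merged
```

Folding the BN layer in this way removes one normalization pass per residual basic unit at inference time, which is the source of the speed gain noted in the advantages above.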
Step 2: constructing a feature pyramid network, and fusing the 5 convolution features of the image obtained by the feature extraction network Darknet-61 through 6 downsampling operations with the 7 × 7, 14 × 14, 28 × 28 and 56 × 56 scale information output by YOLO, so that the YOLO output layer in the YOLOv3 algorithm outputs feature maps of 5 scales, the 5 scales comprising: 7 × 7, 14 × 14, 28 × 28, 56 × 56 and 112 × 112;
further, in step 2, the feature extraction network Darknet-61 obtains 5 convolution features of the image through six times of downsampling, and the specific steps are as follows:
step A21: using the image of size 448 x 448 as the network input of Darknet-61, performing a first down-sampling;
step A22: performing second downsampling, performing feature extraction on the second downsampling result by using 2 residual error networks constructed in the step 1, and outputting a first convolution feature of 112 × 128;
step A23: performing third downsampling, performing feature extraction on the third downsampling result by using 8 residual error networks constructed in the step 1, and outputting a second convolution feature of 56 × 256;
step A24: performing fourth down-sampling, performing feature extraction on the fourth down-sampling result by using convolution with a channel of 512, and outputting a third convolution feature of 28 × 512;
step A25: performing fifth downsampling, performing feature extraction on the fifth downsampling result by using 4 residual error networks constructed in the step 1, and outputting a fourth convolution feature of 14 × 1024;
step A26: and performing sixth downsampling, performing feature extraction on the sixth downsampling result by using 4 residual error networks constructed in the step 1, and outputting a fifth convolution feature of 7 × 2028.
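The following PyTorch-style sketch mirrors the six downsampling stages of steps A21 to A26; it is only a schematic reading of the description, not the definitive Darknet-61. The stem, the number of residual modules in the first stage, the layer hyperparameters and the 2048-channel width of the last stage (the description writes 2028) are assumptions.

```python
import torch.nn as nn

def conv_bn(in_ch, out_ch, k=3, stride=1):
    # Darknet basic convolution unit: convolution + BN + LeakyReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class Residual(nn.Module):
    # residual basic unit: 1x1 bottleneck, 3x3 convolution, skip connection
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(conv_bn(ch, ch // 2, k=1), conv_bn(ch // 2, ch, k=3))

    def forward(self, x):
        return x + self.block(x)

def residuals(ch, n):
    return nn.Sequential(*[Residual(ch) for _ in range(n)])

class Darknet61Sketch(nn.Module):
    # six stride-2 downsamplings on a 448 x 448 input: 224 -> 112 -> 56 -> 28 -> 14 -> 7
    def __init__(self):
        super().__init__()
        self.stem = conv_bn(3, 32)                                                     # assumption
        self.down1 = nn.Sequential(conv_bn(32, 64, stride=2), residuals(64, 1))        # A21
        self.down2 = nn.Sequential(conv_bn(64, 128, stride=2), residuals(128, 2))      # A22 -> C1 (112)
        self.down3 = nn.Sequential(conv_bn(128, 256, stride=2), residuals(256, 8))     # A23 -> C2 (56)
        self.down4 = nn.Sequential(conv_bn(256, 512, stride=2), conv_bn(512, 512))     # A24 -> C3 (28)
        self.down5 = nn.Sequential(conv_bn(512, 1024, stride=2), residuals(1024, 4))   # A25 -> C4 (14)
        self.down6 = nn.Sequential(conv_bn(1024, 2048, stride=2), residuals(2048, 4))  # A26 -> C5 (7)

    def forward(self, x):                       # x: [N, 3, 448, 448]
        x = self.down1(self.stem(x))
        c1 = self.down2(x)
        c2 = self.down3(c1)
        c3 = self.down4(c2)
        c4 = self.down5(c3)
        c5 = self.down6(c4)
        return c1, c2, c3, c4, c5               # the five convolution features used for fusion
```

Feeding a 448 × 448 image through this sketch returns five maps of spatial sizes 112, 56, 28, 14 and 7, matching steps A22 to A26.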
Further, the specific steps of step 2 are as follows:
Step B21: the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, and a feature map of 7 × 7 scale is obtained by convolving the fifth convolution feature;
a feature pyramid network is constructed, and the feature map of 7 × 7 scale is fused with the fourth convolution feature through the feature pyramid network to obtain a feature map of 14 × 14 scale;
Step B22: the feature map of 14 × 14 scale is fused with the third convolution feature through the feature pyramid network to obtain a feature map of 28 × 28 scale;
Step B23: the feature map of 28 × 28 scale is fused with the second convolution feature through the feature pyramid network to obtain a feature map of 56 × 56 scale;
Step B24: the feature map of 56 × 56 scale is fused with the first convolution feature through the feature pyramid network to obtain a feature map of 112 × 112 scale.
The shallow information and the deep feature information are thus fused to enhance the representation capability of the image pyramid. The 7 × 7 and 14 × 14 feature maps are suitable for detecting large-size pedestrian targets in the image, the 28 × 28 and 56 × 56 feature maps are suitable for detecting medium-size pedestrian targets, and the 112 × 112 feature map is suitable for detecting small-size pedestrian targets, which reduces the miss rate for pedestrians.
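As a non-authoritative illustration of the fusion in steps B21 to B24, the sketch below runs a top-down feature-pyramid pass over the five backbone features; the 1 × 1 lateral convolutions, nearest-neighbour upsampling, element-wise addition and channel widths are assumptions about the FPN module rather than the exact structure of Fig. 4.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNFusionSketch(nn.Module):
    # top-down fusion of the five backbone features C1..C5 into five output maps
    def __init__(self, in_channels=(128, 256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions bring every level to a common channel width
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # 3x3 convolutions smooth each fused map before it is passed to a YOLO output layer
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):
        c1, c2, c3, c4, c5 = feats                        # 112, 56, 28, 14 and 7 resolutions
        laterals = [lat(c) for lat, c in zip(self.lateral, (c1, c2, c3, c4, c5))]
        outs = [None] * 5
        outs[4] = laterals[4]                             # the 7 x 7 map comes from C5 (step B21)
        for i in range(3, -1, -1):                        # fuse downwards: 14, 28, 56, 112 (B21-B24)
            up = F.interpolate(outs[i + 1], scale_factor=2, mode="nearest")
            outs[i] = laterals[i] + up                    # fuse upsampled deep info with shallow info
        return [s(o) for s, o in zip(self.smooth, outs)]  # five maps for the five YOLO output scales
```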
Step 3: optimizing the YOLOv3 algorithm according to the feature extraction network Darknet-61 and the YOLO output layer;
further, after the second down-sampling of the feature extraction network Darknet-61, information obtained by connecting 2 residual modules is fused with the up-sampling information of the lower network, and a 112 × 112 feature map is output. And 4 residual modules and a downsampling convolution layer are added for downsampling for the 6 th time, the characteristic after downsampling outputs convolution characteristic through the residual modules, finally 7 x 7 characteristic graphs are output through the convolution modules, the output of the YOLO layer is changed from the original three scale characteristic graphs into five scale characteristic graphs, and the detection accuracy of pedestrians on large and small targets is enhanced.
Step 4: obtaining a plurality of target candidate boxes on the feature maps of 5 scales output by the optimized YOLOv3 algorithm using the k-means algorithm;
in order to find the size of the candidate frame, the best centroid point is found by using a k-means algorithm, and the best k value and the distance function are considered, wherein the method specifically comprises the following steps:
1) the K values are sequentially increased from three times to find the optimal K value;
2) taking the found K as the starting point of clustering, taking the best clustering result as the optimal centroid, and then iterating according to a clustering rule to obtain the optimal clustering result;
3) calculating the distances from all the data to the mass center, and classifying the data into corresponding sets nearby;
4) after all data are calculated, the mass center of each set is recalculated;
5) performing set classification on all data again according to the newly calculated centroid;
6) repeating steps (4) and (5) until the newly calculated centroid does not change any more or the distance of the two centroids reaches the threshold value that we expect, and the algorithm terminates.
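A minimal sketch of this clustering is given below, assuming the inputs are the width-height pairs of the ground-truth boxes and using d = 1 − IOU as the distance, a monotone variant of the (1 − IOU)² metric discussed in the next paragraph that yields the same assignments; the initialization, stopping tolerance and function names are illustrative assumptions.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between boxes and centroids given only widths and heights (boxes aligned at the origin)."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, None, 0] * boxes[:, None, 1]
             + centroids[None, :, 0] * centroids[None, :, 1] - inter)
    return inter / union

def kmeans_anchors(boxes, k, iters=1000, tol=1e-6, seed=0):
    """Cluster (w, h) pairs into k anchor boxes with the distance d = 1 - IOU."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]   # initial centroids
    for _ in range(iters):
        d = 1.0 - iou_wh(boxes, centroids)              # distance of every box to every centroid
        assign = d.argmin(axis=1)                       # step 3): assign each box to its nearest set
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
                        for j in range(k)])             # steps 4)-5): recompute the centroids
        if np.abs(new - centroids).max() < tol:         # step 6): stop when centroids stop moving
            centroids = new
            break
        centroids = new
    return centroids

# example use: anchors = kmeans_anchors(gt_wh, k=15)   # e.g. 3 anchors for each of the 5 scales
```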
For the distance function, 15 boxes are first selected as candidate boxes at one detection point, and the IOU value is used as the distance measure in the clustering function to calculate the centroids; the distance function is taken as d = (1 − IOU)², under which the obtained IOU values are better and the difficulty of box regression in later training is reduced. Because only a single class of object is detected, the weight of the class prediction in the loss function is reduced so that the network converges better; that is, the class error term is given a smaller weight to prevent it from interfering with the overall error and to optimize the loss function. The loss function expression is as follows:
Loss = λ_coord · Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²]
+ λ_coord · Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]
+ Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} (C_i − Ĉ_i)²
+ λ_noobj · Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{noobj} (C_i − Ĉ_i)²
+ Σ_{i=0..S²} 1_i^{obj} Σ_{c∈classes} (p_i(c) − p̂_i(c))²
wherein S² denotes the number of grid cells, B denotes the number of boxes generated per grid cell, classes denotes the set of detectable classes (C classes in total), C_i denotes the confidence of the i-th box, 1_{ij}^{obj} indicates whether the object falls into the j-th bounding box of grid cell i and takes the value 0 or 1 (1_{ij}^{noobj} is its complement, and 1_i^{obj} indicates whether any object falls into cell i), λ_coord is the weight of the coordinate error and is typically taken as 5, λ_noobj is the weight of the confidence error for boxes that contain no object, (x_i, y_i) denotes the center coordinates of the i-th preselected box, (w_i, h_i) denotes the width and height of the i-th preselected box, and hatted symbols denote the corresponding ground-truth values.
The coordinate prediction in the first part of the formula is the loss for the position and size of the bounding box; the square roots of the width and height in the second part are taken to reduce the influence of large size differences between bounding boxes. The third part of the formula means that if an object falls into a bounding box, the loss between the predicted confidence that the box contains an object and the IOU between the real object and the bounding box is computed; if no object center falls into a bounding box, the smaller the predicted confidence that it contains an object, the better. However, most bounding boxes contain no object, and weighting these terms equally would bias the loss, so the fourth part of the formula is given the weight λ_noobj = 0.5. In the fifth part of the formula, if an object is contained in the proposal box, the predicted probability of the correct class should be as close to 1 as possible and the probabilities of the wrong classes as close to 0 as possible.
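For illustration only, the five terms of the loss described above can be evaluated for one image as in the following numpy sketch; the tensor layout, the use of the IOU with the ground truth as the confidence target, and all variable names are assumptions made for the example rather than the actual training code of the method.

```python
import numpy as np

def yolo_loss(pred_boxes, pred_conf, pred_cls,
              tgt_boxes, tgt_iou, tgt_cls, obj_mask,
              lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared YOLO-style loss for one image.

    pred_boxes, tgt_boxes : [S, S, B, 4] arrays of (x, y, w, h), w and h non-negative
    pred_conf, tgt_iou    : [S, S, B] predicted objectness and IOU with the ground truth
    pred_cls, tgt_cls     : [S, S, C] class probabilities (one-hot target)
    obj_mask              : [S, S, B], 1 where an object is assigned to box j of cell i, else 0
    """
    noobj_mask = 1.0 - obj_mask
    cell_mask = obj_mask.max(axis=-1)                      # 1 if any object falls into the cell

    # part 1: centre-coordinate error
    xy = ((pred_boxes[..., 0:2] - tgt_boxes[..., 0:2]) ** 2).sum(-1)
    # part 2: width/height error on square roots, damping the effect of box-size differences
    wh = ((np.sqrt(pred_boxes[..., 2:4]) - np.sqrt(tgt_boxes[..., 2:4])) ** 2).sum(-1)
    # part 3: confidence error for boxes that contain an object (target is the IOU)
    conf_obj = (pred_conf - tgt_iou) ** 2
    # part 4: confidence error for boxes with no object, down-weighted by lambda_noobj
    conf_noobj = pred_conf ** 2
    # part 5: class-probability error for cells that contain an object
    cls = ((pred_cls - tgt_cls) ** 2).sum(-1)

    return (lambda_coord * (obj_mask * xy).sum()
            + lambda_coord * (obj_mask * wh).sum()
            + (obj_mask * conf_obj).sum()
            + lambda_noobj * (noobj_mask * conf_noobj).sum()
            + (cell_mask * cls).sum())
```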
Step 5: among the plurality of target candidate boxes on the feature maps, selecting the target candidate box with the largest IOU by applying the NMS (non-maximum suppression) method, and predicting the pedestrian target according to the selected target candidate box. The specific steps are as follows:
1) the extracted feature maps of 5 scales are sent to the YOLO network for detection. The maximum number of iterations set by the method is 4000, batch_size is set to 64, subdivisions is set to 16, decay is 0.0005, momentum is 0.9, and the initial learning rate is 0.001; the learning rate can be adjusted appropriately according to the downward trend of the loss. Training is stopped when the loss function value output on the training data set is less than or equal to the threshold or the set maximum number of iterations is reached, so that the trained improved network is obtained.
2) The optimal target bounding box is selected by the non-maximum suppression method: the candidate boxes are sorted by their confidence values, the IOU values between the candidate boxes and the real target boxes are calculated to generate an IOU queue, the bounding box with the largest IOU value is selected to generate the prediction box, and finally the coordinates of the prediction box are mapped back to the original image to output the prediction result (an illustrative sketch of this suppression step is given below).
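The suppression in step 2) can be sketched as a standard confidence-sorted NMS over (x1, y1, x2, y2) boxes; it is offered only as an illustrative reference, and the IOU threshold of 0.45 is an assumption.

```python
import numpy as np

def iou_xyxy(box, boxes):
    """IOU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Keep the highest-confidence boxes, suppressing overlaps above iou_thresh."""
    order = scores.argsort()[::-1]                  # sort candidate boxes by confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        if order.size == 1:
            break
        ious = iou_xyxy(boxes[best], boxes[order[1:]])
        order = order[1:][ious <= iou_thresh]       # drop boxes that overlap the kept box too much
    return keep
```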
In order to increase the detection speed of the pedestrian detection system, the feature extraction network Darknet-61 and the FPN module in this embodiment are run on a computer equipped with an NVIDIA GTX 1080Ti GPU and the Ubuntu 16.04 system, so that real-time detection can be achieved.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.
Claims (5)
1. A pedestrian detection method based on multilayer convolution feature fusion is characterized by comprising the following steps:
step 1: constructing the residual network of the feature extraction network Darknet, and merging the parameters of the BN layer in the residual network basic unit into its convolution layer; constructing a feature extraction network from the constructed residual network, denoted the feature extraction network Darknet-61;
step 2: constructing a feature pyramid network, and fusing the 5 convolution features of the image obtained by the feature extraction network Darknet-61 through 6 downsampling operations with the 7 × 7, 14 × 14, 28 × 28 and 56 × 56 scale information output by YOLO, so that the YOLO output layer in the YOLOv3 algorithm outputs feature maps of 5 scales, the 5 scales comprising: 7 × 7, 14 × 14, 28 × 28, 56 × 56 and 112 × 112;
step 3: optimizing the YOLOv3 algorithm according to the feature extraction network Darknet-61 and the YOLO output layer;
step 4: obtaining a plurality of target candidate boxes on the feature maps of 5 scales output by the optimized YOLOv3 algorithm using the k-means algorithm;
step 5: among the plurality of target candidate boxes on the feature maps, selecting the target candidate box with the largest IOU by applying the NMS (non-maximum suppression) method, and predicting the pedestrian target according to the selected target candidate box.
2. The pedestrian detection method based on multilayer convolution feature fusion according to claim 1, wherein merging the parameters of the BN layer in the residual network basic unit into its convolution layer in step 1 specifically comprises:
W_merged = γ·W / √(σ² + ε)
B_merged = γ·(B − μ) / √(σ² + ε) + β
where W_merged is the merged convolution weight; W is the convolution weight; B_merged is the merged convolution bias; B is the convolution bias; μ is the mean of the BN layer; σ² is the variance of the BN layer; γ is the scaling factor; β is the offset; ε is a small constant.
3. The pedestrian detection method based on multilayer convolution feature fusion according to claim 1 or 2, wherein the feature extraction network Darknet-61 in step 2 obtains the 5 convolution features of the image through six downsampling operations, the specific steps being as follows:
step A21: using an image of size 448 × 448 as the network input of Darknet-61 and performing the first downsampling;
step A22: performing the second downsampling, performing feature extraction on the second downsampling result using the 2 residual networks constructed in step 1, and outputting a first convolution feature of 112 × 128;
step A23: performing the third downsampling, performing feature extraction on the third downsampling result using the 8 residual networks constructed in step 1, and outputting a second convolution feature of 56 × 256;
step A24: performing the fourth downsampling, performing feature extraction on the fourth downsampling result using a convolution with 512 channels, and outputting a third convolution feature of 28 × 512;
step A25: performing the fifth downsampling, performing feature extraction on the fifth downsampling result using the 4 residual networks constructed in step 1, and outputting a fourth convolution feature of 14 × 1024;
step A26: performing the sixth downsampling, performing feature extraction on the sixth downsampling result using the 4 residual networks constructed in step 1, and outputting a fifth convolution feature of 7 × 2028.
4. The pedestrian detection method based on multilayer convolution feature fusion according to claim 3, wherein the specific steps of step 2 are as follows:
step B21: the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, and a feature map of 7 × 7 scale is obtained by convolving the fifth convolution feature;
a feature pyramid network is constructed, and the feature map of 7 × 7 scale is fused with the fourth convolution feature through the feature pyramid network to obtain a feature map of 14 × 14 scale;
step B22: the feature map of 14 × 14 scale is fused with the third convolution feature through the feature pyramid network to obtain a feature map of 28 × 28 scale;
step B23: the feature map of 28 × 28 scale is fused with the second convolution feature through the feature pyramid network to obtain a feature map of 56 × 56 scale;
step B24: the feature map of 56 × 56 scale is fused with the first convolution feature through the feature pyramid network to obtain a feature map of 112 × 112 scale.
5. The pedestrian detection method based on multilayer convolution feature fusion according to claim 1, wherein the number of target candidate boxes acquired in step 4 is 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011409937.6A CN112507861A (en) | 2020-12-04 | 2020-12-04 | Pedestrian detection method based on multilayer convolution feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011409937.6A CN112507861A (en) | 2020-12-04 | 2020-12-04 | Pedestrian detection method based on multilayer convolution feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112507861A true CN112507861A (en) | 2021-03-16 |
Family
ID=74971783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011409937.6A Pending CN112507861A (en) | 2020-12-04 | 2020-12-04 | Pedestrian detection method based on multilayer convolution feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112507861A (en) |
- 2020-12-04 CN CN202011409937.6A patent/CN112507861A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111368850A (en) * | 2018-12-25 | 2020-07-03 | 展讯通信(天津)有限公司 | Image feature extraction method, image target detection method, image feature extraction device, image target detection device, convolution device, CNN network device and terminal |
CN110660052A (en) * | 2019-09-23 | 2020-01-07 | 武汉科技大学 | Hot-rolled strip steel surface defect detection method based on deep learning |
CN111563525A (en) * | 2020-03-25 | 2020-08-21 | 北京航空航天大学 | Moving target detection method based on YOLOv3-Tiny |
CN111461083A (en) * | 2020-05-26 | 2020-07-28 | 青岛大学 | Rapid vehicle detection method based on deep learning |
CN111898432A (en) * | 2020-06-24 | 2020-11-06 | 南京理工大学 | Pedestrian detection system and method based on improved YOLOv3 algorithm |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113516012A (en) * | 2021-04-09 | 2021-10-19 | 湖北工业大学 | Pedestrian re-identification method and system based on multi-level feature fusion |
CN113516012B (en) * | 2021-04-09 | 2022-04-15 | 湖北工业大学 | Pedestrian re-identification method and system based on multi-level feature fusion |
CN113139615A (en) * | 2021-05-08 | 2021-07-20 | 北京联合大学 | Unmanned environment target detection method based on embedded equipment |
CN113111979A (en) * | 2021-06-16 | 2021-07-13 | 上海齐感电子信息科技有限公司 | Model training method, image detection method and detection device |
CN113111979B (en) * | 2021-06-16 | 2021-09-07 | 上海齐感电子信息科技有限公司 | Model training method, image detection method and detection device |
CN113537014A (en) * | 2021-07-06 | 2021-10-22 | 北京观微科技有限公司 | Improved darknet network-based ground-to-air missile position target detection and identification method |
CN113837199A (en) * | 2021-08-30 | 2021-12-24 | 武汉理工大学 | Image feature extraction method based on cross-layer residual error double-path pyramid network |
CN113837199B (en) * | 2021-08-30 | 2024-01-09 | 武汉理工大学 | Image feature extraction method based on cross-layer residual double-path pyramid network |
CN113792660A (en) * | 2021-09-15 | 2021-12-14 | 江苏科技大学 | Pedestrian detection method, system, medium and equipment based on improved YOLOv3 network |
CN113792660B (en) * | 2021-09-15 | 2024-03-01 | 江苏科技大学 | Pedestrian detection method, system, medium and equipment based on improved YOLOv3 network |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |