CN112507861A - Pedestrian detection method based on multilayer convolution feature fusion


Info

Publication number: CN112507861A
Application number: CN202011409937.6A
Authority: CN (China)
Prior art keywords: convolution, feature, network, downsampling, scale
Other languages: Chinese (zh)
Inventors: 马国军, 韩松, 夏健, 郑威
Current and original assignee: Jiangsu University of Science and Technology
Application filed by: Jiangsu University of Science and Technology
Priority/filing date: 2020-12-04
Publication date: 2021-03-16
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods


Abstract

The invention discloses a pedestrian detection method based on multilayer convolution feature fusion. A new feature extraction network, Darknet-61, is constructed by restructuring the residual network so that it performs downsampling six times. With this new backbone, the output of the YOLO output layer of the YOLOv3 algorithm is increased from 3 layers to 5 layers. Target candidate boxes are obtained for the five-output YOLOv3 algorithm with the k-means algorithm, and the current optimal candidate box among them is selected for subsequent processing with the NMS (non-maximum suppression) method. By improving the Darknet-53 feature extraction network, the invention introduces four residual modules and a downsampling convolution layer, increases the number of downsampling operations, and outputs a 7 × 7 feature map, which strengthens the characterization of low-level features and improves the accuracy of large-scale pedestrian detection.

Description

Pedestrian detection method based on multilayer convolution feature fusion
Technical Field
The invention relates to the field of pedestrian detection, in particular to a pedestrian detection method based on multilayer convolution feature fusion.
Background
Pedestrian detection is a core technology of intelligent equipment. It allows a machine to acquire surrounding video or image information and, using the powerful analysis capability of a computer, to visually process the acquired information, so that the machine can observe and analyse complex things such as people and help people complete various recognition and detection tasks.
Pedestrian detection is an important research direction in the field of computer vision: a computer identifies whether pedestrians exist in images or video clips and, if they do, further detects their specific positions so that the position coordinates of the pedestrians are accurately calibrated. It is widely applied in fields such as manufacturing, the military and medical care.
Pedestrians are among the hardest objects to detect in target detection. Traditional pedestrian detection algorithms based on hand-crafted features cannot achieve a major breakthrough in detection accuracy, because backgrounds in real environments are complex and varied, pedestrians in different scenes interfere with detection, and the variability of pedestrian poses and the differences in scale are hard to characterise with traditional features, so the miss rate increases in practical detection. The most serious difficulty is occlusion, which complicates sample collection and inevitably challenges detection accuracy.
In order to better solve these problems, the invention provides a pedestrian detection method based on multilayer convolution feature fusion. After the second downsampling in the feature extraction network, the information obtained by 2 connected residual modules is fused with information from the lower-level network, and a 112 × 112 feature map is output. At the same time, 4 residual modules and one downsampling convolution layer are added to perform a sixth downsampling; the downsampled features pass through the residual modules to produce convolution features, and a 7 × 7 feature map is finally output through the convolution modules. The output of the YOLO layer is thus expanded from the original three scales of feature maps to five, which improves the detection accuracy for both large and small pedestrian targets and enhances the feature description capability of the network. The improved network achieves higher accuracy and a lower false alarm rate while retaining the robustness of the original algorithm.
Disclosure of Invention
The invention provides a pedestrian detection method based on multilayer convolution feature fusion, which aims to solve the problem of missed detections of pedestrians of different scales in the prior art.
The pedestrian detection method with multilayer convolution feature fusion according to the invention comprises the following steps:
Step 1: construct the residual network of the feature extraction network Darknet and merge the parameters of the BN layer in each residual basic unit into its convolution layer; construct the feature extraction network from the constructed residual network and denote it the feature extraction network Darknet-61.
Step 2: construct a feature pyramid network and fuse the 5 convolution features of the image, obtained by the feature extraction network Darknet-61 through 6 downsampling operations, with the 7 × 7, 14 × 14, 28 × 28 and 56 × 56 scale information output by YOLO, so that the YOLO output layer of the YOLOv3 algorithm outputs feature maps at 5 scales: 7 × 7, 14 × 14, 28 × 28, 56 × 56 and 112 × 112.
Step 3: optimize the YOLOv3 algorithm according to the feature extraction network Darknet-61 and the YOLO output layer.
Step 4: obtain a plurality of target candidate boxes on the feature maps of 5 scales output by the optimized YOLOv3 algorithm with the k-means algorithm.
Step 5: apply the NMS (non-maximum suppression) method to the candidate boxes, select the target candidate box with the largest IOU among the target candidate boxes on the feature maps, and predict the pedestrian target according to the selected candidate box.
Further, in step 1, the parameters of the BN layer in each residual network basic unit are merged into its convolution layer as follows:

W_merged = γ · W / √(σ² + ε)

B_merged = γ · (B - μ) / √(σ² + ε) + β

where W_merged is the merged convolution weight; W is the convolution weight; B_merged is the merged convolution bias; B is the convolution bias; μ is the mean; σ² is the variance; γ is the scaling factor; β is the offset; and ε is a small constant.
Further, in step 2, the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, specifically as follows:
Step A21: take an image of size 448 × 448 as the network input of Darknet-61 and perform the first downsampling.
Step A22: perform the second downsampling, extract features from the second downsampling result with 2 of the residual modules constructed in step 1, and output the first convolution feature of size 112 × 112 × 128.
Step A23: perform the third downsampling, extract features from the third downsampling result with 8 of the residual modules constructed in step 1, and output the second convolution feature of size 56 × 56 × 256.
Step A24: perform the fourth downsampling, extract features from the fourth downsampling result with 512-channel convolutions, and output the third convolution feature of size 28 × 28 × 512.
Step A25: perform the fifth downsampling, extract features from the fifth downsampling result with 4 of the residual modules constructed in step 1, and output the fourth convolution feature of size 14 × 14 × 1024.
Step A26: perform the sixth downsampling, extract features from the sixth downsampling result with 4 of the residual modules constructed in step 1, and output the fifth convolution feature of size 7 × 7 × 2028.
Further, the specific steps of step 2 are as follows:
Step B21: the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, and the fifth convolution feature is convolved to obtain the feature map at the 7 × 7 scale.
A feature pyramid network is constructed, and the 7 × 7 scale feature map is fused with the fourth convolution feature through the feature pyramid network to obtain the feature map at the 14 × 14 scale.
Step B22: the 14 × 14 scale feature map is fused with the third convolution feature through the feature pyramid network to obtain the feature map at the 28 × 28 scale.
Step B23: the 28 × 28 scale feature map is fused with the second convolution feature through the feature pyramid network to obtain the feature map at the 56 × 56 scale.
Step B24: the 56 × 56 scale feature map is fused with the first convolution feature through the feature pyramid network to obtain the feature map at the 112 × 112 scale.
Further, the number of target candidate boxes obtained in step 4 is 3.
The method has the following advantages:
1. By improving the Darknet-53 feature extraction network, four residual modules and a downsampling convolution layer are introduced, the number of downsampling operations is increased, and a 7 × 7 feature map is output, which strengthens the characterization of low-level features and improves the accuracy of large-scale pedestrian detection.
2. In the feature extraction network, the information obtained by the 2 residual modules after the second downsampling is fused with the information of the lower-level network, and a 112 × 112 feature map is output, which improves the accuracy of small-scale pedestrian detection.
3. The FPN is used to fully fuse the deep and shallow feature information of the image, and the output of the YOLO layer is increased from the original three scales of feature maps to five, which enhances the detection of large and small pedestrian targets and of mutually occluded pedestrian targets, and improves the robustness of pedestrian detection.
4. The parameters of the BN layer in each residual network basic unit are merged into its convolution layer, which reduces the amount of computation and increases the detection speed.
Under the condition that the detection speed still meets the real-time requirement, the detection accuracy is effectively improved, in particular for small-target detection.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a structural diagram of the feature extraction network Darknet-61 of the present invention;
FIG. 3 is a network structure of YOLOv3 according to the present invention;
fig. 4 is a schematic structural diagram of an FPN module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of the pedestrian detection method with multilayer convolution feature fusion according to the present invention, Fig. 2 shows the structure of the feature extraction network Darknet-61, Fig. 3 shows the YOLOv3 network structure of the present invention, and Fig. 4 is a schematic structural diagram of the FPN module according to an embodiment of the present invention. With reference to Figs. 1 to 4, the steps of a specific embodiment of the present invention are:
Step 1: construct the residual network of the feature extraction network Darknet and merge the parameters of the BN layer in each residual basic unit into its convolution layer; construct the feature extraction network from the constructed residual network and denote it the feature extraction network Darknet-61.
In step 1, the parameters of the BN layer in each residual network basic unit are merged into its convolution layer as follows:

W_merged = γ · W / √(σ² + ε)

B_merged = γ · (B - μ) / √(σ² + ε) + β

where W_merged is the merged convolution weight; W is the convolution weight; B_merged is the merged convolution bias; B is the convolution bias; μ is the mean; σ² is the variance; γ is the scaling factor; β is the offset; and ε is a small constant.
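As an illustration of this merging, the following sketch folds the BN parameters into the convolution weights; the tensor layout and the value of ε are assumptions, and only the two formulas above come from the method itself.

```python
import numpy as np

def fold_bn_into_conv(W, B, gamma, beta, mu, var, eps=1e-5):
    """Fold BN parameters into the preceding convolution (inference-time sketch).

    W                    : conv weights, shape (out_ch, in_ch, k, k)
    B                    : conv bias, shape (out_ch,)
    gamma, beta, mu, var : per-channel BN scale, shift, running mean and running variance
    Returns (W_merged, B_merged) so that convolution followed by BN equals a single convolution.
    """
    scale = gamma / np.sqrt(var + eps)             # gamma / sqrt(sigma^2 + eps)
    W_merged = W * scale[:, None, None, None]      # W_merged = scale * W
    B_merged = scale * (B - mu) + beta             # B_merged = scale * (B - mu) + beta
    return W_merged, B_merged
```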
Step 2: construct a feature pyramid network and fuse the 5 convolution features of the image, obtained by the feature extraction network Darknet-61 through 6 downsampling operations, with the 7 × 7, 14 × 14, 28 × 28 and 56 × 56 scale information output by YOLO, so that the YOLO output layer of the YOLOv3 algorithm outputs feature maps at 5 scales: 7 × 7, 14 × 14, 28 × 28, 56 × 56 and 112 × 112.
Further, in step 2, the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, specifically as follows (a structural sketch of this backbone is given after these steps):
Step A21: take an image of size 448 × 448 as the network input of Darknet-61 and perform the first downsampling.
Step A22: perform the second downsampling, extract features from the second downsampling result with 2 of the residual modules constructed in step 1, and output the first convolution feature of size 112 × 112 × 128.
Step A23: perform the third downsampling, extract features from the third downsampling result with 8 of the residual modules constructed in step 1, and output the second convolution feature of size 56 × 56 × 256.
Step A24: perform the fourth downsampling, extract features from the fourth downsampling result with 512-channel convolutions, and output the third convolution feature of size 28 × 28 × 512.
Step A25: perform the fifth downsampling, extract features from the fifth downsampling result with 4 of the residual modules constructed in step 1, and output the fourth convolution feature of size 14 × 14 × 1024.
Step A26: perform the sixth downsampling, extract features from the sixth downsampling result with 4 of the residual modules constructed in step 1, and output the fifth convolution feature of size 7 × 7 × 2028.
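For illustration, a minimal PyTorch sketch of this backbone is given below. The residual counts of stages 2, 3, 5 and 6 follow the steps above; the block count of stage 1, the use of 8 residual modules in stage 4 and the final channel width of 2048 (the text writes 7 × 7 × 2028) are assumptions.

```python
import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch, k, s):
    """Convolution followed by BN and LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class Residual(nn.Module):
    """Darknet-style residual unit: 1x1 reduce, 3x3 expand, skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(conv_bn(ch, ch // 2, 1, 1), conv_bn(ch // 2, ch, 3, 1))

    def forward(self, x):
        return x + self.block(x)

class Darknet61(nn.Module):
    """Six-downsampling backbone sketched from steps A21-A26."""
    def __init__(self):
        super().__init__()
        self.stem = conv_bn(3, 32, 3, 1)
        cfg = [                 # (output channels, residual modules per stage)
            (64, 1),            # downsampling 1: 448 -> 224        (count assumed)
            (128, 2),           # downsampling 2: -> 112x112x128    (first feature)
            (256, 8),           # downsampling 3: -> 56x56x256      (second feature)
            (512, 8),           # downsampling 4: -> 28x28x512      (third feature, count assumed)
            (1024, 4),          # downsampling 5: -> 14x14x1024     (fourth feature)
            (2048, 4),          # downsampling 6: -> 7x7 feature    (fifth feature)
        ]
        stages, in_ch = [], 32
        for out_ch, n in cfg:
            layers = [conv_bn(in_ch, out_ch, 3, 2)] + [Residual(out_ch) for _ in range(n)]
            stages.append(nn.Sequential(*layers))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward(self, x):       # x: (N, 3, 448, 448)
        feats = []
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats[1:]        # the five convolution features used by the detection head
```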
Further, the specific steps of step 2 are as follows:
Step B21: the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, and the fifth convolution feature is convolved to obtain the feature map at the 7 × 7 scale.
A feature pyramid network is constructed, and the 7 × 7 scale feature map is fused with the fourth convolution feature through the feature pyramid network to obtain the feature map at the 14 × 14 scale.
Step B22: the 14 × 14 scale feature map is fused with the third convolution feature through the feature pyramid network to obtain the feature map at the 28 × 28 scale.
Step B23: the 28 × 28 scale feature map is fused with the second convolution feature through the feature pyramid network to obtain the feature map at the 56 × 56 scale.
Step B24: the 56 × 56 scale feature map is fused with the first convolution feature through the feature pyramid network to obtain the feature map at the 112 × 112 scale.
The shallow information and the deep feature information are fused to enhance the representation capability of the image pyramid. The resulting 7 × 7 and 14 × 14 feature maps are suitable for detecting large pedestrian targets in the image, the 28 × 28 and 56 × 56 feature maps are suitable for detecting medium-sized pedestrian targets, and the 112 × 112 feature map is suitable for detecting small pedestrian targets, which reduces the miss rate for pedestrians. A sketch of one fusion step is given below.
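The following minimal sketch shows one fusion step. The patent does not spell out whether the fusion is channel concatenation or element-wise addition; this sketch assumes YOLOv3-style concatenation after a 1 × 1 reduction and 2× nearest-neighbour upsampling, and the channel numbers are only illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    """One top-down fusion step (steps B21-B24): reduce the coarser map with a
    1x1 convolution, upsample it 2x, and merge it with the finer backbone feature."""
    def __init__(self, top_ch, lateral_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(top_ch, out_ch // 2, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch // 2 + lateral_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, top, lateral):
        up = F.interpolate(self.reduce(top), scale_factor=2, mode="nearest")
        return self.smooth(torch.cat([up, lateral], dim=1))

# Example: build the 14 x 14 map from the 7 x 7 map and the fourth convolution feature.
fuse_14 = FPNFuse(top_ch=2048, lateral_ch=1024, out_ch=1024)
p7 = torch.randn(1, 2048, 7, 7)      # 7 x 7 feature map derived from the fifth convolution feature
c14 = torch.randn(1, 1024, 14, 14)   # fourth convolution feature (14 x 14 x 1024)
p14 = fuse_14(p7, c14)               # -> (1, 1024, 14, 14), the 14 x 14 scale output
```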
Step 3: optimize the YOLOv3 algorithm according to the feature extraction network Darknet-61 and the YOLO output layer.
Further, after the second downsampling of the feature extraction network Darknet-61, the information obtained by the 2 connected residual modules is fused with the upsampled information of the lower-level network, and a 112 × 112 feature map is output. Four residual modules and one downsampling convolution layer are added to perform the sixth downsampling; the downsampled features pass through the residual modules to produce convolution features, and a 7 × 7 feature map is finally output through the convolution modules. The output of the YOLO layer thus changes from the original three scales of feature maps to five, which improves the detection accuracy for both large and small pedestrian targets.
Step 4: obtain a plurality of target candidate boxes on the feature maps of 5 scales output by the optimized YOLOv3 algorithm with the k-means algorithm.
In order to determine the sizes of the candidate boxes, the best centroids are found with the k-means algorithm, taking both the best k value and the distance function into account. The specific steps are as follows (a sketch of this clustering is given after the list):
1) increase the value of k step by step, starting from three, to find the optimal k value;
2) take the found k as the starting point of clustering, take the best clustering result as the optimal centroids, and then iterate according to the clustering rule to obtain the optimal clustering result;
3) compute the distances from all samples to the centroids and assign each sample to the nearest set;
4) after all samples have been processed, recompute the centroid of each set;
5) re-assign all samples to sets according to the newly computed centroids;
6) repeat steps 4) and 5) until the newly computed centroids no longer change or the distance between two successive centroids reaches the expected threshold, at which point the algorithm terminates.
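A minimal sketch of this anchor clustering is given below; it assumes the ground-truth boxes are given as (width, height) pairs, uses the (1 - IOU)² distance described in the next paragraph, and simplifies the initialisation and stopping test.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (w, h) pairs, assuming all boxes share a common centre."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    box_areas = boxes[:, 0:1] * boxes[:, 1:2]                   # (N, 1)
    c_areas = (centroids[:, 0] * centroids[:, 1])[None, :]      # (1, K)
    return inter / (box_areas + c_areas - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs with distance d = (1 - IOU)^2 (sketch)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        dist = (1.0 - iou_wh(boxes, centroids)) ** 2     # distance of every box to every centroid
        assign = dist.argmin(axis=1)                     # assign each box to the nearest centroid
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):                  # stop when the centroids no longer move
            break
        centroids = new
    return centroids
```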
For the distance function, 15 boxes are first selected as candidate boxes at a detection point, and the IOU value is used as the distance function in the clustering to compute the centroids, the optimal distance function being (1 - IOU)². The IOU values obtained under this condition are better, which reduces the difficulty of box regression in later training. Because a single class of object is detected, the weight of the class prediction term in the loss function is reduced so that the network converges better; that is, the class error term is given a smaller weight to prevent it from interfering with the overall error, which optimizes the loss function. The loss function expression is as follows:
loss = λ_coord · Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} · [(x_i - x̂_i)² + (y_i - ŷ_i)²]
     + λ_coord · Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} · [(√w_i - √ŵ_i)² + (√h_i - √ĥ_i)²]
     + Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} · (C_i - Ĉ_i)²
     + λ_noobj · Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{noobj} · (C_i - Ĉ_i)²
     + Σ_{i=0..S²} 1_i^{obj} · Σ_{c ∈ classes} (p_i(c) - p̂_i(c))²

where S² is the number of grid cells, B is the number of boxes generated for each cell, C denotes the detectable classes, 1_{ij}^{obj} indicates whether the object falls in the j-th bounding box of cell i (0 or 1), λ_coord is the weight of the coordinate error, usually taken as 5, λ_noobj is the weight of the confidence error for boxes that contain no object, (x_i, y_i) are the coordinates of the centre point of the i-th candidate box, (w_i, h_i) are the width and height of the i-th candidate box, C_i is the confidence, p_i(c) is the class probability, and a hat denotes the predicted value.
The coordinate prediction in the first part of the formula is the loss for the position and size of the bounding box; the square roots of the width and height in the second part reduce the differences between bounding boxes of very different sizes. The third part of the formula states that if an object falls into the bounding box, the loss between the predicted confidence that the bounding box contains an object and the IOU of the real object with the bounding box is computed; if no object centre falls into the bounding box, the predicted confidence that it contains an object should be as small as possible. However, most bounding boxes contain no object, and the resulting accumulation of terms would bias the loss; therefore the fourth part of the formula is given the weight λ_noobj = 0.5. In the fifth part of the formula, if an object is contained in the candidate box, the closer the predicted probability of the correct class is to 1 and the closer the probability of a wrong class is to 0, the better. A sketch of this loss follows.
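The NumPy sketch below mirrors the five terms of this loss for a single image; the array layout, the class-term weight lambda_cls and the way targets are assigned are illustrative assumptions rather than the exact implementation of the method.

```python
import numpy as np

def yolo_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5, lambda_cls=0.1):
    """Five-term loss sketch for one image on an S x S grid with B boxes per cell.

    pred / target are dicts of arrays:
        'xy'   : (S, S, B, 2) centre offsets      'wh'  : (S, S, B, 2) widths/heights (>= 0)
        'conf' : (S, S, B)    objectness          'cls' : (S, S, C)    class probabilities
    target['obj'] : (S, S, B), 1 if a ground truth is assigned to that box, else 0.
    lambda_cls down-weights the class term, since only the pedestrian class is detected.
    """
    obj = target["obj"]                     # indicator 1_ij^obj
    noobj = 1.0 - obj

    # localisation terms, only for boxes responsible for an object
    xy = lambda_coord * np.sum(obj[..., None] * (pred["xy"] - target["xy"]) ** 2)
    wh = lambda_coord * np.sum(obj[..., None] *
                               (np.sqrt(pred["wh"]) - np.sqrt(target["wh"])) ** 2)

    # confidence terms for object and no-object boxes
    conf_obj = np.sum(obj * (pred["conf"] - target["conf"]) ** 2)
    conf_noobj = lambda_noobj * np.sum(noobj * (pred["conf"] - target["conf"]) ** 2)

    # class term, evaluated per cell and down-weighted
    cell_obj = np.clip(obj.sum(axis=-1), 0.0, 1.0)      # 1 if any object falls in the cell
    cls = lambda_cls * np.sum(cell_obj[..., None] * (pred["cls"] - target["cls"]) ** 2)

    return xy + wh + conf_obj + conf_noobj + cls
```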
Step 5: apply the NMS (non-maximum suppression) method to the candidate boxes, select the target candidate box with the largest IOU among the target candidate boxes on the feature maps, and predict the pedestrian target according to the selected candidate box. The specific steps are as follows:
1) The extracted feature maps at 5 scales are fed into the YOLO network for detection. The maximum number of iterations set in this method is 4000, batch_size is set to 64, subdivisions to 16, decay to 0.0005, momentum to 0.9, and the initial learning rate to 0.001; the learning rate can be adjusted appropriately according to the downward trend of the loss. Training stops when the loss function value on the training data set is less than or equal to the threshold or the set maximum number of iterations is reached, which yields the trained improved network.
2) The optimal target bounding box is selected by non-maximum suppression: the candidate boxes are sorted according to their confidence values, the IOU values between the candidate boxes and the real target box are computed to form an IOU queue, the bounding box with the largest IOU value is selected to generate the prediction box, and finally the coordinates of the prediction box are mapped back to the original image to output the prediction result. A sketch of this suppression step is given below.
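A minimal sketch of the suppression step follows; the (x1, y1, x2, y2) box format and the IOU threshold of 0.45 are assumptions.

```python
import numpy as np

def iou_one_vs_many(box, boxes):
    """IOU of one box against an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop overlapping ones."""
    order = scores.argsort()[::-1]          # sort candidate boxes by confidence, descending
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[iou_one_vs_many(boxes[best], boxes[rest]) < iou_thresh]
    return keep
```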
In order to increase the detection speed of the pedestrian detection system, the feature extraction network Darknet-61 and the FPN module in this embodiment run on a computer equipped with an NVIDIA GTX 1080Ti GPU and the Ubuntu 16.04 system, so that real-time detection can be achieved.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (5)

1. A pedestrian detection method based on multilayer convolution feature fusion, characterized by comprising the following steps:
step 1: constructing the residual network of the feature extraction network Darknet and merging the parameters of the BN layer in each residual basic unit into its convolution layer; constructing the feature extraction network from the constructed residual network and denoting it the feature extraction network Darknet-61;
step 2: constructing a feature pyramid network and fusing the 5 convolution features of the image, obtained by the feature extraction network Darknet-61 through 6 downsampling operations, with the 7 × 7, 14 × 14, 28 × 28 and 56 × 56 scale information output by YOLO, so that the YOLO output layer of the YOLOv3 algorithm outputs feature maps at 5 scales, the 5 scales comprising: 7 × 7, 14 × 14, 28 × 28, 56 × 56 and 112 × 112;
step 3: optimizing the YOLOv3 algorithm according to the feature extraction network Darknet-61 and the YOLO output layer;
step 4: obtaining a plurality of target candidate boxes on the feature maps of 5 scales output by the optimized YOLOv3 algorithm with the k-means algorithm;
step 5: applying the NMS (non-maximum suppression) method to the candidate boxes, selecting the target candidate box with the largest IOU among the target candidate boxes on the feature maps, and predicting the pedestrian target according to the selected target candidate box.
2. The pedestrian detection method with multilayer convolution feature fusion according to claim 1, characterized in that in step 1 the parameters of the BN layer in each residual network basic unit are merged into its convolution layer as follows:

W_merged = γ · W / √(σ² + ε)

B_merged = γ · (B - μ) / √(σ² + ε) + β

where W_merged is the merged convolution weight; W is the convolution weight; B_merged is the merged convolution bias; B is the convolution bias; μ is the mean; σ² is the variance; γ is the scaling factor; β is the offset; and ε is a small constant.
3. The pedestrian detection method with multilayer convolution feature fusion according to claim 1 or 2, characterized in that the feature extraction network Darknet-61 in step 2 obtains the 5 convolution features of the image through six downsampling operations, specifically as follows:
step A21: taking an image of size 448 × 448 as the network input of Darknet-61 and performing the first downsampling;
step A22: performing the second downsampling, extracting features from the second downsampling result with 2 of the residual modules constructed in step 1, and outputting the first convolution feature of size 112 × 112 × 128;
step A23: performing the third downsampling, extracting features from the third downsampling result with 8 of the residual modules constructed in step 1, and outputting the second convolution feature of size 56 × 56 × 256;
step A24: performing the fourth downsampling, extracting features from the fourth downsampling result with 512-channel convolutions, and outputting the third convolution feature of size 28 × 28 × 512;
step A25: performing the fifth downsampling, extracting features from the fifth downsampling result with 4 of the residual modules constructed in step 1, and outputting the fourth convolution feature of size 14 × 14 × 1024;
step A26: performing the sixth downsampling, extracting features from the sixth downsampling result with 4 of the residual modules constructed in step 1, and outputting the fifth convolution feature of size 7 × 7 × 2028.
4. The pedestrian detection method with multilayer convolution feature fusion according to claim 3, characterized in that the specific steps of step 2 are as follows:
step B21: the feature extraction network Darknet-61 obtains the 5 convolution features of the image through six downsampling operations, and the fifth convolution feature is convolved to obtain the feature map at the 7 × 7 scale;
a feature pyramid network is constructed, and the 7 × 7 scale feature map is fused with the fourth convolution feature through the feature pyramid network to obtain the feature map at the 14 × 14 scale;
step B22: the 14 × 14 scale feature map is fused with the third convolution feature through the feature pyramid network to obtain the feature map at the 28 × 28 scale;
step B23: the 28 × 28 scale feature map is fused with the second convolution feature through the feature pyramid network to obtain the feature map at the 56 × 56 scale;
step B24: the 56 × 56 scale feature map is fused with the first convolution feature through the feature pyramid network to obtain the feature map at the 112 × 112 scale.
5. The pedestrian detection method with multilayer convolution feature fusion according to claim 1, characterized in that the number of target candidate boxes obtained in step 4 is 3.
Application CN202011409937.6A (filed 2020-12-04, priority date 2020-12-04): Pedestrian detection method based on multilayer convolution feature fusion. Status: Pending. Publication: CN112507861A.

Priority Applications (1)

CN202011409937.6A | Priority date: 2020-12-04 | Filing date: 2020-12-04 | Title: Pedestrian detection method based on multilayer convolution feature fusion

Applications Claiming Priority (1)

CN202011409937.6A | Priority date: 2020-12-04 | Filing date: 2020-12-04 | Title: Pedestrian detection method based on multilayer convolution feature fusion

Publications (1)

CN112507861A | Publication date: 2021-03-16

Family

ID=74971783

Family Applications (1)

CN202011409937.6A (Pending, CN112507861A) | Priority date: 2020-12-04 | Filing date: 2020-12-04 | Title: Pedestrian detection method based on multilayer convolution feature fusion

Country Status (1)

Country: CN | Publication: CN112507861A

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368850A (en) * 2018-12-25 2020-07-03 展讯通信(天津)有限公司 Image feature extraction method, image target detection method, image feature extraction device, image target detection device, convolution device, CNN network device and terminal
CN110660052A (en) * 2019-09-23 2020-01-07 武汉科技大学 Hot-rolled strip steel surface defect detection method based on deep learning
CN111563525A (en) * 2020-03-25 2020-08-21 北京航空航天大学 Moving target detection method based on YOLOv3-Tiny
CN111461083A (en) * 2020-05-26 2020-07-28 青岛大学 Rapid vehicle detection method based on deep learning
CN111898432A (en) * 2020-06-24 2020-11-06 南京理工大学 Pedestrian detection system and method based on improved YOLOv3 algorithm

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516012A (en) * 2021-04-09 2021-10-19 湖北工业大学 Pedestrian re-identification method and system based on multi-level feature fusion
CN113516012B (en) * 2021-04-09 2022-04-15 湖北工业大学 Pedestrian re-identification method and system based on multi-level feature fusion
CN113139615A (en) * 2021-05-08 2021-07-20 北京联合大学 Unmanned environment target detection method based on embedded equipment
CN113111979A (en) * 2021-06-16 2021-07-13 上海齐感电子信息科技有限公司 Model training method, image detection method and detection device
CN113111979B (en) * 2021-06-16 2021-09-07 上海齐感电子信息科技有限公司 Model training method, image detection method and detection device
CN113537014A (en) * 2021-07-06 2021-10-22 北京观微科技有限公司 Improved darknet network-based ground-to-air missile position target detection and identification method
CN113837199A (en) * 2021-08-30 2021-12-24 武汉理工大学 Image feature extraction method based on cross-layer residual error double-path pyramid network
CN113837199B (en) * 2021-08-30 2024-01-09 武汉理工大学 Image feature extraction method based on cross-layer residual double-path pyramid network
CN113792660A (en) * 2021-09-15 2021-12-14 江苏科技大学 Pedestrian detection method, system, medium and equipment based on improved YOLOv3 network
CN113792660B (en) * 2021-09-15 2024-03-01 江苏科技大学 Pedestrian detection method, system, medium and equipment based on improved YOLOv3 network

Legal Events

PB01 - Publication
SE01 - Entry into force of request for substantive examination