CN113033371A - CSP model-based multi-level feature fusion pedestrian detection method - Google Patents

CSP model-based multi-level feature fusion pedestrian detection method

Info

Publication number
CN113033371A
CN113033371A (application CN202110295911.1A)
Authority
CN
China
Prior art keywords: target, stage, network, size, center point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110295911.1A
Other languages
Chinese (zh)
Inventor
宦若虹
谢超杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110295911.1A
Publication of CN113033371A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

A CSP model-based multi-level feature fusion pedestrian detection method adopts the CSP framework: a CNN extracts pedestrian features, and the network then splits into 3 branches that respectively predict the target center point, the target height and the center-point offset. After image preprocessing, PyConvResNet-101 is used as the feature extraction network to extract feature maps from the input image; the feature maps obtained at different stages are fused in multiple levels to obtain a final feature map, which is sent to the prediction network. The prediction network is trained with Focal Loss and Smooth L1. Target detection boxes are generated from the predicted target center point, target height and center-point offset, and redundant detection boxes are removed with a non-maximum suppression algorithm to obtain the final detection result. The invention can fully fuse the rich semantic information of high-level feature maps with the rich positional information of low-level feature maps, and effectively reduces false detections and missed detections for small targets and severe occlusion.

Description

CSP model-based multi-level feature fusion pedestrian detection method
Technical Field
The invention relates to the fields of computer vision and target detection, and in particular to a video-oriented pedestrian detection method.
Background
Computer vision has long been a research hotspot and challenge in computer science, and pedestrian detection, as a subtask of target detection, has become a very important research problem in the field of computer vision. Convolutional Neural Networks (CNNs) have shown great power in computer vision and object detection in recent years, and the development of many CNN-based general object detection methods has promoted research and application in pedestrian detection. However, pedestrian detection technology still has considerable room for improvement. The main problem is that the feature information of small targets and severely occluded targets is difficult to extract, resulting in missed detections and false detections. CSP (Center and Scale Prediction) is a pedestrian detection algorithm proposed in 2019 that learns pedestrian features through a CNN, predicts the center-point coordinates and size information of a pedestrian target, and thereby completes the pedestrian detection task.
Disclosure of Invention
Aiming at the problem of false detections and missed detections caused by small targets and severe occlusion in pedestrian detection, the invention provides a CSP model-based multi-level feature fusion pedestrian detection method, which can fully fuse the rich semantic information of high-level feature maps with the rich positional information of low-level feature maps, and effectively reduces false detections and missed detections for small targets and severe occlusion.
A multi-level feature fusion pedestrian detection method based on a CSP model comprises the following steps:
step 1, adopting the CSP (Center and Scale Prediction) framework, using a CNN to extract pedestrian features, and dividing the network into 3 branches that respectively predict a target center point, a target height and a center-point offset; in the training stage, preprocessing a training image and then inputting it into the network, wherein the preprocessing comprises resizing the image to a set size, random cropping and brightness adjustment; using PyConvResNet-101 as the feature extraction network to extract the pedestrian features, and performing multi-level fusion on the 4 feature maps obtained from the second, third, fourth and fifth stages of the PyConvResNet-101 network to obtain a final feature map, the number of channels of the final feature map being 1024; using random-erasing data augmentation to expand the training data, and training the target center point, the target height and the center-point offset with Focal Loss and Smooth L1;
step 2, sending the obtained final feature map into a subsequent prediction network; the prediction network first adjusts the number of channels of the final feature map to 2^n using a 3 × 3 convolution, where n is a positive integer and 3 ≤ n ≤ 9, then predicts the target center point, the target height and the center-point offset using two 1 × 1 convolutions and one 2 × 2 convolution respectively, generates target detection boxes, and removes redundant detection boxes with a non-maximum suppression algorithm to obtain the final detection result;
step 3, in the testing stage, resizing the test image to a specific size and inputting it into the network, performing multi-level fusion on the obtained feature maps and sending the result into the prediction network; the prediction network outputs the target center point, the target height and the target center-point offset, and the target width is obtained by multiplying the target height by a coefficient.
Further, the process of step 1 is as follows: the last feature maps of stage two, stage three, stage four and stage five of the PyConvResNet-101 network, denoted p2, p3, p4 and p5, are used for multi-level fusion, where p2, p3, p4 and p5 are downsampled by 4, 8, 16 and 32 times, respectively, relative to the width and height of the input image, and the multi-level fusion comprises the following steps:
1.1) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p5 by a factor of 2 and concatenate the result with p4 along the channel dimension to obtain p4_l1; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p4 by a factor of 2 and concatenate with p3 along the channel dimension to obtain p3_l1; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p3 by a factor of 2 and concatenate with p2 along the channel dimension to obtain p2_l1;
1.2) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample the feature map p4_l1 obtained in 1.1) by a factor of 2 and concatenate it with the feature map p3_l1 obtained in 1.1) along the channel dimension to obtain p3_l2; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p3_l1 by a factor of 2 and concatenate it with the feature map p2_l1 obtained in 1.1) along the channel dimension to obtain p2_l2;
1.3) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample the feature map p3_l2 obtained in 1.2) by a factor of 2 and concatenate it with the feature map p2_l2 obtained in 1.2) along the channel dimension to obtain the final feature map p_out, which is sent to the subsequent prediction network.
Preferably, in step 3, the coefficient is 0.41.
The invention has the beneficial effects that: it adopts the CSP model architecture, uses PyConvResNet-101 as the feature extraction network, and performs multi-level fusion on the 4 feature maps output by the feature extraction network; it can thereby fully fuse the rich semantic information of high-level feature maps with the rich positional information of low-level feature maps, and effectively reduce false detections and missed detections for small targets and severe occlusion.
Drawings
Fig. 1 is a flow chart of a multilevel feature fusion pedestrian detection method based on a CSP model according to the present invention.
FIG. 2 is a CSP model architecture diagram.
FIG. 3 is a schematic diagram of pyramid convolution.
Fig. 4 is a structure diagram of a multilevel feature fusion pedestrian detection method based on a CSP model.
Fig. 5 is a comparison of the CSP model-based multi-level feature fusion pedestrian detection method with other pedestrian detection techniques on the Caltech dataset, where (a) shows the Reasonable subset, (b) the Heavy subset, (c) the Medium subset, (d) the Near subset, and (e) the All subset.
Detailed Description
The invention is further illustrated by the following figures and examples.
Referring to fig. 1 to 5, a multilevel feature fusion pedestrian detection method based on a CSP model includes the following steps:
Step 1, referring to fig. 2, the CSP framework is adopted: a CNN extracts the pedestrian features, and the network then splits into 3 branches that respectively predict the target center point, the target height and the center-point offset. In the training stage, training images are preprocessed before being input into the network; the preprocessing includes resizing the image to a set size, random cropping and brightness adjustment. PyConvResNet-101 is used as the feature extraction network to extract pedestrian features, and the 4 feature maps obtained from the second, third, fourth and fifth stages of the PyConvResNet-101 network are fused in multiple levels to obtain the final feature map, whose number of channels is 1024. Random-erasing data augmentation is used to expand the training data, and the target center point, the target height and the center-point offset are trained with Focal Loss and Smooth L1.
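For illustration only, the preprocessing described above might look roughly like the following torchvision pipeline; the resize and crop resolutions and the brightness-jitter strength are assumptions rather than values from the patent, and the pedestrian box annotations would have to be rescaled and cropped consistently (omitted here):

```python
from torchvision import transforms

# Illustrative preprocessing sketch only; sizes and jitter strength are assumed.
# For detection training, bounding boxes must be transformed with the same
# parameters, which this sketch leaves out.
train_transform = transforms.Compose([
    transforms.Resize((336, 448)),           # resize the image to a set size
    transforms.RandomCrop((320, 416)),       # random crop
    transforms.ColorJitter(brightness=0.3),  # brightness adjustment
    transforms.ToTensor(),
])
```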
Each convolution layer of PyConvResNet-101 contains a multi-level pyramid of kernels, with a different type of convolution kernel at each level (see fig. 3). Pyramid convolution can process the input with convolution kernels of multiple scales without increasing the computational burden or the model complexity. Kernels at different pyramid levels have different sizes and channel counts: from the bottom level to the top level the kernel size gradually increases while the number of kernel channels gradually decreases. To match the channel counts of the kernels at different pyramid levels, grouped convolution is applied to the input feature map.
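As a rough illustration (not the patented implementation), a pyramid convolution layer of this kind could be sketched in PyTorch as follows; the specific kernel sizes, group counts and equal per-branch channel split are assumptions chosen for readability:

```python
import torch
import torch.nn as nn

class PyConv(nn.Module):
    """Pyramid convolution sketch: parallel grouped convolutions with increasing
    kernel size, whose outputs are concatenated along the channel dimension.
    The kernel sizes and group counts below are illustrative assumptions."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7, 9), groups=(1, 4, 8, 16)):
        super().__init__()
        assert out_ch % len(kernel_sizes) == 0
        branch_ch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2, groups=g, bias=False)
            for k, g in zip(kernel_sizes, groups)
        ])

    def forward(self, x):
        # Grouped convolution keeps the larger kernels cheap, so the overall cost
        # stays close to that of a single standard 3x3 convolution.
        return torch.cat([b(x) for b in self.branches], dim=1)

# Example: PyConv(64, 64) could stand in for the 3x3 convolution of a ResNet bottleneck.
```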
The specific steps of random erasing are as follows: for an image I in a batch, the probability of applying random erasing to it is set to 0.5. For an image of width W and height H, the image area is S = W × H. An erasing region of area S_e is initialized at random, with the area ratio S_e / S drawn from a preset interval and the aspect ratio of the erasing region set to r_e ∈ [0.3, 3.3]. The height and width of the erasing region are then

H_e = sqrt(S_e × r_e),  W_e = sqrt(S_e / r_e).

Next, a point a = (x_e, y_e) is initialized at random on the image I. If x_e + W_e ≤ W and y_e + H_e ≤ H, the region I_e = (x_e, y_e, x_e + W_e, y_e + H_e) is taken as the randomly erased region; otherwise the above steps are repeated until a qualifying I_e is found. Each pixel inside the region I_e is assigned a random value in [0, 255].
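A minimal sketch of this random-erasing step is given below; the sampling interval for the area ratio S_e / S (0.02 to 0.4 here) is an assumption, since that value appears only as an image in the original text:

```python
import math
import random
import numpy as np

def random_erase(img, p=0.5, s_range=(0.02, 0.4), r_range=(0.3, 3.3), max_tries=100):
    """Randomly erase a rectangle of `img` (H x W or H x W x C uint8 array) and
    fill it with random values. The erase probability and aspect-ratio range
    follow the description above; `s_range` is an assumed area-ratio interval."""
    if random.random() > p:
        return img
    h, w = img.shape[:2]
    area = h * w
    for _ in range(max_tries):
        s_e = random.uniform(*s_range) * area        # erased area S_e
        r_e = random.uniform(*r_range)               # aspect ratio H_e / W_e
        h_e = int(round(math.sqrt(s_e * r_e)))
        w_e = int(round(math.sqrt(s_e / r_e)))
        x_e, y_e = random.randint(0, w), random.randint(0, h)
        if x_e + w_e <= w and y_e + h_e <= h:        # region must fit inside the image
            region = img[y_e:y_e + h_e, x_e:x_e + w_e]
            img[y_e:y_e + h_e, x_e:x_e + w_e] = np.random.randint(0, 256, size=region.shape)
            return img
    return img
```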
Referring to fig. 4, the last feature maps of stage two, stage three, stage four and stage five of the PyConvResNet-101 network, denoted p2, p3, p4 and p5, are used for multi-level fusion; p2, p3, p4 and p5 are downsampled by 4, 8, 16 and 32 times, respectively, relative to the width and height of the input image. The fusion proceeds as follows (a code sketch is given after step 1.3):
1.1) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p5 by a factor of 2 and concatenate the result with p4 along the channel dimension to obtain p4_l1; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p4 by a factor of 2 and concatenate with p3 along the channel dimension to obtain p3_l1; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p3 by a factor of 2 and concatenate with p2 along the channel dimension to obtain p2_l1.
1.2) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample the feature map p4_l1 obtained in step 1.1 by a factor of 2 and concatenate it with the feature map p3_l1 obtained in step 1.1 along the channel dimension to obtain p3_l2; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p3_l1 by a factor of 2 and concatenate it with the feature map p2_l1 obtained in step 1.1 along the channel dimension to obtain p2_l2.
1.3) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample the feature map p3_l2 obtained in step 1.2 by a factor of 2 and concatenate it with the feature map p2_l2 obtained in step 1.2 along the channel dimension to obtain the final feature map p_out, which is sent to the subsequent prediction network.
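The following PyTorch sketch shows one way to realise steps 1.1) to 1.3). The output channel count of each deconvolution (256) is an assumption; with ResNet-style stage widths of 256/512/1024/2048 channels it happens to yield the 1024-channel fused map stated above.

```python
import torch
import torch.nn as nn

class MultiLevelFusion(nn.Module):
    """Sketch of the three-round fusion of p2..p5. Every deconvolution uses
    kernel 4, stride 2, padding 1 (2x spatial upsampling); its output channel
    count `up_ch` is an assumed design choice."""
    def __init__(self, chs=(256, 512, 1024, 2048), up_ch=256):
        super().__init__()
        c2, c3, c4, c5 = chs
        up = lambda c_in: nn.ConvTranspose2d(c_in, up_ch, 4, stride=2, padding=1)
        self.u5, self.u4, self.u3 = up(c5), up(c4), up(c3)        # round 1
        self.u4_l1, self.u3_l1 = up(up_ch + c4), up(up_ch + c3)   # round 2
        self.u3_l2 = up(up_ch + up_ch + c3)                       # round 3

    def forward(self, p2, p3, p4, p5):
        cat = lambda a, b: torch.cat([a, b], dim=1)
        p4_l1 = cat(self.u5(p5), p4)            # stride 16
        p3_l1 = cat(self.u4(p4), p3)            # stride 8
        p2_l1 = cat(self.u3(p3), p2)            # stride 4
        p3_l2 = cat(self.u4_l1(p4_l1), p3_l1)   # stride 8
        p2_l2 = cat(self.u3_l1(p3_l1), p2_l1)   # stride 4
        return cat(self.u3_l2(p3_l2), p2_l2)    # stride 4; 1024 channels with these widths
```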
Focal Loss and Smooth L1 are used as loss functions. Predicting the target center point is a binary classification problem: for each position of the feature map, decide whether a target center point exists there; if so, the position is a positive sample, otherwise a negative sample. Because the negative samples around a positive sample are very close to the center point and easily disturb training, a two-dimensional Gaussian mask is placed at each positive sample point during training:

M_ij = max_{k=1..K} exp( -[ (i - x_k)^2 / (2 σ_{w,k}^2) + (j - y_k)^2 / (2 σ_{h,k}^2) ] )

where K is the number of targets in the picture and (x_k, y_k, w_k, h_k) are the center point, width and height of the k-th target. The variances σ_{w,k} and σ_{h,k} of the Gaussian mask are proportional to the width and height of the corresponding target, respectively. Where the Gaussian masks of two targets overlap, the larger of the two values is taken.
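A sketch of generating this mask is shown below; the proportionality constant relating the standard deviations to the target width and height is an assumption, since the text only states proportionality:

```python
import numpy as np

def gaussian_mask(shape, targets, alpha=0.1):
    """Build the per-pixel Gaussian mask M on an (H, W) feature map.
    `targets` is a list of (x_k, y_k, w_k, h_k) in feature-map coordinates;
    sigma_w = alpha * w_k and sigma_h = alpha * h_k use an assumed constant.
    Overlapping masks keep the maximum value."""
    H, W = shape
    mask = np.zeros((H, W), dtype=np.float32)
    ys, xs = np.mgrid[0:H, 0:W]
    for x_k, y_k, w_k, h_k in targets:
        sw, sh = alpha * w_k, alpha * h_k
        g = np.exp(-(((xs - x_k) ** 2) / (2 * sw ** 2) +
                     ((ys - y_k) ** 2) / (2 * sh ** 2)))
        mask = np.maximum(mask, g)   # take the larger value where masks overlap
    return mask
```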
For the prediction of the target center point, Focal Loss is adopted:

L_center = -(1/K) Σ_{i,j} α_ij (1 - p̂_ij)^γ log(p̂_ij)

with

p̂_ij = p_ij if y_ij = 1, and p̂_ij = 1 - p_ij otherwise;
α_ij = 1 if y_ij = 1, and α_ij = (1 - M_ij)^β otherwise.

Here p_ij ∈ [0, 1] is the probability, estimated by the network, that a target center exists at location (i, j), and y_ij ∈ {0, 1} is the ground-truth label, y_ij = 1 denoting a positive sample. β and γ are hyper-parameters, set to 4 and 2 respectively.
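A minimal sketch of this center-point classification loss, consistent with the formula above (the per-pixel form follows the original CSP algorithm and is an assumption here):

```python
import torch

def center_focal_loss(p, y, M, beta=4.0, gamma=2.0, eps=1e-6):
    """Center-point classification loss. p, y and M are (H, W) tensors holding
    the predicted probability, the 0/1 ground-truth label and the Gaussian mask.
    Positives are weighted by 1, negatives by (1 - M)^beta, and the loss is
    normalised by the number of positives K."""
    pos = y.eq(1)
    p = p.clamp(eps, 1 - eps)
    pos_loss = ((1 - p) ** gamma) * torch.log(p)          # positive-sample term
    neg_loss = ((1 - M) ** beta) * (p ** gamma) * torch.log(1 - p)  # negative-sample term
    K = pos.sum().clamp(min=1).float()
    return -(pos_loss[pos].sum() + neg_loss[~pos].sum()) / K
```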
Predicting the target height and the center-point offset is a regression problem, for which Smooth L1 is used:

L_scale = (1/K) Σ_{k=1}^{K} SmoothL1(s_k, t_k)

where s_k and t_k are, for each positive sample, the value predicted by the network and the corresponding ground-truth value; the offset loss L_offset has the same form.
The total loss function is a weighted sum of the three branch loss functions:

L = λ_c L_center + λ_s L_scale + λ_o L_offset

where λ_c, λ_s and λ_o are the weight coefficients of the target center-point classification loss, the scale regression loss and the offset regression loss, set to 0.01, 1 and 0.1 respectively.
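A small sketch of the regression losses and their weighted combination, using the branch weights given above:

```python
import torch.nn.functional as F

def scale_offset_loss(pred, target):
    """Smooth L1 averaged over the positive samples; `pred` and `target` hold
    the predicted and ground-truth values (e.g. log-heights or center-point
    offsets) for the K positive positions."""
    return F.smooth_l1_loss(pred, target, reduction='mean')

def total_loss(l_center, l_scale, l_offset, lam_c=0.01, lam_s=1.0, lam_o=0.1):
    """Weighted sum of the three branch losses with weights 0.01, 1 and 0.1."""
    return lam_c * l_center + lam_s * l_scale + lam_o * l_offset
```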
Step 2, the obtained final feature map is sent to the subsequent prediction network. The prediction network first uses a convolution layer with a 3 × 3 kernel, stride 1 and padding 1 to adjust the number of channels of the input feature map to 2^n, where n is a positive integer and 3 ≤ n ≤ 9 (n = 8 in this embodiment), and then uses convolution kernels of 1 × 1, 1 × 1 and 2 × 2 respectively to predict the target center point, the target height and the target center-point offset.
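The prediction network of step 2 might be sketched as follows; the activation after the 3 × 3 convolution and the asymmetric padding before the 2 × 2 offset convolution (used so that all three outputs keep the same spatial size) are assumptions:

```python
import torch
import torch.nn as nn

class CSPHead(nn.Module):
    """Prediction-network sketch: a 3x3 conv reduces the fused map to 2**n
    channels (n = 8 -> 256 in the embodiment), then three small convolutions
    predict the center heatmap, the height map and the center-point offset."""
    def __init__(self, in_ch=1024, mid_ch=256):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, stride=1, padding=1),
            nn.ReLU(inplace=True))                              # activation assumed
        self.center = nn.Conv2d(mid_ch, 1, 1)                   # 1x1: center heatmap
        self.height = nn.Conv2d(mid_ch, 1, 1)                   # 1x1: target height
        self.offset = nn.Sequential(nn.ZeroPad2d((0, 1, 0, 1)), # keep spatial size (assumed)
                                    nn.Conv2d(mid_ch, 2, 2))    # 2x2: (dx, dy) offset

    def forward(self, x):
        x = self.reduce(x)
        return torch.sigmoid(self.center(x)), self.height(x), self.offset(x)
```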
Step 3, in the testing stage, the test image is resized to a specific size and input into the network; the obtained feature maps are fused in multiple levels and sent to the prediction network, which outputs the target center point, the target height and the target center-point offset, and the target width is obtained by multiplying the target height by the coefficient 0.41. The pedestrian prediction boxes are obtained by parsing these outputs, and finally a non-maximum suppression algorithm removes redundant prediction boxes to yield the final pedestrian detection boxes.
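A sketch of turning the three output maps into final pedestrian boxes follows. The 0.41 width ratio and the NMS step are stated in the text; the score threshold, the log-space height encoding and the 0.5-pixel center alignment follow the original CSP algorithm and are assumptions here.

```python
import numpy as np

def decode_detections(center, height, offset, stride=4, thr=0.5, ratio=0.41):
    """Turn prediction maps into boxes (x1, y1, x2, y2, score). `center` and
    `height` are (H, W) arrays, `offset` is (2, H, W); height is assumed to be
    predicted in log space."""
    boxes = []
    ys, xs = np.where(center > thr)
    for y, x in zip(ys, xs):
        h = np.exp(height[y, x]) * stride               # target height in pixels
        w = ratio * h                                    # width = 0.41 * height
        cx = (x + offset[0, y, x] + 0.5) * stride        # offset-corrected center
        cy = (y + offset[1, y, x] + 0.5) * stride
        boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, center[y, x]])
    return nms(np.array(boxes), iou_thr=0.5)             # remove redundant boxes

def nms(boxes, iou_thr=0.5):
    """Greedy non-maximum suppression on (x1, y1, x2, y2, score) rows."""
    if len(boxes) == 0:
        return boxes
    order = boxes[:, 4].argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou <= iou_thr]
    return boxes[keep]
```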
The CSP model-based multi-level feature fusion pedestrian detection method was trained on the CityPersons training set and the Caltech training set respectively, and tested on the CityPersons validation set and the Caltech test set; the evaluation metric is the average log miss rate. As shown in Table 1, Table 2 and Fig. 5, compared with the CSP algorithm, the method of the present invention improves by 0.8%, 3.1%, 1.0%, 0.1%, 1.8% and 1.0% on the CityPersons validation subsets (including the Reasonable, Heavy, Partial, Bare and Large subsets), and by 0.4%, 10.5% and 4.8% on the Reasonable, Heavy and All subsets of the Caltech test set, respectively. It also compares favourably with existing pedestrian detection techniques. The experimental results show that the method effectively improves the detection performance of the CSP algorithm on small targets and severely occluded targets.
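For reference, the average log miss rate (assumed here to be the standard Caltech-style MR^-2 metric) is typically computed as the mean, in log space, of the miss rate sampled at nine false-positives-per-image (FPPI) points spaced logarithmically between 0.01 and 1:

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Log-average miss rate. `fppi` and `miss_rate` are aligned arrays from a
    detector's miss-rate/FPPI curve, with `fppi` sorted in ascending order.
    The nine-point protocol over [1e-2, 1] is the usual Caltech/CityPersons
    convention and is assumed here."""
    ref = np.logspace(-2.0, 0.0, num_points)
    samples = []
    for r in ref:
        idx = np.where(fppi <= r)[0]
        samples.append(miss_rate[idx[-1]] if len(idx) else 1.0)  # worst case if no point qualifies
    return np.exp(np.mean(np.log(np.maximum(samples, 1e-10))))
```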
Table 1 shows the average log miss rate of each subset on the CityPersons validation set; Table 2 shows the average log miss rate of each subset on the Caltech test set. (Both tables appear only as images in the original publication, so their values are not reproduced here.)

Claims (3)

1. A CSP model-based multi-level feature fusion pedestrian detection method is characterized by comprising the following steps:
step 1, adopting the CSP (Center and Scale Prediction) framework, using a CNN to extract pedestrian features, and dividing the network into 3 branches that respectively predict a target center point, a target height and a center-point offset; in a training stage, preprocessing a training image and then inputting it into the network, wherein the preprocessing comprises resizing the image to a set size, random cropping and brightness adjustment; using PyConvResNet-101 as a feature extraction network to extract the pedestrian features, and performing multi-level fusion on the 4 feature maps obtained from the second, third, fourth and fifth stages of the PyConvResNet-101 network to obtain a final feature map, the number of channels of the final feature map being 1024; using random-erasing data augmentation to expand the training data, and training the target center point, the target height and the center-point offset with Focal Loss and Smooth L1;
step 2, sending the obtained final feature map into a subsequent prediction network; the prediction network first adjusts the number of channels of the final feature map to 2^n using a 3 × 3 convolution, where n is a positive integer and 3 ≤ n ≤ 9, then predicts the target center point, the target height and the center-point offset using two 1 × 1 convolutions and one 2 × 2 convolution respectively, generates target detection boxes, and removes redundant detection boxes with a non-maximum suppression algorithm to obtain a final detection result;
step 3, in a testing stage, resizing the test image to a specific size and inputting it into the network, performing multi-level fusion on the obtained feature maps and sending the result into the prediction network; the prediction network outputs the target center point, the target height and the target center-point offset, and the target width is obtained by multiplying the target height by a coefficient.
2. The CSP model-based multi-level feature fusion pedestrian detection method according to claim 1, characterized in that the process of step 1 is as follows: the last feature maps of stage two, stage three, stage four and stage five of the PyConvResNet-101 network, denoted p2, p3, p4 and p5, are used for multi-level fusion, where p2, p3, p4 and p5 are downsampled by 4, 8, 16 and 32 times, respectively, relative to the width and height of the input image, and the multi-level fusion comprises the following steps:
1.1) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p5 by a factor of 2 and concatenate the result with p4 along the channel dimension to obtain p4_l1; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p4 by a factor of 2 and concatenate with p3 along the channel dimension to obtain p3_l1; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p3 by a factor of 2 and concatenate with p2 along the channel dimension to obtain p2_l1;
1.2) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample the feature map p4_l1 obtained in 1.1) by a factor of 2 and concatenate it with the feature map p3_l1 obtained in 1.1) along the channel dimension to obtain p3_l2; use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample p3_l1 by a factor of 2 and concatenate it with the feature map p2_l1 obtained in 1.1) along the channel dimension to obtain p2_l2;
1.3) Use a deconvolution with kernel size 4 × 4, stride 2 and padding 1 to upsample the feature map p3_l2 obtained in 1.2) by a factor of 2 and concatenate it with the feature map p2_l2 obtained in 1.2) along the channel dimension to obtain the final feature map p_out, which is sent to the subsequent prediction network.
3. The CSP model-based multi-level feature fusion pedestrian detection method according to claim 1, characterized in that: in step 3, the coefficient is 0.41.
CN202110295911.1A (priority date 2021-03-19, filing date 2021-03-19): CSP model-based multi-level feature fusion pedestrian detection method. Status: Pending. Published as CN113033371A.

Priority Applications (1)

CN202110295911.1A (published as CN113033371A), priority date 2021-03-19, filing date 2021-03-19: CSP model-based multi-level feature fusion pedestrian detection method.

Applications Claiming Priority (1)

CN202110295911.1A (published as CN113033371A), priority date 2021-03-19, filing date 2021-03-19: CSP model-based multi-level feature fusion pedestrian detection method.

Publications (1)

CN113033371A, published 2021-06-25.

Family

ID=76471689

Family Applications (1)

CN202110295911.1A: CSP model-based multi-level feature fusion pedestrian detection method (publication CN113033371A, status Pending).

Country Status (1)

Country Link
CN (1) CN113033371A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723322A (en) * 2021-09-02 2021-11-30 南京理工大学 Pedestrian detection method and system based on single-stage anchor-free frame
WO2023001059A1 (en) * 2021-07-19 2023-01-26 中国第一汽车股份有限公司 Detection method and apparatus, electronic device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination