CN112348036A - Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade

Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade

Info

Publication number
CN112348036A
CN112348036A (application CN202011342607.XA)
Authority
CN
China
Prior art keywords
feature map
target
network
convolution
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011342607.XA
Other languages
Chinese (zh)
Inventor
刘芳
韩笑
孙亚楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202011342607.XA priority Critical patent/CN112348036A/en
Publication of CN112348036A publication Critical patent/CN112348036A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions


Abstract

The invention discloses a self-adaptive target detection method based on a lightweight residual network and deconvolution cascade, which comprises the following steps: acquiring an image training data set and a test data set; extracting deep-level features of the image to be detected through a lightweight residual network combining depthwise separable convolution and residual learning, to obtain a deep-level representation of the target; fixing the channel dimension of the feature maps extracted at different levels with 1x1 convolution; increasing the resolution of the deep-level feature maps with a deconvolution cascade structure so that their spatial size matches that of the previous-level feature map; using semantic features to guide the candidate-region generation network to adaptively generate target candidate boxes matching real targets on the multi-scale feature maps; and finally correcting the generated target candidate boxes (Anchors). The invention effectively improves the accuracy of target detection, can quickly and accurately detect targets under complex conditions, and effectively improves the real-time performance of target detection.

Description

Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
Technical Field
The invention relates to a target detection method, belongs to the fields of digital image processing, deep learning and artificial intelligence, and particularly relates to a self-adaptive target detection method based on lightweight residual learning and deconvolution cascade.
Background
With the rapid development of computer vision technology, target detection has become a research hotspot in artificial intelligence and computer vision and is widely applied in military and civil fields. Target detection identifies and localizes one or more specific targets in a video image sequence. In most cases the captured video images contain rich visual content; although this provides more comprehensive scene information, the targets to be detected usually undergo large scale changes, appear in dense clusters, or are occluded, and lack sufficient detection detail, so current target detection algorithms cannot effectively extract target features or accurately localize target positions. Accurately and efficiently detecting target objects is therefore one of the key problems of the target detection task.
In recent years, target detection based on deep learning has been highly successful, and many researchers now study target detection with deep learning methods. Mainstream target detection algorithms fall into two classes: single-stage detection based on regression, and two-stage detection based on region candidate boxes. The former is represented mainly by the YOLO series; its idea is to treat detection as a regression problem over target position and category and to output results directly from a convolutional neural network. The latter is represented mainly by the R-CNN series and divides detection into two stages: a candidate-region extraction module first extracts features through a mainstream backbone network to separate foreground from background, and a second stage then classifies the candidate regions and refines their coordinates to complete accurate detection. Single-stage methods are faster but less precise; two-stage methods require two network passes, which yields higher detection accuracy but reduces detection speed to some extent. However, with the development of convolutional neural networks, the appearance of various lightweight backbones (such as ShuffleNet and MobileNet), convolution variants (such as depthwise convolution, separable convolution and pointwise convolution) and connection patterns (such as Skip Connection) keeps lowering network and computational complexity, and hardware keeps advancing, which lays a foundation for improving target detection speed. In addition, with the wide application of convolutional neural networks, deconvolution has also drawn attention: as the inverse process of convolution, it can effectively alleviate the feature-map resolution loss and feature loss caused by deep convolution operations, and it is an important means of multi-scale feature fusion.
The existing methods have the following defects: on one hand, traditional classical target detection algorithms are limited by hand-crafted features and selective-search algorithms, so target detection precision is low, detection speed is slow, and robustness is poor; on the other hand, although deep-learning-based target detection improves precision, convolutional neural networks carry large numbers of parameters, highly complex structures and heavy computation, making real-time requirements hard to meet.
Disclosure of Invention
The invention aims to overcome the above defects and provides a self-adaptive target detection algorithm based on lightweight residual learning and deconvolution cascade. Drawing on the advantages of residual learning, ordinary convolution is split into a depthwise convolution layer and a pointwise convolution layer to compress the network parameters and improve the computational efficiency of the network. A multi-scale self-adaptive candidate-region generation network is then constructed on top of the lightweight residual network: high-level semantic features are fused into low-level feature maps through a deconvolution cascade structure to strengthen the features' ability to express the target, feature maps of multiple levels and scales are used for target prediction, and sparse candidate boxes of arbitrary shape are generated from the candidate-box positions and shapes predicted from image features, achieving better target detection performance.
In order to achieve the above object, the present invention provides a self-adaptive target detection method based on lightweight residual learning and deconvolution cascade, comprising the following steps:
s1: acquiring data through image acquisition equipment to obtain an image training data set and a test data set;
s2: constructing a lightweight depth residual network, inputting the image training data set and test data set of S1, and extracting features;
s3: selecting the feature maps extracted at the last four levels of the lightweight residual network and fixing the channel dimension of the output feature maps with 1x1 convolution;
s4: constructing a multi-scale self-adaptive candidate-region generation network; feature maps at different levels differ in size, the previous layer's map being larger than the current layer's, so to fuse them the feature maps of different sizes extracted in S3 are enlarged with a deconvolution cascade structure until their spatial size matches the previous-level feature map, the maps are weight-fused along the channel dimension, and a candidate-region generation network generates predicted target boxes and category information;
s5: position correction and category regression are carried out by adopting a multitask loss function of the following formula through the position and category information of a prediction target frame generated by the self-adaptive candidate region;
L = L_cls + L_reg + β1·L_loc + β2·L_shape (1)

wherein L is the overall loss function of the algorithm, L_cls is the classification loss used when classifying features, L_reg is the regression loss used for position regression, L_loc is the localization loss used when localizing the target, L_shape is the shape loss of the target detection box, and β1 and β2 are the weighting coefficients of the multitask loss, set to 1 and 0.1 respectively.
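For concreteness, the following is a minimal PyTorch sketch of how the four terms of formula (1) combine. The component losses are passed in as stand-in tensors, since their definitions are given only in the detailed steps below; all names here are illustrative, not part of the patent.

```python
import torch

# Weighting coefficients of the multitask loss in formula (1),
# as given in the description: beta1 = 1, beta2 = 0.1.
BETA1, BETA2 = 1.0, 0.1

def multitask_loss(l_cls: torch.Tensor, l_reg: torch.Tensor,
                   l_loc: torch.Tensor, l_shape: torch.Tensor) -> torch.Tensor:
    """L = L_cls + L_reg + beta1 * L_loc + beta2 * L_shape."""
    return l_cls + l_reg + BETA1 * l_loc + BETA2 * l_shape
```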
Advantageous effects
In the self-adaptive target detection method based on lightweight residual learning and deconvolution cascade, feature extraction uses a lightweight residual network built on the advantages of depthwise separable convolution and residual learning. Features are then fused through the deconvolution cascade to build a multi-scale self-adaptive candidate-region generation network: feature maps of different levels are brought to a consistent spatial size and weight-fused, and finally semantic features guide the network to adaptively generate target candidate boxes that better match real targets. Simulation experiments show that the method effectively extracts target features from video image sequences, strengthens the features' representation of the target, and can quickly and accurately identify and localize targets under occlusion, scale change and small-target conditions, with high precision and robustness; at the same time, the lightweight network greatly reduces computation and meets real-time detection requirements.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become more readily appreciated from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of the adaptive target detection method based on lightweight residual learning and deconvolution cascade according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the lightweight residual network according to an embodiment of the present invention; and
FIG. 3 is a diagram illustrating the multi-scale feature fusion of the deconvolution cascade structure according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As shown in fig. 1, the adaptive target detection method based on lightweight residual learning and deconvolution cascade according to the present invention includes the following steps:
s1: acquiring data through image acquisition equipment to obtain an image training data set and a test data set;
s1.1: preprocessing the samples in the data set by cropping, flipping, rotation, scaling and the like to expand the data set;
s1.2: extracting positive and negative samples in each image, labeling the positive samples to be detected, and labeling the position and the category of each target by using a rectangular frame;
s2: constructing a lightweight depth residual network, inputting the training data set, and performing feature extraction;
the method comprises the following specific steps:
s2.1: inputting the training data set into the lightweight residual network and applying depthwise separable convolution to the image (a code sketch follows S2.2 below);
1) performing depthwise convolution on the input image: each of the N channels of the input image feature F is assigned its own convolution kernel, and each kernel convolves only the features of its own channel; the kernel size matches that of a standard convolution, the number of kernels is N, the stride is 1, and padding is applied;
2) performing pointwise convolution on the feature map obtained from the depthwise convolution, with kernel size 1x1 and L kernels, to obtain a feature map of the specified channel dimension;
s2.2: connecting the shallow and deep networks by Skip Connection, fusing the feature information of the different-level feature maps after convolution, i.e., fusing bottom-layer feature information into higher layers;
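As referenced in S2.1, here is a minimal PyTorch sketch of one lightweight residual building block combining depthwise separable convolution (steps 1 and 2) with a Skip Connection (S2.2). The BatchNorm/ReLU placement and the channel widths in the example are assumptions, not specified by the invention.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Step 1): depthwise convolution, one kernel per input channel
    (groups=in_channels), stride 1, with padding; step 2): 1x1 pointwise
    convolution projecting to the desired channel dimension."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   stride=1, padding=kernel_size // 2,
                                   groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)   # placement assumed
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class LightweightResidualBlock(nn.Module):
    """S2.2: a Skip Connection adds the input back to the convolved
    output, fusing shallow (bottom-layer) features into deeper layers."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = DepthwiseSeparableConv(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv(x)

# Illustrative usage: a 64-channel feature map passes through unchanged in size.
x = torch.randn(1, 64, 56, 56)
print(LightweightResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```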
s3: selecting the feature maps of the last four levels of the lightweight residual network and fixing the channel dimension of the output feature maps with 1x1 convolution;
s4: constructing a multi-scale self-adaptive candidate-region generation network: increasing the resolution of deep-level feature maps with a deconvolution cascade structure until their spatial size matches the previous-level feature map, weight-fusing the spatially consistent feature maps along the channel dimension, and generating predicted target boxes and category information with the candidate-region generation network;
the method comprises the following steps:
s4.1: selecting the multi-level feature maps {C2, C3, C4, C5} of the lightweight residual network, corresponding to the output of the last layer of each network stage;
s4.2: applying a deconvolution operation to the high-level feature map P5 (obtained from C5 by 1x1 convolution) so that its size matches C4, then weight-fusing it with the corresponding previous-level feature map C4 to obtain a new feature map P4;
s4.3: repeating the process of S4.2 until a feature map P2 is generated whose size matches C2 and which carries more detailed feature information for small targets. Rather than simply adding maps with equal weight, weights are assigned to the 6 feature maps involved, and the weighted fusion formula is:

P4 = α1·C4 + α2·D(P5)
P3 = α3·C3 + α4·D(P4)    (2)
P2 = α5·C2 + α6·D(P3)

where D(·) is the deconvolution transfer function and α1, α2, α3, α4, α5, α6 are weight coefficients with values 0.7, 0.3, 0.6, 0.4, 0.45 and 0.55 respectively; to avoid redundant feature information, the weight coefficients fused at each layer sum to 1.
S4.4: inputting the feature map subjected to deconvolution cascade feature fusion into a self-adaptive candidate area generation network to obtain the center position and the shape of the Anchor, wherein the specific steps are as follows;
1) the Anchor shape generated adaptively from image features varies with position; an anchor feature adaptation branch network N_T transforms the features so that the features of a larger Anchor encode the content of a larger region, while the features of a smaller Anchor extract the content of a smaller region; this branch is implemented with a 3 x 3 deformable convolution layer:

f_i' = N_T(f_i, w_i, h_i) (3)

where f_i is the feature at the i-th position and (w_i, h_i) is the corresponding Anchor shape. That is, an offset is predicted from the output of the shape prediction branch, and a deformable convolution applied to the original feature map then yields f_i'.
2) Anchor's center prediction branch network N_L generates a probability map of the same size as the input feature map F_I, where P(i, j | F_I) denotes the probability that a target object appears at position (i, j) of the feature map, corresponding to coordinates [(i+1/2)s, (j+1/2)s] in image I, s being the stride of the feature map. The N_L branch uses a 1x1 convolution to obtain a confidence map of the target, which a sigmoid function then converts into probability values. From the generated probability map, regions where a target may exist are determined by selecting positions whose probability exceeds a predefined threshold (0.05 in the comparative experiments);
3) after the likely target locations are determined, the shape prediction branch N_S, which contains a 1x1 convolution layer, generates a two-channel map holding the values dw and dh. Given the input feature map F_I, the shape prediction branch predicts the optimal shape (w, h) at each position; because w and h may span a very large range, the branch outputs dw and dh under the transformation of formula (4), from which w and h are recovered, where s is the stride and λ is an empirical scale factor (8 in this experiment). This nonlinear transformation maps, for example, [0, 1000] to [-1, 1], making the shape prediction branch simpler and more stable to compute:

w = λ·s·e^dw, h = λ·s·e^dh (4)
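A minimal PyTorch sketch of the three S4.4 branches follows, using torchvision's DeformConv2d as the 3 x 3 deformable convolution. How the offsets are derived from the shape branch output is not fully specified above, so the 1x1 offset-prediction convolution, the channel width, and the threshold handling are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AdaptiveAnchorHead(nn.Module):
    def __init__(self, dim: int = 256, lam: float = 8.0, thresh: float = 0.05):
        super().__init__()
        self.loc = nn.Conv2d(dim, 1, kernel_size=1)    # N_L: center prediction
        self.shape = nn.Conv2d(dim, 2, kernel_size=1)  # N_S: (dw, dh) map
        # Offsets for a 3x3 deformable kernel (2 * 3 * 3 = 18 channels),
        # predicted from the shape branch output (assumed design).
        self.offset = nn.Conv2d(2, 18, kernel_size=1)
        self.adapt = DeformConv2d(dim, dim, kernel_size=3, padding=1)  # N_T
        self.lam, self.thresh = lam, thresh

    def forward(self, feat: torch.Tensor, stride: int):
        prob = torch.sigmoid(self.loc(feat))       # probability map P(i, j | F_I)
        keep = prob > self.thresh                  # likely target positions
        dwdh = self.shape(feat)
        # Formula (4): recover anchor width/height from (dw, dh).
        w = self.lam * stride * torch.exp(dwdh[:, 0:1])
        h = self.lam * stride * torch.exp(dwdh[:, 1:2])
        adapted = self.adapt(feat, self.offset(dwdh))  # f_i' = N_T(f_i, w_i, h_i)
        return prob, keep, w, h, adapted

# Illustrative usage on one fused level with stride 4 (values assumed):
head = AdaptiveAnchorHead()
prob, keep, w, h, adapted = head(torch.randn(1, 256, 112, 112), stride=4)
```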
S5: performing position correction and category regression by adopting a multitask loss function of the following formula according to the position and the category of a prediction target frame generated by the self-adaptive candidate region;
L = L_cls + L_reg + β1·L_loc + β2·L_shape (5)

wherein L_cls and L_reg denote the classification and regression losses of a conventional network, and L_loc and L_shape are the newly added anchor localization loss and anchor shape loss, respectively.
The method comprises the following steps:
s5.1: mapping the ground-truth target box (x_g, y_g, w_g, h_g) onto the feature map as (x'_g, y'_g, w'_g, h'_g); the classification loss and regression loss adopt the cross-entropy loss function and the mean-square-error function, respectively. Two regions are then defined within the target feature-mapping region, (x'_g, y'_g, δ1·w'_g, δ1·h'_g) and (x'_g, y'_g, δ2·w'_g, δ2·h'_g), with δ1 = 0.2 and δ2 = 0.5. (x'_g, y'_g, δ1·w'_g, δ1·h'_g) is the central region; the part of (x'_g, y'_g, δ2·w'_g, δ2·h'_g) outside the central region is the ignored region, and the remainder is the peripheral region.
s5.2: taking the central region as positive samples and the peripheral region as negative samples, and training the localization branch loss L_loc with Focal Loss.
s5.3: training the shape prediction branch with the following IOU computation:

vIOU(a_wh, G) = max_(w,h) IOU(a_wh, G) (6)

where IOU(·) is the conventional IOU, G denotes the ground-truth target box, and a_wh denotes an anchor of variable shape; as is common, 9 anchors of different aspect ratios and sizes are enumerated as a_wh, and the maximum value is taken as the final vIOU(a_wh, G). The shape loss L_shape of the target-box anchor is determined as in formula (7), where l1 is the smooth-L1 loss function, and (w, h) and (w_g, h_g) denote the predicted anchor shape and the corresponding ground-truth target shape, respectively:

L_shape = l1(1 - min(w/w_g, w_g/w)) + l1(1 - min(h/h_g, h_g/h)) (7)
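A minimal PyTorch sketch of the S5.1 region partition and the shape loss of formula (7) as reconstructed above. Coordinates are taken in feature-map units with the box given by its center and size, which is an assumption about the exact parameterization; the numbers in the usage example are illustrative.

```python
import torch
import torch.nn.functional as F

def region_masks(cx, cy, w, h, H, W, d1=0.2, d2=0.5):
    """S5.1: split an HxW feature map into central / ignored / peripheral
    regions around the mapped ground-truth box (cx, cy, w, h)."""
    ys = torch.arange(H).float().view(H, 1).expand(H, W)
    xs = torch.arange(W).float().view(1, W).expand(H, W)
    def inside(frac):
        return ((xs - cx).abs() <= frac * w / 2) & ((ys - cy).abs() <= frac * h / 2)
    central = inside(d1)                 # positive samples
    ignored = inside(d2) & ~central      # neither positive nor negative
    peripheral = ~inside(d2)             # negative samples
    return central, ignored, peripheral

def shape_loss(w, h, wg, hg):
    """Formula (7): smooth-L1 penalty on the ratio mismatch between the
    predicted anchor shape (w, h) and the matched ground truth (wg, hg)."""
    lw = F.smooth_l1_loss(torch.min(w / wg, wg / w), torch.ones_like(w))
    lh = F.smooth_l1_loss(torch.min(h / hg, hg / h), torch.ones_like(h))
    return lw + lh

# Illustrative usage:
c, i, p = region_masks(cx=28.0, cy=28.0, w=20.0, h=12.0, H=56, W=56)
print(shape_loss(torch.tensor([18.0]), torch.tensor([10.0]),
                 torch.tensor([20.0]), torch.tensor([12.0])))
```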
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. The self-adaptive target detection method based on lightweight residual learning and deconvolution cascade is characterized by comprising the following steps of:
s1: acquiring data through image acquisition equipment to obtain an image training data set and a test data set;
s2: constructing a lightweight depth residual network, inputting the image training data set and test data set of S1, and extracting features;
s3: selecting the feature maps extracted at the last four levels of the lightweight residual network and fixing the channel dimension of the output feature maps with 1x1 convolution;
s4: constructing a multi-scale self-adaptive candidate-region generation network; feature maps at different levels differ in size, the previous layer's map being larger than the current layer's, so to fuse them the feature maps of different sizes extracted in S3 are enlarged with a deconvolution cascade structure until their spatial size matches the previous-level feature map, the maps are weight-fused along the channel dimension, and a candidate-region generation network generates predicted target boxes and category information;
s5: position correction and category regression are carried out by adopting a multitask loss function of the following formula through the position and category information of a prediction target frame generated by the self-adaptive candidate region;
L = L_cls + L_reg + β1·L_loc + β2·L_shape (1)

wherein L is the overall loss function of the algorithm, L_cls is the classification loss used when classifying features, L_reg is the regression loss used for position regression, L_loc is the localization loss used when localizing the target, L_shape is the shape loss of the target detection box, and β1 and β2 are the weighting coefficients of the multitask loss, set to 1 and 0.1 respectively.
2. The adaptive target detection method based on lightweight residual learning and deconvolution cascade of claim 1, characterized in that:
in S1, acquiring data through an image acquisition device to obtain an image training data set and a test data set;
s1.1: preprocessing the samples in the data set through cropping, flipping, rotation and scale transformation to expand the data set;
s1.2: and extracting positive and negative samples in each image, labeling the positive samples to be detected, and labeling the position and the category of each target by using a rectangular frame.
3. The adaptive target detection method based on lightweight residual learning and deconvolution cascade of claim 1, characterized in that: in S2, constructing a lightweight depth residual network, inputting the training data set, and performing feature extraction;
the method comprises the following steps:
s2.1: inputting the training data set into the lightweight residual network and applying depthwise separable convolution to the image;
1) performing depthwise convolution on the input image: each of the N channels of the input image feature F is assigned its own convolution kernel, and each kernel convolves only the features of its own channel; the kernel size matches that of a standard convolution, the number of kernels is N, the stride is 1, and padding is applied;
2) performing pointwise convolution on the feature map obtained from the depthwise convolution, with kernel size 1x1 and L kernels, to obtain a feature map of the specified channel dimension;
s2.2: connecting the shallow and deep networks by skip connection and fusing the feature information of the different-level feature maps after convolution, which is equivalent to fusing bottom-layer feature information into higher layers.
4. The adaptive target detection method based on lightweight residual learning and deconvolution cascade of claim 1, characterized in that:
in S4, a multi-scale self-adaptive candidate-region generation network is constructed: the resolution of deep-level feature maps is increased with a deconvolution cascade structure until their spatial size matches the previous-level feature map, the spatially consistent feature maps are weight-fused along the channel dimension, and a candidate-region generation network generates predicted target boxes and category information;
the method comprises the following steps:
s4.1: selecting the multi-level feature maps {C2, C3, C4, C5} of the lightweight residual network, corresponding to the output of the last layer of each network stage;
s4.2: applying a deconvolution operation to the high-level feature map P5 (obtained from C5 by 1x1 convolution) so that its size matches C4, then weight-fusing it with the corresponding previous-level feature map C4 to obtain a new feature map P4;
s4.3: repeating the process of S4.2 until a feature map P2 is generated whose size matches C2 and which carries more detailed feature information for small targets; rather than simply adding maps with equal weight, weights are assigned to the 6 feature maps involved, and the weighted fusion formula is:

P4 = α1·C4 + α2·D(P5)
P3 = α3·C3 + α4·D(P4)    (2)
P2 = α5·C2 + α6·D(P3)

where D(·) is the deconvolution transfer function and α1, α2, α3, α4, α5, α6 represent weight coefficients with values 0.7, 0.3, 0.6, 0.4, 0.45 and 0.55 respectively, the weight coefficients fused at each layer summing to 1 to avoid redundant feature information;
s4.4: inputting the feature maps after deconvolution-cascade feature fusion into the self-adaptive candidate-region generation network to obtain the center position and shape of each Anchor, with the following specific steps:
1) the Anchor shape generated adaptively from image features varies with position; an anchor feature adaptation branch network N_T transforms the features, this branch being implemented with a 3 x 3 deformable convolution layer:

f_i' = N_T(f_i, w_i, h_i)

where f_i is the feature at the i-th position and (w_i, h_i) is the corresponding Anchor shape; that is, an offset is predicted from the output of the shape prediction branch, and a deformable convolution applied to the original feature map then yields f_i';
2) Anchor's center prediction branch network N_L generates a probability map of the same size as the input feature map F_I, where P(i, j | F_I) denotes the probability that a target object appears at position (i, j) of the feature map, corresponding to coordinates [(i+1/2)s, (j+1/2)s] in image I, s being the stride of the feature map; the N_L branch uses a 1x1 convolution to obtain a confidence map of the target and then converts it into probability values with a sigmoid function; from the generated probability map, regions where a target may exist are determined by selecting positions whose probability exceeds a predefined threshold;
3) after the likely target locations are determined, the shape prediction branch N_S, which contains a 1x1 convolution layer, generates a two-channel map holding the values dw and dh; given the input feature map F_I, the shape prediction branch predicts the best shape (w, h) at each position, and since w and h may span a large range, the branch outputs dw and dh under the following transformation, from which w and h are recovered, where s is the stride and λ is an empirical scale factor:

w = λ·s·e^dw, h = λ·s·e^dh
5. The adaptive target detection method based on lightweight residual learning and deconvolution cascade of claim 1, characterized in that: in S5, position correction and category regression are performed, using the position and category of the predicted target boxes generated by the adaptive candidate region, with the multitask loss function of the following formula;
L = L_cls + L_reg + β1·L_loc + β2·L_shape

wherein L_cls and L_reg denote the classification and regression losses of a conventional network, and L_loc and L_shape are the newly added anchor localization loss and anchor shape loss, respectively;
the method comprises the following steps:
s5.1: mapping the ground-truth target box (x_g, y_g, w_g, h_g) onto the feature map as (x'_g, y'_g, w'_g, h'_g); the classification loss and regression loss adopt the cross-entropy loss function and the mean-square-error function, respectively; two regions, (x'_g, y'_g, δ1·w'_g, δ1·h'_g) and (x'_g, y'_g, δ2·w'_g, δ2·h'_g), are then defined within the target feature-mapping region, with δ1 = 0.2 and δ2 = 0.5; (x'_g, y'_g, δ1·w'_g, δ1·h'_g) is the central region, the part of (x'_g, y'_g, δ2·w'_g, δ2·h'_g) outside the central region is the ignored region, and the remainder is the peripheral region;
s5.2: taking the central region as positive samples and the peripheral region as negative samples, and training the localization branch loss L_loc with Focal Loss;
s5.3: training the shape prediction branch with the following IOU computation:

vIOU(a_wh, G) = max_(w,h) IOU(a_wh, G)

where IOU(·) is the conventional IOU, G denotes the ground-truth target box, and a_wh denotes an anchor of variable shape; as is common, 9 anchors of different aspect ratios and sizes are enumerated as a_wh, and the maximum value is taken as the final vIOU(a_wh, G); the shape loss L_shape of the target-box anchor is determined as below, where l1 is the smooth-L1 loss function, and (w, h) and (w_g, h_g) denote the predicted anchor shape and the corresponding ground-truth target shape, respectively:

L_shape = l1(1 - min(w/w_g, w_g/w)) + l1(1 - min(h/h_g, h_g/h))
CN202011342607.XA 2020-11-26 2020-11-26 Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade Pending CN112348036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011342607.XA CN112348036A (en) 2020-11-26 2020-11-26 Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011342607.XA CN112348036A (en) 2020-11-26 2020-11-26 Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade

Publications (1)

Publication Number Publication Date
CN112348036A true CN112348036A (en) 2021-02-09

Family

ID=74365696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011342607.XA Pending CN112348036A (en) 2020-11-26 2020-11-26 Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade

Country Status (1)

Country Link
CN (1) CN112348036A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109948607A (en) * 2019-02-21 2019-06-28 电子科技大学 Candidate frame based on deep learning deconvolution network generates and object detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Fang, et al., "Adaptive UAV target detection based on multi-scale feature fusion", Acta Optica Sinica, vol. 40, no. 10, 25 May 2020 (2020-05-25), pages 1-10 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926605B (en) * 2021-04-01 2022-07-08 天津商业大学 Multi-stage strawberry fruit rapid detection method in natural scene
CN112926605A (en) * 2021-04-01 2021-06-08 天津商业大学 Multi-stage strawberry fruit rapid detection method in natural scene
CN113486718B (en) * 2021-06-08 2023-04-07 天津大学 Fingertip detection method based on deep multitask learning
CN113486718A (en) * 2021-06-08 2021-10-08 天津大学 Fingertip detection method based on deep multitask learning
CN113486979A (en) * 2021-07-28 2021-10-08 佛山市南海区广工大数控装备协同创新研究院 Lightweight target detection method based on key points
CN113554131A (en) * 2021-09-22 2021-10-26 四川大学华西医院 Medical image processing and analyzing method, computer device, system and storage medium
CN114022705A (en) * 2021-10-29 2022-02-08 电子科技大学 Adaptive target detection method based on scene complexity pre-classification
CN114022705B (en) * 2021-10-29 2023-08-04 电子科技大学 Self-adaptive target detection method based on scene complexity pre-classification
CN114842189A (en) * 2021-11-10 2022-08-02 北京中电兴发科技有限公司 Adaptive Anchor generation method for target detection
CN114842189B (en) * 2021-11-10 2022-11-04 北京中电兴发科技有限公司 Adaptive Anchor generation method for target detection
CN114936983A (en) * 2022-06-16 2022-08-23 福州大学 Underwater image enhancement method and system based on depth cascade residual error network
CN115393682A (en) * 2022-08-17 2022-11-25 龙芯中科(南京)技术有限公司 Target detection method, target detection device, electronic device, and medium
CN115526886A (en) * 2022-10-26 2022-12-27 中国铁路设计集团有限公司 Optical satellite image pixel level change detection method based on multi-scale feature fusion
CN115526886B (en) * 2022-10-26 2023-05-26 中国铁路设计集团有限公司 Optical satellite image pixel level change detection method based on multi-scale feature fusion

Similar Documents

Publication Publication Date Title
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN111126472B (en) SSD (solid State disk) -based improved target detection method
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
WO2021042828A1 (en) Neural network model compression method and apparatus, and storage medium and chip
CN109145979B (en) Sensitive image identification method and terminal system
CN109949255B (en) Image reconstruction method and device
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
CN114022432B (en) Insulator defect detection method based on improved yolov5
CN110059586B (en) Iris positioning and segmenting system based on cavity residual error attention structure
CN112070044B (en) Video object classification method and device
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN112329760B (en) Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN110633708A (en) Deep network significance detection method based on global model and local optimization
CN111612017A (en) Target detection method based on information enhancement
CN111783819B (en) Improved target detection method based on region of interest training on small-scale data set
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN110796199A (en) Image processing method and device and electronic medical equipment
CN113569881A (en) Self-adaptive semantic segmentation method based on chain residual error and attention mechanism
CN111899203A (en) Real image generation method based on label graph under unsupervised training and storage medium
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN113297959A (en) Target tracking method and system based on corner attention twin network
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination