CN113627269B - Pest target detection method based on decoupling classification and regression feature optimal layer technology - Google Patents

Pest target detection method based on decoupling classification and regression feature optimal layer technology Download PDF

Info

Publication number
CN113627269B
CN113627269B CN202110804036.5A
Authority
CN
China
Prior art keywords
network
feature
convolution
layer
convolutions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110804036.5A
Other languages
Chinese (zh)
Other versions
CN113627269A (en)
Inventor
宋良图
陈天娇
王儒敬
谢成军
张洁
杜健铭
李瑞
陈红波
胡海瀛
刘海云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS filed Critical Hefei Institutes of Physical Science of CAS
Priority to CN202110804036.5A priority Critical patent/CN113627269B/en
Publication of CN113627269A publication Critical patent/CN113627269A/en
Application granted granted Critical
Publication of CN113627269B publication Critical patent/CN113627269B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Compared with the prior art, the invention overcomes the defect of a low pest recognition rate caused by large size differences among insect bodies in pest-killing lamps. The invention comprises the following steps: acquiring a training sample set; constructing a pest target detection network; training the pest target detection network; acquiring a pest image sample to be detected; and detecting and locating pest targets. Relative to the common single-feature-layer setting, the invention assigns the classification and regression tasks to different feature layers and obtains the final detection results separately, so that insect bodies of differing sizes in the pest-killing-lamp environment are detected in a differentiated manner, the detection and recognition rate in that environment is improved, and practical application requirements are met.

Description

Pest target detection method based on decoupling classification and regression feature optimal layer technology
Technical Field
The invention relates to the technical field of pest image detection, and in particular to a pest target detection method based on a decoupled classification and regression feature optimal-layer technique.
Background
With the rapid development of technologies such as the Internet of Things, cloud computing, mobile Internet and intelligent terminals, big data has rapidly entered the public eye. Agricultural big data is now driving agricultural production toward precision and intelligence, and data is gradually becoming an emerging production factor in modern agriculture.
Big data on crop pests is regional, seasonal, diverse and periodic, with wide-ranging sources, varied types and a complex structure. To meet the demand for remote, real-time pest monitoring with automatic acquisition of pest-condition images, forecasting lamps integrate an automatic pest-information acquisition system that includes a high-definition camera; at set time intervals the lamp supplements light, automatically photographs, stores and remotely transmits the insect bodies it captures, yielding high-resolution images that are clear and free of background clutter. The targets are mainly pests such as rice leaf rollers, armyworms, cotton bollworms, Athetis lepigone, Prodenia litura, dark black chafers (Holotrichia parallela), copper-green chafers (Anomala corpulenta) and mole crickets. Although these pests are small relative to the whole picture, large size differences remain among them; the difference between rice planthoppers and mole crickets, for example, is obvious. After the corresponding regions of interest are obtained, optimal feature layers for classification and regression must therefore be sought for target regions with such large size differences.
Existing methods use the same layer of features, or a combination of different layers, as the basis for both classification and regression; that is, the same features serve different tasks, even though classification favors higher-level semantic information while regression favors lower-level position information. In practice the insect bodies collected in a pest-killing lamp are uncertain and differ greatly in size, so the undifferentiated classification-and-regression technique of the prior art yields a low recognition rate and a high error rate for pests with large size differences, and can hardly satisfy practical application in the pest-killing-lamp environment.
Disclosure of Invention
The invention aims to overcome the defect of the prior art that large size differences among insect bodies in a pest-killing lamp lead to a low pest recognition rate, and provides a pest target detection method based on a decoupled classification and regression feature optimal-layer technique to solve this problem.
In order to achieve the above object, the technical scheme of the present invention is as follows:
a pest target detection method based on decoupling classification and regression characteristic optimal layer technology comprises the following steps:
11) Acquisition of a training sample set: acquire pest image samples and preprocess them to form a training sample set;
12) Construction of the pest target detection network: construct a pest target detection network based on a basic feature representation network, a feature pyramid network and a target region extraction network;
13) Training of the pest target detection network: train the pest target detection network based on the decoupled classification and regression feature optimal-layer technique using the training samples;
14) Acquisition of pest image samples to be detected: acquire a pest image sample to be detected and preprocess it;
15) Detection and localization of pest targets: input the preprocessed pest image sample into the trained pest target detection network and locate the pest positions in the pest image.
The construction of the pest target detection network comprises the following steps:
21) Set the first layer of the pest target detection network as the basic feature representation network, the second layer as the feature pyramid network, and the third layer as the target region extraction network;
22) Set the basic feature representation network as a residual network that mines the most representative image feature representation from the depth, width and receptive-field factors of its convolution blocks, acting as the feature extractor;
23) Set the feature pyramid network as a laterally connected hierarchical structure that passes semantic information from high-level features down to low-level features; in the feature pyramid network the feature extraction process is divided into two parts: a bottom-up process, which extracts features from the backbone network, and a top-down lateral-connection fusion process;
24) Set the first-stage network of the target region extraction network: set reference boxes of different scales for the different feature layers to locate and regress preliminary candidate target regions, namely reference boxes of size 16×16 for the first layer of the feature pyramid, 32×32 for the second layer, 64×64 for the third layer, 128×128 for the fourth layer, 256×256 for the fifth layer, and 512×512 for the sixth layer;
25) Set the second-stage network of the target region extraction network: for the preliminary candidate target regions, search for the optimal feature layers for classification and for regression separately, and locate and classify accordingly.
The training of the pest target detection network comprises the following steps:
31) Train the basic feature representation network: input a pest image I of width w from the training sample set into the basic feature representation network and extract features through it:
311) Via conv1: one 7×7×64 convolution with stride 2, followed by batch normalization and a nonlinear activation function; the output is denoted c1;
312) Via conv2_x: first 3×3 max pooling with stride 2, then 3 convolution blocks, each containing a 1×1×64, a 3×3×64 and a 1×1×256 convolution; the output is denoted c2; the first convolution of the first block is a stride-2 downsampling convolution;
313) Via conv3_x: 4 convolution blocks, each containing a 1×1×128, a 3×3×128 and a 1×1×512 convolution; the output is denoted c3; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
314) Via conv4_x: 23 convolution blocks, each containing a 1×1×256, a 3×3×256 and a 1×1×1024 convolution; the output is denoted c4; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
315) Via conv5_x: 3 convolution blocks, each containing a 1×1×512, a 3×3×512 and a 1×1×2048 convolution; the output is denoted c5; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
32) Train the feature pyramid network:
input the feature maps c1, c2, c3, c4 and c5 (which have different channel counts) into the feature pyramid network and perform channel normalization with 1×1×256 convolutions to obtain the lateral maps m1, m2, m3, m4 and m5 respectively, with M5 = m5; upsample M5 and add it to m4 to obtain the feature map M4, upsample M4 and add it to m3 to obtain M3, upsample M3 and add it to m2 to obtain M2, and upsample M2 and add it to m1 to obtain M1; the expressions are as follows:
M4=m4+upsampling(M5),
M3=m3+upsampling(M4),
M2=m2+upsampling(M3),
M1=m1+upsampling(M2);
in order to eliminate the aliasing effect of upsampling, P1, P2, P3, P4 and P5 are obtained by applying channel-preserving 3×3 convolutions to M1, M2, M3, M4 and M5 respectively, and the feature map P6 is obtained by downsampling P5; the expressions are as follows:
P1 = conv3×3(M1),
P2 = conv3×3(M2),
P3 = conv3×3(M3),
P4 = conv3×3(M4),
P5 = conv3×3(M5),
P6 = downsampling(P5);
where w = 2·w1 and wi = 2·w(i+1) for i ∈ {1,2,3,4,5}, w being the input image width and wi the width of feature layer Pi;
33) Train the first-stage network of the target region extraction network:
for the different feature layers P1, P2, P3, P4, P5 and P6, the reference-box areas are set to 16×16, 32×32, 64×64, 128×128, 256×256 and 512×512 respectively, and each area is paired with 3 aspect ratios, namely 1:2, 1:1 and 2:1, giving 18 reference-box configurations in total for the first-stage network of the target region extraction network;
the P1 layer thus carries w1×h1×3 reference boxes in total; if a reference box has the highest IoU with some ground-truth box, or its IoU with any ground-truth box exceeds 0.7, it is set as a positive sample; if its IoU with every ground-truth box is below 0.3, it is set as a negative sample; the first-stage network of the target region extraction network is learned from these positive and negative samples;
the input of the first-stage network is the feature map of each feature layer; the network consists of a 3×3×256 convolution followed by a parallel 1×1×(3×4) regression branch and a 1×1×(3×2) classification branch, and its parameters are learned by back-propagating the loss between network outputs and ground truth; finally the preliminary target regions are extracted by the first-stage network of the target region extraction network;
34) Train the second-stage network of the target region extraction network:
a preliminary target region whose IoU with a ground-truth box exceeds 0.5 is set as a positive sample; one whose IoU with the ground-truth box is below 0.3 is set as a negative sample;
the classification and regression tasks are decoupled for the preliminary target regions: the optimal classification feature layer and the optimal localization feature layer are searched for separately, and classification and localization are carried out accordingly;
the feature layer is selected by
k = ⌊k0 + log2(√(w·h)/224)⌋,
where k0 is commonly set to 4; given the width w and height h of the preliminary target region and the common setting k0, the layer-k features would serve as the single optimal feature layer for second-stage classification and regression of the region;
the k−1 layer is instead selected for localization and the k+1 layer for classification of the preliminary target region; after the optimal feature layers are found, the region is classified on the optimal classification feature layer and then located on the optimal localization feature layer; network parameters are learned by back-propagating the loss between network outputs and ground truth, with smooth L1 loss for localization and softmax cross-entropy loss for classification.
The detection of pest objects comprises the following steps:
41) Input a pest image sample I of width w to be detected into the basic feature representation network:
411) Via conv1: one 7×7×64 convolution with stride 2, followed by batch normalization and a nonlinear activation function; the output is denoted c1;
412) Via conv2_x: first 3×3 max pooling with stride 2, then 3 convolution blocks, each containing a 1×1×64, a 3×3×64 and a 1×1×256 convolution; the output is denoted c2;
413) Via conv3_x: 4 convolution blocks, each containing a 1×1×128, a 3×3×128 and a 1×1×512 convolution; the output is denoted c3;
414) Via conv4_x: 23 convolution blocks, each containing a 1×1×256, a 3×3×256 and a 1×1×1024 convolution; the output is denoted c4;
415) Via conv5_x: 3 convolution blocks, each containing a 1×1×512, a 3×3×512 and a 1×1×2048 convolution; the output is denoted c5;
42) Input the feature maps c1, c2, c3, c4 and c5 (with different channel counts) into the feature pyramid network and perform channel normalization with 1×1×256 convolutions to obtain m1, m2, m3, m4 and m5 respectively, with M5 = m5; upsample M5 and add it to m4 to obtain the feature map M4, and obtain M3, M2 and M1 in the same way; to eliminate the aliasing effect of upsampling, apply channel-preserving 3×3 convolutions to M1, M2, M3, M4 and M5 to obtain P1, P2, P3, P4 and P5 respectively, and downsample P5 to obtain the feature map P6;
43) Input into the first-stage network of the target region extraction network:
set reference boxes of different scales for the different feature layers P1, P2, P3, P4, P5 and P6 and input them into the trained first-stage network of the target region extraction network, which applies a 3×3×256 convolution and then extracts the preliminary target regions through the parallel 1×1×(3×4) regression branch and 1×1×(3×2) classification branch;
44) Input into the second-stage network of the target region extraction network:
the classification and regression tasks are decoupled for the preliminary target regions, and the optimal classification feature layer and the optimal localization feature layer are searched for separately for classification and localization,
according to the feature-layer selection rule
k = ⌊k0 + log2(√(w·h)/224)⌋,
where k0 is commonly set to 4; the k−1 layer is selected for localization and the k+1 layer for classification; the preliminary target region is then classified on the optimal classification feature layer and located on the optimal localization feature layer to obtain the final detection target.
Advantageous effects
Compared with the prior art, the pest target detection method based on the decoupled classification and regression feature optimal-layer technique assigns the classification and regression tasks to different feature layers, relative to the common single-feature-layer setting, and obtains the final detection results separately. Insect bodies of differing sizes in the pest-killing-lamp environment are thus detected in a differentiated manner, the detection and recognition rate in that environment is improved, and the requirements of practical application are met.
Drawings
FIG. 1 is the process flow diagram of the present invention;
FIG. 2 illustrates low-level features in pest image detection;
FIG. 3 illustrates high-level features in pest image detection.
Detailed Description
For a further understanding of the invention, its structural features and the advantages it achieves, preferred embodiments are described below in conjunction with the accompanying drawings:
as shown in fig. 1, the pest target detection method based on the decoupling classification and regression characteristic optimal layer technology of the invention comprises the following steps:
first, acquiring a training sample set: and acquiring pest image samples and preprocessing to form a training sample set.
Secondly, constructing a pest target detection network: and constructing a pest target detection network based on the basic feature representation network, the feature pyramid network and the target area extraction network.
The construction of the pest target detection network comprises the following steps:
(1) Set the first layer of the pest target detection network as the basic feature representation network, the second layer as the feature pyramid network, and the third layer as the target region extraction network;
(2) Set the basic feature representation network as a residual network that mines the most representative image feature representation from the depth, width and receptive-field factors of its convolution blocks, acting as the feature extractor;
(3) Set the feature pyramid network as a laterally connected hierarchical structure that passes semantic information from high-level features down to low-level features; in the feature pyramid network the feature extraction process is divided into two parts: a bottom-up process, which extracts features from the backbone network, and a top-down lateral-connection fusion process;
(4) Set the first-stage network of the target region extraction network: set reference boxes of different scales for the different feature layers to locate and regress preliminary candidate target regions, namely reference boxes of size 16×16 for the first layer of the feature pyramid, 32×32 for the second layer, 64×64 for the third layer, 128×128 for the fourth layer, 256×256 for the fifth layer, and 512×512 for the sixth layer;
(5) Set the second-stage network of the target region extraction network: for the preliminary candidate target regions, search for the optimal feature layers for classification and for regression separately, and locate and classify accordingly.
Third, train the pest target detection network: train the pest target detection network based on the decoupled classification and regression feature optimal-layer technique using the training samples.
Because the object detection task comprises the two sub-tasks of classification and localization, the features the two tasks require differ: classification needs high-level semantic information, while localization needs low-level position information, and as shown in fig. 2 and fig. 3, high-level semantic information and low-level position information also present themselves differently. The feature layers most favorable to the two tasks are therefore not the same, so the most favorable features must be sought separately in the pyramid layers for each task and the results finally combined.
The training of the pest target detection network comprises the following steps:
(1) Train the basic feature representation network:
input a pest image I of width w from the training sample set into the basic feature representation network and extract features through it (a minimal code sketch of this stage follows the list):
a1) Via conv1: one 7×7×64 convolution with stride 2, followed by batch normalization and a nonlinear activation function; the output is denoted c1;
a2) Via conv2_x: first 3×3 max pooling with stride 2, then 3 convolution blocks, each containing a 1×1×64, a 3×3×64 and a 1×1×256 convolution; the output is denoted c2; the first convolution of the first block is a stride-2 downsampling convolution;
a3) Via conv3_x: 4 convolution blocks, each containing a 1×1×128, a 3×3×128 and a 1×1×512 convolution; the output is denoted c3; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
a4) Via conv4_x: 23 convolution blocks, each containing a 1×1×256, a 3×3×256 and a 1×1×1024 convolution; the output is denoted c4; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
a5) Via conv5_x: 3 convolution blocks, each containing a 1×1×512, a 3×3×512 and a 1×1×2048 convolution; the output is denoted c5; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1.
(2) Training the feature pyramid network: because c1, c2, c3, c4 and c5 are progressively downscaled by pooled downsampling and strided convolutions, their semantic information becomes richer while their spatial position information diminishes; the feature pyramid network therefore fuses the features.
Inputting feature graphs c1, c2, c3, c4 and c5 with different channels into a feature pyramid network respectively, carrying out channel normalization through convolution of 1 x 256 to obtain M1, M2, M3, M4 and M5 respectively, carrying out up-sampling and M4 addition on the feature graphs M5 to obtain a feature graph M4, carrying out up-sampling and M3 addition on the feature graph M4 to obtain a feature graph M3, carrying out up-sampling and M2 addition on the feature graph M3 to obtain a feature graph M2, and carrying out up-sampling and M1 addition on the feature graph M2 to obtain a feature graph M1; the expression is as follows:
M4=m4+upsampling(M5),
M3=m3+upsampling(M4),
M2=m2+upsampling(M3),
M1=m1+upsampling(M2);
in order to eliminate the aliasing effect of upsampling, P1, P2, P3, P4 and P5 are obtained by applying channel-preserving 3×3 convolutions to M1, M2, M3, M4 and M5 respectively, and the feature map P6 is obtained by downsampling P5; the expressions are as follows:
P1 = conv3×3(M1),
P2 = conv3×3(M2),
P3 = conv3×3(M3),
P4 = conv3×3(M4),
P5 = conv3×3(M5),
P6 = downsampling(P5);
where w = 2·w1 and wi = 2·w(i+1) for i ∈ {1,2,3,4,5}, w being the input image width and wi the width of feature layer Pi.
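A minimal sketch of this lateral-connection fusion follows, assuming 256-channel outputs throughout and nearest-neighbor upsampling (the interpolation mode is not stated in the text); the class name `FeaturePyramid` is illustrative, and its input-channel defaults correspond to c1–c5 of the backbone described above. The stride-2 max pooling at the end implements the downsampling of P5 into P6:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Top-down fusion of c1..c5 into P1..P6 as described above."""
    def __init__(self, in_channels=(64, 256, 512, 1024, 2048), channels=256):
        super().__init__()
        # 1x1x256 lateral convolutions: channel normalization of c1..c5 into m1..m5
        self.lateral = nn.ModuleList([nn.Conv2d(c, channels, 1) for c in in_channels])
        # channel-preserving 3x3 convolutions: suppress upsampling aliasing -> P1..P5
        self.smooth = nn.ModuleList([nn.Conv2d(channels, channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):            # feats = (c1, c2, c3, c4, c5)
        m = [lat(c) for lat, c in zip(self.lateral, feats)]
        M = list(m)                      # M5 = m5
        for i in (3, 2, 1, 0):           # Mi = mi + upsampling(M(i+1))
            M[i] = m[i] + F.interpolate(M[i + 1], size=m[i].shape[-2:], mode="nearest")
        P = [sm(x) for sm, x in zip(self.smooth, M)]
        P.append(F.max_pool2d(P[-1], kernel_size=1, stride=2))  # P6 = downsampling(P5)
        return P                         # [P1, P2, P3, P4, P5, P6]
```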
(3) Train the first-stage network of the target region extraction network:
for the different feature layers P1, P2, P3, P4, P5 and P6, the reference-box areas are set to 16×16, 32×32, 64×64, 128×128, 256×256 and 512×512 respectively, and each area is paired with 3 aspect ratios, namely 1:2, 1:1 and 2:1, giving 18 reference-box configurations in total for the first-stage network of the target region extraction network;
the P1 layer thus carries w1×h1×3 reference boxes in total; if a reference box has the highest IoU with some ground-truth box, or its IoU with any ground-truth box exceeds 0.7, it is set as a positive sample; if its IoU with every ground-truth box is below 0.3, it is set as a negative sample; the first-stage network of the target region extraction network is learned from these positive and negative samples;
the input of the first-stage network is the feature map of each feature layer; the network consists of a 3×3×256 convolution followed by a parallel 1×1×(3×4) regression branch and a 1×1×(3×2) classification branch, and its parameters are learned by back-propagating the loss between network outputs and ground truth; finally the preliminary target regions are extracted by the first-stage network of the target region extraction network.
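A sketch of this first-stage head under the stated settings (3 reference boxes per location, 2 classes, 4 box offsets) is given below; reference-box generation and the IoU-based positive/negative sampling described above are elided, and `FirstStageHead` is an illustrative name:

```python
import torch.nn as nn
import torch.nn.functional as F

class FirstStageHead(nn.Module):
    """Head shared across P1..P6: a 3x3x256 convolution followed by parallel
    1x1x(3x2) classification and 1x1x(3x4) regression branches."""
    def __init__(self, channels=256, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.cls = nn.Conv2d(channels, num_anchors * 2, 1)  # object / background scores
        self.reg = nn.Conv2d(channels, num_anchors * 4, 1)  # (dx, dy, dw, dh) offsets

    def forward(self, p):
        t = F.relu(self.conv(p))
        return self.cls(t), self.reg(t)
```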
(4) Training the second-stage network of the target region extraction network: because classification and localization are different tasks, the features most beneficial to each also differ, and those features may be scattered across different layers. Object detection comprises the two tasks of classification and localization, yet current detection methods, although many of them search for an optimal layer or a combination of layers, still use the same layer or the same features for both tasks.
A preliminary target region whose IoU with a ground-truth box exceeds 0.5 is set as a positive sample; one whose IoU with the ground-truth box is below 0.3 is set as a negative sample.
The classification and regression tasks are decoupled for the preliminary target regions: the optimal classification feature layer and the optimal localization feature layer are searched for separately, and classification and localization are carried out accordingly.
The feature layer is selected by
k = ⌊k0 + log2(√(w·h)/224)⌋,
where k0 is commonly set to 4; given the width w and height h of the preliminary target region and the common setting k0, the layer-k features would serve as the single optimal feature layer for second-stage classification and regression of the region.
Because classification needs features with richer semantic information while localization needs features with richer spatial position information, the k−1 layer is instead selected for localization and the k+1 layer for classification of the preliminary target region; alternatively, the optimal layer for each task can be found in other ways, for example as the layer with the minimum loss for that task. After the optimal feature layers are found, the candidate region is classified on the optimal classification feature layer and then located on the optimal localization feature layer; network parameters are learned by back-propagating the loss between network outputs and ground truth, with smooth L1 loss for localization and softmax cross-entropy loss for classification.
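The layer-selection rule and the decoupled choice of layers can be written as the short function below; the clamping of k−1 and k+1 to the valid pyramid range [1, 6] is an assumption the text does not spell out. As an example, a 64×64 proposal gives k = ⌊4 + log2(64/224)⌋ = 2, so layer 1 is used for localization and layer 3 for classification.

```python
import math

def select_levels(w, h, k0=4, k_min=1, k_max=6):
    """k = floor(k0 + log2(sqrt(w*h) / 224)) for a proposal of width w and height h;
    decoupled choice: layer k-1 for localization, layer k+1 for classification."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    loc_k = max(k_min, min(k_max, k - 1))  # lower layer: richer spatial position information
    cls_k = max(k_min, min(k_max, k + 1))  # higher layer: richer semantic information
    return loc_k, cls_k

# Second-stage losses as described in the text (PyTorch equivalents, assumed):
#   localization:   F.smooth_l1_loss(pred_boxes, target_boxes)
#   classification: F.cross_entropy(pred_logits, target_labels)  # softmax cross entropy
```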
Fourth, acquire a pest image sample to be detected: acquire the pest image sample to be detected and preprocess it.
Fifth, detect and locate pest targets: input the preprocessed pest image sample into the trained pest target detection network and locate the pest positions in the pest image.
The detection of pest objects comprises the following steps:
(1) Input a pest image sample I of width w to be detected into the basic feature representation network:
b1) Via conv1: one 7×7×64 convolution with stride 2, followed by batch normalization and a nonlinear activation function; the output is denoted c1;
b2) Via conv2_x: first 3×3 max pooling with stride 2, then 3 convolution blocks, each containing a 1×1×64, a 3×3×64 and a 1×1×256 convolution; the output is denoted c2;
b3) Via conv3_x: 4 convolution blocks, each containing a 1×1×128, a 3×3×128 and a 1×1×512 convolution; the output is denoted c3;
b4) Via conv4_x: 23 convolution blocks, each containing a 1×1×256, a 3×3×256 and a 1×1×1024 convolution; the output is denoted c4;
b5) Via conv5_x: 3 convolution blocks, each containing a 1×1×512, a 3×3×512 and a 1×1×2048 convolution; the output is denoted c5.
(2) Input the feature maps c1, c2, c3, c4 and c5 (with different channel counts) into the feature pyramid network and perform channel normalization with 1×1×256 convolutions to obtain m1, m2, m3, m4 and m5 respectively, with M5 = m5; upsample M5 and add it to m4 to obtain the feature map M4, and obtain M3, M2 and M1 in the same way; to eliminate the aliasing effect of upsampling, apply channel-preserving 3×3 convolutions to M1, M2, M3, M4 and M5 to obtain P1, P2, P3, P4 and P5 respectively, and downsample P5 to obtain the feature map P6.
(3) Input into the first-stage network of the target region extraction network:
for the different feature layers P1, P2, P3, P4, P5 and P6, set reference boxes of different scales and input them into the trained first-stage network of the target region extraction network, which applies a 3×3×256 convolution and then extracts the preliminary target regions through the parallel 1×1×(3×4) regression branch and 1×1×(3×2) classification branch.
(4) Input into the second-stage network of the target region extraction network:
the classification and regression tasks are decoupled for the preliminary target regions, and the optimal classification feature layer and the optimal localization feature layer are searched for separately for classification and localization,
according to the feature-layer selection rule
k = ⌊k0 + log2(√(w·h)/224)⌋,
where k0 is commonly set to 4; the k−1 layer is used for localization and the k+1 layer for classification; the preliminary target region is then classified on the optimal classification feature layer and located on the optimal localization feature layer to obtain the final detection target.
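At inference time the stages chain as in the hypothetical sketch below, reusing the illustrative modules from the earlier sketches; decoding of the first-stage outputs into preliminary target regions and the per-proposal second stage (feature pooling on the decoupled layers) are elided:

```python
import torch

def detect(image, backbone, fpn, head):
    """Hypothetical end-to-end forward pass over one preprocessed image tensor (CHW)."""
    with torch.no_grad():
        feats = backbone(image.unsqueeze(0))      # c1..c5
        pyramid = fpn(feats)                      # P1..P6
        first_stage = [head(p) for p in pyramid]  # per-layer (cls, reg) maps
    # ...decode `first_stage` into preliminary target regions, then for each region:
    #    loc_k, cls_k = select_levels(w, h)
    #    classify on pyramid[cls_k - 1], locate on pyramid[loc_k - 1]  (1-indexed layers)
    return first_stage
```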
The foregoing has shown and described the basic principles, principal features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate its principles, and various changes and improvements may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (3)

1. A pest target detection method based on decoupled classification and regression feature optimal-layer technology, characterized by comprising the following steps:
11) Acquisition of a training sample set: acquiring pest image samples and preprocessing them to form a training sample set;
12) Construction of the pest target detection network: constructing a pest target detection network based on a basic feature representation network, a feature pyramid network and a target region extraction network;
13) Training of the pest target detection network: training the pest target detection network based on the decoupled classification and regression feature optimal-layer technique using the training samples;
the training of the pest target detection network comprises the following steps:
131) Training the basic feature representation network: inputting a pest image I of width w from the training sample set into the basic feature representation network and extracting features through it:
1311) Via conv1: one 7×7×64 convolution with stride 2, followed by batch normalization and a nonlinear activation function; the output is denoted c1;
1312) Via conv2_x: first 3×3 max pooling with stride 2, then 3 convolution blocks, each containing a 1×1×64, a 3×3×64 and a 1×1×256 convolution; the output is denoted c2; the first convolution of the first block is a stride-2 downsampling convolution;
1313) Via conv3_x: 4 convolution blocks, each containing a 1×1×128, a 3×3×128 and a 1×1×512 convolution; the output is denoted c3; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
1314) Via conv4_x: 23 convolution blocks, each containing a 1×1×256, a 3×3×256 and a 1×1×1024 convolution; the output is denoted c4; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
1315) Via conv5_x: 3 convolution blocks, each containing a 1×1×512, a 3×3×512 and a 1×1×2048 convolution; the output is denoted c5; the first convolution of the first block is a stride-2 downsampling convolution, and the remaining convolutions use stride 1;
132) Training the feature pyramid network:
inputting the feature maps c1, c2, c3, c4 and c5 (which have different channel counts) into the feature pyramid network and performing channel normalization with 1×1×256 convolutions to obtain the lateral maps m1, m2, m3, m4 and m5 respectively, with M5 = m5; upsampling M5 and adding it to m4 to obtain the feature map M4, upsampling M4 and adding it to m3 to obtain M3, upsampling M3 and adding it to m2 to obtain M2, and upsampling M2 and adding it to m1 to obtain M1; the expressions are as follows:
M4=m4+upsampling(M5),
M3=m3+upsampling(M4),
M2=m2+upsampling(M3),
M1=m1+upsampling(M2);
in order to eliminate the aliasing effect of upsampling, P1, P2, P3, P4 and P5 are obtained by applying channel-preserving 3×3 convolutions to M1, M2, M3, M4 and M5 respectively, and the feature map P6 is obtained by downsampling P5; the expressions are as follows:
P1 = conv3×3(M1),
P2 = conv3×3(M2),
P3 = conv3×3(M3),
P4 = conv3×3(M4),
P5 = conv3×3(M5),
P6 = downsampling(P5);
where w = 2·w1 and wi = 2·w(i+1) for i ∈ {1,2,3,4,5}, w being the input image width and wi the width of feature layer Pi;
133) Training the first-stage network of the target region extraction network:
for the different feature layers P1, P2, P3, P4, P5 and P6, the reference-box areas are set to 16×16, 32×32, 64×64, 128×128, 256×256 and 512×512 respectively, and each area is paired with 3 aspect ratios, namely 1:2, 1:1 and 2:1, giving 18 reference-box configurations in total for the first-stage network of the target region extraction network;
the P1 layer thus carries w1×h1×3 reference boxes in total; if a reference box has the highest IoU with some ground-truth box, or its IoU with any ground-truth box exceeds 0.7, it is set as a positive sample; if its IoU with every ground-truth box is below 0.3, it is set as a negative sample; the first-stage network of the target region extraction network is learned from these positive and negative samples;
the input of the first-stage network is the feature map of each feature layer; the network consists of a 3×3×256 convolution followed by a parallel 1×1×(3×4) regression branch and a 1×1×(3×2) classification branch, and its parameters are learned by back-propagating the loss between network outputs and ground truth; finally the preliminary target regions are extracted by the first-stage network of the target region extraction network;
134) Training the second-stage network of the target region extraction network:
a preliminary target region whose IoU with a ground-truth box exceeds 0.5 is set as a positive sample; one whose IoU with the ground-truth box is below 0.3 is set as a negative sample;
the classification and regression tasks are decoupled for the preliminary target regions: the optimal classification feature layer and the optimal localization feature layer are searched for separately, and classification and localization are carried out accordingly;
the feature layer is selected by
k = ⌊k0 + log2(√(w·h)/224)⌋,
where k0 is commonly set to 4; given the width w and height h of the preliminary target region and the common setting k0, the layer-k features would serve as the single optimal feature layer for second-stage classification and regression of the region;
the k−1 layer is instead selected for localization and the k+1 layer for classification of the preliminary target region; after the optimal feature layers are found, the region is classified on the optimal classification feature layer and located on the optimal localization feature layer; network parameters are learned by back-propagating the loss between network outputs and ground truth, with smooth L1 loss for localization and softmax cross-entropy loss for classification;
14) Acquisition of pest image samples to be detected: acquiring a pest image sample to be detected and preprocessing it;
15) Detection and localization of pest targets: inputting the preprocessed pest image sample into the trained pest target detection network and locating the pest positions in the pest image.
2. The pest target detection method based on decoupled classification and regression feature optimal-layer technology according to claim 1, wherein the construction of the pest target detection network comprises the following steps:
21) Setting the first layer of the pest target detection network as the basic feature representation network, the second layer as the feature pyramid network, and the third layer as the target region extraction network;
22) Setting the basic feature representation network as a residual network that mines the most representative image feature representation from the depth, width and receptive-field factors of its convolution blocks, acting as the feature extractor;
23) Setting the feature pyramid network as a laterally connected hierarchical structure that passes semantic information from high-level features down to low-level features; in the feature pyramid network the feature extraction process is divided into two parts: a bottom-up process, which extracts features from the backbone network, and a top-down lateral-connection fusion process;
24) Setting the first-stage network of the target region extraction network: setting reference boxes of different scales for the different feature layers to locate and regress preliminary candidate target regions, namely reference boxes of size 16×16 for the first layer of the feature pyramid, 32×32 for the second layer, 64×64 for the third layer, 128×128 for the fourth layer, 256×256 for the fifth layer, and 512×512 for the sixth layer;
25) Setting the second-stage network of the target region extraction network: for the preliminary candidate target regions, searching for the optimal feature layers for classification and for regression separately, and locating and classifying accordingly.
3. The pest target detection method based on decoupled classification and regression feature optimal-layer technology according to claim 1, wherein the detection of pest objects comprises the following steps:
31) Inputting a pest image sample I of width w to be detected into the basic feature representation network:
311) Via conv1: one 7×7×64 convolution with stride 2, followed by batch normalization and a nonlinear activation function; the output is denoted c1;
312) Via conv2_x: first 3×3 max pooling with stride 2, then 3 convolution blocks, each containing a 1×1×64, a 3×3×64 and a 1×1×256 convolution; the output is denoted c2;
313) Via conv3_x: 4 convolution blocks, each containing a 1×1×128, a 3×3×128 and a 1×1×512 convolution; the output is denoted c3;
314) Via conv4_x: 23 convolution blocks, each containing a 1×1×256, a 3×3×256 and a 1×1×1024 convolution; the output is denoted c4;
315) Via conv5_x: 3 convolution blocks, each containing a 1×1×512, a 3×3×512 and a 1×1×2048 convolution; the output is denoted c5;
32) Inputting the feature maps c1, c2, c3, c4 and c5 (with different channel counts) into the feature pyramid network and performing channel normalization with 1×1×256 convolutions to obtain m1, m2, m3, m4 and m5 respectively, with M5 = m5; upsampling M5 and adding it to m4 to obtain the feature map M4, and obtaining M3, M2 and M1 in the same way; to eliminate the aliasing effect of upsampling, applying channel-preserving 3×3 convolutions to M1, M2, M3, M4 and M5 to obtain P1, P2, P3, P4 and P5 respectively, and downsampling P5 to obtain the feature map P6;
33) Inputting into the first-stage network of the target region extraction network:
setting reference boxes of different scales for the different feature layers P1, P2, P3, P4, P5 and P6 and inputting them into the trained first-stage network of the target region extraction network, which applies a 3×3×256 convolution and then extracts the preliminary target regions through the parallel 1×1×(3×4) regression branch and 1×1×(3×2) classification branch;
34) Inputting into the second-stage network of the target region extraction network:
the classification and regression tasks are decoupled for the preliminary target regions, and the optimal classification feature layer and the optimal localization feature layer are searched for separately for classification and localization,
according to the feature-layer selection rule
k = ⌊k0 + log2(√(w·h)/224)⌋,
where k0 is commonly set to 4; the k−1 layer is used for localization and the k+1 layer for classification; the preliminary target region is then classified on the optimal classification feature layer and located on the optimal localization feature layer to obtain the final detection target.
CN202110804036.5A 2021-07-16 2021-07-16 Pest target detection method based on decoupling classification and regression feature optimal layer technology Active CN113627269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110804036.5A CN113627269B (en) 2021-07-16 2021-07-16 Pest target detection method based on decoupling classification and regression feature optimal layer technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110804036.5A CN113627269B (en) 2021-07-16 2021-07-16 Pest target detection method based on decoupling classification and regression feature optimal layer technology

Publications (2)

Publication Number Publication Date
CN113627269A CN113627269A (en) 2021-11-09
CN113627269B true CN113627269B (en) 2023-04-28

Family

ID=78379903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110804036.5A Active CN113627269B (en) 2021-07-16 2021-07-16 Pest target detection method based on decoupling classification and regression feature optimal layer technology

Country Status (1)

Country Link
CN (1) CN113627269B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10713794B1 (en) * 2017-03-16 2020-07-14 Facebook, Inc. Method and system for using machine-learning for object instance segmentation
CN111444865A (en) * 2020-03-31 2020-07-24 盐城禅图智能科技有限公司 Multi-scale target detection method based on gradual refinement
CN111738174A (en) * 2020-06-25 2020-10-02 中国科学院自动化研究所 Human body example analysis method and system based on depth decoupling
CN112183450A (en) * 2020-10-15 2021-01-05 成都思晗科技股份有限公司 Multi-target tracking method
CN112651404A (en) * 2020-12-22 2021-04-13 山东师范大学 Green fruit efficient segmentation method and system based on anchor-frame-free detector
CN112560876A (en) * 2021-02-23 2021-03-26 中国科学院自动化研究所 Single-stage small sample target detection method for decoupling measurement

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Shin SJ et al. "Hierarchical Multi-Label Object Detection Framework for Remote Sensing Images." MDPI, 2020, pp. 1-9. *
李梦溪. "Image Semantic Segmentation Based on Feature Fusion and Hard Example Mining." China Master's Theses Full-text Database, Information Science and Technology, 2018(12): I138-1734. *
陈天娇 et al. "Intelligent Identification System of Diseases and Pests Based on Deep Learning." China Plant Protection (中国植保导刊), 2019(4): 26-34. *

Also Published As

Publication number Publication date
CN113627269A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN109255334B (en) Remote sensing image ground feature classification method based on deep learning semantic segmentation network
AU2019101133A4 (en) Fast vehicle detection using augmented dataset based on RetinaNet
CN108399362B (en) Rapid pedestrian detection method and device
CN110148120B (en) Intelligent disease identification method and system based on CNN and transfer learning
CN108734208B (en) Multi-source heterogeneous data fusion system based on multi-mode deep migration learning mechanism
CN111222396B (en) All-weather multispectral pedestrian detection method
Komorowski et al. Minkloc++: lidar and monocular image fusion for place recognition
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN108830254B (en) Fine-grained vehicle type detection and identification method based on data balance strategy and intensive attention network
CN106815576B (en) Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine
CN111723829A (en) Full-convolution target detection method based on attention mask fusion
CN114581456B (en) Multi-image segmentation model construction method, image detection method and device
CN114283162A (en) Real scene image segmentation method based on contrast self-supervision learning
CN110969182A (en) Convolutional neural network construction method and system based on farmland image
CN115953630A (en) Cross-domain small sample image classification method based on global-local knowledge distillation
CN112651381A (en) Method and device for identifying livestock in video image based on convolutional neural network
CN116883650A (en) Image-level weak supervision semantic segmentation method based on attention and local stitching
CN116740418A (en) Target detection method based on graph reconstruction network
CN117649610B (en) YOLOv-based pest detection method and YOLOv-based pest detection system
CN117830788A (en) Image target detection method for multi-source information fusion
CN113627269B (en) Pest target detection method based on decoupling classification and regression feature optimal layer technology
CN114937239A (en) Pedestrian multi-target tracking identification method and tracking identification device
CN115988260A (en) Image processing method and device and electronic equipment
Leipnitz et al. The effect of image resolution in the human presence detection: A case study on real-world image data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant