CN111340096A - Weakly supervised butterfly target detection method based on adversarial complementary learning - Google Patents

Weakly supervised butterfly target detection method based on adversarial complementary learning

Info

Publication number
CN111340096A
CN111340096A (application CN202010111404.3A)
Authority
CN
China
Prior art keywords
branch
butterfly
network
image
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010111404.3A
Other languages
Chinese (zh)
Inventor
李玉鑑
方宇
张婷
刘兆英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202010111404.3A
Publication of CN111340096A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a weakly supervised butterfly target detection method based on adversarial complementary learning, which comprises the following steps in order: first, butterfly ecological images gathered by a web crawler are mixed by category with butterfly specimen images to form a butterfly data set; the images are then cropped and normalized; the butterfly data set is then divided proportionally into a training image set and a test image set; a backbone network and an adversarial complementary learning network are then built, the training set is used to train the network, and the model is saved once the network converges; finally, a test image is input into the trained network model to obtain a target detection result map.

Description

Weakly supervised butterfly target detection method based on adversarial complementary learning
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a weakly supervised butterfly target detection method based on adversarial complementary learning.
Background
Butterflies are insects of the phylum Arthropoda, class Insecta, order Lepidoptera, suborder Rhopalocera. On one hand, butterflies in the larval stage feed on agricultural and forestry crops and are among the main pests of agriculture and forestry; on the other hand, butterflies are a valuable environmental indicator whose monitoring data are used for ecological environment monitoring, biodiversity protection, and so on, and they also have high ornamental and economic value as a natural resource. Therefore, the classification and identification of butterflies is of great significance for practical work in agriculture, forestry, pest control, environmental protection, and the development of the butterfly industry.
Traditional butterfly identification relies mainly on two approaches: manual identification and biochemical identification. Manual identification compares ecological characteristics with specimen characteristics; it depends on long-term accumulated experience and is time-consuming. Biochemical identification uses the reaction of butterfly genitalia to biochemical reagents; it depends on specialized biochemical knowledge and is expensive. Neither method is therefore general enough for routine butterfly identification.
With the development of image processing technology and machine learning theory, researchers have achieved butterfly identification with machine learning methods, which mainly involve manually extracting image features of the butterfly (color, texture, and shape information of the wing surface), building a mathematical model from this feature information, and choosing a classifier for classification.
Most machine learning methods require manually selected image features, so feature extraction and feature selection largely determine the final classification performance. Moreover, machine learning methods have focused on the identification of butterfly specimen images, and an effective means of identifying butterfly ecological images (butterfly images taken in the natural environment) is lacking. In an ecological image, on one hand, the butterfly usually occupies only part of the image; on the other hand, butterflies are capable of mimicry, making the butterfly target difficult to distinguish from the background, which poses a huge challenge to the identification of butterfly images. Therefore, to better identify the butterfly in an ecological image, the position of the butterfly in the image must first be determined and the identification then completed; this is butterfly target detection.
Deep learning methods, represented by convolutional neural networks (CNNs), have achieved great success in the field of image recognition. Deep learning methods automatically extract image features and have made major breakthroughs in tasks such as image classification, target detection, and image segmentation. For the target detection task, two-stage detection algorithms represented by the R-CNN series and one-stage detection algorithms represented by SSD and YOLO perform very well, but these are fully supervised detection algorithms that depend on manually annotated object bounding boxes and are therefore expensive. To address this problem, researchers have turned to target detection under weak supervision (image-level labels only), with some success. For example, Zhou et al. replace the fully connected layers of VGG with a global average pooling (GAP) layer to obtain the position information of an object, but this method can only recover the most discriminative region. Building on Zhou's method, Singh et al. randomly hide patches of each input image so that the network is forced to find additional discriminative regions. However, because the hiding is random, this method cannot reliably locate the entire extent of the object.
Using only image-level labels, the present invention effectively locates the full extent of the butterfly in an image in a weakly supervised manner through adversarial complementary learning, and identifies the butterfly's category.
Disclosure of Invention
The invention aims to provide a weakly supervised butterfly target detection method based on adversarial complementary learning for target detection in butterfly images. To achieve this purpose, the invention adopts the following technical scheme:

A weakly supervised butterfly target detection method based on adversarial complementary learning comprises the following steps:

Step 1: construct a butterfly data set. The butterfly data set consists of two parts: the first part consists of butterfly ecological images crawled from Google Images and Baidu Images and is called data set D1; the second part consists of the butterfly specimen images from the book "Chinese Butterfly Log" and is called data set D2. Data sets D1 and D2 are mixed to form the butterfly data set $D = \{(I_i, y_i)\}_{i=1}^{N}$, where $I_i$ is a butterfly image and $y_i$ is its category label. The data set D contains N images of M butterfly categories in total and is divided into a training set $D_t$ (containing $N_t$ images) and a test set $D_s$ (containing $N_s$ images);

Step 2: construct a backbone network. The invention selects the first 13 layers of VGG-16 as the backbone network, which consists of 5 convolutional blocks. The input of the backbone network is a color butterfly image $I_i \in \mathbb{R}^{h \times w \times 3}$ ($1 \le i \le N_t$), where h and w denote the height and width of the image and 3 denotes the number of channels. The output of the network is a multi-channel location-aware feature map $S_i \in \mathbb{R}^{H_1 \times W_1 \times K_1}$, where $K_1$ denotes the number of channels of the location-aware feature map and $H_1$ and $W_1$ denote its height and width. The backbone network is expressed as

$$S_i = f_0(\theta_0, I_i)$$

where $f_0(\cdot)$ denotes the mapping of the backbone network and $\theta_0$ denotes its parameters;

Step 3: construct an adversarial complementary learning network. The adversarial complementary learning network comprises two parallel branches A and B, each containing a feature extractor and a classifier. The feature extractor and classifier of branch A are denoted $E_A$ and $cls_A$, and the feature extractor and classifier of branch B are denoted $E_B$ and $cls_B$;

Step 3.1: for branch A, the branch first uses the feature extractor $E_A$ to extract features and obtain a class activation map, and then uses the classifier $cls_A$ to classify. The feature extractor is a three-layer convolutional neural network whose input is the output $S_i$ ($1 \le i \le N_t$) of the backbone network and whose output is the class target map $M_i^A \in \mathbb{R}^{H_1 \times W_1 \times M}$; this target map highlights the most discriminative region of the target class. $M_i^A$ is normalized to [0, 1] and denoted $\hat{M}_i^A$, which is the localization map of the branch. The classifier consists of a global average pooling (GAP) layer and a softmax layer; its input is $M_i^A$ and its output is the classification result $\hat{y}_i^A \in \mathbb{R}^M$, where M denotes the number of butterfly categories. The whole branch is expressed as

$$M_i^A = f_A(\theta_A, S_i), \qquad \hat{y}_i^A = f_{cls}^A(\theta_{cls}^A, M_i^A)$$

where $f_A(\cdot)$ and $f_{cls}^A(\cdot)$ denote the mappings of the feature extractor $E_A$ and the classifier $cls_A$, $\theta_A$ denotes the parameters of $E_A$, and $\theta_{cls}^A$ denotes the parameters of the A-branch classifier;

Step 3.2: a feature eraser Era is used to erase from the feature map the most discriminative region of $\hat{M}_i^A$. Assuming the threshold is δ, the most discriminative region is $\mathrm{Area} = \{(x, y) \mid \hat{M}_i^A(x, y) > \delta\}$. The values of the feature map $S_i$ at the positions covered by Area are set to 0, producing the erased feature map $\tilde{S}_i$, i.e.

$$\tilde{S}_i(x, y) = \begin{cases} 0, & (x, y) \in \mathrm{Area} \\ S_i(x, y), & \text{otherwise;} \end{cases}$$

Step 3.3: branch B has essentially the same structure as branch A. The branch first uses the feature extractor $E_B$ to extract features and obtain a class activation map, and then uses the classifier $cls_B$ to classify. The feature extractor is also a three-layer convolutional neural network; its input is the erased feature map $\tilde{S}_i$ and its output is the class target map $M_i^B \in \mathbb{R}^{H_1 \times W_1 \times M}$, which learns a new most discriminative region. $M_i^B$ is normalized to [0, 1] and denoted $\hat{M}_i^B$, which is the localization map of the branch. The classifier consists of a global average pooling layer and a softmax layer; its input is $M_i^B$ and its output is the classification result $\hat{y}_i^B \in \mathbb{R}^M$. The whole branch is expressed as

$$M_i^B = f_B(\theta_B, \tilde{S}_i), \qquad \hat{y}_i^B = f_{cls}^B(\theta_{cls}^B, M_i^B)$$

where $f_B(\cdot)$ and $f_{cls}^B(\cdot)$ denote the mappings of the feature extractor $E_B$ and the classifier $cls_B$, $\theta_B$ denotes the parameters of $E_B$, and $\theta_{cls}^B$ denotes the parameters of the B-branch classifier;

Step 4: establish the loss functions $L_A$ and $L_B$ of the two branches. Each loss is the cross-entropy between the actual output vector ($\hat{y}_i^A$ or $\hat{y}_i^B$) and the target output vector $y_i$:

$$L_A = -\sum_{m=1}^{M} y_{i,m} \log \hat{y}_{i,m}^A, \qquad L_B = -\sum_{m=1}^{M} y_{i,m} \log \hat{y}_{i,m}^B$$

The total loss of the network is $L = L_A + L_B$;

Step 5: network training. Set hyperparameters such as the number of iterations and the learning rate, input the training set $D_t$ into the network, iteratively update the network parameters with stochastic gradient descent until the loss converges, and save the final model;

Step 6: network testing. Load the saved model and input the test set $D_s$ into the network to obtain the classification accuracy. For a single test image $I_i \in \mathbb{R}^{h \times w \times 3}$ ($1 \le i \le N_s$), obtain the localization map $\hat{M}_i^A$ of branch A and the localization map $\hat{M}_i^B$ of branch B, and take the element-wise maximum of the corresponding positions of the two maps to obtain the final localization map $F_i = \max(\hat{M}_i^A, \hat{M}_i^B)$. A rectangular box is then drawn on the image according to the localization map, giving the position of the butterfly target in the image.
Drawings
Fig. 1 is an original image.
Fig. 2 is a backbone network structure.
Fig. 3 shows the overall network structure.
Fig. 4 is a graph of the test results.
Detailed Description
The embodiment of the invention provides a weakly supervised butterfly target detection method based on adversarial complementary learning. The invention is explained below with reference to the relevant drawings:
the flow of the embodiment of the invention is as follows:
step 1: and constructing a butterfly data set. The butterfly data set consists of two parts, namely a butterfly ecological image data set and a butterfly specimen image data set. The butterfly ecological image is obtained by crawling reptiles on Google pictures and Baidu pictures and is called as a data set D1The number ofThe image of the data set is shown in fig. 1 (a). The butterfly specimen image is obtained from the book "Chinese butterfly log", and is called as data set D2The image of the data set is shown in fig. 1 (b). Data set D1And a data set D2Mixed composition of butterfly data sets
Figure BDA0002390134660000052
Wherein the butterfly image is IiThe category label is yiThe butterfly data set D is classified into M334, which contains N74111 images. Dividing the data set D into training sets D according to the proportion of 8:2 of each classt(containing N)t58288 images) and test set Ds(containing N)s14823 images) to prevent the computational burden, each butterfly image is resampled to 256 × 256 and then randomly cropped to 224 × 224 as input to the network, where the data needs to be normalized (the dimensions of the image minus the mean of the dataset and divided by the standard deviation of the dataset);
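For concreteness, a minimal PyTorch preprocessing sketch of this step is given below. The folder layout, batch size, and the mean/std values are assumptions (the per-channel statistics should be computed on the butterfly data set itself); only the 256 × 256 resampling, the 224 × 224 random crop, and the per-channel normalization follow the step as described.

```python
import torch
from torchvision import datasets, transforms

# Placeholder statistics: in practice, compute the per-channel mean/std of the butterfly data set.
DATASET_MEAN = [0.485, 0.456, 0.406]
DATASET_STD = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),                      # resample every image to 256 x 256
    transforms.RandomCrop(224),                         # random 224 x 224 crop as network input
    transforms.ToTensor(),                              # HWC [0, 255] -> CHW [0, 1]
    transforms.Normalize(DATASET_MEAN, DATASET_STD),    # (x - mean) / std per channel
])

test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),                         # deterministic crop for evaluation
    transforms.ToTensor(),
    transforms.Normalize(DATASET_MEAN, DATASET_STD),
])

# Hypothetical directory layout: butterfly_data/{train,test}/<category_name>/*.jpg
train_set = datasets.ImageFolder("butterfly_data/train", transform=train_transform)
test_set = datasets.ImageFolder("butterfly_data/test", transform=test_transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32, shuffle=False, num_workers=4)
```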
step 2: and constructing a backbone network. The present invention selects the first 13 layers of the VGG-16 as the backbone network, which consists of 5 convolutional blocks. Wherein, for the first 2 convolutional blocks, each convolutional block consists of 2 convolutional layers; for the last 3 convolutional blocks, each convolutional block is composed of 3 convolutional layers, and the structure is shown in fig. 2, wherein the total number of the convolutional layers is 13. Butterfly image I with color input for backbone networki∈R224×224×3(1<i<Nt) Where 3 denotes the number of image channels, and h-224 and w-224 denote the height and width of the image, respectively. Location-aware feature map S with multi-channel output for networki∈R28×28×512Where 512 represents the number of channels of the feature map, 28 × 28 represents the resolution of the feature map, the backbone network is represented as:
Si=f00,Ii),1<i<Nt
wherein f is0(. -) represents the role of the backbone network, θ0Is a parameter of the backbone network;
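A plausible PyTorch sketch of the backbone follows. torchvision's VGG-16 has exactly 13 convolutional layers; dropping the last two max-pooling layers (an assumption about how the 28 × 28 resolution is obtained, which the text does not spell out) lets a 224 × 224 × 3 input produce the 28 × 28 × 512 location-aware feature map stated above.

```python
import torch
import torch.nn as nn
from torchvision import models

class Backbone(nn.Module):
    """The 13 convolutional layers of VGG-16. The last two max-pooling layers are
    dropped (an assumption) so that a 224x224x3 input yields a 28x28x512 map S_i."""

    def __init__(self, pretrained=True):
        super().__init__()
        weights = models.VGG16_Weights.IMAGENET1K_V1 if pretrained else None
        vgg = models.vgg16(weights=weights)
        # vgg.features has max-pooling layers at indices 4, 9, 16, 23 and 30;
        # keep everything except the last two pools (indices 23 and 30).
        layers = [m for i, m in enumerate(vgg.features) if i not in (23, 30)]
        self.features = nn.Sequential(*layers)

    def forward(self, x):            # x: (B, 3, 224, 224)
        return self.features(x)      # S_i: (B, 512, 28, 28)

if __name__ == "__main__":
    s = Backbone(pretrained=False)(torch.randn(1, 3, 224, 224))
    print(s.shape)                   # torch.Size([1, 512, 28, 28])
```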
and step 3: and constructing an antagonistic complementary learning network. The antagonistic complementary learning network comprises two of A and BThe line branches, each branch containing a feature extractor and a classifier. Wherein the feature extractor and classifier of the A branch are respectively denoted as EAAnd clsAThe feature extractor and classifier of the B branch are denoted EBAnd clsB
Step 3.1: for the first branch A, the branch first uses the feature extractor $E_A$ to extract features and obtain a class activation map, and then uses the classifier $cls_A$ to classify. The feature extractor is a three-layer convolutional neural network whose input is the output $S_i$ of the backbone network and whose output is the class target map $M_i^A \in \mathbb{R}^{28 \times 28 \times 334}$; this map highlights the most discriminative region of the target class. $M_i^A$ is normalized to [0, 1] and denoted $\hat{M}_i^A$, which is the localization map of the branch. The classifier consists of a global average pooling (GAP) layer and a softmax layer. The global average pooling layer replaces the fully connected layers of VGG-16, and its output is a one-dimensional vector of size 334; the softmax layer maps this vector to a probability for each class. The input of the classifier is $M_i^A$ and its output is the classification result $\hat{y}_i^A \in \mathbb{R}^{334}$. The whole branch is expressed as

$$M_i^A = f_A(\theta_A, S_i), \qquad \hat{y}_i^A = f_{cls}^A(\theta_{cls}^A, M_i^A)$$

where $f_A(\cdot)$ and $f_{cls}^A(\cdot)$ denote the mappings of the feature extractor $E_A$ and the classifier $cls_A$, $\theta_A$ denotes the parameters of $E_A$, and $\theta_{cls}^A$ denotes the parameters of the A-branch classifier;
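A sketch of one classification branch under the description above. The kernel sizes and intermediate channel width of the three-layer extractor are assumptions; the 1 × 1 output convolution with 334 channels, the global average pooling, and the softmax follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierBranch(nn.Module):
    """One branch (A or B): a three-layer convolutional feature extractor producing a
    class target map with one channel per category, followed by GAP and softmax."""

    def __init__(self, in_channels=512, mid_channels=512, num_classes=334):
        super().__init__()
        self.extractor = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, num_classes, kernel_size=1),   # class target map: (B, 334, 28, 28)
        )

    def forward(self, s):
        m = self.extractor(s)              # class target map M_i
        logits = m.mean(dim=(2, 3))        # global average pooling -> (B, 334)
        probs = F.softmax(logits, dim=1)   # softmax layer -> per-class probabilities
        return m, probs

    @staticmethod
    def localization_map(m, labels):
        """Select the channel of the target class and normalize it to [0, 1]."""
        idx = torch.arange(m.size(0), device=m.device)
        cam = m[idx, labels]                                # (B, 28, 28)
        cam_min = cam.amin(dim=(1, 2), keepdim=True)
        cam_max = cam.amax(dim=(1, 2), keepdim=True)
        return (cam - cam_min) / (cam_max - cam_min + 1e-8)
```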
step 3.2: erasing feature maps using a feature eraser Era
Figure BDA00023901346600000610
Assuming the threshold is δ (δ ∈ {0.5,0.6,0.7,0.8,0.9}), the most discriminative region is
Figure BDA00023901346600000611
The area where Aera is located is in the characteristic diagram SiSetting the middle value to be 0, and generating a feature diagram after erasing
Figure BDA0002390134660000071
Namely, it is
Figure BDA0002390134660000072
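A sketch of the feature eraser corresponding to the formula above: positions where branch A's normalized localization map exceeds δ are zeroed across all 512 channels of S_i. Detaching the mask from the gradient computation is an assumption.

```python
import torch

def erase_most_discriminative(s, loc_map_a, delta=0.6):
    """Erase from the feature map S_i the region Area = {(x, y) | loc_map_a(x, y) > delta}.

    s:          backbone feature map, shape (B, 512, 28, 28)
    loc_map_a:  normalized localization map of branch A, shape (B, 28, 28)
    returns:    erased feature map with the same shape as s
    """
    keep = (loc_map_a <= delta).float().unsqueeze(1)   # 1 keeps, 0 erases; shape (B, 1, 28, 28)
    return s * keep.detach()                           # broadcast the mask over the 512 channels
```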
Step 3.3: the second branch B has essentially the same structure as branch A. The branch first uses the feature extractor $E_B$ to extract features and obtain a class activation map, and then uses the classifier $cls_B$ to classify. The feature extractor is also a three-layer convolutional neural network; its input is the erased feature map $\tilde{S}_i$ and its output is the class target map $M_i^B \in \mathbb{R}^{28 \times 28 \times 334}$, which learns a new most discriminative region. $M_i^B$ is normalized to [0, 1] and denoted $\hat{M}_i^B$, which is the localization map of the branch. The classifier consists of a global average pooling layer and a softmax layer; the output of the global average pooling layer is a one-dimensional vector of size 334, and the softmax layer maps this vector to a probability for each class. The input of the classifier is $M_i^B$ and its output is the classification result $\hat{y}_i^B \in \mathbb{R}^{334}$. The whole branch is expressed as

$$M_i^B = f_B(\theta_B, \tilde{S}_i), \qquad \hat{y}_i^B = f_{cls}^B(\theta_{cls}^B, M_i^B)$$

where $f_B(\cdot)$ and $f_{cls}^B(\cdot)$ denote the mappings of the feature extractor $E_B$ and the classifier $cls_B$, $\theta_B$ denotes the parameters of $E_B$, and $\theta_{cls}^B$ denotes the parameters of the B-branch classifier;
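A sketch that ties the pieces together into the two-branch network of steps 2 to 3.3, reusing the Backbone, ClassifierBranch, and erase_most_discriminative sketches above; the class name and the use of the ground-truth label to pick the map channel during training are illustrative choices, not the patent's.

```python
import torch
import torch.nn as nn

class AdversarialComplementaryNet(nn.Module):
    """Backbone + branch A + feature eraser + branch B (steps 2 to 3.3)."""

    def __init__(self, num_classes=334, delta=0.6):
        super().__init__()
        self.backbone = Backbone(pretrained=True)                  # earlier sketch
        self.branch_a = ClassifierBranch(num_classes=num_classes)  # earlier sketch
        self.branch_b = ClassifierBranch(num_classes=num_classes)
        self.delta = delta

    def forward(self, images, labels):
        s = self.backbone(images)                                   # S_i: (B, 512, 28, 28)
        m_a, probs_a = self.branch_a(s)                             # branch A map and prediction
        loc_a = ClassifierBranch.localization_map(m_a, labels)      # normalized localization map of A
        s_erased = erase_most_discriminative(s, loc_a, self.delta)  # remove A's most discriminative region
        m_b, probs_b = self.branch_b(s_erased)                      # branch B learns complementary regions
        loc_b = ClassifierBranch.localization_map(m_b, labels)
        return probs_a, probs_b, loc_a, loc_b
```

At test time no ground-truth label is available; the class predicted by branch A can be used to select the map channel instead, as in the inference sketch under step 6.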
and 4, step 4: establishing a loss function L of two branch networksAAnd LBThe loss function being the actual output vector
Figure BDA00023901346600000714
And
Figure BDA00023901346600000715
and the target output vector yiRespectively expressed as:
Figure BDA00023901346600000716
Figure BDA00023901346600000717
the total loss of the network is L ═ LA+LB
Step 5: network training. Set the number of iterations to 50, the learning rate to 0.001, and the threshold δ to 0.6. Input the training set $D_t$ into the network, initialize the backbone network with the VGG-16 weights pre-trained on ImageNet, iteratively update the network parameters with stochastic gradient descent until the loss converges, and save the final model;
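A minimal training-loop sketch with the stated settings (50 iterations over the training set, learning rate 0.001, δ = 0.6, stochastic gradient descent, ImageNet-pretrained backbone); the momentum and weight-decay values are assumptions, and train_loader, AdversarialComplementaryNet, and total_loss refer to the earlier sketches.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AdversarialComplementaryNet(num_classes=334, delta=0.6).to(device)

# The backbone is already initialized from ImageNet-pretrained VGG-16 weights inside Backbone.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4)

for epoch in range(50):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        probs_a, probs_b, _, _ = model(images, labels)
        loss = total_loss(probs_a, probs_b, labels)   # L = L_A + L_B
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: mean loss = {running_loss / len(train_loader):.4f}")

torch.save(model.state_dict(), "butterfly_acol.pth")   # save the final model
```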
step 6: and (5) testing the network. Loading the saved model, and testing the set DsAnd inputting the data into a network to obtain the classification accuracy. Inputting a single test image Ii∈Rh×w×3(1≤i≤Ns) Obtaining the location map of the A branch
Figure BDA0002390134660000081
And location map of branch B
Figure BDA0002390134660000082
Taking the maximum value of the corresponding positions of the two positioning maps to obtain the final positioning map
Figure BDA0002390134660000083
Adding the scout map to the test image, as shown in fig. 4 (a); and (3) binarizing the positioning map, then obtaining the contour of the butterfly target, then obtaining a circumscribed rectangle of the contour, and finally drawing the rectangle on the test image to obtain the position of the butterfly target in the image, as shown in (b) of the attached figure 4.
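A sketch of this test-time procedure, reusing the earlier model and eraser sketches: the two localization maps are fused by an element-wise maximum, upsampled to the input size, binarized, and a bounding rectangle is fitted to the largest contour with OpenCV. The binarization threshold of 0.5 and the choice of the largest contour are assumptions.

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect_butterfly(model, image_tensor, bin_thresh=0.5):
    """image_tensor: normalized input of shape (1, 3, 224, 224).
    Returns the predicted class index and a bounding box (x, y, w, h), or None."""
    model.eval()
    s = model.backbone(image_tensor)
    m_a, probs_a = model.branch_a(s)
    pred = probs_a.argmax(dim=1)                                 # predicted class selects the map channel
    loc_a = model.branch_a.localization_map(m_a, pred)
    s_erased = erase_most_discriminative(s, loc_a, model.delta)
    m_b, _ = model.branch_b(s_erased)
    loc_b = model.branch_b.localization_map(m_b, pred)

    fused = torch.maximum(loc_a, loc_b)                          # final localization map F_i
    fused = F.interpolate(fused.unsqueeze(1), size=image_tensor.shape[-2:],
                          mode="bilinear", align_corners=False)[0, 0]

    binary = (fused.cpu().numpy() > bin_thresh).astype(np.uint8) * 255
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return pred.item(), None
    box = cv2.boundingRect(max(contours, key=cv2.contourArea))   # circumscribed rectangle of the contour
    return pred.item(), box
```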
The above examples are only used to describe the present invention, and do not limit the technical solutions described in the present invention. Therefore, all technical solutions and modifications that do not depart from the spirit and scope of the present invention should be construed as being included in the scope of the appended claims.

Claims (2)

1. A weakly supervised butterfly target detection method based on adversarial complementary learning, characterized by comprising the following steps:

Step 1: constructing a butterfly data set: the butterfly data set consists of two parts, the first part consisting of butterfly ecological images crawled from Google Images and Baidu Images, called data set D1, and the second part consisting of butterfly specimen images, called data set D2; data sets D1 and D2 are mixed to form the butterfly data set $D = \{(I_i, y_i)\}_{i=1}^{N}$, where $I_i$ is a butterfly image and $y_i$ is its category label; the data set D contains N images of M butterfly categories in total and is divided into a training set $D_t$ and a test set $D_s$; the training set $D_t$ contains $N_t$ images and the test set $D_s$ contains $N_s$ images;

Step 2: constructing a backbone network: the first 13 layers of VGG-16 are selected as the backbone network, which consists of 5 convolutional blocks; the input of the backbone network is a color butterfly image $I_i \in \mathbb{R}^{h \times w \times 3}$, $1 \le i \le N_t$, where h and w denote the height and width of the image and 3 denotes the number of channels; the output of the network is a multi-channel location-aware feature map $S_i \in \mathbb{R}^{H_1 \times W_1 \times K_1}$, where $K_1$ denotes the number of channels of the location-aware feature map and $H_1$ and $W_1$ denote its height and width; the backbone network is expressed as

$$S_i = f_0(\theta_0, I_i),$$

where $f_0(\cdot)$ denotes the mapping of the backbone network and $\theta_0$ denotes its parameters;

Step 3: constructing an adversarial complementary learning network: the adversarial complementary learning network comprises two parallel branches A and B, each branch containing a feature extractor and a classifier, the feature extractor and classifier of branch A being denoted $E_A$ and $cls_A$ and the feature extractor and classifier of branch B being denoted $E_B$ and $cls_B$;

Step 4: establishing the loss functions $L_A$ and $L_B$ of the two branches, each loss being the cross-entropy between the actual output vector ($\hat{y}_i^A$ or $\hat{y}_i^B$) and the target output vector $y_i$:

$$L_A = -\sum_{m=1}^{M} y_{i,m} \log \hat{y}_{i,m}^A, \qquad L_B = -\sum_{m=1}^{M} y_{i,m} \log \hat{y}_{i,m}^B,$$

the total loss of the network being $L = L_A + L_B$;

Step 5: network training: setting hyperparameters such as the number of iterations and the learning rate, inputting the training set $D_t$ into the network, iteratively updating the network parameters with stochastic gradient descent until the loss converges, and saving the final model;

Step 6: network testing: loading the saved model and inputting the test set $D_s$ into the network to obtain the classification accuracy; for a single test image $I_i \in \mathbb{R}^{h \times w \times 3}$, obtaining the localization map $\hat{M}_i^A$ of branch A and the localization map $\hat{M}_i^B$ of branch B, taking the element-wise maximum of the corresponding positions of the two maps to obtain the final localization map $F_i = \max(\hat{M}_i^A, \hat{M}_i^B)$, and drawing a rectangular box on the image according to the localization map to obtain the position of the butterfly target in the image.
2. The weakly supervised butterfly target detection method based on adversarial complementary learning according to claim 1, characterized in that step 3 comprises the following steps:

Step 3.1: for branch A, the branch first uses the feature extractor $E_A$ to extract features and obtain a class activation map, and then uses the classifier $cls_A$ to classify; the feature extractor is a three-layer convolutional neural network whose input is the output $S_i$ of the backbone network and whose output is the class target map $M_i^A \in \mathbb{R}^{H_1 \times W_1 \times M}$, which highlights the most discriminative region of the target class; $M_i^A$ is normalized to [0, 1] and denoted $\hat{M}_i^A$, which is the localization map of the branch; the classifier $cls_A$ consists of a global average pooling layer and a softmax output layer, its input being $M_i^A$ and its output being the classification result $\hat{y}_i^A \in \mathbb{R}^M$; the whole branch is expressed as

$$M_i^A = f_A(\theta_A, S_i), \qquad \hat{y}_i^A = f_{cls}^A(\theta_{cls}^A, M_i^A),$$

where $f_A(\cdot)$ and $f_{cls}^A(\cdot)$ denote the mappings of the feature extractor $E_A$ and the classifier $cls_A$, $\theta_A$ denotes the parameters of $E_A$, and $\theta_{cls}^A$ denotes the parameters of the A-branch classifier;

Step 3.2: a feature eraser Era is used to erase from the feature map the most discriminative region of $\hat{M}_i^A$: assuming the threshold is δ, the most discriminative region can be represented as $\mathrm{Area} = \{(x, y) \mid \hat{M}_i^A(x, y) > \delta\}$; the values of the feature map $S_i$ at the positions covered by Area are set to 0, producing the erased feature map $\tilde{S}_i$, i.e.

$$\tilde{S}_i(x, y) = \begin{cases} 0, & (x, y) \in \mathrm{Area} \\ S_i(x, y), & \text{otherwise;} \end{cases}$$

Step 3.3: branch B has essentially the same structure as branch A; the branch also first uses the feature extractor $E_B$ to extract features and obtain a class activation map, and then uses the classifier $cls_B$ to classify; the feature extractor is also a three-layer convolutional neural network whose input is the erased feature map $\tilde{S}_i$ and whose output is the class target map $M_i^B \in \mathbb{R}^{H_1 \times W_1 \times M}$, which learns a new most discriminative region; $M_i^B$ is normalized to [0, 1] and denoted $\hat{M}_i^B$, which is the localization map of the branch; the classifier consists of a global average pooling layer and a softmax layer, its input being $M_i^B$ and its output being the classification result $\hat{y}_i^B \in \mathbb{R}^M$; the whole branch is expressed as

$$M_i^B = f_B(\theta_B, \tilde{S}_i), \qquad \hat{y}_i^B = f_{cls}^B(\theta_{cls}^B, M_i^B),$$

where $f_B(\cdot)$ and $f_{cls}^B(\cdot)$ denote the mappings of the feature extractor $E_B$ and the classifier $cls_B$, $\theta_B$ denotes the parameters of $E_B$, and $\theta_{cls}^B$ denotes the parameters of the B-branch classifier.
CN202010111404.3A 2020-02-24 2020-02-24 Weakly supervised butterfly target detection method based on adversarial complementary learning Pending CN111340096A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010111404.3A CN111340096A (en) 2020-02-24 2020-02-24 Weakly supervised butterfly target detection method based on adversarial complementary learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010111404.3A CN111340096A (en) 2020-02-24 2020-02-24 Weakly supervised butterfly target detection method based on adversarial complementary learning

Publications (1)

Publication Number Publication Date
CN111340096A true CN111340096A (en) 2020-06-26

Family

ID=71185486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010111404.3A Pending CN111340096A (en) 2020-02-24 2020-02-24 Weakly supervised butterfly target detection method based on adversarial complementary learning

Country Status (1)

Country Link
CN (1) CN111340096A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364980A (en) * 2020-11-09 2021-02-12 北京计算机技术及应用研究所 Deep neural network training method based on reinforcement learning under weak supervision scene
CN112801029A (en) * 2021-02-09 2021-05-14 北京工业大学 Multi-task learning method based on attention mechanism
CN112800927A (en) * 2021-01-25 2021-05-14 北京工业大学 AM-Softmax loss-based butterfly image fine granularity identification method
CN114882298A (en) * 2022-07-11 2022-08-09 东声(苏州)智能科技有限公司 Optimization method and device for confrontation complementary learning model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488536A (en) * 2015-12-10 2016-04-13 中国科学院合肥物质科学研究院 Agricultural pest image recognition method based on multi-feature deep learning technology
CN109063742A (en) * 2018-07-06 2018-12-21 平安科技(深圳)有限公司 Butterfly identifies network establishing method, device, computer equipment and storage medium
CN109376765A (en) * 2018-09-14 2019-02-22 汕头大学 A kind of butterfly automatic classification method based on deep learning
CN109886295A (en) * 2019-01-11 2019-06-14 平安科技(深圳)有限公司 A kind of butterfly recognition methods neural network based and relevant device
CN110569901A (en) * 2019-09-05 2019-12-13 北京工业大学 Channel selection-based countermeasure elimination weak supervision target detection method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488536A (en) * 2015-12-10 2016-04-13 中国科学院合肥物质科学研究院 Agricultural pest image recognition method based on multi-feature deep learning technology
CN109063742A (en) * 2018-07-06 2018-12-21 平安科技(深圳)有限公司 Butterfly identifies network establishing method, device, computer equipment and storage medium
CN109376765A (en) * 2018-09-14 2019-02-22 汕头大学 A kind of butterfly automatic classification method based on deep learning
CN109886295A (en) * 2019-01-11 2019-06-14 平安科技(深圳)有限公司 A kind of butterfly recognition methods neural network based and relevant device
CN110569901A (en) * 2019-09-05 2019-12-13 北京工业大学 Channel selection-based countermeasure elimination weak supervision target detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAOLIN ZHANG et al.: "Adversarial Complementary Learning for Weakly Supervised Object Localization", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364980A (en) * 2020-11-09 2021-02-12 北京计算机技术及应用研究所 Deep neural network training method based on reinforcement learning under weak supervision scene
CN112364980B (en) * 2020-11-09 2024-04-30 北京计算机技术及应用研究所 Deep neural network training method based on reinforcement learning under weak supervision scene
CN112800927A (en) * 2021-01-25 2021-05-14 北京工业大学 AM-Softmax loss-based butterfly image fine granularity identification method
CN112800927B (en) * 2021-01-25 2024-03-29 北京工业大学 Butterfly image fine-granularity identification method based on AM-Softmax loss
CN112801029A (en) * 2021-02-09 2021-05-14 北京工业大学 Multi-task learning method based on attention mechanism
CN112801029B (en) * 2021-02-09 2024-05-28 北京工业大学 Attention mechanism-based multitask learning method
CN114882298A (en) * 2022-07-11 2022-08-09 东声(苏州)智能科技有限公司 Optimization method and device for confrontation complementary learning model
CN114882298B (en) * 2022-07-11 2022-11-01 东声(苏州)智能科技有限公司 Optimization method and device for confrontation complementary learning model

Similar Documents

Publication Publication Date Title
CN109214399B (en) Improved YOLOV3 target identification method embedded in SENET structure
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
CN111340096A (en) Weakly supervised butterfly target detection method based on adversarial complementary learning
Grilli et al. A review of point clouds segmentation and classification algorithms
Li et al. SAR image change detection using PCANet guided by saliency detection
CN110796186A (en) Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN110619059B (en) Building marking method based on transfer learning
CN109766873B (en) Pedestrian re-identification method based on hybrid deformable convolution
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN111833322B (en) Garbage multi-target detection method based on improved YOLOv3
CN113269224B (en) Scene image classification method, system and storage medium
CN112101364B (en) Semantic segmentation method based on parameter importance increment learning
CN113761259A (en) Image processing method and device and computer equipment
CN112528845B (en) Physical circuit diagram identification method based on deep learning and application thereof
CN111339935A (en) Optical remote sensing picture classification method based on interpretable CNN image classification model
CN111753682A (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN111353396A (en) Concrete crack segmentation method based on SCSEOCUnet
CN111079837A (en) Method for detecting, identifying and classifying two-dimensional gray level images
CN112132145A (en) Image classification method and system based on model extended convolutional neural network
CN111414951B (en) Fine classification method and device for images
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
Tan et al. Rapid fine-grained classification of butterflies based on FCM-KM and mask R-CNN fusion
CN115049952A (en) Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
CN112308825A (en) SqueezeNet-based crop leaf disease identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200626)