CN116797818B

CN116797818B - Feature enhancement loss method and system for target detection and image classification

Info

Publication number: CN116797818B
Application number: CN202310424606.7A
Authority: CN
Inventors: 杨贤昭; 刘雄彪; 赵帅通; 刘震
Original assignee: Wuhan University of Science and Engineering WUSE
Current assignee: Wuhan University of Science and Engineering WUSE
Priority date: 2023-04-19
Filing date: 2023-04-19
Publication date: 2024-04-19
Anticipated expiration: 2043-04-19
Also published as: CN116797818A

Abstract

The invention belongs to the technical field of object detection and discloses a feature enhancement loss method and system for object detection and image classification. A channel attention mechanism is added to the backbone network. For each channel attention module, the mean of the squared differences between each attention weight and the maximum attention weight is calculated, and the total feature enhancement loss is calculated from these values. The feature enhancement loss function is added to the loss function of the model and optimized during training so that the model learns more important features. A data set is then prepared, parameter values are set, and training of the model is started. Through this loss function the network learns more important features and its feature representation is enhanced, which reduces the dispersion of the channel attention weights. Adding the feature enhancement loss function does not increase the complexity of the model: the performance improvement comes at a negligible computational cost incurred only during training.

Description

Feature enhancement loss method and system for target detection and image classification

Technical Field

The invention belongs to the technical field of target detection, and particularly relates to a feature enhancement loss method and system for target detection and image classification.

Background

Object detection is currently one of the most fundamental and challenging tasks in computer vision research; it aims to find instances of objects of predefined categories in natural images. Deep learning is a powerful feature extraction tool that learns feature representations directly from data, and its strong feature learning capability has brought a major breakthrough to object detection. Deep-learning-based object detection algorithms fall into two main types: single-stage algorithms and two-stage algorithms.

Two-stage object detection algorithms first generate candidate regions and then feed them into a neural network for classification and localization. The R-CNN algorithm proposed by Ross Girshick et al. first extracts about 2000 candidate regions from the input image with a selective search algorithm, then extracts features from each candidate region with an AlexNet network, and finally determines the class and location of each object with an SVM classifier and a bounding-box regressor. R-CNN is considerably more accurate than traditional object detection algorithms, but it requires input images of a fixed size. To address this problem, He et al. proposed SPPNet, which inserts a spatial pyramid pooling layer in front of the fully connected layers to pool the feature map produced by the convolutional layers to a specified size, so that the dimension of the feature vector is independent of the input image size and the robustness of the network model increases. In addition, SPPNet extracts features first and then generates candidate regions, rather than generating regions first and then extracting features as R-CNN does, which makes it 24 to 102 times faster and further improves accuracy. Inspired by SPPNet, Ross Girshick et al. proposed Fast R-CNN, which uses VGG-16 as the feature extraction network and raises accuracy on PASCAL VOC 2007 to 70.0%. Both SPPNet and Fast R-CNN rely on a region proposal algorithm to hypothesize object locations, so Ren et al. introduced the region proposal network (RPN) in their Faster R-CNN algorithm. The RPN shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals; it achieves 73.2% accuracy on the PASCAL VOC 2007 data set and increases detection speed to 7 frames per second, compared with 3 frames per second for Fast R-CNN.

Single-stage object detection algorithms omit the candidate region generation stage and predict bounding boxes and class probabilities directly from the full image in a single evaluation. In 2016, Redmon et al. proposed the YOLOv1 object detection algorithm, which achieved a remarkably high detection speed of 45 frames per second. Later, Liu et al. proposed the SSD object detection algorithm, building on YOLOv1; SSD discretizes the output space of bounding boxes into a set of prior boxes with different aspect ratios and scales, improving accuracy to 76.8% on the PASCAL VOC 2007 data set and detection speed to 59 frames per second. Redmon et al. then addressed the shortcomings of YOLOv1 and proposed YOLOv2, which achieves 78.6% accuracy on PASCAL VOC 2007; later YOLO versions further improved detection accuracy and speed. Many other excellent single-stage object detection algorithms also exist, such as RetinaNet and EfficientDet.

Image classification is another important task in computer vision; its goal is to assign images to different categories, and it forms the basis of other computer vision tasks. Early image classification methods were mainly based on traditional machine learning algorithms such as support vector machines and random forests. These methods first extract hand-crafted features from the image with feature descriptors and then feed them to a trainable classifier. As a result, they perform poorly on large-scale, high-dimensional image data and struggle to meet the requirements of practical applications. With the development of deep learning, convolutional neural networks (CNNs) have become the dominant approach to image classification. CNNs extract features automatically, which avoids the manual feature design required by traditional methods. AlexNet, the best-known early example, won the 2012 ImageNet image classification competition by a wide margin, which led to the wide application of CNNs to image classification. After AlexNet, a series of improved networks emerged, including VGG, GoogLeNet and ResNet; they differ in network depth, width and number of parameters, but all achieve good performance. ResNet contributed greatly to solving the vanishing-gradient problem in deep networks and achieved the best results in the ImageNet competition. In addition, many techniques have emerged to improve the generalization and performance of image classification, such as transfer learning, data augmentation, regularization and network pruning; these techniques effectively reduce overfitting and improve classification accuracy and generalization.

Attention mechanisms play an important role in improving the accuracy of object detection and image classification. Inserting channel attention modules into convolution blocks has been the subject of many studies and shows great potential for improving performance. A representative approach is the squeeze-and-excitation network (SENet), which learns channel attention for each convolution block and brings significant performance gains to various deep CNN architectures. Following SENet, several studies have improved the SE block by capturing more complex channel dependencies or by adding spatial attention, and some have reduced model complexity in various ways.

The development of channel attention can be roughly divided into two directions: 1) enhancing feature aggregation; 2) learning effective channel attention with low model complexity. For enhanced feature aggregation, CBAM uses both average pooling and max pooling to aggregate features; GSoP models channel-to-channel relationships in the form of covariance; and GE explores spatial extension using depthwise convolution to aggregate features. For reducing model complexity, GCNet, which shares similar principles with non-local (NL) neural networks, develops a simplified NL network and integrates it with the SE block to form a lightweight module for modeling long-range dependencies; GCTNet proposes a channel normalization layer that reduces parameters and computation; and ECA reduces complexity by considering only the direct interactions between each channel and its k neighbors rather than indirect correspondences.

The above analysis shows that the prior art has the following problems and defects:

(1) In existing object detection methods, the feature representation capability of the network is weak.

(2) Image classification methods based on deep learning algorithms perform poorly on large-scale, high-dimensional image data and struggle to meet the requirements of practical applications.

Disclosure of Invention

Aiming at the problems existing in the prior art, the invention provides a feature enhancement loss method and a feature enhancement loss system for target detection and image classification.

The present invention is realized by a feature enhancement loss method for object detection and image classification, which comprises: analyzing the channel attention module in SENet, in which, given input features, the SE block applies global average pooling to each channel, passes the result through two fully connected layers with a nonlinearity, and generates the attention weight of each channel with a Sigmoid function; and using the attention weights generated by the SE module to realize the feature enhancement loss for object detection and image classification.

Further, the feature enhancement loss method for object detection and image classification includes the steps of:

Step one, adding a channel attention mechanism to the backbone network;

Step two, using the attention weights generated by each channel attention module, calculating the mean of the squared differences between each attention weight of the module and the module's maximum attention weight, and then calculating the total feature enhancement loss;

Step three, adding the feature enhancement loss function to the loss function of the model and optimizing it during training so that the model learns more important features;

Step four, preparing a data set, setting parameter values and starting to train the model.

Further, in step one, a channel attention mechanism is added to the backbone network; the channel attention mechanism explicitly models the interdependencies between channels and thereby adaptively recalibrates the channel-wise feature responses.

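Further, let the output of a convolution block be $X \in \mathbb{R}^{W \times H \times C}$, where W, H and C are the width, height and channel dimension (the channel dimension being the number of filters). The channel attention weight s is then calculated as:

$$s = F_{se}(X, \theta) = \sigma\left(W_2\,\delta\left(W_1\,\mathrm{GAP}(X)\right)\right)$$

$$\tilde{X} = s \cdot X$$

where $\mathrm{GAP}(X) = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H} X_{ij}$ is channel-wise global average pooling, δ is the ReLU function, σ is the Sigmoid function, $W_1$ and $W_2$ are the weights of the two fully connected layers, and $s \cdot X$ rescales each channel of X by its attention weight.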
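As an illustration only, the SE computation in the equations above can be sketched in plain Python (a real model would use a deep learning framework such as PyTorch). The sketch assumes a channel-first layout (C lists of H×W grids) rather than the W×H×C order above, omits bias terms because the equation shows none, and takes W1 of shape (C/r)×C and W2 of shape C×(C/r), where the reduction ratio r follows the original SENet design and is not specified in this document. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]      # one channel: H rows x W columns
Matrix = List[List[float]]    # fully connected weights: out_features x in_features


def global_avg_pool(x: List[Grid]) -> List[float]:
    """GAP(X): average every channel over all of its spatial positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def fully_connected(w: Matrix, v: List[float]) -> List[float]:
    """Bias-free fully connected layer: returns the product w @ v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def relu(v: List[float]) -> List[float]:
    """delta in the equation: element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def sigmoid(v: List[float]) -> List[float]:
    """sigma in the equation: squashes each score into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def se_attention(x: List[Grid], w1: Matrix, w2: Matrix) -> Tuple[List[float], List[Grid]]:
    """Return the channel weights s and the rescaled features s * X."""
    s = sigmoid(fully_connected(w2, relu(fully_connected(w1, global_avg_pool(x)))))
    # Multiply every value of channel k by its attention weight s_k.
    x_tilde = [[[s_k * val for val in row] for row in ch] for s_k, ch in zip(s, x)]
    return s, x_tilde


# Toy input: C = 2 channels of size 2 x 2, reduced to one hidden unit.
x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[1.0, 0.0]]        # (C/r) x C with r = 2
w2 = [[0.0], [2.0]]      # C x (C/r)
s, x_tilde = se_attention(x, w1, w2)
```

Here s is [0.5, sigmoid(2) ≈ 0.881], so every value of the first channel is scaled to half and the second channel is scaled by about 0.881.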

Further, in step two, the feature enhancement loss of a single channel attention module is calculated as:

$$s^2 = \frac{1}{c}\sum_{i=1}^{c}\left(s_{\max} - s_i\right)^2$$

where $s_{\max}$ is the largest of the attention weights, $s_i$ is the weight of the i-th channel, and c is the total number of channels of the feature.
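The per-module loss can be written as a small function. This is a sketch over plain Python floats; in training, s would be a differentiable tensor produced by the SE block so that gradients flow back into W1 and W2, and how the value is averaged over the images of a mini-batch is not specified here (an assumption left to the implementer). The function name is hypothetical.

```python
from typing import Sequence


def module_feature_enhancement_loss(s: Sequence[float]) -> float:
    """s^2 for one channel attention module: the mean of (s_max - s_i)^2 over its c channels."""
    if len(s) == 0:
        raise ValueError("a channel attention module needs at least one channel")
    s_max = max(s)
    return sum((s_max - s_i) ** 2 for s_i in s) / len(s)


print(module_feature_enhancement_loss([0.9, 0.5, 0.3, 0.9]))  # about 0.13
print(module_feature_enhancement_loss([0.7, 0.7, 0.7]))       # 0.0: no dispersion
```

The loss is zero only when every channel weight equals the maximum, so minimizing it pulls the weights toward the largest one and reduces their dispersion.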

Further, in step three, the total loss function is:

$$\mathrm{LossFunction} = \alpha L_{origin} + (1 - \alpha) L_{feature\text{-}augmentation}$$

where $L_{origin}$ is the original loss function, $L_{feature\text{-}augmentation}$ is the feature enhancement loss function, and α is a balance factor that balances the original loss function against the feature enhancement loss function. $L_{feature\text{-}augmentation}$ aggregates the values $s_j^2$ for $j = 1, \dots, n$, where n is the total number of added channel attention modules and $s_j^2$ is the value of $s^2$ in the j-th channel attention module.
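A sketch of how the pieces might be combined in step three. Because the exact aggregation over the n modules was lost from the text, the sketch assumes a mean over modules; a plain sum is an equally plausible reading and would only rescale the term by n relative to α. The names and the value α = 0.9 in the usage line are hypothetical.

```python
from typing import Sequence


def module_loss(s: Sequence[float]) -> float:
    """s^2 of one channel attention module: mean of (s_max - s_i)^2."""
    s_max = max(s)
    return sum((s_max - s_i) ** 2 for s_i in s) / len(s)


def total_loss(l_origin: float, modules: Sequence[Sequence[float]], alpha: float) -> float:
    """alpha * L_origin + (1 - alpha) * L_feature-augmentation."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(modules) == 0:
        raise ValueError("at least one channel attention module is required")
    # ASSUMPTION: average the n per-module values s_j^2; the source formula is missing.
    l_feature_augmentation = sum(module_loss(s) for s in modules) / len(modules)
    return alpha * l_origin + (1.0 - alpha) * l_feature_augmentation


# Two hypothetical modules with c = 4 and c = 2 channels, original loss 2.0.
loss = total_loss(2.0, [[0.9, 0.5, 0.3, 0.9], [0.7, 0.7]], alpha=0.9)
```

The two modules contribute 0.13 and 0.0, so the loss is 0.9 × 2.0 + 0.1 × 0.065 = 1.8065 (up to floating-point rounding).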

It is another object of the present invention to provide a feature enhancement loss system for object detection and image classification, comprising:

The channel attention mechanism module is used for adding a channel attention mechanism into the backbone network;

The feature enhancement loss calculation module is used to calculate the total feature enhancement loss from the mean of the squared differences between each attention weight of a channel attention module and that module's maximum attention weight;

The feature enhancement loss optimization module is used to add the feature enhancement loss function to the loss function of the model and to optimize it during training so that the model learns more important features.

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the feature enhancement loss method for object detection and image classification.

It is a further object of the present invention to provide a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the feature enhancement loss method for object detection and image classification.

Another object of the present invention is to provide an information data processing terminal for implementing the feature enhancement loss system for object detection and image classification.

Considering the technical solution together with the technical problems it solves, the claimed technical solution has the following advantages and positive effects:

First, the feature enhancement loss method for object detection and image classification provided by the invention starts by analyzing the channel attention module in SENet: given input features, the SE block first applies global average pooling independently to each channel, then passes the result through two fully connected layers with a nonlinearity, and finally generates the attention weight of each channel with a Sigmoid function. Analysis of the attention weights generated by the SE blocks shows that the attention weights of the channels are highly dispersed, as shown in FIGS. 2A-2F, which indicates that the features learned by the trained network differ greatly in importance.

Simulation experiment results show the following. With ResNet-50 as the backbone network, the feature enhancement loss method of the invention improves Top-1 accuracy by 2.07 percentage points over the original method, and its Top-1 accuracy is 0.43 percentage points higher than the 76.74% of the deeper ResNet-101. With ResNet-101 as the backbone network, the network trained with the feature enhancement loss function achieves 78.03% Top-1 accuracy, 1.29 and 0.52 percentage points higher than the 76.74% and 77.51% achieved by the classification models trained with ResNet-101 and SE-ResNet-101, respectively, without increasing the complexity of the network model. With ResNet-152 as the backbone network, the network trained with the feature enhancement loss function achieves 78.82% Top-1 accuracy, 1.34 and 0.48 percentage points higher than the classification models trained with ResNet-152 and SE-ResNet-152, respectively, again without increasing model complexity. Across the different backbone networks, the method of the invention outperforms the other methods, and because it only adds a penalty term to the original loss function, the complexity of the model remains unchanged.

After the SE module is added, the detection accuracy of the original model improves markedly: the SE module raises the average precision (AP) of ResNet-50 from 37.1% to 38.6% (+1.5 percentage points) and that of ResNet-101 from 39.9% to 41.1% (+1.2 percentage points), and the AP50 and AP75 metrics also improve. The models trained with the feature enhancement loss function improve AP by 2.1 and 1.7 percentage points over the original ResNet-50 and ResNet-101 models, respectively. These experimental results show that training with the feature enhancement loss function can markedly improve the detection accuracy of an object detector. In addition, the s² values computed from the attention weights of the network trained with the feature enhancement loss function are smaller than those of SENet, indicating that this network learns more important features.

Second, the high dispersion of the attention weights indicates that the features learned by the model differ in importance and that the network also learns some less important features. The invention therefore proposes a new loss function, the feature enhancement loss function, through which the network learns more important features and its feature representation is enhanced, thereby reducing this dispersion. Adding the feature enhancement loss function does not increase the complexity of the model; the performance improvement is obtained at a negligible computational cost incurred only during training.

Third, as supplementary evidence of the inventiveness of the claims, the following important aspects are also presented:

The expected benefits and commercial value once the technical solution of the invention is commercialized are as follows:

The technical solution of the invention can be applied in fields such as security, autonomous driving, retail, medical health and industrial manufacturing. In security, it can improve the accuracy of identity recognition when comparing suspicious persons, accurately detect events such as fire, smoke and area intrusion, and strengthen security monitoring for home burglary prevention, public places and production environments. In autonomous driving, it can improve vehicle safety and the driving experience, helping to protect people's lives and property. In retail, it can enable functions such as smart shelves and smart payment, improving service efficiency and profit margins. In medical health, it can be used for medical image analysis, medical diagnosis and health monitoring, improving medical efficiency and accuracy. In industrial manufacturing, it can be used for quality control, automated production and logistics management, improving efficiency and quality and reducing costs, thereby increasing the profits and competitiveness of enterprises.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a feature enhancement loss method for object detection and image classification provided by an embodiment of the present invention;

FIG. 2A shows the activations induced by the excitation operation in the SE_2_3 module of SE-ResNet-50 on ImageNet, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 2B shows the activations induced by the excitation operation in the SE_3_4 module of SE-ResNet-50 on ImageNet, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 2C shows the activations induced by the excitation operation in the SE_4_6 module of SE-ResNet-50 on ImageNet, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 2D shows the activations induced by the excitation operation in the SE_5_1 module of SE-ResNet-50 on ImageNet, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 2E shows the activations induced by the excitation operation in the SE_5_2 module of SE-ResNet-50 on ImageNet, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 2F shows the activations induced by the excitation operation in the SE_5_3 module of SE-ResNet-50 on ImageNet, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 3 is a graph of Top-1 validation accuracy on ImageNet-1K provided by an embodiment of the present invention;

FIG. 4A shows the activations induced by the excitation operation in the SE_2_3 module of SE-ResNet-50 trained on ImageNet with the feature enhancement loss function, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 4B shows the activations induced by the excitation operation in the SE_3_4 module of SE-ResNet-50 trained on ImageNet with the feature enhancement loss function, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 4C shows the activations induced by the excitation operation in the SE_4_6 module of SE-ResNet-50 trained on ImageNet with the feature enhancement loss function, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID';

FIG. 4D shows the activations induced by the excitation operation in the SE_5_3 module of SE-ResNet-50 trained on ImageNet with the feature enhancement loss function, provided by an embodiment of the present invention; modules are named 'SE_stageID_blockID'.

Detailed Description

The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In view of the problems existing in the prior art, the present invention provides a feature enhancement loss method and system for object detection and image classification, and the present invention is described in detail below with reference to the accompanying drawings.

Term interpretation: object detection is one of the most fundamental and challenging tasks in computer vision; its purpose is to detect and locate specific objects in an image. Image classification is an important research direction in computer vision; its purpose is to assign input images to different predefined categories.

As shown in fig. 1, the feature enhancement loss method for object detection and image classification provided by the embodiment of the invention includes the following steps:

S101, adding a channel attention mechanism into a backbone network;

S102, calculating, for each channel attention module, the mean of the squared differences between each attention weight and the module's maximum attention weight, and then calculating the total feature enhancement loss;

S103, adding the feature enhancement loss function to the loss function of the model and optimizing it during training so that the model learns more important features;

S104, preparing a data set, setting parameter values, and starting training a model.

In the feature enhancement loss method for object detection and image classification provided by the embodiment of the invention, the channel attention module in SENet is analyzed first: given input features, the SE block first applies global average pooling independently to each channel, then passes the result through two fully connected layers with a nonlinearity, and finally generates the attention weight of each channel with a Sigmoid function. Analysis of the attention weights generated by the SE blocks shows that the attention weights of the channels are highly dispersed (see FIGS. 2A-2F), which indicates that the features learned by the trained network differ greatly in importance.

As a preferred embodiment, the feature enhancement loss method for object detection and image classification provided by the embodiment of the present invention specifically includes the following steps:

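Step 1: First, a channel attention mechanism is added to the backbone network. The channel attention mechanism explicitly models the interdependencies between channels and thereby adaptively recalibrates the channel-wise feature responses. Let the output of a convolution block be $X \in \mathbb{R}^{W \times H \times C}$, where W, H and C are the width, height and channel dimension (i.e., the number of filters). The channel attention weight s is calculated as shown in equation (1):

$$s = F_{se}(X, \theta) = \sigma\left(W_2\,\delta\left(W_1\,\mathrm{GAP}(X)\right)\right) \tag{1}$$

$$\tilde{X} = s \cdot X \tag{2}$$

where $\mathrm{GAP}(X) = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H} X_{ij}$ is channel-wise global average pooling, δ denotes the ReLU function, σ denotes the Sigmoid function, and $W_1$ and $W_2$ are the weights of the two fully connected layers.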

Step 2: Calculate the mean of the squared differences between each attention weight of a layer and the maximum attention weight of that layer to obtain the layer's feature enhancement loss, as shown in formula (3):

$$s^2 = \frac{1}{c}\sum_{i=1}^{c}\left(s_{\max} - s_i\right)^2 \tag{3}$$

where $s_{\max}$ is the largest of the layer's attention weights, $s_i$ is the weight of the i-th channel in the layer's attention weights, and c is the total number of channels of the layer's features.
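As a worked example of formula (3) with hypothetical weights, take a layer with c = 4 channels and attention weights s = (0.9, 0.5, 0.3, 0.9), so that s_max = 0.9:

```latex
s^2 = \frac{1}{4}\left[(0.9-0.9)^2 + (0.9-0.5)^2 + (0.9-0.3)^2 + (0.9-0.9)^2\right]
    = \frac{1}{4}\left(0 + 0.16 + 0.36 + 0\right) = 0.13
```

If all four weights were equal, every term would vanish and s² would be 0; the value therefore grows with the spread of the weights below the maximum, which is the dispersion the loss is meant to reduce.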

Step 3: Add the feature enhancement loss function to the loss function of the model and optimize it during training so that the model learns more important features.

Wherein the total loss function is shown in formula (4):

$$\mathrm{LossFunction} = \alpha L_{origin} + (1 - \alpha) L_{feature\text{-}augmentation} \tag{4}$$

In equation (4), $L_{origin}$ is the original loss function, $L_{feature\text{-}augmentation}$ is the feature enhancement loss function, and α is a balance factor that balances the two. In formula (5), which computes $L_{feature\text{-}augmentation}$ from the values $s_j^2$ of the added channel attention modules, n is the total number of added channel attention modules and $s_j^2$ is the value of $s^2$ in the j-th channel attention module.
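For illustration only, with a hypothetical balance factor α = 0.9, an original loss L_origin = 2.0 and a feature enhancement loss L_feature-augmentation = 0.13, equation (4) gives:

```latex
\mathrm{LossFunction} = 0.9 \times 2.0 + (1 - 0.9) \times 0.13 = 1.8 + 0.013 = 1.813
```

A larger α keeps training closer to the original objective, while a smaller α gives the dispersion penalty more weight; the value of α used in the experiments is not reported in this section.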

Step 4: a data set is prepared, parameter values are set, and training of the model is started.

The feature enhancement loss system for object detection and image classification provided by the embodiment of the invention comprises the following components:

The feature enhancement loss calculation module is used to calculate the total feature enhancement loss from the mean of the squared differences between each attention weight of a single channel attention module and that layer's maximum attention weight;

The feature enhancement loss optimization module is used to add the feature enhancement loss function to the loss function of the model and to optimize it during training so that the model learns more important features.

A main application of the embodiment of the invention is an end-to-end detector for plant diseases and insect pests, built as follows:

(1) Data collection: acquire sufficient image data of plant diseases and insect pests and annotate it.

(2) Network model construction: build a classification model with ResNet as the backbone network, add channel attention modules to the ResNet network, calculate the feature enhancement loss of each channel attention module, obtain the total feature enhancement loss, and add it to the original loss.

(3) Model training: feed the annotated plant disease and pest images into the classification model for training, and select the best trained model according to the relevant metrics.

(4) Model migration: deploy the trained model to a mobile terminal for detection.

The embodiment of the invention offers clear advantages in development and use, which are described below with reference to the data and charts obtained during testing.

1. Details of implementation

ResNet-50, ResNet-101 and ResNet-152 were used as backbone models to evaluate the effectiveness of the feature enhancement loss function on ImageNet classification. When ResNet is used as the backbone network, all models were trained with the same optimization scheme, including random 224×224 cropping and horizontal flipping of the input images. The network parameters were optimized with stochastic gradient descent (SGD) with a weight decay of 5e-4 and a momentum of 0.9. All models were trained for 200 epochs with an initial learning rate of 0.1, which was reduced by a factor of 10 every 60 epochs; the implementation uses PyTorch.
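The classification learning-rate schedule described above is a step decay. A minimal sketch in plain Python follows; in PyTorch the same settings would typically be expressed with torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4) and torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1) stepped once per epoch, although the authors' actual training script is not given. The function name is hypothetical.

```python
def classification_lr(epoch: int, base_lr: float = 0.1, step_size: int = 60,
                      gamma: float = 0.1) -> float:
    """Learning rate for a 0-based epoch: divide by 10 every `step_size` epochs."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * gamma ** (epoch // step_size)


# Epochs 0-59 use 0.1, 60-119 use 0.01, 120-179 use 0.001, 180-199 use 0.0001.
schedule = [classification_lr(e) for e in range(200)]
```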

Faster R-CNN with ResNet-50 and ResNet-101 backbones was evaluated on the MS COCO 2017 dataset, and all detectors were implemented with the PyTorch toolkit. Specifically, for the MS COCO dataset, the shorter side of each input image was set to 800, all models were optimized with a gradient descent algorithm with a weight decay of 1e-4 and a momentum of 0.9, and the learning rate was initially set to 0.01 and reduced by a factor of 10 after 8 and 11 epochs, respectively.
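Assuming the decay points 8 and 11 are counted in epochs, the detection schedule is a multi-step decay. A plain-Python sketch follows; PyTorch's torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1) implements the same rule. The total number of detection training epochs is not stated in the text, so the sketch does not fix it. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def detection_lr(epoch: int, base_lr: float = 0.01,
                 milestones: Sequence[int] = (8, 11), gamma: float = 0.1) -> float:
    """Learning rate for a 0-based epoch: divide by 10 at each milestone already reached."""
    # bisect_right counts how many milestones are <= epoch.
    return base_lr * gamma ** bisect_right(milestones, epoch)
```

Epochs 0 to 7 run at 0.01, epochs 8 to 10 at 0.001, and epoch 11 onward at 0.0001.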

The channel attention module is an SE module, and the specific implementation steps are as follows:

(1) Add channel attention modules to the different backbone networks;

(2) Calculate the s² value of each channel attention module;

(3) Add the feature enhancement loss to the model;

(4) Train the model.

2. Evaluation index

Top-1 accuracy and Top-5 accuracy are used as metrics for the image classification task. For Top-1 accuracy, the category with the highest predicted probability is taken as the prediction; the prediction is correct if that category matches the true label and incorrect otherwise. For Top-5 accuracy, the five categories with the highest predicted probabilities are taken; the prediction is correct if any of the five matches the true label and incorrect if none does.
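The Top-1 and Top-5 rules can be sketched as a single top-k accuracy function over per-class scores. The toy example uses only three classes, so it shows k = 1 and k = 2 rather than k = 5; the function name is hypothetical, and ties are broken in favor of the lower class index.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("need one score vector per label and at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)


scores = [[0.1, 0.7, 0.2],   # predicts class 1
          [0.5, 0.3, 0.2],   # predicts class 0, class 1 is second
          [0.2, 0.3, 0.5]]   # predicts class 2, class 0 is last
labels = [1, 1, 0]
top1 = topk_accuracy(scores, labels, k=1)   # 1 of 3 correct
top2 = topk_accuracy(scores, labels, k=2)   # 2 of 3 correct
```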

AP50, AP75 and AP are used as evaluation metrics for object detection. AP50 is the average precision obtained when a detected bounding box is counted as correct if its intersection over union (IoU) with a ground-truth bounding box is at least 0.5; AP75 is calculated in the same way with the IoU threshold raised to 0.75; and AP is the mean of the average precisions over multiple IoU thresholds (0.50 to 0.95 in steps of 0.05, following the MS COCO convention).
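The IoU test that decides whether a detection counts as correct can be sketched as below, with boxes given as (x1, y1, x2, y2) corner coordinates (an assumed convention). This covers only the matching criterion; computing AP itself also requires ranking detections by confidence and averaging precision over recall levels and categories, which in practice is usually delegated to an evaluation tool such as pycocotools' COCOeval rather than written by hand.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1

# IoU thresholds averaged by the COCO-style AP metric: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


pred, truth = (0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 5.0)
overlap = iou(pred, truth)                                # 50 / 100 = 0.5
counts_for_ap50 = overlap >= 0.5                          # True
counts_for_ap75 = overlap >= 0.75                         # False
thresholds_passed = sum(overlap >= t for t in COCO_IOU_THRESHOLDS)  # 1 of 10
```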

3. Image classification

The effectiveness of the feature enhancement loss function was evaluated on the ImageNet dataset using ResNet backbones of different depths; the results are shown in Table 1. With ResNet-50 as the backbone network, three classification models were trained: ResNet-50, SE-ResNet-50, and SE-ResNet-50 with the feature enhancement loss function added. As can be seen from FIG. 3, the method of the invention improves Top-1 accuracy by 2.07% over the original method and exceeds the 76.74% Top-1 accuracy of the deeper ResNet-101 by 0.43%. With ResNet-101 as the backbone network, the network trained with the feature enhancement loss function achieved a Top-1 accuracy of 78.03%. This is higher than the 76.74% and 77.51% achieved by the classification models trained with ResNet-101 and SE-ResNet-101, respectively, and the complexity of the network model does not increase. Likewise, with ResNet-152 as the backbone network, the network trained with the feature enhancement loss function achieved a Top-1 accuracy of 78.82%. This is 1.34% and 0.48% higher than the classification models trained with ResNet-152 and SE-ResNet-152, respectively, again without increasing the complexity of the network model. Across the different backbone networks tested, the method of the present invention outperforms the other methods. Because it only adds a penalty term to the original loss function, the complexity of the model remains unchanged. The Top-1 accuracy curve of the model during training is shown in FIG. 3.

Table 1 test results on ImageNet dataset

4. Target detection

The feature enhancement loss function proposed by the present invention was evaluated with the Faster R-CNN object detection algorithm, using ResNet backbone networks of different depths.

The Faster R-CNN object detection model was used with ResNet-50 and ResNet-101 as backbone models, and the detection results were evaluated on the MS COCO 2017 dataset. As shown in Table 2, adding the SE module markedly improves the detection accuracy of the original model. It raises the average precision (AP) of ResNet-50 from 37.1% to 38.6% (a 1.5% improvement) and that of ResNet-101 from 39.9% to 41.1% (a 1.2% improvement). Both the AP50 and AP75 metrics also improve. The models trained with the feature enhancement loss function improve AP by 2.1% and 1.7% over the original ResNet-50 and ResNet-101 models, respectively. These experimental results show that training with the feature enhancement loss function clearly improves the detection accuracy of the object detector.

Table 2 Detection results on the COCO 2017 dataset

Attention weight spread comparison on MS COCO 2017 dataset

The attention weights of four SE modules were sampled: SE_2_3 (256 channels, sampling interval 5), SE_3_4 (512 channels, sampling interval 10), SE_4_6 (1024 channels, sampling interval 15) and SE_5_3 (2048 channels, sampling interval 20). The s² values of their attention weights were then calculated. FIGS. 4A-4D show that s² for the network trained with the feature enhancement loss function is smaller than for the SENet network. This indicates that the network trained with the feature enhancement loss function learns more important features.
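The channel sampling used for this comparison can be sketched as follows. Several details are assumptions. Sampling is taken to start at channel 0 and keep every interval-th channel. The module names are read as ResNet positions: for example, SE_2_3 is treated as the SE block in the third block of stage 2. That reading matches the 3/4/6/3 blocks and the 256/512/1024/2048 output channels of ResNet-50 stages, but the text does not state it. The attention weights are stand-ins.

```python
# Channel counts and sampling intervals of the four SE modules compared in FIGS. 4A-4D.
SE_SAMPLING = {
    "SE_2_3": (256, 5),
    "SE_3_4": (512, 10),
    "SE_4_6": (1024, 15),
    "SE_5_3": (2048, 20),
}


def sample_channels(weights, interval):
    """Keep every `interval`-th channel's attention weight, starting at channel 0 (assumed)."""
    return weights[::interval]


for name, (channels, interval) in SE_SAMPLING.items():
    stand_in = [c / channels for c in range(channels)]  # hypothetical attention weights
    print(name, len(sample_channels(stand_in, interval)))  # 52, 52, 69, 103 samples
```

Under these assumptions, each module contributes roughly 50 to 100 sampled weights. This keeps the four plots readable while covering the full channel range.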

It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry, such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., as well as software executed by various types of processors, or by a combination of the above hardware circuitry and software, such as firmware.

The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto. Any modifications, equivalent replacements, improvements and alternatives made by those skilled in the art within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A feature enhancement loss method for object detection and image classification, characterized in that the feature enhancement loss method for object detection and image classification comprises: analyzing the channel attention module in SENet and feeding input features to the channel attention module in SENet; applying global average pooling to each channel in the SE block, followed by two fully connected layers with a nonlinearity; generating the attention weight of each channel with the Sigmoid function; and using the attention weights generated by the SE module to realize the feature enhancement loss for object detection and image classification;

the feature enhancement loss method for object detection and image classification comprises the following steps:

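Step one, adding a channel attention mechanism into a backbone network;

step two, calculating, for each channel attention module, the average value of the sum of squares of the differences between all the attention weights and the maximum attention weight, and further calculating the total feature enhancement loss;

step three, adding the feature enhancement loss function into the loss function of the model and optimizing it during training, so that the model learns more important features;

step four, preparing a data set, setting parameter values and starting to train the model;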

in the first step, a channel attention mechanism is added into the backbone network; the channel attention mechanism explicitly models the interdependencies between channels, so that the channel-wise feature responses are adaptively recalibrated.

2. The feature enhancement loss method for object detection and image classification according to claim 1, wherein the output of the convolution block is set to be $X \in \mathbb{R}^{W \times H \times C}$, where W, H and C are the width, height and channel dimension, the channel dimension being the number of filters; the channel attention weight s is then calculated as:

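$$s = F_{se}(X, \theta) = \sigma\left(W_2\,\delta\left(W_1\,\mathrm{GAP}(X)\right)\right)$$

$$\tilde{X} = s \cdot X$$

where $\mathrm{GAP}$ denotes global average pooling, $W_1$ and $W_2$ are the weights of the two fully connected layers (together forming $\theta$), $\delta$ is the ReLU nonlinearity, $\sigma$ is the Sigmoid function, and $\tilde{X}$ is the recalibrated feature obtained by scaling each channel of $X$ by its attention weight.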
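The SE computation of claim 2 can be illustrated with a small pure-Python sketch. Several details are assumptions. X is stored channel-first as C lists of H x W values, whereas the claim writes W x H x C. Both fully connected layers are bias-free, matching the formula. δ is taken to be ReLU. The reduction ratio r and all weight values are hypothetical. A PyTorch version would typically use `nn.AdaptiveAvgPool2d(1)` followed by two `nn.Linear` layers.

```python
import math


def global_average_pool(x):
    """GAP(X): x is a list of C channels, each an H x W list of rows; returns C means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def matvec(w, v):
    """Bias-free fully connected layer: w is a (rows x len(v)) list of lists."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def se_attention(x, w1, w2):
    """s = sigma(W2 * delta(W1 * GAP(X))), with delta = ReLU and sigma = Sigmoid."""
    z = global_average_pool(x)                                  # squeeze: one value per channel
    h = [max(0.0, a) for a in matvec(w1, z)]                    # FC (C -> C/r) + ReLU
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, h)]  # FC (C/r -> C) + Sigmoid


def se_recalibrate(x, s):
    """X~ = s . X: scale every value of channel c by its attention weight s[c]."""
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, x)]


# Toy input: C = 2 channels of size 2 x 2, hypothetical reduction ratio r = 2 (C/r = 1).
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]    # hypothetical first FC layer (1 x 2)
w2 = [[0.0], [1.0]]  # hypothetical second FC layer (2 x 1)
s = se_attention(x, w1, w2)
print(s)             # s[0] = sigmoid(0) = 0.5, s[1] = sigmoid(2)
x_tilde = se_recalibrate(x, s)
```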

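3. The feature enhancement loss method for object detection and image classification according to claim 1, wherein in step two, the feature enhancement loss of a single channel attention module is calculated as:

$$L_{fe} = \frac{1}{C}\sum_{i=1}^{C}\left(s_{\max} - s_i\right)^2$$

where $s_i$ is the attention weight of the $i$-th channel, $s_{\max}$ is the maximum attention weight of the module, and $C$ is the number of channels.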

4. The feature enhancement loss method for object detection and image classification of claim 1, wherein in step three, the total loss function expression is:

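$$\mathrm{LossFunction} = \alpha L_{origin} + (1-\alpha) L_{feature\text{-}augmentation}$$

where $L_{origin}$ is the original loss function of the model, $L_{feature\text{-}augmentation}$ is the total feature enhancement loss, and $\alpha$ is a weighting coefficient.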
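The weighted combination in claim 4 amounts to one line of arithmetic. The sketch below is hedged. It assumes α is a mixing weight in [0, 1], which the claims do not state explicitly. The α values and loss values in the example are hypothetical, since no value of α is given here. The function name is also hypothetical.

```python
def total_training_loss(l_origin, l_feature_augmentation, alpha):
    """LossFunction = alpha * L_origin + (1 - alpha) * L_feature-augmentation."""
    # ASSUMPTION: alpha lies in [0, 1]; the claims give neither its range nor its value.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha should lie in [0, 1]")
    return alpha * l_origin + (1.0 - alpha) * l_feature_augmentation


# Hypothetical values: original task loss 2.0, feature enhancement loss 0.5.
print(total_training_loss(2.0, 0.5, alpha=0.9))  # about 1.85
print(total_training_loss(2.0, 0.5, alpha=1.0))  # 2.0: the extra term is switched off
```

Because the extra term only reads the attention weights that the SE modules already produce, it adds a small amount of computation during training and none at inference. This is consistent with the text's claim that model complexity is unchanged.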

5. A feature enhancement loss system for object detection and image classification applying the feature enhancement loss method for object detection and image classification as claimed in any one of claims 1 to 4, characterized in that the feature enhancement loss system for object detection and image classification comprises:

The feature enhancement loss calculation module is used for calculating the total feature enhancement loss by calculating the average value of the sum of squares of the differences between all the attention weights and the maximum value of the attention weights of the single channel attention module;

The feature enhancement loss optimization module is used for adding a feature enhancement loss function into the loss function of the model, and the model learns more important features by optimizing the feature enhancement loss function in the training process.

6. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the feature enhancement loss method for object detection and image classification as claimed in any one of claims 1 to 4.

7. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the feature enhancement loss method for object detection and image classification as claimed in any one of claims 1 to 4.

8. An information data processing terminal for implementing the feature enhancement loss system for object detection and image classification as set forth in claim 5.