CN113536896A - Small target detection method, device and storage medium based on improved Faster RCNN - Google Patents

Small target detection method, device and storage medium based on improved Faster RCNN

Info

Publication number
CN113536896A
Authority
CN
China
Prior art keywords
frame
prediction
rcnn
feature map
fast rcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110593538.8A
Other languages
Chinese (zh)
Other versions
CN113536896B (en
Inventor
李乾
张明
余志强
孙晓云
刘保安
韩广
郑海清
戎士敏
药炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Shijiazhuang Tiedao University
Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd
Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Shijiazhuang Tiedao University
Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd
Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Shijiazhuang Tiedao University, Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd, Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202110593538.8A priority Critical patent/CN113536896B/en
Publication of CN113536896A publication Critical patent/CN113536896A/en
Application granted granted Critical
Publication of CN113536896B publication Critical patent/CN113536896B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a small target detection method based on an improved Faster RCNN, implemented by a processor executing instructions of the improved Faster RCNN algorithm, and comprising the following steps: receiving a scene picture containing a small target and extracting a first feature map F1; obtaining a prediction anchor frame a(x, y, w, h) from the first feature map F1; obtaining, from the first feature map F1 and the prediction anchor frame a(x, y, w, h), a second feature map F2 with the same size as the first feature map F1; and obtaining the detection result of the scene picture from the second feature map F2 and the prediction anchor frame a(x, y, w, h). The invention modifies the framework of the Faster RCNN algorithm, replacing the RPN network with an adaptive anchor frame network so that the generated anchor frames better match targets of different scales, thereby avoiding missed detections caused by unreasonable anchor frame sizes and improving the detection accuracy.

Description

Small target detection method, device and storage medium based on improved Faster RCNN
Technical Field
The invention relates to the field of target recognition, and in particular to a small target detection method based on an improved Faster RCNN. The invention also relates to a small target detection device and a storage medium based on the improved Faster RCNN.
Background
Object detection, also called object extraction, is image segmentation based on target geometry and statistical features. It combines segmentation and recognition of the target into one process, and its accuracy and real-time performance are important capabilities of the whole system.
Target detection is a hot direction in computer vision and digital image processing. It is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection and aerospace; by reducing the consumption of human labor through computer vision, it has important practical significance. Therefore, target detection has become a research hotspot of both theory and application in recent years. The rapid development of machine learning, especially deep learning, provides the possibility of achieving low-cost, high-efficiency target detection.
The currently excellent deep learning models can be roughly divided into two types. The first type comprises two-stage target detection algorithms, such as R-CNN, SPP-Net and Fast-RCNN, which first extract candidate regions of the target image (e.g. via a region proposal network, RPN) and then use a detection network to predict the position and identify the class of the target in each candidate frame. The second type comprises one-stage target detection algorithms, such as SSD and YOLO, which do not need an RPN network but directly perform target prediction and class identification on the image.
However, in real applications, the detection effect on small targets is often far worse than on large and medium targets. This stems from two problems of small target detection. First, the amount of information is deficient: the target occupies a very small proportion of the image, so the information reflected by the pixels in the corresponding region is very limited. Second, the amount of data is scarce: data sets contain few images of small targets, so the classes of the whole training set are unbalanced, and the detection accuracy for small objects is far lower than for medium and large objects. At present, the following methods address the problem of poor small target detection:
First, image data amplification: the image is enlarged so that the small targets it contains are enlarged. However, this method is simple and crude, complex to operate, computationally too expensive, and of limited practical significance.
Second, using a GAN model to enlarge and then detect the small targets; this follows the same idea as image data amplification but likewise suffers from complex operation.
Third, modifying model training parameters, e.g. setting the stride parameter to 1; but the effect of this method is also mediocre.
CN 111985540A discloses a method for improving the small target detection rate based on oversampling faster-RCNN, relating to the field of target recognition and comprising the following steps: step 1: acquire a target picture data set and divide it into a training set and a test set; step 2: obtain a subset of the training set from step 1 as an oversampling set; step 3: construct a faster-RCNN model; step 4: train the faster-RCNN model with the training set and the oversampling set; step 5: test the trained faster-RCNN model with the test set; if the test result is below the average precision (AP) threshold, modify the parameters and repeat step 4 until the AP threshold is reached; step 6: input a picture to be detected and detect the small targets with the trained faster-RCNN model. That invention improves the detection rate of small targets by oversampling them.
CN 111986160A discloses a method for improving the small target detection effect based on faster-RCNN, belonging to the field of target detection. It comprises: acquire a data set and divide it into a training set and a test set in a corresponding proportion; construct a faster-RCNN model; train the model with the training set, and during training, if the loss value of small targets reaches a preset condition at the n-th iteration, shrink several pictures in the (n + 1)-th iteration, splice them to the size of the original image, and continue training; after training, test the model with the test set to obtain an AP value; if the AP value is smaller than a set threshold, modify the corresponding parameters and retrain until the small target AP value reaches the threshold; finally, detect small targets with the trained model. That invention makes the distribution of small targets more uniform, improving the sufficiency of small target training and thus the detection precision for small targets.
CN 111898668A discloses a small target object detection method based on deep learning, which addresses the insufficient efficiency and low accuracy of existing small target detection methods. First, images without small target objects are extracted from the COCO data set, resized and spliced; the spliced images and the COCO images containing small target objects form a new data set, which is divided into a training set and a test set at a ratio of 4 : 1. Then the basic feature extraction network of Faster-RCNN is modified to perform feature fusion; candidate regions are selected from each level of the fused features through an RPN; the improved network is trained with the training set to obtain a trained model; finally, the test set is input into the trained model for target detection.
CN 111368769A provides a ship multi-target detection method based on an improved anchor frame generation model, comprising: acquire SAR ship images; construct a low-complexity network architecture and feed the images into it to generate a feature mapping space; generate initial anchor frames by a clustering method based on shape similarity; on the basis of the generated initial anchor frames, generate new candidate frames in the low-complexity feature space with a sliding-window mechanism, and perform regression training on the candidate frames for ship multi-target detection. That invention addresses the low algorithm efficiency and detection quality caused by complex networks and poor candidate frame quality, and achieves better accuracy; because a low-complexity architecture is used, from a statistical point of view, the more data acquired and the more detections performed, the better the detection effect.
The target detection methods of the above patent applications all belong to two-stage target detection algorithms based on Faster-RCNN; they improve small target detection precision by oversampling small targets, improving the sufficiency of small target training, improving the model or its training process, or generating new candidate frames in a low-complexity feature space with a sliding-window mechanism. The main steps of Faster-RCNN are as follows: first, convolution layers extract feature maps of the input picture; second, the RPN network generates region proposals; third, the RoI Pooling layer extracts proposal feature maps from the feature maps and proposals; finally, the classification layer computes the class of each proposal from the proposal feature maps and performs a final bounding box regression to obtain the accurate position of the detection frame. However, if the proposals are set to unreasonable sizes, missed detections easily occur, especially for small-scale targets when the sizes of two targets differ greatly, which reduces the detection accuracy.
Disclosure of Invention
The invention aims to provide a small target detection method based on an improved Faster RCNN, which can generate anchor frames that better match the target size, reduce the missed detection rate for small-scale targets, and improve the detection accuracy.
The technical scheme provided by the invention is a small target detection method based on an improved Faster RCNN, implemented by a processor executing instructions of the improved Faster RCNN algorithm, and comprising the following steps: receiving a scene picture containing a small target; extracting a first feature map F1 of the scene picture with a first convolution module of the Faster RCNN; using a second convolution module of the Faster RCNN to obtain, from the first feature map F1, the center position a(x, y) and the size a(w, h) of a prediction anchor frame, and obtaining the prediction anchor frame a(x, y, w, h) from that center position and size; using a third convolution module of the Faster RCNN to obtain, from the first feature map F1 and the prediction anchor frame a(x, y, w, h), a second feature map F2 with the same size as the first feature map F1; and using a fourth convolution module of the Faster RCNN to obtain the detection result of the scene picture from the second feature map F2 and the prediction anchor frame a(x, y, w, h).
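For illustration only (not part of the claimed method), the data flow through the four convolution modules described above can be sketched in Python; the callables m1 to m4 below are hypothetical stubs standing in for the actual networks:

```python
def detect(scene_picture, m1, m2, m3, m4):
    F1 = m1(scene_picture)  # first convolution module: extract first feature map F1
    a = m2(F1)              # second module: prediction anchor frame a(x, y, w, h)
    F2 = m3(F1, a)          # third module: second feature map F2, same size as F1
    return m4(F2, a)        # fourth module: detection result of the scene picture

# Toy stand-ins for the four convolution modules, just to exercise the data flow.
m1 = lambda img: [v * 2 for v in img]
m2 = lambda F1: (1, 2, 3, 4)
m3 = lambda F1, a: F1
m4 = lambda F2, a: {"anchor": a, "features": F2}
result = detect([1, 2], m1, m2, m3, m4)
```

The point of the sketch is the dependency structure: the anchor frame is produced before, and consumed together with, the second feature map.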
Further, the main part of the first convolution module adopts a convolution structure of ResNet.
Further, the main part of the first convolution module adopts the convolution structure of ResNet50.
Further, the convolution structure of the ResNet50 includes a multi-layer Deform ResNet50 residual block structure, in which the second convolution layer of each Deform ResNet50 residual block is replaced with a depthwise separable convolution layer.
Further, the first convolution module includes a channel attention mechanism module configured to obtain a feature weight Sc from the scene picture; the second convolution module is configured to obtain, from the first feature map F1 and the feature weight Sc, the center position a(x, y) of the prediction anchor frame, and to obtain the prediction anchor frame a(x, y, w, h) from the center position a(x, y).
Further, the trunk portion of the second convolution module adopts an adaptive anchor frame network structure.
Further, the adaptive anchor frame network obtains a score feature map FP from the first feature map F1, then obtains the center position a(x, y) of the prediction anchor frame from the score feature map FP, and obtains the prediction anchor frame a(x, y, w, h) from the center position a(x, y) and the size a(w, h) of the prediction anchor frame.
Further, the adaptive anchor frame network comprises an adaptive adjustment module configured to obtain an adaptive prediction anchor frame a'(x, y, w, h) from the prediction anchor frame a(x, y, w, h) and the first feature map F1; the third convolution module is configured to obtain, from the first feature map F1 and the adaptive prediction anchor frame a'(x, y, w, h), the second feature map F2 with the same size as the first feature map F1.
Meanwhile, the invention also provides a small target detection device based on the improved Faster RCNN, comprising:
a processor; and
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the program instructions to implement the above small target detection method based on the improved Faster RCNN.
In addition, the invention also provides a computer storage medium for the small target detection method based on the improved Faster RCNN, wherein when the instructions in the computer storage medium are executed by the processor of a small target detection device based on the improved Faster RCNN, the device is enabled to execute the above small target detection method based on the improved Faster RCNN.
The invention mainly comprises four parts: the first part is the process of extracting features of the scene picture; the second part is the process of obtaining the shape of the prediction anchor frame from its center point and finally generating a prediction anchor frame a(x, y, w, h); the third part is the process of generating a feature map with the same size as the anchor frame a(x, y, w, h); the fourth part is the process of obtaining the detection result for the scene picture.
The invention has the beneficial effects that:
1. By modifying the framework of the Faster RCNN algorithm, the invention replaces the RPN network with an adaptive anchor frame network, so that the generated anchor frames better match targets of different scales, thereby avoiding missed detections caused by unreasonably sized anchor frames and ultimately improving the detection accuracy.
2. The invention adopts an adaptive anchor frame network consisting mainly of two branches, shape prediction and position prediction; anchor frames are generated by selecting the positions whose predicted probability exceeds a certain threshold and the most probable shape at each selected position. Through this improvement, anchor frames of reasonable size are obtained, missed detections are avoided, and the network's detection capability and overall detection performance are effectively improved. When the method is applied to insulator detection, comparison with existing insulator detection algorithms shows a greatly improved detection accuracy.
3. When the method is applied to insulator detection, comparison with existing insulator detection algorithms shows that using depthwise separable convolution instead of conventional convolution reduces the number of network parameters and greatly improves the detection speed.
4. In the feature extraction network, the residual network is improved and a channel attention mechanism structure is added, so that the channels of the feature map are connected to each other and the important feature information within the channels is enhanced; this both benefits the subsequent small target detection in the scene picture and improves the accuracy of the detection results.
Drawings
Fig. 1 is a schematic structural diagram of a Deform ResNet50 residual block in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the principle of depthwise separable convolution in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the adaptive anchor frame network structure in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
Fig. 4 is a network training flowchart of the improved Faster RCNN in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
FIG. 5 is a network framework diagram of the improved Faster RCNN in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
fig. 6 is a flowchart of the detection of insulator defects in embodiment 1 of the present invention;
fig. 7 is a diagram showing a defect detection result of a normal insulator in embodiment 2 of the present invention;
fig. 8 is a diagram showing a defect detection result of a defective insulator in embodiment 2 of the present invention.
Detailed Description
Insulators are important components of overhead transmission lines, used to support and fix busbars and live conductors and to ensure that the live conductors keep sufficient distance and insulation from the ground. Because overhead transmission lines are exposed to the natural environment for long periods and are affected by natural or man-made factors, problems such as line aging and damage arise; without regular inspection and maintenance, they may cause serious safety accidents. As known to those skilled in the art, insulator defects include erosion, cracking, breakage, exposed core rods, etc., and the size difference between an insulator defect and the insulator itself is large. Therefore, the missed detection rate in insulator defect detection is relatively high.
The technical solution and the corresponding technical effects provided by the present invention will be described in detail below with reference to the accompanying drawings and taking insulator detection as an example.
Example 1
This embodiment provides a small target detection method based on improved Faster RCNN, which extracts features of the scene picture, generates anchor frames based on the anchor frame center positions, and finally obtains the detection result. For ease of understanding, this embodiment describes the specific process through the following steps 100 to 600; the step numbers do not imply a temporal order in an actual implementation, and implementations that change the order, provided the preconditions of each step are met, are also embodiments of the present invention.
Step 100: preprocess the original power transmission line image sample pictures used for learning, i.e. the scene pictures.
In this embodiment, during preprocessing each sample picture is adjusted to the same size and data enhancement is performed. In an exemplary implementation, each sample picture is resized to 900 × 600 by bilinear interpolation, and the data set is expanded during training by data enhancement methods such as rotation, cropping and contrast adjustment; the specific parameters of the data enhancement are shown in Table 1. This step yields a scene picture data set containing small targets. Specifically, in this embodiment, after the sample picture is expanded into its three RGB channel components, one scene picture is expressed as one 900 × 600 × 3 tensor.
TABLE 1 data enhancement mode
(Table 1 is reproduced as an image in the original publication.)
The obtained data set is manually labeled with LabelImg software; labels named Insulator and defect are set for the insulator and its defects respectively, and the annotations are saved in VOC format for detection network training. 80% of the scene picture data set is randomly selected as the training set for optimizing the network model, and 20% as the test set for evaluating the model effect.
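For illustration, the 80/20 random split described above can be sketched as follows (the file names and seed are hypothetical, not taken from the patent):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Randomly split the scene picture data set into training and test sets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

pictures = [f"scene_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
train_set, test_set = split_dataset(pictures)          # 80 / 20 pictures
```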
Step 200: construct the improved Faster RCNN (shown in FIG. 5), composed of a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a channel attention mechanism module. In this embodiment, the more specific structure and operation of the Faster RCNN used in the invention can be derived from the following description of its working principle. The neural network of this embodiment includes 4 main parts: the first part is the convolution layers, the second part is the adaptive anchor frame network, the third part is the ROI Pooling layer, and the fourth part is the classification and regression layer. The details are as follows.
The first part: convolution layers (conv layers), which extract the features of the picture; their input is a preprocessed tensor (224 × 224 × 3) and their output is the extracted features, simply called the first feature map F1. In one implementation of this part, the convolution structure of ResNet is used to perform feature extraction on the scene picture, obtaining the first feature map F1. Preferably, the main part of the first convolution module adopts the convolution structure of ResNet50, i.e. the tensor (224 × 224 × 3) is input to the feature extraction layers of ResNet50. The ResNet50 network is composed of several residual blocks; its structure is shown in Table 2.
TABLE 2 Resnet50 network architecture
(Table 2 is reproduced as an image in the original publication.)
First, the input image passes through a 7 × 7 convolution with 64 output dimensions and a kernel stride of 2, and is then downsampled by max pooling with a 3 × 3 kernel and stride 2. Finally it passes through a series of residual blocks, global average pooling and a 1000-dimensional fully connected layer, whose output is fed to a softmax classifier for classification. The backbone structure of ResNet50 is not modified and is not described further here. After feature extraction, the first feature map F1 is obtained.
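The spatial sizes produced by the stem described above can be checked with the standard convolution output-size formula; this is a sketch, and the padding values 3 and 1 are the usual ResNet choices, assumed here rather than stated in the patent:

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

after_conv = conv_out(224, 7, 2, 3)         # 7x7, stride-2 stem conv: 224 -> 112
after_pool = conv_out(after_conv, 3, 2, 1)  # 3x3, stride-2 max pool: 112 -> 56
```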
As a preferred feature of this embodiment, the residual block of the ResNet50 network is modified, forming a Deform_ResNet50 backbone feature extraction network. The convolution structure of the ResNet50 comprises multiple layers of Deform ResNet50 residual block structures; each residual block comprises three convolution layers (1x1 + 3x3 + 1x1), in which the middle 3x3 convolution operates between a dimension-reducing 1x1 layer and a dimension-restoring 1x1 layer, maintaining accuracy while reducing the amount of computation.
Exemplarily, one specific implementation of this embodiment is as follows: the second convolution layer (i.e. the 3x3 convolution layer) of the Deform ResNet50 residual block is replaced by a depthwise separable convolution layer; the construction of the Deform_ResNet50 residual block is shown in FIG. 1. A depthwise separable convolution consists of a channel-wise convolution (Depthwise Convolution) and a point-wise convolution (Pointwise Convolution). First, C corresponding feature maps are generated from the input C-channel image; the generated feature maps are then recombined into a new feature map by point-by-point convolution. The kernel of the point-by-point convolution has size 1x1xM, where M is the number of channels of the previous layer. A schematic of the depthwise separable convolution is shown in fig. 2. For the same input, the number of parameters of the depthwise separable convolution is greatly reduced, and the computational complexity falls accordingly.
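The parameter saving can be made concrete by counting weights (bias terms omitted); the 64-channel figures below are an illustrative example, not values taken from the patent:

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise k x k conv (one filter per input channel)
    plus a 1 x 1 x M pointwise conv, with M = c_in channels from the
    previous layer, as described above."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 64)                   # 3*3*64*64 = 36864 weights
separable = depthwise_separable_params(3, 64, 64)   # 576 + 4096 = 4672 weights
```

For a 3x3 kernel this is roughly a 1/M + 1/9 fraction of the standard cost, here about an 8x reduction.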
Exemplarily, another specific implementation of this embodiment is as follows: the first neural network includes a channel attention mechanism module in the first convolution module. Specifically, in this embodiment, a channel attention mechanism module is added to the branch portion of the residual block (shown as the right branch in fig. 1); that is, the channel attention mechanism module is invoked when the tensor (112 × 112 × 3) is input to the feature extraction layer of ResNet50. An image of size W × H × C is input to the detection network, where W, H, C are the width, height and number of channels of the image. First, global average pooling (Global pooling) is applied to the C channels of the image to reduce dimensionality, converting them into a 1 × 1 × C vector; one fully connected layer (FC) reduces the channel dimension of the vector, a ReLU activation follows, another FC layer restores the original dimension, and finally a Sigmoid activation yields the feature weights. Then the input feature map is multiplied by the feature weights:
Fscale(u_c, s_c) = u_c × s_c
where u_c is the input feature map and s_c is the weight obtained by the excitation operation.
Illustratively, the entire channel attention mechanism module can be expressed by the following equations:

s = σ(Q2 · δ(Q1 · z))

where the pooled vector z is obtained per channel by global average pooling:

z_c = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} u_c(i, j)

δ is the ReLU activation function, σ is the Sigmoid function, and Q1, Q2 are the weight matrices of the two fully connected layers.
The second part: the adaptive anchor frame network.
Based on the probability formula P(x, y, w, h | F) = P(x, y | F) × P(w, h | x, y, F), it can be seen that prediction frames at different center positions (x, y) occur with different probabilities, denoted P(x, y | F); once the center position (x, y) is determined, prediction frames of different sizes occur with different probabilities, denoted P(w, h | x, y, F). This shows that an anchor frame is determined by both position and shape-size factors. Therefore, the adaptive anchor frame network of this embodiment has two branches: a position prediction module and a shape prediction module.
Position prediction module: the first feature map F1 obtained by the first convolution module of the first neural network can be described by four parameters (x, y, w, h), where (x, y) are the coordinates of the center point of the prediction anchor frame and (w, h) are the width and height of its shape. The first feature map F1 is passed through a 1x1 convolution and a Sigmoid activation to generate a score feature map Fp of the same scale, in which each point represents the score that an object is present at that pixel. Each score is then compared with a score threshold τ, which was experimentally found to be 0.6. If the value of a point in Fp exceeds τ, the corresponding pixel is taken as the center point of a target and recorded as the center position a(x, y) of the anchor frame.
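A small sketch of the center-selection step, assuming F_p is a 2-D score map that has already passed through the Sigmoid (the toy values are illustrative):

```python
import numpy as np

def predict_centers(score_map, tau=0.6):
    """Return (x, y) coordinates of pixels whose objectness score exceeds tau."""
    ys, xs = np.nonzero(score_map > tau)   # row-major: row index = y, column = x
    return list(zip(xs.tolist(), ys.tolist()))

# Toy 2x2 score map standing in for F_p.
fp = np.array([[0.1, 0.7],
               [0.9, 0.2]])
centers = predict_centers(fp)
```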
A shape prediction module: the center position a(x, y) of the anchor frame a(x, y, w, h) is determined by position prediction; the shape of the anchor frame, i.e. a(w, h), is then determined using the nearest real frame B' corresponding to that center position. Since it is difficult to obtain the value of a(w, h) directly by regression, w and h are approximated by sampling; in the present invention the ratio w:h takes the three values 0.5, 1.0 and 2.0. The real frame B'(x', y', w', h') having the maximum intersection-over-union (IoU) with a is found among the real frames B, and the a(w, h) at this point is the result output by the shape prediction branch. Since the range of these values is wide and unstable, the width and height of the prediction frame are parameterized as W = μ·S·e^{dw}, H = μ·S·e^{dh}, where S is the step size and μ is an empirical factor (generally 8). This maps the range of the parameters to be learned from roughly [1, 1000] to [-1, 1], which simplifies network training. Finally, the anchor frame a(x, y, w, h) is obtained from the center position a(x, y) output by the position prediction branch and the size a(w, h) output by the shape prediction branch.
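The width/height parameterization W = μ·S·e^{dw}, H = μ·S·e^{dh} can be illustrated with a small helper. The stride value 16 below is an assumption for illustration; the patent only states that S is the step size and that μ is generally 8:

```python
import math

def decode_shape(dw, dh, stride=16, mu=8.0):
    """Decode the shape-branch outputs into anchor width/height (sketch).

    The network regresses (dw, dh) in roughly [-1, 1]; the actual size is
    recovered as W = mu * S * exp(dw), H = mu * S * exp(dh), where S is the
    feature-map step size and mu is the empirical factor (8 in the text).
    """
    return mu * stride * math.exp(dw), mu * stride * math.exp(dh)
```

Because the exponential maps [-1, 1] back to a wide size range, the branch only ever has to learn small, well-scaled values.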
As a more preferred implementation of this embodiment, the adaptive anchor frame network includes an adaptive adjustment module configured to obtain an adaptive anchor frame a'(x, y, w, h) from the anchor frame a(x, y, w, h) and the first feature map F1 by feature adaptation, the specific operation being:
f_i' = D(f_i, W_i, H_i)
where f_i is the feature value at the position of the i-th generated anchor frame on the input first feature map F1, and D(·) consists of a 3×3 deformable convolution with the offset field described in FIG. 3. f_i' is the adjusted feature value, i.e. the feature at the i-th position processed together with the width and height. The adaptive anchor frame a'(x, y, w, h) is obtained from f_i'. The network structure of the adaptive anchor frame is shown in FIG. 3: the input is the first feature map F1 and the output is the prediction anchor frame a(x, y, w, h). In FIG. 3, W × H × 1 and W × H × 2 are both derived from the first feature map F1, where W and H correspond to the width and height of the first feature map F1, and 1 and 2 are the numbers of channels. As shown in FIG. 3, the first feature map F1 first passes through position prediction to obtain the anchor frame center position a(x, y); second, shape prediction yields the predicted anchor frame a(x, y, w, h); finally, adaptive adjustment of the features yields the adaptive anchor frame a'(x, y, w, h).
Third part: ROI Pooling. This layer collects the input first feature map F1 and the prediction anchor frames a(x, y, w, h) and, after integrating this information, obtains from the first feature map F1 a second feature map F2 of uniform size; the second feature map F2 is fixed in size and is then sent to the subsequent fully connected layer. Preferably, the layer collects the input first feature map F1 and the adaptive anchor frame a'(x, y, w, h) and, after integrating this information, obtains the second feature map F2.
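A minimal, dependency-free sketch of what the ROI Pooling layer does: crop the anchor region from the feature map and max-pool it into a fixed 7×7 grid. Real implementations use spatial bins with interpolation; this is only illustrative:

```python
import numpy as np

def roi_pooling(F1, box, out_size=7):
    """ROI pooling sketch: max-pool an anchor crop to a fixed grid.

    F1  : feature map of shape (H, W, C)
    box : (x0, y0, x1, y1) region in feature-map coordinates
    Returns an array of shape (out_size, out_size, C).
    """
    x0, y0, x1, y1 = box
    crop = F1[y0:y1, x0:x1]
    H, W = crop.shape[:2]
    out = np.zeros((out_size, out_size) + crop.shape[2:], dtype=F1.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # bin boundaries; each bin covers at least one cell
            r0, r1 = i * H // out_size, max((i + 1) * H // out_size, i * H // out_size + 1)
            c0, c1 = j * W // out_size, max((j + 1) * W // out_size, j * W // out_size + 1)
            out[i, j] = crop[r0:r1, c0:c1].max(axis=(0, 1))
    return out
```

Whatever the size of the anchor region, the output is always 7×7 per channel, which is what allows the subsequent fully connected layers to have a fixed input dimension.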
Fourth part: classification and regression. The input of this layer is the second feature map F2 and the output is the bounding box of the target and the confidence of the defect class. The classification probability and the bounding box regression are jointly trained with Softmax Loss and Smooth L1 Loss to obtain the bounding box and defect class confidence of the target (as shown in FIG. 6).
Step 300, configuring a loss function and training a convolutional neural network.
In this embodiment, the whole training process of the neural network is end-to-end, and the loss function consists of an anchor positioning loss function L_loc, an anchor shape prediction loss function L_shape, a target classification loss function L_cls and a regression loss function L_reg:

L_loss = λ1·L_loc + λ2·L_shape + L_cls + L_reg

where λ1 = 1 and λ2 = 0.1.
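The weighted combination of the four terms can be sketched directly:

```python
def total_loss(l_loc, l_shape, l_cls, l_reg, lam1=1.0, lam2=0.1):
    """Weighted total loss: L = lam1*L_loc + lam2*L_shape + L_cls + L_reg,
    with lam1 = 1 and lam2 = 0.1 as in this embodiment."""
    return lam1 * l_loc + lam2 * l_shape + l_cls + l_reg
```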
First, the positioning loss function L_loc. This loss controls the number of anchor frames so that more anchor frames are placed at target centers and fewer at non-center coordinates, preventing an imbalance between the positive and negative samples among the generated anchor frames; the positions it retains are the anchor frame center positions. For an easily classified sample the prediction score y is close to its label, so the corresponding weight in the formula becomes small and fewer anchor frames are placed at that point. Since the number of generated anchor frames is huge and anchor frames containing negative samples account for the majority, the position branch is trained with Focal Loss to balance the positive and negative samples contained in the anchors. The formula of Focal Loss is as follows:
L_FL = -α · (1 - y)^γ · y' · log(y) - (1 - α) · y^γ · (1 - y') · log(1 - y)
where y is the predicted score that the anchor frame location is a sample center, and y' is the pre-labeled actual label of the location center, set to 1 for a positive sample and 0 for a negative sample. The value of α controls the weight of positive versus negative anchor frames; typically α ∈ (0, 0.5), and it is set here to 0.25. γ is an attention parameter: when γ = 0 the loss function reduces to the traditional cross entropy loss, so γ ≥ 0 is required, and γ = 2 is used. The modulation coefficient (1 - y)^γ controls the relative weight of easily and hardly classified samples.
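A sketch of the Focal Loss as described above, with α = 0.25 and γ = 2 taken from the text:

```python
import numpy as np

def focal_loss(y, y_true, alpha=0.25, gamma=2.0):
    """Focal loss for the anchor localization branch (illustrative sketch).

    y      : predicted center score in (0, 1)
    y_true : 1 for a positive (center) sample, 0 for a negative sample
    alpha balances positive/negative anchors; the modulation factor with
    exponent gamma down-weights easily classified samples.
    """
    y = np.clip(y, 1e-7, 1 - 1e-7)  # numerical safety for the logs
    pos = -alpha * (1 - y) ** gamma * y_true * np.log(y)
    neg = -(1 - alpha) * y ** gamma * (1 - y_true) * np.log(1 - y)
    return pos + neg
```

With γ = 0 and α = 0.5 this reduces (up to a constant factor) to the ordinary binary cross entropy, matching the statement in the text.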
For position prediction, each real frame is first divided into three types of regions: a central region, an ignored region, and a negative sample region. The three regions are defined from the position information (x0, y0, w0, h0) obtained by mapping the real frame onto the corresponding feature map. The central region is

CR = R(x0, y0, σ1·w0, σ1·h0)

and the ignored region is

IR = R(x0, y0, σ2·w0, σ2·h0) \ CR

The remaining boundary region is defined as the negative sample region, where σ1 and σ2 (with σ1 < σ2) are scale factors used to adjust the number of anchor frames generated, typically σ1 = 0.2 and σ2 = 0.5. The center positions of the anchor frames are determined accordingly.
Second, the shape prediction loss function L_shape.
The center position (x, y) of the anchor frame a(x, y, w, h) is determined by position prediction; the shape of the anchor frame, i.e. a(w, h), is then determined using the nearest real frame B' corresponding to that center position. Since it is difficult to obtain the value of a(w, h) by regression, the sampling approximation of w and h is adopted, with w:h taking the three values 0.5, 1.0 and 2.0. The real frame B'(x', y', w', h') with the maximum intersection-over-union (IoU) with a is found among the real frames B, giving the matched a(w, h). The shape prediction loss is:
L_shape = L1(1 - min(w/w', w'/w)) + L1(1 - min(h/h', h'/h))

where L1 is the smoothing function:

L1(x) = 0.5·x²,    |x| ≤ 1
L1(x) = |x| - 0.5, |x| > 1
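A sketch of the shape loss, where (wg, hg) denote the width and height of the matched real frame B':

```python
import numpy as np

def smooth_l1(x):
    """L1 smoothing function: 0.5*x^2 for |x| <= 1, |x| - 0.5 otherwise."""
    x = np.abs(x)
    return np.where(x <= 1.0, 0.5 * x ** 2, x - 0.5)

def shape_loss(w, h, wg, hg):
    """Shape prediction loss (illustrative sketch): penalizes the gap between
    the predicted (w, h) and the matched ground-truth shape (wg, hg), using
    the symmetric ratio min(w/wg, wg/w) so over- and under-sizing are treated
    alike."""
    return smooth_l1(1.0 - min(w / wg, wg / w)) + smooth_l1(1.0 - min(h / hg, hg / h))
```

The loss is zero exactly when the predicted shape matches the ground truth, and grows smoothly as the ratio departs from 1.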
third, for classification loss function
The classification part uses a binary cross entropy loss over the anchors:

L_cls = -(1/N) · Σ_i [ p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i) ]

where p_i is the probability that the i-th anchor frame (anchor) is predicted as a target, and p_i* is the corresponding ground-truth label: p_i* = 1 when the anchor is a positive sample, and p_i* = 0 when it is background.
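A sketch of the binary cross entropy classification loss; averaging over the anchors is the normalization assumed here:

```python
import numpy as np

def classification_loss(p, p_star):
    """Binary cross entropy over anchors (illustrative sketch).

    p      : array of predicted target probabilities, one per anchor
    p_star : array of labels, 1 if the anchor is a positive sample else 0
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)  # numerical safety for the logs
    return float(-np.mean(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)))
```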
fourth, for the regression loss function Lreg
Regression loss function LregAs described by the following formula:
Figure BDA0003090140040000117
wherein, ti={tx,ty,tw,thAnd 4 parameters of the anchor point frame are represented, namely the center position coordinate, the width and the height of the anchor point frame.
Figure BDA0003090140040000118
Is the 4 coordinate parameters of the grountruth corresponding to positiveanchor. R is smoothL1 function:
Figure BDA0003090140040000119
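A sketch of the regression loss; normalizing by the number of positive anchors is the convention assumed here:

```python
import numpy as np

def regression_loss(t, t_star, p_star):
    """Bounding-box regression loss (illustrative sketch): smooth L1 over the
    4 box parameters (x, y, w, h), counted only for positive anchors
    (p_star = 1) and averaged over the number of positives.

    t, t_star : arrays of shape (N, 4) with predicted and ground-truth params
    p_star    : array of shape (N,) with anchor labels (1 positive, 0 negative)
    """
    diff = np.abs(np.asarray(t, dtype=float) - np.asarray(t_star, dtype=float))
    r = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)  # smooth L1, R(x)
    p_star = np.asarray(p_star, dtype=float)
    return float(np.sum(p_star[:, None] * r) / max(p_star.sum(), 1.0))
```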
in this embodiment, the whole training process of the neural network is end-to-end training, and the first feature map F is obtained by the first convolution module of the first neural network1And obtaining a prediction result of the center position of the anchor point frame firstly, a prediction result of the center size of the anchor point frame secondly and a prediction result of the anchor point frame finally in the first convolution module of the second neural network, and converting the scene picture into a target boundary frame and a defect type confidence coefficient.
Step 400, training the model.
In the training process (as shown in FIG. 4), gradient descent is used to optimize the back-propagation stage. The training batch size is set to 16, the momentum value is 0.9, the weight decay follows an exponential schedule, the learning rate is set to 0.004, and the parameter num_classes is set to 3 (representing the insulator, the defect and the background). A warm-up training strategy is adopted. The number of epochs is set to 40000; a model is saved every 3000 epochs along with the final model, and the model with the lowest loss is finally selected for detection.
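A sketch of a warm-up plus exponential-decay learning-rate schedule consistent with the settings above. The base learning rate 0.004 comes from the text; the warm-up length, decay rate and decay interval are illustrative assumptions, not values from the patent:

```python
def learning_rate(step, base_lr=0.004, warmup_steps=500, decay_rate=0.95, decay_steps=3000):
    """Warm-up + exponential decay schedule (illustrative sketch).

    During warm-up the rate ramps linearly from ~0 to base_lr; afterwards it
    decays by decay_rate every decay_steps steps.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warm-up
    return base_lr * decay_rate ** ((step - warmup_steps) / decay_steps)
```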
Step 500, model application.
After the training process, a plurality of models are obtained and the optimal model (the one with the minimum loss function value) is selected for application. At this stage the image data does not need augmentation; the image only needs to be resized to 900 × 600 and normalized before being used as the model input. The parameters of the whole network model are fixed, so the image data simply propagates forward: the first feature map F1, the prediction anchor frame a(x, y, w, h) and the second feature map F2 are obtained in turn, and the detection result is produced directly by the whole model. When a large number of original transmission-line images need to be tested, all images can be integrated into one data file; for example, when the RGB values of all images are stored in a data table, an lmdb-format file can be used so that all images are conveniently read at once.
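The inference-time preprocessing (resize to 900 × 600, then normalize) can be sketched as follows. Nearest-neighbor resizing and division by 255 are illustrative choices to keep the sketch dependency-free; a real pipeline would typically use bilinear resizing:

```python
import numpy as np

def preprocess(image, target_w=900, target_h=600):
    """Resize an HxWx3 uint8 image to 900x600 and normalize to [0, 1]
    (illustrative sketch using nearest-neighbor index mapping)."""
    h, w = image.shape[:2]
    row_idx = (np.arange(target_h) * h // target_h).clip(0, h - 1)
    col_idx = (np.arange(target_w) * w // target_w).clip(0, w - 1)
    resized = image[row_idx[:, None], col_idx[None, :], :]  # (600, 900, 3)
    return resized.astype(np.float32) / 255.0
```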
In this embodiment, ResNet50 with a channel attention mechanism in the improved Faster RCNN network architecture is used as the feature extraction network to extract the features of the scene picture; the adaptive anchor frame network in the improved Faster RCNN architecture is used as the prediction anchor frame network to predict and generate the target anchors; and the ROI Pooling layer and the classification and regression layer in the improved Faster RCNN architecture finally output the class confidence and the class bounding box.
Step 600, model verification.
To verify the effectiveness of this embodiment, the Aut-Faster RCNN detection method and the Faster RCNN, Fast RCNN, SSD, YOLO v3 and YOLO v4 methods were trained on the insulator dataset; the comparison of running speed and mAP is shown in Table 3.
TABLE 3 comparison of the six test methods on the insulator dataset
According to the experimental analysis, the average precision (mAP) of the Aut-Faster RCNN algorithm of this embodiment is 93.67%, higher than that of the current mainstream detection networks.
Embodiments of the present invention also include a computer-readable storage medium storing program instructions implementing the method of the invention and/or model parameters obtained by training with the method of the invention.
Embodiments of the present invention also include a small target detection device based on the improved Faster RCNN, comprising a memory and a processor; the memory stores the model parameters obtained by training in the method embodiment, and the processor reads the program instructions implementing the algorithm structure of the improved Faster RCNN and performs small target detection according to the model parameters.
Example 2
Two scene pictures containing the insulator are selected as detection objects, and detection is carried out by using the detection method provided by the embodiment 1 of the invention. The detection process is as follows.
Example 2.1
(1) A scene picture containing insulators is resized to 900 × 600, giving a tensor of 900 × 600 × 3, and the adjusted scene picture is input into the improved Faster RCNN-based small target detection device of Example 1.
(2) The tensor (900 × 600 × 3) is convolved to obtain the first feature map F1 ∈ R^(37×50×512).
(3) The first feature map F1 ∈ R^(37×50×512) passes through the adaptive anchor frame network to obtain the adaptive anchor frame a'(-212, -419, 183, 359).
(4) The first feature map F1 ∈ R^(37×50×512) and a'(-212, -419, 183, 359) pass through the ROI Pooling layer to obtain the second feature map F2 ∈ R^(7×7×512).
(5) The second feature map F2 ∈ R^(7×7×512) and a'(-212, -419, 183, 359) pass through the classification and regression layers to obtain the classification classes and class confidences; the results are shown in FIG. 7.
Example 2.2
(1) Another scene picture containing a defective insulator is resized to 900 × 600, giving a tensor of 900 × 600 × 3, and the adjusted scene picture is input into the improved Faster RCNN-based small target detection device of Example 1.
(2) The tensor (900 × 600 × 3) is convolved to obtain the first feature map F1 ∈ R^(37×50×512).
(3) The first feature map F1 ∈ R^(37×50×512) passes through the adaptive anchor frame network to obtain the adaptive anchor frames a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135).
(4) The first feature map F1 ∈ R^(37×50×512) together with a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135) pass through the ROI Pooling layer to obtain the second feature map F2 ∈ R^(7×7×512).
(5) The second feature map F2 ∈ R^(7×7×512) together with a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135) pass through the classification and regression layers to obtain the classification categories and category confidences; the results are shown in FIG. 8.
It should be noted that, in the adaptive generation process of the anchor frame, not one but a plurality of anchor frames are generated in the corresponding ROI area, and each anchor frame finally undergoes a classification operation to distinguish whether the insulator is normal or damaged.
FIG. 7 shows the detection result for a normal insulator, and FIG. 8 shows the detection result for a defective insulator. As can be seen from the figures, FIG. 7 contains only the bounding box of one insulator target, with the insulator class confidence shown above it. FIG. 8 contains two bounding boxes: one is the bounding box of the insulator target, with the insulator class confidence shown above it, and the other is the bounding box of the defective insulator target, with the defective-insulator class confidence shown above it.
It should be noted that, in order to better distinguish the insulator target from the defective insulator target in this embodiment, FIGS. 5, 7 and 8 were filed synchronously with this patent application in the form of other certification documents. FIG. 5 corresponds to certification material 1, FIG. 7 to certification material 2, and FIG. 8 to certification material 3; each pair is identical except for color.
The invention is not to be considered as limited to the particular embodiments shown, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A small target detection method based on improved Faster RCNN, implemented by a processor executing improved Faster RCNN algorithm instructions, characterized by comprising the following steps: receiving a scene picture containing a small target; extracting a first feature map F1 of the scene picture using the first convolution module of the Faster RCNN; obtaining, from the first feature map F1 using the second convolution module of the Faster RCNN, a center position a(x, y) and a size a(w, h) of a prediction anchor frame, and obtaining the prediction anchor frame a(x, y, w, h) from the center position a(x, y) and the size a(w, h) of the prediction anchor frame; obtaining, from the first feature map F1 and the prediction anchor frame a(x, y, w, h) using the third convolution module of the Faster RCNN, a second feature map F2 of the same size as the first feature map F1; and obtaining the detection result of the scene picture from the second feature map F2 and the prediction anchor frame a(x, y, w, h) using the fourth convolution module of the Faster RCNN.
2. The improved Faster RCNN-based small target detection method according to claim 1, wherein the trunk portion of the first convolution module adopts the convolution structure of ResNet.
3. The improved Faster RCNN-based small target detection method according to claim 2, wherein the trunk portion of the first convolution module adopts the convolution structure of ResNet50.
4. The improved Faster RCNN-based small target detection method according to claim 3, wherein the convolution structure of ResNet50 includes a multi-layer Deform ResNet50 residual block structure, in which the second convolutional layer of the Deform ResNet50 residual block is replaced with a depthwise separable convolutional layer.
5. The improved Faster RCNN-based small target detection method according to claim 1, wherein the first convolution module comprises a channel attention mechanism module configured to obtain feature weights Sc from the scene picture; and the second convolution module is configured to obtain the center position a(x, y) of the prediction anchor frame from the first feature map F1 and the feature weights Sc, and to obtain the prediction anchor frame a(x, y, w, h) from the center position a(x, y) of the prediction anchor frame.
6. The improved Faster RCNN-based small target detection method according to any one of claims 1-5, wherein the trunk portion of the second convolution module adopts an adaptive anchor frame network structure.
7. The improved Faster RCNN-based small target detection method according to claim 6, wherein the adaptive anchor frame network obtains a score feature map Fp from the first feature map F1, then obtains the center position a(x, y) of the prediction anchor frame from the score feature map Fp, and obtains the prediction anchor frame a(x, y, w, h) from the center position a(x, y) and the size a(w, h) of the prediction anchor frame.
8. The improved Faster RCNN-based small target detection method according to claim 7, wherein: the adaptive anchor frame network comprises an adaptive adjustment module configured to obtain an adaptive prediction anchor frame a'(x, y, w, h) from the prediction anchor frame a(x, y, w, h) and the first feature map F1; and the third convolution module is configured to obtain, from the first feature map F1 and the adaptive prediction anchor frame a'(x, y, w, h), a second feature map F2 of the same size as the first feature map F1.
9. A small target detection device based on improved Faster RCNN, comprising:
a processor; and
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute program instructions of the algorithm structure to implement the small target detection method of any one of claims 1 to 8.
10. A computer storage medium for an improved Faster RCNN-based small target detection method, characterized in that: when the instructions in the computer storage medium are executed by a processor of an improved Faster RCNN-based small target detection apparatus, they enable the apparatus to perform the small target detection method of any one of claims 1 to 8.
CN202110593538.8A 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium Active CN113536896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110593538.8A CN113536896B (en) 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110593538.8A CN113536896B (en) 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium

Publications (2)

Publication Number Publication Date
CN113536896A true CN113536896A (en) 2021-10-22
CN113536896B CN113536896B (en) 2022-07-08

Family

ID=78094881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110593538.8A Active CN113536896B (en) 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium

Country Status (1)

Country Link
CN (1) CN113536896B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022705A (en) * 2021-10-29 2022-02-08 电子科技大学 Adaptive target detection method based on scene complexity pre-classification
CN114998576A (en) * 2022-08-08 2022-09-02 广东电网有限责任公司佛山供电局 Method, device, equipment and medium for detecting loss of cotter pin of power transmission line

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution
CN111783819A (en) * 2020-05-08 2020-10-16 国家电网有限公司 Improved target detection method based on region-of-interest training on small-scale data set
CN112101430A (en) * 2020-08-28 2020-12-18 电子科技大学 Anchor frame generation method for image target detection processing and lightweight target detection method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution
CN111783819A (en) * 2020-05-08 2020-10-16 国家电网有限公司 Improved target detection method based on region-of-interest training on small-scale data set
CN112101430A (en) * 2020-08-28 2020-12-18 电子科技大学 Anchor frame generation method for image target detection processing and lightweight target detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU, Lei et al.: "Defect Detection Method for Medical Plastic Bottle Manufacturing Based on ResNet Network", Computer and Modernization *
LUO, Hui et al.: "Block Damage Detection Method for Rail Tread Based on Improved Faster R-CNN", Journal of Computer Applications *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022705A (en) * 2021-10-29 2022-02-08 电子科技大学 Adaptive target detection method based on scene complexity pre-classification
CN114022705B (en) * 2021-10-29 2023-08-04 电子科技大学 Self-adaptive target detection method based on scene complexity pre-classification
CN114998576A (en) * 2022-08-08 2022-09-02 广东电网有限责任公司佛山供电局 Method, device, equipment and medium for detecting loss of cotter pin of power transmission line
CN114998576B (en) * 2022-08-08 2022-12-30 广东电网有限责任公司佛山供电局 Method, device, equipment and medium for detecting loss of cotter pin of power transmission line

Also Published As

Publication number Publication date
CN113536896B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109859190B (en) Target area detection method based on deep learning
CN109800631B (en) Fluorescence coding microsphere image detection method based on mask region convolution neural network
CN110135503B (en) Deep learning identification method for parts of assembly robot
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN111640125A (en) Mask R-CNN-based aerial photograph building detection and segmentation method and device
CN110569782A (en) Target detection method based on deep learning
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN113536896B (en) Insulator defect detection method and device based on improved Faster RCNN and storage medium
CN110135430A (en) A kind of aluminium mold ID automatic recognition system based on deep neural network
CN112926652B (en) Fish fine granularity image recognition method based on deep learning
CN113420643B (en) Lightweight underwater target detection method based on depth separable cavity convolution
CN114663346A (en) Strip steel surface defect detection method based on improved YOLOv5 network
CN111199523A (en) Power equipment identification method and device, computer equipment and storage medium
CN111680705B (en) MB-SSD method and MB-SSD feature extraction network suitable for target detection
CN116645592B (en) Crack detection method based on image processing and storage medium
CN113628211B (en) Parameter prediction recommendation method, device and computer readable storage medium
CN111476226B (en) Text positioning method and device and model training method
CN112270404A (en) Detection structure and method for bulge defect of fastener product based on ResNet64 network
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN114283431B (en) Text detection method based on differentiable binarization
CN116597275A (en) High-speed moving target recognition method based on data enhancement
CN116189160A (en) Infrared dim target detection method based on local contrast mechanism
CN114219757A (en) Vehicle intelligent loss assessment method based on improved Mask R-CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant