CN113536896A - Small target detection method, device and storage medium based on improved Faster RCNN - Google Patents

Small target detection method, device and storage medium based on improved Faster RCNN

Info

Publication number
CN113536896A
Authority
CN
China
Prior art keywords
frame
prediction
rcnn
feature map
fast rcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110593538.8A
Other languages
Chinese (zh)
Other versions
CN113536896B (en
Inventor
李乾
张明
余志强
孙晓云
刘保安
韩广
郑海清
戎士敏
药炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Shijiazhuang Tiedao University
Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd
Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Shijiazhuang Tiedao University
Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd
Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Shijiazhuang Tiedao University, Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd, Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202110593538.8A priority Critical patent/CN113536896B/en
Publication of CN113536896A publication Critical patent/CN113536896A/en
Application granted granted Critical
Publication of CN113536896B publication Critical patent/CN113536896B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a small target detection method based on an improved Faster RCNN, implemented by a processor executing instructions of the improved Faster RCNN algorithm, and comprising the following steps: receiving a scene picture containing a small target and extracting a first feature map F1; obtaining a prediction anchor frame a(x, y, w, h) from the first feature map F1; obtaining, from the first feature map F1 and the prediction anchor frame a(x, y, w, h), a second feature map F2 with the same size as the first feature map F1; and obtaining the detection result of the scene picture from the second feature map F2 and the prediction anchor frame a(x, y, w, h). The invention modifies the framework of the Faster RCNN algorithm, replacing the RPN network with an adaptive anchor frame network so that the generated anchor frames better match targets of different scales, thereby avoiding missed detections caused by unreasonable anchor frame sizes and improving the detection accuracy.

Description

Small target detection method, device and storage medium based on improved Faster RCNN
Technical Field
The invention relates to the field of target recognition, and in particular to a small target detection method based on an improved Faster RCNN. The invention also relates to a small target detection device and a storage medium based on the improved Faster RCNN.
Background
Object detection, also called object extraction, is image segmentation based on target geometry and statistical features. It combines segmentation and recognition of the target into one process, and its accuracy and real-time performance are important capabilities of the whole system.
Target detection is a hot direction in computer vision and digital image processing. It is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection and aerospace; by reducing the consumption of human labor through computer vision, it has important practical significance. Therefore, target detection has become a research hotspot of both theory and application in recent years. The rapid development of machine learning, especially deep learning, provides the possibility of achieving low-cost, high-efficiency target detection.
The currently excellent deep learning models can be roughly divided into two types. The first type comprises two-stage target detection algorithms, such as R-CNN, SPP-Net and Fast-RCNN, which first extract candidate regions of the target image (e.g. via a region proposal network, RPN) and then use a detection network to predict the position and identify the class of the target in each candidate frame. The second type comprises one-stage target detection algorithms, such as SSD and YOLO, which do not need an RPN network but directly perform target prediction and class identification on the image.
However, in real applications, the detection effect on small targets is often far worse than on large and medium targets. This stems from two problems of small target detection. First, the amount of information is deficient: the target occupies a very small proportion of the image, so the information reflected by the pixels in the corresponding region is very limited. Second, the amount of data is scarce: data sets contain few images of small targets, so the classes of the whole training set are unbalanced, and the detection accuracy for small objects is far lower than for medium and large objects. At present, the following methods address the problem of poor small target detection:
First, image data amplification: the image is enlarged so that the small targets it contains are enlarged. However, this method is simple and crude, complex to operate, computationally too expensive, and of limited practical significance.
Second, using a GAN model to enlarge and then detect the small targets; this follows the same idea as image data amplification but likewise suffers from complex operation.
Third, modifying model training parameters, e.g. setting the stride parameter to 1; but the effect of this method is also mediocre.
CN 111985540A discloses a method for improving the small target detection rate based on oversampling faster-RCNN, relating to the field of target recognition and comprising the following steps: step 1: acquire a target picture data set and divide it into a training set and a test set; step 2: obtain a subset of the training set from step 1 as an oversampling set; step 3: construct a faster-RCNN model; step 4: train the faster-RCNN model with the training set and the oversampling set; step 5: test the trained faster-RCNN model with the test set; if the test result is below the average precision (AP) threshold, modify the parameters and repeat step 4 until the AP threshold is reached; step 6: input a picture to be detected and detect the small targets with the trained faster-RCNN model. That invention improves the detection rate of small targets by oversampling them.
CN 111986160A discloses a method for improving the small target detection effect based on faster-RCNN, belonging to the field of target detection. It comprises: acquire a data set and divide it into a training set and a test set in a corresponding proportion; construct a faster-RCNN model; train the model with the training set, and during training, if the loss value of small targets reaches a preset condition at the n-th iteration, shrink several pictures in the (n + 1)-th iteration, splice them to the size of the original image, and continue training; after training, test the model with the test set to obtain an AP value; if the AP value is smaller than a set threshold, modify the corresponding parameters and retrain until the small target AP value reaches the threshold; finally, detect small targets with the trained model. That invention makes the distribution of small targets more uniform, improving the sufficiency of small target training and thus the detection precision for small targets.
CN 111898668A discloses a small target object detection method based on deep learning, which addresses the insufficient efficiency and low accuracy of existing small target detection methods. First, images without small target objects are extracted from the COCO data set, resized and spliced; the spliced images and the COCO images containing small target objects form a new data set, which is divided into a training set and a test set at a ratio of 4 : 1. Then the basic feature extraction network of Faster-RCNN is modified to perform feature fusion; candidate regions are selected from each level of the fused features through an RPN; the improved network is trained with the training set to obtain a trained model; finally, the test set is input into the trained model for target detection.
CN 111368769A provides a ship multi-target detection method based on an improved anchor frame generation model, comprising: acquire SAR ship images; construct a low-complexity network architecture and feed the images into it to generate a feature mapping space; generate initial anchor frames by a clustering method based on shape similarity; on the basis of the generated initial anchor frames, generate new candidate frames in the low-complexity feature space with a sliding-window mechanism, and perform regression training on the candidate frames for ship multi-target detection. That invention addresses the low algorithm efficiency and detection quality caused by complex networks and poor candidate frame quality, and achieves better accuracy; because a low-complexity architecture is used, from a statistical point of view, the more data acquired and the more detections performed, the better the detection effect.
The target detection methods of the above patent applications all belong to two-stage target detection algorithms based on Faster-RCNN; they improve small target detection precision by oversampling small targets, improving the sufficiency of small target training, improving the model or its training process, or generating new candidate frames in a low-complexity feature space with a sliding-window mechanism. The main steps of Faster-RCNN are as follows: first, convolution layers extract feature maps of the input picture; second, the RPN network generates region proposals; third, the RoI Pooling layer extracts proposal feature maps from the feature maps and proposals; finally, the classification layer computes the class of each proposal from the proposal feature maps and performs a final bounding box regression to obtain the accurate position of the detection frame. However, if the proposals are set to unreasonable sizes, missed detections easily occur, especially for small-scale targets when the sizes of two targets differ greatly, which reduces the detection accuracy.
Disclosure of Invention
The invention aims to provide a small target detection method based on an improved Faster RCNN, which can generate anchor frames that better match the target size, reduce the missed detection rate for small-scale targets, and improve the detection accuracy.
The technical scheme provided by the invention is a small target detection method based on an improved Faster RCNN, implemented by a processor executing instructions of the improved Faster RCNN algorithm, and comprising the following steps: receiving a scene picture containing a small target; extracting a first feature map F1 of the scene picture with a first convolution module of the Faster RCNN; using a second convolution module of the Faster RCNN to obtain, from the first feature map F1, the center position a(x, y) and the size a(w, h) of a prediction anchor frame, and obtaining the prediction anchor frame a(x, y, w, h) from that center position and size; using a third convolution module of the Faster RCNN to obtain, from the first feature map F1 and the prediction anchor frame a(x, y, w, h), a second feature map F2 with the same size as the first feature map F1; and using a fourth convolution module of the Faster RCNN to obtain the detection result of the scene picture from the second feature map F2 and the prediction anchor frame a(x, y, w, h).
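For illustration only (not part of the claimed method), the data flow through the four convolution modules described above can be sketched in Python; the callables m1 to m4 below are hypothetical stubs standing in for the actual networks:

```python
def detect(scene_picture, m1, m2, m3, m4):
    F1 = m1(scene_picture)  # first convolution module: extract first feature map F1
    a = m2(F1)              # second module: prediction anchor frame a(x, y, w, h)
    F2 = m3(F1, a)          # third module: second feature map F2, same size as F1
    return m4(F2, a)        # fourth module: detection result of the scene picture

# Toy stand-ins for the four convolution modules, just to exercise the data flow.
m1 = lambda img: [v * 2 for v in img]
m2 = lambda F1: (1, 2, 3, 4)
m3 = lambda F1, a: F1
m4 = lambda F2, a: {"anchor": a, "features": F2}
result = detect([1, 2], m1, m2, m3, m4)
```

The point of the sketch is the dependency structure: the anchor frame is produced before, and consumed together with, the second feature map.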
Further, the main part of the first convolution module adopts a convolution structure of ResNet.
Further, the main part of the first convolution module adopts the convolution structure of ResNet50.
Further, the convolution structure of the ResNet50 includes a multi-layer Deform ResNet50 residual block structure, in which the second convolution layer of each Deform ResNet50 residual block is replaced with a depthwise separable convolution layer.
Further, the first convolution module includes a channel attention mechanism module configured to obtain a feature weight Sc from the scene picture; the second convolution module is configured to obtain, from the first feature map F1 and the feature weight Sc, the center position a(x, y) of the prediction anchor frame, and to obtain the prediction anchor frame a(x, y, w, h) from the center position a(x, y).
Further, the trunk portion of the second convolution module adopts an adaptive anchor frame network structure.
Further, the adaptive anchor frame network obtains a score feature map FP from the first feature map F1, then obtains the center position a(x, y) of the prediction anchor frame from the score feature map FP, and obtains the prediction anchor frame a(x, y, w, h) from the center position a(x, y) and the size a(w, h) of the prediction anchor frame.
Further, the adaptive anchor frame network comprises an adaptive adjustment module configured to obtain an adaptive prediction anchor frame a'(x, y, w, h) from the prediction anchor frame a(x, y, w, h) and the first feature map F1; the third convolution module is configured to obtain, from the first feature map F1 and the adaptive prediction anchor frame a'(x, y, w, h), the second feature map F2 with the same size as the first feature map F1.
Meanwhile, the invention also provides a small target detection device based on the improved Faster RCNN, comprising:
a processor; and
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the program instructions to implement the above small target detection method based on the improved Faster RCNN.
In addition, the invention also provides a computer storage medium for the small target detection method based on the improved Faster RCNN, wherein when the instructions in the computer storage medium are executed by the processor of a small target detection device based on the improved Faster RCNN, the device is enabled to execute the above small target detection method based on the improved Faster RCNN.
The invention mainly comprises four parts: the first part is the process of extracting features of the scene picture; the second part is the process of obtaining the shape of the prediction anchor frame from its center point and finally generating a prediction anchor frame a(x, y, w, h); the third part is the process of generating a feature map with the same size as the anchor frame a(x, y, w, h); the fourth part is the process of obtaining the detection result for the scene picture.
The invention has the beneficial effects that:
1. By modifying the framework of the Faster RCNN algorithm, the invention replaces the RPN network with an adaptive anchor frame network, so that the generated anchor frames better match targets of different scales, thereby avoiding missed detections caused by unreasonably sized anchor frames and ultimately improving the detection accuracy.
2. The invention adopts an adaptive anchor frame network consisting mainly of two branches, shape prediction and position prediction; anchor frames are generated by selecting the positions whose predicted probability exceeds a certain threshold and the most probable shape at each selected position. Through this improvement, anchor frames of reasonable size are obtained, missed detections are avoided, and the network's detection capability and overall detection performance are effectively improved. When the method is applied to insulator detection, comparison with existing insulator detection algorithms shows a greatly improved detection accuracy.
3. When the method is applied to insulator detection, comparison with existing insulator detection algorithms shows that using depthwise separable convolution instead of conventional convolution reduces the number of network parameters and greatly improves the detection speed.
4. In the feature extraction network, the residual network is improved and a channel attention mechanism structure is added, so that the channels of the feature map are connected to each other and the important feature information within the channels is enhanced; this both benefits the subsequent small target detection in the scene picture and improves the accuracy of the detection results.
Drawings
Fig. 1 is a schematic structural diagram of a Deform ResNet50 residual block in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the principle of depthwise separable convolution in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the adaptive anchor frame network structure in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
Fig. 4 is a network training flowchart of the improved Faster RCNN in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
FIG. 5 is a network framework diagram of the improved Faster RCNN in the small target detection method based on improved Faster RCNN in embodiment 1 of the present invention;
fig. 6 is a flowchart of the detection of insulator defects in embodiment 1 of the present invention;
fig. 7 is a diagram showing a defect detection result of a normal insulator in embodiment 2 of the present invention;
fig. 8 is a diagram showing a defect detection result of a defective insulator in embodiment 2 of the present invention.
Detailed Description
Insulators are important components of overhead transmission lines, used to support and fix busbars and live conductors and to ensure that the live conductors keep sufficient distance and insulation from the ground. Because overhead transmission lines are exposed to the natural environment for long periods and are affected by natural or man-made factors, problems such as line aging and damage arise; without regular inspection and maintenance, they may cause serious safety accidents. As known to those skilled in the art, insulator defects include erosion, cracking, breakage, exposed core rods, etc., and the size difference between an insulator defect and the insulator itself is large. Therefore, the missed detection rate in insulator defect detection is relatively high.
The technical solution and the corresponding technical effects provided by the present invention will be described in detail below with reference to the accompanying drawings and taking insulator detection as an example.
Example 1
This embodiment provides a small target detection method based on improved Faster RCNN, which extracts features of the scene picture, generates anchor frames based on the anchor frame center positions, and finally obtains the detection result. For ease of understanding, this embodiment describes the specific process through the following steps 100 to 600; the step numbers do not imply a temporal order in an actual implementation, and implementations that change the order, provided the preconditions of each step are met, are also embodiments of the present invention.
Step 100: preprocess the original power transmission line image sample pictures used for learning, i.e. the scene pictures.
In this embodiment, during preprocessing each sample picture is adjusted to the same size and data enhancement is performed. In an exemplary implementation, each sample picture is resized to 900 × 600 by bilinear interpolation, and the data set is expanded during training by data enhancement methods such as rotation, cropping and contrast adjustment; the specific parameters of the data enhancement are shown in Table 1. This step yields a scene picture data set containing small targets. Specifically, in this embodiment, after the sample picture is expanded into its three RGB channel components, one scene picture is expressed as one 900 × 600 × 3 tensor.
TABLE 1 data enhancement mode
(Table 1 is reproduced as an image in the original publication.)
The obtained data set is manually labeled with LabelImg software; labels named Insulator and defect are set for the insulator and its defects respectively, and the annotations are saved in VOC format for detection network training. 80% of the scene picture data set is randomly selected as the training set for optimizing the network model, and 20% as the test set for evaluating the model effect.
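For illustration, the 80/20 random split described above can be sketched as follows (the file names and seed are hypothetical, not taken from the patent):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Randomly split the scene picture data set into training and test sets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

pictures = [f"scene_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
train_set, test_set = split_dataset(pictures)          # 80 / 20 pictures
```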
Step 200: construct the improved Faster RCNN (shown in FIG. 5), composed of a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a channel attention mechanism module. In this embodiment, the more specific structure and operation of the Faster RCNN used in the invention can be derived from the following description of its working principle. The neural network of this embodiment includes 4 main parts: the first part is the convolution layers, the second part is the adaptive anchor frame network, the third part is the ROI Pooling layer, and the fourth part is the classification and regression layer. The details are as follows.
The first part: convolution layers (conv layers), which extract the features of the picture; their input is a preprocessed tensor (224 × 224 × 3) and their output is the extracted features, simply called the first feature map F1. In one implementation of this part, the convolution structure of ResNet is used to perform feature extraction on the scene picture, obtaining the first feature map F1. Preferably, the main part of the first convolution module adopts the convolution structure of ResNet50, i.e. the tensor (224 × 224 × 3) is input to the feature extraction layers of ResNet50. The ResNet50 network is composed of several residual blocks; its structure is shown in Table 2.
TABLE 2 Resnet50 network architecture
(Table 2 is reproduced as an image in the original publication.)
First, the input image passes through a 7 × 7 convolution with 64 output dimensions and a kernel stride of 2, and is then downsampled by max pooling with a 3 × 3 kernel and stride 2. Finally it passes through a series of residual blocks, global average pooling and a 1000-dimensional fully connected layer, whose output is fed to a softmax classifier for classification. The backbone structure of ResNet50 is not modified and is not described further here. After feature extraction, the first feature map F1 is obtained.
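The spatial sizes produced by the stem described above can be checked with the standard convolution output-size formula; this is a sketch, and the padding values 3 and 1 are the usual ResNet choices, assumed here rather than stated in the patent:

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

after_conv = conv_out(224, 7, 2, 3)         # 7x7, stride-2 stem conv: 224 -> 112
after_pool = conv_out(after_conv, 3, 2, 1)  # 3x3, stride-2 max pool: 112 -> 56
```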
As a preferred feature of this embodiment, the residual block of the ResNet50 network is modified, forming a Deform_ResNet50 backbone feature extraction network. The convolution structure of the ResNet50 comprises multiple layers of Deform ResNet50 residual block structures; each residual block comprises three convolution layers (1x1 + 3x3 + 1x1), in which the middle 3x3 convolution operates between a dimension-reducing 1x1 layer and a dimension-restoring 1x1 layer, maintaining accuracy while reducing the amount of computation.
Exemplarily, one specific implementation of this embodiment is as follows: the second convolution layer (i.e. the 3x3 convolution layer) of the Deform ResNet50 residual block is replaced by a depthwise separable convolution layer; the construction of the Deform_ResNet50 residual block is shown in FIG. 1. A depthwise separable convolution consists of a channel-wise convolution (Depthwise Convolution) and a point-wise convolution (Pointwise Convolution). First, C corresponding feature maps are generated from the input C-channel image; the generated feature maps are then recombined into a new feature map by point-by-point convolution. The kernel of the point-by-point convolution has size 1x1xM, where M is the number of channels of the previous layer. A schematic of the depthwise separable convolution is shown in fig. 2. For the same input, the number of parameters of the depthwise separable convolution is greatly reduced, and the computational complexity falls accordingly.
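The parameter saving can be made concrete by counting weights (bias terms omitted); the 64-channel figures below are an illustrative example, not values taken from the patent:

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise k x k conv (one filter per input channel)
    plus a 1 x 1 x M pointwise conv, with M = c_in channels from the
    previous layer, as described above."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 64)                   # 3*3*64*64 = 36864 weights
separable = depthwise_separable_params(3, 64, 64)   # 576 + 4096 = 4672 weights
```

For a 3x3 kernel this is roughly a 1/M + 1/9 fraction of the standard cost, here about an 8x reduction.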
Exemplarily, another specific implementation of this embodiment is as follows: the first neural network includes a channel attention mechanism module in the first convolution module. Specifically, in this embodiment, a channel attention mechanism module is added to the branch portion of the residual block (shown as the right branch in fig. 1); that is, the channel attention mechanism module is invoked when the tensor (112 × 112 × 3) is input to the feature extraction layer of ResNet50. An image of size W × H × C is input to the detection network, where W, H, C are the width, height and number of channels of the image. First, global average pooling (Global pooling) is applied to the C channels of the image to reduce dimensionality, converting them into a 1 × 1 × C vector; one fully connected layer (FC) reduces the channel dimension of the vector, a ReLU activation follows, another FC layer restores the original dimension, and finally a Sigmoid activation yields the feature weights. Then the input feature map is multiplied by the feature weights:
Fscale(u_c, s_c) = u_c × s_c
where u_c is the input feature map and s_c is the weight obtained by the excitation operation.
Illustratively, the entire channel attention mechanism module can be expressed by the following equations:

s = σ(Q2 · δ(Q1 · z))

where the pooled vector z is obtained per channel by global average pooling:

z_c = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} u_c(i, j)

δ is the ReLU activation function, σ is the Sigmoid function, and Q1, Q2 are the weight matrices of the two fully connected layers.
The second part: the adaptive anchor frame network.
Based on the probability formula P(x, y, w, h | F) = P(x, y | F) × P(w, h | x, y, F), it can be seen that prediction frames at different center positions (x, y) occur with different probabilities, denoted P(x, y | F); once the center position (x, y) is determined, prediction frames of different sizes occur with different probabilities, denoted P(w, h | x, y, F). This shows that an anchor frame is determined by both position and shape-size factors. Therefore, the adaptive anchor frame network of this embodiment has two branches: a position prediction module and a shape prediction module.
Position prediction module: the first feature map F1 obtained by the first convolution module of the first neural network can be described by four parameters (x, y, w, h), where (x, y) are the coordinates of the center point of the prediction anchor frame and (w, h) are the width and height of its shape. The first feature map F1 is passed through a 1x1 convolution and a Sigmoid activation to generate a score feature map Fp of the same scale, in which each point represents the score that an object is present at that pixel. Each score is then compared with a score threshold τ, which was experimentally found to be 0.6. If the value of a point in Fp exceeds τ, the corresponding pixel is taken as the center point of a target and recorded as the center position a(x, y) of the anchor frame.
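A small sketch of the center-selection step, assuming F_p is a 2-D score map that has already passed through the Sigmoid (the toy values are illustrative):

```python
import numpy as np

def predict_centers(score_map, tau=0.6):
    """Return (x, y) coordinates of pixels whose objectness score exceeds tau."""
    ys, xs = np.nonzero(score_map > tau)   # row-major: row index = y, column = x
    return list(zip(xs.tolist(), ys.tolist()))

# Toy 2x2 score map standing in for F_p.
fp = np.array([[0.1, 0.7],
               [0.9, 0.2]])
centers = predict_centers(fp)
```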
A shape prediction module: the center position a(x, y) of the anchor frame a(x, y, w, h) is determined by position prediction; the shape of the anchor frame, i.e. a(w, h), is then determined using the nearest real frame B' corresponding to that center position. Since it is difficult to obtain the value of a(w, h) directly by regression, w and h are approximated by sampling; in the present invention the ratio w:h takes the three values 0.5, 1.0 and 2.0. The real frame B'(x', y', w', h') having the maximum intersection-over-union (IoU) with a is found among the real frames B, and the a(w, h) at this point is the result output by the shape prediction branch. Since the range of these values is wide and unstable, the width and height of the prediction frame are parameterized as W = μ·S·e^{dw}, H = μ·S·e^{dh}, where S is the step size and μ is an empirical factor (generally 8). This maps the range of the parameters to be learned from roughly [1, 1000] to [-1, 1], which simplifies network training. Finally, the anchor frame a(x, y, w, h) is obtained from the center position a(x, y) output by the position prediction branch and the size a(w, h) output by the shape prediction branch.
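The width/height parameterization W = μ·S·e^{dw}, H = μ·S·e^{dh} can be illustrated with a small helper. The stride value 16 below is an assumption for illustration; the patent only states that S is the step size and that μ is generally 8:

```python
import math

def decode_shape(dw, dh, stride=16, mu=8.0):
    """Decode the shape-branch outputs into anchor width/height (sketch).

    The network regresses (dw, dh) in roughly [-1, 1]; the actual size is
    recovered as W = mu * S * exp(dw), H = mu * S * exp(dh), where S is the
    feature-map step size and mu is the empirical factor (8 in the text).
    """
    return mu * stride * math.exp(dw), mu * stride * math.exp(dh)
```

Because the exponential maps [-1, 1] back to a wide size range, the branch only ever has to learn small, well-scaled values.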
As a more preferred implementation of this embodiment, the adaptive anchor frame network includes an adaptive adjustment module configured to obtain an adaptive anchor frame a'(x, y, w, h) from the anchor frame a(x, y, w, h) and the first feature map F1 by feature adaptation, the specific operation being:
f_i' = D(f_i, W_i, H_i)
where f_i is the feature value at the position of the i-th generated anchor frame on the input first feature map F1, and D(·) consists of a 3×3 deformable convolution with the offset field described in FIG. 3. f_i' is the adjusted feature value, i.e. the feature at the i-th position processed together with the width and height. The adaptive anchor frame a'(x, y, w, h) is obtained from f_i'. The network structure of the adaptive anchor frame is shown in FIG. 3: the input is the first feature map F1 and the output is the prediction anchor frame a(x, y, w, h). In FIG. 3, W × H × 1 and W × H × 2 are both derived from the first feature map F1, where W and H correspond to the width and height of the first feature map F1, and 1 and 2 are the numbers of channels. As shown in FIG. 3, the first feature map F1 first passes through position prediction to obtain the anchor frame center position a(x, y); second, shape prediction yields the predicted anchor frame a(x, y, w, h); finally, adaptive adjustment of the features yields the adaptive anchor frame a'(x, y, w, h).
Third part: ROI Pooling. This layer collects the input first feature map F1 and the prediction anchor frames a(x, y, w, h) and, after integrating this information, obtains from the first feature map F1 a second feature map F2 of uniform size; the second feature map F2 is fixed in size and is then sent to the subsequent fully connected layer. Preferably, the layer collects the input first feature map F1 and the adaptive anchor frame a'(x, y, w, h) and, after integrating this information, obtains the second feature map F2.
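A minimal, dependency-free sketch of what the ROI Pooling layer does: crop the anchor region from the feature map and max-pool it into a fixed 7×7 grid. Real implementations use spatial bins with interpolation; this is only illustrative:

```python
import numpy as np

def roi_pooling(F1, box, out_size=7):
    """ROI pooling sketch: max-pool an anchor crop to a fixed grid.

    F1  : feature map of shape (H, W, C)
    box : (x0, y0, x1, y1) region in feature-map coordinates
    Returns an array of shape (out_size, out_size, C).
    """
    x0, y0, x1, y1 = box
    crop = F1[y0:y1, x0:x1]
    H, W = crop.shape[:2]
    out = np.zeros((out_size, out_size) + crop.shape[2:], dtype=F1.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # bin boundaries; each bin covers at least one cell
            r0, r1 = i * H // out_size, max((i + 1) * H // out_size, i * H // out_size + 1)
            c0, c1 = j * W // out_size, max((j + 1) * W // out_size, j * W // out_size + 1)
            out[i, j] = crop[r0:r1, c0:c1].max(axis=(0, 1))
    return out
```

Whatever the size of the anchor region, the output is always 7×7 per channel, which is what allows the subsequent fully connected layers to have a fixed input dimension.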
Fourth part: classification and regression. The input of this layer is the second feature map F2 and the output is the bounding box of the target and the confidence of the defect class. The classification probability and the bounding box regression are jointly trained with Softmax Loss and Smooth L1 Loss to obtain the bounding box and defect class confidence of the target (as shown in FIG. 6).
Step 300, configuring a loss function and training a convolutional neural network.
In this embodiment, the whole training process of the neural network is end-to-end, and the loss function consists of an anchor positioning loss function L_loc, an anchor shape prediction loss function L_shape, a target classification loss function L_cls and a regression loss function L_reg:

L_loss = λ1·L_loc + λ2·L_shape + L_cls + L_reg

where λ1 = 1 and λ2 = 0.1.
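The weighted combination of the four terms can be sketched directly:

```python
def total_loss(l_loc, l_shape, l_cls, l_reg, lam1=1.0, lam2=0.1):
    """Weighted total loss: L = lam1*L_loc + lam2*L_shape + L_cls + L_reg,
    with lam1 = 1 and lam2 = 0.1 as in this embodiment."""
    return lam1 * l_loc + lam2 * l_shape + l_cls + l_reg
```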
First, the positioning loss function L_loc. This loss controls the number of anchor frames so that more anchor frames are placed at target centers and fewer at non-center coordinates, preventing an imbalance between the positive and negative samples among the generated anchor frames; the positions it retains are the anchor frame center positions. For an easily classified sample the prediction score y is close to its label, so the corresponding weight in the formula becomes small and fewer anchor frames are placed at that point. Since the number of generated anchor frames is huge and anchor frames containing negative samples account for the majority, the position branch is trained with Focal Loss to balance the positive and negative samples contained in the anchors. The formula of Focal Loss is as follows:
L_FL = -α · (1 - y)^γ · y' · log(y) - (1 - α) · y^γ · (1 - y') · log(1 - y)
where y is the predicted score that the anchor frame location is a sample center, and y' is the pre-labeled actual label of the location center, set to 1 for a positive sample and 0 for a negative sample. The value of α controls the weight of positive versus negative anchor frames; typically α ∈ (0, 0.5), and it is set here to 0.25. γ is an attention parameter: when γ = 0 the loss function reduces to the traditional cross entropy loss, so γ ≥ 0 is required, and γ = 2 is used. The modulation coefficient (1 - y)^γ controls the relative weight of easily and hardly classified samples.
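A sketch of the Focal Loss as described above, with α = 0.25 and γ = 2 taken from the text:

```python
import numpy as np

def focal_loss(y, y_true, alpha=0.25, gamma=2.0):
    """Focal loss for the anchor localization branch (illustrative sketch).

    y      : predicted center score in (0, 1)
    y_true : 1 for a positive (center) sample, 0 for a negative sample
    alpha balances positive/negative anchors; the modulation factor with
    exponent gamma down-weights easily classified samples.
    """
    y = np.clip(y, 1e-7, 1 - 1e-7)  # numerical safety for the logs
    pos = -alpha * (1 - y) ** gamma * y_true * np.log(y)
    neg = -(1 - alpha) * y ** gamma * (1 - y_true) * np.log(1 - y)
    return pos + neg
```

With γ = 0 and α = 0.5 this reduces (up to a constant factor) to the ordinary binary cross entropy, matching the statement in the text.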
For position prediction, each real frame is first divided into three types of regions: a central region, an ignored region, and a negative sample region. The three regions are defined from the position information (x0, y0, w0, h0) obtained by mapping the real frame onto the corresponding feature map. The central region is

CR = R(x0, y0, σ1·w0, σ1·h0)

and the ignored region is

IR = R(x0, y0, σ2·w0, σ2·h0) \ CR

The remaining boundary region is defined as the negative sample region, where σ1 and σ2 (with σ1 < σ2) are scale factors used to adjust the number of anchor frames generated, typically σ1 = 0.2 and σ2 = 0.5. The center positions of the anchor frames are determined accordingly.
Second, the shape prediction loss function L_shape.
The center position (x, y) of the anchor frame a(x, y, w, h) is determined by position prediction; the shape of the anchor frame, i.e. a(w, h), is then determined using the nearest real frame B' corresponding to that center position. Since it is difficult to obtain the value of a(w, h) by regression, the sampling approximation of w and h is adopted, with w:h taking the three values 0.5, 1.0 and 2.0. The real frame B'(x', y', w', h') with the maximum intersection-over-union (IoU) with a is found among the real frames B, giving the matched a(w, h). The shape prediction loss is:
L_shape = L1(1 - min(w/w', w'/w)) + L1(1 - min(h/h', h'/h))

where L1 is the smoothing function:

L1(x) = 0.5·x²,    |x| ≤ 1
L1(x) = |x| - 0.5, |x| > 1
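A sketch of the shape loss, where (wg, hg) denote the width and height of the matched real frame B':

```python
import numpy as np

def smooth_l1(x):
    """L1 smoothing function: 0.5*x^2 for |x| <= 1, |x| - 0.5 otherwise."""
    x = np.abs(x)
    return np.where(x <= 1.0, 0.5 * x ** 2, x - 0.5)

def shape_loss(w, h, wg, hg):
    """Shape prediction loss (illustrative sketch): penalizes the gap between
    the predicted (w, h) and the matched ground-truth shape (wg, hg), using
    the symmetric ratio min(w/wg, wg/w) so over- and under-sizing are treated
    alike."""
    return smooth_l1(1.0 - min(w / wg, wg / w)) + smooth_l1(1.0 - min(h / hg, hg / h))
```

The loss is zero exactly when the predicted shape matches the ground truth, and grows smoothly as the ratio departs from 1.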
third, for classification loss function
The classification part uses a binary cross entropy loss over the anchors:

L_cls = -(1/N) · Σ_i [ p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i) ]

where p_i is the probability that the i-th anchor frame (anchor) is predicted as a target, and p_i* is the corresponding ground-truth label: p_i* = 1 when the anchor is a positive sample, and p_i* = 0 when it is background.
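A sketch of the binary cross entropy classification loss; averaging over the anchors is the normalization assumed here:

```python
import numpy as np

def classification_loss(p, p_star):
    """Binary cross entropy over anchors (illustrative sketch).

    p      : array of predicted target probabilities, one per anchor
    p_star : array of labels, 1 if the anchor is a positive sample else 0
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)  # numerical safety for the logs
    return float(-np.mean(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)))
```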
fourth, for the regression loss function Lreg
Regression loss function LregAs described by the following formula:
Figure BDA0003090140040000117
wherein, ti={tx,ty,tw,thAnd 4 parameters of the anchor point frame are represented, namely the center position coordinate, the width and the height of the anchor point frame.
Figure BDA0003090140040000118
Is the 4 coordinate parameters of the grountruth corresponding to positiveanchor. R is smoothL1 function:
Figure BDA0003090140040000119
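A sketch of the regression loss; normalizing by the number of positive anchors is the convention assumed here:

```python
import numpy as np

def regression_loss(t, t_star, p_star):
    """Bounding-box regression loss (illustrative sketch): smooth L1 over the
    4 box parameters (x, y, w, h), counted only for positive anchors
    (p_star = 1) and averaged over the number of positives.

    t, t_star : arrays of shape (N, 4) with predicted and ground-truth params
    p_star    : array of shape (N,) with anchor labels (1 positive, 0 negative)
    """
    diff = np.abs(np.asarray(t, dtype=float) - np.asarray(t_star, dtype=float))
    r = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)  # smooth L1, R(x)
    p_star = np.asarray(p_star, dtype=float)
    return float(np.sum(p_star[:, None] * r) / max(p_star.sum(), 1.0))
```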
in this embodiment, the whole training process of the neural network is end-to-end training, and the first feature map F is obtained by the first convolution module of the first neural network1And obtaining a prediction result of the center position of the anchor point frame firstly, a prediction result of the center size of the anchor point frame secondly and a prediction result of the anchor point frame finally in the first convolution module of the second neural network, and converting the scene picture into a target boundary frame and a defect type confidence coefficient.
Step 400, training the model.
In the training process (as shown in FIG. 4), gradient descent is used to optimize the back-propagation stage. The training batch size is set to 16, the momentum value is 0.9, the weight decay follows an exponential schedule, the learning rate is set to 0.004, and the parameter num_classes is set to 3 (representing the insulator, the defect and the background). A warm-up training strategy is adopted. The number of epochs is set to 40000; a model is saved every 3000 epochs along with the final model, and the model with the lowest loss is finally selected for detection.
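A sketch of a warm-up plus exponential-decay learning-rate schedule consistent with the settings above. The base learning rate 0.004 comes from the text; the warm-up length, decay rate and decay interval are illustrative assumptions, not values from the patent:

```python
def learning_rate(step, base_lr=0.004, warmup_steps=500, decay_rate=0.95, decay_steps=3000):
    """Warm-up + exponential decay schedule (illustrative sketch).

    During warm-up the rate ramps linearly from ~0 to base_lr; afterwards it
    decays by decay_rate every decay_steps steps.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warm-up
    return base_lr * decay_rate ** ((step - warmup_steps) / decay_steps)
```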
Step 500, model application.
After the training process, a plurality of models are obtained and the optimal model (the one with the minimum loss function value) is selected for application. At this stage the image data does not need augmentation; the image only needs to be resized to 900 × 600 and normalized before being used as the model input. The parameters of the whole network model are fixed, so the image data simply propagates forward: the first feature map F1, the prediction anchor frame a(x, y, w, h) and the second feature map F2 are obtained in turn, and the detection result is produced directly by the whole model. When a large number of original transmission-line images need to be tested, all images can be integrated into one data file; for example, when the RGB values of all images are stored in a data table, an lmdb-format file can be used so that all images are conveniently read at once.
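The inference-time preprocessing (resize to 900 × 600, then normalize) can be sketched as follows. Nearest-neighbor resizing and division by 255 are illustrative choices to keep the sketch dependency-free; a real pipeline would typically use bilinear resizing:

```python
import numpy as np

def preprocess(image, target_w=900, target_h=600):
    """Resize an HxWx3 uint8 image to 900x600 and normalize to [0, 1]
    (illustrative sketch using nearest-neighbor index mapping)."""
    h, w = image.shape[:2]
    row_idx = (np.arange(target_h) * h // target_h).clip(0, h - 1)
    col_idx = (np.arange(target_w) * w // target_w).clip(0, w - 1)
    resized = image[row_idx[:, None], col_idx[None, :], :]  # (600, 900, 3)
    return resized.astype(np.float32) / 255.0
```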
In this embodiment, ResNet50 with a channel attention mechanism in the improved Faster RCNN network architecture is used as the feature extraction network to extract the features of the scene picture; the adaptive anchor frame network in the improved Faster RCNN architecture is used as the prediction anchor frame network to predict and generate the target anchors; and the ROI Pooling layer and the classification and regression layer in the improved Faster RCNN architecture finally output the class confidence and the class bounding box.
Step 600, model verification.
To verify the effectiveness of this embodiment, the Aut-Faster RCNN detection method and the Faster RCNN, Fast RCNN, SSD, YOLO v3 and YOLO v4 methods were trained on the insulator dataset; the comparison of running speed and mAP is shown in Table 3.
TABLE 3 comparison of the six test methods on the insulator dataset
According to the experimental analysis, the average precision (mAP) of the Aut-Faster RCNN algorithm of this embodiment is 93.67%, higher than that of the current mainstream detection networks.
Embodiments of the present invention also include a computer-readable storage medium storing program instructions implementing the method of the invention and/or model parameters obtained by training with the method of the invention.
Embodiments of the present invention also include a small target detection device based on the improved Faster RCNN, comprising a memory and a processor; the memory stores the model parameters obtained by training in the method embodiment, and the processor reads the program instructions implementing the algorithm structure of the improved Faster RCNN and performs small target detection according to the model parameters.
Example 2
Two scene pictures containing the insulator are selected as detection objects, and detection is carried out by using the detection method provided by the embodiment 1 of the invention. The detection process is as follows.
Example 2.1
(1) A scene picture containing insulators is resized to 900 × 600, giving a tensor of 900 × 600 × 3, and the adjusted scene picture is input into the improved Faster RCNN-based small target detection device of Example 1.
(2) The tensor (900 × 600 × 3) is convolved to obtain the first feature map F1 ∈ R^(37×50×512).
(3) The first feature map F1 ∈ R^(37×50×512) passes through the adaptive anchor frame network to obtain the adaptive anchor frame a'(-212, -419, 183, 359).
(4) The first feature map F1 ∈ R^(37×50×512) and a'(-212, -419, 183, 359) pass through the ROI Pooling layer to obtain the second feature map F2 ∈ R^(7×7×512).
(5) The second feature map F2 ∈ R^(7×7×512) and a'(-212, -419, 183, 359) pass through the classification and regression layers to obtain the classification classes and class confidences; the results are shown in FIG. 7.
Example 2.2
(1) Another scene picture containing a defective insulator is resized to 900 × 600, giving a tensor of 900 × 600 × 3, and the adjusted scene picture is input into the improved Faster RCNN-based small target detection device of Example 1.
(2) The tensor (900 × 600 × 3) is convolved to obtain the first feature map F1 ∈ R^(37×50×512).
(3) The first feature map F1 ∈ R^(37×50×512) passes through the adaptive anchor frame network to obtain the adaptive anchor frames a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135).
(4) The first feature map F1 ∈ R^(37×50×512) together with a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135) pass through the ROI Pooling layer to obtain the second feature map F2 ∈ R^(7×7×512).
(5) The second feature map F2 ∈ R^(7×7×512) together with a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135) pass through the classification and regression layers to obtain the classification categories and category confidences; the results are shown in FIG. 8.
It should be noted that, in the adaptive generation process of the anchor frame, not one but a plurality of anchor frames are generated in the corresponding ROI area, and each anchor frame finally undergoes a classification operation to distinguish whether the insulator is normal or damaged.
FIG. 7 shows the detection result for a normal insulator, and FIG. 8 shows the detection result for a defective insulator. As can be seen from the figures, FIG. 7 contains only the bounding box of one insulator target, with the insulator class confidence shown above it. FIG. 8 contains two bounding boxes: one is the bounding box of the insulator target, with the insulator class confidence shown above it, and the other is the bounding box of the defective insulator target, with the defective-insulator class confidence shown above it.
It should be noted that, in order to better distinguish the insulator target from the defective insulator target in this embodiment, FIGS. 5, 7 and 8 were filed synchronously with this patent application in the form of other certification documents. FIG. 5 corresponds to certification material 1, FIG. 7 to certification material 2, and FIG. 8 to certification material 3; each pair is identical except for color.
The invention is not to be considered as limited to the particular embodiments shown, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A small target detection method based on improved Faster RCNN, implemented by a processor executing improved Faster RCNN algorithm instructions, characterized by comprising the following steps: receiving a scene picture containing a small target; extracting a first feature map F1 of the scene picture using the first convolution module of the Faster RCNN; obtaining, from the first feature map F1 using the second convolution module of the Faster RCNN, a center position a(x, y) and a size a(w, h) of a prediction anchor frame, and obtaining the prediction anchor frame a(x, y, w, h) from the center position a(x, y) and the size a(w, h) of the prediction anchor frame; obtaining, from the first feature map F1 and the prediction anchor frame a(x, y, w, h) using the third convolution module of the Faster RCNN, a second feature map F2 of the same size as the first feature map F1; and obtaining the detection result of the scene picture from the second feature map F2 and the prediction anchor frame a(x, y, w, h) using the fourth convolution module of the Faster RCNN.
2. The improved Faster RCNN-based small target detection method according to claim 1, wherein the trunk portion of the first convolution module adopts the convolution structure of ResNet.
3. The improved Faster RCNN-based small target detection method according to claim 2, wherein the trunk portion of the first convolution module adopts the convolution structure of ResNet50.
4. The improved Faster RCNN-based small target detection method according to claim 3, wherein the convolution structure of ResNet50 includes a multi-layer Deform ResNet50 residual block structure, in which the second convolutional layer of the Deform ResNet50 residual block is replaced with a depthwise separable convolutional layer.
5. The improved Faster RCNN-based small target detection method according to claim 1, wherein the first convolution module comprises a channel attention mechanism module configured to obtain feature weights Sc from the scene picture; and the second convolution module is configured to obtain the center position a(x, y) of the prediction anchor frame from the first feature map F1 and the feature weights Sc, and to obtain the prediction anchor frame a(x, y, w, h) from the center position a(x, y) of the prediction anchor frame.
6. The improved Faster RCNN-based small target detection method according to any one of claims 1-5, wherein the trunk portion of the second convolution module adopts an adaptive anchor frame network structure.
7. The improved Faster RCNN-based small target detection method according to claim 6, wherein the adaptive anchor frame network obtains a score feature map Fp from the first feature map F1, then obtains the center position a(x, y) of the prediction anchor frame from the score feature map Fp, and obtains the prediction anchor frame a(x, y, w, h) from the center position a(x, y) and the size a(w, h) of the prediction anchor frame.
8. The improved Faster RCNN-based small target detection method according to claim 7, wherein: the adaptive anchor frame network comprises an adaptive adjustment module configured to obtain an adaptive prediction anchor frame a'(x, y, w, h) from the prediction anchor frame a(x, y, w, h) and the first feature map F1; and the third convolution module is configured to obtain, from the first feature map F1 and the adaptive prediction anchor frame a'(x, y, w, h), a second feature map F2 of the same size as the first feature map F1.
9. A small target detection device based on improved Faster RCNN, comprising:
a processor; and
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute program instructions of the algorithm structure to implement the small target detection method of any one of claims 1 to 8.
10. A computer storage medium for an improved Faster RCNN-based small target detection method, characterized in that: when the instructions in the computer storage medium are executed by a processor of an improved Faster RCNN-based small target detection apparatus, they enable the apparatus to perform the small target detection method of any one of claims 1 to 8.
CN202110593538.8A 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium Active CN113536896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110593538.8A CN113536896B (en) 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110593538.8A CN113536896B (en) 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium

Publications (2)

Publication Number Publication Date
CN113536896A true CN113536896A (en) 2021-10-22
CN113536896B CN113536896B (en) 2022-07-08

Family

ID=78094881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110593538.8A Active CN113536896B (en) 2021-05-28 2021-05-28 Insulator defect detection method and device based on improved Faster RCNN and storage medium

Country Status (1)

Country Link
CN (1) CN113536896B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022705A (en) * 2021-10-29 2022-02-08 电子科技大学 Adaptive target detection method based on scene complexity pre-classification
CN114998576A (en) * 2022-08-08 2022-09-02 广东电网有限责任公司佛山供电局 Method, device, equipment and medium for detecting loss of cotter pin of power transmission line

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution
CN111783819A (en) * 2020-05-08 2020-10-16 国家电网有限公司 Improved target detection method based on region-of-interest training on small-scale data set
CN112101430A (en) * 2020-08-28 2020-12-18 电子科技大学 Anchor frame generation method for image target detection processing and lightweight target detection method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution
CN111783819A (en) * 2020-05-08 2020-10-16 国家电网有限公司 Improved target detection method based on region-of-interest training on small-scale data set
CN112101430A (en) * 2020-08-28 2020-12-18 电子科技大学 Anchor frame generation method for image target detection processing and lightweight target detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU, Lei et al.: "Defect Detection Method for Medical Plastic Bottle Manufacturing Based on ResNet Network", Computer and Modernization *
LUO, Hui et al.: "Block Damage Detection Method for Rail Tread Based on Improved Faster R-CNN", Journal of Computer Applications *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022705A (en) * 2021-10-29 2022-02-08 电子科技大学 Adaptive target detection method based on scene complexity pre-classification
CN114022705B (en) * 2021-10-29 2023-08-04 电子科技大学 Self-adaptive target detection method based on scene complexity pre-classification
CN114998576A (en) * 2022-08-08 2022-09-02 广东电网有限责任公司佛山供电局 Method, device, equipment and medium for detecting loss of cotter pin of power transmission line
CN114998576B (en) * 2022-08-08 2022-12-30 广东电网有限责任公司佛山供电局 Method, device, equipment and medium for detecting loss of cotter pin of power transmission line

Also Published As

Publication number Publication date
CN113536896B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109859190B (en) Target area detection method based on deep learning
CN109800631B (en) Fluorescence coding microsphere image detection method based on mask region convolution neural network
CN110135503B (en) Deep learning identification method for parts of assembly robot
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN111640125A (en) Mask R-CNN-based aerial photograph building detection and segmentation method and device
CN110569782A (en) Target detection method based on deep learning
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN113536896B (en) Insulator defect detection method and device based on improved Faster RCNN and storage medium
CN110135430A (en) A kind of aluminium mold ID automatic recognition system based on deep neural network
CN112926652B (en) Fish fine granularity image recognition method based on deep learning
CN113420643B (en) Lightweight underwater target detection method based on depth separable cavity convolution
CN114663346A (en) Strip steel surface defect detection method based on improved YOLOv5 network
CN111199523A (en) Power equipment identification method and device, computer equipment and storage medium
CN111680705B (en) MB-SSD method and MB-SSD feature extraction network suitable for target detection
CN116645592B (en) Crack detection method based on image processing and storage medium
CN113628211B (en) Parameter prediction recommendation method, device and computer readable storage medium
CN111476226B (en) Text positioning method and device and model training method
CN112270404A (en) Detection structure and method for bulge defect of fastener product based on ResNet64 network
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN114283431B (en) Text detection method based on differentiable binarization
CN116597275A (en) High-speed moving target recognition method based on data enhancement
CN116189160A (en) Infrared dim target detection method based on local contrast mechanism
CN114219757A (en) Vehicle intelligent loss assessment method based on improved Mask R-CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant