CN111783819A - Improved target detection method based on region-of-interest training on small-scale data set - Google Patents

Improved target detection method based on region-of-interest training on small-scale data set

Info

Publication number
CN111783819A
CN111783819A (application number CN202010383794.XA)
Authority
CN
China
Prior art keywords
training
target detection
detection model
scale
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010383794.XA
Other languages
Chinese (zh)
Other versions
CN111783819B (en)
Inventor
尹子会
付炜平
赵冀宁
孟荣
贾志辉
董俊虎
杜江龙
赵振兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
North China Electric Power University
Maintenance Branch of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
North China Electric Power University
Maintenance Branch of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, North China Electric Power University, Maintenance Branch of State Grid Hebei Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202010383794.XA priority Critical patent/CN111783819B/en
Publication of CN111783819A publication Critical patent/CN111783819A/en
Application granted granted Critical
Publication of CN111783819B publication Critical patent/CN111783819B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an improved target detection method based on region-of-interest training on a small-scale data set, belonging to the technical field of image analysis. An image target detection result is obtained through a target detection model whose training process includes a stage in which frame-regression task training and classification task training are performed independently, in sequence and cyclically. The frame-regression task training uses a first training set obtained by applying a first data enhancement to the small-scale data set, and the classification task training uses a second training set obtained by applying a second data enhancement to the first training set; each image of the second training set retains partial global information of the picture outside the region of interest. By introducing a region-of-interest mechanism in the training stage, the method overcomes the overfitting that easily occurs when an existing One-Stage target detection model is trained on a small-scale data set, yielding an accurate target detection model.

Description

Improved target detection method based on region-of-interest training on small-scale data set
Technical Field
The invention belongs to the technical field of image analysis and relates to an improved target detection method based on region-of-interest training on a small-scale data set.
Background
Deep Learning (DL) is a research direction in the field of Machine Learning (ML). It extracts features through neural-network learning instead of relying on hand-crafted features, which greatly improves learning efficiency and accuracy, and it is widely applied in image classification, target detection, image segmentation, natural language processing and other fields. However, since deep learning methods are generally data-driven, they place high requirements on sample quantity, richness and labeling accuracy. In the field of target detection, if the sample quantity and richness are insufficient, a deep learning model not only extracts the target features from the training samples but also brings the background noise in the samples into its learning range, so that the model overfits the data. Once overfitting occurs, the recall rate of target detection drops severely, seriously degrading detection performance.
Target detection methods based on deep learning generally fall into two categories. Two-Stage detection algorithms divide the detection problem into two stages: the first stage generates candidate regions and the second stage classifies the targets and refines their positions; representative models include Region-based CNN (R-CNN), Fast R-CNN and the like. One-Stage detection algorithms directly predict the class probability and position information of targets with a single network, without generating candidate regions; typical representatives are the SSD (Single Shot MultiBox Detector) model and the YOLO (You Only Look Once) model.
For One-Stage target detection models, the lack of a preliminary target-box screening mechanism similar to that of the Two-Stage algorithms often leads to more serious overfitting to the training set data during classification training, particularly on small-scale data sets.
Disclosure of Invention
The invention aims to provide an improved target detection method based on region-of-interest training on a small-scale data set. A region-of-interest mechanism is introduced in the training stage to overcome the overfitting that easily occurs when an existing One-Stage target detection model is trained on a small-scale data set, so that an accurate target detection model is obtained.
The technical scheme provided by the invention is an improved target detection method based on region-of-interest training on a small-scale data set, in which an image target detection result is obtained through a target detection model comprising a multi-layer-output depth feature extraction network and a multi-scale fusion detection head. The training process of the target detection model includes a stage in which frame-regression task training and classification task training are performed independently, in sequence and cyclically. The independent training is realized by adjusting the loss coefficients in the loss function, so that the classification task training in this stage can learn partial global information of each picture in the training set without affecting the frame-recognition learning of the frame-regression task training on the region of interest.
In one embodiment of the invention, the small-scale data set marked with regions of interest is used to perform the frame regression task training and the classification task training on the target detection model.
In one embodiment of the present invention, the depth feature extraction network is pre-trained using a large-scale data set. The large-scale data set is a classification data set whose categories are essentially unrelated to the classes of the targets to be recognized; during pre-training the depth feature extraction network is treated as a pure classification model (classification only, no regression), and the network weights obtained by pre-training shorten the subsequent training time on the small-scale data set. When a large-scale data set without classification labels is used, it must first be transformed into the classification format required for pre-training.
In one embodiment of the present invention, the frame-regression task training is performed on the target detection model using a first training set obtained by applying a first data enhancement to the small-scale data set, and the classification task training is performed using a second training set obtained by applying a second data enhancement to the first training set; each image of the second training set retains partial global information of the picture outside the region of interest. Different small-scale training sets are thus used in the stage of cyclic, sequential, independent training: the first training set gives the One-Stage target detection model its frame-recognition capability, while the second training set gives it its classification capability, and that classification capability suppresses overfitting.
As an improvement of the above embodiment, the first data enhancement obtains a first training set larger than the small-scale data set through one or more of flipping, translation, blurring, scaling and cropping; the second data enhancement preserves part of the background information of each image according to the distance between the background area and the region of interest, and includes adding noise. The first data enhancement yields a first training set larger than the original small-scale data set and thus richer training data; the second data enhancement yields a second training set of essentially the same size as the first but containing partial global information, which preserves part of the background information and improves the classification and recognition ability of the trained target detection model.
In one embodiment of the present invention, an exemplary noise-adding method is provided: for a picture marked with several regions of interest, the amplitude $n_{x,y}$ of the noise added to its pixel $p_{x,y}$ is $\min(b, a\times d)$, where $d$ is the shortest distance from pixel $p_{x,y}$ to all regions of interest, $a$ is a noise intensity parameter, and $b$ is the maximum noise intensity. In a further improvement, the training results can be optimized by adjusting these parameters.
In one embodiment of the invention, in the multi-scale fusion detection head, the feature maps of different sizes output by the depth feature extraction network are up-sampled, fused and convolved layer by layer using a feature pyramid network structure, obtaining target detection outputs at n scales, equal in number to the detection heads.
In an embodiment of the present invention, each detection head of the multi-scale fusion detection head includes a classification output layer for classification task training and a regression output layer for frame-regression task training. In one round of independent training, if the loss coefficients corresponding to the classification output layer carry the larger weight among all losses, the training concentrates on the classification task; if the loss coefficients corresponding to the regression output layer carry the larger weight, the training concentrates on the frame-regression task.
In an embodiment of the present invention, the learning rate of each frame-regression task training is lower than that of the previous frame-regression task training, and the learning rate of each classification task training is lower than that of the previous classification task training.
In an embodiment of the invention, after the cyclic independent-training stage is finished, the target detection model is fine-tuned using the first training set. In this fine-tuning of the model the weights of the losses differ only slightly, so the fine-tuning training takes the classification task training and the frame-regression task training into account simultaneously.
Compared with the prior art, the invention has the beneficial effects that:
the invention improves the defect that the existing One-Stage target detection excessively depends on data by improving the data enhancement and training methods. The training input data is subjected to local limit strong denoising processing, and the farther the training input data is from a target, the higher the noise intensity is, so that the fitting difficulty of the feature extraction network on the background noise of the input picture is increased, and the overfitting possibility of the model on a small data set is reduced. And for the area close to the target, partial background information is also reserved, so that the network can adaptively learn features in different ranges. During training, a regression task and a classification task are respectively trained. Different training sets are used according to different tasks: for the regression task needing more global information, the image without noise is input, so that the global information is easier to extract; and for a classification task needing to pay more attention to the local part, inputting a noise-added picture and paying more attention to the target characteristic. Through testing, the method has relatively common practical significance on small-scale data sets. The invention is practical and has certain reference significance for the scheme design of related problems.
Drawings
FIG. 1 is a schematic diagram of a target detection model according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a method for training a target detection model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data flow of the embodiment of FIG. 2 in training a target detection model;
FIG. 4 is an image containing a target object from the second training set after noise addition with a maximum noise intensity of 127 according to an embodiment of the present invention;
FIG. 5 is an image containing a target object from the second training set after noise addition with a maximum noise intensity of 255 according to an embodiment of the present invention;
FIG. 6 shows partial detection results on substation equipment defect images after training with the method of the present invention in an application embodiment;
FIG. 7 is a result image obtained by detecting part of the VOC2007 data set images using the method of the present invention in an application embodiment.
Detailed Description
It should first be noted that the basic idea of the technical solution of the invention is as follows. When a One-Stage target detection model is trained, the input data are first processed to adjust the learning difficulty of different regions of the image, so that the classification task training and the frame-regression task training can each adapt their range of attention, i.e. the region of interest, and so that the classification task training can learn partial global information while still attending to local information. In the model training stage, the method aims to form a soft region-of-interest mechanism, which differs from the candidate-region mechanism of Two-Stage models: a Two-Stage target detection model contains a module that identifies candidate regions as regions of interest and, when classifying, directly segments the part containing the target out of the extracted full-image features; that segmentation is a hard region-of-interest mechanism. By contrast, the present method adjusts the learning difficulty of different regions, for example by gradually adding noise of different intensities, and sets no definite boundary, so that the classification task training attends to the target itself.
The technical scheme of the invention is based on a target detection model which, as shown in FIG. 1, is essentially One-Stage and comprises a multi-layer-output depth feature extraction network 1 and a multi-scale fusion detection head 2. The depth feature extraction network comprises n backbone layers from top to bottom; each layer comprises one or more convolution layers and outputs a feature map of one scale to the layer below it, and several of these layers are selected, from top to bottom, to also output the feature map acquired at that layer to the multi-scale fusion detection head 2. The deeper the backbone layer, the smaller the scale of its output feature map. In the multi-scale fusion detection head 2, several independent Detection Heads are arranged, matching the selected backbone layers in number and scale. The feature maps output by the selected backbone layers are up-sampled and tensor-spliced layer by layer inside the multi-scale fusion detection head 2; except for the bottom, smallest-scale detection head, which directly processes the feature map output by the bottom backbone layer, every other detection head takes as input the tensor-spliced feature map of its layer. The output of each detection head is processed by a regression output layer and a classification output layer respectively and then serves as the target detection output of the model. In the embodiment of FIG. 1, with 1 ≤ i < j < n, the three backbone layers i, j and n are selected, and correspondingly three detection heads for different scales are arranged in the multi-scale fusion detection head 2 from bottom to top; in other embodiments, the number of detection heads in the multi-scale fusion detection head 2 differs according to the number of selected backbone layers.
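A hedged PyTorch sketch of one possible reading of this fusion structure for two scales follows; the class name, channel counts and the nearest-neighbour up-sampling are illustrative assumptions, not specified by the patent.

```python
# Illustrative sketch only: channel sizes and two-scale fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusionHead(nn.Module):
    """FPN-style fusion of two backbone feature maps, one detection head per scale.

    Each head outputs k * (c + 5) channels per cell: for each of k anchor
    shapes, c class scores, 4 box regression values and 1 confidence score.
    """
    def __init__(self, ch_deep=1280, ch_shallow=96, k=3, c=20):
        super().__init__()
        out_ch = k * (c + 5)
        self.head_deep = nn.Conv2d(ch_deep, out_ch, 1)       # smallest scale
        self.lateral = nn.Conv2d(ch_deep, ch_shallow, 1)
        self.head_shallow = nn.Conv2d(2 * ch_shallow, out_ch, 1)

    def forward(self, feat_shallow, feat_deep):
        # The bottom head detects directly on the deepest (smallest) map.
        out_deep = self.head_deep(feat_deep)
        # Up-sample the deep map and tensor-splice it with the shallower one.
        up = F.interpolate(self.lateral(feat_deep), size=feat_shallow.shape[2:])
        out_shallow = self.head_shallow(torch.cat([feat_shallow, up], dim=1))
        return out_shallow, out_deep

head = MultiScaleFusionHead()
o1, o2 = head(torch.randn(1, 96, 14, 20), torch.randn(1, 1280, 7, 10))
print(o1.shape, o2.shape)  # 14x20 and 7x10 grids, 75 channels each
```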
In the embodiment shown in fig. 2 and 3, based on the structure of the target detection model, the target detection model is trained in the following steps S100 to S110, so as to obtain the weight values of the nodes of the target detection model.
S100, pre-training based on a large-scale data set is carried out on the multi-layer output depth feature extraction network, and initial parameter values of the target detection model are obtained.
Specifically, the multi-layer-output depth feature extraction network in the target detection model is pre-trained using a large-scale data set. The weight values obtained by pre-training serve as the initial parameter values of the depth feature extraction network in the target detection model, so as to accelerate convergence and improve detection precision.
Illustratively, the large-scale data set in the embodiment of the invention is an image data set provided by ImageNet, and the multi-layer output deep feature extraction network is a MobileNet V2 network.
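As a hedged illustration of this pre-training shortcut, the sketch below loads torchvision's ImageNet-pretrained MobileNet V2 and taps two intermediate feature maps; the layer indices 13 and 18 are assumptions chosen so that a 320 × 224 input yields the 14 × 20 and 7 × 10 maps of the embodiments below, and torchvision itself is not named by the patent.

```python
# Sketch: ImageNet-pretrained backbone with two multi-scale output taps.
import torch
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

backbone = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V1).features

x = torch.randn(1, 3, 224, 320)      # H x W = 224 x 320
feats = []
for i, layer in enumerate(backbone):
    x = layer(x)
    if i in (13, 18):                # assumed taps: stride-16 and stride-32
        feats.append(x)
for f in feats:
    print(f.shape)                   # (1, 96, 14, 20) and (1, 1280, 7, 10)
```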
S101, anchor boxes (Anchors) required for training the target detection model are obtained using the small-scale data set marked with regions of interest.
Specifically, the invention marks the regions of interest by setting a Ground Truth target box on each picture of the small-scale data set; exemplarily, the region of interest is the smallest rectangle covering the device of interest. Based on the small-scale data set, cluster analysis is performed on the normalized sizes of the Ground Truth target boxes. Exemplarily, this embodiment analyzes the size distribution of the Ground Truth target boxes with the K-means algorithm to obtain a group of size clustering results; the results comprise several different scales, each scale corresponding to the shape of one anchor box (Anchor Box), and a set of anchor box scales is established.
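A minimal sketch of this clustering step, assuming plain K-means from scikit-learn over normalized (width, height) pairs; the helper name is hypothetical and the six-cluster count matches the embodiments below.

```python
# Sketch: cluster normalized Ground Truth box sizes into k anchor shapes.
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchor_shapes(boxes_wh, image_wh, k=6):
    """boxes_wh: (N, 2) box sizes in pixels; image_wh: (W, H) of the images."""
    normalized = boxes_wh / np.asarray(image_wh, dtype=float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(normalized)
    return km.cluster_centers_       # k normalized (w, h) anchor shapes

boxes = np.random.rand(500, 2) * [640, 480]   # placeholder Ground Truth sizes
print(cluster_anchor_shapes(boxes, (640, 480)))
```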
Taking the feature points of the feature map at each scale as anchor points, with each anchor point corresponding to anchor boxes of the several sizes in the set, the number of anchor boxes that the multi-scale fusion detection head must examine for one image is:

$\sum_{i=1}^{n} w_i\times h_i\times k$

where $w_i$ and $h_i$ are the width and height of the $i$-th output feature map and $k$ is the number of anchor box sizes per anchor point. Specifically, a feature map of size 7 × 10 has 70 pixels, i.e. 70 feature points and thus 70 anchor points; if each anchor point corresponds to 3 anchor box sizes, the detection head for that feature map examines 210 anchor boxes. A decoder after the detection head output decodes the combined output of the multi-scale fusion detection head.
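As a quick check of this count, taking the two feature-map sizes and k = 3 anchor shapes of the embodiments below:

```python
# Total anchor boxes examined per image over two feature maps, k = 3 each.
feature_maps = [(7, 10), (14, 20)]
k = 3
total = sum(w * h * k for (w, h) in feature_maps)
print(total)   # 7*10*3 + 14*20*3 = 210 + 840 = 1050
```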
S102, a first training set used for performing frame regression task training on the target detection model and a second training set used for performing classification task training on the target detection model are obtained by using the small-scale data set marked with the region of interest.
Exemplarily, in this embodiment a larger number of pictures is obtained by enhancing the small-scale data set pictures through methods such as flipping, translation, blurring, scaling and cropping, and the set of these pictures is used as the first training set.
Exemplarily, in this embodiment the second training set is obtained by noise-processing the first training set according to the distance between each pixel of a picture and the Ground Truth target boxes. The specific noise-adding method is that, for a picture marked with several regions of interest, the amplitude $n_{x,y}$ of the noise added to pixel $p_{x,y}$ is:

$n_{x,y} = \min(b, a\times d)$

where $d$ is the shortest distance from pixel $p_{x,y}$ to all regions of interest, $a$ is a noise intensity parameter, and $b$ is the maximum noise intensity. The set of noise-processed pictures is taken as the second training set. The noise addition keeps partial background information in the background area outside the regions of interest: in each picture of the second training set the region of interest has no clear visual boundary, and the closer a pixel lies to the boundary of a region of interest, the more background information is retained. By contrast, in a Two-Stage target detection model, the detection information provided to the second stage after the first stage identifies a candidate region contains no background information outside that candidate region.
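A minimal sketch of this noise step, assuming uniform noise, grayscale uint8 images and SciPy's Euclidean distance transform for the shortest distance d; the function name and default parameters are hypothetical.

```python
# Sketch: distance-scaled noise, amplitude n_{x,y} = min(b, a * d).
import numpy as np
from scipy.ndimage import distance_transform_edt

def add_roi_noise(image, roi_boxes, a=2.0, b=127.0, rng=None):
    """image: (H, W) uint8; roi_boxes: list of (x0, y0, x1, y1) Ground Truth boxes."""
    rng = np.random.default_rng() if rng is None else rng
    inside = np.zeros(image.shape, dtype=bool)
    for x0, y0, x1, y1 in roi_boxes:
        inside[y0:y1, x0:x1] = True
    d = distance_transform_edt(~inside)      # shortest distance to any ROI
    amplitude = np.minimum(b, a * d)         # grows with distance, capped at b
    noise = rng.uniform(-1.0, 1.0, image.shape) * amplitude
    return np.clip(image.astype(float) + noise, 0, 255).astype(np.uint8)

img = (np.random.rand(224, 320) * 255).astype(np.uint8)
noisy = add_roi_noise(img, [(100, 60, 180, 140)])
```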
S103, in the multi-scale fusion detection heads, multi-scale fusion is carried out to obtain target detection data of each detection head.
Specifically, in the forward propagation of the depth feature extraction network, using the initial parameters obtained by the pre-training of S100, output feature maps of several backbone layers of different depths and sizes are selected as the outputs of the depth feature extraction network. In the multi-scale fusion detection head, the Feature Pyramid Network (FPN) structure is used to up-sample, fuse and convolve these feature maps of different sizes, obtaining target detection outputs at $n$ scales, equal in number to the $n$ detection heads, each of shape:

$w_i\times h_i\times k\times (c+5)$

where $c$ is the number of target classes and $w_i$ and $h_i$ are the length and width of the $i$-th output convolution feature map. For every anchor, $c$ classification results, the four coordinates of the corresponding prediction box and one confidence value are output; the four coordinates are the abscissa and ordinate of the prediction box and its length and width.
S104, a decoding algorithm for the output of the multi-scale fusion detection head is configured. The decoding algorithm converts the detection-head output of the target detection model into coordinate predictions, i.e. coordinates in the real picture.
Specifically, in this embodiment the anchor boxes generated in step S101 are used for regression training: the anchor box with the largest IoU against the Ground Truth target box is selected as the anchor point responsible for predicting a target object, and the relationship between the prediction output and the actual coordinates is expressed by equations (1) to (4):

$x' = x + \mathrm{sigmoid}(p_x)\times w$ (1)

$y' = y + \mathrm{sigmoid}(p_y)\times h$ (2)

$w' = w\times e^{p_w}$ (3)

$h' = h\times e^{p_h}$ (4)

where $x'$, $y'$, $w'$ and $h'$ are the regressed center coordinates, length and width for the Anchors in the anchor point set; $x$, $y$, $w$ and $h$ are the top-left coordinates, width and height of the Anchors; and $p_x$, $p_y$, $p_w$ and $p_h$ are the regression values predicted by the whole target detection network in one round of frame-regression training.
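A minimal sketch of this decoding, under the assumption that equations (3) and (4) take the exponential form given above and that anchor coordinates are normalized:

```python
# Sketch of the anchor decoding of equations (1)-(4).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(anchor, p):
    """anchor: (x, y, w, h) top-left corner plus size; p: (px, py, pw, ph)."""
    x, y, w, h = anchor
    px, py, pw, ph = p
    xc = x + sigmoid(px) * w      # eq. (1): regressed center x
    yc = y + sigmoid(py) * h      # eq. (2): regressed center y
    wr = w * np.exp(pw)           # eq. (3)
    hr = h * np.exp(ph)           # eq. (4)
    return xc, yc, wr, hr

print(decode((0.2, 0.3, 0.1, 0.2), (0.0, 0.5, -0.1, 0.3)))
```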
When the target detection model is used for prediction, for each anchor point the products of its $c$ classification predictions and its confidence are taken as the confidences of the $c$ classes. A threshold is selected to ensure that the anchor point correctly predicts a target; its value range is 0 to 1, with 0.7 preferred. For each anchor point, when the confidence of one or more classes is greater than or equal to the threshold, the anchor point is emitted as a valid output, and non-maximum suppression is applied to obtain the final prediction boxes.
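A hedged sketch of this post-processing using torchvision's NMS; only the 0.7 confidence threshold comes from the text, while the 0.5 suppression IoU and the class-agnostic NMS are assumptions.

```python
# Sketch: class confidences = class probabilities x objectness, then NMS.
import torch
from torchvision.ops import nms

def filter_predictions(boxes, class_probs, objectness, conf_thresh=0.7):
    """boxes: (N, 4) xyxy; class_probs: (N, c); objectness: (N,)."""
    conf, labels = (class_probs * objectness[:, None]).max(dim=1)
    keep = conf >= conf_thresh                  # anchors that predict a target
    boxes, conf, labels = boxes[keep], conf[keep], labels[keep]
    kept = nms(boxes, conf, iou_threshold=0.5)  # non-maximum suppression
    return boxes[kept], conf[kept], labels[kept]
```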
S105, the summed loss function used in training the target detection model is configured.
Specifically, in this embodiment, for an Anchor responsible for detecting a target the confidence label $C$ is 1; Anchors that are not responsible for detecting a target but whose prediction box has an IoU greater than 0.5 with a Ground Truth box are ignored; every other Anchor has confidence label $C = 0$.
This embodiment uses the cross-entropy function as the loss function for confidence prediction, with the following formulas:

$L_{obj} = -\sum_{i=1}^{n}\sum_{j} \mathbb{1}_{ij}^{obj}\left[\hat{C}_{ij}\ln\sigma(C_{ij}) + (1-\hat{C}_{ij})\ln\left(1-\sigma(C_{ij})\right)\right]$ (5)

$L_{noobj} = -\sum_{i=1}^{n}\sum_{j} \left(1-\mathbb{1}_{ij}^{obj}\right) m_{ij}\left[\hat{C}_{ij}\ln\sigma(C_{ij}) + (1-\hat{C}_{ij})\ln\left(1-\sigma(C_{ij})\right)\right]$ (6)

where $C_{ij}$ is the predicted confidence, $\hat{C}_{ij}$ is the true confidence value, the network has $n$ output scales, and $\sigma$ is the sigmoid function; $\mathbb{1}_{ij}^{obj}$ is 1 when the Anchor is responsible for prediction and 0 when it is not, and the mask $m_{ij}$ is 0 when the Anchor is ignored and 1 otherwise.
The cross-entropy function is likewise used as the loss function of the classification prediction network:

$L_{class} = -\sum_{i=1}^{n}\sum_{j} \mathbb{1}_{ij}^{obj}\left[\hat{p}_{ij}\ln\sigma(p_{ij}) + (1-\hat{p}_{ij})\ln\left(1-\sigma(p_{ij})\right)\right]$ (7)

where $p_{ij}$ is the predicted classification value, $\hat{p}_{ij}$ is the true classification value, the network has $n$ output scales, $\sigma$ is the sigmoid function, and $\mathbb{1}_{ij}^{obj}$ is 1 when the Anchor is responsible for prediction and 0 otherwise.
For frame regression, the invention uses the mean-square-error loss function:

$L_{xy} = \sum_{i=1}^{n}\sum_{j} \mathbb{1}_{ij}^{obj}\left[(x_{ij}-\hat{x}_{ij})^2 + (y_{ij}-\hat{y}_{ij})^2\right]$ (8)

$L_{wh} = \sum_{i=1}^{n}\sum_{j} \mathbb{1}_{ij}^{obj}\left[(w_{ij}-\hat{w}_{ij})^2 + (h_{ij}-\hat{h}_{ij})^2\right]$ (9)

where $x_{ij}$, $y_{ij}$, $w_{ij}$, $h_{ij}$ are the predicted box center coordinates, length and width, and $\hat{x}_{ij}$, $\hat{y}_{ij}$, $\hat{w}_{ij}$, $\hat{h}_{ij}$ are the true box center coordinates, width and height.
The summed loss function is given by:

$\mathrm{LOSS} = \alpha_{obj}L_{obj} + \alpha_{noobj}L_{noobj} + \alpha_{class}L_{class} + \alpha_{wh}L_{wh} + \alpha_{xy}L_{xy}$ (10)

where $\alpha_{obj}$, $\alpha_{noobj}$, $\alpha_{class}$, $\alpha_{wh}$ and $\alpha_{xy}$ are the weights of the respective loss functions of equations (5) to (9).
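A minimal sketch of equation (10), together with the weight settings used in S106 and S107 below to isolate the two tasks; the loss values here are placeholders.

```python
# Sketch: weighted sum of the five loss terms of equations (5)-(9).
def total_loss(L, alpha):
    """L, alpha: dicts keyed by 'obj', 'noobj', 'class', 'wh', 'xy'."""
    return sum(alpha[k] * L[k] for k in ("obj", "noobj", "class", "wh", "xy"))

losses = {"obj": 0.8, "noobj": 2.1, "class": 1.3, "wh": 0.4, "xy": 0.5}
# Frame-regression round (S106): alpha_class = 0, alpha_noobj = 0.01, rest 1.
regression_w = {"obj": 1, "noobj": 0.01, "class": 0, "wh": 1, "xy": 1}
# Classification round (S107): alpha_class = 1, all other weights 0.
classification_w = {"obj": 0, "noobj": 0, "class": 1, "wh": 0, "xy": 0}
print(total_loss(losses, regression_w), total_loss(losses, classification_w))
```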
S106, on the premise that the decoding algorithm and the summed loss function are determined, frame-regression task training is performed on the target detection model with the first training set, obtaining a target detection model that has undergone one round of independently performed frame-regression task training.
Specifically, in one independently implemented frame-regression task training, $\alpha_{class}$ in the summed loss function is set to zero, which is equivalent to training only the frame-regression output capability of the target detection model.

Specifically, most images of the training set are used as the train set and the rest as the validation set. The train set is trained at a first learning rate, exemplarily 0.001, with the validation set used for verification; $\alpha_{class}$ is set to 0, $\alpha_{noobj}$ to 0.01 and the remaining weights to 1, and this frame-regression task training is stopped when the loss on the validation set no longer decreases. In other embodiments, $\alpha_{class}$ may be set to a weight much smaller than the other loss coefficients so that training concentrates on the frame-regression task, with $\alpha_{noobj}$ likewise given a smaller weight.
S107, on the premise that the decoding algorithm and the summed loss function are determined, classification task training is performed on the target detection model with the second training set, obtaining a target detection model that has undergone one round of independently performed classification task training.
Specifically, in one independently implemented classification task training, $\alpha_{wh}$ and $\alpha_{xy}$ in the summed loss function are set to zero, which is equivalent to training only the classification output capability of the target detection model.

Specifically, the second training set is used for this training: most of it serves as the train set and the rest as the validation set. The train set is trained at a second learning rate, exemplarily 0.001, with the validation set used for verification. In particular, this embodiment sets $\alpha_{class}$ to 1 and all other weights to 0. The classification task training is stopped when the loss on the validation set no longer decreases.
S108, S106 and S107 are repeated cyclically in sequence, with the first and second learning rates gradually reduced over the cycles, until the loss of the frame-regression task training at the first learning rate no longer decreases compared with the previous frame-regression training while, at the same time, the loss of the classification task training at the second learning rate no longer decreases compared with the previous classification training.

Specifically, the first learning rate used in each execution of S106 is lower than that used in the previous execution of S106; for example, if the previous first learning rate was 0.001, this time it may be 0.0005. Likewise, the second learning rate used in each execution of S107 is lower than that used in the previous execution of S107; for example, if the previous second learning rate was 0.001, this time it may be 0.0005. The first and second learning rates may differ within a cycle. Also, since S106 and S107 are performed repeatedly, at the start of the first cycle S107 may be performed before S106.
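A hedged sketch of this alternating schedule; the halving of the learning rates, the stopping test and the train_epoch/val_loss helpers are assumptions rather than the patent's exact procedure.

```python
# Sketch of the S106-S108 alternation with decaying learning rates.
def train_task(model, data, weights, lr, train_epoch, val_loss):
    """Train one task until its validation loss stops decreasing."""
    best = float("inf")
    while True:
        train_epoch(model, data, weights, lr)   # hypothetical helper
        loss = val_loss(model, data, weights)   # hypothetical helper
        if loss >= best:
            return best
        best = loss

def cyclic_training(model, set1, set2, reg_w, cls_w, train_epoch, val_loss,
                    lr1=1e-3, lr2=1e-3):
    prev_reg = prev_cls = float("inf")
    while True:
        reg = train_task(model, set1, reg_w, lr1, train_epoch, val_loss)  # S106
        cls = train_task(model, set2, cls_w, lr2, train_epoch, val_loss)  # S107
        if reg >= prev_reg and cls >= prev_cls:
            break                        # S108: neither task still improving
        prev_reg, prev_cls = reg, cls
        lr1, lr2 = lr1 / 2, lr2 / 2      # e.g. 0.001 -> 0.0005
```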
S109, the model is fine-tuned with a lower $\alpha_{noobj}$ weight.

Specifically, a learning rate lower than the first and second learning rates used in the last cycle is set, and the first training set is used to fine-tune the target detection model obtained in S108 as a whole, until the total loss of the model on the verification set no longer decreases. Exemplarily, in the summed loss function of this training $\alpha_{noobj}$ is set to 0.01 and the remaining weights to 1, so as to reduce the overall weight of $L_{noobj}$ in the fine-tuning training.
S110, the model is tested.

The trained target detection model obtained in S108 or S109 is used to predict images, with the original, non-enhanced small-scale data set as the test set. The performance of the model is evaluated according to the accuracy of the prediction results.
Specific Embodiment 1
In a specific embodiment, substation equipment defect images are data-enhanced and noise-processed and then used as the input of the target detection model of the invention. FIGS. 4 and 5 show noise-processed images of the second training set: FIG. 4 shows a processing result with maximum noise intensity 127 whose region of interest contains the target object cattle, and FIG. 5 shows a processing result with maximum noise intensity 255 whose region of interest contains a respirator. There is no clear boundary between the area containing the target object and the background, and more background information is retained closer to the region of interest, which is why the invention constitutes a soft region-of-interest mechanism. In contrast, in a picture containing a candidate region under a hard region-of-interest mechanism, the image outside the candidate region is completely black, i.e. the background information of every region outside the candidate region is zero. A pre-trained MobileNet V2 network with input resolution 320 × 224 is used as the depth feature extraction network of the target detection model; two backbone output feature maps, of sizes 7 × 10 and 14 × 20, are selected as the two outputs of the depth feature extraction network. The normalized Anchor sizes generated from the data set are (0.73 × 0.79), (0.54 × 0.42), (0.33 × 0.71), (0.24 × 0.25), (0.16 × 0.46) and (0.07 × 0.16). After training with the algorithm of the invention, partial detection results on the substation equipment defect images are shown in FIG. 6, where (a) is a discoloration failure of a respirator, (b) a normal respirator, (c) a breakage failure of an insulator, and (d) a bird-nest foreign object.
Specific Embodiment 2
A part of the VOC2007 data set is selected as the small-scale data set for the target detection model, labeled, data-enhanced and noise-processed, and then used as the input of the target detection model. A pre-trained 320 × 224 MobileNet V2 network performs feature extraction, with two outputs of sizes 7 × 10 and 14 × 20. The normalized Anchor sizes generated from the data set are (0.50 × 0.72), (0.46 × 0.33), (0.30 × 0.36), (0.20 × 0.56), (0.17 × 0.27) and (0.10 × 0.11). After training with the algorithm of the invention, partial image detection results on the VOC2007 data set are shown in FIG. 7, where (a) is a bus and (b) a cow.

Claims (10)

1. An improved target detection method based on region-of-interest training on a small-scale data set, in which an image target detection result is obtained through a target detection model, characterized in that: the target detection model comprises a multi-layer-output depth feature extraction network and a multi-scale fusion detection head; and the training process of the target detection model comprises a stage in which frame-regression task training and classification task training are performed independently, in sequence and cyclically.
2. The object detection method according to claim 1, characterized in that: the frame regression task training and the classification task training are performed on the target detection model using the small-scale data set marked with regions of interest.
3. The object detection method according to claim 1, characterized in that: pre-training the deep feature extraction network using a large-scale dataset.
4. The object detection method according to claim 2, characterized in that: the frame regression task training is performed on the target detection model using a first training set obtained by applying a first data enhancement to the small-scale data set, and the classification task training is performed on the target detection model using a second training set obtained by applying a second data enhancement to the first training set; and each image of the second training set retains partial global information of the picture outside the region of interest.
5. The object detection method according to claim 4, characterized in that: the first data enhancement is used to obtain a first training set larger than the small-scale data set through one or more of flipping, translation, blurring, scaling and cropping; and the second data enhancement is used to preserve part of the background information of a background area of the image according to its distance from the region of interest, and comprises adding noise.
6. The object detection method according to claim 5, characterized in that: the noise-adding method is that, for a picture marked with several regions of interest, the amplitude $n_{x,y}$ of the noise added to its pixel $p_{x,y}$ is $\min(b, a\times d)$, where $d$ is the shortest distance from pixel $p_{x,y}$ to all regions of interest, $a$ is a noise intensity parameter, and $b$ is the maximum noise intensity.
7. The object detection method according to claim 2, characterized in that: in the multi-scale fusion detection head, the feature pyramid network structure is used to up-sample, fuse and convolve, layer by layer, the feature maps of different sizes output by the depth feature extraction network, obtaining target detection outputs at $n$ scales, equal in number to the $n$ detection heads.
8. The object detection method according to claim 2, characterized in that: each detection head of the multi-scale fusion detection head comprises a classification output layer used for classification task training and a regression output layer used for frame regression task training.
9. The object detection method according to claim 1, characterized in that: the learning rate of each frame regression task training is lower than that of the last frame regression task training, and meanwhile, the learning rate of each classification task training is lower than that of the last classification task training.
10. The object detection method according to claim 4, characterized in that: after the stage is finished, the target detection model is fine-tuned using the first training set.
CN202010383794.XA 2020-05-08 2020-05-08 Improved target detection method based on region of interest training on small-scale data set Active CN111783819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383794.XA CN111783819B (en) 2020-05-08 2020-05-08 Improved target detection method based on region of interest training on small-scale data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010383794.XA CN111783819B (en) 2020-05-08 2020-05-08 Improved target detection method based on region of interest training on small-scale data set

Publications (2)

Publication Number Publication Date
CN111783819A true CN111783819A (en) 2020-10-16
CN111783819B CN111783819B (en) 2024-02-09

Family

ID=72753473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383794.XA Active CN111783819B (en) 2020-05-08 2020-05-08 Improved target detection method based on region of interest training on small-scale data set

Country Status (1)

Country Link
CN (1) CN111783819B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN109615016A (en) * 2018-12-20 2019-04-12 北京理工大学 A kind of object detection method of the convolutional neural networks based on pyramid input gain
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN111046923A (en) * 2019-11-26 2020-04-21 佛山科学技术学院 Image target detection method and device based on bounding box and storage medium
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUE WU et al.: "Rethinking Classification and Localization for Object Detection", arXiv, pages 1-13 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990348A (en) * 2021-04-12 2021-06-18 华南理工大学 Small target detection method for self-adjustment feature fusion
CN112990348B (en) * 2021-04-12 2023-08-22 华南理工大学 Small target detection method based on self-adjusting feature fusion
CN113536896A (en) * 2021-05-28 2021-10-22 国网河北省电力有限公司石家庄供电分公司 Small target detection method, device and storage medium based on improved fast RCNN
CN113673510A (en) * 2021-07-29 2021-11-19 复旦大学 Target detection algorithm combining feature point and anchor frame joint prediction and regression
CN113673510B (en) * 2021-07-29 2024-04-26 复旦大学 Target detection method combining feature point and anchor frame joint prediction and regression
CN113808084A (en) * 2021-08-25 2021-12-17 杭州安脉盛智能技术有限公司 Model-fused online tobacco bale surface mildew detection method and system
CN114299366A (en) * 2022-03-10 2022-04-08 青岛海尔工业智能研究院有限公司 Image detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111783819B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN111126472B (en) SSD (solid State disk) -based improved target detection method
CN111783819A (en) Improved target detection method based on region-of-interest training on small-scale data set
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN111723748A (en) Infrared remote sensing image ship detection method
CN111652317B (en) Super-parameter image segmentation method based on Bayes deep learning
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN110245620B (en) Non-maximization inhibition method based on attention
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN111680705B (en) MB-SSD method and MB-SSD feature extraction network suitable for target detection
CN114841972A (en) Power transmission line defect identification method based on saliency map and semantic embedded feature pyramid
CN111553414A (en) In-vehicle lost object detection method based on improved Faster R-CNN
CN112528961A (en) Video analysis method based on Jetson Nano
CN112733942A (en) Variable-scale target detection method based on multi-stage feature adaptive fusion
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN114782410A (en) Insulator defect detection method and system based on lightweight model
CN110991374B (en) Fingerprint singular point detection method based on RCNN
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN115546187A (en) Agricultural pest and disease detection method and device based on YOLO v5
CN112347967A (en) Pedestrian detection method fusing motion information in complex scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant