CN110738208A

CN110738208A - Efficient scale-normalized target detection training method

Info

Publication number: CN110738208A
Application number: CN201910949649.0A
Authority: CN
Inventors: 张发恩; 赵江华; 秦永强
Original assignee: Innovation Qizhi (chongqing) Technology Co Ltd
Current assignee: Innovation Qizhi (chongqing) Technology Co Ltd
Priority date: 2019-10-08
Filing date: 2019-10-08
Publication date: 2020-01-31

Abstract

The invention relates to an efficient scale-normalized target detection training method, which comprises: selecting, according to an analysis of the problem, the effective scale range of the model output and the number of image pyramid layers; constructing an image pyramid with the set number of layers; cutting fixed-size images out of it at fixed intervals; selecting from the cut images the fewest images that can cover all annotated targets within the effective scale range as positive samples; meanwhile, coarsely training a detector and running it on the original image, and selecting the fewest small images that cover all false positive targets at the effective scale as negative samples; putting the positive and negative samples into the training set for training and then prediction; and, in the prediction stage, fusing the prediction results of multiple scales with NMS.

Description

Efficient scale-normalized target detection training method

Technical Field

The application belongs to the technical field of image recognition, and particularly relates to an efficient scale-normalized target detection training method.

Background

Target detection and recognition technology is developing rapidly and market demand is growing steadily. The main application scenarios are as follows:

① security: fingerprint recognition, face recognition, etc.;

② military: terrain survey, flying object identification, etc.;

③ transportation: license plate recognition, autonomous driving, traffic sign recognition, etc.;

④ medical care: electrocardiograms, B-mode ultrasound, health management, nutrition, etc.;

⑤ daily life: smart homes, shopping, smart skin testing, etc.

At present, most of the target detection and recognition algorithms that perform best in practice are based on convolutional neural networks. These methods all have the model learn classification and regression tasks for targets within Anchors of various preset scales and aspect ratios. When the target scale distribution varies widely, however, more Anchors are required, which adds parameters and computation to the model. Moreover, because the feature description vectors that a convolutional neural network produces for an image change noticeably when the image scale changes, it is difficult for the model to learn recognition that is stable across multiple scales. An efficient training method with target scale normalization is therefore required.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an efficient scale-normalized target detection training method that can effectively reduce the amount of computation, speed up training, and make the model more stable in multi-scale target recognition.

In order to solve the above problem, the technical scheme adopted by the invention is as follows:

An efficient scale-normalized target detection training method comprises: constructing an image pyramid for an input image; generating fixed-size input images in combination with the annotated targets, wherein only annotated targets whose scale lies within a given range in the input image count as valid targets, and selecting positive samples; performing negative sample mining on the small images using the training set, namely performing preliminary training with any detection model, finding the false positive targets, and selecting the small images containing false positive targets as negative samples; putting the positive and negative samples into the training set for model training; and fusing the prediction results of multiple scales in the prediction stage.

A further improvement of the technical scheme of the invention is that the method comprises the following steps:

a. selecting positive samples: determining an original image; selecting, according to an analysis of the problem, the effective scale range of the model output for the original image and the number of image pyramid layers; constructing an image pyramid with the set number of layers; then cutting each pyramid image into fixed-size images at fixed intervals; and selecting from the cut images the fewest images that can cover all annotated targets within the effective scale range as positive samples;

b. selecting negative samples: selecting a suitable detection model according to the determined training set and training it coarsely on the training set, without requiring accurate results; then testing on the original images with the trained detection model to find the false positive targets in them; then cutting the original image pyramid in the same way as for positive sample selection, and selecting the small images containing false positive targets from the cut small images as negative samples, skipping any small image that has already been added to the positive samples;

c. training and label mapping: selecting an end-to-end detection model and training it on the positive and negative samples, wherein during training the label mapping operation is performed on the targets falling within a small image, without screening them according to the set scale range;

d. prediction and result fusion: in the prediction stage, the prediction model receives the original image pyramid as input, and different pyramid levels produce different prediction results; from these, the results within the effective scale range are selected for output; and finally the results of the input image at multiple scales are fused using non-maximum suppression (NMS).
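The cutting in step a can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the chip size of 512 and the interval of 256 are example values that this document does not specify, and `tile_chips` is a hypothetical helper name.

```python
def _starts(length, chip, stride):
    """Top-left coordinates along one axis, spaced at a fixed interval."""
    last = max(length - chip, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        # Add one extra position so the image border is also covered.
        starts.append(last)
    return starts


def tile_chips(width, height, chip=512, stride=256):
    """Cut one pyramid level into fixed-size chips at fixed intervals.

    chip and stride are assumed example values, not taken from the patent.
    Returns (x1, y1, x2, y2) rectangles clipped to the image.
    """
    return [
        (x, y, min(x + chip, width), min(y + chip, height))
        for y in _starts(height, chip, stride)
        for x in _starts(width, chip, stride)
    ]


chips = tile_chips(1024, 768)
print(len(chips), chips[0], chips[-1])
```

Applying the same function to every level of the image pyramid yields the candidate small images from which positive and negative samples are later chosen.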

A further improvement of the technical scheme of the invention is that, in step a, a greedy algorithm is used to determine as positive samples the small images that contain all annotated targets counting as valid targets.
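One way to read the greedy selection is as a greedy set cover: repeatedly pick the small image that contains the most valid targets not yet covered. The sketch below follows that reading. It assumes that a target counts as covered when its box lies entirely inside the small image, which this document does not state, and the function names are hypothetical.

```python
def inside(box, chip):
    """True if box (x1, y1, x2, y2) lies entirely within chip."""
    return (chip[0] <= box[0] and chip[1] <= box[1]
            and box[2] <= chip[2] and box[3] <= chip[3])


def greedy_positive_chips(chips, valid_boxes):
    """Greedy set cover: choose few chips that cover all coverable valid targets."""
    covers = [{i for i, b in enumerate(valid_boxes) if inside(b, c)}
              for c in chips]
    # Only targets that at least one chip can contain are considered.
    uncovered = set().union(*covers)
    chosen = []
    while uncovered:
        best = max(range(len(chips)), key=lambda k: len(covers[k] & uncovered))
        chosen.append(chips[best])
        uncovered -= covers[best]
    return chosen


chips = [(0, 0, 512, 512), (256, 0, 768, 512), (512, 0, 1024, 512)]
boxes = [(300, 100, 400, 200), (450, 50, 500, 120), (600, 300, 700, 400)]
print(greedy_positive_chips(chips, boxes))
```

In this example the middle chip contains all three boxes, so it alone is selected. Greedy set cover does not guarantee the true minimum number of chips, but it is a common approximation.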

A further improvement of the technical scheme of the invention is that, in step c, the end-to-end detection model is fast-Rcnn, SSD, Mask Rcnn or Tprobe.

A further improvement of the technical scheme of the invention is that, in step d, NMS fusion or Soft-NMS fusion is used in the fusion stage.
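For readers unfamiliar with the difference, the sketch below shows the linear variant of Soft-NMS: instead of deleting a box that overlaps a higher-scoring box, its score is decayed by (1 - IoU). The thresholds of 0.5 and 0.1 are assumed example values, and the document does not say which Soft-NMS variant is intended.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(dets, iou_thr=0.5, score_thr=0.1):
    """Linear Soft-NMS over detections given as (x1, y1, x2, y2, score)."""
    pending = [list(d) for d in dets]
    kept = []
    while pending:
        pending.sort(key=lambda d: d[4], reverse=True)
        top = pending.pop(0)
        kept.append(tuple(top))
        for d in pending:
            overlap = iou(top, d)
            if overlap > iou_thr:
                d[4] *= 1.0 - overlap  # decay instead of hard removal
        # Drop boxes whose decayed score fell below the threshold.
        pending = [d for d in pending if d[4] >= score_thr]
    return kept


print(soft_nms([(0, 0, 10, 10, 0.9), (0, 0, 10, 8, 0.8), (20, 20, 30, 30, 0.7)]))
```

Plain NMS corresponds to removing every box whose overlap exceeds `iou_thr` rather than decaying its score.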

By adopting this technical scheme, a larger batch size can be set during model training. With a larger batch size, the model can update its batch normalization parameters more accurately from the training data on a single node, which improves model performance, and more stable detection results can be obtained when the target scale distribution in the data varies widely.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic illustration of a positive sample selection process according to the present invention;

FIG. 3 is a second schematic diagram illustrating a positive sample selection process according to the present invention;

FIG. 4 is a schematic diagram of a negative sample selection process of the present invention;

FIG. 5 is a schematic diagram of the training and inference process of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the following examples.

The invention discloses an efficient scale-normalized target detection training method, which comprises: constructing an image pyramid for an input image; generating fixed-size input images in combination with the annotated targets, wherein only annotated targets whose scale lies within a given range in the input image count as valid targets; selecting positive samples, namely determining in a greedy manner the small images that contain all annotated targets counting as valid targets as positive samples; performing negative sample mining on the small images using the training set, namely performing preliminary training with any detection model, finding the false positive targets present in the small images, and selecting the small images containing false positive targets as negative samples; finally putting the positive and negative samples into the training set for model training; and, in the prediction stage, fusing the prediction results of multiple scales with NMS (non-maximum suppression).

The method comprises the following steps:

c. training and label mapping: selecting an end-to-end detection model and training it on the positive and negative samples, wherein during training the RPN label mapping performs the label mapping operation on the targets falling within a small image without screening them according to the set scale range; at the least, the RPN results corresponding to targets falling outside the range are ignored;
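One possible reading of this label mapping is sketched below: anchors that match a valid target become positives, anchors that overlap a target outside the valid scale range are ignored rather than treated as background, and the rest are negatives. The IoU thresholds of 0.7 and 0.3 and the use of the square root of the box area as the scale are assumptions borrowed from common RPN practice, not values given in this document.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def box_scale(box):
    """Assumed scale measure: square root of the box area."""
    return ((box[2] - box[0]) * (box[3] - box[1])) ** 0.5


def map_anchor_labels(anchors, gt_boxes, valid_range, pos_iou=0.7, neg_iou=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    lo, hi = valid_range
    valid = [g for g in gt_boxes if lo <= box_scale(g) <= hi]
    invalid = [g for g in gt_boxes if not (lo <= box_scale(g) <= hi)]
    labels = []
    for a in anchors:
        best_valid = max((iou(a, g) for g in valid), default=0.0)
        best_invalid = max((iou(a, g) for g in invalid), default=0.0)
        if best_valid >= pos_iou:
            labels.append(1)
        elif best_invalid >= neg_iou or best_valid >= neg_iou:
            labels.append(-1)  # overlaps an out-of-range or ambiguous target
        else:
            labels.append(0)
    return labels


anchors = [(0, 0, 100, 100), (200, 200, 220, 220), (400, 400, 500, 500)]
gts = [(0, 0, 100, 100), (200, 200, 220, 220)]
print(map_anchor_labels(anchors, gts, valid_range=(64, 256)))
```

Ignoring such anchors keeps out-of-range targets from contributing either a positive or a negative training signal at that scale.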

In step a, a greedy algorithm is used to determine as positive samples the small images that contain all annotated targets counting as valid targets.

In step c, the end-to-end detection model is fast-Rcnn, SSD, Mask Rcnn or Tprobe.

In step d, the fusion stage uses NMS fusion or Soft-NMS fusion.

An image pyramid is a series of images arranged in the shape of a pyramid, with progressively decreasing resolution, all derived from the same original image. It is obtained by repeated downsampling, which stops once a certain termination condition is reached. The stack of image layers is likened to a pyramid: the higher the level, the smaller the image and the lower the resolution.
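The level sizes of such a pyramid can be computed with a few lines of Python. This is a minimal sketch: the downsampling factor of 2 and the termination condition (stop when the shorter side would fall below 32 pixels, or when a set number of layers is reached) are assumed for illustration and are not specified in this document.

```python
def pyramid_sizes(width, height, factor=2, min_side=32, max_levels=None):
    """Return the (width, height) of each pyramid level, largest first."""
    sizes = [(width, height)]
    while max_levels is None or len(sizes) < max_levels:
        w, h = sizes[-1][0] // factor, sizes[-1][1] // factor
        if min(w, h) < min_side:
            break  # assumed termination condition
        sizes.append((w, h))
    return sizes


print(pyramid_sizes(1024, 768))
print(pyramid_sizes(1024, 768, max_levels=3))
```

A method that also needs upsampled levels for small targets could extend the list with scale factors greater than 1; the sketch covers only the downsampling described above.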

FIGS. 2 and 3 are examples of positive sample selection. Positive samples adaptively sample context regions (also called chips) based on the presence of objects inside the image. FIG. 2 shows the ground-truth boxes (thin lines) and the chips at the original image scale (thick rectangles). FIG. 3 shows the down-sampling or up-sampling performed in consideration of object size: the covered objects are displayed as rectangles, and the objects that are invalid at the corresponding scale are displayed as rectangles of another color.

FIG. 4 is an example of negative sample selection. The first row shows the images and the ground-truth boxes. The bottom row shows the negative proposals not contained in the positive chips (represented, for clarity, by circles at the center of each proposal) and the negative chips generated from those proposals (represented by rectangles).
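The negative sample selection can be sketched in the same greedy style. The sketch assumes that a false positive belongs to a chip when its center lies inside the chip (suggested by the center circles in the figure, but not stated explicitly), and the function names are hypothetical.

```python
def center_inside(box, chip):
    """True if the center of box (x1, y1, x2, y2) lies within chip."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    return chip[0] <= cx <= chip[2] and chip[1] <= cy <= chip[3]


def mine_negative_chips(chips, positive_chips, false_positives):
    """Pick chips that hold false positives not already inside a positive chip."""
    remaining = [fp for fp in false_positives
                 if not any(center_inside(fp, pc) for pc in positive_chips)]
    # Chips already used as positive samples are skipped.
    candidates = [c for c in chips if c not in positive_chips]
    negatives = []
    while remaining:
        best = max(candidates,
                   key=lambda c: sum(center_inside(fp, c) for fp in remaining),
                   default=None)
        if best is None or not any(center_inside(fp, best) for fp in remaining):
            break  # no candidate chip contains the remaining false positives
        negatives.append(best)
        remaining = [fp for fp in remaining if not center_inside(fp, best)]
    return negatives


chips = [(0, 0, 512, 512), (512, 0, 1024, 512), (0, 512, 512, 1024)]
positives = [(0, 0, 512, 512)]
fps = [(100, 100, 150, 150), (600, 100, 700, 200), (900, 300, 950, 380)]
print(mine_negative_chips(chips, positives, fps))
```

Here the first false positive already falls inside a positive chip, so only the chip holding the other two is returned as a negative sample.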

FIG. 5 shows the training and inference process. Invalid RoIs (regions of interest) that fall outside the specified range at each scale are discarded during training and inference, and each batch of training data consists of images sampled from one particular scale. Invalid GT (ground truth) boxes are used to invalidate anchors in the RPN (Region Proposal Network, which generates the target regions). After the RCN (region classification network), the detections at each scale are rescaled and then combined and fused by non-maximum suppression (NMS). An invalid GT is a ground-truth label whose scale is not within the valid scale range; it does not participate directly in training.
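The inference side of this process can be sketched as follows. Detections from each pyramid level are filtered by the valid scale range, mapped back to original-image coordinates, and merged with plain NMS. The representation of a level by a single scale factor, the use of the square root of the box area as the scale, and the threshold of 0.5 are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def fuse_scales(dets_per_scale, valid_range, iou_thr=0.5):
    """Fuse per-level detections {scale_factor: [(x1, y1, x2, y2, score)]}."""
    lo, hi = valid_range
    merged = []
    for s, dets in dets_per_scale.items():
        for x1, y1, x2, y2, score in dets:
            size = ((x2 - x1) * (y2 - y1)) ** 0.5  # measured at this level
            if lo <= size <= hi:
                # Rescale to the coordinates of the original image.
                merged.append((x1 / s, y1 / s, x2 / s, y2 / s, score))
    merged.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for d in merged:
        if all(iou(d, k) <= iou_thr for k in kept):
            kept.append(d)
    return kept


dets = {
    0.5: [(50, 50, 100, 100, 0.9)],
    1.0: [(102, 98, 204, 200, 0.8), (10, 10, 20, 20, 0.95)],
    2.0: [(800, 800, 1000, 1000, 0.6)],
}
print(fuse_scales(dets, valid_range=(32, 256)))
```

In the example, the small box at scale 1.0 is discarded as out of range, and the two detections of the same object found at scales 0.5 and 1.0 collapse into the higher-scoring one.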

The RPN is a small network applied to each sliding-window region: a convolution layer (256 dimensions) + ReLU + two layers (a cls layer and a reg layer). All sliding windows share the RPN.

Claims

1. An efficient scale-normalized target detection training method, characterized in that: an image pyramid is constructed for an input image; fixed-size input images are generated in combination with the annotated targets, wherein only annotated targets whose scale lies within a given range in the input image count as valid targets; positive samples are selected; meanwhile, negative sample mining is performed on the small images using the training set, the false positive targets are found, and the small images containing false positive targets are selected as negative samples; finally, the positive and negative samples are put into the training set for model training; and the prediction results of multiple scales are fused in the prediction stage.
2. The efficient scale-normalized target detection training method of claim 1, wherein: the method comprises the following steps:

a. selecting positive samples: determining an original image; selecting, according to an analysis of the problem, the effective scale range of the model output for the original image and the number of image pyramid layers; constructing an image pyramid with the set number of layers; then cutting each pyramid image into fixed-size images at fixed intervals; and selecting from the cut images the fewest images that can cover all annotated targets within the effective scale range as positive samples;

b. selecting negative samples: selecting a suitable detection model according to the determined training set and training it coarsely on the training set; testing on the original images with the trained detection model to find the false positive targets in them; cutting the original image pyramid in the same way as for positive sample selection, and selecting the small images containing false positive targets from the cut small images as negative samples, skipping any small image that has already been added to the positive samples;

c. training and label mapping: selecting an end-to-end detection model and training it on the positive and negative samples, wherein during training the label mapping operation is performed on the targets falling within a small image, without screening them according to the set scale range;

d. prediction and result fusion: in the prediction stage, the prediction model receives the original image pyramid as input, and different pyramid levels produce different prediction results; from these, the results within the effective scale range are selected for output; and finally the results of the input image at multiple scales are fused using non-maximum suppression (NMS).
3. The efficient scale-normalized target detection training method of claim 2, wherein: in step a, a greedy algorithm is used to determine as positive samples the small images that contain all annotated targets counting as valid targets.
4. The efficient scale-normalized target detection training method of claim 2, wherein: in step c, the end-to-end detection model is fast-Rcnn, SSD, Mask Rcnn or Tprobe.
5. The efficient scale-normalized target detection training method of claim 2, wherein: in step d, the fusion stage uses NMS fusion or Soft-NMS fusion.