CN110738208A - efficient scale-normalized target detection training method - Google Patents
- Publication number: CN110738208A (application CN201910949649.0A)
- Authority: CN (China)
- Prior art keywords
- training
- image
- selecting
- target
- positive sample
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/25: G (Physics) > G06 (Computing; calculating or counting) > G06V (Image or video recognition or understanding) > G06V10/00 (Arrangements for image or video recognition or understanding) > G06V10/20 (Image preprocessing) > Determination of region of interest [ROI] or a volume of interest [VOI]
- G06F18/241: G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F18/00 (Pattern recognition) > G06F18/20 (Analysing) > G06F18/24 (Classification techniques) > Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045: G (Physics) > G06 (Computing; calculating or counting) > G06N (Computing arrangements based on specific computational models) > G06N3/00 (Computing arrangements based on biological models) > G06N3/02 (Neural networks) > G06N3/04 (Architecture, e.g. interconnection topology) > Combinations of networks
- G06N3/08: G (Physics) > G06 (Computing; calculating or counting) > G06N (Computing arrangements based on specific computational models) > G06N3/00 (Computing arrangements based on biological models) > G06N3/02 (Neural networks) > Learning methods
- G06V2201/07: G (Physics) > G06 (Computing; calculating or counting) > G06V (Image or video recognition or understanding) > G06V2201/00 (Indexing scheme relating to image or video recognition or understanding) > Target detection
Abstract
The invention relates to an efficient scale-normalized target detection training method. Based on an analysis of the problem, the effective scale range of the model output and the number of image pyramid levels are selected; an image pyramid with that number of levels is constructed, and fixed-size images are cropped from it at a fixed stride. From the cropped images, the smallest set that covers all annotated targets within the effective scale range is selected as positive samples. At the same time, a detector is coarsely trained and run on the original images, and the fewest small images that cover all false-positive targets are selected as negative samples. The positive and negative samples are put into the training set for training and then used for prediction, and in the prediction stage the prediction results from multiple scales are fused with non-maximum suppression (NMS).
Description
Technical Field
The application belongs to the technical field of image recognition, and in particular relates to an efficient scale-normalized target detection training method.
Background
Target detection and recognition technology is developing rapidly and market demand is growing steadily. Its main application scenarios include:
① security, such as fingerprint recognition and face recognition;
② military, such as terrain survey and aircraft identification;
③ traffic, such as license plate recognition, autonomous driving and traffic sign recognition;
④ medical, such as electrocardiograms, B-mode ultrasound, health management and nutrition;
⑤ daily life, such as smart homes, shopping and intelligent skin testing.
At present, most of the target detection and recognition algorithms with the best practical performance are based on convolutional neural networks. These methods have the model learn target classification and regression tasks on anchors with various preset scales and aspect ratios. However, when target scales are widely distributed, more anchors are required, which adds parameters and computation to the model. Moreover, because the image feature descriptors produced by a convolutional neural network change markedly when the image scale changes, it is difficult for the model to learn recognition that is stable across many scales. An efficient training method that normalizes target scale is therefore needed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an efficient scale-normalized target detection training method that effectively reduces computation, accelerates training, and makes the model more stable when recognizing targets at multiple scales.
To solve these problems, the invention adopts the following technical solution:
An efficient scale-normalized target detection training method comprises: constructing an image pyramid for an input image; generating fixed-size input images in combination with the annotated targets, where only annotated targets whose scale lies within a given range in the input image are used to select positive samples; mining negative samples on small images using the training set, by performing preliminary training with any detection model, finding false-positive targets, and selecting the small images containing false-positive targets as negative samples; putting the positive and negative samples into the training set for model training; and fusing the prediction results from multiple scales in the prediction stage.
As a further improvement of the technical solution, the method comprises the following steps:
a. Positive sample selection: for a given original image, select, based on an analysis of the problem, the effective scale range of the model output and the number of image pyramid levels; construct an image pyramid with that number of levels; crop the pyramid into fixed-size images at a fixed stride; and from the cropped images select, as positive samples, the smallest set that covers all annotated targets within the effective scale range;
b. Negative sample selection: choose a suitable detection model for the given training set and train it coarsely on the training set (an accurate result is not required); test the trained model on the original images to find false-positive targets; crop the original image pyramid in the same way as for positive samples; and select the cropped small images that contain false-positive targets as negative samples, skipping any small image that has already been added as a positive sample;
c. Training and label mapping: train an end-to-end detection model on the positive and negative samples; during training, label mapping is applied to the targets that fall within a small image without filtering them by the set scale range;
d. Prediction and result fusion: in the prediction stage, the prediction model takes the original image pyramid as input, each pyramid level produces its own prediction results, the results within the effective scale range are selected for output, and finally the results for the input image at multiple scales are fused using non-maximum suppression (NMS).
As a further improvement, in step a, a greedy algorithm is used to determine, as positive samples, the small images that contain all annotated targets counted as valid targets.
As a further improvement, in step c, the end-to-end detection model is Fast R-CNN, SSD, Mask R-CNN or Tprobe.
As a further improvement, in step d, NMS fusion or Soft-NMS fusion is used in the fusion stage.
With this technical solution, a larger batch size can be used during model training. Because larger batches are fed in, the model can update its batch normalization parameters more accurately from the training data at each node, which improves model performance and yields more stable detection results when the target scales in the data are widely distributed.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a first schematic diagram of the positive sample selection process of the present invention;
FIG. 3 is a second schematic diagram of the positive sample selection process of the present invention;
FIG. 4 is a schematic diagram of the negative sample selection process of the present invention;
FIG. 5 is a schematic diagram of the training and inference process of the present invention.
Detailed Description
The present invention is described in further detail below with reference to examples.
The invention discloses an efficient scale-normalized target detection training method. An image pyramid is constructed on the input image and fixed-size input images are generated in combination with the annotated targets; only annotated targets whose scale lies within a given range in an input image count as valid targets. Positive samples are selected by greedily determining the small images that contain all valid annotated targets. Negative samples are mined on small images using the training set: preliminary training is performed with any detection model, the false-positive targets it produces are found, and the small images containing them are selected as negative samples. Finally, the positive and negative samples are put into the training set for model training, and the prediction results from multiple scales are fused with NMS (non-maximum suppression) in the prediction stage.
The method comprises the following steps:
a. Positive sample selection: for a given original image, select, based on an analysis of the problem, the effective scale range of the model output and the number of image pyramid levels; construct an image pyramid with that number of levels; crop the pyramid into fixed-size images at a fixed stride; and from the cropped images select, as positive samples, the smallest set that covers all annotated targets within the effective scale range;
b. Negative sample selection: choose a suitable detection model for the given training set and train it coarsely on the training set (an accurate result is not required); test the trained model on the original images to find false-positive targets; crop the original image pyramid in the same way as for positive samples; and select the cropped small images that contain false-positive targets as negative samples, skipping any small image that has already been added as a positive sample;
c. Training and label mapping: train an end-to-end detection model on the positive and negative samples. During training, RPN label mapping is applied to all targets that fall within a small image rather than only to those within the set scale range; at a minimum, the RPN results corresponding to targets outside the range are ignored;
d. Prediction and result fusion: in the prediction stage, the prediction model takes the original image pyramid as input, each pyramid level produces its own prediction results, the results within the effective scale range are selected for output, and finally the results for the input image at multiple scales are fused using non-maximum suppression (NMS).
In step a, a greedy algorithm is used to determine, as positive samples, the small images that contain all annotated targets counted as valid targets.
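The fixed-stride cropping and greedy positive chip selection of step a can be sketched as below. This is a minimal, stdlib-only Python sketch under stated assumptions, not the patent's implementation: the 512-pixel chip size, the 32-pixel stride, the `(x1, y1, x2, y2)` box format, measuring a box's scale as sqrt(w*h), and all function names are hypothetical. Boxes and chips are expressed in the pixel coordinates of one pyramid level, and the greedy rule is the standard set-cover heuristic (take the chip covering the most still-uncovered valid boxes).

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in one pyramid level's pixels


def generate_chips(width: int, height: int, chip: int = 512, stride: int = 32) -> List[Box]:
    """Slide a fixed-size window over one pyramid level at a fixed stride (assumed values)."""
    def starts(extent: int) -> List[int]:
        if extent <= chip:
            return [0]  # a single (padded) chip covers this axis
        pos = list(range(0, extent - chip + 1, stride))
        if pos[-1] != extent - chip:
            pos.append(extent - chip)  # make sure the right/bottom border is covered
        return pos
    return [(x, y, x + chip, y + chip) for y in starts(height) for x in starts(width)]


def contains(chip: Box, box: Box) -> bool:
    """True if the box lies completely inside the chip."""
    return chip[0] <= box[0] and chip[1] <= box[1] and box[2] <= chip[2] and box[3] <= chip[3]


def in_valid_range(box: Box, lo: float, hi: float) -> bool:
    """Assumed scale measure: square root of the box area, compared against [lo, hi]."""
    w, h = box[2] - box[0], box[3] - box[1]
    return lo <= (w * h) ** 0.5 <= hi


def greedy_positive_chips(chips: List[Box], boxes: List[Box], lo: float, hi: float) -> List[Box]:
    """Greedy set cover: repeatedly take the chip covering the most still-uncovered valid boxes."""
    valid = [i for i, b in enumerate(boxes) if in_valid_range(b, lo, hi)]
    covers: List[Set[int]] = [{i for i in valid if contains(c, boxes[i])} for c in chips]
    uncovered = set(valid)
    selected: List[Box] = []
    while uncovered:
        best: Optional[int] = max(range(len(chips)),
                                  key=lambda k: len(covers[k] & uncovered), default=None)
        if best is None:
            break  # no chips at all
        gain = covers[best] & uncovered
        if not gain:
            break  # the remaining valid boxes do not fit inside any chip
        selected.append(chips[best])
        uncovered -= gain
    return selected
```

For a 1000x600 level, `greedy_positive_chips(generate_chips(1000, 600), boxes, 32, 128)` returns the chips that together contain every box whose assumed scale lies in [32, 128]; out-of-range boxes are simply not required to be covered.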
In step c, the end-to-end detection model is Fast R-CNN, SSD, Mask R-CNN or Tprobe.
In step d, NMS fusion or Soft-NMS fusion is used in the fusion stage.
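Step d (keep the in-range detections of each pyramid level, map them back to original-image coordinates, then fuse them) can be sketched as below. This is an illustrative stdlib-only Python sketch, not the patent's code: the `(x1, y1, x2, y2, score)` detection format, a separate valid range per level, the sqrt-area scale measure, the 0.5 IoU threshold, the Gaussian form of Soft-NMS with `sigma=0.5`, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Det = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def collect_scale_results(per_scale: List[Tuple[float, List[Det]]],
                          valid_ranges: List[Tuple[float, float]]) -> List[Det]:
    """Keep detections whose size at their own level is in range, then map back to original pixels.

    per_scale[i] = (scale factor of level i relative to the original, detections at level i).
    """
    merged: List[Det] = []
    for (s, dets), (lo, hi) in zip(per_scale, valid_ranges):
        for x1, y1, x2, y2, score in dets:
            size = ((x2 - x1) * (y2 - y1)) ** 0.5  # assumed scale measure
            if lo <= size <= hi:
                merged.append((x1 / s, y1 / s, x2 / s, y2 / s, score))
    return merged


def nms(dets: List[Det], thr: float = 0.5) -> List[Det]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept box above thr."""
    keep: List[Det] = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= thr for k in keep):
            keep.append(d)
    return keep


def soft_nms(dets: List[Det], sigma: float = 0.5, min_score: float = 0.001) -> List[Det]:
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of removing them."""
    pool = list(dets)
    keep: List[Det] = []
    while pool:
        best = max(pool, key=lambda d: d[4])
        pool.remove(best)
        keep.append(best)
        pool = [(d[0], d[1], d[2], d[3], d[4] * math.exp(-iou(best, d) ** 2 / sigma)) for d in pool]
        pool = [d for d in pool if d[4] >= min_score]
    return keep
```

The design choice between the two fusion options is visible here: `nms` discards a lower-scoring duplicate from another scale outright, whereas `soft_nms` keeps it with a reduced score, which can help when two genuinely distinct objects overlap.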
An image pyramid is a set of images of progressively lower resolution, arranged in a pyramid shape and all derived from the same original image. It is obtained by repeatedly downsampling the original image until a termination condition is reached. If the levels are stacked like a pyramid, the higher the level, the smaller the image and the lower its resolution.
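The pyramid construction described above can be sketched in a few lines of stdlib-only Python. As assumptions, the image is a 2-D list of grayscale pixel values, each level halves the previous one (`factor=0.5`), nearest-neighbour resampling is used, and the termination condition is that the shorter side would fall below `min_side`; the function names are hypothetical. A real pipeline would use a proper interpolating resize (for example OpenCV's `cv2.resize`), and Fig. 3 indicates that upsampled levels may also be used, which this sketch does not cover.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of grayscale pixel values


def pyramid_sizes(width: int, height: int, factor: float = 0.5, min_side: int = 32,
                  max_levels: Optional[int] = None) -> List[Tuple[int, int]]:
    """(width, height) of each level, stopping when the shorter side falls below min_side."""
    sizes: List[Tuple[int, int]] = []
    w, h = float(width), float(height)
    while min(w, h) >= min_side and (max_levels is None or len(sizes) < max_levels):
        sizes.append((round(w), round(h)))
        w, h = w * factor, h * factor
    return sizes


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Nearest-neighbour resampling (a stand-in for a proper interpolating resize)."""
    h, w = len(img), len(img[0])
    return [[img[min(h - 1, y * h // new_h)][min(w - 1, x * w // new_w)]
             for x in range(new_w)]
            for y in range(new_h)]


def build_pyramid(img: Image, factor: float = 0.5, min_side: int = 32,
                  max_levels: Optional[int] = None) -> List[Image]:
    """Level 0 is the original resolution; each higher level is smaller and coarser."""
    h, w = len(img), len(img[0])
    return [resize_nearest(img, lw, lh)
            for lw, lh in pyramid_sizes(w, h, factor, min_side, max_levels)]
```

For example, `pyramid_sizes(1000, 600, max_levels=3)` gives the level sizes 1000x600, 500x300 and 250x150.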
Figs. 2-3 show examples of positive sample selection. Positive samples adaptively sample context regions (also called chips) based on the presence of objects in the image. In Fig. 2, the ground-truth boxes are drawn with thin lines and the chips at the original image scale with thick rectangles. In Fig. 3, down/up sampling is performed according to object size; covered objects are shown as rectangles, and objects that are invalid at the corresponding scale are shown as rectangles of another color.
FIG. 4 shows an example of negative sample selection. The top row shows the images with their ground-truth boxes; the bottom row shows the negative proposals not contained in any positive chip (each drawn as a circle at the proposal's center) and the negative chips generated from those proposals (drawn as rectangles).
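The negative chip selection of step b, as illustrated by Fig. 4, can be sketched as below: false-positive proposals from the coarsely trained detector that are not contained in any positive chip are covered greedily by as few chips as possible, and chips already chosen as positives are skipped. This is a hypothetical stdlib-only Python sketch; the box format, the greedy set-cover rule and the function names are assumptions, and the false-positive boxes are assumed to be given in the same pyramid-level coordinates as the chips.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def contains(chip: Box, box: Box) -> bool:
    """True if the box lies completely inside the chip."""
    return chip[0] <= box[0] and chip[1] <= box[1] and box[2] <= chip[2] and box[3] <= chip[3]


def negative_chips(chips: List[Box], false_positives: List[Box],
                   positive_chips: List[Box]) -> List[Box]:
    """Greedily pick chips covering false-positive proposals that no positive chip contains."""
    # Proposals already inside a positive chip are not used for negative mining (Fig. 4).
    remaining: Set[int] = {i for i, fp in enumerate(false_positives)
                           if not any(contains(p, fp) for p in positive_chips)}
    # Skip chips that were already selected as positive samples.
    candidates = [c for c in chips if c not in positive_chips]
    covers = [{i for i in remaining if contains(c, false_positives[i])} for c in candidates]
    selected: List[Box] = []
    while remaining:
        best: Optional[int] = max(range(len(candidates)),
                                  key=lambda k: len(covers[k] & remaining), default=None)
        if best is None:
            break  # no candidate chips left
        gain = covers[best] & remaining
        if not gain:
            break  # leftover proposals do not fit inside any candidate chip
        selected.append(candidates[best])
        remaining -= gain
    return selected
```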
Fig. 5 shows the training and inference process. At each scale, invalid RoIs (regions of interest) that fall outside the specified range are discarded during both training and inference, and each training batch contains images sampled from a particular scale. Invalid GT (ground-truth) boxes are used to invalidate anchors in the RPN (region proposal network, which generates the candidate target regions). After the RCN (region classification network), the detections from each scale are rescaled and then combined and fused by non-maximum suppression (NMS). An invalid GT box is a ground-truth label whose scale lies outside the valid scale range; it does not participate directly in training.
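How invalid GT boxes can invalidate RPN anchors is sketched below. This is a simplified, hypothetical Python sketch rather than the patent's code: the 0.7/0.3 positive/negative IoU thresholds follow the usual Faster R-CNN RPN convention, while the 0.3 IoU threshold for invalidating an anchor, the sqrt-area scale measure and the function names are assumptions. For brevity it omits the Faster R-CNN rule that also marks the best-matching anchor of each GT box as positive.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def box_scale(b: Box) -> float:
    return ((b[2] - b[0]) * (b[3] - b[1])) ** 0.5  # assumed scale measure


def label_anchors(anchors: List[Box], gts: List[Box], lo: float, hi: float,
                  pos_thr: float = 0.7, neg_thr: float = 0.3,
                  invalid_thr: float = 0.3) -> List[int]:
    """RPN labels: 1 = positive, 0 = negative, -1 = ignored (contributes no loss)."""
    valid = [g for g in gts if lo <= box_scale(g) <= hi]
    invalid = [g for g in gts if not lo <= box_scale(g) <= hi]
    labels: List[int] = []
    for a in anchors:
        best = max((iou(a, g) for g in valid), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif any(iou(a, g) > invalid_thr for g in invalid):
            labels.append(-1)  # invalidated by an out-of-range (invalid) GT box
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # ambiguous overlap, ignored as in standard RPN training
    return labels
```

The effect is that an out-of-range object is neither learned as a positive nor punished as background at that scale; it is learned at whichever pyramid level brings it into the valid range.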
The RPN is a small network applied to each sliding-window position: a convolutional layer (256-dimensional) followed by ReLU and two sibling layers (a cls layer and a reg layer). All sliding windows share the same RPN.
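To make the weight sharing concrete, the sketch below counts the RPN head's parameters and output shapes. It assumes the Faster R-CNN layout (a 3x3 convolution to the 256-dimensional intermediate layer, then 1x1 cls and reg layers with 2k and 4k outputs for k anchors per position); the input channel count, the anchor count and the function names are illustrative assumptions. Because the same weights slide over every position, the parameter count does not depend on the feature-map size, while the output maps grow with it.

```python
from typing import Dict, Tuple


def rpn_head_params(in_channels: int, num_anchors: int, mid: int = 256,
                    kernel: int = 3) -> Dict[str, int]:
    """Weights plus biases of the shared RPN head (assumed Faster R-CNN layout)."""
    conv = kernel * kernel * in_channels * mid + mid  # sliding-window conv, followed by ReLU
    cls = mid * 2 * num_anchors + 2 * num_anchors     # 1x1 conv: object/background scores
    reg = mid * 4 * num_anchors + 4 * num_anchors     # 1x1 conv: 4 box offsets per anchor
    return {"conv": conv, "cls": cls, "reg": reg, "total": conv + cls + reg}


def rpn_output_shapes(feat_h: int, feat_w: int,
                      num_anchors: int) -> Dict[str, Tuple[int, int, int]]:
    """(channels, height, width) of the cls and reg maps; every position reuses the same weights."""
    return {"cls": (2 * num_anchors, feat_h, feat_w),
            "reg": (4 * num_anchors, feat_h, feat_w)}
```

For example, with 512 input channels and 9 anchors the head has 1,193,782 parameters, whether the chip's feature map is 32x32 or 64x64.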
Claims (5)
- 1. An efficient scale-normalized target detection training method, characterized in that: an image pyramid is constructed for an input image; fixed-size input images are generated in combination with the annotated targets, where only annotated targets whose scale lies within a given range in the input image count as valid targets, and positive samples are selected; meanwhile, negative samples are mined on small images using the training set, by finding false-positive targets and selecting the small images containing them as negative samples; finally, the positive and negative samples are put into the training set for model training, and the prediction results from multiple scales are fused in the prediction stage.
- 2. The efficient scale-normalized target detection training method of claim 1, wherein the method comprises the following steps:
  a. positive sample selection: for a given original image, select, based on an analysis of the problem, the effective scale range of the model output and the number of image pyramid levels; construct an image pyramid with that number of levels; crop the pyramid into fixed-size images at a fixed stride; and from the cropped images select, as positive samples, the smallest set that covers all annotated targets within the effective scale range;
  b. negative sample selection: choose a suitable detection model for the given training set, train it coarsely on the training set, test the trained model on the original images to find false-positive targets in them, crop the original image pyramid in the same way as for positive samples, and select the cropped small images that contain false-positive targets as negative samples, skipping any small image that has already been added as a positive sample;
  c. training and label mapping: train an end-to-end detection model on the positive and negative samples; during training, label mapping is applied to the target objects that fall within a small image without filtering them by the set scale range;
  d. prediction and result fusion: in the prediction stage, the prediction model takes the original image pyramid as input, each pyramid level produces its own prediction results, the results within the effective scale range are selected for output, and finally the results for the input image at multiple scales are fused using non-maximum suppression (NMS).
- 3. The efficient scale-normalized target detection training method of claim 2, wherein in step a, a greedy algorithm is used to determine, as positive samples, the small images that contain all annotated targets counted as valid targets.
- 4. The efficient scale-normalized target detection training method of claim 2, wherein in step c, the end-to-end detection model is Fast R-CNN, SSD, Mask R-CNN or Tprobe.
- 5. The efficient scale-normalized target detection training method of claim 2, wherein in step d, NMS fusion or Soft-NMS fusion is used in the fusion stage.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910949649.0A (CN110738208A) | 2019-10-08 | 2019-10-08 | efficient scale-normalized target detection training method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110738208A | 2020-01-31 |
Family ID: 69268547
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910949649.0A (CN110738208A) | Pending | 2019-10-08 | 2019-10-08 | efficient scale-normalized target detection training method |
Country Status (1)
Country | Publication |
---|---|
CN | CN110738208A |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111539291A (en) * | 2020-04-16 | 2020-08-14 | 创新奇智(合肥)科技有限公司 | Target detection method and device based on radar waves, electronic equipment and storage medium |
WO2023207073A1 (en) * | 2022-04-29 | 2023-11-02 | 浪潮电子信息产业股份有限公司 | Object detection method and apparatus, and device and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110182516A1 (en) * | 2010-01-27 | 2011-07-28 | Sony Corporation | Learning device, learning method, identifying device, identifying method, program, and information processing system |
CN106022300A (en) * | 2016-06-02 | 2016-10-12 | 中国科学院信息工程研究所 | Traffic sign identifying method and traffic sign identifying system based on cascading deep learning |
US9471836B1 (en) * | 2016-04-01 | 2016-10-18 | Stradvision Korea, Inc. | Method for learning rejector by forming classification tree in use of training images and detecting object in test images, and rejector using the same |
CN107944442A (en) * | 2017-11-09 | 2018-04-20 | 北京智芯原动科技有限公司 | Based on the object test equipment and method for improving convolutional neural networks |
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN108830179A (en) * | 2018-05-25 | 2018-11-16 | 太原科技大学 | Merge the pedestrian detection algorithm of Color Image Edge and depth direction histogram |
Similar Documents
Publication | Title |
---|---|
CN111444821B | Automatic identification method for urban road signs |
CN112990310B | Artificial intelligence system and method for serving electric robot |
CN110929577A | Improved target identification method based on YOLOv3 lightweight framework |
CN110598736A | Power equipment infrared image fault positioning, identifying and predicting method |
CN111598900B | Image region segmentation model training method, segmentation method and device |
CN109935080B | Monitoring system and method for real-time calculation of traffic flow on traffic line |
CN110738132B | Target detection quality blind evaluation method with discriminant perception capability |
CN111274926B | Image data screening method, device, computer equipment and storage medium |
CN110659601B | Depth full convolution network remote sensing image dense vehicle detection method based on central point |
CN113052295B | Training method of neural network, object detection method, device and equipment |
CN111008994A | Moving target real-time detection and tracking system and method based on MPSoC |
CN111985325A | Aerial small target rapid identification method in extra-high voltage environment evaluation |
CN114916964B | Pharynx swab sampling effectiveness detection method and self-service pharynx swab sampling method |
CN111540203B | Method for adjusting green light passing time based on fast-RCNN |
CN110738208A | efficient scale-normalized target detection training method |
CN110909656B | Pedestrian detection method and system integrating radar and camera |
CN114241511A | Weak supervision pedestrian detection method, system, medium, equipment and processing terminal |
CN113469950A | Method for diagnosing abnormal heating defect of composite insulator based on deep learning |
WO2023160666A1 | Target detection method and apparatus, and target detection model training method and apparatus |
CN112529836A | High-voltage line defect detection method and device, storage medium and electronic equipment |
CN115083229B | Intelligent recognition and warning system of flight training equipment based on AI visual recognition |
CN114998570B | Method and device for determining object detection frame, storage medium and electronic device |
CN111241941A | Public water-saving control method and system based on artificial intelligence |
CN110765900A | DSSD-based automatic illegal building detection method and system |
CN113222989A | Image grading method and device, storage medium and electronic equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |