CN110738208A - efficient scale-normalized target detection training method - Google Patents


Publication number
CN110738208A
Authority
CN
China
Prior art keywords
training
image
selecting
target
positive sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910949649.0A
Other languages
Chinese (zh)
Inventor
张发恩
赵江华
秦永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Qizhi (chongqing) Technology Co Ltd
Original Assignee
Innovation Qizhi (chongqing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Qizhi (chongqing) Technology Co Ltd
Priority to CN201910949649.0A
Publication of CN110738208A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention relates to an efficient scale-normalized target detection training method, comprising: selecting, according to problem analysis, the valid scale range of the model output and the number of image pyramid levels; constructing an image pyramid with the set number of levels; cropping fixed-size images from it at fixed intervals; selecting, from the cropped images, the fewest images that cover all annotated targets within the valid scale range as positive samples; at the same time, coarsely training a detector and running it on the original image, and selecting the fewest valid-scale small images that cover all false-positive targets as negative samples; placing the positive and negative samples into the training set for training and then prediction; and fusing the prediction results of multiple scales with NMS in the prediction stage.

Description

efficient scale-normalized target detection training method
Technical Field
The application belongs to the technical field of image recognition, and particularly relates to an efficient scale-normalized target detection training method.
Background
Target detection and recognition technology is developing rapidly and market demand for it is growing steadily. Its main application scenarios are:
① the security field, such as fingerprint recognition and face recognition;
② the military field, such as terrain survey and flying-object identification;
③ the traffic field, such as license plate recognition, autonomous driving and traffic sign recognition;
④ the medical field, such as electrocardiograms, B-mode ultrasound, health management and nutrition;
⑤ the field of daily life, such as smart homes, shopping and smart skin testing.
At present, most of the target detection and recognition algorithms that perform best in practice are based on convolutional neural networks. These methods have the model learn classification and regression tasks for targets within anchors of various preset scales and aspect ratios. When target scales vary widely, however, more anchors are required, which adds parameters and computation to the model. Moreover, because the feature vectors that a convolutional neural network produces for an image change markedly when the image scale changes, it is difficult for the model to learn a recognition model that is stable across multiple scales. An efficient training method with target scale normalization is therefore required.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an efficient scale-normalized target detection training method that effectively reduces the amount of computation, speeds up training, and makes the model more stable in multi-scale target recognition.
To solve this problem, the technical scheme adopted by the invention is as follows:
An efficient scale-normalized target detection training method comprises: constructing an image pyramid for an input image; generating fixed-size input images in combination with the annotated targets, where only annotated targets whose scale lies within a given range in the input image are valid targets; selecting positive samples; performing negative sample mining of small images using the training set, that is, performing preliminary training with any detection model, finding false-positive targets, and selecting small images containing false-positive targets as negative samples; placing the positive and negative samples into the training set for model training; and fusing the prediction results of multiple scales in the prediction stage.
As a further improvement of the technical scheme of the invention, the method comprises the following steps:
a. selecting positive samples: determine an original image; according to problem analysis, select the valid scale range of the model output for that image and the number of image pyramid levels; construct an image pyramid with the set number of levels; crop the pyramid into fixed-size images at fixed intervals; and from the cropped images select the fewest images that cover all annotated targets within the valid scale range as positive samples;
b. selecting negative samples: select a suitable detection model for the given training set and train it coarsely on the training set (accurate results are not required); run the trained model on the original image to find the false-positive targets in it; crop the original image pyramid in the same way as for positive sample selection; and from the cropped small images select those containing false-positive targets as negative samples, skipping any small image that has already been added as a positive sample;
c. training and label mapping: select an end-to-end detection model and train it on the positive and negative samples; during training, label mapping is performed for the targets falling inside a small image without filtering them by the set scale range;
d. prediction and result fusion: in the prediction stage, the prediction model receives the original image pyramid as input, and different pyramid levels produce different prediction results; the results within the valid scale range are selected from these for output, and finally the results for the input image at multiple scales are fused using non-maximum suppression (NMS).
As a further improvement of the technical scheme of the invention, in step a, a greedy algorithm is used to determine the small images that contain all annotated targets qualifying as valid targets, and these are taken as positive samples.
As a further improvement of the technical scheme of the invention, in step c, the end-to-end detection model is Fast R-CNN, SSD, Mask R-CNN or Tprobe.
As a further improvement of the technical scheme of the invention, in step d, NMS fusion or Soft-NMS fusion is used in the fusion stage.
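Step a and its greedy refinement can be sketched as follows. This is a minimal illustration, not the patented implementation: the box format `(x1, y1, x2, y2)`, the use of `sqrt(area)` as the scale measure, the "box fully inside the small image" coverage test, and all function and parameter names are assumptions introduced here. Greedy covering keeps the number of selected small images low but does not guarantee the true minimum.

```python
from math import sqrt

def window_starts(length, chip, stride):
    """Start offsets of fixed-size windows placed at a fixed interval."""
    last = max(length - chip, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)  # extra window flush with the border
    return starts

def generate_chips(width, height, chip, stride):
    """All candidate small images (x1, y1, x2, y2) for one pyramid level."""
    return [(x, y, x + chip, y + chip)
            for y in window_starts(height, chip, stride)
            for x in window_starts(width, chip, stride)]

def covers(chip, box):
    """True if the box lies entirely inside the small image."""
    return (chip[0] <= box[0] and chip[1] <= box[1]
            and box[2] <= chip[2] and box[3] <= chip[3])

def select_positive_chips(boxes, width, height, scale, chip, stride, lo, hi):
    """Greedily pick small images until every valid target is covered."""
    scaled = [tuple(v * scale for v in b) for b in boxes]
    # Assumed validity rule: sqrt(area) at this level must lie in [lo, hi].
    uncovered = {i for i, (x1, y1, x2, y2) in enumerate(scaled)
                 if lo <= sqrt((x2 - x1) * (y2 - y1)) <= hi}
    chips = generate_chips(int(width * scale), int(height * scale), chip, stride)
    selected = []
    while uncovered:
        best = max(chips, key=lambda c: sum(covers(c, scaled[i]) for i in uncovered))
        gained = {i for i in uncovered if covers(best, scaled[i])}
        if not gained:
            break  # the remaining targets fit inside no candidate
        selected.append(best)
        uncovered -= gained
    return selected

boxes = [(10, 10, 110, 110), (300, 300, 400, 400),
         (600, 600, 700, 700), (0, 0, 900, 900)]
chips = select_positive_chips(boxes, 1024, 1024, 1.0,
                              chip=512, stride=256, lo=32, hi=256)
print(chips)
```

In this toy input the 900-pixel box is outside the assumed valid range at scale 1.0, so it does not influence which small images are chosen; it would be handled at a coarser pyramid level.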
With this technical scheme, a larger batch size can be used during model training. With larger batches, the model can update its batch normalization parameters more accurately on a single node from the training data, which improves model performance and yields more stable detection results when target scales in the data vary widely.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic illustration of a positive sample selection process according to the present invention;
FIG. 3 is a second schematic diagram illustrating a positive sample selection process according to the present invention;
FIG. 4 is a schematic diagram of the negative sample selection process of the present invention;
FIG. 5 is a schematic diagram of the training and inference process of the present invention.
Detailed Description
The present invention is described in further detail below with reference to embodiments.
The invention discloses an efficient scale-normalized target detection training method. An image pyramid is constructed for an input image, and fixed-size input images are generated in combination with the annotated targets; only annotated targets whose scale lies within a given range in the input image are valid targets. Positive samples are selected by greedily determining the small images that contain all annotated targets qualifying as valid targets. Negative sample mining of small images is performed using the training set: any detection model is given a preliminary training, the false-positive targets it produces are found, and the small images containing false-positive targets are selected as negative samples. Finally, the positive and negative samples are placed into the training set for model training, and in the prediction stage the prediction results of multiple scales are fused with NMS (non-maximum suppression).
The method comprises the following steps:
a. selecting positive samples: determine an original image; according to problem analysis, select the valid scale range of the model output for that image and the number of image pyramid levels; construct an image pyramid with the set number of levels; crop the pyramid into fixed-size images at fixed intervals; and from the cropped images select the fewest images that cover all annotated targets within the valid scale range as positive samples;
b. selecting negative samples: select a suitable detection model for the given training set and train it coarsely on the training set (accurate results are not required); run the trained model on the original image to find the false-positive targets in it; crop the original image pyramid in the same way as for positive sample selection; and from the cropped small images select those containing false-positive targets as negative samples, skipping any small image that has already been added as a positive sample;
c. training and label mapping: select an end-to-end detection model and train it on the positive and negative samples; during training, RPN label mapping is performed on the targets falling inside a small image without filtering them by the set scale range; at a minimum, the RPN results corresponding to targets outside the range are ignored;
d. prediction and result fusion: in the prediction stage, the prediction model receives the original image pyramid as input, and different pyramid levels produce different prediction results; the results within the valid scale range are selected from these for output, and finally the results for the input image at multiple scales are fused using non-maximum suppression (NMS).
In step a, a greedy algorithm is used to determine the small images that contain all annotated targets qualifying as valid targets, and these are taken as positive samples.
In step c, the end-to-end detection model is Fast R-CNN, SSD, Mask R-CNN or Tprobe.
In step d, the fusion stage uses NMS fusion or Soft-NMS fusion.
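The fusion in step d can be illustrated with a small sketch. It assumes detections are `(x1, y1, x2, y2, score)` tuples in the coordinates of each pyramid level, maps them back to the original image by dividing by the level's scale factor, and then applies either greedy NMS or a linear-decay Soft-NMS. The thresholds, the linear decay rule and all names are illustrative assumptions, not values taken from the patent.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2, ...)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def to_original(per_scale):
    """Map detections of every pyramid level back to original coordinates."""
    return [(x1 / s, y1 / s, x2 / s, y2 / s, score)
            for s, dets in per_scale.items()
            for (x1, y1, x2, y2, score) in dets]

def nms(dets, iou_thr=0.5):
    """Greedy NMS: keep a box only if it overlaps no higher-scoring kept box."""
    keep = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= iou_thr for k in keep):
            keep.append(d)
    return keep

def soft_nms(dets, iou_thr=0.5, score_thr=0.001):
    """Linear Soft-NMS: decay overlapping scores instead of deleting boxes."""
    pool = [list(d) for d in dets]
    keep = []
    while pool:
        best = max(pool, key=lambda d: d[4])
        pool.remove(best)
        keep.append(tuple(best))
        for d in pool:
            overlap = iou(best, d)
            if overlap > iou_thr:
                d[4] *= 1.0 - overlap
        pool = [d for d in pool if d[4] >= score_thr]
    return keep

# Detections keyed by pyramid scale factor (hypothetical values).
per_scale = {1.0: [(20, 20, 120, 120, 0.9)],
             0.5: [(11, 11, 59, 59, 0.8), (100, 100, 150, 150, 0.7)]}
merged = to_original(per_scale)
fused = nms(merged)
soft = soft_nms(merged)
print(len(fused), len(soft))
```

Here the two detections of the same object found at different scales collapse into one under NMS, while Soft-NMS keeps the duplicate with a strongly reduced score.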
An image pyramid is a series of images arranged in the shape of a pyramid with progressively decreasing resolution, all derived from the same original image. It is obtained by stepwise downsampling, which stops once a termination condition is reached. The images can be pictured layer by layer as a pyramid: the higher the level, the smaller the image and the lower its resolution.
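A minimal sketch of such a pyramid follows. It assumes a grayscale image stored as a list of rows, a fixed downsampling factor of 2 implemented by 2x2 averaging, and a termination condition based on a level count and a minimum side length; these choices and the function names are illustrative, since the text does not fix them.

```python
def downsample2x(img):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]

def build_pyramid(img, num_levels, min_size=1):
    """Level 0 is the original image; each higher level is half the size."""
    levels = [img]
    while len(levels) < num_levels:
        cur = levels[-1]
        # Termination condition: the next level would be too small.
        if len(cur) // 2 < min_size or len(cur[0]) // 2 < min_size:
            break
        levels.append(downsample2x(cur))
    return levels

image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, num_levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

A production pipeline would normally use an image library's resize routine with proper anti-aliasing, and could also upsample to create finer levels for small targets.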
Figs. 2 and 3 are examples of positive sample selection. Positive samples are obtained by adaptively sampling context regions (also referred to as chips) according to where objects are present in the image. Fig. 2 shows the image, the ground truth boxes (drawn as thin lines), and the chips at the original image scale (drawn as thick rectangles). Fig. 3 shows down/up sampling performed according to the size of the objects; the covered objects are drawn as rectangles, and the objects that are invalid at the corresponding scale are drawn as rectangles of another color.
Fig. 4 is an example of negative sample selection. The first row shows the images and the ground truth boxes. The bottom row shows the negative proposals not contained in the positive chips (drawn, for clarity, as circles at the center of each proposal) and the negative chips generated from those proposals (drawn as rectangles).
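The negative sample selection shown in Fig. 4 can be sketched as below. The sketch assumes that a false positive is a detection whose IoU with every ground truth box is below a threshold, that a proposal already inside a positive chip needs no negative chip, and that candidate chips are scanned in order; the threshold and all names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def contains(chip, box):
    """True if the box lies entirely inside the chip."""
    return (chip[0] <= box[0] and chip[1] <= box[1]
            and box[2] <= chip[2] and box[3] <= chip[3])

def find_false_positives(detections, gt_boxes, iou_thr=0.5):
    """Detections of the coarsely trained model that match no ground truth."""
    return [d for d in detections
            if all(iou(d, g) < iou_thr for g in gt_boxes)]

def select_negative_chips(candidates, positive_chips, false_positives):
    """Chips that contain false positives not already inside a positive chip."""
    positives = set(positive_chips)
    remaining = [fp for fp in false_positives
                 if not any(contains(p, fp) for p in positives)]
    negatives = []
    for chip in candidates:
        if chip in positives:
            continue  # already used as a positive sample: skip
        if any(contains(chip, fp) for fp in remaining):
            negatives.append(chip)
            remaining = [fp for fp in remaining if not contains(chip, fp)]
    return negatives

candidates = [(0, 0, 512, 512), (512, 0, 1024, 512),
              (0, 512, 512, 1024), (512, 512, 1024, 1024)]
positive_chips = [(0, 0, 512, 512)]
gt_boxes = [(50, 50, 150, 150)]
detections = [(52, 52, 148, 148), (600, 100, 700, 200),
              (200, 200, 300, 300), (700, 700, 800, 800)]
false_pos = find_false_positives(detections, gt_boxes)
negative_chips = select_negative_chips(candidates, positive_chips, false_pos)
print(negative_chips)
```

Removing the false positives a chip covers before moving on keeps the negative set small, in the same spirit as the greedy positive selection.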
Fig. 5 shows the training and inference process. Invalid RoIs (regions of interest) that fall outside the specified range at each scale are discarded during training and inference, and each batch of training data consists of images sampled from one particular scale. Invalid GT (ground truth, i.e. real label) boxes are used to invalidate anchors in the RPN (Region Proposal Network, which generates candidate target regions). After the RCN (region classification network), the detections from each scale are rescaled and then combined and fused by non-maximum suppression (NMS). An invalid GT box is a real label whose scale is not within the valid scale range; it does not participate directly in training.
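One way to picture the valid and invalid GT handling is the sketch below. It assumes that scale is measured as `sqrt(area)` at the current pyramid level and that anchors are labeled 1 (positive), 0 (negative) or -1 (ignored); the IoU thresholds and function names are illustrative assumptions rather than values given in the text.

```python
from math import sqrt

def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def split_by_scale(gt_boxes, scale, valid_range):
    """Rescale GT boxes to a pyramid level and split them into valid/invalid."""
    lo, hi = valid_range
    valid, invalid = [], []
    for x1, y1, x2, y2 in gt_boxes:
        box = (x1 * scale, y1 * scale, x2 * scale, y2 * scale)
        side = sqrt((box[2] - box[0]) * (box[3] - box[1]))
        (valid if lo <= side <= hi else invalid).append(box)
    return valid, invalid

def label_anchor(anchor, valid, invalid, pos_thr=0.7, neg_thr=0.3, ignore_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for one anchor."""
    if any(iou(anchor, g) >= pos_thr for g in valid):
        return 1
    if any(iou(anchor, g) >= ignore_thr for g in invalid):
        return -1  # invalidated by an out-of-range GT box
    if all(iou(anchor, g) < neg_thr for g in valid):
        return 0
    return -1  # ambiguous overlap with a valid GT box

gt = [(0, 0, 40, 40), (100, 100, 400, 400)]
valid, invalid = split_by_scale(gt, 1.0, (16, 128))
labels = [label_anchor(a, valid, invalid)
          for a in [(0, 0, 40, 40), (100, 100, 400, 400), (500, 500, 540, 540)]]
print(labels)
```

At scale 0.25 the roles swap: the large box shrinks into the valid range and the small one drops out of it, which is how each target ends up being trained at the pyramid level where its size is normalized.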
The RPN is a small network applied to each sliding-window region: one convolution layer (256 dimensions) + ReLU + two sibling layers (a cls layer and a reg layer). All sliding windows share the same RPN.
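The weight sharing described above can be made concrete with a small pure-Python sketch. It assumes a 3x3 sliding window, 2 scores and 4 box offsets per anchor as in the Faster R-CNN design, 3 anchors per position, no bias terms, and random weights; none of these specifics come from the patent text, and the names are hypothetical.

```python
import random

def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def rpn_head(feature, w_conv, w_cls, w_reg, n=3):
    """Apply the same conv + ReLU + (cls, reg) weights to every n x n window."""
    height, width = len(feature), len(feature[0])
    outputs = {}
    for y in range(height - n + 1):
        for x in range(width - n + 1):
            window = [c for dy in range(n) for dx in range(n)
                      for c in feature[y + dy][x + dx]]
            hidden = [max(0.0, h) for h in matvec(w_conv, window)]  # ReLU
            outputs[(y, x)] = (matvec(w_cls, hidden), matvec(w_reg, hidden))
    return outputs

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
channels, hidden_dim, anchors = 2, 256, 3
# Toy 5 x 5 feature map with 2 channels per location.
feature = [[[rng.random() for _ in range(channels)] for _ in range(5)]
           for _ in range(5)]
w_conv = random_matrix(hidden_dim, 3 * 3 * channels, rng)  # shared conv layer
w_cls = random_matrix(2 * anchors, hidden_dim, rng)        # cls layer
w_reg = random_matrix(4 * anchors, hidden_dim, rng)        # reg layer
out = rpn_head(feature, w_conv, w_cls, w_reg)
cls_scores, box_deltas = out[(0, 0)]
print(len(out), len(cls_scores), len(box_deltas))
```

Because the three weight matrices are created once and reused for every window, the head is equivalent to a convolution followed by two 1x1 convolutions, which is how it would be written in a deep learning framework.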

Claims (5)

  1. An efficient scale-normalized target detection training method, characterized in that: an image pyramid is constructed for an input image; fixed-size input images are generated in combination with the annotated targets, wherein only annotated targets whose scale lies within a given range in the input image are valid targets; positive samples are selected; meanwhile, negative sample mining of small images is performed using the training set, in which false-positive targets are found and small images containing false-positive targets are selected as negative samples; finally, the positive samples and negative samples are placed into the training set for model training; and prediction results of multiple scales are fused in the prediction stage.
  2. The efficient scale-normalized target detection training method of claim 1, wherein the method comprises the following steps:
    a. selecting positive samples: determining an original image; selecting, according to problem analysis, the valid scale range of the model output for the original image and the number of image pyramid levels; constructing an image pyramid with the set number of levels; cropping the image pyramid into fixed-size images at fixed intervals; and selecting from the cropped images the fewest images that cover all annotated targets within the valid scale range as positive samples;
    b. selecting negative samples: selecting a suitable detection model for the given training set; coarsely training the detection model on the training set; running the trained detection model on the original image to find the false-positive targets in it; cropping the original image pyramid in the same way as for positive sample selection; and selecting from the cropped small images those containing false-positive targets as negative samples, skipping any small image that has already been added as a positive sample;
    c. training and label mapping: selecting an end-to-end detection model and training it on the positive samples and negative samples, wherein during training label mapping is performed for the targets falling inside a small image without filtering them by the set scale range;
    d. prediction and result fusion: in the prediction stage, the prediction model receives the original image pyramid as input, and different pyramid levels produce different prediction results; the results within the valid scale range are selected from these for output, and finally the results for the input image at multiple scales are fused using non-maximum suppression (NMS).
  3. The efficient scale-normalized target detection training method of claim 2, wherein in step a, a greedy algorithm is used to determine the small images that contain all annotated targets qualifying as valid targets, and these are taken as positive samples.
  4. The efficient scale-normalized target detection training method of claim 2, wherein in step c, the end-to-end detection model is Fast R-CNN, SSD, Mask R-CNN or Tprobe.
  5. The efficient scale-normalized target detection training method of claim 2, wherein in step d, the fusion stage uses NMS fusion or Soft-NMS fusion.
CN201910949649.0A 2019-10-08 2019-10-08 efficient scale-normalized target detection training method Pending CN110738208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910949649.0A CN110738208A (en) 2019-10-08 2019-10-08 efficient scale-normalized target detection training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910949649.0A CN110738208A (en) 2019-10-08 2019-10-08 efficient scale-normalized target detection training method

Publications (1)

Publication Number Publication Date
CN110738208A (en) 2020-01-31

Family

ID=69268547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910949649.0A Pending CN110738208A (en) 2019-10-08 2019-10-08 efficient scale-normalized target detection training method

Country Status (1)

Country Link
CN (1) CN110738208A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539291A (en) * 2020-04-16 2020-08-14 创新奇智(合肥)科技有限公司 Target detection method and device based on radar waves, electronic equipment and storage medium
WO2023207073A1 (en) * 2022-04-29 2023-11-02 浪潮电子信息产业股份有限公司 Object detection method and apparatus, and device and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110182516A1 (en) * 2010-01-27 2011-07-28 Sony Corporation Learning device, learning method, identifying device, identifying method, program, and information processing system
CN106022300A (en) * 2016-06-02 2016-10-12 中国科学院信息工程研究所 Traffic sign identifying method and traffic sign identifying system based on cascading deep learning
US9471836B1 (en) * 2016-04-01 2016-10-18 Stradvision Korea, Inc. Method for learning rejector by forming classification tree in use of training images and detecting object in test images, and rejector using the same
CN107944442A (en) * 2017-11-09 2018-04-20 北京智芯原动科技有限公司 Based on the object test equipment and method for improving convolutional neural networks
CN108460403A (en) * 2018-01-23 2018-08-28 上海交通大学 The object detection method and system of multi-scale feature fusion in a kind of image
CN108830179A (en) * 2018-05-25 2018-11-16 太原科技大学 Merge the pedestrian detection algorithm of Color Image Edge and depth direction histogram


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination