CN111582093A - Automatic small target detection method in high-resolution image based on computer vision and deep learning - Google Patents


Info

Publication number
CN111582093A
Authority
CN
China
Prior art keywords
detection
small
model
scale
training
Prior art date
Legal status
Pending
Application number
CN202010346094.3A
Other languages
Chinese (zh)
Inventor
孙光民
陈佳阳
李煜
林朋飞
朱美龙
梁浩
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202010346094.3A
Publication of CN111582093A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for automatically detecting small targets in a high-resolution image based on computer vision and deep learning, which mainly comprises the following steps: first, the original small-target detection task is decomposed at different scales to obtain a multi-scale task group. Then, low-resolution detectors are trained separately at the different scales and used to obtain detection results at each scale. Finally, the detection results are fused to obtain the final small-target detection result. The invention solves the problem that target detectors in the prior art have difficulty detecting tiny targets in a high-resolution image.

Description

Automatic small target detection method in high-resolution image based on computer vision and deep learning
Technical Field
The invention belongs to a target detection technology, and particularly relates to a method for automatically detecting a tiny target in a high-resolution image based on computer vision and deep learning.
Background
The widely used deep-learning-based target detectors can be divided mainly into two types. The first type is the two-stage target detector, such as Fast R-CNN and Mask R-CNN; these algorithms split target detection into two stages: candidate regions are extracted first and then sent to a detection network to complete localization and recognition of the target. The second type is the single-stage target detection algorithm, such as the Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), YOLO9000 and YOLOv3; these do not extract candidate boxes in advance but directly regress the target position and judge its class through preset boxes in the network, forming end-to-end target detection. When the targets to be detected are large and not dense, both two-stage and single-stage detectors achieve high detection accuracy, the latter detecting faster than the former. However, due to factors such as the actual size of the target, the shooting equipment, the shooting distance and the observation scale, real targets often appear as small targets in the image. Compared with large targets, small targets occupy fewer pixels and offer fewer extractable features, so neither two-stage nor single-stage detectors achieve a good detection effect on them.
At present, optimization of small-target detection algorithms focuses mainly on model improvement: with the size of the low-resolution input image unchanged, the feature extraction capability and detection precision of the detector are improved by improving the model structure. A currently effective improvement is the Feature Pyramid Network (FPN). This network can be embedded into both two-stage and single-stage detectors; it fuses the low-level and high-level feature maps produced by the backbone network in a specific way to reconstruct the feature pyramid. After this operation, the receptive field of the low-level feature maps is enlarged and their semantic information is enhanced, which greatly improves the model's precision on small targets.
Although the above improvements raise detection accuracy, the inputs these models process are still low-resolution images. With improving camera hardware, images of ever higher resolution can be obtained. Compared with a low-resolution image, a small object is represented by more pixels in a high-resolution image, i.e. it is depicted more clearly, which provides effective data support for small-target detection tasks. Yet essentially none of the current detection algorithms can handle images whose resolution reaches tens of millions of pixels, and if the high-resolution image is down-sampled to fit the detection model, information is lost, the advantage of the high resolution cannot be exploited, and small targets again become difficult to detect. For small-target detection in high-resolution images, Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) detects candidate regions produced by a sliding window with a fast detector and can complete rapid detection on high-resolution satellite images of any size; however, its detection precision is low, it produces many false alarms, and its execution time is long. To address this, this patent proposes a simple and effective method for small-target detection in high-resolution images. The algorithm splits the original detection task across different scales of the image to obtain a logically associated multi-scale detection task group; corresponding low-resolution target detectors are then trained for the detection tasks at each scale; finally, the detection results at all scales are fused to obtain the final detection result.
The research aims at constructing a pattern recognition framework based on a deep neural network and exploring a small target detection method based on a high-resolution image.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a small target automatic detection method in a high-resolution image, which solves the problem that a target detector in the prior art is difficult to detect a small target in the high-resolution image.
The method is divided into detection and training processes, and is an automatic detection method for small targets in high-resolution images based on computer vision and deep learning, and is characterized by comprising the following steps:
S1 Detection flow
S1.1 establishing multiscale task groups
A dual-scale image pyramid is established for the original high-resolution image to be detected according to Gaussian pyramid theory, and the small-target detection task is decomposed across the two scales to obtain a multi-scale task group. At the large scale, a segmentation task is set for large objects that have no inclusion relation with the small targets; at the small scale, the original small-target detection task is set.
S1.2 object segmentation at large scale
Under the large scale, a Mask R-CNN model is utilized to carry out example segmentation on the large-scale image, and the obtained low-resolution Mask is subjected to up-sampling to recover the resolution of the original image.
S1.3 Small Scale target detection
Under the small scale, extracting a candidate region in the small scale image by using an overlapping sliding window, screening the candidate region according to a Mask, and sending the candidate region without intersection with a large target region in the Mask to a target detector SSD for detection. After all candidate regions are detected, the detection results are mapped from the sub-regions back to the original image.
S1.4 segmentation under dual scales and detection result fusion
And carrying out secondary screening on the detection frame obtained under the small scale by utilizing the segmentation mask obtained under the large scale. Firstly, morphological processing is carried out on the segmentation mask, then the detection frames appearing in the large target area are deleted, and finally non-maximum value inhibition is applied to fuse the overlapped detection frames to obtain the final detection result of the small target in the high-resolution image.
S2 training procedure
S2.1 segmentation model training
The Mask R-CNN model is trained by transfer learning using the large-scale pictures and their annotation information, and the node with the best performance on the validation set is saved as the trained segmentation model Model_S.
S2.2 detection model initial training
The area around each defect in the small-scale high-resolution image is randomly cropped in a specific way to obtain a slice sample set matching the input size of the SSD model. The SSD model is trained by transfer learning, and the node with the best performance on the validation set is saved as the trained detection model Model_D1.
S2.3 Detection model secondary training
The trained Model_S and Model_D1 are embedded into the detection flow, and the high-resolution atlas is detected according to that flow. The false-detection boxes in the results are cropped out and added to the original slice set as a separate class, and Model_D1 is retrained with the new training set to obtain the secondary-training model Model_D2. Finally, Model_D2 replaces Model_D1 as the target detector in the original detection framework, completing the construction of the final detection framework.
In summary, the invention combines multi-scale segmentation and detection models to complete the automatic detection of small targets in high-resolution images. The core of the work is that the combined deep learning models understand the picture content at multiple scales and exploit the different information at each scale to improve detection precision and accelerate detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a schematic flow chart of a high-resolution image small target detection method according to the present invention;
FIG. 2 is a flow chart of a method for detecting micro defects of a high-resolution image of a floor according to an embodiment of the present invention;
FIG. 3 is a high resolution image of a floor including defects provided by an embodiment of the present invention;
fig. 4 is a non-wall region segmentation algorithm at a large scale according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an intra-window sampling method according to an embodiment of the present invention;
FIG. 6 is a defect detection algorithm at a small scale according to an embodiment of the present invention;
fig. 7 is a comparison of the single-scale and multi-scale fusion detection effects provided by the embodiment of the present invention. (a) The detection result is on a small scale; (b) fusing the small-scale detection result after the mask is divided;
FIG. 8 shows the training results of Mask-RCNN according to an embodiment of the present invention. (a) To verify the mAP changes on the set; (b) the model detection effect is shown schematically;
fig. 9 shows the training result of the SSD according to the embodiment of the present invention. (a) To verify the mAP changes on the set; (b) the model detection effect is shown schematically;
fig. 10 is a representation of the detection algorithm before and after the second training on the images in the verification set according to the embodiment of the present invention. (a) The result is a detection result 1 before secondary training; (b) the result is a detection result 1 after the secondary training; (c) the result is a detection result 2 before the second training; (d) the result is the detection result 2 after the second training.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
In the following description, various aspects of the invention will be described, however, it will be apparent to those skilled in the art that the invention may be practiced with only some or all of the structures or processes of the present invention. Specific numbers, configurations and sequences are set forth in order to provide clarity of explanation, but it will be apparent that the invention may be practiced without these specific details. In other instances, well-known features have not been set forth in detail in order not to obscure the invention.
The invention provides an automatic method for detecting small targets in high-resolution images based on computer vision and deep learning, addressing the problem that existing small-target detection techniques focus mainly on model improvement and struggle to process high-resolution images. Through multi-scale task decomposition and the combination of large-scale segmentation with small-scale detection results, high-precision, high-speed automatic detection of tiny targets in high-resolution images is finally achieved.
First, detection process
(one) establishing multiscale task groups
And establishing an image pyramid containing one large scale and one small scale for the original high-resolution image to be detected according to the Gaussian pyramid theory. The large-scale image is obtained by performing Gaussian filtering on the original image and then performing 8-time down-sampling. The width and height of the generated large-scale image are both 1/8 of the original image.
Wherein the two-dimensional gaussian kernel function is as follows:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
performing convolution operation on a Gaussian kernel template G (x, y) with the size of 5 x 5 and an original image I (x, y), and performing 8-time down-sampling to obtain a large-scale image L (x, y), wherein the formula is expressed as follows:
L(x, y) = 8↓ (G(x, y) * I(x, y))
The small scale is the original scale of the image, so the small-scale image S(x, y) = I(x, y) is obtained without any operation on the original image.
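The dual-scale construction above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the σ value, and the naive same-padding convolution loop are all illustrative.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2-D Gaussian kernel G(x, y)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def build_dual_scale(image, factor=8, ksize=5, sigma=1.0):
    """Return (S, L) for a 2-D grayscale image:
    S(x, y) = I(x, y) (original scale) and
    L(x, y) = Gaussian-filtered image down-sampled by `factor`."""
    k = gaussian_kernel(ksize, sigma)
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    blurred = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(h):                       # naive convolution, fine for a demo
        for j in range(w):
            blurred[i, j] = (padded[i:i + ksize, j:j + ksize] * k).sum()
    large = blurred[::factor, ::factor]      # keep every `factor`-th pixel
    return image, large
```

In practice a library routine (e.g. an image pyramid function) would replace the hand-written convolution; the sketch only mirrors the filter-then-subsample order described above.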
For the large-scale image L (x, y), a target segmentation task is set. The choice of segmented objects should be large objects that do not contain the small objects to be detected and which still retain the main features in the large scale image. While for the small scale image S (x, y), the small target detection task is still set.
(II) object segmentation at Large Scale
Under the large scale, a set large target in the large-scale image is segmented by utilizing a trained Mask R-CNN model to obtain a binary Mask image Ms (x, y) with the same resolution as the large-scale image L (x, y). In order to restore the resolution of the original image I (x, y) and facilitate the use of the subsequent steps, the obtained low-resolution mask Ms (x, y) needs to be up-sampled by 8 times to obtain M (x, y).
M(x,y)=8↑Ms(x,y)
Wherein the interpolation mode adopts a nearest neighbor interpolation method. In the obtained high-resolution mask image M (x, y), the pixel value of the large target region is 255, and the pixel values of the other regions are 0. Meanwhile, the segmentation means under the large scale is not limited to the depth model, and can be assisted by a traditional image processing method.
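Nearest-neighbour up-sampling of the binary mask by a factor of 8 can be written with `np.repeat`; a small sketch (the function name is illustrative):

```python
import numpy as np

def upsample_mask_nn(mask_small, factor=8):
    """Nearest-neighbour up-sampling M(x, y) = 8↑ Ms(x, y):
    each low-resolution mask pixel becomes a factor x factor block,
    so the {0, 255} values of the binary mask are preserved exactly."""
    return np.repeat(np.repeat(mask_small, factor, axis=0), factor, axis=1)
```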
(III) target detection at Small Scale
At small scale, because the image resolution is much higher than the input size of the SSD detector, a sliding window matching the detector input size slides over the small-scale image with overlap to extract candidate detection regions Proposal_S. Before SSD detection, the candidate regions are screened against the mask M(x, y) as follows: the same sliding window simultaneously extracts the corresponding sub-region Proposal_M on M(x, y); if all 9 predetermined sampling points in Proposal_M have pixel value 255, the region is considered to lie mainly inside a large-target area and is not sent for subsequent detection. After the window has traversed the whole picture, a detection result is obtained for each retained Proposal_S. Suppose D_1 detections exceed the confidence threshold; the d-th detection box can be expressed by its coordinates (x_d, y_d, w_d, h_d), class c_d and confidence s_d, giving the detection set DBox_1 = {((x_d, y_d, w_d, h_d), c_d, s_d) | d ∈ [1, D_1]}.
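The mask-based screening of a single candidate window can be sketched as follows. The exact positions of the 9 sampling points are given in fig. 5 of the patent, so the even 3 × 3 grid used here is an assumption:

```python
import numpy as np

def nine_point_keep(proposal_mask):
    """Screen one candidate window against its mask sub-region Proposal_M.
    Samples a 3x3 grid of points (assumed layout); if all 9 points are 255,
    the window lies mainly inside a large-target area and is skipped.
    Returns True when the window should be sent to the SSD detector."""
    h, w = proposal_mask.shape
    ys = [h // 4, h // 2, 3 * h // 4]
    xs = [w // 4, w // 2, 3 * w // 4]
    samples = [proposal_mask[y, x] for y in ys for x in xs]
    return not all(v == 255 for v in samples)
```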
These detections are then mapped from Proposal_S back to the original image as follows. Let the top-left corner of Proposal_S in the original image be (X_ps, Y_ps). The 4 parameters (X_d, Y_d, W_d, H_d) of the corresponding detection box in the original image are then obtained by the coordinate transformation:
X_d = X_ps + x_d
Y_d = Y_ps + y_d
W_d = w_d
H_d = h_d
and the detection set becomes DBox_2 = {((X_d, Y_d, W_d, H_d), c_d, s_d) | d ∈ [1, D_1]}.
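The overlapping window layout and the coordinate mapping back to the original image can be sketched as below; the window size and stride values are illustrative (the embodiment later uses 640 × 640 windows, but the stride is not specified in the text):

```python
def sliding_windows(img_w, img_h, win=640, stride=512):
    """Top-left corners of overlapping win x win windows covering the image.
    Extra windows are appended so the right and bottom edges are covered."""
    xs = list(range(0, max(img_w - win, 0) + 1, stride))
    ys = list(range(0, max(img_h - win, 0) + 1, stride))
    if xs[-1] != img_w - win:
        xs.append(img_w - win)
    if ys[-1] != img_h - win:
        ys.append(img_h - win)
    return [(x, y) for y in ys for x in xs]

def map_to_original(dets, window_xy):
    """Map detections ((x, y, w, h), cls, score) from window coordinates to
    the original image: X_d = X_ps + x_d, Y_d = Y_ps + y_d; w, h unchanged."""
    Xps, Yps = window_xy
    return [((Xps + x, Yps + y, w, h), c, s) for (x, y, w, h), c, s in dets]
```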
(IV) segmentation under dual scales and fusion of detection results
After segmentation and detection at both scales are finished, the multi-scale information can be fused to improve small-target detection precision. The segmentation mask obtained at the large scale is used to screen the small-scale detection results a second time. First, the segmentation mask is eroded to shrink the large-target region. Then, for each detection box, if the mask pixel values at its 4 corner points are all 0 the box is kept; otherwise it is discarded. Assuming D_2 detections remain after screening, the detection set becomes DBox_2 = {((X_g, Y_g, W_g, H_g), c_g, s_g) | g ∈ [1, D_2]}.
To suppress the overlapping detection boxes caused by the overlapped sliding of the window, a non-maximum suppression algorithm fuses the boxes further, yielding the set of F final detections DBox_final = {((X_f, Y_f, W_f, H_f), c_f, s_f) | f ∈ [1, F]}. The non-maximum suppression algorithm is as follows:
1. Find the element dbox with the highest confidence s in DBox_2;
2. Delete dbox from DBox_2 and add it to DBox_final;
3. Delete from DBox_2 all other boxes whose overlap with the dbox coordinate frame exceeds the threshold N_t;
4. Repeat steps 1-3 until DBox_2 is empty.
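Steps 1-4 above correspond to standard greedy non-maximum suppression. A self-contained sketch follows; the box format, the use of IOU as the overlap measure (the text says only "overlap area"), and the threshold value are assumptions:

```python
def nms(dets, nt=0.5):
    """Greedy non-maximum suppression.
    dets: list of ((X, Y, W, H), cls, score); nt: overlap threshold N_t."""
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        x1, y1 = max(ax, bx), max(ay, by)
        x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        return inter / float(aw * ah + bw * bh - inter)

    remaining = sorted(dets, key=lambda d: d[2], reverse=True)
    final = []
    while remaining:
        best = remaining.pop(0)             # steps 1-2: take highest confidence
        final.append(best)
        remaining = [d for d in remaining   # step 3: drop heavily overlapping boxes
                     if iou(best[0], d[0]) <= nt]
    return final
```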
Second, training process
Segmentation model training
The instance segmentation model Mask R-CNN is trained with the large-scale picture set and its annotation information. The sample set is first divided into training and validation subsets. The training set is then used to adjust the model parameters, and the model quality is tested on the validation set every 10 steps. Training uses transfer learning: pre-training on a large dataset followed by parameter fine-tuning on the small training set of the actual application scene. Optimization uses Mini-Batch Gradient Descent, with the batch size set according to the hardware environment. After the specified number of rounds, the node with the best performance on the validation set is saved, obtaining Model_S.
(II) sample primary training of detection model
Unlike training the segmentation model, training the detection model requires the samples to be processed first. The region around each defect in the small-scale high-resolution image is randomly cropped to obtain a slice sample set matching the input size of the SSD model. To guarantee that the annotation boxes in each slice are complete and not truncated, the following random cropping algorithm is designed. Let W_w and H_w be the width and height of the crop window, and let annotation box BBox_i have width W_bi, height H_bi and top-left corner (x_bi, y_bi), where i ∈ [0, N]. The algorithm flow is as follows:
1. i = 0.
2. For BBox_i, first ignore the influence of nearby annotation boxes and require only that the crop window completely contains BBox_i. The valid top-left coordinates of the crop window then form a rectangular region Cand whose top-left corner is (x_bi + W_bi − W_w, y_bi + H_bi − H_w) and whose width and height are (W_w − W_bi, H_w − H_bi). A mask image CandMask is set to record the selectable positions, marking them with pixel value 255. Meanwhile, the region Cover that the window could possibly cover must be computed: Cover has the same top-left corner as Cand, width W_cover = 2W_w − W_bi and height H_cover = 2H_w − H_bi. With the BBox_i region at the center, extend its side lines to divide the remaining area of Cover into 8 rectangular blocks. Starting from the top-left block and going clockwise, assign sequence numbers: top-left block 0, top block 1, top-right block 2, right block 3, bottom-right block 4, bottom block 5, bottom-left block 6, left block 7.
3. Traverse all annotation boxes and record the M boxes (other than BBox_i) that intersect the Cover region as {NeorBBox_j | j ∈ [0, M]}.
4. Traverse {NeorBBox_j | j ∈ [0, M]} and set the corresponding parts of CandMask to 0 according to the sub-blocks in which each intersection lies. Taking the top-left corner of each sub-region as (0, 0), let the intersection of NeorBBox_j with sub-region CoverSubRegion_k have top-left corner (x_LT,jk, y_LT,jk) and bottom-right corner (x_RD,jk, y_RD,jk). The specific zeroing operations are:
(The per-sub-block zeroing rules are given in the original publication as equation images, omitted here.)
5. At this point, every position in CandMask with pixel value 255 is a candidate top-left coordinate for a crop box that completely contains the annotation box. Some of these coordinates can then be randomly selected to crop the image. Meanwhile, the coordinate information of the annotation boxes must be adjusted linearly into the slice coordinate system.
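The effect of steps 1-5 can be reproduced with a brute-force check in place of the analytic 8-block zeroing: every candidate top-left position whose window would truncate a neighbour box is zeroed. This sketch illustrates the intent under stated assumptions (window larger than the box, (x, y, w, h) box format); it is not the patent's algorithm:

```python
import numpy as np

def crop_candidates(bbox, win_w, win_h, neighbors):
    """Candidate top-left corners (the CandMask) of a win_w x win_h crop that
    fully contains `bbox` and truncates no neighbor box. A neighbor may be
    fully contained by the crop (then it stays intact) but never cut."""
    xb, yb, wb, hb = bbox
    x0, y0 = xb + wb - win_w, yb + hb - win_h       # top-left corner of Cand
    cand_w, cand_h = win_w - wb, win_h - hb         # extent of Cand
    mask = np.full((cand_h, cand_w), 255, dtype=np.uint8)  # rows = y, cols = x
    for nx, ny, nw, nh in neighbors:
        for r in range(cand_h):
            for c in range(cand_w):
                X, Y = x0 + c, y0 + r               # window top-left in image
                ix = min(X + win_w, nx + nw) - max(X, nx)
                iy = min(Y + win_h, ny + nh) - max(Y, ny)
                intersects = ix > 0 and iy > 0
                contains = (X <= nx and Y <= ny and
                            X + win_w >= nx + nw and Y + win_h >= ny + nh)
                if intersects and not contains:     # window would cut the box
                    mask[r, c] = 0
    return x0, y0, mask
```

The analytic version in the patent zeroes the same positions sub-block by sub-block, which is much faster than this per-pixel check.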
After the detector training sample set is produced by the above algorithm, it is first divided into training and validation subsets. The training set is then used to adjust the model parameters, and the model quality is tested on the validation set every 10 steps. Training uses transfer learning: pre-training on a large dataset followed by parameter fine-tuning on the small training set of the actual application scene. Optimization uses Mini-Batch Gradient Descent, with the batch size set according to the hardware environment. After the specified number of rounds, the node with the best performance on the validation set is saved, obtaining Model_D1.
(III) Secondary training of detection model
The detection flow is applied to the high-resolution atlas; each false-detection box is added to the original slice set as a separate class, and the detection model is retrained to obtain the secondary-training model, which then replaces the target detector in the original detection framework. Whether a detection result is a false detection is judged by the intersection-over-union (IOU) of the detection box with the real (ground-truth) box. Let the detection threshold be T_IOU; then a detection box is judged a false detection if IOU < T_IOU, and a correct detection if IOU ≥ T_IOU.
Each false-detection box is taken as the center and randomly cropped using the method in (II) to obtain a negative-sample set matching the detector input size. The original detection boxes on each slice are then removed and replaced by a single box of class "normal" covering the whole slice. These annotated slices are mixed with the initial positive-sample slice set, and the initial training model Model_D1 is trained a second time on the combined dataset to obtain Model_D2. Finally, Model_S and Model_D2 are used in the detection flow.
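The IOU-based false-detection test used to mine these negative samples can be sketched as follows (the threshold value and box format are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / float(aw * ah + bw * bh - inter)

def is_false_detection(det_box, gt_boxes, t_iou=0.5):
    """A detection is a false detection when its IOU with every
    ground-truth box falls below the threshold T_IOU."""
    return all(iou(det_box, g) < t_iou for g in gt_boxes)
```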
In conclusion, the invention decomposes the small-target detection task in high-resolution images across multiple scales, solves each subtask with a different depth model, and finally fuses the multi-scale segmentation and detection results to obtain a high-precision detection result.
Examples
Fig. 1 shows a high-resolution image small target general detection framework based on computer vision and deep learning. Fig. 2 is a schematic diagram showing the flow and intermediate results of the detection framework applied to the automatic detection of the micro-defects of the high-resolution floor images in the embodiment.
In the embodiment, images were taken of the exterior walls of a residential community, with a resolution of 7952 × 5304 and a file size generally around 30 MB. 107 defect images were obtained initially, containing 195 defect targets. As shown in fig. 3, the right frame shows a defect after partial enlargement.
Firstly, a detection process is introduced, and according to the first detection step, a dual-scale pyramid is firstly constructed, as shown in fig. 2. On a large scale, non-wall regions containing no defects are set as segmentation targets, such as windows, air conditioners, sky, and the like. On a small scale, common floor defects such as brick shortage, broken bricks and the like are taken as detection targets.
According to the second detection step, the windows and the air conditioners are segmented by using Mask RCNN on a large scale, and the sky is segmented by assisting in a traditional region growing mode, so that a non-wall large-object segmentation Mask is finally obtained, as shown in fig. 4.
According to detection step three, broken bricks are detected at small scale using the SSD. Candidate boxes proposed by a 640 × 640 sliding window are first pre-screened with the wall mask; the 9 preset sampling-point positions in Proposal_M are shown in fig. 5. The sub-blocks are then detected by the SSD and the detected box coordinates are mapped back to the original image, as shown in fig. 6. The small-scale defect detection result is shown in fig. 7(a).
And according to the fourth detection step, combining the large-scale segmentation and the small-scale detection result to improve the detection precision. The final detection result after the mask secondary screening and the non-maximum suppression processing is shown in fig. 7 (b).
Next, the training process is introduced. According to training step one, the 31 samples with resolution 994 × 663 used to train the instance segmentation model Mask R-CNN are divided into a training set (25 samples) and a validation set (6 samples); the number of windows and air conditioners in each picture varies. Parameters are fine-tuned on the training set starting from COCO pre-trained model parameters. The batch size is set to 1, the initial learning rate to 0.0001, and the number of iterations to 100000 steps. The variation of the mean average precision (mAP) of the model on the validation set during training is shown in fig. 8(a). The model at step 38440, with an mAP of 96.89%, is saved as the final segmentation model; with the confidence threshold set to 0.5, part of its detection results are shown in fig. 8(b).
According to the second training step, a 640 × 640 slice atlas is first created from the high-resolution image set. The number of defects varies from picture to picture; the numbers of generated slice samples for the training, validation and test sets are 3180, 220 and 200 respectively. Transfer learning is then performed on the defect slice sample set using an SSD model pre-trained on the COCO data set, with the batch size set to 4, the learning rate to 0.00005, and 30000 iterations. The curve of the model's mAP on the validation slice set as a function of the number of iterations, with the IoU threshold set to 0.5, is shown in fig. 9 (a); it shows that the mAP on the validation slice set rises to about 0.7 and then stabilizes. The model at the highest-mAP node on the validation slice set is saved as the initially trained model; its mAP on the test slice set is 0.714. With the confidence threshold of the initially trained model set to 0.5, part of the detection results are shown in fig. 9 (b).
According to the third training step, the high-resolution images of the training and validation sets are detected with the initially trained Mask R-CNN and SSD following the detection flow; the resulting false-positive samples are cut out and added to the original slice set as negative samples, the initially trained SSD is fine-tuned a second time on this enlarged set, and the original SSD is replaced with it, completing the final construction of the detection framework. A comparison of the detection results of the initially trained SSD and the secondarily trained SSD within the detection algorithm is shown in fig. 10.
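The hard-negative collection in this step — cutting false detection frames out of the high-resolution images so they can be added to the slice set for the second round of SSD fine-tuning — could be sketched as below. The function name, the centre-and-clamp cropping policy, and the 640 slice size are assumptions for illustration; the patent only states that false-detection regions are cut out and added as negatives.

```python
import numpy as np

def crop_false_positives(image, false_boxes, out_size=640):
    """Cut each false-detection box (x, y, w, h) out of the high-resolution
    image as an out_size x out_size slice centred on the box and clamped to
    the image border. These slices serve as background-class samples for
    the second round of SSD fine-tuning."""
    H, W = image.shape[:2]
    slices = []
    for (x, y, w, h) in false_boxes:
        cx, cy = x + w // 2, y + h // 2           # box centre
        x0 = int(np.clip(cx - out_size // 2, 0, max(W - out_size, 0)))
        y0 = int(np.clip(cy - out_size // 2, 0, max(H - out_size, 0)))
        slices.append(image[y0:y0 + out_size, x0:x0 + out_size])
    return slices
```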
Finally, it should be noted that the various parameters involved in the method need to be adjusted according to the specific circumstances of the practical application. The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for automatically detecting small targets in a high-resolution image based on computer vision and deep learning is characterized by comprising the following steps:
S1 detection flow
S1.1 establishing multiscale task groups
Establishing a dual-scale image pyramid from the original high-resolution image to be detected according to Gaussian pyramid theory, and decomposing the small target detection task across the two scales to obtain a multi-scale task group; setting, at the large scale, a segmentation task for large targets that have no inclusion relation with the small targets, and setting the original small target detection task at the small scale;
S1.2 object segmentation at large scale
Under the large scale, a Mask R-CNN model is used for segmenting the large scale image, and the obtained low resolution Mask is subjected to up-sampling to recover the resolution of the original image;
S1.3 small scale target detection
Under the small scale, extracting a candidate region in the small scale image by using an overlapping sliding window, screening the candidate region according to a Mask, and sending the candidate region without intersection with a large target region in the Mask to a target detector SSD for detection; after all the candidate areas are detected, mapping the detection results from the sub-areas back to the original image;
S1.4 segmentation under dual scales and detection result fusion
Carrying out secondary screening on the detection frames obtained at the small scale by utilizing the segmentation mask obtained at the large scale; firstly, performing morphological processing on the segmentation mask, then deleting detection frames appearing in large target areas, and finally applying non-maximum suppression to fuse overlapped detection frames, obtaining the final detection result of the small targets in the high-resolution image;
S2 training procedure
S2.1 segmentation model training
The Mask R-CNN model is trained by transfer learning using the large-scale pictures and their annotation information, and the best node saved on the validation set is taken as the trained segmentation model Model_S;
S2.2 detection model initial training
Randomly cropping the area around each defect in the small-scale high-resolution image in a specific manner to obtain a slice sample set conforming to the input size of the SSD model; training an SSD model by transfer learning, and saving the best-performing node on the validation set as the trained detection model Model_D1;
S2.3 test model second training
Embedding the trained Model_S and Model_D1 into the detection flow and performing detection on the high-resolution atlas; cropping the false detection frames in the results, adding them to the original slice set as a separate class, and retraining Model_D1 with the new training set to obtain the secondarily trained Model_D2; finally, replacing the target detector Model_D1 in the original detection framework with the secondarily trained Model_D2, completing the construction of the final detection framework.
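The dual-scale pyramid of step S1.1 can be sketched as follows. As a simplification, a 2 × 2 box average stands in for the Gaussian blur-and-subsample step of a true Gaussian pyramid (practical implementations typically use a 5 × 5 Gaussian kernel, e.g. OpenCV's `pyrDown`); the function names are illustrative.

```python
import numpy as np

def downsample2x(img):
    """One pyramid level: a 2x2 box average as a stand-in for the
    Gaussian blur-then-subsample step of a true Gaussian pyramid."""
    H, W = img.shape[:2]
    img = img[:H - H % 2, :W - W % 2]  # trim to even size
    return (img[0::2, 0::2].astype(np.float64) + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def dual_scale_pyramid(img, levels=2):
    """Build the dual-scale task group: index 0 is the original image
    (the small-scale detection task), index 1 the half-resolution image
    (the large-object segmentation task)."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample2x(pyr[-1]))
    return pyr
```

The low-resolution mask produced at level 1 is later upsampled by the same factor to recover the original resolution, as stated in step S1.2.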
2. The method for automatically detecting small objects in high-resolution images based on computer vision and deep learning according to claim 1,
At the small scale, since the image resolution is much higher than the input size of the SSD detector, a sliding window of the same size as the detector input is slid with overlap over the small-scale image to extract the candidate detection regions Proposal_S; before SSD detection, the candidate regions are screened according to the mask M(x, y) as follows: the same sliding window is used to simultaneously extract the sub-region Proposal_M on M(x, y), and a candidate region is sent to the SSD only if the 9 preset sampling points in Proposal_M are all 255-valued pixels; after the window has traversed the complete picture, the detection result corresponding to each Proposal_S is obtained; assuming D_1 detection results exceed the confidence threshold, with the d-th detection frame expressed by coordinates (x_d, y_d, w_d, h_d), class c_d and confidence s_d, the detection set is DBox_1 = {((x_d, y_d, w_d, h_d), c_d, s_d) | d ∈ [1, D_1]};
These detection results are then mapped from Proposal_S back to the original image as follows: the position of the top-left corner of Proposal_S in the original image is denoted (X_ps, Y_ps); then the 4 parameters (X_d, Y_d, W_d, H_d) of the target detection frame in the original image are obtained by the coordinate transformation
X_d = X_ps + x_d
Y_d = Y_ps + y_d
W_d = w_d
H_d = h_d
and the detection set becomes DBox_2 = {((X_d, Y_d, W_d, H_d), c_d, s_d) | d ∈ [1, D_1]}.
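The coordinate transformation of claim 2 amounts to adding the window's top-left corner (X_ps, Y_ps) to each box position while leaving width and height unchanged; a minimal sketch, with an illustrative function name:

```python
def map_to_original(dets, window_origin):
    """Map detections from a sub-region Proposal_S back to the original
    image: X_d = X_ps + x_d, Y_d = Y_ps + y_d; width and height are kept.
    Each detection is ((x, y, w, h), class, score)."""
    Xps, Yps = window_origin
    return [((Xps + x, Yps + y, w, h), c, s)
            for ((x, y, w, h), c, s) in dets]
```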
3. The method for automatically detecting small targets in high-resolution images based on computer vision and deep learning according to claim 1, wherein after segmentation and detection at the multiple scales are completed, the multi-scale information is fused to improve the detection precision of the small targets; the segmentation mask obtained at the large scale is used to perform secondary screening on the detection result set DBox_2 obtained at the small scale; first, the segmentation mask is eroded to shrink the large target regions; then, for each detection frame, it is judged whether the mask pixel values at its 4 corner points are all 0: if so, the frame is kept, otherwise it is discarded; assuming the number of detection results after screening is D_2, the detection set becomes DBox_3 = {((X_g, Y_g, W_g, H_g), c_g, s_g) | g ∈ [1, D_2]}.
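The erosion and 4-corner screening of claim 3 can be sketched as follows. The 3 × 3 structuring element and the function names are assumptions for illustration; the claim does not fix the erosion kernel.

```python
import numpy as np

def erode3x3(mask):
    """3x3 binary erosion via shifted minima: shrinks the large-object
    regions so boxes merely touching their border are not discarded."""
    H, W = mask.shape
    p = np.pad(mask, 1, constant_values=0)
    shifts = [p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)]
    return np.minimum.reduce(shifts)

def screen_by_mask(boxes, mask):
    """Keep a box ((x, y, w, h), class, score) only if the mask value at
    all 4 of its corner points is 0, i.e. no corner falls inside a
    large-object region."""
    kept = []
    for (x, y, w, h), c, s in boxes:
        corners = [(y, x), (y, x + w - 1), (y + h - 1, x),
                   (y + h - 1, x + w - 1)]
        if all(mask[cy, cx] == 0 for cy, cx in corners):
            kept.append(((x, y, w, h), c, s))
    return kept
```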
4. The method as claimed in claim 1, wherein, in order to suppress the overlapping of detection frames caused by the overlapped sliding of the sliding window, a non-maximum suppression algorithm is applied to further fuse the detection frames, yielding a set of F final detection results DBox_final = {((X_f, Y_f, W_f, H_f), c_f, s_f) | f ∈ [1, F]}.
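The non-maximum suppression of claim 4 can be sketched as a standard greedy NMS; the 0.5 IoU threshold and the flat (x, y, w, h, score) tuple layout are illustrative assumptions, not fixed by the claim.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h, score) boxes."""
    ax, ay, aw, ah, _ = a
    bx, by, bw, bh, _ = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def nms(boxes, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop all
    remaining boxes that overlap it by more than iou_thresh."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)
        kept.append(best)
        boxes = [b for b in boxes if iou(best, b) <= iou_thresh]
    return kept
```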
5. The method for automatically detecting small targets in high-resolution images based on computer vision and deep learning according to claim 1, wherein the segmentation model Mask R-CNN is trained with the large-scale picture set and its annotation information; first, the sample set is divided into training and validation subsets; then the model parameters are adjusted with the training set, and the model quality is tested on the validation set every 10 steps; the training adopts transfer learning, i.e. pre-training on a large data set followed by parameter fine-tuning on the small training set of the actual application scene; mini-batch gradient descent is used to train for the designated number of rounds, and finally the best node on the validation set is saved to obtain Model_S.
6. The method for automatically detecting small targets in a high-resolution image based on computer vision and deep learning according to claim 1, wherein the area around each defect in the small-scale high-resolution image is randomly cropped to obtain a slice sample set conforming to the input size of the SSD model; to ensure that every annotation box in a slice is complete and not truncated, a random cropping algorithm is designed as follows: let W_w and H_w be the width and height of the window, let the annotation box BBox_i have width and height W_bi and H_bi and top-left corner coordinates (x_bi, y_bi), where i ∈ [0, N].
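The valid range for the window's top-left corner follows directly from the two constraints of claim 6 — the window must fully contain BBox_i and must stay inside the image. A minimal sketch under those constraints, with illustrative names:

```python
import random

def random_crop_origin(bbox, img_w, img_h, win_w=640, win_h=640, rng=random):
    """Sample a window top-left corner (x0, y0) such that the annotation
    box bbox = (xb, yb, wb, hb) lies entirely inside the win_w x win_h
    window and the window stays inside the image.
    Returns None if no such placement exists (box larger than window)."""
    xb, yb, wb, hb = bbox
    # containment: x0 <= xb and x0 + win_w >= xb + wb; plus image bounds
    x_lo, x_hi = max(0, xb + wb - win_w), min(xb, img_w - win_w)
    y_lo, y_hi = max(0, yb + hb - win_h), min(yb, img_h - win_h)
    if x_lo > x_hi or y_lo > y_hi:
        return None
    return rng.randint(x_lo, x_hi), rng.randint(y_lo, y_hi)
```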
CN202010346094.3A 2020-04-27 2020-04-27 Automatic small target detection method in high-resolution image based on computer vision and deep learning Pending CN111582093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010346094.3A CN111582093A (en) 2020-04-27 2020-04-27 Automatic small target detection method in high-resolution image based on computer vision and deep learning

Publications (1)

Publication Number Publication Date
CN111582093A true CN111582093A (en) 2020-08-25

Family

ID=72124530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010346094.3A Pending CN111582093A (en) 2020-04-27 2020-04-27 Automatic small target detection method in high-resolution image based on computer vision and deep learning

Country Status (1)

Country Link
CN (1) CN111582093A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183463A (en) * 2020-10-23 2021-01-05 珠海大横琴科技发展有限公司 Ship identification model verification method and device based on radar image
CN112446379A (en) * 2021-02-01 2021-03-05 清华大学 Self-adaptive intelligent processing method for dynamic large scene
CN112927247A (en) * 2021-03-08 2021-06-08 常州微亿智造科技有限公司 Graph cutting method based on target detection, graph cutting device and storage medium
CN113222889A (en) * 2021-03-30 2021-08-06 大连智慧渔业科技有限公司 Industrial aquaculture counting method and device for aquatic aquaculture objects under high-resolution images
CN113591668A (en) * 2021-07-26 2021-11-02 南京大学 Wide-area unknown dam automatic detection method using deep learning and spatial analysis
CN113781502A (en) * 2021-09-30 2021-12-10 浪潮云信息技术股份公司 Method for preprocessing image training data with ultra-large resolution
CN113989744A (en) * 2021-10-29 2022-01-28 西安电子科技大学 Pedestrian target detection method and system based on oversized high-resolution image
CN114092364A (en) * 2021-08-12 2022-02-25 荣耀终端有限公司 Image processing method and related device
CN114120220A (en) * 2021-10-29 2022-03-01 北京航天自动控制研究所 Target detection method and device based on computer vision
CN116503607A (en) * 2023-06-28 2023-07-28 天津市中西医结合医院(天津市南开医院) CT image segmentation method and system based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109859171A (en) * 2019-01-07 2019-06-07 北京工业大学 A kind of flooring defect automatic testing method based on computer vision and deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUANG-MIN SUN et al.: "Small Object Detection in High-Resolution Images Based on Multiscale Detection and Re-training" *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183463B (en) * 2020-10-23 2021-10-15 珠海大横琴科技发展有限公司 Ship identification model verification method and device based on radar image
CN112183463A (en) * 2020-10-23 2021-01-05 珠海大横琴科技发展有限公司 Ship identification model verification method and device based on radar image
CN112446379A (en) * 2021-02-01 2021-03-05 清华大学 Self-adaptive intelligent processing method for dynamic large scene
CN112927247A (en) * 2021-03-08 2021-06-08 常州微亿智造科技有限公司 Graph cutting method based on target detection, graph cutting device and storage medium
CN113222889A (en) * 2021-03-30 2021-08-06 大连智慧渔业科技有限公司 Industrial aquaculture counting method and device for aquatic aquaculture objects under high-resolution images
CN113222889B (en) * 2021-03-30 2024-03-12 大连智慧渔业科技有限公司 Industrial aquaculture counting method and device for aquaculture under high-resolution image
CN113591668A (en) * 2021-07-26 2021-11-02 南京大学 Wide-area unknown dam automatic detection method using deep learning and spatial analysis
CN113591668B (en) * 2021-07-26 2023-11-21 南京大学 Wide area unknown dam automatic detection method using deep learning and space analysis
CN114092364B (en) * 2021-08-12 2023-10-03 荣耀终端有限公司 Image processing method and related device
CN114092364A (en) * 2021-08-12 2022-02-25 荣耀终端有限公司 Image processing method and related device
CN113781502A (en) * 2021-09-30 2021-12-10 浪潮云信息技术股份公司 Method for preprocessing image training data with ultra-large resolution
CN114120220A (en) * 2021-10-29 2022-03-01 北京航天自动控制研究所 Target detection method and device based on computer vision
CN113989744A (en) * 2021-10-29 2022-01-28 西安电子科技大学 Pedestrian target detection method and system based on oversized high-resolution image
CN116503607B (en) * 2023-06-28 2023-09-19 天津市中西医结合医院(天津市南开医院) CT image segmentation method and system based on deep learning
CN116503607A (en) * 2023-06-28 2023-07-28 天津市中西医结合医院(天津市南开医院) CT image segmentation method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN111582093A (en) Automatic small target detection method in high-resolution image based on computer vision and deep learning
US20210319561A1 (en) Image segmentation method and system for pavement disease based on deep learning
CN112232391B (en) Dam crack detection method based on U-net network and SC-SAM attention mechanism
CN109409263B (en) Method for detecting urban ground feature change of remote sensing image based on Siamese convolutional network
CN109978839B (en) Method for detecting wafer low-texture defects
CN108921799B (en) Remote sensing image thin cloud removing method based on multi-scale collaborative learning convolutional neural network
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN110163213B (en) Remote sensing image segmentation method based on disparity map and multi-scale depth network model
CN115797350B (en) Bridge disease detection method, device, computer equipment and storage medium
CN111754538B (en) Threshold segmentation method for USB surface defect detection
CN110889863A (en) Target tracking method based on target perception correlation filtering
CN111353396A (en) Concrete crack segmentation method based on SCSEOCUnet
CN111986164A (en) Road crack detection method based on multi-source Unet + Attention network migration
CN117495735B (en) Automatic building elevation texture repairing method and system based on structure guidance
CN112669301B (en) High-speed rail bottom plate paint removal fault detection method
CN115829995A (en) Cloth flaw detection method and system based on pixel-level multi-scale feature fusion
CN116403109A (en) Building identification and extraction method and system based on improved neural network
CN116740528A (en) Shadow feature-based side-scan sonar image target detection method and system
CN117541652A (en) Dynamic SLAM method based on depth LK optical flow method and D-PROSAC sampling strategy
CN111222514B (en) Local map optimization method based on visual positioning
CN116205876A (en) Unsupervised notebook appearance defect detection method based on multi-scale standardized flow
CN113610024B (en) Multi-strategy deep learning remote sensing image small target detection method
CN115457044A (en) Pavement crack segmentation method based on class activation mapping
CN113158856B (en) Processing method and device for extracting target area in remote sensing image
CN113160078B (en) Method, device and equipment for removing rain from traffic vehicle image in rainy day and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825