CN111582093A

CN111582093A - An automatic detection method for small objects in high-resolution images based on computer vision and deep learning

Info

Publication number: CN111582093A
Application number: CN202010346094.3A
Authority: CN
Inventors: 孙光民; 陈佳阳; 李煜; 林朋飞; 朱美龙; 梁浩
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-04-27
Filing date: 2020-04-27
Publication date: 2020-08-25

Abstract

The invention discloses an automatic detection method for small objects in high-resolution images based on computer vision and deep learning. Then, the low-resolution detectors are trained at different scales and applied for detection, and the detection results at different scales are obtained. Finally, these detection results are fused to obtain the final detection results for small targets. The invention solves the problem that it is difficult for the target detector in the prior art to detect tiny targets in high-resolution images.

Description

A self-portrait of small objects in high-resolution images based on computer vision and deep learning Motion detection method

技术领域technical field

本发明属于目标检测技术，尤其涉及一种基于计算机视觉和深度学习的高分辨率图像中微小目标自动检测方法。The invention belongs to target detection technology, in particular to an automatic detection method for tiny targets in high-resolution images based on computer vision and deep learning.

背景技术Background technique

目前应用比较广泛的基于深度学习的目标检测器主要可以分为两类：第一类是两步(two stage)目标检测器，如Fast R-CNN，Faster R-CNN，Mask R-CNN等，这些算法特点都是将目标检测分为两个阶段：首先提取候选区域，然后再将其送入检测网络完成对目标的定位与识别。第二类是单步(one stage)目标检测算法，如Single Shot Detection(SSD)、You Look Only Once(YOLO)，YOLO 9000，YOLO V3等，此类算法不需要预先提取候选框，而是直接通过网络中预设框来完成目标位置的回归和类别的判断，是一种端到端的目标检测算法。在待检测目标尺度较大且不密集的场景下，两步目标检测器和单步目标检测器都具有较高的检测精度，而后者相较前者有更快的检测速度。然而由于受到目标实际大小、拍摄设备、拍摄距离、观测尺度等因素的影响，真实目标在图像中往往会表现为小目标。与大目标相比，小目标的像素更少、可提取的特征也不明显。在对这些小目标进行检测时，无论是两步还是单步检测器都难以取得较好的检测效果。Currently widely used target detectors based on deep learning can be mainly divided into two categories: the first category is two-stage target detectors, such as Fast R-CNN, Faster R-CNN, Mask R-CNN, etc. The characteristics of these algorithms are to divide the target detection into two stages: first extract the candidate area, and then send it to the detection network to complete the target location and recognition. The second category is one-stage target detection algorithms, such as Single Shot Detection (SSD), You Look Only Once (YOLO), YOLO 9000, YOLO V3, etc. Such algorithms do not need to pre-extract candidate frames, but directly It is an end-to-end target detection algorithm to complete the target position regression and category judgment through the preset frame in the network. In the scene where the scale of the target to be detected is large and not dense, both the two-step target detector and the one-step target detector have higher detection accuracy, while the latter has a faster detection speed than the former. However, due to the influence of the actual size of the target, shooting equipment, shooting distance, observation scale and other factors, the real target often appears as a small target in the image. Compared with large objects, small objects have fewer pixels and less obvious features to extract. When detecting these small targets, it is difficult to achieve better detection results with either two-step or single-step detectors.

目前针对小目标检测算法的优化主要集中在模型的改进上，即在输入低分辨率图像尺寸不变的前提上，通过改进模型结构提升检测器的特征提取能力以及其检测精度。目前比较有效的改进算法是特征金字塔网络(Feature Pyramid Networks)。该网络可嵌入到上述两步、单步检测器中，其可将主体网络生成的低层次特征图与高层次特征图以特定方式进行融合，完成对特征金字塔的重构。这样操作后低层次的特征图感受野范围提升，其语义信息得到增强，最终使得模型对小目标检测的精度有了很大提升。At present, the optimization of the small target detection algorithm mainly focuses on the improvement of the model, that is, on the premise that the input low-resolution image size remains unchanged, the feature extraction ability and detection accuracy of the detector can be improved by improving the model structure. The most effective improved algorithm at present is Feature Pyramid Networks. The network can be embedded in the above two-step, single-step detector, which can fuse the low-level feature map and high-level feature map generated by the main network in a specific way to complete the reconstruction of the feature pyramid. After this operation, the receptive field range of the low-level feature map is improved, and its semantic information is enhanced, which finally greatly improves the accuracy of the model for small target detection.

虽然上述改进可以提升模型检测精度，但是这些模型处理的对象仍然是低分辨率图像。随着摄像设备硬件性能的提升，人们可以获得更高分辨率的图像。而与低分辨率图像相比，小目标在高分辨率图像中可以用更多的像素来表征，即可以被更加清晰的刻画出来。这一特点为小目标检测任务提供了有效的数据支撑。虽然获得了数据上的支撑，但目前的检测算法基本都不适用于分辨率高达几千万像素的图像。但如果将高分辨率图像进行下采样以适应检测模型，又将丢失信息，无法充分地利用高分辨率图像的特点，最终也很难对小目标进行检测。针对高分辨率图像小目标检测问题，基于高分辨率卫星影像的目标检测流程Satellite Imagery Multiscale Rapid Detection with Windowed Networks(SIMRDWN)利用快速检测器对通过滑窗获取的候选区域进行检测，可以完成对任意尺寸高分辨率图像的快速检测任务。但是该方法检测精度低、虚警多，且执行时间较长。针对此问题，本专利提出了一种简单有效的办法来解决高分辨率图像中小目标检测的问题。该算法将原始检测任务在图像的不同尺度上进行拆分，得到具有逻辑关联的多尺度检测任务组；然后分别针对不同尺度下的检测任务训练相应的低分辨率目标检测器；最后将各尺度下的检测结果进行融合得到最终的缺陷标检测结果。研究目的在于构建基于深度神经网络的模式识别框架，探索基于高分辨率图像的小目标检测方法。Although the above improvements can improve the model detection accuracy, the objects processed by these models are still low-resolution images. With the improvement of the hardware performance of camera equipment, people can obtain higher resolution images. Compared with low-resolution images, small objects can be represented by more pixels in high-resolution images, that is, they can be more clearly depicted. This feature provides effective data support for small target detection tasks. Although supported by the data, the current detection algorithms are basically not suitable for images with a resolution of up to tens of millions of pixels. However, if the high-resolution image is downsampled to adapt to the detection model, information will be lost, the characteristics of the high-resolution image cannot be fully utilized, and it is difficult to detect small objects in the end. Aiming at the problem of small target detection in high-resolution images, the target detection process based on high-resolution satellite images Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) uses a fast detector to detect candidate regions obtained through sliding windows, which can complete any A fast detection task for high-resolution images of size. However, this method has low detection accuracy, many false alarms, and long execution time. In response to this problem, this patent proposes a simple and effective method to solve the problem of small target detection in high-resolution images. The algorithm splits the original detection task on different scales of the image to obtain a multi-scale detection task group with logical correlation; then trains corresponding low-resolution object detectors for detection tasks at different scales; The following detection results are fused to obtain the final defect mark detection result. The purpose of the research is to construct a pattern recognition framework based on deep neural network and explore small target detection methods based on high-resolution images.

发明内容SUMMARY OF THE INVENTION

针对现有技术中的问题，本发明提供一种高分辨率图像中小目标自动检测方法，该方法解决了现有技术中目标检测器难以在高分辨率图像进行微小目标检测的难题。Aiming at the problems in the prior art, the present invention provides an automatic detection method for small objects in high-resolution images, which solves the problem that it is difficult for object detectors in the prior art to detect tiny objects in high-resolution images.

本方法分为检测和训练流程，一种基于计算机视觉和深度学习的高分辨率图像中小目标自动检测方法，其特征在于，步骤如下：The method is divided into detection and training processes, an automatic detection method for small targets in high-resolution images based on computer vision and deep learning, and is characterized in that the steps are as follows:

S1检测流程S1 detection process

S1.1建立多尺度任务组S1.1 Establish a multiscale task force

依据高斯金字塔理论对原始待检测高分辨率图像建立双尺度图像金字塔，并将小目标检测任务在这两个尺度下进行适当分解得到多尺度任务组。其中，在大尺度下设置与小目标没有包含关系的大目标分割任务，而在小尺度下设置原始小目标检测任务。According to the Gaussian pyramid theory, a dual-scale image pyramid is established for the original high-resolution image to be detected, and the small target detection task is appropriately decomposed at these two scales to obtain a multi-scale task group. Among them, the large-scale object segmentation task that has no inclusion relationship with the small object is set at the large scale, and the original small-object detection task is set at the small scale.

S1.2大尺度下的目标分割S1.2 Object segmentation at large scale

在大尺度下，利用Mask R-CNN模型对大尺度图像进行实例分割，并将得到的低分辨率掩膜进行上采样恢复成原始图像分辨率大小。At large scales, the Mask R-CNN model is used to segment large-scale images, and the resulting low-resolution masks are upsampled to restore the original image resolution.

S1.3小尺度下目标检测S1.3 Target detection at small scale

在小尺度下，利用重叠滑窗提取小尺度图像中的候选区域，并依据掩膜Mask对候选区域进行筛选，将与Mask中大目标区域没有交集的候选区域送入目标检测器SSD检测。在对所有候选区域检测完成后，将这些检测结果从子区域映射回原图。In the small scale, the overlapping sliding window is used to extract the candidate regions in the small scale image, and the candidate regions are screened according to the mask, and the candidate regions that have no intersection with the large target region in the Mask are sent to the target detector SSD for detection. After the detection of all candidate regions is completed, these detection results are mapped from the sub-regions back to the original image.

S1.4双尺度下分割与检测结果融合S1.4 Fusion of segmentation and detection results under dual scales

利用大尺度下得到的分割掩膜对小尺度下得到的检测框进行二次筛选。首先对分割掩膜进行形态学处理，然后将出现在大目标区域的检测框删除，最后应用非极大值抑制对重叠检测框进行融合，得到高分辨率图像中小目标的最终检测结果。Using the segmentation mask obtained at the large scale, the detection frame obtained at the small scale is used for secondary screening. First, the segmentation mask is morphologically processed, then the detection frames appearing in the large target area are deleted, and finally, the overlapping detection frames are fused by non-maximum suppression to obtain the final detection results of small targets in high-resolution images.

S2训练流程S2 training process

S2.1分割模型训练S2.1 Segmentation model training

利用大尺度下的图片及标注信息，以迁移学习的方式训练Mask RCNN模型，保存在验证集上表现最好的节点作为训练好的分割模型Model_S。Using large-scale images and annotation information, the Mask RCNN model is trained by transfer learning, and the best performing node on the validation set is saved as the trained segmentation model Model _S.

S2.2检测模型初次训练S2.2 Initial training of detection model

对小尺度高分辨率图像中的缺陷周边区域按特定方式进行随机裁剪，得到符合SSD模型输入尺寸大小的切片样本集。以迁移学习的方式训练SSD模型，保存在验证集上表现最好的节点作为训练好的分割模型Model_D1。The defect surrounding area in the small-scale high-resolution image is randomly cropped in a specific way to obtain a slice sample set that conforms to the input size of the SSD model. The SSD model is trained by transfer learning, and the best performing node on the validation set is saved as the trained segmentation model Model _D1 .

S2.3检测模型二次训练S2.3 Secondary training of detection model

将训练好的Model_S和Model_D1嵌入到上述检测流程中，并依照其对高分辨率图集进行检测。将结果中的误检框剪裁出来作为单独一类加入到原始切片集中，并使用该新训练集对检测模型Model_D1进行重新训练，得到二次训练模型Model_D2。最后用该二次训练模型Model_D2替换原检测框架中的目标检测器Model_D1，完成最终检测框架的搭建。The trained Model _S and Model _D1 are embedded in the above detection process, and the high-resolution atlas is detected according to them. The false detection frame in the result is cut out as a separate class and added to the original slice set, and the detection model Model _D1 is retrained using the new training set to obtain the secondary training model Model _D2 . Finally, the target detector Model _D1 in the original detection framework is replaced with the secondary training model Model _D2 to complete the construction of the final detection framework.

综上所述，本发明利用一种多尺度分割与检测模型结合的方式完成对高分辨率图像中小目标的自动检测。同时，本发明的主要工作是联合深度学习模型在多个尺度上对图片内容进行理解，利用不同信息提升检测精度并加快检测效率。To sum up, the present invention uses a combination of multi-scale segmentation and detection model to complete the automatic detection of small targets in high-resolution images. At the same time, the main work of the present invention is to combine the deep learning model to understand the picture content on multiple scales, and to use different information to improve the detection accuracy and speed up the detection efficiency.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其它的附图。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without any creative effort.

图1为本发明提出的高分辨率图像小目标检测方法流程示意图；1 is a schematic flowchart of a method for detecting small objects in a high-resolution image proposed by the present invention;

图2为本发明实施例提供的楼面高分辨率图像微小缺陷检测方法流程图；2 is a flowchart of a method for detecting small defects in a high-resolution image of a floor provided by an embodiment of the present invention;

图3为本发明实施例提供的包含缺陷的楼面高分辨率图像；3 is a high-resolution image of a floor including defects provided by an embodiment of the present invention;

图4为本发明实施例提供的大尺度下的非墙体区域分割算法；FIG. 4 is a large-scale non-wall area segmentation algorithm provided by an embodiment of the present invention;

图5为本发明实施例提供的窗口内采样方式示意；FIG. 5 is a schematic diagram of an in-window sampling method provided by an embodiment of the present invention;

图6为本发明实施例提供的小尺度下的缺陷检测算法；FIG. 6 is a defect detection algorithm at a small scale provided by an embodiment of the present invention;

图7为本发明实施例提供的单尺度与多尺度融合检测效果对比。(a)为小尺度上的检测结果；(b)为融合分割掩膜后的小尺度检测结果；FIG. 7 is a comparison of single-scale and multi-scale fusion detection effects provided by an embodiment of the present invention. (a) is the detection result on the small scale; (b) is the small scale detection result after the fusion segmentation mask;

图8为本发明实施例提供的Mask-RCNN的训练结果。(a)为验证集上的mAP变化；(b)为模型检测效果示意图；FIG. 8 is a training result of Mask-RCNN provided by an embodiment of the present invention. (a) is the mAP change on the validation set; (b) is a schematic diagram of the model detection effect;

图9为本发明实施例提供的SSD的训练结果。(a)为验证集上的mAP变化；(b)为模型检测效果示意图；FIG. 9 is a training result of an SSD provided by an embodiment of the present invention. (a) is the mAP change on the validation set; (b) is a schematic diagram of the model detection effect;

图10为本发明实施例提供的二次训练前后检测算法在验证集图像上的表现。(a)为二次训练前的检测结果1；(b)为二次训练后的检测结果1；(c)为二次训练前的检测结果2；(d)为二次训练后的检测结果2。FIG. 10 is the performance of the detection algorithm before and after the secondary training provided by the embodiment of the present invention on the images of the validation set. (a) is the test result 1 before the second training; (b) is the test result 1 after the second training; (c) is the test result 2 before the second training; (d) is the test result after the second training 2.

具体实施方式Detailed ways

为了更好的解释本发明，以便于理解，下面结合附图，通过具体实施方式，对本发明作详细描述。In order to better explain the present invention and facilitate understanding, the present invention will be described in detail below with reference to the accompanying drawings and through specific embodiments.

在以下的描述中，将描述本发明的多个不同的方面，然而，对于本领域内的普通技术人员而言，可以仅仅利用本发明的一些或者全部结构或者流程来实施本发明。为了解释的明确性而言，阐述了特定的数目、配置和顺序，但是很明显，在没有这些特定细节的情况下也可以实施本发明。在其它情况下，为了不混淆本发明，对于一些众所周知的特征将不再进行详细阐述。In the following description, various aspects of the present invention will be described, however, to those of ordinary skill in the art, the present invention may be practiced using only some or all of the structures or processes of the present invention. For clarity of explanation, specific numbers, configurations, and orders are set forth, but it will be apparent that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail in order not to obscure the present invention.

针对现有小目标检测技术大部分是针对模型改进而难以处理高分辨率图像的问题，本发明提出一种基于计算机视觉和深度学习的高分辨率图像中小目标自动检测方法。通过依靠多尺度任务分解，联合大尺度分割和小尺度检测结果，最终实现高精度、高速度的高分辨率图像中微小目标的自动检测。Aiming at the problem that most of the existing small target detection technologies are aimed at model improvement and are difficult to process high-resolution images, the present invention proposes an automatic detection method for small targets in high-resolution images based on computer vision and deep learning. By relying on multi-scale task decomposition and combining large-scale segmentation and small-scale detection results, the automatic detection of small objects in high-precision and high-speed high-resolution images is finally realized.

一.检测流程1. Detection process

(一)建立多尺度任务组(1) Establish a multi-scale task force

依据高斯金字塔理论对原始待检测高分辨率图像建立包含一大一小两个尺度的图像金字塔。大尺度图像是先通过对原图像做高斯滤波，再经由8倍下采样得到的。生成的大尺度图像的宽和高皆为原图的1/8。According to the Gaussian pyramid theory, an image pyramid with two scales, one large and one small, is established for the original high-resolution image to be detected. Large-scale images are first obtained by Gaussian filtering of the original image, and then 8 times downsampling. The width and height of the generated large-scale image are both 1/8 of the original image.

其中二维高斯核函数如下所示：The two-dimensional Gaussian kernel function is as follows:

利用尺寸为5*5的高斯核模板G(x，y)与原图像I(x，y)做卷积操作，并进行8倍下采样，得到大尺度图像L(x，y)，其公式表述如下：Use the Gaussian kernel template G(x, y) of size 5*5 to perform convolution operation with the original image I(x, y), and perform 8 times downsampling to obtain a large-scale image L(x, y), the formula It is expressed as follows:

而小尺度即为图像的原始尺度，故小尺度图像S(x，y)＝I(x，y)不需要对原图进行任何操作即可获得。The small scale is the original scale of the image, so the small scale image S(x, y)=I(x, y) can be obtained without any operation on the original image.

针对大尺度图像L(x，y)，设定目标分割任务。分割目标的选定应该是不包含待检测小目标的尺寸较大的目标，且这些目标能在该大尺度图像中仍保留着主要的特征。而针对小尺度图像S(x，y)，仍然设定小目标检测任务。For a large-scale image L(x, y), a target segmentation task is set. The selection of segmentation targets should be large targets that do not contain small targets to be detected, and these targets can still retain the main features in the large-scale image. For the small-scale image S(x, y), the small target detection task is still set.

(二)大尺度下的目标分割(2) Target segmentation at large scale

在大尺度下，利用已经训练好的Mask R-CNN模型对大尺度图像中设定好的大目标进行分割，得到与大尺度图像L(x，y)分辨率相同的二值掩膜图像Ms(x，y)。为了恢复成原始图像I(x，y)的分辨率大小，方便后续步骤的使用，还需要将得到的低分辨率掩膜Ms(x，y)进行8倍上采样得到M(x，y)。At large scale, the trained Mask R-CNN model is used to segment the large target set in the large scale image, and the binary mask image Ms with the same resolution as the large scale image L(x, y) is obtained. (x, y). In order to restore the resolution size of the original image I(x, y) to facilitate the use of subsequent steps, it is also necessary to upsample the obtained low-resolution mask Ms(x, y) by 8 times to obtain M(x, y) .

M(x，y)＝8↑Ms(x，y)M(x,y)=8↑Ms(x,y)

其中插值方式采用最临近插值法。在得到的高分辨率掩膜图像M(x，y)中，大目标区域的像素值为255，其他区域像素值为0。同时，大尺度下的分割手段不限于深度模型，也可以辅以传统图像处理方法。The interpolation method adopts the nearest neighbor interpolation method. In the obtained high-resolution mask image M(x, y), the pixel value of the large target area is 255, and the pixel value of other areas is 0. At the same time, segmentation methods at large scales are not limited to depth models, but can also be supplemented by traditional image processing methods.

(三)小尺度下目标检测(3) Target detection at small scales

在小尺度下，由于其图像分辨率远远高于SSD检测器的输入尺寸，故利用一个与检测器输入尺寸一致的滑窗在小尺度图像上重叠滑动以提取小尺度图像中的候选检测区域Proposal_S。在送入SSD检测前，先依据掩膜M(x，y)对候选区域进行筛选，筛选方式如下所述：利用一个同样的滑窗在M(x，y)上同步提取子区域Proposal_M，如果Proposal_M中9个预先设定的采样点皆为255的像素，即认为该区域主要为大目标区域，那么就不对其进行后续检测。在窗口遍历完整张图片后，将会得到每个Proposal_S对应的检测结果。设大于置信度阈值的检测结果有D₁个，第d个检测框可表示为坐标(x_d，y_d，w_d，h_d)、类别c_d以及置信度s_d，则检测集合DBox₁＝{((x_d，y_d，w_d，h_d)，c_d，s_d)|d∈[1，D₁]}。At small scales, since the image resolution is much higher than the input size of the SSD detector, a sliding window that is consistent with the input size of the detector is used to overlap and slide on the small-scale image to extract candidate detection regions in the small-scale image. Proposal _S. Before being sent to the SSD for detection, the candidate regions are first screened according to the mask _M (x, y). If the 9 preset sampling points in Proposal _M are all 255 pixels, that is, the area is considered to be mainly a large target area, then subsequent detection will not be performed. After the window traverses the entire image, the detection result corresponding to each Proposal _S will be obtained. There are D ₁ detection results that are greater than the confidence threshold, and the d-th detection box can be expressed as coordinates (x _d , y _d , w _d , h _d ), category _{cd and confidence s d} _, then the detection set DBox ₁ ={((x _d , y _d , w _d , h _d ), c _d , s _d )|d∈[1, D ₁ ]}.

接着将这些检测结果从Proposal_S映射回原图，映射方法如下所述：设Proposal_S在原始图像中的左上角点的位置被记录为(X_ps，Y_ps)。那么依据坐标变换，可以得到目标检测框对应于原图像的4个参数(X_d，Y_d，W_d，H_d)These detection results are then mapped from the Proposal _S back to the original image. The mapping method is as follows: Let the position of the upper left corner of the Proposal _S in the original image be recorded as (X _ps , Y _ps ). Then according to the coordinate transformation, the four parameters (X _d , Y _d , W _d , H _d ) of the target detection frame corresponding to the original image can be obtained

X_d＝X_ps+x_d X _d =X _ps +x _d

Y_d＝Y_ps+y_d Y _d =Y _ps +y _d

W_d＝w_d W _d = w _d

H_d＝h_d H _d = h _d

检测集合变为DBox₂＝{((X_d，Y_d，W_d，H_d)，c_d，s_d)|d∈[1，D₁]}。The detection set becomes DBox ₂ ={((X _d , Y _d , W _d , H _d ), c _d , s _d )|d∈[1, D ₁ ]}.

(四)双尺度下分割与检测结果融合(4) Fusion of segmentation and detection results under dual scales

在完成多尺度下的分割与检测后，可以将多尺度信息融合起来提高小目标的检测精度。利用大尺度下得到的分割掩膜可以对小尺度下得到的检测结果集合DBox₁进行二次筛选。首先对分割掩膜进行腐蚀处理，使得大目标区域的面积缩小。然后将判断检测框的4个角点的像素值是否全部为0，如果是则保留，否则就舍弃该检测框。设筛选后的检测结果个数变为D₂个，则检测集合变为DBox₂＝{((X_g，Y_g，W_g，H_g)，c_g，s_g)|g∈[1，D₂]}。After completing the multi-scale segmentation and detection, the multi-scale information can be fused to improve the detection accuracy of small objects. Using the segmentation mask obtained at the large scale, the detection result set DBox ₁ obtained at the small scale can be used for secondary screening. First, the segmentation mask is etched so that the area of the large target area is reduced. Then it will be judged whether the pixel values of the four corners of the detection frame are all 0, if so, keep it, otherwise the detection frame will be discarded. Assuming that the number of filtered detection results becomes D ₂ , the detection set becomes DBox ₂ ={((X _g , Y _g , W _g , H _g ), c _g , s _g )|g∈[1, D ₂ ]}.

为了抑制由滑窗重叠滑动引起的检测框的重叠现象，应用非极大值抑制算法完成对检测框的进一步融合，得到含有F个最终检测结果的集合DBox_final＝{((X_f，Y_f，W_f，H_f)，c_f，s_f)|g∈[1，F]}。现在假设有一个候选的boxes的集合B和其对应的scores集合S，非极大值抑制的算法如下：In order to suppress the overlapping phenomenon of the detection frames caused by the overlapping sliding of the sliding window, the non-maximum value suppression algorithm is used to complete the further fusion of the detection frames, and a set containing F final detection results is obtained DBox _final = {((X _f , Y _f , W _f , H _f ), c _f , s _f )|g∈[1, F]}. Now suppose there is a set B of candidate boxes and its corresponding set of scores S, the algorithm of non-maximum suppression is as follows:

1.找出DBox₂中置信度s最高的元素dbox；1. Find the element dbox with the highest confidence s in DBox ₂ ;

2.将dbox从DBox₂中删除，并加入到DBox_final中；2. Delete dbox from DBox ₂ and add it to DBox _final ;

3.从DBox₂中删除与dbox坐标框重叠区域大于阈值N_t的其他框；3. Delete other boxes whose overlapping area with the dbox coordinate frame is greater than the threshold N _t from DBox ₂ ;

4.重复上述步骤1-3。4. Repeat steps 1-3 above.

二.训练流程2. Training process

(一)分割模型训练(1) Segmentation model training

以大尺度图片集及其标注信息来训练实例分割模型Mask RCNN。首先将该样本集分为训练与验证两个子集。然后利用训练集进行模型参数的调整，并每隔10步在验证集上测试模型的优劣。训练方式采用的是迁移学习，即先在大数据集上预训练，再在实际应用场景的小训练集上进行参数微调。优化方式采用小批量梯度下降法(Mini-Batch GradientDescent)，批量大小(Batch size)根据硬件环境进行设定。训练指定轮数后，最后保存出验证集上表现最好的节点。得到Model_S。The instance segmentation model Mask RCNN is trained with a large-scale image set and its annotation information. First, the sample set is divided into two subsets, training and validation. Then use the training set to adjust the model parameters, and test the pros and cons of the model on the validation set every 10 steps. The training method adopts transfer learning, that is, pre-training on a large data set, and then fine-tuning the parameters on a small training set of practical application scenarios. The optimization method adopts Mini-Batch GradientDescent, and the batch size is set according to the hardware environment. After training for a specified number of rounds, the best performing node on the validation set is finally saved. Get a Model _S.

(二)检测模型的样本初次训练(2) Initial training of samples of the detection model

与分割模型的训练不同，检测模型的训练需要先对样本进行处理。对小尺度高分辨率图像中的缺陷周边区域进行随机裁剪，得到符合SSD模型输入尺寸大小的切片样本集。为了保证切片中标注框都是完整的而不被截断，设计了一种随机剪裁算法如下。设W_w和H_w是窗口的长和宽，设标注框BBox_i的长和宽为W_bi和H_bi，左上角点坐标为(x_bi，y_bi)，其中i∈[0，N]。算法流程如下：Unlike the training of the segmentation model, the training of the detection model requires the samples to be processed first. The defect surrounding area in the small-scale high-resolution image is randomly cropped to obtain a slice sample set that conforms to the input size of the SSD model. In order to ensure that the annotation boxes in the slice are complete and not truncated, a random cropping algorithm is designed as follows. Let W _w and H _w be the length and width of the window, let the length and width of the label box BBox _i be W _bi and H _bi , and the coordinates of the upper left corner point are (x _bi , y _bi ), where i∈[0, N] . The algorithm flow is as follows:

1.i＝01. i=0

2.对于BBox_i，首先不考虑附近标注框的影响，只要求裁剪框窗口将BBox_i完整包含，那么裁剪窗口左上角点的坐标取值范围为一矩形区域。该区域Cand的左上角点坐标为(x_bi+W_bi-W_w，y_bi+H_bi-H_w)。该区域的长和宽为(W_w-W_bi，H_w-H_bi)。设置一张掩膜图像，以其上像素值为255的点来记录可选点的位置。同时还需要计算出一个窗口可能覆盖的区域Cover左上角与Cand相同，宽为W_cover＝2W_w-W_bi，高为H_cover＝2H_w-H_bi。以BBox_i区域为中心，延长其边长，将Cover的其余区域分为8个矩形区块。以左上角块为起点，顺时针为区块分配序号，依次为：左上角块——0，正上块——1，右上块——2，正右块——3，右下块——4，正下块——5，左下块——6，正左块——7。2. For BBox _i , the influence of the nearby annotation frame is not considered first, and only the cropping box window is required to completely contain BBox _i , then the coordinate value range of the upper left corner of the cropping window is a rectangular area. The coordinates of the upper left corner of the area Cand are (x _bi +W _bi -W _w , y _bi +H _bi -H _w ). The length and width of this region are (W _w -W _bi , H _w -H _bi ). Set up a mask image to record the positions of selectable points with points on which the pixel value is 255. At the same time, it is also necessary to calculate that the upper left corner of Cover, which may be covered by a window, is the same as Cand, the width is W _cover =2W _w -W _bi , and the height is H _cover =2H _w -H _bi . Take the BBox _i area as the center, extend its side length, and divide the rest of the Cover area into 8 rectangular blocks. Starting from the upper left block, assign serial numbers to the blocks clockwise, as follows: upper left block - 0, upper right block - 1, upper right block - 2, right right block - 3, lower right block - 4, the lower block - 5, the lower left block - 6, the positive left block - 7.

3.遍历所有标注框，将与Cover区域相交的M个检测框(除BBox_i外)记录下来{NearBBox_j|j∈[0，M]}3. Traverse all the annotation boxes and record the M detection boxes (except BBox _i ) that intersect the Cover area {NearBBox _j |j∈[0, M]}

4.遍历{NearBBox_j|j∈[0，M]}，依据交集部分所处子块位置对CandMask的不同部位进行置0操作。以每个子区的左上角点为(0，0)，设NearBBox_j与CoverSubRegion_k的交集部分的左上角点坐标为(x_LT，jk，y_LT，jk)，右下角点坐标为(x_RD，jk，y_RD，jk)。具体操作方式如下所述：4. Traverse {NearBBox _j | j∈[0, M]}, and set 0 to different parts of CandMask according to the position of the sub-block where the intersection part is located. Taking the upper left corner of each subregion as (0, 0), the coordinates of the upper left corner of the intersection of NearBBox _j and CoverSubRegion _k are (x _{LT, jk} , y _{LT, jk} ), and the coordinates of the lower right corner are (x _{RD , jk} , y _{RD, jk} ). The specific operation method is as follows:

5.此时CandMask中所有像素值为255的位置都是满足能完整包含标注框的裁剪框的左上角点坐标候选点。然后即可从中随机挑选一些坐标，并用于裁剪图像。同时，还要对标注框的坐标信息进行线性调整。5. At this time, all positions with a pixel value of 255 in CandMask are candidate points that satisfy the coordinates of the upper left corner of the cropping frame that can completely contain the annotation frame. Then some random coordinates can be picked from it and used to crop the image. At the same time, the coordinate information of the callout frame should be adjusted linearly.

依据上述算法制作出用于训练检测器的样本集后，首先将该样本集分为训练与验证两个子集。然后利用训练集进行模型参数的调整，并每隔10步在验证集上测试模型的优劣。训练方式采用的是迁移学习，即先在大数据集上预训练，再在实际应用场景的小训练集上进行参数微调。优化方式采用小批量梯度下降法(Mini-Batch Gradient Descent)，批量大小(Batch size)根据硬件环境进行设定。训练指定轮数后，最后保存出验证集上表现最好的节点。得到Model_D1。After the sample set for training the detector is produced according to the above algorithm, the sample set is firstly divided into two subsets of training and verification. Then use the training set to adjust the model parameters, and test the pros and cons of the model on the validation set every 10 steps. The training method adopts transfer learning, that is, pre-training on a large data set, and then fine-tuning the parameters on a small training set of practical application scenarios. The optimization method adopts Mini-Batch Gradient Descent, and the batch size is set according to the hardware environment. After training for a specified number of rounds, the best performing node on the validation set is finally saved. Get Model _D1 .

(三)检测模型二次训练(3) Secondary training of detection model

利用上述检测流程对高分辨率图集进行检测，将误检框作为单独一类加入到原始切片集中，并对检测模型进行重新训练，得到二次训练模型。然后用该二次训练模型替换原检测框架中的目标检测器。依据检测框和真实框的面积交并比IOU来判断，一个检测结果是否为误检。设检测阈值为T_IOU，则The above detection process is used to detect the high-resolution atlas, and the false positive frame is added to the original slice set as a separate category, and the detection model is retrained to obtain a secondary training model. Then replace the object detector in the original detection framework with this secondary training model. According to the area intersection of the detection frame and the real frame, it is judged whether a detection result is a false detection. Suppose the detection threshold is T _IOU , then

以错误检测框为中心，以二(二)中的方法对其进行随机剪裁，得到符合检测器输入大小的负样本集。然后去除每一个切片上原有的检测框，并重新设置一个类别为normal，尺寸为切片大小的检测框。接着将这些附带标注的切片与初始正样本切片集进行混合，并用该合成之后的数据集对初次训练模型Model_D1进行二次训练，得到Model_D2。最终在检测流程中使用Model_S和Model_D2进行检测。Taking the error detection frame as the center, it is randomly cropped by the method in two (two) to obtain a negative sample set that matches the input size of the detector. Then remove the original detection frame on each slice, and reset a detection frame whose category is normal and whose size is the size of the slice. Then, these annotated slices are mixed with the initial positive sample slice set, and the initial training model Model _D1 is retrained with the synthesized data set to obtain Model _D2 . Finally, Model _S and Model _D2 are used for detection in the detection process.

综上所述，本发明将高分辨率图像中小目标检测任务在多尺度上进行拆解，并使用不同的深度模型去解决各个子任务，最终将多尺度分割与检测结果融合起来得到高精度的检测结果。In summary, the present invention disassembles the small target detection task in high-resolution images on multiple scales, and uses different depth models to solve each sub-task, and finally fuses the multi-scale segmentation and detection results to obtain high-precision detection results. Test results.

实施例Example

图1展示了基于计算机视觉和深度学习的高分辨率图像小目标通用检测框架。图2示出了该检测框架在本实施例中的应用于高分辨率楼面图像微小缺陷自动检测的流程及中间结果示意图。Figure 1 shows a general detection framework for small objects in high-resolution images based on computer vision and deep learning. FIG. 2 shows a schematic diagram of the process of applying the detection framework to the automatic detection of small defects in high-resolution floor images in this embodiment and an intermediate result.

实施例采用某小区的墙体实拍图像，分辨率为7952*5304，文件大小一般在30M左右。初始获得的缺陷图像107张，缺陷目标195个。如图3所示，其中右侧框内为局部放大后的缺陷。The embodiment adopts the real shot image of the wall of a certain district, the resolution is 7952*5304, and the file size is generally about 30M. Initially obtained 107 defect images and 195 defect targets. As shown in Figure 3, the right frame is a partially enlarged defect.

首先介绍检测流程，根据检测步骤一，首先构建双尺度金字塔，如图2所示。在大尺度上，设定不包含缺陷的非墙体区域作为分割目标，如窗户、空调和天空等。在小尺度上，以缺砖、碎砖等楼面常见缺陷作为检测目标。First, the detection process is introduced. According to the first detection step, a dual-scale pyramid is first constructed, as shown in Figure 2. On a large scale, non-wall regions that do not contain defects are set as segmentation targets, such as windows, air conditioners, and sky. On a small scale, common floor defects such as missing bricks and broken bricks are used as detection targets.

根据检测步骤二，在大尺度上利用Mask RCNN对窗户和空调进行分割，并辅以传统区域增长的方式分割天空，最终得到非墙体大目标分割掩膜，如图4所示。According to the detection step 2, Mask RCNN is used to segment the windows and air conditioners on a large scale, and the sky is segmented by the traditional area growth method, and finally a non-wall large target segmentation mask is obtained, as shown in Figure 4.

根据检测步骤三，在小尺度上利用SSD对缺碎砖进行检测。先利用墙体掩膜对由尺寸为640×640的滑窗提出的候选框进行初次筛选。其中，Proposal_M中9个预先设定的采样点位置如图5所示。然后利用SSD对子块进行检测，并将检测框坐标映射回原图，流程如图6所示。小尺度上的缺陷检测结果如图7的(a)所示。According to the detection step 3, the SSD is used to detect the missing bricks on a small scale. The candidate frame proposed by the sliding window of size 640×640 is initially screened by using the wall mask. Among them, the positions of 9 preset sampling points in Proposal _M are shown in Figure 5. Then use SSD to detect the sub-block, and map the coordinates of the detection frame back to the original image. The process is shown in Figure 6. The defect detection results on the small scale are shown in Fig. 7(a).

根据检测步骤四，融合大尺度分割与小尺度检测结果提升检测精度。经过掩膜二次筛选以及非极大值抑制处理的最终检测结果如图7的(b)所示。According to the fourth detection step, the results of large-scale segmentation and small-scale detection are integrated to improve detection accuracy. The final detection result after mask secondary screening and non-maximum suppression processing is shown in Figure 7(b).

接下来介绍训练流程，根据训练步骤一，先将用于训练实例分割模型MaskR-CNN的分辨率为994×663的31张样本分为训练集(25张)和验证集(6张)，每张图片中的窗户和空调个数不等。以COCO预训练模型参数为基础，在上述训练集上对参数进行微调。批量大小(Batch size)设定为1，初始学习率设定为0.0001，迭代次数为100000步。训练过程中，模型在验证集上的均值平均精度mAP的变化如图8的(a)所示。在验证集上的mAP曲线变化如图8的(a)所示。保存出第38440步mAP为96.89％的模型作为最终分割模型，并将其置信度阈值设定为0.5，得到其部分检测结果如图8的(b)所示。Next, the training process is introduced. According to the first training step, 31 samples with a resolution of 994×663 used to train the instance segmentation model MaskR-CNN are divided into training set (25) and validation set (6). The number of windows and air conditioners in this picture varies. Based on the COCO pretrained model parameters, the parameters are fine-tuned on the above training set. The batch size is set to 1, the initial learning rate is set to 0.0001, and the number of iterations is 100000 steps. During the training process, the variation of the mean mean precision mAP of the model on the validation set is shown in (a) of Figure 8. The mAP curve changes on the validation set are shown in Fig. 8(a). The model with mAP of 96.89% in step 38440 is saved as the final segmentation model, and its confidence threshold is set to 0.5, and some of the detection results are obtained as shown in (b) of Figure 8.

根据训练步骤二，先由高分辨率图像集制作640×640切片图集。其中用于训练单步检测器SSD模型的样本共107张，图像分辨率一般为7952×5304，分为训练集(88张)、验证集(9张)和测试集(10张)，每张图片中的缺陷个数不等，而生成的训练集、验证集和测试集对应的切片样本数量分别为3180、220和200。然后利用在COCO数据集上预训练好的SSD模型在楼面缺陷切片样本集上进行迁移学习，batch size设为4，学习率设为0.00005，迭代30000次。模型在验证切片集上的mAP随迭代次数变化的曲线如图所示，其中IOU阈值设定为0.5。从图9的(a)中可以发现，在100000步迭代后，模型在验证切片集上的mAP可达到0.7左右，之后趋于稳定。保存出验证切片集上最高mAP节点处的模型作为初次训练模型，该模型在测试切片集上的mAP为0.714。将初次训练模型的置信度阈值设定为0.5，其部分检测结果如图9的(b)所示。According to the second training step, a 640×640 slice atlas is first created from the high-resolution image set. Among them, there are 107 samples used to train the SSD model of the single-step detector, and the image resolution is generally 7952×5304, which are divided into training set (88), validation set (9) and test set (10). The number of defects in the pictures varies, and the number of slice samples corresponding to the generated training set, validation set, and test set are 3180, 220, and 200, respectively. Then, the pre-trained SSD model on the COCO dataset is used to perform migration learning on the sample set of floor defect slices. The batch size is set to 4, the learning rate is set to 0.00005, and the iteration is 30,000 times. The mAP of the model on the validation slice set as a function of the number of iterations is shown in the figure, where the IOU threshold is set to 0.5. It can be found from (a) of Figure 9 that after 100,000 iterations, the mAP of the model on the validation slice set can reach about 0.7, and then tends to be stable. The model at the highest mAP node on the validation slice set is saved as the initial training model, and the mAP of this model on the test slice set is 0.714. The confidence threshold of the initial training model is set to 0.5, and some of the detection results are shown in Figure 9(b).

根据训练步骤三，利用初次训练的Mask RCNN和SSD，依据上述检测流程对训练和验证集的高分辨率图像进行检测，将得到的负样本剪裁下来加入到原始切片集中，并对初次训练的SSD进行二次微调，用其替换原始SSD，完成检测框架的最终搭建。检测算法中应用初次训练SSD和二次训练SSD的检测结果对比如图10所示。According to the training step 3, the Mask RCNN and SSD for the initial training are used to detect the high-resolution images of the training and validation sets according to the above detection process, and the obtained negative samples are cropped and added to the original slice set. Perform secondary fine-tuning and replace the original SSD with it to complete the final construction of the detection framework. Figure 10 shows the comparison of the detection results of the first training SSD and the second training SSD in the detection algorithm.

最后应说明的是：本方法所设计的各种参数需要根据实际应用的具体兴趣进行调整。以上所述的各实施例仅用于说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述实施例所记载的技术方案进行修改，或者对其中部分或全部技术特征进行等同替换；而这些修改或替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。Finally, it should be noted that the various parameters designed by this method need to be adjusted according to the specific interests of the actual application. The above-mentioned embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand: The technical solutions described in the embodiments are modified, or some or all of the technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for automatically detecting small targets in a high-resolution image based on computer vision and deep learning is characterized by comprising the following steps:

s1 detection flow

S1.1 establishing multiscale task groups

Establishing a dual-scale image pyramid for the original high-resolution image to be detected according to a Gaussian pyramid theory, and decomposing a small target detection task under the two scales to obtain a multi-scale task group; setting a large target segmentation task which has no inclusion relation with a small target at a large scale, and setting an original small target detection task at a small scale;

s1.2 object segmentation at large scale

Under the large scale, a Mask R-CNN model is used for segmenting the large scale image, and the obtained low resolution Mask is subjected to up-sampling to recover the resolution of the original image;

s1.3 Small Scale target detection

Under the small scale, extracting a candidate region in the small scale image by using an overlapping sliding window, screening the candidate region according to a Mask, and sending the candidate region without intersection with a large target region in the Mask to a target detector SSD for detection; after all the candidate areas are detected, mapping the detection results from the sub-areas back to the original image;

s1.4 segmentation under dual scales and detection result fusion

Carrying out secondary screening on the detection frame obtained under the small scale by utilizing the segmentation mask obtained under the large scale; firstly, performing morphological processing on a segmentation mask, then deleting a detection frame appearing in a large target area, and finally fusing overlapped detection frames by applying non-maximum value inhibition to obtain a final detection result of a small target in a high-resolution image;

s2 training procedure

S2.1 segmentation model training

The Mask RCNN Model is trained in a transfer learning mode by utilizing pictures and labeled information under large scale, and nodes stored on a verification set are used as a trained segmentation Model_S；

S2.2 detection model initial training

Randomly cutting a defect peripheral area in the small-scale high-resolution image according to a specific mode to obtain a slice sample set which accords with the input size of the SSD model; training an SSD Model in a migration learning mode, and saving the best-performing node on a verification set as a trained segmentation Model_D1；

S2.3 test model second training

Model to be trained_SAnd a Model_D1Embedding the image into a detection flow, and detecting according to the high-resolution atlas; cutting out false detection frames in the result, adding the false detection frames as a single class into the original slice set, and detecting the Model by using the new training set_D1Retraining to obtain a secondary training Model_D2(ii) a Finally, Model of secondary training Model is used_D2Model for replacing target detector in original detection frame_D1And finishing the construction of the final detection frame.

2. The method for automatically detecting small objects in high-resolution images based on computer vision and deep learning according to claim 1,

at small scale, since the image resolution is much higher than the input size of the SSD detector, use is made ofA sliding window with the same size as the input of the detector is overlapped and slid on the small-scale image to extract the candidate detection region Proposal in the small-scale image_S(ii) a Before the SSD is detected, screening the candidate area according to a mask M (x, y), wherein the screening mode is as follows: simultaneous extraction of the sub-regions Proposal on M (x, y) using one and the same sliding window_MIf Proposal is used_MThe middle 9 preset sampling points are all 255 pixels; after the window traverses the complete picture, each Proposal will be obtained_SCorresponding detection results; setting the detection result greater than the confidence threshold to have D₁The d-th detection frame is expressed as a coordinate (x)_d，y_d，w_d，h_d) Class c_dAnd a confidence s_dThen detect the set DBox₁＝{((x_d，y_d，w_d，h_d)，c_d，s_d)|d∈[1，D₁]}；

These measurements were then derived from Proposal_SMapping back to the original image, the mapping method is as follows: proposal is provided_SThe position of the upper left corner in the original image is recorded as (X)_ps，Y_ps) (ii) a Then 4 parameters (X) of the target detection frame corresponding to the original image can be obtained according to the coordinate transformation_d，Y_d，W_d，H_d)

X_d＝X_ps+x_d

Y_d＝Y_ps+y_d

W_d＝w_d

H_d＝h_d

Detection ensemble becomes DBox₂＝{((X_d，Y_d，W_d，H_d)，c_d，s_d)|d∈[1，D₁]}。

3. The method for automatically detecting small targets in high-resolution images based on computer vision and deep learning as claimed in claim 1, wherein after segmentation and detection under multiple scales are completed, multi-scale information is fusedThe detection precision of the small target is improved; utilizing the segmentation mask obtained under the large scale to detect the DBox result set obtained under the small scale₁Carrying out secondary screening; firstly, carrying out corrosion treatment on the segmentation mask to reduce the area of a large target region; then judging whether the pixel values of 4 angular points of the detection frame are all 0, if so, keeping, otherwise, abandoning the detection frame; assuming that the number of the test results after screening is D₂If yes, the detection set becomes DBox₂＝{((X_g，Y_g，W_g，H_g)，c_g，s_g)|g∈[1，D₂]}。

4. The method as claimed in claim 1, wherein the method for automatically detecting small objects in the high resolution image based on computer vision and deep learning is characterized in that, in order to suppress the overlapping phenomenon of the detection frames caused by the overlapping sliding of the sliding window, a non-maximum suppression algorithm is applied to complete the further fusion of the detection frames, so as to obtain a set DBox containing F final detection results_final＝{((X_f，Y_f，W_f，H_f)，c_f，s_f)|g∈[1，F]}。

5. The method for automatically detecting the small and medium targets in the high-resolution image based on the computer vision and the deep learning as claimed in claim 1, wherein a segmentation model Mask RCNN is trained by a large-scale picture set and labeling information thereof; firstly, dividing the sample set into two subsets of training and verification; then, adjusting model parameters by using the training set, and testing the quality of the model on the verification set every 10 steps; the training mode adopts transfer learning, namely pre-training is carried out on a large data set, and then parameter fine tuning is carried out on a small training set of an actual application scene; training the designated number of rounds by adopting a small batch gradient descent method, and finally saving the nodes on the verification set to obtain a Model_S。

6. Automatic detection of small objects in high-resolution images based on computer vision and deep learning as claimed in claim 1The measuring method is characterized in that the peripheral area of the defect in the small-scale high-resolution image is randomly cut to obtain a slice sample set which conforms to the input size of the SSD model; in order to ensure that all the labeled boxes in the slice are complete and not truncated, a random clipping algorithm is designed as follows; let W_wAnd H_wThe length and width of the window are provided with a marking frame BBox_iHas a length and width of W_biAnd H_biThe coordinate of the upper left corner point is (x)_bi，y_bi) Wherein i ∈ [0, N]。