WO2021208502A1 - Remote-sensing image target detection method based on smooth bounding box regression function - Google Patents

Remote-sensing image target detection method based on smooth bounding box regression function

Info

Publication number
WO2021208502A1
Authority
WO
WIPO (PCT)
Prior art keywords
target detection
regression
region
interest
network
Prior art date
Application number
PCT/CN2020/140022
Other languages
French (fr)
Chinese (zh)
Inventor
申原
刘军
李洪忠
郭善昕
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2021208502A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Definitions

  • The invention belongs to the field of image processing and machine learning, and relates to a remote sensing image target detection method based on a smooth bounding box regression function.
  • Remote sensing image target detection is one of the core tasks in remote sensing image understanding. Its main purpose is to quickly find and accurately locate targets of interest in remote sensing images. Target detection is an important task in its own right and is also the basis of many other tasks, such as instance segmentation and image understanding.
  • Previously, the accuracy required of remote sensing image detection was low: a detection was considered correct as long as the intersection over union (IoU) between the predicted box and the ground-truth box of the target was greater than 0.5.
  • As algorithm performance improves, targets need to be detected with more precise localization in order to achieve high-quality detection.
  • Deep learning is the most popular and cutting-edge basic artificial intelligence technology. Its powerful representation learning ability can automatically learn features from big data and has strong robustness.
  • Typically, the network first uses a deep neural network to extract the features of the image and then applies a detector to those features. Because the detector is sensitive to fluctuations in the features, it is not robust enough, which leads to poor regression results.
  • In the network, the regression process is mainly realized by the bounding box regression function.
  • Bounding box regression moves the candidate box to a position closer to the ground-truth box.
  • It does so by minimizing the gap between the candidate box and the ground-truth box.
  • the L2 loss function used in RCNN is improved to a smooth L1 loss function in Fast RCNN.
  • As training progresses, the quality of the candidate boxes gradually improves and they get closer and closer to the ground-truth box. The gap then becomes small while the fluctuation is relatively large, so the regression becomes harder to stabilize. In particular, continuous oscillation near the zero point can cause the regression to fail, resulting in low regression accuracy.
  • the present invention provides a remote sensing image target detection method based on a smooth frame regression function.
  • The technical problem to be solved is to provide a smooth bounding box regression function that stabilizes the regression process when it fluctuates because the gap between the candidate box and the ground-truth box is too small, so as to obtain higher regression accuracy and detection accuracy.
  • the present invention provides a target detection method based on a smooth border regression function, which includes the following steps:
  • Step 1, image preprocessing: perform the necessary preprocessing on the training images, including augmentation operations such as image rotation and mirroring, image normalization, and image resizing, and set the hyperparameters for network training;
  • Step 2, feature extraction: input the image into the target detection convolutional neural network to obtain a feature map; then input the feature map into the region proposal network to obtain candidate boxes; then send the candidate boxes together with the feature map into the region-of-interest pooling layer to obtain the region-of-interest features;
  • Step 3, classification: send the region-of-interest features obtained in step 2 to the softmax classifier for classification;
  • Step 4, regression: send the region-of-interest features obtained in step 3 to the fully connected layer to obtain the predicted offsets, send the predicted offsets into the smooth bounding box regression function to obtain the actual offsets, and correct the candidate box to a new position according to these offsets;
  • Step 5, correction: take the bounding box obtained by regression correction of the candidate box as the new candidate box, send it together with the feature map into the region-of-interest layer to obtain the region-of-interest features, and repeat steps 3, 4, and 5 until the training process ends, yielding a trained network;
  • Step 6: preprocess the image to be detected and input it into the trained network to obtain the target detection result.
  • sgn denotes the sign function, which ensures that negative values are handled without error
  • exp is the exponential function
  • $c_x$, $c_y$, $c_w$, $c_h$ are the weight adjustment values of the regression
  • $t_x$, $t_y$, $t_h$, $t_w$ are the offsets predicted by the convolutional neural network
  • $p_x$, $p_y$ are the position coordinates of the center point of the candidate box
  • $p_w$, $p_h$ are the width and height of the candidate box
  • $G_x$, $G_y$ are the center point coordinates of the bounding box after regression correction
  • $G_w$, $G_h$ are the width and height of the bounding box after regression correction.
  • the target detection convolutional neural network includes but is not limited to Faster RCNN, YOLO v1, YOLO v2, YOLO v3, SSD, FPN, RetinaNet, and Cascade RCNN.
  • The beneficial effect of the present invention is that constructing a smooth bounding box regression function enhances the stability of the regression process: regression that fluctuates because the gap between the candidate box and the ground-truth box is too small becomes more stable, and the problem of regression failure caused by continuous oscillation near the zero point is solved. This yields higher detection accuracy under high IoU thresholds and therefore higher overall target detection accuracy.
  • Figure 1 is a comparison diagram of the adjustment range of the smooth frame regression function and the original frame regression function
  • Figure 2 is an enlarged comparison diagram of the smooth border regression function and the original border regression function near the zero point;
  • Figure 3 is a visual display of the feature map output by the convolutional neural network
  • Figure 4 is a schematic diagram of the regional proposal network structure
  • Figure 5 is a schematic diagram of the structure of a cascade detector
  • Figure 6 is the result of the Cascade RCNN method using the original frame regression function
  • Fig. 7 is a detection result diagram of the Cascade RCNN method using the smooth border regression function provided by the present invention.
  • FIG. 8 is a schematic diagram of a workflow of a hyperspectral image retrieval method according to an embodiment of the present invention.
  • sgn denotes the sign function, which ensures that negative values are handled without error.
  • exp is the exponential function
  • $c_x$, $c_y$, $c_w$, $c_h$ are the weight adjustment values of the regression, which usually default to (10, 10, 5, 5)
  • $t_x$, $t_y$, $t_h$, $t_w$ are the offsets predicted by the convolutional neural network
  • $p_x$, $p_y$ are the position coordinates of the center point of the candidate box
  • $p_w$, $p_h$ are the width and height of the candidate box
  • $G_x$, $G_y$ are the center point coordinates of the bounding box after regression correction
  • $G_w$, $G_h$ are the width and height of the bounding box after regression correction.
  • the straight line represents the original regression function
  • the curve represents the improved bounding box regression function.
  • Figure 2 is an enlarged view of Figure 1 near zero. It can be seen that the improved regression function is smoother near the true value, so that after regression the box tends to move toward the ground-truth box without easily overshooting it, which improves convergence.
  • Cascade RCNN uses three cascaded detectors to achieve target detection.
  • The data are selected from the DOTA data set. The original DOTA images are large and contain many objects. To suit the needs of the present invention, images containing large numbers of densely arranged small targets such as airplanes, ships, and cars were selected, and the images were then cropped to sizes between 600 and 800, yielding the required data set.
  • the training set contains 15,070 pictures, and the test set contains 2,700 pictures.
  • Before an image is sent to the network, operations such as horizontal mirroring and rotation are applied to augment the data set; the gray values of the image are then normalized, and the image is scaled to the size configured for training, usually with the minimum side set to 600 and the maximum side set to 1000; finally the images are filtered, and any image containing no target is excluded.
  • the framework used is caffe2
  • the backbone network is resnet101
  • the minimum edge of the image is set to 600 during training
  • the maximum edge is limited to 1000.
  • the training method uses SGD with momentum, and the momentum is set to 0.9
  • the initial learning rate is set to 0.01
  • the penalty term coefficient is 0.0001.
  • Training proceeds in stages for a total of 360,000 iterations, with the learning rate decaying to 0.001 and 0.0001 at iterations 240,000 and 320,000, respectively.
  • The preprocessed images are sent in sequence to the convolutional neural network layers, where convolution and pooling operations extract the image features used for detection by the subsequent Cascade RCNN detector.
  • Figure 3 shows the visual display of the feature map output by the convolutional neural network layer.
  • The features extracted by the convolutional neural network are input into the region proposal network.
  • In the region proposal network, a series of anchors is preset over all regions of the image by means of a sliding window, as shown in Figure 4. All preset anchors are filtered by ranking them on foreground confidence, and the highest-confidence anchors are finally kept as the candidate boxes.
  • the feature map is sent to the detector to detect the target object.
  • B0 is the candidate region selected by the region proposal network
  • conv represents the convolutional neural network
  • The candidate region is sent, together with the feature map obtained from the convolutional neural network, into the RoI Pooling layer to obtain the region-of-interest features. These features are sent to the fully connected layer (H1), and the output of the fully connected layer is then sent to the classifier (C1) for classification and to the smooth bounding box regression function provided by the present invention (B1) for fine-tuning of the localization.
  • the network has three detectors.
  • The candidate box B1, fine-tuned in the previous stage by the smooth bounding box regression function provided by the present invention, is used as the new input to the detector of the next stage, and so on until the candidate box B3 is obtained. The error between B3 and the ground-truth box is computed as the loss and back-propagated to adjust the parameters of the convolutional neural network. This process is repeated until training ends.
  • Preprocess the test image: scale it to the size set by the network and normalize the gray values.
  • The images are sent in sequence to the convolutional neural network, which extracts features to produce feature maps.
  • The extracted feature map is input into the region proposal network to obtain candidate boxes, and the candidate boxes and the feature map are then input together into the region-of-interest pooling layer to obtain the region-of-interest features.
  • The region-of-interest features are sent to the first stage of the cascade detector to obtain the regression offsets, and the corrected bounding box position is computed with the smooth bounding box regression function. This bounding box is used as a new candidate box and input, together with the feature map, into the region-of-interest pooling layer to obtain new region-of-interest features, which are then input to the second-stage detector. This operation is repeated up to the last-stage detector.
  • the bounding box obtained by the last layer of detectors after bounding regression correction is the final bounding box of the network.
  • input the features of the region of interest of the last layer into the classifier of each layer of the detector to obtain the classification result, and then synthesize the classification results of each classifier to obtain the final classification result of the network.
  • the above-mentioned DOTA data set is used for testing.
  • The YOLO v2, SSD, Faster RCNN, YOLO v3, RetinaNet, FPN, and Cascade RCNN methods are used for testing.
  • For each method, results are first computed with the method's original bounding box regression function; the original function is then replaced with the smooth bounding box regression function provided by the invention, and the results are recomputed.
  • evaluating the performance of a classifier is generally measured by two quantities: Precision and Recall.
  • Samples can be divided into four categories according to the relationship between their true and predicted labels: True Positives (TP), positive samples predicted as positive; False Positives (FP), negative samples predicted as positive; True Negatives (TN), negative samples predicted as negative; and False Negatives (FN), positive samples predicted as negative. A confusion matrix presents these four categories clearly.
  • the performance of the detector in target detection is measured by AP and mAP.
  • the average precision (Average Precision, AP) is usually taken as the evaluation index.
  • The AP for a single target class is the area enclosed by that class's P-R (precision-recall) curve and the horizontal and vertical axes.
  • In target detection, determining the four sample types TP, FP, TN, and FN requires computing the IoU between each predicted box and the ground-truth box. A sample is judged positive only when its IoU is greater than the set threshold.
  • indicators such as AP, AP50, AP75, AP60, AP70, AP80, and AP90 are used to evaluate the accuracy of target detection for each comparison method used.
  • AP50 refers to the AP value when the IoU threshold is set to 0.5, and the other indicators are defined analogously. The higher the IoU threshold, the more precise the localization required and the greater the difficulty.
  • Figures 6 and 7 show the target detection results using Cascade RCNN as the basic network architecture, using the original border regression function and the smooth border regression function provided by the present invention.
  • Fig. 6 is the result of the original frame regression function
  • Fig. 7 is the result of the smooth frame regression function provided by the present invention. It can be clearly seen that the positioning accuracy of the method of the present invention is higher than that of the original frame regression function.
  • Table 1 shows the comparison between the detection results of the method of the present invention and the original method under other network architectures. Where ⁇ indicates the accuracy of using the smooth frame regression function provided by the present invention under a given network architecture, and if there is no ⁇ , it indicates that the original frame regression function is used.
  • AP represents the overall average accuracy index
  • AP50 represents the average accuracy under the threshold where the IoU is greater than 0.5
  • AP75 represents the average accuracy under the threshold where the IoU is greater than 0.75
  • The bounding box regression process in target detection is better realized with the smooth bounding box regression function, and under high IoU thresholds the detected target boxes are more accurate.
  • Compared with the original bounding box regression function, the smooth bounding box regression function provided by the invention achieves higher-precision target detection.
  • the smooth border regression function provided by the present invention can be used in any target detection network framework.
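The evaluation metrics described above (precision, recall, and AP as the area under the P-R curve) can be sketched as follows. This is a minimal illustration, not the evaluation code used for the reported results: the function name `average_precision`, its inputs, and the choice of a monotone precision envelope before integrating are all assumptions, since the text does not say which AP variant was used. Detections are assumed to be already sorted by descending confidence and already marked as true or false positives at a chosen IoU threshold, and `num_ground_truth` is assumed to be greater than zero.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    is_true_positive: booleans for detections sorted by descending confidence
                      (True = matched a ground-truth box at the chosen IoU threshold).
    num_ground_truth: number of ground-truth boxes of this class (assumed > 0).
    """
    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Assumption: replace each precision by the best precision at any higher
    # recall, so the curve is non-increasing before integration.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Under this sketch, AP50 and AP75 would differ only in the IoU threshold used to decide which detections count as true positives before the function is called.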

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

A remote-sensing image target detection method based on a smooth bounding box regression function, comprising: performing necessary preprocessing on a training image, and setting a hyperparameter of network training; inputting a picture into a target detection convolutional neural network to obtain a feature map; then inputting the feature map into a region suggestion network to obtain a candidate box; and then sending the candidate box and the feature map into a region-of-interest pooling layer to obtain features of a region of interest, and classifying, in a classifier, the features of the region of interest; sending the obtained features of the region of interest into a full connection layer to obtain a predicted offset, then sending the predicted offset into the smooth bounding box regression function to obtain an actual offset, and correcting the candidate box to a new position; repeating the steps until a training process is finished; and preprocessing an image to be detected and then inputting same into a trained network to obtain a target detection result. High-precision bounding box regression can be effectively realized, and higher-precision target detection can be realized under the condition of a high IoU threshold.

Description

A remote sensing image target detection method based on a smooth bounding box regression function
Technical field
The invention belongs to the field of image processing and machine learning, and relates to a remote sensing image target detection method based on a smooth bounding box regression function.
Background
With the rapid development of remote sensing technology, the volume of remote sensing data is rising sharply. Faced with increasingly large and complex remote sensing information, how to process raw remote sensing images quickly and efficiently, turning them into information that users can understand and use, has become an important research topic. Remote sensing image target detection is one of the core tasks in remote sensing image understanding. Its main purpose is to quickly find and accurately locate targets of interest in remote sensing images. Target detection is an important task in its own right and is also the basis of many other tasks, such as instance segmentation and image understanding. Previously, however, the accuracy required of remote sensing image detection was low: a detection was considered correct as long as the intersection over union (IoU) between the predicted box and the ground-truth box of the target was greater than 0.5. As algorithm performance improves, targets need to be detected with more precise localization in order to achieve high-quality detection.
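The correctness criterion mentioned above, an IoU greater than 0.5 between the predicted box and the ground-truth box, can be illustrated with a short sketch. The corner format `(x1, y1, x2, y2)` and the names `iou` and `is_correct` are assumptions made for illustration; the source does not specify a box representation at this point.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred_box, gt_box, threshold=0.5):
    """A detection counts as correct when its IoU exceeds the threshold."""
    return iou(pred_box, gt_box) > threshold
```

Raising `threshold` (for example to 0.75 or 0.9) corresponds to the stricter, higher-quality detection regime that the text targets.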
Traditional target detection techniques fall short in detection accuracy, robustness, and transferability when facing large and complex remote sensing information. They cannot solve the problems mentioned above and struggle to meet practical needs, so more efficient and accurate methods are urgently needed. Deep learning is currently the most popular frontier technology underlying artificial intelligence. Its powerful representation learning ability allows it to learn features automatically from big data, and it is highly robust.
However, current deep learning algorithms suffer from problems such as inaccurate regression localization and poor detection accuracy. Typically, the network first uses a deep neural network to extract the features of the image and then applies a detector to those features. Because the detector is sensitive to fluctuations in the features, it is not robust enough, which leads to poor regression results.
In the network, the regression process is mainly realized by the bounding box regression function. Bounding box regression moves a candidate box to a position closer to the ground-truth box, and it does so by minimizing the gap between the candidate box and the ground-truth box. RCNN uses an L2 loss function, which Fast RCNN improves to a smooth L1 loss function. As training progresses, the quality of the candidate boxes gradually improves and they get closer and closer to the ground-truth box. The gap then becomes small while the fluctuation is relatively large, so the regression becomes harder to stabilize. In particular, continuous oscillation near the zero point can cause the regression to fail, resulting in low regression accuracy.
Summary of the invention
On this basis, the present invention provides a remote sensing image target detection method based on a smooth bounding box regression function. The technical problem to be solved is to provide a smooth bounding box regression function that stabilizes the regression process when it fluctuates because the gap between the candidate box and the ground-truth box is too small, so as to obtain higher regression accuracy and detection accuracy.
The present invention provides a target detection method based on a smooth bounding box regression function, which includes the following steps:
Step 1, image preprocessing: perform the necessary preprocessing on the training images, including augmentation operations such as image rotation and mirroring, image normalization, and image resizing, and set the hyperparameters for network training;
Step 2, feature extraction: input the image into the target detection convolutional neural network to obtain a feature map; then input the feature map into the region proposal network to obtain candidate boxes; then send the candidate boxes together with the feature map into the region-of-interest pooling layer to obtain the region-of-interest features;
Step 3, classification: send the region-of-interest features obtained in step 2 to the softmax classifier for classification;
Step 4, regression: send the region-of-interest features obtained in step 3 to the fully connected layer to obtain the predicted offsets, send the predicted offsets into the smooth bounding box regression function to obtain the actual offsets, and correct the candidate box to a new position according to these offsets;
Step 5, correction: take the bounding box obtained by regression correction of the candidate box as the new candidate box, send it together with the feature map into the region-of-interest layer to obtain the region-of-interest features, and repeat steps 3, 4, and 5 until the training process ends, yielding a trained network;
Step 6: preprocess the image to be detected and input it into the trained network to obtain the target detection result.
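As a small illustration of the image resizing in Step 1, the sketch below computes a scale factor that brings the shorter image side to a target size without letting the longer side exceed a cap. The defaults of 600 and 1000 are the minimum and maximum side settings reported for training elsewhere in this document; the function name `resize_scale` and the exact rule for combining the two limits are assumptions, not taken from the source.

```python
def resize_scale(height, width, min_side=600, max_side=1000):
    """Scale factor for resizing an image before it is fed to the network.

    The shorter side is scaled to min_side; if that would push the longer
    side past max_side, the scale is reduced so the longer side equals max_side.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    scale = min_side / short_side
    if scale * long_side > max_side:
        scale = max_side / long_side
    return scale


# Example: compute the target size for a 400 x 1000 image.
scale = resize_scale(400, 1000)
new_height, new_width = round(400 * scale), round(1000 * scale)
```

The same scale would have to be applied to the ground-truth boxes during training so that they stay aligned with the resized image.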
Further, the smooth bounding box regression function used for regression is:
$$\left(\operatorname{sgn}\!\left(\frac{t_x}{c_x}\right)\times\left|\frac{t_x}{c_x}\right|\right)^{4/3}\times p_w + p_x = G_x$$
$$\left(\operatorname{sgn}\!\left(\frac{t_y}{c_y}\right)\times\left|\frac{t_y}{c_y}\right|\right)^{4/3}\times p_h + p_y = G_y$$
$$\exp\!\left(\operatorname{sgn}\!\left(\frac{t_w}{c_w}\right)\times\left|\frac{t_w}{c_w}\right|\right)^{4/3}\times p_w = G_w$$
$$\exp\!\left(\operatorname{sgn}\!\left(\frac{t_h}{c_h}\right)\times\left|\frac{t_h}{c_h}\right|\right)^{4/3}\times p_h = G_h$$
Here, sgn denotes the sign function, which ensures that negative values are handled without error; exp is the exponential function; $c_x$, $c_y$, $c_w$, $c_h$ are the weight adjustment values of the regression; $t_x$, $t_y$, $t_h$, $t_w$ are the offsets predicted by the convolutional neural network; $p_x$, $p_y$ are the position coordinates of the center point of the candidate box; $p_w$, $p_h$ are the width and height of the candidate box; $G_x$, $G_y$ are the center point coordinates of the bounding box after regression correction; and $G_w$, $G_h$ are the width and height of the bounding box after regression correction.
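The mapping from predicted offsets to a corrected box can be sketched in a few lines of NumPy. The sketch rests on an interpretation that should be treated as an assumption: read literally, the exponent 4/3 in the formulas applies to the signed product, which is not defined in the reals for negative offsets. The code instead applies the power to the magnitude $|t/c|$ and lets sgn restore the sign, and for width and height takes the exponential of that result. This matches the remark that sgn avoids errors with negative numbers, but it is not a confirmed reading of the formulas. The names `smooth_power` and `decode_box`, the tuple ordering, and the default weights (10, 10, 5, 5), which the description gives as the usual defaults, are illustrative choices.

```python
import numpy as np


def smooth_power(u, power=4.0 / 3.0):
    """sgn(u) * |u|**power (assumed reading: power on the magnitude, sign restored)."""
    u = np.asarray(u, dtype=float)
    return np.sign(u) * np.abs(u) ** power


def decode_box(proposal, deltas, weights=(10.0, 10.0, 5.0, 5.0)):
    """Correct a candidate box with predicted offsets.

    proposal: (p_x, p_y, p_w, p_h) center coordinates, width, height of the candidate box.
    deltas:   (t_x, t_y, t_w, t_h) offsets predicted by the network (ordering is an assumption).
    weights:  (c_x, c_y, c_w, c_h) regression weight adjustment values.
    Returns (G_x, G_y, G_w, G_h) for the corrected bounding box.
    """
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    cx, cy, cw, ch = weights
    gx = smooth_power(tx / cx) * pw + px
    gy = smooth_power(ty / cy) * ph + py
    gw = np.exp(smooth_power(tw / cw)) * pw
    gh = np.exp(smooth_power(th / ch)) * ph
    return float(gx), float(gy), float(gw), float(gh)
```

With all offsets equal to zero the candidate box is returned unchanged, which is the behaviour one would expect from any box decoding of this form.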
Further, the target detection convolutional neural network includes but is not limited to Faster RCNN, YOLO v1, YOLO v2, YOLO v3, SSD, FPN, RetinaNet, and Cascade RCNN.
The beneficial effect of the present invention is that constructing a smooth bounding box regression function enhances the stability of the regression process: regression that fluctuates because the gap between the candidate box and the ground-truth box is too small becomes more stable, and the problem of regression failure caused by continuous oscillation near the zero point is solved. This yields higher detection accuracy under high IoU thresholds and therefore higher overall target detection accuracy.
Description of the drawings
Figure 1 is a comparison of the adjustment ranges of the smooth bounding box regression function and the original bounding box regression function;
Figure 2 is an enlarged comparison of the smooth bounding box regression function and the original bounding box regression function near the zero point;
Figure 3 is a visualization of the feature map output by the convolutional neural network;
Figure 4 is a schematic diagram of the region proposal network structure;
Figure 5 is a schematic diagram of the structure of the cascade detector;
Figure 6 shows the detection results of the Cascade RCNN method using the original bounding box regression function;
Figure 7 shows the detection results of the Cascade RCNN method using the smooth bounding box regression function provided by the present invention;
Figure 8 is a schematic workflow diagram of a hyperspectral image retrieval method according to an embodiment of the present invention.
Detailed description
In order to make the technical problems solved by the present invention, its technical solutions, and its beneficial effects clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and do not limit it.
The smooth bounding box regression function is described with reference to Figures 1 and 2:
$$\left(\operatorname{sgn}\!\left(\frac{t_x}{c_x}\right)\times\left|\frac{t_x}{c_x}\right|\right)^{4/3}\times p_w + p_x = G_x$$
$$\left(\operatorname{sgn}\!\left(\frac{t_y}{c_y}\right)\times\left|\frac{t_y}{c_y}\right|\right)^{4/3}\times p_h + p_y = G_y$$
$$\exp\!\left(\operatorname{sgn}\!\left(\frac{t_w}{c_w}\right)\times\left|\frac{t_w}{c_w}\right|\right)^{4/3}\times p_w = G_w$$
$$\exp\!\left(\operatorname{sgn}\!\left(\frac{t_h}{c_h}\right)\times\left|\frac{t_h}{c_h}\right|\right)^{4/3}\times p_h = G_h$$
Here sgn denotes the sign function, which ensures that the computation does not fail for negative values, and exp is the exponential function. $c_x, c_y, c_w, c_h$ are the regression weight adjustment values, which usually default to (10, 10, 5, 5). $t_x, t_y, t_w, t_h$ are the offsets predicted by the convolutional neural network. $p_x, p_y$ are the coordinates of the center of the candidate box, and $p_w, p_h$ are its width and height. $G_x, G_y$ are the center coordinates of the bounding box after regression correction, and $G_w, G_h$ are its width and height.
In Figures 1 and 2, the straight line represents the original regression function and the curve represents the improved bounding box regression function; Figure 2 is an enlarged view of Figure 1 near zero. The improved regression function is smoother near the true value, so after regression the box tends to move toward the ground-truth box without easily overshooting it, which improves convergence.
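A minimal Python sketch of this decoding may help. It assumes the 4/3 exponent applies to the magnitude |t/c| only, which is the reading under which the sign function is needed, and it assumes the "original" function is the usual linear decoding with the same weights. The names `smooth_term`, `decode_smooth`, and `decode_linear` are illustrative and do not come from the patent.

```python
import math


def smooth_term(t, c):
    """Return sgn(t/c) * |t/c|**(4/3), the smoothed weight-scaled offset."""
    u = t / c
    return math.copysign(abs(u) ** (4.0 / 3.0), u)


def decode_smooth(proposal, offsets, weights=(10.0, 10.0, 5.0, 5.0)):
    """Apply offsets (tx, ty, tw, th) to a proposal (px, py, pw, ph)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    cx, cy, cw, ch = weights
    gx = smooth_term(tx, cx) * pw + px
    gy = smooth_term(ty, cy) * ph + py
    gw = math.exp(smooth_term(tw, cw)) * pw
    gh = math.exp(smooth_term(th, ch)) * ph
    return gx, gy, gw, gh


def decode_linear(proposal, offsets, weights=(10.0, 10.0, 5.0, 5.0)):
    """Assumed baseline: the same decoding without the 4/3 power."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    cx, cy, cw, ch = weights
    gx = (tx / cx) * pw + px
    gy = (ty / cy) * ph + py
    gw = math.exp(tw / cw) * pw
    gh = math.exp(th / ch) * ph
    return gx, gy, gw, gh
```

For a small offset such as t_x = 1 with c_x = 10, the linear term is 0.1 while the smoothed term is about 0.046, so under this reading the box takes a shorter step when it is already close to the target.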
The steps of an embodiment of the invention are described in detail below using Cascade RCNN as the base network structure, as shown in Figure 8. Cascade RCNN performs target detection with three cascaded detectors.
1. Dataset
The data are selected from the DOTA dataset. The original DOTA images are large and contain many objects. To suit the needs of the invention, images containing many densely arranged small targets such as airplanes, ships, and cars were selected and then cropped to sizes between 600 and 800 pixels, yielding the required dataset. The training set contains 15,070 images and the test set contains 2,700 images.
2. Training process
1) Data preprocessing
Before the images are fed to the network, they are first augmented with operations such as horizontal mirroring and rotation. Their gray values are then normalized, and they are scaled to the size configured for training, typically a shortest side of 600 and a longest side of at most 1000. Finally, the images are filtered: any image that contains no target is excluded.
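The resizing, mirroring, and filtering steps can be sketched as follows. This is an illustration under assumptions: the scale rule (fit the shortest side to 600 unless that pushes the longest side past 1000) is the common convention for such settings, boxes are taken as `(x1, y1, x2, y2)` in continuous pixel coordinates, and the helper names are hypothetical.

```python
def resize_scale(height, width, min_side=600, max_side=1000):
    """Scale factor that fits the shortest side to min_side,
    reduced if the longest side would exceed max_side."""
    scale = min_side / min(height, width)
    if scale * max(height, width) > max_side:
        scale = max_side / max(height, width)
    return scale


def hflip_box(box, image_width):
    """Mirror an (x1, y1, x2, y2) box horizontally inside the image."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def keep_image(annotations):
    """Keep an image only if it has at least one annotated target."""
    return len(annotations) > 0
```

For example, a 300 x 400 image is scaled by 2.0, whereas a 300 x 900 image is scaled by about 1.11 so that its longest side stays at 1000.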
2) Training parameter settings
Training uses 4 GPUs, the caffe2 framework, and a resnet101 backbone. During training, the shortest image side is set to 600 and the longest side is limited to 1000. The optimizer is SGD with momentum, with momentum 0.9, an initial learning rate of 0.01, and a penalty-term (weight decay) coefficient of 0.0001. Training is staged over 360,000 iterations in total, with the learning rate decayed to 0.001 at iteration 240,000 and to 0.0001 at iteration 320,000.
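The staged learning-rate schedule can be written as a small lookup. This sketch assumes the decay takes effect at exactly iterations 240,000 and 320,000 (the text does not say whether the boundary is inclusive); `learning_rate` is a hypothetical helper, not a caffe2 API.

```python
def learning_rate(iteration,
                  boundaries=(240_000, 320_000),
                  rates=(0.01, 0.001, 0.0001)):
    """Piecewise-constant learning rate for the 360,000-iteration run."""
    # Count how many decay boundaries have been reached so far.
    stage = sum(1 for b in boundaries if iteration >= b)
    return rates[stage]
```

Calling `learning_rate(0)` gives 0.01, `learning_rate(240_000)` gives 0.001, and `learning_rate(359_999)` gives 0.0001.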
3) Feature extraction
The preprocessed images are fed in turn into the convolutional neural network layers, which apply convolution, pooling, and similar operations to extract image features for the subsequent Cascade RCNN detectors. Figure 3 shows a visualization of the feature maps output by the convolutional neural network layers.
4) Selection of candidate boxes
The features extracted by the convolutional neural network are input to the region proposal network. There, a series of anchors is preset over all regions of the image in a sliding-window manner, as shown in Figure 4. All preset anchors are ranked by foreground confidence, and the highest-confidence anchors are kept as candidate boxes.
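The anchor layout and the confidence-based selection can be sketched as below. The stride, anchor sizes, aspect ratios, and `top_k` are illustrative assumptions (the text does not give them), and steps that real region proposal networks usually add, such as box refinement and non-maximum suppression, are omitted.

```python
import math


def generate_anchors(feat_h, feat_w, stride,
                     sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Place anchors (cx, cy, w, h) at every feature-map cell."""
    anchors = []
    for iy in range(feat_h):
        for ix in range(feat_w):
            cx = (ix + 0.5) * stride
            cy = (iy + 0.5) * stride
            for s in sizes:
                for r in ratios:
                    # Keep the area s*s while setting h / w = r.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


def select_proposals(anchors, fg_scores, top_k):
    """Keep the top_k anchors ranked by foreground confidence."""
    order = sorted(range(len(anchors)),
                   key=lambda i: fg_scores[i], reverse=True)
    return [anchors[i] for i in order[:top_k]]
```

With a 2 x 3 feature map and the default 3 sizes and 3 ratios, `generate_anchors` produces 54 anchors, from which `select_proposals` keeps the most confident ones.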
5) Iterative regression with cascaded detectors
After the feature map is obtained from the convolutional neural network, it is fed into the detectors to detect target objects. Figure 5 shows the cascade detector structure of Cascade RCNN, where B0 denotes the candidate regions selected by the region proposal network and conv denotes the convolutional neural network. The candidate regions and the feature map are fed together into the RoI Pooling layer to obtain the region-of-interest features. These features pass through the fully connected layer (H1), whose output goes both to the classifier (C1) for classification and to the smooth bounding box regression function provided by the invention (B1) for fine-tuning the localization. The network has three detectors. The candidate box B1, fine-tuned by the smooth bounding box regression function in the previous stage, is fed as new input to the next-stage detector, and so on until candidate box B3 is obtained. The error between B3 and the ground-truth box is computed as the loss and backpropagated to adjust the parameters of the convolutional neural network. This process is repeated until training ends.
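The B0 to B3 refinement loop can be sketched in a few lines. This is a structural illustration only: each stage is modeled as a hypothetical callable that returns class scores and offsets for a box, standing in for RoI pooling, the fully connected layer, and the two heads. The decoding assumes the 4/3 power applies to |t/c| and uses the default weights (10, 10, 5, 5).

```python
import math


def smooth_term(t, c):
    """sgn(t/c) * |t/c|**(4/3)."""
    u = t / c
    return math.copysign(abs(u) ** (4.0 / 3.0), u)


def refine(box, offsets, weights=(10.0, 10.0, 5.0, 5.0)):
    """Decode one stage's offsets into a corrected (x, y, w, h) box."""
    px, py, pw, ph = box
    tx, ty, tw, th = offsets
    cx, cy, cw, ch = weights
    return (smooth_term(tx, cx) * pw + px,
            smooth_term(ty, cy) * ph + py,
            math.exp(smooth_term(tw, cw)) * pw,
            math.exp(smooth_term(th, ch)) * ph)


def run_cascade(box, stages):
    """Pass a box through the detector stages; returns [B0, B1, ...]."""
    history = [box]
    for stage in stages:
        _scores, offsets = stage(box)   # stage stands in for H_k, C_k, B_k
        box = refine(box, offsets)      # output of stage k feeds stage k+1
        history.append(box)
    return history
```

With three stages, `run_cascade` returns four boxes, the last of which plays the role of B3 in the description.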
3. Testing process
The test image is first preprocessed: it is scaled to the size configured for the network and its gray values are normalized. The image is then fed into the convolutional neural network to extract a feature map. The feature map is input to the region proposal network to obtain candidate boxes, and the candidate boxes and the feature map are input together to the region-of-interest pooling layer to obtain region-of-interest features. These features are fed to the first stage of the cascade detector to obtain the regression offsets, from which the smooth bounding box regression function computes the corrected bounding box position. This bounding box, taken as a new candidate box, is input together with the feature map to the region-of-interest pooling layer to obtain new region-of-interest features, which are input to the second-stage detector. The operation is repeated until the last detector stage. The bounding box produced by the last stage after regression correction is the network's final bounding box. Meanwhile, the region-of-interest features of the last stage are input to the classifier of every detector stage to obtain classification results, and the results of all classifiers are combined into the network's final classification result.
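The text says only that the per-stage classification results are combined. One simple way to do this, shown here as an assumption rather than as the patent's stated rule, is to average the class probabilities across stages and take the highest-scoring class. The function names are illustrative.

```python
def ensemble_scores(per_stage_scores):
    """Average class probabilities over the detector stages.

    per_stage_scores: one list of class probabilities per stage,
    all of the same length.
    """
    n = len(per_stage_scores)
    return [sum(col) / n for col in zip(*per_stage_scores)]


def predict_class(per_stage_scores):
    """Index of the class with the highest averaged probability."""
    avg = ensemble_scores(per_stage_scores)
    return max(range(len(avg)), key=lambda k: avg[k])
```

Averaging lets a class that is consistently rated highly by all three classifiers win over one that a single stage favors strongly.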
To verify the effectiveness of the remote-sensing image target detection method of the invention, it was tested on the DOTA dataset described above. To verify the effect of the smooth bounding box regression function, the YOLO v2, SSD, Faster RCNN, YOLO v3, RetinaNet, FPN, and Cascade RCNN methods were each tested twice: first with the method's original bounding box regression function, and then again with that function replaced by the smooth bounding box regression function provided by the invention.
Evaluation metrics:
In machine learning, the performance of a classifier is generally measured by precision and recall. To compute them, samples are first divided into four categories according to their true and predicted labels: true positives (TP), positive samples predicted as positive; false positives (FP), negative samples predicted as positive; true negatives (TN), negative samples predicted as negative; and false negatives (FN), positive samples predicted as negative. A confusion matrix presents the relationship among these four categories clearly.
Precision P and recall R are expressed, respectively, as:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
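Under the standard definitions, precision is TP / (TP + FP) and recall is TP / (TP + FN). A short sketch with made-up counts shows the arithmetic; returning 0.0 for an empty denominator is a convention chosen here, not something the text specifies.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```

For instance, 8 true positives, 2 false positives, and 8 false negatives give a precision of 0.8 and a recall of 0.5.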
Higher is better for both precision and recall, but the two generally trade off: when recall is high, precision tends to be low, and when precision is high, recall tends to be low. In practice, samples are sorted by classification score and processed from highest to lowest, computing the current precision and recall at each step. Plotting recall on the horizontal axis and precision on the vertical axis yields the "P-R curve". The area enclosed by the P-R curve and the axes reflects performance to some extent: the larger the area, the better the performance.
Detector performance in target detection is measured by AP and mAP. For single-class detection, average precision (AP) is usually used as the evaluation metric; it is the area enclosed by that class's P-R curve and the axes. In target detection, determining the TP, FP, TN, and FN samples requires computing the IoU between each predicted box and the ground-truth box; a prediction is judged to be a positive sample only when its IoU exceeds the set threshold.
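The IoU test that decides whether a prediction counts as positive can be sketched as follows. Boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples with x2 > x1 and y2 > y1; the function names and the default threshold of 0.5 are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as positive only if its IoU exceeds the threshold."""
    return iou(pred_box, gt_box) > threshold
```

Two 2 x 2 boxes offset by one unit in each direction overlap in a 1 x 1 square, so their IoU is 1/7 and the prediction is not positive at a threshold of 0.5.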
For single-class target detection, AP is expressed as:
$$AP = \int_0^1 P(R)\,dR$$
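Since AP is described as the area under the P-R curve, it can be approximated from ranked detections. The sketch below uses a plain rectangle sum over recall increments; benchmark toolkits often use interpolated variants, so this is one possible convention rather than the exact evaluation used in the patent. The `is_tp` flags are assumed to have been decided beforehand by IoU matching at the chosen threshold.

```python
def average_precision(scores, is_tp, num_gt):
    """Area under the P-R curve for one class.

    scores: detection confidences; is_tp: matching flags per detection;
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        # Each recall increment contributes a rectangle of height precision.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

Repeating the computation with `is_tp` flags derived at IoU thresholds of 0.5, 0.75, and so on gives values in the spirit of AP50 and AP75.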
This embodiment uses AP, AP50, AP75, AP60, AP70, AP80, AP90, and similar metrics to evaluate the detection accuracy of each compared method. AP50 is the AP value when the IoU threshold is set to 0.5; the other metrics are defined analogously. A higher IoU threshold demands more precise localization and is therefore harder to satisfy.
Figures 6 and 7 show detection results with Cascade RCNN as the base network architecture, using the original bounding box regression function and the smooth bounding box regression function provided by the invention, respectively. Figure 6 is the result with the original function, and Figure 7 is the result with the smooth function. It can be seen that the method of the invention localizes targets more accurately than the original bounding box regression function.
Table 1 compares the detection results of the method of the invention and the original methods under other network architectures. A √ marks the accuracy obtained with the smooth bounding box regression function under the given architecture; rows without √ use the original bounding box regression function.
Table 1. Comparison of target detection accuracy under each network architecture
Figure PCTCN2020140022-appb-000004
Figure PCTCN2020140022-appb-000005
AP denotes the overall average precision, AP50 the average precision at an IoU threshold of 0.5, and AP75 the average precision at an IoU threshold of 0.75; the remaining metrics are defined likewise. With the smooth bounding box regression function provided by the invention, the accuracy of every network architecture improves markedly at every IoU level, and the gain is largest at high IoU thresholds, which indicates that the function is effective at improving localization accuracy.
With the technical solution of the invention, the smooth bounding box regression function carries out the bounding box regression step of target detection more effectively. Under high IoU thresholds the detected target boxes are more accurate, so compared with the original bounding box regression function, the smooth function achieves more precise target detection. The smooth bounding box regression function provided by the invention can be used in any target detection network framework.
The above are merely preferred embodiments of the invention and do not limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (3)

  1. A remote-sensing image target detection method based on a smooth bounding box regression function, characterized by comprising the following steps:
    Step 1, image preprocessing: perform the necessary preprocessing on the training images, including augmentation operations such as image rotation and mirroring, image normalization, and image resizing, and set the hyperparameters for network training;
    Step 2, feature extraction: input the image into a target-detection convolutional neural network to obtain a feature map; input the feature map into a region proposal network to obtain candidate boxes; then feed the candidate boxes together with the feature map into a region-of-interest pooling layer to obtain region-of-interest features;
    Step 3, classification: feed the region-of-interest features obtained in Step 2 into a softmax classifier for classification;
    Step 4, regression: feed the region-of-interest features obtained in Step 3 into a fully connected layer to obtain predicted offsets, feed the predicted offsets into the smooth bounding box regression function to obtain actual offsets, and correct the candidate boxes to new positions according to those offsets;
    Step 5, correction: take the bounding boxes obtained by regression correction of the candidate boxes as new candidate boxes and feed them together with the feature map into the region-of-interest layer to obtain region-of-interest features; repeat Steps 3, 4, and 5 until training ends, yielding a trained network;
    Step 6: preprocess the image to be detected and input it into the trained network to obtain the target detection result.
  2. The remote-sensing image target detection method based on a smooth bounding box regression function according to claim 1, characterized in that the smooth bounding box regression function used for regression is:
    $$\operatorname{sgn}\!\left(\frac{t_x}{c_x}\right)\left|\frac{t_x}{c_x}\right|^{4/3}\times p_w + p_x = G_x$$
    $$\operatorname{sgn}\!\left(\frac{t_y}{c_y}\right)\left|\frac{t_y}{c_y}\right|^{4/3}\times p_h + p_y = G_y$$
    $$\exp\!\left(\operatorname{sgn}\!\left(\frac{t_w}{c_w}\right)\left|\frac{t_w}{c_w}\right|^{4/3}\right)\times p_w = G_w$$
    $$\exp\!\left(\operatorname{sgn}\!\left(\frac{t_h}{c_h}\right)\left|\frac{t_h}{c_h}\right|^{4/3}\right)\times p_h = G_h$$
    where sgn denotes the sign function, which ensures that the computation does not fail for negative values; exp is the exponential function; $c_x, c_y, c_w, c_h$ are the regression weight adjustment values; $t_x, t_y, t_w, t_h$ are the offsets predicted by the convolutional neural network; $p_x, p_y$ are the coordinates of the center of the candidate box; $p_w, p_h$ are the width and height of the candidate box; $G_x, G_y$ are the center coordinates of the bounding box after regression correction; and $G_w, G_h$ are the width and height of the bounding box after regression correction.
  3. The remote-sensing image target detection method based on a smooth bounding box regression function according to claim 1, characterized in that the target-detection convolutional neural network includes, but is not limited to, Faster RCNN, YOLO v1, YOLO v2, YOLO v3, SSD, FPN, RetinaNet, and Cascade RCNN.
PCT/CN2020/140022 2020-04-16 2020-12-28 Remote-sensing image target detection method based on smooth bounding box regression function WO2021208502A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010302996.7 2020-04-16
CN202010302996.7A CN111553212B (en) 2020-04-16 2020-04-16 Remote sensing image target detection method based on smooth frame regression function

Publications (1)

Publication Number Publication Date
WO2021208502A1 true WO2021208502A1 (en) 2021-10-21

Family

ID=72005720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140022 WO2021208502A1 (en) 2020-04-16 2020-12-28 Remote-sensing image target detection method based on smooth bounding box regression function

Country Status (2)

Country Link
CN (1) CN111553212B (en)
WO (1) WO2021208502A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920375A (en) * 2021-11-01 2022-01-11 国网新疆电力有限公司营销服务中心(资金集约中心、计量中心) Fusion characteristic typical load recognition method and system based on combination of Faster R-CNN and SVM
CN114529552A (en) * 2022-03-03 2022-05-24 北京航空航天大学 Remote sensing image building segmentation method based on geometric contour vertex prediction
CN114707532A (en) * 2022-01-11 2022-07-05 中铁隧道局集团有限公司 Ground penetrating radar tunnel disease target detection method based on improved Cascade R-CNN
CN114757970A (en) * 2022-04-15 2022-07-15 合肥工业大学 Multi-level regression target tracking method and system based on sample balance
CN114792300A (en) * 2022-01-27 2022-07-26 河南大学 Multi-scale attention X-ray broken needle detection method
CN114925387A (en) * 2022-04-02 2022-08-19 北方工业大学 Sorting system and method based on end edge cloud architecture and readable storage medium
CN115170883A (en) * 2022-07-19 2022-10-11 哈尔滨市科佳通用机电股份有限公司 Method for detecting loss fault of brake cylinder piston push rod open pin
CN116645523A (en) * 2023-07-24 2023-08-25 济南大学 Rapid target detection method based on improved RetinaNet

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553212B (en) * 2020-04-16 2022-02-22 中国科学院深圳先进技术研究院 Remote sensing image target detection method based on smooth frame regression function
CN112132033B (en) * 2020-09-23 2023-10-10 平安国际智慧城市科技股份有限公司 Vehicle type recognition method and device, electronic equipment and storage medium
CN112232180A (en) * 2020-10-14 2021-01-15 上海海洋大学 Night underwater fish target detection method
CN112464769A (en) * 2020-11-18 2021-03-09 西北工业大学 High-resolution remote sensing image target detection method based on consistent multi-stage detection
CN112560682A (en) * 2020-12-16 2021-03-26 重庆守愚科技有限公司 Valve automatic detection method based on deep learning
CN115035552B (en) * 2022-08-11 2023-01-17 深圳市爱深盈通信息技术有限公司 Fall detection method and device, equipment terminal and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020012451A1 (en) * 2000-06-13 2002-01-31 Ching-Fang Lin Method for target detection and identification by using proximity pixel information
CN108052940A (en) * 2017-12-17 2018-05-18 南京理工大学 SAR remote sensing images waterborne target detection methods based on deep learning
CN108564109A (en) * 2018-03-21 2018-09-21 天津大学 A kind of Remote Sensing Target detection method based on deep learning
CN109800755A (en) * 2018-12-14 2019-05-24 中国科学院深圳先进技术研究院 A kind of remote sensing image small target detecting method based on Analysis On Multi-scale Features
CN110288017A (en) * 2019-06-21 2019-09-27 河北数云堂智能科技有限公司 High-precision cascade object detection method and device based on dynamic structure optimization
CN110956157A (en) * 2019-12-14 2020-04-03 深圳先进技术研究院 Deep learning remote sensing image target detection method and device based on candidate frame selection
CN111553212A (en) * 2020-04-16 2020-08-18 中国科学院深圳先进技术研究院 Remote sensing image target detection method based on smooth frame regression function

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211097B (en) * 2019-05-14 2021-06-08 河海大学 Crack image detection method based on fast R-CNN parameter migration

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CAI ZHAOWEI; VASCONCELOS NUNO: "Cascade R-CNN: Delving Into High Quality Object Detection", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 6154 - 6162, XP033473531, DOI: 10.1109/CVPR.2018.00644 *
SHENG YUAN: "Research on Object Detection Algorithm of Remote Sensing Images Based on Deep Learning", CHINESE MASTER'S THESES FULL-TEXT DATABASE, TIANJIN POLYTECHNIC UNIVERSITY, CN, 15 July 2020 (2020-07-15), CN , XP055857596, ISSN: 1674-0246 *
WANG GUOWEN: "Pedestrian and Vehicle Detection Using Improved YOLOv3 Network with Multiscale Feature Fusion", CHINESE MASTER'S THESES FULL-TEXT DATABASE, TIANJIN POLYTECHNIC UNIVERSITY, CN, 15 February 2020 (2020-02-15), CN , XP055857603, ISSN: 1674-0246 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920375A (en) * 2021-11-01 2022-01-11 国网新疆电力有限公司营销服务中心(资金集约中心、计量中心) Fusion characteristic typical load recognition method and system based on combination of Faster R-CNN and SVM
CN114707532A (en) * 2022-01-11 2022-07-05 中铁隧道局集团有限公司 Ground penetrating radar tunnel disease target detection method based on improved Cascade R-CNN
CN114707532B (en) * 2022-01-11 2023-05-19 中铁隧道局集团有限公司 Improved Cascade R-CNN-based ground penetrating radar tunnel disease target detection method
CN114792300B (en) * 2022-01-27 2024-02-20 河南大学 X-ray broken needle detection method based on multi-scale attention
CN114792300A (en) * 2022-01-27 2022-07-26 河南大学 Multi-scale attention X-ray broken needle detection method
CN114529552A (en) * 2022-03-03 2022-05-24 北京航空航天大学 Remote sensing image building segmentation method based on geometric contour vertex prediction
CN114925387A (en) * 2022-04-02 2022-08-19 北方工业大学 Sorting system and method based on end edge cloud architecture and readable storage medium
CN114925387B (en) * 2022-04-02 2024-06-07 北方工业大学 Sorting system, method and readable storage medium based on end-edge cloud architecture
CN114757970A (en) * 2022-04-15 2022-07-15 合肥工业大学 Multi-level regression target tracking method and system based on sample balance
CN114757970B (en) * 2022-04-15 2024-03-08 合肥工业大学 Sample balance-based multi-level regression target tracking method and tracking system
CN115170883A (en) * 2022-07-19 2022-10-11 哈尔滨市科佳通用机电股份有限公司 Method for detecting loss fault of brake cylinder piston push rod open pin
CN115170883B (en) * 2022-07-19 2023-03-14 哈尔滨市科佳通用机电股份有限公司 Brake cylinder piston push rod opening pin loss fault detection method
CN116645523B (en) * 2023-07-24 2023-12-01 江西蓝瑞存储科技有限公司 Rapid target detection method based on improved RetinaNet
CN116645523A (en) * 2023-07-24 2023-08-25 济南大学 Rapid target detection method based on improved RetinaNet

Also Published As

Publication number Publication date
CN111553212A (en) 2020-08-18
CN111553212B (en) 2022-02-22

Similar Documents

Publication Publication Date Title
WO2021208502A1 (en) Remote-sensing image target detection method based on smooth bounding box regression function
CN110059554B (en) Multi-branch target detection method based on traffic scene
Yang et al. Real-time face detection based on YOLO
CN109117876B (en) Dense small target detection model construction method, dense small target detection model and dense small target detection method
CN110070074B (en) Method for constructing pedestrian detection model
CN110796186A (en) Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN110033473B (en) Moving target tracking method based on template matching and depth classification network
JP5227888B2 (en) Person tracking method, person tracking apparatus, and person tracking program
CN111723798B (en) Multi-instance natural scene text detection method based on relevance hierarchy residual errors
CN110930387A (en) Fabric defect detection method based on depth separable convolutional neural network
CN112767357A (en) Yolov 4-based concrete structure disease detection method
CN107844785A (en) A kind of method for detecting human face based on size estimation
CN107909027A (en) It is a kind of that there is the quick human body target detection method for blocking processing
CN109284779A (en) Object detecting method based on the full convolutional network of depth
CN113435282B (en) Unmanned aerial vehicle image ear recognition method based on deep learning
CN109087261A (en) Face antidote based on untethered acquisition scene
CN111860587A (en) Method for detecting small target of picture
WO2023160666A1 (en) Target detection method and apparatus, and target detection model training method and apparatus
CN117495735B (en) Automatic building elevation texture repairing method and system based on structure guidance
CN114529552A (en) Remote sensing image building segmentation method based on geometric contour vertex prediction
CN113850761A (en) Remote sensing image target detection method based on multi-angle detection frame
CN113496480A (en) Method for detecting weld image defects
CN113627302B (en) Ascending construction compliance detection method and system
CN110276358A (en) High similarity wooden unit cross section detection method under intensive stacking
CN112199984B (en) Target rapid detection method for large-scale remote sensing image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20930942

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20930942

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20930942

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20930942

Country of ref document: EP

Kind code of ref document: A1