CN112101277A - Remote sensing target detection method based on image semantic feature constraint - Google Patents

Remote sensing target detection method based on image semantic feature constraint

Info

Publication number
CN112101277A
Authority
CN
China
Prior art keywords
feature
frame
remote sensing
feature map
image semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011018965.5A
Other languages
Chinese (zh)
Other versions
CN112101277B (en)
Inventor
孙斌
马付严
李树涛
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Fujitsu Ltd
Original Assignee
Hunan University
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University, Fujitsu Ltd filed Critical Hunan University
Priority to CN202011018965.5A priority Critical patent/CN112101277B/en
Publication of CN112101277A publication Critical patent/CN112101277A/en
Application granted granted Critical
Publication of CN112101277B publication Critical patent/CN112101277B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing target detection method based on image semantic feature constraint, which comprises the following steps: extracting features of the input image with a deep residual network ResNet50 and a feature pyramid network and fusing them into a multi-scale feature map that is fed to a center estimation module, and combining the output and input of the center estimation module to obtain an image semantic feature map with negative samples filtered out; using the extracted image semantic features to constrain the generation of anchors in arbitrary directions, extracting candidate regions from the filtered image semantic feature map with a rotation candidate region generation network, and extracting a feature vector of uniform size for each candidate region with a rotated region-of-interest pooling layer; and completing the classification and regression tasks with two fully connected layer branches to obtain the detection result and position of each candidate region in the input remote sensing image. The invention greatly reduces the computational cost and improves the detection speed and accuracy.

Description

Remote sensing target detection method based on image semantic feature constraint
Technical Field
The invention relates to an image target detection method, in particular to a remote sensing target detection method based on image semantic feature constraint.
Background
The need for intelligent transportation and ground observation has generated great interest in vehicle detection in remote sensing images, which aims to identify the class of each vehicle and accurately locate it in the image. Although much effort has been devoted to this task, vehicle detection remains very challenging because of the varied sizes and appearances of vehicles in remote sensing images. Detecting vehicles in arbitrary orientations is especially difficult, since directly applying horizontal object detection methods often produces regions of interest (RoIs) that do not match the vehicle regions and greatly expands the search space.
Faster R-CNN, published by Shaoqing Ren et al. (in Advances in Neural Information Processing Systems, 2015, pp. 91-99), presets anchors (initially estimated object boxes) of different sizes and aspect ratios and regresses the object positions in the image from these preset anchors, a strategy that has proven effective on public benchmarks. Most arbitrary-direction target detection methods adopt the same strategy; taking the rotation candidate region generation network described in "Arbitrary-oriented scene text detection via rotation proposals" by Jianqi Ma et al. (IEEE Transactions on Multimedia, vol. 20, no. 11, pp. 3111-3122, 2018) as an example, this method generates rotated candidate regions (a set of candidate boxes) from anchors with angles and regresses and refines their positions. Anchor-based detection algorithms perform well, but they usually start from a large number of densely distributed anchors; during training, the intersection-over-union between the ground-truth boxes and the predicted boxes is computed and predicted boxes whose intersection-over-union is below a threshold are removed as negative samples, which incurs a large computational cost. The anchor-free detection methods described in "CenterNet: Keypoint triplets for object detection" by Kaiwen Duan et al. (in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6569-6578) and "CornerNet: Detecting objects as paired keypoints" by Hei Law and Jia Deng (in Proceedings of the European Conference on Computer Vision, 2018, pp. 734-750) predict boxes from keypoints rather than from anchors of preset size and aspect ratio. However, because only keypoints are used to predict the bounding boxes, anchor-free detection methods have a lower recall rate than anchor-based detection methods.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the problems in the prior art, the invention provides a remote sensing target detection method based on image semantic feature constraint, which uses the semantic feature information in the image to constrain anchor generation, greatly reduces the computational cost, and improves the detection speed and accuracy.
In order to solve the technical problems, the invention adopts the technical scheme that:
a remote sensing target detection method based on image semantic feature constraint is characterized by comprising the following steps:
step 1): extracting features of the input image with a deep residual network ResNet50 and a feature pyramid network, and fusing them to obtain a multi-scale feature map that fuses multi-scale information;
step 2): passing the fused multi-scale feature map through a center estimation module, and combining the center feature map output by the center estimation module with the feature map input to the center estimation module to filter out negative samples, obtaining an image semantic feature map with negative samples filtered out;
step 3): using the extracted image semantic features to constrain the generation of anchors in arbitrary directions, generating anchors on the image semantic feature map with negative samples filtered out, obtaining candidate regions from the generated anchors with a rotation candidate region generation network, and extracting a feature vector of uniform size for each candidate region with a rotated region-of-interest pooling layer;
step 4): for the uniform-size feature vector extracted from each candidate region, completing the classification and regression tasks with two fully connected layer branches respectively, to obtain the detection result and position of each candidate region in the input remote sensing image.
Optionally, the detailed steps of step 1) include: down-sampling: down-sampling the input remote sensing image with the deep residual network ResNet50, where each group of ResNet50 layers over which the feature map size is unchanged is called a stage, obtaining feature maps C2, C3, C4 and C5 of 4 stages at 4 scales; up-sampling: forming a feature pyramid network from the feature maps C2, C3, C4 and C5 at 4 scales, up-sampling the feature map C5 by a factor of 2 with bilinear interpolation, fixing the feature dimension of the feature map C4 to 256 with a 1×1 convolution layer, and finally adding the two same-size feature maps element-wise to obtain the fused feature map F4; up-sampling the feature map F4 by a factor of 2 with the feature dimension fixed to 256, fixing the feature dimension of the feature map C3 to 256, and adding the two element-wise to obtain the feature map F3; up-sampling the feature map F3 by a factor of 2 with the feature dimension fixed to 256, fixing the feature dimension of the feature map C2 to 256, and adding the two element-wise to obtain the feature map F2, which fuses high-level and low-level features; the feature map F2 is output as the feature map fusing multi-scale information.
Optionally, the center estimation module in step 2) consists of a 1 × 1 convolution layer and an element-wise sigmoid activation layer, and is used to convert the input feature map of fused multi-scale information into a center feature map of the same size representing the probability that a positive sample is present, and to multiply the input feature map and the center feature map element-wise, so that element values in negative-sample regions of the final feature map are close to 0 while element values in positive-sample regions remain approximately unchanged.
Optionally, a step of training the center estimation module is further included before step 2); during training of the center estimation module, the Focal Loss function is used to supervise the center estimation branch, and the functional expression of the Focal Loss is as follows:
fl = -(1 - p)^α · log(p)
In the above formula, fl is the value of the Focal Loss function, p is the probability that a sample is a positive sample, and α is a coefficient; a positive sample is a sample whose preset anchor has an intersection-over-union with a ground-truth box in the remote sensing image above a threshold, and a negative sample is one whose intersection-over-union is below the threshold.
Optionally, the rotation candidate region generation network in step 3) comprises one 3 × 3 convolution layer and two 1 × 1 convolution layers, and is used to pass the input feature map through the 3 × 3 convolution layer to obtain a feature map whose H and W match those of the input, and then to pass that feature map through the two 1 × 1 convolution layers respectively to obtain two groups of feature maps containing category information and position information respectively.
Optionally, a step of training the rotation candidate region generation network is further included before step 3); when the rotation candidate region generation network is trained to generate candidate regions, whether a candidate region is a positive sample is decided from its intersection-over-union with the ground-truth box according to the following rules: a candidate region is a positive sample if it satisfies both: 1) its intersection-over-union with the ground-truth box is the highest or greater than 0.7; and 2) the angle between it and the ground-truth box is less than π/12; a candidate region is a negative sample if it satisfies either: 1) its intersection-over-union with the ground-truth box is less than 0.3; or 2) its intersection-over-union with the ground-truth box is greater than 0.7 but the angle between them is greater than π/12; the oblique intersection-over-union is then computed for all positive- and negative-sample candidate regions, and candidate regions satisfying neither criterion do not participate in training; the uniform-size feature vectors output by the rotated region-of-interest pooling layer are then fed into a full convolution network, a focal loss function is used to supervise the rotation candidate region generation network, and this process is repeated until training of the rotation candidate region generation network is completed.
Optionally, when the classification and regression tasks are completed with the two fully connected layer branches in step 4), the classification task is trained under supervision of the Softmax Loss function, and the bounding-box regression task uses the Smooth L1 Loss function; the regression variables are computed with the following expressions:
tx = (x - xa) / wa
ty = (y - ya) / ha
tw = log(w / wa)
th = log(h / ha)
tθ = θ - θa
tx* = (x* - xa) / wa
ty* = (y* - ya) / ha
tw* = log(w* / wa)
th* = log(h* / ha)
tθ* = θ* - θa
In the above formulas, (x, y, w, h, θ) are the center abscissa, center ordinate, width, height and rotation angle of the predicted target box, (xa, ya, wa, ha, θa) are the center abscissa, center ordinate, width, height and rotation angle of the anchor box, and (x*, y*, w*, h*, θ*) are the center abscissa, center ordinate, width, height and rotation angle of the ground-truth box; (tx, ty, tw, th, tθ) is the offset of the predicted box relative to the anchor box, and (tx*, ty*, tw*, th*, tθ*) is the offset of the ground-truth box relative to the anchor box. The bounding-box regression task uses the Smooth L1 Loss to compute the loss between the two offsets as follows:
Lreg(t*, t) = Σ i∈{x,y,w,h,θ} smoothL1(ti* - ti)
In the above formula, Lreg(t*, t) is the total regression loss between the true offset and the predicted offset, t* = (tx*, ty*, tw*, th*, tθ*) is the offset of the ground-truth box relative to the anchor box, t = (tx, ty, tw, th, tθ) is the offset of the predicted box relative to the anchor box, and ti* - ti is the difference between the true and predicted offsets; the smooth L1 loss for any x is defined as:
smoothL1(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
in addition, the invention also provides a remote sensing target detection system of image semantic feature constraint, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, and the microprocessor of the computer device is programmed or configured to execute the steps of the remote sensing target detection method of image semantic feature constraint.
In addition, the invention also provides a remote sensing target detection system of image semantic feature constraint, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, and a computer program which is programmed or configured to execute the remote sensing target detection method of the image semantic feature constraint is stored in the memory of the computer device.
In addition, the invention also provides a computer readable storage medium having stored thereon a computer program programmed or configured to execute the steps of the aforementioned image semantic feature constrained remote sensing target detection method.
Compared with the prior art, the invention has the following advantages: 1) The remote sensing target detection method based on image semantic feature constraint extracts features of the input image with a deep residual network ResNet50 and a feature pyramid network and fuses them into a multi-scale feature map, so image features can be extracted accurately. 2) The fused feature map is passed through a center estimation module, and the center feature map output by the center estimation module is combined with the feature map input to it to obtain an image semantic feature map with negative samples filtered out; semantic information is used to filter out anchors that are unlikely to cover a vehicle region, only the remaining anchors participate in generating rotated candidate regions, an image semantic feature constraint is thereby added to the detection method, and the subsequent computation operates on only a small number of generated candidate regions, which preserves the performance advantage of anchor-based detection methods while improving detection speed. 3) Candidate regions are extracted from the filtered image semantic feature map with the rotation candidate region generation network, and a feature vector of uniform size is extracted for each candidate region with the rotated region-of-interest pooling layer; from these uniform-size feature vectors the classification and regression tasks are completed with two fully connected layer branches respectively, giving the detection result and position of each candidate region in the input remote sensing image, so that image-semantic-feature-constrained remote sensing target detection is realized and the detection result and position of each candidate region are obtained at the same time. In conclusion, the invention fully exploits the semantic information in the image, greatly reduces the computational cost, and improves the detection speed and accuracy.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a remote sensing target detection network structure constrained by image semantic features adopted in the embodiment of the present invention.
Fig. 3 shows the conventional coordinate representation of an oriented bounding box.
Fig. 4 shows the oriented bounding box representation of anchors used in an embodiment of the present invention.
Fig. 5 illustrates the rotation anchors employed in an embodiment of the present invention, where (a) shows anchors at different angles and (b) shows an example of the anchors used in this embodiment.
Fig. 6 is a schematic diagram illustrating a process of calculating a positive sample of a candidate region according to an embodiment of the present invention.
FIG. 7 shows vehicle detection results visualized on the DOTA test data set according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1 and fig. 2, the method for detecting a remote sensing target constrained by image semantic features in the embodiment of the present invention includes the following steps:
step 1): extracting features of the input image with a deep residual network ResNet50 and a feature pyramid network, and fusing them to obtain a multi-scale feature map that fuses multi-scale information;
step 2): passing the fused multi-scale feature map through a center estimation module, and combining the center feature map output by the center estimation module with the feature map input to the center estimation module to filter out negative samples, obtaining an image semantic feature map with negative samples filtered out;
step 3): using the extracted image semantic features to constrain the generation of anchors in arbitrary directions, generating anchors on the image semantic feature map with negative samples filtered out, obtaining candidate regions from the generated anchors with a rotation candidate region generation network, and extracting a feature vector of uniform size for each candidate region with a rotated region-of-interest pooling layer;
step 4): for the uniform-size feature vector extracted from each candidate region, completing the classification and regression tasks with two fully connected layer branches respectively, to obtain the detection result and position of each candidate region in the input remote sensing image.
In order to construct the image-semantic-feature-constrained remote sensing target detection network shown in fig. 2, this embodiment uses the public remote sensing image target detection data set DOTA (the largest data set in the remote sensing target detection field annotated with oriented boxes) and extracts the ground-truth labels of the categories and positions of the vehicle targets in the images. Images containing vehicles in the DOTA training set are used as the training set, and the DOTA test set is used as the test set. The original images in the constructed training set are cropped into sub-images of size 1024 × 1024 with a stride of 512; for data augmentation, the original images are also rescaled to 0.5 times and 1.5 times to cover different target scales and cropped in the same way. The training set constructed in this way contains 106965 images in total, and the test set constructed in the same way contains 74058 images in total. With the constructed remote sensing image vehicle detection training and test sets, the image-semantic-feature-constrained remote sensing target detection network shown in fig. 2 can be trained and tested.
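As an illustration of the cropping and multi-scale augmentation just described, the following sketch cuts an image into 1024 × 1024 sub-images with a stride of 512 at scales 0.5, 1.0 and 1.5. It is only a minimal sketch assuming OpenCV is available; the function name, padding behaviour and label handling are not specified by the patent and are left out.

```python
import cv2

def crop_multiscale(image_path, crop_size=1024, stride=512, scales=(0.5, 1.0, 1.5)):
    """Rescale an image and cut it into overlapping square patches (illustrative only)."""
    image = cv2.imread(image_path)
    patches = []
    for s in scales:
        scaled = cv2.resize(image, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
        h, w = scaled.shape[:2]
        for y in range(0, max(h - crop_size, 0) + 1, stride):
            for x in range(0, max(w - crop_size, 0) + 1, stride):
                patch = scaled[y:y + crop_size, x:x + crop_size]
                # keep scale and offset so oriented-box labels can be mapped back
                patches.append((s, x, y, patch))
    return patches
```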
The deep residual network ResNet50 and the feature pyramid network FPN are divided into two processes, down-sampling and up-sampling, and finally yield the feature map F2 fusing high-level and low-level features, which can represent target information at different scales in the remote sensing image. In this embodiment, the detailed steps of step 1) include: down-sampling: down-sampling the input remote sensing image with the deep residual network ResNet50, where each group of ResNet50 layers over which the feature map size is unchanged is called a stage, obtaining feature maps C2, C3, C4 and C5 of 4 stages at 4 scales; up-sampling: forming a feature pyramid network (FPN) from the feature maps C2, C3, C4 and C5 at 4 scales, up-sampling the feature map C5 by a factor of 2 with bilinear interpolation, fixing the feature dimension of the feature map C4 to 256 with a 1×1 convolution layer, and finally adding the two same-size feature maps element-wise to obtain the fused feature map F4; up-sampling the feature map F4 by a factor of 2 with the feature dimension fixed to 256, fixing the feature dimension of the feature map C3 to 256, and adding the two element-wise to obtain the feature map F3; up-sampling the feature map F3 by a factor of 2 with the feature dimension fixed to 256, fixing the feature dimension of the feature map C2 to 256, and adding the two element-wise to obtain the feature map F2, which fuses high-level and low-level features; the feature map F2 is output as the feature map fusing multi-scale information.
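The top-down fusion of step 1) can be sketched as follows in PyTorch. The 1 × 1 lateral convolutions fix the feature dimension to 256 and bilinear interpolation doubles the spatial size before element-wise addition; the 1 × 1 convolution applied to C5 is an assumption needed to make the channel counts match, and the module name and channel defaults are illustrative rather than taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Sketch of the step-1 fusion: lateral 1x1 convs fix the feature dimension
    to 256, higher-level maps are upsampled by 2 with bilinear interpolation and
    added element-wise; F2 is returned as the multi-scale feature map."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode="bilinear", align_corners=False)
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode="bilinear", align_corners=False)
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode="bilinear", align_corners=False)
        return p2  # feature map F2 fusing multi-scale information
```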
In step 2) of this embodiment, a center estimation module is used to filter negative samples out of the input feature map, improving the speed of vehicle detection. Because vehicles in remote sensing images are unevenly distributed and vary in size, the intersection-over-union of most preset anchors with the ground-truth boxes falls below the threshold; far fewer preset anchors exceed the threshold and count as positive samples, so the ratio of positive to negative samples is extremely unbalanced. To avoid spending a large amount of computation on negative samples during target detection, this embodiment uses a 1 × 1 convolution layer and an element-wise sigmoid activation layer to convert the feature map extracted by the deep residual network ResNet50 and the feature pyramid network FPN into a center feature map of the same size that represents the probability that a positive sample is present, and combines the extracted feature map with the center feature map element-wise to obtain the image semantic feature map with negative samples filtered out. Referring to fig. 2, the center estimation module in step 2) consists of a 1 × 1 convolution layer and an element-wise sigmoid activation layer; it converts the input feature map of fused multi-scale information into a center feature map of the same size representing the probability that a positive sample is present, and multiplies the input feature map and the center feature map element-wise, so that element values in negative-sample regions of the final feature map are close to 0 while element values in positive-sample regions remain approximately unchanged.
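A minimal sketch of the center estimation module, assuming a PyTorch implementation: a 1 × 1 convolution followed by an element-wise sigmoid yields the center map, which is multiplied element-wise with the input feature map. The single-channel center map is an assumption; the patent only states that the center feature map has a size consistent with the input.

```python
import torch
import torch.nn as nn

class CenterEstimation(nn.Module):
    """Sketch of the center estimation module in step 2): a 1x1 convolution and an
    element-wise sigmoid produce a center map giving the probability that a positive
    sample is present; multiplying it with the input suppresses negative regions."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feature_map):
        center_map = torch.sigmoid(self.conv(feature_map))  # (N, 1, H, W)
        filtered = feature_map * center_map                  # broadcast over channels
        return filtered, center_map
```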
In this embodiment, a step of training the center estimation module is further included before step 2); during training of the center estimation module, the Focal Loss function is used to supervise the center estimation branch, and the functional expression of the Focal Loss is as follows:
fl = -(1 - p)^α · log(p)
In the above formula, fl is the value of the Focal Loss function, p is the probability that a sample is a positive sample, and α is a coefficient (usually α = 2); a positive sample is a sample whose preset anchor has an intersection-over-union with a ground-truth box in the remote sensing image above a threshold, and a negative sample is one whose intersection-over-union is below the threshold. With α = 2, a sample considered a positive sample with probability 0.9 contributes 100 times less to the loss than under the ordinary cross-entropy loss, so the Focal Loss effectively limits the contribution of easily classified samples to the model.
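A sketch of the focal loss fl = -(1 - p)^α · log(p) used to supervise the center estimation branch is given below. The symmetric term applied to negative samples is an assumption not spelled out in the text, and the function signature is illustrative.

```python
import torch

def focal_loss(p, is_positive, alpha=2.0, eps=1e-6):
    """Focal loss sketch: fl = -(1 - p)^alpha * log(p) for positive samples;
    the mirrored term -(p)^alpha * log(1 - p) for negative samples is assumed.
    p: predicted positive probability; is_positive: boolean tensor of same shape."""
    p = p.clamp(eps, 1.0 - eps)
    pos_loss = -((1.0 - p) ** alpha) * torch.log(p)
    neg_loss = -(p ** alpha) * torch.log(1.0 - p)
    return torch.where(is_positive, pos_loss, neg_loss).mean()
```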
Fig. 3 illustrates the conventional coordinate method for annotating an oriented bounding box. In this embodiment, when anchors in arbitrary directions are generated on the image semantic feature map with negative samples filtered out in step 3), the oriented box of an anchor in an arbitrary direction is represented as shown in fig. 4, namely as a tuple (x, y, w, h, θ) of 5 elements, where θ takes values in [-π/2, π/2); a box whose angle falls outside this range is shifted back into the range in the opposite direction.
The rotation candidate region generation network obtains candidate regions from the anchors in arbitrary directions. In this embodiment, the rotation candidate region generation network in step 3) includes one 3 × 3 convolution layer and two 1 × 1 convolution layers; the input feature map is passed through the 3 × 3 convolution layer to obtain a feature map whose H and W match those of the input, and that feature map is then passed through the two 1 × 1 convolution layers respectively to obtain two groups of feature maps containing category information and position information respectively. As shown in sub-figure (b) of fig. 5, compared with a conventional candidate region generation network, the anchors carry angle information and are composed of 2 aspect ratios and 6 angles (1:2 and 1:4; -π/2, -π/3, -π/6, 0, π/6 and π/3, see sub-figure (a) of fig. 5), so each element of the feature map in the rotation candidate region generation network corresponds to 2 × 6 = 12 anchors.
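The rotation candidate region generation network head described above can be sketched as follows. The use of ReLU after the shared 3 × 3 convolution and the exact number of output channels per anchor (2 class scores and 5 box offsets) are assumptions; the patent only fixes one 3 × 3 convolution, two 1 × 1 convolutions and 12 anchors per feature-map element.

```python
import torch
import torch.nn as nn

class RotationRPNHead(nn.Module):
    """Sketch of the rotation candidate region generation network in step 3):
    one 3x3 convolution keeps H and W, then two 1x1 convolutions output category
    scores and (x, y, w, h, theta) offsets for the 12 anchors per location."""
    def __init__(self, in_channels=256, num_anchors=12):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.cls_head = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)  # object / background
        self.reg_head = nn.Conv2d(in_channels, num_anchors * 5, kernel_size=1)  # x, y, w, h, theta

    def forward(self, feature_map):
        shared = torch.relu(self.shared(feature_map))
        return self.cls_head(shared), self.reg_head(shared)
```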
As shown in fig. 6, a step of training the rotation candidate region generation network is further included before step 3); when the rotation candidate region generation network is trained to generate candidate regions, whether a candidate region is a positive sample is decided from its intersection-over-union with the ground-truth box according to the following rules: a candidate region is a positive sample if it satisfies both: 1) its intersection-over-union with the ground-truth box is the highest or greater than 0.7; and 2) the angle between it and the ground-truth box is less than π/12; a candidate region is a negative sample if it satisfies either: 1) its intersection-over-union with the ground-truth box is less than 0.3; or 2) its intersection-over-union with the ground-truth box is greater than 0.7 but the angle between them is greater than π/12; the oblique intersection-over-union is then computed for all positive- and negative-sample candidate regions, and candidate regions satisfying neither criterion do not participate in training; the uniform-size feature vectors output by the rotated region-of-interest pooling layer are then fed into a full convolution network, a focal loss function is used to supervise the rotation candidate region generation network, and this process is repeated until training of the rotation candidate region generation network is completed. A compact form of this assignment rule is sketched after the intersection-over-union procedure below.
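The sample assignment rule above can be written compactly as follows; the function name and arguments are illustrative, and the skewed intersection-over-union and the angle difference are assumed to be computed beforehand.

```python
import math

def assign_label(iou, angle_diff, is_best_match):
    """Sketch of the assignment rule: iou is the skewed IoU between a candidate
    and a ground-truth box, angle_diff the absolute angle difference in radians,
    is_best_match whether the candidate has the highest IoU with that ground truth.
    Returns 1 (positive), 0 (negative) or -1 (ignored during training)."""
    if (is_best_match or iou > 0.7) and angle_diff < math.pi / 12:
        return 1
    if iou < 0.3 or (iou > 0.7 and angle_diff > math.pi / 12):
        return 0
    return -1
```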
In this embodiment, the intersection-over-union between a candidate region and a ground-truth box is calculated as follows:
S1) input the candidate regions and the ground-truth boxes R1, R2, R3, ...;
S2) traverse and select any pair of a candidate region and a ground-truth box <Ri, Rj> (i < j) as the current rectangle pair; if the traversal is finished, end and exit, otherwise go to step S3);
S3) set the point set PSet to the empty set;
S4) add the intersection points of rectangle Ri and rectangle Rj to the point set PSet;
S5) add the vertices of rectangle Ri that lie inside rectangle Rj to the point set PSet;
S6) add the vertices of rectangle Rj that lie inside rectangle Ri to the point set PSet;
S7) sort the point set PSet counter-clockwise;
S8) compute the intersection region I of the point set PSet by triangulation;
S9) compute the intersection-over-union IoU[i, j] between <Ri, Rj> (i < j) with the following formula:
IoU[i, j] = Area(I) / (Area(Ri) + Area(Rj) - Area(I))
In the above formula, Area(I) is the area where the candidate region and the ground-truth box intersect, Area(Ri) is the area of the candidate region, and Area(Rj) is the area of the ground-truth box;
S10) go back to step S2).
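Instead of the clipping-and-triangulation procedure of steps S1) to S10), the same skewed intersection-over-union can be sketched with the shapely library, which performs the polygon intersection internally; this is an illustrative alternative, not the patent's own implementation.

```python
import math
from shapely.geometry import Polygon

def obb_to_polygon(cx, cy, w, h, theta):
    """Convert an oriented box (x, y, w, h, theta) into its four corner points."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return Polygon(corners)

def skew_iou(box_a, box_b):
    """IoU[i, j] = Area(I) / (Area(Ri) + Area(Rj) - Area(I)) for two oriented boxes."""
    pa, pb = obb_to_polygon(*box_a), obb_to_polygon(*box_b)
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter + 1e-9)
```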
Referring to fig. 2, a center mask segmentation module is used in this embodiment to improve model accuracy. For the sake of vehicle detection speed, the center estimation module uses only a 1 × 1 convolution layer and an element-wise sigmoid activation layer rather than the deep networks commonly used in semantic segmentation. To obtain a better estimate, this embodiment uses the center mask module to constrain the extraction of vehicle position information during the training phase. The uniform-size feature vectors output by the rotated region-of-interest pooling layer are fed into a full convolution network, which is supervised with a focal loss function. The full convolution network makes the network attend to every pixel of the image and classifies each pixel, and the focal loss is then computed per pixel, so that the network focuses on hard samples, the influence of easy samples is reduced, and model accuracy is improved.
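A possible form of the center mask segmentation branch, assuming a PyTorch implementation, is sketched below. The depth and channel counts of the full convolution network are assumptions; the patent only states that a full convolution network classifies every pixel and is supervised with a focal loss during training.

```python
import torch
import torch.nn as nn

class CenterMaskHead(nn.Module):
    """Sketch of the center mask segmentation branch used only during training:
    a small fully convolutional head predicts a per-pixel foreground probability
    on the RoI feature and is supervised with the focal loss."""
    def __init__(self, in_channels=256, hidden=256):
        super().__init__()
        self.fcn = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1),  # per-pixel foreground logit
        )

    def forward(self, roi_features):
        return torch.sigmoid(self.fcn(roi_features))
```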
In this embodiment, when the classification and regression tasks are completed with the two fully connected layer branches in step 4), the classification task is trained under supervision of the Softmax Loss function, and the bounding-box regression task uses the Smooth L1 Loss function; the regression variables are computed with the following expressions:
tx = (x - xa) / wa
ty = (y - ya) / ha
tw = log(w / wa)
th = log(h / ha)
tθ = θ - θa
tx* = (x* - xa) / wa
ty* = (y* - ya) / ha
tw* = log(w* / wa)
th* = log(h* / ha)
tθ* = θ* - θa
In the above formulas, (x, y, w, h, θ) are the center abscissa, center ordinate, width, height and rotation angle of the predicted target box, (xa, ya, wa, ha, θa) are the center abscissa, center ordinate, width, height and rotation angle of the anchor box, and (x*, y*, w*, h*, θ*) are the center abscissa, center ordinate, width, height and rotation angle of the ground-truth box; (tx, ty, tw, th, tθ) is the offset of the predicted box relative to the anchor box, and (tx*, ty*, tw*, th*, tθ*) is the offset of the ground-truth box relative to the anchor box. The bounding-box regression task uses the Smooth L1 Loss to compute the loss between the two offsets as follows:
Lreg(t*, t) = Σ i∈{x,y,w,h,θ} smoothL1(ti* - ti)
In the above formula, Lreg(t*, t) is the total regression loss between the true offset and the predicted offset, t* = (tx*, ty*, tw*, th*, tθ*) is the offset of the ground-truth box relative to the anchor box, t = (tx, ty, tw, th, tθ) is the offset of the predicted box relative to the anchor box, and ti* - ti is the difference between the true and predicted offsets; the smooth L1 loss for any x is defined as:
smoothL1(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
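The regression encoding and the Smooth L1 loss above translate directly into code; the following PyTorch sketch uses illustrative function names and assumes that boxes and anchors are given as (x, y, w, h, θ) tensors.

```python
import torch

def encode_offsets(box, anchor):
    """Compute (tx, ty, tw, th, ttheta) of a box relative to an anchor, following
    the regression formulas above. Both inputs are tensors of shape (..., 5)."""
    x, y, w, h, theta = box.unbind(-1)
    xa, ya, wa, ha, ta = anchor.unbind(-1)
    return torch.stack(((x - xa) / wa, (y - ya) / ha,
                        torch.log(w / wa), torch.log(h / ha), theta - ta), dim=-1)

def smooth_l1(diff):
    """smoothL1(x) = 0.5 x^2 if |x| < 1, |x| - 0.5 otherwise."""
    absdiff = diff.abs()
    return torch.where(absdiff < 1.0, 0.5 * diff ** 2, absdiff - 0.5)

def regression_loss(t_star, t):
    """Total regression loss: sum of smooth L1 over the five offset components."""
    return smooth_l1(t_star - t).sum(dim=-1).mean()
```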
FIG. 7 shows vehicle detection results visualized on the DOTA test data set, where the box labeled A is a detected truck and the remaining boxes are cars.
The deep residual network ResNet50 used in this embodiment is initialized with parameters pre-trained on the ImageNet dataset, the initial learning rate is set to 0.01, and training runs for 12 epochs in total. After the 8th and 11th epochs the learning rate is reduced to 1/10 of its previous value. The constructed remote sensing image vehicle detection training set is fed into the constructed image-semantic-feature-constrained remote sensing target detection model, the model is trained with the SGD optimization algorithm, and after 12 epochs the trained remote sensing image vehicle detection model is obtained. The goal of the test stage is to obtain the position and category of the vehicles in each image; the center mask segmentation module contained in the trained model is not needed at test time, and the vehicle positions and categories are obtained by regression and classification respectively. At test time, only predicted vehicle boxes with scores higher than 0.3 are kept, and non-maximum suppression with a threshold of 0.5 is applied to remove duplicates.
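The test-time post-processing described above (score threshold 0.3, non-maximum suppression at IoU 0.5) can be sketched as follows, reusing the skew_iou sketch given earlier; this is a simple illustrative loop, not the implementation used in the experiments.

```python
def rotated_nms(boxes, scores, score_thr=0.3, iou_thr=0.5):
    """Keep predicted boxes whose score exceeds score_thr and suppress duplicates
    with skewed-IoU non-maximum suppression at iou_thr. boxes are (x, y, w, h, theta)
    tuples; skew_iou is the rotated IoU sketched earlier."""
    candidates = sorted(
        (b for b in zip(boxes, scores) if b[1] > score_thr),
        key=lambda b: b[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(skew_iou(box, k) <= iou_thr for k, _ in kept):
            kept.append((box, score))
    return kept
```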
Table 1 shows the quantitative evaluation results of the image-semantic-feature-constrained remote sensing target detection method of this embodiment and of other methods. FR-O denotes the Faster R-CNN OBB detector, which is the official baseline provided by DOTA. Mode 1 of the method of this embodiment denotes the variant with only the image semantic feature constraint (without the center mask segmentation module); mode 2 denotes the variant with both the image semantic feature constraint and the center mask segmentation module; mode 3 denotes the variant that additionally uses anchors of multiple aspect ratios on top of mode 2. In the tests, modes 1-3 of this embodiment outperform the other methods in mean average precision (mAP) and time cost. Modes 1-3 of the method of this embodiment obtain 76.9% mAP, 40.2% higher than the official baseline, and their vehicle detection performance is 6.2% higher than that of the rotation candidate region generation network used alone. Mode 2, with the center mask segmentation module, improves mAP by 3.4% over the method without this branch. The time costs of modes 1-3 of this embodiment are also reported in Table 1, with the best results in each case highlighted in bold. SV denotes the average precision of small vehicle detection, LV the average precision of large vehicle detection, and mAP the mean of the average precisions over all categories.
Table 1: quantitative evaluation of the results of the different methods (average precision + run time).
In summary, directly applying conventional horizontal anchor-based detection methods to vehicle detection in arbitrary directions usually gives poor performance. Although rotated anchors have been used to address this problem, such a design incurs a significant computational cost because thousands of rotated anchors are generated at every level of the feature map. To solve this problem, this embodiment provides a remote sensing target detection method based on image semantic feature constraint: before the model computes intersection-over-union, semantic information is used to filter out anchors that are unlikely to cover a vehicle region, anchors in arbitrary directions participate in the generation of rotated candidate regions, and the subsequent computation operates on only a small number of generated candidate regions, which preserves the performance advantage of anchor-based detection methods and improves detection speed. Overall, this embodiment fully exploits the semantic information in the image, greatly reduces the computational cost, and improves detection speed and accuracy.
In addition, the embodiment also provides an image semantic feature constrained remote sensing target detection system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, and the microprocessor of the computer device is programmed or configured to execute the steps of the image semantic feature constrained remote sensing target detection method.
In addition, the embodiment also provides an image semantic feature constrained remote sensing target detection system, which includes a computer device, where the computer device includes a microprocessor and a memory connected to each other, and a computer program programmed or configured to execute the aforementioned image semantic feature constrained remote sensing target detection method is stored in the memory of the computer device.
In addition, this embodiment also provides a computer readable storage medium having stored thereon a computer program programmed or configured to execute the steps of the aforementioned image semantic feature constrained remote sensing target detection method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application; computer program instructions may be provided to a processor of a computer or other programmable data processing apparatus such that the instructions, which execute via the processor, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A remote sensing target detection method based on image semantic feature constraint is characterized by comprising the following steps:
step 1): extracting features of the input image with a deep residual network ResNet50 and a feature pyramid network, and fusing them to obtain a multi-scale feature map that fuses multi-scale information;
step 2): passing the fused multi-scale feature map through a center estimation module, and combining the center feature map output by the center estimation module with the feature map input to the center estimation module to filter out negative samples, obtaining an image semantic feature map with negative samples filtered out;
step 3): using the extracted image semantic features to constrain the generation of anchors in arbitrary directions, generating anchors on the image semantic feature map with negative samples filtered out, obtaining candidate regions from the generated anchors with a rotation candidate region generation network, and extracting a feature vector of uniform size for each candidate region with a rotated region-of-interest pooling layer;
step 4): for the uniform-size feature vector extracted from each candidate region, completing the classification and regression tasks with two fully connected layer branches respectively, to obtain the detection result and position of each candidate region in the input remote sensing image.
2. The image semantic feature constrained remote sensing target detection method according to claim 1, wherein the detailed steps of step 1) comprise: down-sampling: down-sampling the input remote sensing image with the deep residual network ResNet50, where each group of ResNet50 layers over which the feature map size is unchanged is called a stage, obtaining feature maps C2, C3, C4 and C5 of 4 stages at 4 scales; up-sampling: forming a feature pyramid network from the feature maps C2, C3, C4 and C5 at 4 scales, up-sampling the feature map C5 by a factor of 2 with bilinear interpolation, fixing the feature dimension of the feature map C4 to 256 with a 1×1 convolution layer, and finally adding the two same-size feature maps element-wise to obtain the fused feature map F4; up-sampling the feature map F4 by a factor of 2 with the feature dimension fixed to 256, fixing the feature dimension of the feature map C3 to 256, and adding the two element-wise to obtain the feature map F3; up-sampling the feature map F3 by a factor of 2 with the feature dimension fixed to 256, fixing the feature dimension of the feature map C2 to 256, and adding the two element-wise to obtain the feature map F2, which fuses high-level and low-level features; the feature map F2 is output as the feature map fusing multi-scale information.
3. The image semantic feature constrained remote sensing target detection method according to claim 1, wherein the center estimation module in step 2) consists of a 1 × 1 convolution layer and an element-wise sigmoid activation layer, and is used to convert the input feature map of fused multi-scale information into a center feature map of the same size representing the probability that a positive sample is present, and to multiply the input feature map and the center feature map element-wise, so that element values in negative-sample regions of the final feature map are close to 0 while element values in positive-sample regions remain approximately unchanged.
4. The image semantic feature constrained remote sensing target detection method according to claim 3, further comprising a step of training the center estimation module before step 2), wherein during training of the center estimation module the Focal Loss function is used to supervise the center estimation branch, the functional expression of the Focal Loss being:
fl = -(1 - p)^α · log(p)
In the above formula, fl is the value of the Focal Loss function, p is the probability that a sample is a positive sample, and α is a coefficient; a positive sample is a sample whose preset anchor has an intersection-over-union with a ground-truth box in the remote sensing image above a threshold, and a negative sample is one whose intersection-over-union is below the threshold.
5. The image semantic feature constrained remote sensing target detection method according to claim 1, wherein the rotation candidate region generation network in step 3) comprises one 3 × 3 convolution layer and two 1 × 1 convolution layers, and is used to pass the input feature map through the 3 × 3 convolution layer to obtain a feature map whose H and W match those of the input, and then to pass that feature map through the two 1 × 1 convolution layers respectively to obtain two groups of feature maps containing category information and position information respectively.
6. The image semantic feature constrained remote sensing target detection method according to claim 5, further comprising a step of training the rotation candidate region generation network before step 3), wherein when the rotation candidate region generation network is trained to generate candidate regions, whether a candidate region is a positive sample is decided from its intersection-over-union with the ground-truth box according to the following rules: a candidate region is a positive sample if it satisfies both: 1) its intersection-over-union with the ground-truth box is the highest or greater than 0.7; and 2) the angle between it and the ground-truth box is less than π/12; a candidate region is a negative sample if it satisfies either: 1) its intersection-over-union with the ground-truth box is less than 0.3; or 2) its intersection-over-union with the ground-truth box is greater than 0.7 but the angle between them is greater than π/12; the oblique intersection-over-union is then computed for all positive- and negative-sample candidate regions, and candidate regions satisfying neither criterion do not participate in training; the uniform-size feature vectors output by the rotated region-of-interest pooling layer are then fed into a full convolution network, a focal loss function is used to supervise the rotation candidate region generation network, and this process is repeated until training of the rotation candidate region generation network is completed.
7. The image semantic feature constrained remote sensing target detection method according to claim 1, wherein when the classification and regression tasks are completed with the two fully connected layer branches in step 4), the classification task is trained under supervision of the Softmax Loss function and the bounding-box regression task uses the Smooth L1 Loss function, the regression variables being computed with the following expressions:
tx = (x - xa) / wa
ty = (y - ya) / ha
tw = log(w / wa)
th = log(h / ha)
tθ = θ - θa
tx* = (x* - xa) / wa
ty* = (y* - ya) / ha
tw* = log(w* / wa)
th* = log(h* / ha)
tθ* = θ* - θa
In the above formulas, (x, y, w, h, θ) are the center abscissa, center ordinate, width, height and rotation angle of the predicted target box, (xa, ya, wa, ha, θa) are the center abscissa, center ordinate, width, height and rotation angle of the anchor box, and (x*, y*, w*, h*, θ*) are the center abscissa, center ordinate, width, height and rotation angle of the ground-truth box; (tx, ty, tw, th, tθ) is the offset of the predicted box relative to the anchor box, and (tx*, ty*, tw*, th*, tθ*) is the offset of the ground-truth box relative to the anchor box; the bounding-box regression task uses the Smooth L1 Loss to compute the loss between the two offsets as follows:
Lreg(t*, t) = Σ i∈{x,y,w,h,θ} smoothL1(ti* - ti)
In the above formula, Lreg(t*, t) is the total regression loss between the true offset and the predicted offset, t* = (tx*, ty*, tw*, th*, tθ*) is the offset of the ground-truth box relative to the anchor box, t = (tx, ty, tw, th, tθ) is the offset of the predicted box relative to the anchor box, and ti* - ti is the difference between the true and predicted offsets; the smooth L1 loss for any x is defined as:
smoothL1(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
8. An image semantic feature constrained remote sensing target detection system, comprising a computer device, the computer device comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor of the computer device is programmed or configured to perform the steps of the image semantic feature constrained remote sensing target detection method according to any one of claims 1 to 7.
9. An image semantic feature constrained remote sensing target detection system, comprising a computer device, the computer device comprising a microprocessor and a memory connected to each other, characterized in that the memory of the computer device stores a computer program programmed or configured to perform the image semantic feature constrained remote sensing target detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the steps of the image semantic feature constrained remote sensing target detection method according to any one of claims 1 to 7.
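
Purely as an illustrative aid and not as part of the claimed method, the following minimal Python sketch shows one possible way to apply the positive/negative sample assignment rule described in claim 6. The 0.7 and 0.3 intersection-over-union thresholds and the π/12 angle threshold come from the claim; the shapely-based skew_iou helper, the function names, and the (cx, cy, w, h, theta) box layout are assumptions of this sketch, and the "highest intersection-over-union" positive case, which requires a comparison across all candidates, is simplified here.

    import numpy as np
    from shapely.geometry import Polygon

    def box_to_polygon(box):
        # (cx, cy, w, h, theta) -> oriented-rectangle polygon.
        cx, cy, w, h, theta = box
        c, s = np.cos(theta), np.sin(theta)
        corners = [(cx + dx * c - dy * s, cy + dx * s + dy * c)
                   for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2),
                                  (w / 2, h / 2), (-w / 2, h / 2)]]
        return Polygon(corners)

    def skew_iou(box_a, box_b):
        # Skew (rotated-box) intersection-over-union via polygon intersection.
        pa, pb = box_to_polygon(box_a), box_to_polygon(box_b)
        inter = pa.intersection(pb).area
        union = pa.area + pb.area - inter
        return inter / union if union > 0 else 0.0

    def assign_label(candidate, gt_boxes):
        # Returns 1 (positive), 0 (negative) or -1 (neither: excluded from training).
        ious = np.array([skew_iou(candidate, gt) for gt in gt_boxes])
        best = int(np.argmax(ious))
        best_iou = float(ious[best])
        angle_diff = abs(candidate[4] - gt_boxes[best][4])
        # Positive: IoU with the real frame above 0.7 AND angle difference below pi/12.
        if best_iou > 0.7 and angle_diff < np.pi / 12:
            return 1
        # Negative: IoU below 0.3, or IoU above 0.7 but angle difference above pi/12.
        if best_iou < 0.3 or (best_iou > 0.7 and angle_diff > np.pi / 12):
            return 0
        return -1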
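Likewise, a minimal sketch of the regression-variable encoding and the Smooth L1 loss given in claim 7, assuming the same (cx, cy, w, h, theta) box layout; the function names are hypothetical and only illustrate the formulas.

    import numpy as np

    def encode_offsets(box, anchor):
        # Regression variables (t_x, t_y, t_w, t_h, t_theta) of a box relative to an anchor.
        x, y, w, h, t = box
        xa, ya, wa, ha, ta = anchor
        return np.array([(x - xa) / wa,
                         (y - ya) / ha,
                         np.log(w / wa),
                         np.log(h / ha),
                         t - ta])

    def smooth_l1(x):
        # Element-wise Smooth L1: 0.5*x^2 if |x| < 1, |x| - 0.5 otherwise.
        x = np.abs(x)
        return np.where(x < 1.0, 0.5 * x * x, x - 0.5)

    def regression_loss(pred_box, gt_box, anchor):
        # Total Smooth L1 loss between the real-frame and predicted-frame offsets.
        t_pred = encode_offsets(pred_box, anchor)
        t_true = encode_offsets(gt_box, anchor)
        return float(np.sum(smooth_l1(t_true - t_pred)))

    # Example call with made-up boxes:
    # regression_loss((10, 10, 4, 2, 0.10), (11, 10, 4, 2, 0.15), (10, 10, 4, 2, 0.0))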
CN202011018965.5A 2020-09-24 2020-09-24 Remote sensing target detection method based on image semantic feature constraint Active CN112101277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011018965.5A CN112101277B (en) 2020-09-24 2020-09-24 Remote sensing target detection method based on image semantic feature constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011018965.5A CN112101277B (en) 2020-09-24 2020-09-24 Remote sensing target detection method based on image semantic feature constraint

Publications (2)

Publication Number Publication Date
CN112101277A true CN112101277A (en) 2020-12-18
CN112101277B CN112101277B (en) 2023-07-28

Family

ID=73755387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011018965.5A Active CN112101277B (en) 2020-09-24 2020-09-24 Remote sensing target detection method based on image semantic feature constraint

Country Status (1)

Country Link
CN (1) CN112101277B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
戴媛; 易本顺; 肖进胜; 雷俊锋; 童乐; 程志钦: "Remote sensing image target detection based on an improved rotation region proposal network", Acta Optica Sinica, no. 01 *
王昌安; 田金文; 张强; 张英辉: "Deep learning method for inshore ship recognition in remote sensing images", Remote Sensing Information, no. 02 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700444B (en) * 2021-02-19 2023-06-23 中国铁道科学研究院集团有限公司铁道建筑研究所 Bridge bolt detection method based on self-attention and central point regression model
CN112700444A (en) * 2021-02-19 2021-04-23 中国铁道科学研究院集团有限公司铁道建筑研究所 Bridge bolt detection method based on self-attention and central point regression model
CN112861744A (en) * 2021-02-20 2021-05-28 哈尔滨工程大学 Remote sensing image target rapid detection method based on rotation anchor point clustering
CN112861744B (en) * 2021-02-20 2022-06-17 哈尔滨工程大学 Remote sensing image target rapid detection method based on rotation anchor point clustering
CN113111740A (en) * 2021-03-27 2021-07-13 西北工业大学 Characteristic weaving method for remote sensing image target detection
CN113095188A (en) * 2021-04-01 2021-07-09 山东捷讯通信技术有限公司 Deep learning-based Raman spectrum data analysis method and device
CN113468968A (en) * 2021-06-02 2021-10-01 中国地质大学(武汉) Remote sensing image rotating target detection method based on non-anchor frame
CN113505806A (en) * 2021-06-02 2021-10-15 北京化工大学 Robot grabbing detection method
CN113505806B (en) * 2021-06-02 2023-12-15 北京化工大学 Robot grabbing detection method
CN113468993A (en) * 2021-06-21 2021-10-01 天津大学 Remote sensing image target detection method based on deep learning
CN113420819A (en) * 2021-06-25 2021-09-21 西北工业大学 Lightweight underwater target detection method based on CenterNet
CN113792357A (en) * 2021-09-09 2021-12-14 重庆大学 Tree growth model construction method and computer storage medium
CN113792357B (en) * 2021-09-09 2023-09-05 重庆大学 Tree growth model construction method and computer storage medium
CN114240946A (en) * 2022-02-28 2022-03-25 南京智莲森信息技术有限公司 Locator abnormality detection method, system, storage medium and computing device
CN117094343A (en) * 2023-10-19 2023-11-21 成都新西旺自动化科技有限公司 QR code decoding system and method
CN117094343B (en) * 2023-10-19 2023-12-29 成都新西旺自动化科技有限公司 QR code decoding system and method

Also Published As

Publication number Publication date
CN112101277B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
CN112101277B (en) Remote sensing target detection method based on image semantic feature constraint
CN111738110A (en) Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN111612008B (en) Image segmentation method based on convolution network
CN111626176B (en) Remote sensing target rapid detection method and system based on dynamic attention mechanism
CN111738995B (en) RGBD image-based target detection method and device and computer equipment
CN110516514B (en) Modeling method and device of target detection model
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN115457395A (en) Lightweight remote sensing target detection method based on channel attention and multi-scale feature fusion
CN111274981B (en) Target detection network construction method and device and target detection method
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN113313094B (en) Vehicle-mounted image target detection method and system based on convolutional neural network
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN115457415A (en) Target detection method and device based on YOLO-X model, electronic equipment and storage medium
CN112991364A (en) Road scene semantic segmentation method based on convolution neural network cross-modal fusion
Sofla et al. Road extraction from satellite and aerial image using SE-Unet
CN115439718A (en) Industrial detection method, system and storage medium combining supervised learning and feature matching technology
Li et al. SPCS: a spatial pyramid convolutional shuffle module for YOLO to detect occluded object
Wan et al. Small object detection leveraging density‐aware scale adaptation
CN111339934A (en) Human head detection method integrating image preprocessing and deep learning target detection
CN117593548A (en) Visual SLAM method for removing dynamic feature points based on weighted attention mechanism
CN117789160A (en) Multi-mode fusion target detection method and system based on cluster optimization
CN115810020B (en) Semantic guidance-based coarse-to-fine remote sensing image segmentation method and system
CN117011819A (en) Lane line detection method, device and equipment based on feature guidance attention
CN113569600A (en) Method and device for identifying weight of object, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant