CN116883862B - Multi-scale target detection method and device for optical remote sensing image - Google Patents
- Publication number
- CN116883862B (application CN202310885531.2A)
- Authority
- CN
- China
- Prior art keywords
- feature
- target
- adaptive
- channel
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/13—Satellite images
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
- G06V10/765—Classification using rules for classification or partitioning the feature space
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Recognition or understanding using neural networks
- G06V2201/07—Target detection
Abstract
The invention discloses a multi-scale target detection method and device for optical remote sensing images. During the training iterations of the target detection algorithm, the method applies a function of target size — an adaptive adjustment factor — to the position loss value, balancing the loss contributions of targets at different scales during training and alleviating the long-tail distribution of the data without adding computational cost. A plug-and-play attention feature enhancement module addresses the gradual loss of small-target semantic features as the network deepens: the network adaptively adjusts channel-direction and spatial-direction weights to improve small-target feature extraction, thereby improving the network's detection of small-scale targets.
Description
Technical Field
The invention relates to the technical field of optical remote sensing image target recognition, and in particular to a multi-scale target detection method and device for optical remote sensing images that alleviates the long-tail distribution of data and performs well at detecting small-scale targets.
Background
Remote sensing images carry rich surface-feature information and have become an important means for acquiring geospatial information, with wide application in environmental monitoring, natural disaster monitoring, agriculture, urban planning, and other fields. In recent years, with the rapid development of remote sensing satellites and sensor technology, high-resolution optical remote sensing images have become easier to generate and acquire; at the same time, target detection — an important method for interpreting such images and a key task in most application fields — has attracted broad attention.
High-resolution optical remote sensing images typically cover a wide spatial extent and contain many small, densely distributed targets. Existing target detection methods still handle multi-scale variation of targets poorly, which in particular limits detection accuracy for small-scale targets.
Researchers have done extensive work on data augmentation, label assignment mechanisms, feature enhancement networks, and loss function design to address these problems.
For example, Mate Kisantal et al., in the paper "Augmentation for small object detection", propose a copy-and-paste augmentation strategy for small objects in a sample. Yukang Chen et al., in the paper "Dynamic Scale Training for Object Detection", propose a collage data augmentation method driven by feedback from the loss ratio of targets.
However, these methods still treat the data in the same way as natural images and cannot be applied directly to multi-scale target detection in high-resolution optical remote sensing images.
Chang Xu et al., in the paper "Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark", propose a normalized Wasserstein distance and a ranking-based assignment strategy, which improves label assignment, provides sufficient supervision to the network, and improves small-target detection; however, when applied directly to remote sensing image target detection, it can still easily cause missed and false detections of medium and large targets.
Disclosure of Invention
In view of the foregoing, the present invention provides a method and apparatus for multi-scale object detection of an optical remote sensing image that overcomes or at least partially solves the foregoing problems. The method has better multi-scale target detection and recognition capability.
The invention provides the following scheme:
an optical remote sensing image multi-scale target detection method comprises the following steps:
1. the method for detecting the multi-scale target of the optical remote sensing image is characterized by comprising the following steps of:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection and obtain a target detection result; wherein an adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer of the network structure of the target detection algorithm; the adaptive feature enhancement module extracts a spatial attention weight and a channel attention weight, and applies adaptive weighting in the spatial and channel directions to the original feature map output by the first feature layer.
Preferably: the adaptive feature enhancement module is specifically configured to: generate a spatial attention weight matrix by pooling and convolution operations along the channel direction on the original feature map output by the first feature layer;
and multiply the spatial attention weight matrix with the original feature map to perform adaptive feature weighting in the spatial direction.
Preferably: when generating the spatial attention weight matrix, the adaptive feature enhancement module is specifically configured to:
obtain the original feature map output by the first feature layer, and apply maximum pooling and average pooling to it along the channel direction respectively;
concatenate the pooled results along the channel direction, perform channel adjustment and feature fusion by convolution, and normalize the features with an activation function to obtain the spatial attention weight matrix.
Preferably: the adaptive feature enhancement module is specifically configured to: perform further pooling, convolution and/or up-sampling operations on the spatially weighted feature map to obtain a channel attention weight matrix;
and multiply the channel attention weight matrix with the spatially weighted feature map to realize adaptive weighting of features in the channel direction.
Preferably: when generating the channel attention weight matrix, the adaptive feature enhancement module is specifically configured to:
average-pool the spatially weighted feature map along the width and height directions to generate multi-region channel attention weights matched to the target size;
apply a 1×1 convolution to the pooled features to activate the channel information, and normalize with an activation function to obtain the channel attention weight matrix.
Preferably: the target detection algorithm is a YOLOX standard model whose network structure has the attention-based adaptive feature enhancement module added between its first feature layer and its feature utilization layer.
Preferably: during the training iterations of the target detection algorithm, the loss function of the target detection algorithm adds, on top of the position loss function, an adjustment factor function based on target-size adaptive feedback; the adjustment factor function adjusts the balance among position loss, confidence loss, and classification loss, dynamically adjusting the position loss weights of targets at different scales during training.
Preferably: during the training iterations of the target detection algorithm, the position loss of each target is multiplied by the adjustment factor function, wherein, under the action of the adjustment factor function, the smaller the target, the larger its position loss weight; conversely, the larger the target, the smaller its position loss weight.
Preferably: the loss function adds the adjustment factor based on target-size adaptive feedback on top of the position loss function used by the YOLOX standard model.
An optical remote sensing image multi-scale target detection device, comprising:
an HDMI-to-CSI video interface adapter board, a TX2 processor carrier board, and a TX2 processor; the HDMI-to-CSI adapter board converts a standard HDMI video input source to a CSI-2 video interface and connects it to the TX2 carrier board; the TX2 processor performs real-time target detection on the input video images using the above optical remote sensing image multi-scale target detection method.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the method and the device for detecting the multi-scale targets of the optical remote sensing image, provided by the embodiment of the application, under the condition that calculation cost is not increased, in the training iteration process of a target detection algorithm, a function related to the target size, namely a self-adaptive adjusting factor is adopted to adjust the position loss function value, so that the loss proportion of targets with different scales in the training process is balanced, and the problem of long tail distribution of data is solved; the plug-and-play attention feature enhancement module solves the problem that the semantic features of the small targets are gradually lost along with deepening of the network hierarchy, so that the network adaptively adjusts the channel and the space direction weight to improve the extraction capacity of the features of the small targets, and the monitoring and detection effects of the network on the small-scale targets are improved.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart of a method for detecting a multi-scale object in an optical remote sensing image according to an embodiment of the present invention;
FIG. 2 is a flow chart of OSA Loss construction provided by an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an improved YOLOX provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of an optical remote sensing image multi-scale target detection device according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an optical remote sensing image multi-scale target detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
Referring to fig. 1, a method for detecting a multi-scale object of an optical remote sensing image according to an embodiment of the present invention, as shown in fig. 1, may include:
s101: determining an optical remote sensing image to be subjected to target detection;
s102: inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result;
the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer.
Further, for the specific adaptive weighting process, the embodiments of the present application may provide that the adaptive feature enhancement module is specifically configured to:
generate a spatial attention weight matrix by pooling and convolution operations along the channel direction on the original feature map output by the first feature layer;
and multiply the spatial attention weight matrix with the original feature map to perform adaptive feature weighting in the spatial direction.
When generating the spatial attention weight matrix, the adaptive feature enhancement module is specifically configured to:
obtain the original feature map output by the first feature layer, and apply maximum pooling and average pooling to it along the channel direction respectively;
concatenate the pooled results along the channel direction, perform channel adjustment and feature fusion by convolution, and normalize the features with an activation function to obtain the spatial attention weight matrix.
The adaptive feature enhancement module is further specifically configured to:
perform further pooling, convolution and/or up-sampling operations on the spatially weighted feature map to obtain a channel attention weight matrix;
and multiply the channel attention weight matrix with the spatially weighted feature map to realize adaptive weighting of features in the channel direction.
When generating the channel attention weight matrix, the adaptive feature enhancement module is specifically configured to:
average-pool the spatially weighted feature map along the width and height directions to generate multi-region channel attention weights matched to the target size;
apply a 1×1 convolution to the pooled features to activate the channel information, and normalize with an activation function to obtain the channel attention weight matrix.
The target detection algorithm is a YOLOX standard model whose network structure has the attention-based adaptive feature enhancement module added between its first feature layer and its feature utilization layer.
It will be appreciated that other similar object detection models may be used in the object detection algorithm provided in the embodiments of the present application.
In the optical remote sensing image multi-scale target detection method provided by the embodiments of the present application, an adaptive feature enhancement module based on an attention mechanism — the Adaptive Feature Enhancement Module (AFEM) — is introduced between the first feature layer of the backbone network (CSPDarknet) and the feature utilization layer (PAN). The module generates a spatial attention weight matrix by pooling and convolving the feature map along the channel direction.
This weight matrix is then multiplied with the original feature map to achieve adaptive spatial weighting. Next, a channel attention weight matrix is obtained through further pooling, convolution, and up-sampling operations. Finally, the channel weight matrix is multiplied with the spatially weighted feature map to realize adaptive weighting of features in the channel direction.
So that the target detection algorithm provided by the embodiments of the present application can adaptively adjust the position loss value according to target size without adding computational cost — balancing the loss contributions of targets at different scales during training and alleviating the long-tail distribution of the data — the embodiments further provide that, during the training iterations, the loss function adds an adjustment factor function based on target-size adaptive feedback on top of the position loss function. The adjustment factor function adjusts the balance among position loss, confidence loss, and classification loss, dynamically adjusting the position loss weights of targets at different scales during training.
Further, the loss function is a function of adding the adjustment factor based on the target size adaptive feedback based on a position loss function used by the YOLOX standard model.
By designing an adjustment factor function based on target-size adaptive feedback and combining it with the position loss in YOLOX, the embodiments of the present application propose a loss function named Object Scale Adaptive Loss (OSA Loss). During training, it dynamically adjusts the weight of the position loss for targets of different scales in the YOLOX algorithm, so that the supervision signals for targets of different scales are more sufficient and the training of the target detection network is more balanced.
The optical remote sensing image multi-scale target detection method provided by the embodiments of the present application is described in detail below and verified, taking a target detection algorithm built on the network structure of the YOLOX standard model as an example, with reference to the accompanying drawings.
Compared with other YOLO-series methods, the YOLOX standard model adopted by the embodiments of the present application introduces a decoupled head and an anchor-free mechanism, which accelerate network convergence and improve algorithm performance and efficiency while adding only a small number of parameters. Specifically, CSPDarknet-53 with a Focus structure is first used as the backbone network module to extract features from the input image in the feature extraction stage; second, in the feature enhancement stage, a Path Aggregation Network (PAN) performs feature enhancement; the enhanced features are then fed to the detection head to finally obtain predictions of target position coordinates, category, and confidence.
The method provided by the embodiments of the present application improves the YOLOX standard model in two ways: an attention-based adaptive feature enhancement module is added between the first feature layer and the feature utilization layer, and an adjustment factor function based on target-size adaptive feedback is added on top of the position loss function. The two improvements are described in detail below.
An OSA Loss workflow diagram is shown in fig. 2. OSA Loss is built on the IoU position loss used by the YOLOX standard model, using a function of target size — an adaptive adjustment factor — to weight the loss contributions of targets at different scales.
The adjustment factor function can be expressed as Equation 1:

f(x) = α ln(2 − x)  (1)

where x is the area of the target's ground-truth box, normalized to [0, 1], in the training sample, and α is a hyperparameter used to adjust the balance among position loss, confidence loss, and classification loss.
During each training iteration, the position loss of each target is multiplied by the adjustment factor. Under the action of the adjustment factor function, the smaller the target, the larger its position loss weight; conversely, the larger the target, the smaller its position loss weight. In this way, the supervision signals for targets of different scales are more sufficient, and the training of the target detection network is more balanced. OSA Loss can be expressed as Equation 2.
Loss_OSA = f(x) × (1 − IoU²) = α ln(2 − x) × (1 − IoU²)  (2)
Since ln(2 − x) ∈ [0, ln 2], setting α to 1 would reduce the overall position loss value and destroy the original balance among the loss terms. To determine an appropriate value of α, extensive experiments and mathematical analysis were carried out.
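As a concrete illustration, the adjustment factor and OSA Loss of Equations 1 and 2 can be sketched in plain Python. The tuned value of α is not given in this excerpt, so it is left as a parameter; the IoU is taken as a given input (computing it is standard box geometry and is omitted here):

```python
import math

def osa_loss(box_area_norm, iou, alpha=1.0):
    """OSA Loss per Equations 1-2: alpha*ln(2 - x) * (1 - IoU^2).

    box_area_norm: ground-truth box area normalized to [0, 1].
    iou: IoU between prediction and ground truth, in [0, 1].
    alpha: hyperparameter balancing position/confidence/classification
           losses (its tuned value is not stated in this excerpt).
    """
    factor = alpha * math.log(2.0 - box_area_norm)  # Eq. 1: larger for small targets
    return factor * (1.0 - iou ** 2)                # Eq. 2

# A small target (x near 0) receives a larger position loss weight
# than a large target (x near 1), at equal IoU.
small = osa_loss(0.01, iou=0.5)
large = osa_loss(0.9, iou=0.5)
```

This makes the behaviour described above visible: because ln(2 − x) decreases as x grows, `small > large`, so small targets contribute more to the total position loss during training.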
As shown in fig. 3, the network structure diagram of the improved YOLOX, the method establishes an attention-based adaptive feature enhancement module between the first feature layer of the original YOLOX backbone CSPDarknet and the PAN layer, specifically as follows:
step one: spatial attention weight extraction.
First, the feature map output by the first feature layer of CSPDarknet is obtained, and maximum pooling and average pooling are applied to it along the channel direction — that is, the maximum and the mean are taken at each pixel location across channels — which aggregates the channel information of the feature map and highlights salient feature regions.
The pooled results are then concatenated along the channel direction, and a 7×7 convolution performs channel adjustment and feature fusion.
Finally, a Sigmoid activation function normalizes the features to obtain the spatial attention weight matrix, which is multiplied with the original feature map to realize adaptive weighting in the spatial direction.
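Step one can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the learned 7×7 convolution weights are stood in for by a random `kernel` argument, and the convolution is written as a naive loop for clarity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2D convolution of a (C_in, H, W) input with a
    (C_in, k, k) kernel, producing a single-channel (H, W) map."""
    c, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def spatial_attention(feat, kernel):
    """AFEM step one: channel-wise max/avg pooling, concatenation, 7x7
    convolution, Sigmoid, then element-wise reweighting of the input.
    `kernel` stands in for the learned 7x7 weights, shape (2, 7, 7)."""
    mx = feat.max(axis=0, keepdims=True)   # channel-direction max pooling -> (1, H, W)
    av = feat.mean(axis=0, keepdims=True)  # channel-direction average pooling
    stacked = np.concatenate([mx, av], axis=0)       # (2, H, W)
    weights = sigmoid(conv2d_same(stacked, kernel))  # (H, W) spatial weights in (0, 1)
    return feat * weights[None, :, :]                # adaptive spatial weighting

# Example on a random (C=8, H=16, W=16) feature map
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = spatial_attention(feat, kernel)
```

Because the Sigmoid keeps every spatial weight in (0, 1), the output is a per-location attenuation of the input feature map, matching the "adaptive weighting" described above.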
Step two: channel attention weight extraction.
First, the spatially weighted features output by step one are average-pooled along the width and height directions to generate channel attention weights over a 4×4 grid of regions, which aggregates the spatial information of the feature map and increases the information capacity of the channel-dimension weights.
Then, a 1×1 convolution is applied to the pooled features to fully activate the channel information. To simplify the adaptive channel weighting, the 4×4 feature is transformed back to the input H×W by up-sampling.
Finally, a Sigmoid function normalizes the result to obtain the channel attention weight matrix, which is multiplied with the spatially weighted feature map to realize adaptive weighting in the channel direction.
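Step two can likewise be sketched in NumPy. Again this is a hedged illustration: the learned 1×1 convolution is stood in for by a channel-mixing matrix `mix`, nearest-neighbour up-sampling is assumed, and H and W are assumed divisible by 4:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, mix):
    """AFEM step two: average-pool to a 4x4 grid of regions, mix channels
    with a 1x1 convolution (`mix`, a (C, C) matrix standing in for its
    learned weights), up-sample back to HxW, Sigmoid, then reweight.

    feat: spatially weighted feature map, shape (C, H, W), H and W
          divisible by 4 (an assumption of this sketch).
    """
    c, h, w = feat.shape
    bh, bw = h // 4, w // 4
    # Region-wise average pooling to a (C, 4, 4) grid
    pooled = feat.reshape(c, 4, bh, 4, bw).mean(axis=(2, 4))
    # A 1x1 convolution is per-position channel mixing
    mixed = np.einsum('oc,chw->ohw', mix, pooled)
    # Nearest-neighbour up-sampling back to (C, H, W)
    up = np.repeat(np.repeat(mixed, bh, axis=1), bw, axis=2)
    weights = sigmoid(up)                 # channel attention weights in (0, 1)
    return feat * weights                 # adaptive channel weighting

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 16, 16))  # e.g. the output of step one
mix = np.eye(8) * 0.5                    # placeholder 1x1-conv weights
out = channel_attention(feat, mix)
```

The 4×4 pooling grid means each channel receives a different weight in each of 16 spatial regions, which is what gives the channel weights their "multi-region" information capacity.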
In summary, the workflow of the optical remote sensing image multi-scale target detection method based on the improved YOLOX provided in the embodiment of the application includes the following steps:
The first step: select one or more of the public optical remote sensing image target detection datasets, such as NWPU VHR-10, LEVIR, DOTA, or AI-TOD, as required by the task.
In this example the AI-TOD dataset (28,036 images) was selected and divided into training, validation, and test sets in the ratio 4:1:5.
Here vt denotes targets no larger than 8×8 pixels, t targets larger than 8×8 and up to 16×16 pixels, s targets larger than 16×16 and up to 32×32 pixels, and m targets larger than 32×32 pixels.
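These size buckets can be written as a simple helper. The exact boundary conventions are elided in the original text, so the inclusive upper bounds and the use of the longer box side are assumptions of this sketch:

```python
def size_category(width_px, height_px):
    """Classify a target into the AI-TOD-style buckets used above:
    vt (<= 8x8), t (<= 16x16), s (<= 32x32), m (> 32x32).
    Uses the longer box side (an assumption; the text elides this)."""
    side = max(width_px, height_px)
    if side <= 8:
        return 'vt'
    if side <= 16:
        return 't'
    if side <= 32:
        return 's'
    return 'm'
```

Such a helper is useful when reporting per-scale detection accuracy, as in the tables below.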
And a second step of: an improved YOLOX optical remote sensing image target detection model is built according to fig. 2 and 3. The method mainly comprises the steps of replacing original position Loss with an OSA Loss function, and embedding AFEM between a first feature layer and a feature utilization layer (PAN) of a backbone network (CSPdark).
The third step: input the training set divided in the first step and perform data enhancement, including unifying image resolution, data normalization, random rotation, random scale transformation, random hue transformation, Mosaic augmentation, and the like.
The fourth step: initialize the network weights with a Gaussian distribution; set the total number of training iterations to 300; adjust the model learning rate with a cosine annealing strategy every 30 iterations, with an initial learning rate of 0.01; perform gradient updates with stochastic gradient descent; and introduce a learning-rate warmup strategy with the corresponding warmup coefficient set to 0.000005.
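The learning-rate schedule of the fourth step can be sketched as follows. This is only an interpretation of the description: the warmup length (5 iterations here) is an assumed value, and the restart-every-30-iterations reading of "cosine annealing every 30 iterations" is likewise an assumption.

```python
import math

def learning_rate(it, base_lr=0.01, period=30,
                  warmup_iters=5, warmup_factor=0.000005):
    # Linear warmup from warmup_factor * base_lr up to base_lr, then
    # cosine annealing restarted every `period` iterations.
    # warmup_iters = 5 is an assumed value, not from the patent.
    if it < warmup_iters:
        start = base_lr * warmup_factor
        return start + (base_lr - start) * it / warmup_iters
    t = (it - warmup_iters) % period
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / period))

print(learning_rate(0), learning_rate(5), learning_rate(20))
```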
The fifth step: calculate the loss, plot the P-R curve, compare the model predictions with the ground truth, and save the weight file with the smallest loss on the validation set.
The sixth step: perform model verification on the test set using the weight file obtained in the fifth step; the resulting target detection results are shown in Tables 1 and 2.
TABLE 1 Improved YOLOX target detection results
TABLE 2 Improved YOLOX detection accuracy for different target classes
Experimental results show that the method provided by the embodiment of the application can effectively improve the target detection precision of the optical remote sensing image.
In summary, the optical remote sensing image multi-scale target detection method provided by the embodiments of the application can adaptively adjust the position loss value according to target size without additional computational cost, balancing the loss proportions of targets at different scales during training and alleviating the long-tail distribution problem of the data. The plug-and-play attention feature enhancement module addresses the gradual loss of small-target semantic features as the network hierarchy deepens: the network adaptively adjusts the channel- and spatial-direction weights to improve small-target feature extraction, thereby improving the network's detection of small-scale targets.
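The size-adaptive adjustment, f(x) = α ln(2 − x) with x the normalized ground-truth box area (given in claim 1 below), can be evaluated directly to see the behavior described here; α = 1.0 is only an illustrative choice.

```python
import math

def osa_adjustment(x, alpha=1.0):
    # f(x) = alpha * ln(2 - x), x in [0, 1]: monotonically decreasing,
    # so smaller targets (x near 0) receive a larger position-loss
    # weight and the largest targets (x near 1) a weight near zero.
    assert 0.0 <= x <= 1.0
    return alpha * math.log(2.0 - x)

for area in (0.0, 0.25, 1.0):
    print(area, round(osa_adjustment(area), 4))  # 0.6931, 0.5596, 0.0
```

Because f is computed once per target from its box area, this weighting adds no meaningful computational cost to the training loop.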
As shown in fig. 4, an embodiment of the present application may further provide an optical remote sensing image multi-scale target detection device, including: an HDMI-CSI video interface adapter board, a TX2 processor carrier board, and a TX2 processor. The HDMI-CSI video interface adapter board converts a standard HDMI video input source into a CSI-2 video interface and connects it to the TX2 carrier board; the TX2 processor performs real-time target detection on the input video images using the above optical remote sensing image multi-scale target detection method.
When actually implemented, the device provided in the embodiments of the application may further include any other necessary hardware. In a practical application, for example, the device may include an HDMI-CSI video interface adapter board, a TX2 processor carrier board, a TX2 processor, and a four-way USB expansion board. The HDMI-CSI video interface adapter board converts a standard HDMI (Type A) video input source into a CSI-2 video interface, which is connected to the TX2 carrier board, and the TX2 processor performs real-time target detection on the input video images. The four-way USB expansion board uses two USB-to-RS422 serial cables: one connects to the display and control equipment for bidirectional RS422 communication, and the other connects to the data link for bidirectional RS422 communication. The results processed by the core processor (the TX2 processor) are output to a display through Micro HDMI (Type D) and sent to the display controller via serial communication, realizing display of the target detection results.
The circuit board related to the device mainly comprises four parts:
First, the HDMI-to-CSI-2 interface board is 49 mm long and 35 mm wide. A TOSHIBA TC358743XBG chip is selected for the unidirectional HDMI-to-CSI-2 conversion circuit, with a core voltage of 1.2 V, an IO voltage of 1.8-3.3 V, an HDMI voltage of 3.3 V, and an APLL voltage of 3.3 V/2.5 V; it provides an I2C interface and is packaged in BGA64 with a 0.65 mm pin pitch.
Second, the Jetson TX2 core carrier board is 87 mm long and 63 mm wide. The connector model is SEAM-50-02.0-S-08-2-A-K-TR; there are four mounting holes of 3.5 mm diameter, 4 mm from the board edge; the front-side height is 28 mm, and the back side is limited to 4 mm in height.
Third, the four-port USB docking board is 29.5 mm long and 18.9 mm wide. An FE1.1S is used as the main control IC to provide four USB 2.0 interfaces, featuring high performance, low power consumption, and low cost. It adopts the STT data transmission mode, supports signal transmission over distances up to 10 meters, and comes in an SSOP package with a 0.64 mm pin pitch.
Fourth, a MAX3490EESA is selected as the main control chip for the two RS422 serial circuits, with a 3.3 V supply voltage and an SOIC8 package with a 1.27 mm pin pitch; the accompanying level-conversion chip is the TXS0102DCT, with a supply voltage range of 1.65-5.5 V and an SM8 package.
The embodiment of the application can also provide an optical remote sensing image multi-scale target detection device, which comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the steps of the optical remote sensing image multi-scale target detection method according to the instructions in the program codes.
As shown in fig. 5, an optical remote sensing image multi-scale target detection device provided in an embodiment of the present application may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all complete communication with each other through a communication bus 13.
In the present embodiment, the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), another programmable logic device, or the like.
The processor 10 may call a program stored in the memory 11; in particular, the processor 10 may perform the operations in the embodiments of the optical remote sensing image multi-scale target detection method.
The memory 11 is used for storing one or more programs, and the programs may include program codes, where the program codes include computer operation instructions, and in this embodiment, at least the programs for implementing the following functions are stored in the memory 11:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result; the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer.
And/or a program for implementing the following functions:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result; in the training iteration process of the target detection algorithm, the loss function of the target detection algorithm is obtained by adding, on the basis of the position loss function, an adjustment factor function based on target-size adaptive feedback, the adjustment factor function being used to adjust the balance among the position loss, the confidence loss, and the classification loss so as to dynamically adjust the position-loss weights occupied by targets of different scales during training.
In one possible implementation, the memory 11 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a file creation function or a data read-write function), and the data storage area may store data created during use, such as initialization data.
In addition, the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module for interfacing with other devices or systems.
Of course, it should be noted that the structure shown in fig. 5 does not limit the optical remote sensing image multi-scale target detection device in the embodiments of the present application; in practical applications, the device may include more or fewer components than those shown in fig. 5, or some components may be combined.
Embodiments of the present application may also provide a computer readable storage medium storing program code for performing the steps of the above-described optical remote sensing image multi-scale object detection method.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the description of the embodiments above, it will be apparent to those skilled in the art that the present application may be implemented in software plus the necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.
Claims (7)
1. The method for detecting the multi-scale target of the optical remote sensing image is characterized by comprising the following steps of:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result;
the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the self-adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out self-adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer;
the adaptive feature enhancement module is specifically configured to:
carrying out further pooling, convolution and/or up-sampling operation on the feature map subjected to the self-adaptive weighting of the space direction to obtain a channel attention weight matrix;
multiplying the channel weight matrix with the feature map subjected to the self-adaptive weighting of the space direction to realize the self-adaptive weighting of the features of the channel direction;
the target detection algorithm is as follows: adding the adaptive feature enhancement module based on the attention mechanism between a first feature layer and a feature utilization layer of a network structure of the YOLOX standard model;
in the training iteration process of the target detection algorithm, the loss function of the target detection algorithm is obtained by adding, on the basis of the position loss function, an adjustment factor function based on target-size adaptive feedback, the adjustment factor function being used to adjust the balance among the position loss, the confidence loss, and the classification loss so as to dynamically adjust the position-loss weights occupied by targets of different scales during training;
the adjustment factor function based on the target size adaptive feedback is represented by the following formula:
f(x) = α ln(2 − x)
wherein x represents the area of the target real frame normalized to [0,1] in the training sample, and alpha is a super parameter used for adjusting the balance between the position loss, the confidence loss and the classification loss.
2. The method for detecting a multi-scale object in an optical remote sensing image according to claim 1, wherein the adaptive feature enhancement module is specifically configured to:
generating a spatial attention weight matrix by carrying out pooling and convolution operation on the original feature map output by the first feature layer in the channel direction;
multiplying the spatial attention weight matrix with the original feature map, and carrying out feature adaptive weighting on the spatial direction.
3. The method according to claim 2, wherein the adaptive feature enhancement module is specifically configured to, when generating the spatial attention weighting matrix:
obtaining an original feature map output by the first feature layer, and respectively carrying out maximum pooling and average pooling on the original feature map in the channel direction;
splicing the pooled results along the channel direction, carrying out channel adjustment and feature fusion in a convolution mode, and normalizing the features by adopting an activation function to obtain the weight matrix of the spatial attention.
4. The method for multi-scale object detection of an optical remote sensing image according to claim 1, wherein the adaptive feature enhancement module is specifically configured to, when generating the channel attention weight matrix:
carrying out average pooling on the feature map subjected to self-adaptive weighting of the space direction along the width direction and the height direction to generate the attention weight of the multi-region channel with the target size;
the 1 multiplied by 1 convolution is adopted to act on the pooled characteristics so as to activate the channel information therein, and the activation function is adopted to perform normalization processing so as to obtain the weight matrix of the channel attention.
5. The method according to claim 1, wherein, in the training iteration process of the target detection algorithm, the position loss of each target is multiplied by the adjustment factor function; under the action of the adjustment factor function, the smaller the target size, the more its position loss is increased in proportion, and the larger the target size, the more its position loss is reduced.
6. The method for multi-scale object detection of an optical remote sensing image according to claim 1, wherein the loss function is a function of an adjustment factor based on adaptive feedback of object size added on the basis of a position loss function used by a YOLOX standard model.
7. An optical remote sensing image multi-scale target detection device, comprising:
HDMI-CSI video interface adapter plate, TX2 processor carrier plate and TX2 processor; the HDMI-CSI video interface adapter plate is used for converting a standard HDMI video input source into a CSI-2 video interface and accessing the CSI-2 video interface to the TX2 functional carrier plate; the TX2 processor is configured to perform real-time object detection using the optical remote sensing image multi-scale object detection method of any one of claims 1 to 6 based on an input video image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310885531.2A CN116883862B (en) | 2023-07-19 | 2023-07-19 | Multi-scale target detection method and device for optical remote sensing image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116883862A CN116883862A (en) | 2023-10-13 |
CN116883862B true CN116883862B (en) | 2024-02-23 |
Family
ID=88256458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310885531.2A Active CN116883862B (en) | 2023-07-19 | 2023-07-19 | Multi-scale target detection method and device for optical remote sensing image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116883862B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325120A (en) * | 2020-02-09 | 2020-06-23 | 南通大学 | Target detection method suitable for embedded system |
WO2021139069A1 (en) * | 2020-01-09 | 2021-07-15 | 南京信息工程大学 | General target detection method for adaptive attention guidance mechanism |
CN114187268A (en) * | 2021-12-04 | 2022-03-15 | 北京工业大学 | Obstacle detection method based on target detection and semantic segmentation fusion |
CN114283336A (en) * | 2021-12-27 | 2022-04-05 | 中国地质大学(武汉) | Anchor-frame-free remote sensing image small target detection method based on mixed attention |
CN115082855A (en) * | 2022-06-20 | 2022-09-20 | 安徽工程大学 | Pedestrian occlusion detection method based on improved YOLOX algorithm |
CN115239946A (en) * | 2022-06-30 | 2022-10-25 | 锋睿领创(珠海)科技有限公司 | Small sample transfer learning training and target detection method, device, equipment and medium |
CN115761409A (en) * | 2022-11-24 | 2023-03-07 | 天翼数字生活科技有限公司 | Fire detection method, device, equipment and medium based on deep learning |
CN115830449A (en) * | 2022-12-01 | 2023-03-21 | 北京理工大学重庆创新中心 | Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement |
CN115861853A (en) * | 2022-11-22 | 2023-03-28 | 西安工程大学 | Transmission line bird nest detection method in complex environment based on improved yolox algorithm |
CN115908295A (en) * | 2022-11-10 | 2023-04-04 | 长春工业大学 | Power grid insulator defect detection method and system based on deep learning |
CN115995041A (en) * | 2022-12-30 | 2023-04-21 | 清华大学深圳国际研究生院 | Attention mechanism-based SAR image multi-scale ship target detection method and device |
CN116258941A (en) * | 2023-03-13 | 2023-06-13 | 西安电子科技大学 | Yolox target detection lightweight improvement method based on Android platform |
CN116385873A (en) * | 2023-03-11 | 2023-07-04 | 北京理工大学 | SAR small target detection based on coordinate-aware attention and spatial semantic context |
CN116385876A (en) * | 2023-03-29 | 2023-07-04 | 中国人民解放军战略支援部队信息工程大学 | Optical remote sensing image ground object detection method based on YOLOX |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325111A (en) * | 2020-01-23 | 2020-06-23 | 同济大学 | Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision |
US20210383533A1 (en) * | 2020-06-03 | 2021-12-09 | Nvidia Corporation | Machine-learning-based object detection system |
US20230041290A1 (en) * | 2021-08-06 | 2023-02-09 | Yaim Cooper | Training and generalization of a neural network |
- 2023-07-19 CN CN202310885531.2A patent/CN116883862B/en active Active
Non-Patent Citations (5)
Title |
---|
ResiDualGAN: Resize-Residual DualGAN for Cross-Domain Remote Sensing Images Semantic Segmentation; Yang Zhao et al.; Remote Sensing; pp. 1-20 *
Remote sensing image target detection based on a dual attention mechanism; Zhou Xing; Chen Lifu; Computer and Modernization (Issue 08); full text *
Multi-target detection in complex scenes based on improved YOLOv5; Qiang Dong et al.; Electronic Measurement Technology; pp. 82-90 *
Improved SSD traffic sign target detection algorithm; Xiao Dandong; Chen Jinjie; Software Guide (Issue 05); full text *
A review of improvements to typical deep learning target detection algorithms; Wang Xinpeng et al.; Computer Engineering and Applications; pp. 42-57 *
Also Published As
Publication number | Publication date |
---|---|
CN116883862A (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11430134B2 (en) | Hardware-based optical flow acceleration | |
GB2571825A (en) | Semantic class localization digital environment | |
CN111352965B (en) | Training method of sequence mining model, and processing method and equipment of sequence data | |
US20210294945A1 (en) | Neural network control variates | |
US20220067512A1 (en) | Fine-grained per-vector scaling for neural network quantization | |
WO2020019102A1 (en) | Methods, systems, articles of manufacture and apparatus to train a neural network | |
CN111027576A (en) | Cooperative significance detection method based on cooperative significance generation type countermeasure network | |
CN110175641A (en) | Image-recognizing method, device, equipment and storage medium | |
US20220067530A1 (en) | Fine-grained per-vector scaling for neural network quantization | |
Zhao et al. | PCA dimensionality reduction method for image classification | |
US20230196806A1 (en) | Methods, systems, articles of manufacture and apparatus to extract region of interest text from receipts | |
CN116681083A (en) | Text data sensitive detection method, device, equipment and medium | |
CN116883862B (en) | Multi-scale target detection method and device for optical remote sensing image | |
CN113610856B (en) | Method and device for training image segmentation model and image segmentation | |
CN115049546A (en) | Sample data processing method and device, electronic equipment and storage medium | |
Liu et al. | Multi-task learning based on geometric invariance discriminative features | |
CN115034225A (en) | Word processing method and device applied to medical field, electronic equipment and medium | |
CN110826726B (en) | Target processing method, target processing device, target processing apparatus, and medium | |
CN113238975A (en) | Memory, integrated circuit and board card for optimizing parameters of deep neural network | |
CN113361656A (en) | Feature model generation method, system, device and storage medium | |
CN114692715A (en) | Sample labeling method and device | |
CN114580625A (en) | Method, apparatus, and computer-readable storage medium for training neural network | |
CN112949672A (en) | Commodity identification method, commodity identification device, commodity identification equipment and computer readable storage medium | |
US11972188B2 (en) | Rail power density aware standard cell placement for integrated circuits | |
US20230376659A1 (en) | Vlsi placement optimization using self-supervised graph clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||