CN116883862B - Multi-scale target detection method and device for optical remote sensing image - Google Patents

Multi-scale target detection method and device for optical remote sensing image

Info

Publication number
CN116883862B
Authority
CN
China
Prior art keywords
feature
target
adaptive
channel
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310885531.2A
Other languages
Chinese (zh)
Other versions
CN116883862A (en)
Inventor
宋红
李金夫
黄钰琪
刘磊
杨健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202310885531.2A priority Critical patent/CN116883862B/en
Publication of CN116883862A publication Critical patent/CN116883862A/en
Application granted granted Critical
Publication of CN116883862B publication Critical patent/CN116883862B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention discloses a multi-scale target detection method and device for optical remote sensing images. During the training iterations of the target detection algorithm, and without increasing the computational cost, the method uses a function of the target size, namely an adaptive adjustment factor, to adjust the position loss value, balancing the loss proportions of targets of different scales during training and alleviating the long-tail distribution problem of the data; a plug-and-play attention feature enhancement module addresses the gradual loss of small-target semantic features as the network deepens, enabling the network to adaptively adjust the channel and spatial weights to improve small-target feature extraction and thereby improve the network's detection performance on small-scale targets.

Description

Multi-scale target detection method and device for optical remote sensing image
Technical Field
The invention relates to the technical field of optical remote sensing image target recognition, and in particular to a multi-scale target detection method and device for optical remote sensing images that can alleviate the long-tail distribution of data and achieve strong detection performance on small-scale targets.
Background
Remote sensing images contain rich surface feature information, have become an important means for acquiring geospatial information, and are widely used in fields such as environmental monitoring, natural disaster monitoring, agriculture, and urban planning. In recent years, with the rapid development of remote sensing satellite and sensor technologies, high-resolution optical remote sensing images have become easier to generate and acquire; at the same time, target detection, as an important method for interpreting high-resolution optical remote sensing images and a key task in most application fields, has received widespread attention.
High-resolution optical remote sensing images usually cover a wide spatial range and contain a large number of small, densely distributed targets. Existing target detection methods still fall short in handling the multi-scale variation of targets, which particularly limits the detection accuracy of small-scale targets.
Researchers have carried out a great deal of work on data enhancement, targeting mechanisms, feature enhancement networks, and loss function construction to address these problems.
For example, Mate Kisantal et al., in the paper "Augmentation for small object detection", propose a copy-and-paste augmentation strategy for small objects in a sample. Yukang Chen et al., in the paper "Dynamic Scale Training for Object Detection", propose a collage data augmentation method driven by feedback on the targets' share of the loss function.
However, these methods still follow the processing paradigm of natural images and cannot be directly applied to multi-scale target detection in high-resolution optical remote sensing images.
Chang Xu et al., in the paper "Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark", propose a normalized Wasserstein distance combined with a ranking-based assignment strategy, which improves label assignment and provides sufficient supervision information to the network, thereby improving small target detection; however, applying it directly to remote sensing image target detection still easily causes missed and false detections of large and medium targets.
Disclosure of Invention
In view of the foregoing, the present invention provides a method and apparatus for multi-scale object detection of an optical remote sensing image that overcomes or at least partially solves the foregoing problems. The method has better multi-scale target detection and recognition capability.
The invention provides the following scheme:
An optical remote sensing image multi-scale target detection method, comprising the following steps:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result; the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer.
Preferably: the adaptive feature enhancement module is specifically configured to: generating a spatial attention weight matrix by carrying out pooling and convolution operation on the original feature map output by the first feature layer in the channel direction;
multiplying the spatial attention weight matrix with the original feature map, and carrying out feature adaptive weighting on the spatial direction.
Preferably: the adaptive feature enhancement module is specifically configured to, when generating the spatial attention weight matrix:
obtaining an original feature map output by the first feature layer, and respectively carrying out maximum pooling and average pooling on the original feature map in the channel direction;
splicing the pooled results along the channel direction, carrying out channel adjustment and feature fusion in a convolution mode, and normalizing the features by adopting an activation function to obtain the weight matrix of the spatial attention.
Preferably: the adaptive feature enhancement module is specifically configured to: carrying out further pooling, convolution and/or up-sampling operation on the feature map subjected to the self-adaptive weighting of the space direction to obtain a channel attention weight matrix;
multiplying the channel weight matrix with the feature map subjected to the self-adaptive weighting of the space direction to realize the self-adaptive weighting of the features of the channel direction.
Preferably: the adaptive feature enhancement module is specifically configured to, when generating the channel attention weight matrix:
carrying out average pooling on the feature map subjected to self-adaptive weighting of the space direction along the width direction and the height direction to generate the attention weight of the multi-region channel with the target size;
the 1 multiplied by 1 convolution is adopted to act on the pooled characteristics so as to activate the channel information therein, and the activation function is adopted to perform normalization processing so as to obtain the weight matrix of the channel attention.
Preferably: the target detection algorithm is as follows: the adaptive feature enhancement module based on the attention mechanism is added between a first feature layer and a feature utilization layer of a network structure of the YOLOX standard model.
Preferably: in the training iteration process of the target detection algorithm, the loss function of the target detection algorithm is formed by adding an adjustment factor function based on target-size adaptive feedback to the position loss function; the adjustment factor function is used for adjusting the balance among the position loss, confidence loss, and classification loss, so as to dynamically adjust the position loss weights occupied by targets of different scales during training.
Preferably: and in the training iteration process of the target detection algorithm, the position loss of each target is multiplied by the adjusting factor function, wherein under the action of the adjusting factor function, the smaller the target size is, the larger the target position loss is, the larger the target size is, and the larger the target position loss is, the larger the target position loss is.
Preferably: the loss function is obtained by adding the adjustment factor based on target-size adaptive feedback to the position loss function used by the YOLOX standard model.
An optical remote sensing image multi-scale target detection device, comprising:
an HDMI-CSI video interface adapter board, a TX2 processor carrier board, and a TX2 processor; the HDMI-CSI video interface adapter board is used for converting a standard HDMI video input source into a CSI-2 video interface and connecting it to the TX2 carrier board; the TX2 processor is used for performing real-time target detection on the input video images using the above optical remote sensing image multi-scale target detection method.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the method and the device for detecting the multi-scale targets of the optical remote sensing image, provided by the embodiment of the application, under the condition that calculation cost is not increased, in the training iteration process of a target detection algorithm, a function related to the target size, namely a self-adaptive adjusting factor is adopted to adjust the position loss function value, so that the loss proportion of targets with different scales in the training process is balanced, and the problem of long tail distribution of data is solved; the plug-and-play attention feature enhancement module solves the problem that the semantic features of the small targets are gradually lost along with deepening of the network hierarchy, so that the network adaptively adjusts the channel and the space direction weight to improve the extraction capacity of the features of the small targets, and the monitoring and detection effects of the network on the small-scale targets are improved.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart of a method for detecting a multi-scale object in an optical remote sensing image according to an embodiment of the present invention;
FIG. 2 is a flow chart of OSA Loss construction provided by an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an improved YOLOX provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of an optical remote sensing image multi-scale target detection device according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an optical remote sensing image multi-scale target detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
Referring to fig. 1, a method for detecting a multi-scale object of an optical remote sensing image according to an embodiment of the present invention, as shown in fig. 1, may include:
s101: determining an optical remote sensing image to be subjected to target detection;
s102: inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result;
the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer.
Further, in the specific adaptive weighting process, the embodiment of the present application may provide that the adaptive feature enhancement module is specifically configured to:
generating a spatial attention weight matrix by carrying out pooling and convolution operation on the original feature map output by the first feature layer in the channel direction;
multiplying the spatial attention weight matrix with the original feature map, and carrying out feature adaptive weighting on the spatial direction.
The adaptive feature enhancement module is specifically configured to, when generating the spatial attention weight matrix:
obtaining an original feature map output by the first feature layer, and respectively carrying out maximum pooling and average pooling on the original feature map in the channel direction;
splicing the pooled results along the channel direction, carrying out channel adjustment and feature fusion in a convolution mode, and normalizing the features by adopting an activation function to obtain the weight matrix of the spatial attention.
The adaptive feature enhancement module is specifically configured to:
carrying out further pooling, convolution and/or up-sampling operation on the feature map subjected to the self-adaptive weighting of the space direction to obtain a channel attention weight matrix;
multiplying the channel weight matrix with the feature map subjected to the self-adaptive weighting of the space direction to realize the self-adaptive weighting of the features of the channel direction.
The adaptive feature enhancement module is specifically configured to, when generating the channel attention weight matrix:
carrying out average pooling on the feature map subjected to self-adaptive weighting of the space direction along the width direction and the height direction to generate the attention weight of the multi-region channel with the target size;
the 1 multiplied by 1 convolution is adopted to act on the pooled characteristics so as to activate the channel information therein, and the activation function is adopted to perform normalization processing so as to obtain the weight matrix of the channel attention.
The target detection algorithm is as follows: the adaptive feature enhancement module based on the attention mechanism is added between a first feature layer and a feature utilization layer of a network structure of the YOLOX standard model.
It will be appreciated that other similar object detection models may be used in the object detection algorithm provided in the embodiments of the present application.
In the optical remote sensing image multi-scale target detection method provided by the embodiments of the present application, an attention-based Adaptive Feature Enhancement Module (AFEM) is introduced between the first feature layer of the backbone network (CSPDarknet) and the feature utilization layer (PAN). The module generates a spatial attention weight matrix by pooling and convolving the feature map in the channel direction.
This weight matrix is then multiplied with the original feature map to achieve adaptive weighting. Next, a channel attention weight matrix is obtained through further pooling, convolution, and up-sampling operations. Finally, the channel weight matrix is multiplied with the spatially weighted feature map to achieve adaptive weighting of the features in the channel direction.
To enable the target detection algorithm provided by the embodiments of the present application to adaptively adjust the position loss value according to the target size without increasing the computational cost, balance the loss proportions of targets of different scales during training, and alleviate the long-tail distribution problem of the data, the embodiments of the present application further provide that, during the training iterations of the target detection algorithm, the loss function is formed by adding an adjustment factor function based on target-size adaptive feedback to the position loss function; the adjustment factor function adjusts the balance among the position loss, confidence loss, and classification loss so as to dynamically adjust the position loss weights occupied by targets of different scales during training.
Further, the loss function is obtained by adding the adjustment factor based on target-size adaptive feedback to the position loss function used by the YOLOX standard model.
By designing an adjustment factor function based on target-size adaptive feedback and combining it with the position loss in YOLOX, the embodiments of the present application propose a loss function named Object Scale Adaptive Loss (OSA Loss). This loss function dynamically adjusts the weight of the position loss of targets of different scales during training of the YOLOX algorithm, so that the supervision signals the network receives for targets of different scales are more sufficient and the training of the target detection network is more balanced.
The optical remote sensing image multi-scale target detection method provided by the embodiments of the present application is described in detail and verified below, taking a target detection algorithm based on the network structure of the YOLOX standard model as an example, with reference to the accompanying drawings.
Compared with other YOLO-series methods, the YOLOX standard model used in the embodiments of the present application introduces a new decoupled head and an anchor-free mechanism, which accelerates network convergence and improves algorithm performance and efficiency while adding only a small number of parameters. Specifically, CSPDarknet-53 with the Fplus structure is first used as the backbone module to extract features from the input image in the feature extraction stage; second, in the feature enhancement stage, a Path Aggregation Network (PAN) is adopted for feature enhancement; the enhanced features are then fed into the detection head to obtain predictions of each target's position coordinates, class, and confidence.
The method provided by the embodiments of the present application improves the YOLOX standard model. The improvements mainly comprise adding the attention-based adaptive feature enhancement module between the first feature layer and the feature utilization layer, and adding the adjustment factor function based on target-size adaptive feedback to the position loss function; the two improvements are described in detail below.
The OSA Loss workflow is shown in Fig. 2. OSA Loss is built on the IoU position loss used by the YOLOX standard model and uses a function of the target size, i.e., an adaptive adjustment factor, to balance the loss weights taken up by targets of different scales.
The adjustment factor function can be expressed as Equation 1.
f(x) = α ln(2 − x)    (1)
where x denotes the area of the target ground-truth box in the training sample, normalized to [0, 1], and α is a hyperparameter used to adjust the balance among the position loss, confidence loss, and classification loss.
During each training iteration, the position loss of each target is multiplied by the adjustment factor. Under the action of the adjustment factor function, the smaller the target size, the more its position loss is increased; conversely, the larger the target size, the more its position loss is decreased. In this way, the supervision signals the network receives for targets of different scales are more sufficient, and the training of the target detection network is more balanced. The OSA Loss can be expressed as Equation 2.
Loss_OSA = f(x) × (1 − IoU²) = α ln(2 − x) × (1 − IoU²)    (2)
Since ln(2 − x) ∈ [0, ln 2], setting α to 1 would reduce the overall position loss value and destroy the original balance among the various losses; to determine a suitable value of α, extensive experiments and mathematical analysis were carried out.
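As an illustration only, the adjustment factor and OSA Loss of Equations 1 and 2 could be computed as in the following sketch, assuming PyTorch tensors; the function name osa_loss and the placeholder value of α are illustrative assumptions and are not taken from the patent.

import torch

def osa_loss(iou, box_area_norm, alpha=3.0):
    # Sketch of the Object Scale Adaptive (OSA) position loss of Equation 2.
    # iou:           IoU between predicted and ground-truth boxes, tensor of shape (N,)
    # box_area_norm: ground-truth box areas normalized to [0, 1], tensor of shape (N,)
    # alpha:         hyperparameter balancing position/confidence/classification losses
    #                (3.0 is a placeholder; the patent determines alpha experimentally)
    factor = alpha * torch.log(2.0 - box_area_norm)   # f(x) = alpha * ln(2 - x), Equation 1
    return factor * (1.0 - iou ** 2)                  # f(x) * (1 - IoU^2)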
Fig. 3 shows the network structure of the improved YOLOX. The method establishes an attention-based adaptive feature enhancement module between the first feature layer of the original YOLOX backbone (CSPDarknet) and the PAN layer, and specifically includes:
step one: spatial attention weight extraction.
First, the feature map output by the first feature layer of the CSPDarknet is obtained and subjected to maximum pooling and average pooling in the channel direction, i.e., the maximum and average values at each pixel location are taken along the channel direction, which aggregates the channel information of the feature map and highlights the feature regions.
The pooled results are then concatenated along the channel direction, and a 7×7 convolution is used for channel adjustment and feature fusion.
Finally, the features are normalized with a Sigmoid activation function to obtain the spatial attention weight matrix, which is multiplied with the original feature map to achieve adaptive weighting in the spatial direction.
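A minimal sketch of this spatial attention step, assuming PyTorch, is given below; the class name SpatialAttention and the bias-free convolution are illustrative choices rather than details prescribed by the patent.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Step one of the AFEM: channel-wise max- and average-pooling, 7x7 convolution,
    # Sigmoid normalization, then multiplication with the original feature map.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)  # fuse the two pooled maps
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                   # x: (B, C, H, W)
        max_map, _ = torch.max(x, dim=1, keepdim=True)      # max along the channel direction
        avg_map = torch.mean(x, dim=1, keepdim=True)        # average along the channel direction
        weights = self.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return x * weights                                  # spatial-direction adaptive weighting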
Step two: channel attention weight extraction.
First, the features output by step one are average-pooled along the width and height directions to generate 4×4 multi-region channel attention weights, which aggregates the spatial information of the feature map and increases the information capacity of the channel-dimension weights.
Then, a 1×1 convolution is applied to the pooled features to fully activate the channel information. To simplify the adaptive weighting with the channel attention weights, the feature width and height are upsampled from 4×4 back to the input H×W.
Finally, a Sigmoid function is used for normalization to obtain the channel attention weight matrix, which is multiplied with the spatially weighted feature map to achieve adaptive weighting in the channel direction.
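Similarly, a sketch of this channel attention step, again assuming PyTorch, might look as follows; applying SpatialAttention and then ChannelAttention in sequence gives the complete AFEM. The nearest-neighbor upsampling mode is an assumption, since the text above only states that the 4×4 weights are upsampled back to H×W.

import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    # Step two of the AFEM: average pooling to a 4x4 grid, 1x1 convolution,
    # upsampling back to H x W, Sigmoid normalization, then multiplication
    # with the spatially weighted feature map.
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(4)                       # pool width and height to 4x4
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)  # activate channel information
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                         # x: spatially weighted features (B, C, H, W)
        w = self.sigmoid(self.conv(self.pool(x)))                 # (B, C, 4, 4) region-wise channel weights
        w = F.interpolate(w, size=x.shape[-2:], mode='nearest')   # upsample back to the input H x W
        return x * w                                              # channel-direction adaptive weighting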
In summary, the workflow of the optical remote sensing image multi-scale target detection method based on the improved YOLOX provided in the embodiment of the application includes the following steps:
the first step: experiments were performed from one or more of the disclosed optical remote sensing image target detection datasets, such as NWPU, VHR-10, LEVIR, DOTA, AI-TOD, etc., as required by the task.
In this example, the AI-TOD dataset (28,036 images) was selected and divided into training, validation, and test sets in a 4:1:5 ratio.
Here, vt denotes targets smaller than 8×8 pixels, t denotes targets between 8×8 and 16×16 pixels, s denotes targets between 16×16 and 32×32 pixels, and m denotes targets larger than 32×32 pixels.
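For illustration only, the 4:1:5 division described above could be performed as in the following sketch; the random shuffle, seed, and function name are assumptions rather than the split protocol actually used.

import random

def split_dataset(image_ids, seed=0):
    # Partition a list of image identifiers into training, validation, and
    # test subsets in a 4:1:5 ratio, as in the AI-TOD example above.
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(0.4 * n), int(0.1 * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]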
Second step: build the improved YOLOX optical remote sensing image target detection model according to Figs. 2 and 3. This mainly involves replacing the original position loss with the OSA Loss function and embedding the AFEM between the first feature layer of the backbone network (CSPDarknet) and the feature utilization layer (PAN).
Third step: input the training set divided in the first step and perform data augmentation, including unifying image resolution, data normalization, random rotation, random scale transformation, random hue transformation, Mosaic augmentation, and the like.
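A minimal sketch of the per-image part of this augmentation pipeline, assuming torchvision, is shown below; the 640×640 resolution, jitter magnitudes, and normalization statistics are illustrative assumptions, and Mosaic augmentation (which combines four images) would normally be implemented in the data loader and is therefore omitted.

from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize((640, 640)),                        # unify image resolution (assumed size)
    transforms.RandomRotation(degrees=10),                # random rotation transform
    transforms.RandomResizedCrop(640, scale=(0.8, 1.0)),  # random scale transform
    transforms.ColorJitter(hue=0.1),                      # random hue (tone) transform
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],      # data normalization (ImageNet statistics,
                         std=[0.229, 0.224, 0.225]),      #  assumed here for illustration)
])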
Fourth step: initialize the network weights with a Gaussian distribution, set the total number of training iterations to 300, adjust the model learning rate with a cosine annealing strategy every 30 iterations starting from an initial learning rate of 0.01, perform gradient updates with stochastic gradient descent, and introduce a learning-rate warm-up strategy with the corresponding warmup coefficient set to 0.000005.
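A sketch of this training setup, assuming PyTorch, might look as follows; the momentum value and the five-iteration warm-up length are illustrative assumptions, while the initial learning rate, warm-up coefficient, and 30-iteration cosine annealing cycle follow the values stated above.

import math
import torch

def build_optimizer_and_scheduler(model, base_lr=0.01, warmup_coeff=5e-6,
                                  warmup_iters=5, cycle=30):
    # Stochastic gradient descent with learning-rate warm-up followed by
    # cosine annealing restarted every `cycle` iterations.
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

    def lr_lambda(it):
        if it < warmup_iters:                              # linear warm-up phase
            return warmup_coeff + (1.0 - warmup_coeff) * it / warmup_iters
        t = (it - warmup_iters) % cycle                    # position within the current cosine cycle
        return 0.5 * (1.0 + math.cos(math.pi * t / cycle))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler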
Fifth step: compute the loss, plot the P-R curve, compare the model predictions with the ground truth, and save the weight file with the smallest loss on the validation set.
Sixth step: use the weight file obtained in the fifth step to validate the model on the test set; the resulting target detection results are shown in Tables 1 and 2.
TABLE 1 Improved YOLOX target detection results
TABLE 2 Improved YOLOX detection accuracy for different target classes
Experimental results show that the method provided by the embodiment of the application can effectively improve the target detection precision of the optical remote sensing image.
In summary, the optical remote sensing image multi-scale target detection method provided by the embodiments of the present application can adaptively adjust the position loss value according to the target size without increasing the computational cost, balance the loss proportions of targets of different scales during training, and alleviate the long-tail distribution problem of the data; the plug-and-play attention feature enhancement module addresses the gradual loss of small-target semantic features as the network deepens, enabling the network to adaptively adjust the channel and spatial weights to improve small-target feature extraction and thereby improve the network's detection performance on small-scale targets.
As shown in Fig. 4, an embodiment of the present application may further provide an optical remote sensing image multi-scale target detection device, including: an HDMI-CSI video interface adapter board, a TX2 processor carrier board, and a TX2 processor; the HDMI-CSI video interface adapter board is used for converting a standard HDMI video input source into a CSI-2 video interface and connecting it to the TX2 carrier board; the TX2 processor is used for performing real-time target detection on the input video images using the above optical remote sensing image multi-scale target detection method.
When actually implemented, the device provided in the embodiments of the present application may further include any other necessary hardware. For example, in a practical application, the device may include an HDMI-CSI video interface adapter board, a TX2 processor carrier board, a TX2 processor, and a four-port USB expansion board. The HDMI-CSI video interface adapter board converts a standard HDMI (Type A) video input source into a CSI-2 video interface connected to the TX2 carrier board, and the TX2 processor performs real-time target detection on the input video images. The four-port USB expansion board uses two USB-to-RS422 serial cables: one is connected to the display and control equipment for bidirectional RS422 communication, and the other is connected to the data link for bidirectional RS422 communication. The result processed by the core processor (the TX2 processor) is output to a display through Micro HDMI (Type D) and is also sent to the display controller via serial communication, thereby displaying the target detection results.
The circuit board related to the device mainly comprises four parts:
firstly, for the HDMI to CSI2 interface board, the length is 49mm, the width is 35mm, a TOSHIBA company TC358743XBG chip is selected for conducting HDMI to CSI2 unidirectional transmission circuit design, the kernel voltage is 1.2V, the IO voltage is 1.8-3.3V, the HDMI voltage is 3.3V, the APLL voltage is 3.3V/2.5V, the I2C interface is packaged into BGA64, and the pin spacing is 0.65mm.
Second, the Jetson TX2 core carrier board is 87 mm long and 63 mm wide; the connector model is SEAM-50-02.0-S-08-2-A-K-TR; there are 4 mounting holes with a diameter of 3.5 mm located 4 mm from the board edge; the height is 28 mm, and the front-side height is limited to 4 mm.
Third, the four-port USB docking board is 29.5 mm long and 18.9 mm wide; an FE1.1S is adopted as the main control IC to provide four USB 2.0 interfaces, offering high performance, low power consumption, and low cost; it adopts the STT data transmission mode with a signal reach of up to 10 meters, an SSOP package, and a 0.64 mm pin pitch.
Fourth, a MAX3490EESA is selected as the main control chip of the two RS422 serial circuits, with a 3.3 V supply voltage, an SOIC8 package, and a 1.27 mm pin pitch; the matching level-shifting chip is a TXS0102DCT, with a supply voltage range of 1.65-5.5 V and an SM8 package.
The embodiment of the application can also provide an optical remote sensing image multi-scale target detection device, which comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the steps of the optical remote sensing image multi-scale target detection method according to the instructions in the program codes.
As shown in fig. 5, an optical remote sensing image multi-scale target detection device provided in an embodiment of the present application may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all complete communication with each other through a communication bus 13.
In the present embodiment, the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.
The processor 10 may call a program stored in the memory 11; in particular, the processor 10 may perform the operations in an embodiment of the optical remote sensing image multi-scale target detection method.
The memory 11 is used for storing one or more programs, and the programs may include program codes, where the program codes include computer operation instructions, and in this embodiment, at least the programs for implementing the following functions are stored in the memory 11:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result; the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer.
And/or a program for implementing the following functions:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result; in the training iteration process of the target detection algorithm, the loss function of the target detection algorithm is formed by adding an adjustment factor function based on target-size adaptive feedback to the position loss function, and the adjustment factor function is used for adjusting the balance among the position loss, confidence loss, and classification loss so as to dynamically adjust the position loss weights occupied by targets of different scales during training.
In one possible implementation, the memory 11 may include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function (such as a file creation function or a data read/write function), and the data storage area may store data created during use, such as initialization data.
In addition, the memory 11 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module for interfacing with other devices or systems.
Of course, it should be noted that the structure shown in Fig. 5 does not limit the optical remote sensing image multi-scale target detection device in the embodiments of the present application; in practical applications, the device may include more or fewer components than those shown in Fig. 5, or some components may be combined.
Embodiments of the present application may also provide a computer readable storage medium storing program code for performing the steps of the above-described optical remote sensing image multi-scale object detection method.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the description of the embodiments above, it will be apparent to those skilled in the art that the present application may be implemented in software plus the necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (7)

1. The method for detecting the multi-scale target of the optical remote sensing image is characterized by comprising the following steps of:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result;
the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the self-adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out self-adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer;
the adaptive feature enhancement module is specifically configured to:
carrying out further pooling, convolution and/or up-sampling operation on the feature map subjected to the self-adaptive weighting of the space direction to obtain a channel attention weight matrix;
multiplying the channel weight matrix with the feature map subjected to the self-adaptive weighting of the space direction to realize the self-adaptive weighting of the features of the channel direction;
the target detection algorithm is as follows: adding the adaptive feature enhancement module based on the attention mechanism between a first feature layer and a feature utilization layer of a network structure of the YOLOX standard model;
in the training iterative process of the target detection algorithm, the loss function of the target detection algorithm is an adjustment factor function based on target size self-adaptive feedback added on the basis of the position loss function, and the adjustment factor function is used for adjusting the balance between position loss, confidence loss and classification loss so as to dynamically adjust the position loss weights occupied by targets with different scales in the training process;
the adjustment factor function based on the target size adaptive feedback is represented by the following formula:
f(x) = α ln(2 − x)
wherein x represents the area of the target real frame normalized to [0,1] in the training sample, and alpha is a super parameter used for adjusting the balance between the position loss, the confidence loss and the classification loss.
2. The method for detecting a multi-scale object in an optical remote sensing image according to claim 1, wherein the adaptive feature enhancement module is specifically configured to:
generating a spatial attention weight matrix by carrying out pooling and convolution operation on the original feature map output by the first feature layer in the channel direction;
multiplying the spatial attention weight matrix with the original feature map, and carrying out feature adaptive weighting on the spatial direction.
3. The method according to claim 2, wherein the adaptive feature enhancement module is specifically configured to, when generating the spatial attention weighting matrix:
obtaining an original feature map output by the first feature layer, and respectively carrying out maximum pooling and average pooling on the original feature map in the channel direction;
splicing the pooled results along the channel direction, carrying out channel adjustment and feature fusion in a convolution mode, and normalizing the features by adopting an activation function to obtain the weight matrix of the spatial attention.
4. The method for multi-scale object detection of an optical remote sensing image according to claim 1, wherein the adaptive feature enhancement module is specifically configured to, when generating the channel attention weight matrix:
carrying out average pooling on the feature map subjected to self-adaptive weighting of the space direction along the width direction and the height direction to generate the attention weight of the multi-region channel with the target size;
the 1 multiplied by 1 convolution is adopted to act on the pooled characteristics so as to activate the channel information therein, and the activation function is adopted to perform normalization processing so as to obtain the weight matrix of the channel attention.
5. The method according to claim 1, wherein the position loss of each target is multiplied by the adjustment factor function during the training iterations of the target detection algorithm, and wherein, under the action of the adjustment factor function, the smaller the target size, the more the target position loss is increased, and the larger the target size, the more the target position loss is decreased.
6. The method for multi-scale object detection of an optical remote sensing image according to claim 1, wherein the loss function is a function of an adjustment factor based on adaptive feedback of object size added on the basis of a position loss function used by a YOLOX standard model.
7. An optical remote sensing image multi-scale target detection device, comprising:
HDMI-CSI video interface adapter plate, TX2 processor carrier plate and TX2 processor; the HDMI-CSI video interface adapter plate is used for converting a standard HDMI video input source into a CSI-2 video interface and accessing the CSI-2 video interface to the TX2 functional carrier plate; the TX2 processor is configured to perform real-time object detection using the optical remote sensing image multi-scale object detection method of any one of claims 1 to 6 based on an input video image.
CN202310885531.2A 2023-07-19 2023-07-19 Multi-scale target detection method and device for optical remote sensing image Active CN116883862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310885531.2A CN116883862B (en) 2023-07-19 2023-07-19 Multi-scale target detection method and device for optical remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310885531.2A CN116883862B (en) 2023-07-19 2023-07-19 Multi-scale target detection method and device for optical remote sensing image

Publications (2)

Publication Number Publication Date
CN116883862A CN116883862A (en) 2023-10-13
CN116883862B true CN116883862B (en) 2024-02-23

Family

ID=88256458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310885531.2A Active CN116883862B (en) 2023-07-19 2023-07-19 Multi-scale target detection method and device for optical remote sensing image

Country Status (1)

Country Link
CN (1) CN116883862B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325120A (en) * 2020-02-09 2020-06-23 南通大学 Target detection method suitable for embedded system
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN114187268A (en) * 2021-12-04 2022-03-15 北京工业大学 Obstacle detection method based on target detection and semantic segmentation fusion
CN114283336A (en) * 2021-12-27 2022-04-05 中国地质大学(武汉) Anchor-frame-free remote sensing image small target detection method based on mixed attention
CN115082855A (en) * 2022-06-20 2022-09-20 安徽工程大学 Pedestrian occlusion detection method based on improved YOLOX algorithm
CN115239946A (en) * 2022-06-30 2022-10-25 锋睿领创(珠海)科技有限公司 Small sample transfer learning training and target detection method, device, equipment and medium
CN115761409A (en) * 2022-11-24 2023-03-07 天翼数字生活科技有限公司 Fire detection method, device, equipment and medium based on deep learning
CN115830449A (en) * 2022-12-01 2023-03-21 北京理工大学重庆创新中心 Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement
CN115861853A (en) * 2022-11-22 2023-03-28 西安工程大学 Transmission line bird nest detection method in complex environment based on improved yolox algorithm
CN115908295A (en) * 2022-11-10 2023-04-04 长春工业大学 Power grid insulator defect detection method and system based on deep learning
CN115995041A (en) * 2022-12-30 2023-04-21 清华大学深圳国际研究生院 Attention mechanism-based SAR image multi-scale ship target detection method and device
CN116258941A (en) * 2023-03-13 2023-06-13 西安电子科技大学 Yolox target detection lightweight improvement method based on Android platform
CN116385873A (en) * 2023-03-11 2023-07-04 北京理工大学 SAR small target detection based on coordinate-aware attention and spatial semantic context
CN116385876A (en) * 2023-03-29 2023-07-04 中国人民解放军战略支援部队信息工程大学 Optical remote sensing image ground object detection method based on YOLOX

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325111A (en) * 2020-01-23 2020-06-23 同济大学 Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
US20210383533A1 (en) * 2020-06-03 2021-12-09 Nvidia Corporation Machine-learning-based object detection system
US20230041290A1 (en) * 2021-08-06 2023-02-09 Yaim Cooper Training and generalization of a neural network

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN111325120A (en) * 2020-02-09 2020-06-23 南通大学 Target detection method suitable for embedded system
CN114187268A (en) * 2021-12-04 2022-03-15 北京工业大学 Obstacle detection method based on target detection and semantic segmentation fusion
CN114283336A (en) * 2021-12-27 2022-04-05 中国地质大学(武汉) Anchor-frame-free remote sensing image small target detection method based on mixed attention
CN115082855A (en) * 2022-06-20 2022-09-20 安徽工程大学 Pedestrian occlusion detection method based on improved YOLOX algorithm
CN115239946A (en) * 2022-06-30 2022-10-25 锋睿领创(珠海)科技有限公司 Small sample transfer learning training and target detection method, device, equipment and medium
CN115908295A (en) * 2022-11-10 2023-04-04 长春工业大学 Power grid insulator defect detection method and system based on deep learning
CN115861853A (en) * 2022-11-22 2023-03-28 西安工程大学 Transmission line bird nest detection method in complex environment based on improved yolox algorithm
CN115761409A (en) * 2022-11-24 2023-03-07 天翼数字生活科技有限公司 Fire detection method, device, equipment and medium based on deep learning
CN115830449A (en) * 2022-12-01 2023-03-21 北京理工大学重庆创新中心 Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement
CN115995041A (en) * 2022-12-30 2023-04-21 清华大学深圳国际研究生院 Attention mechanism-based SAR image multi-scale ship target detection method and device
CN116385873A (en) * 2023-03-11 2023-07-04 北京理工大学 SAR small target detection based on coordinate-aware attention and spatial semantic context
CN116258941A (en) * 2023-03-13 2023-06-13 西安电子科技大学 Yolox target detection lightweight improvement method based on Android platform
CN116385876A (en) * 2023-03-29 2023-07-04 中国人民解放军战略支援部队信息工程大学 Optical remote sensing image ground object detection method based on YOLOX

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ResiDualGAN: Resize-Residual DualGAN for Cross-Domain Remote Sensing Images Semantic Segmentation; Yang Zhao et al.; Remote Sensing; pp. 1-20 *
Remote sensing image target detection based on a dual attention mechanism; Zhou Xing; Chen Lifu; Computer and Modernization (No. 08); full text *
Multi-target detection in complex scenes based on improved YOLOv5; Qiang Dong et al.; Electronic Measurement Technology; pp. 82-90 *
An improved SSD traffic sign target detection algorithm; Xiao Dandong; Chen Jinjie; Software Guide (No. 05); full text *
A review of improvements to typical deep learning object detection algorithms; Wang Xinpeng et al.; Computer Engineering and Applications; pp. 42-57 *

Also Published As

Publication number Publication date
CN116883862A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
US11430134B2 (en) Hardware-based optical flow acceleration
GB2571825A (en) Semantic class localization digital environment
CN111352965B (en) Training method of sequence mining model, and processing method and equipment of sequence data
US20210294945A1 (en) Neural network control variates
US20220067512A1 (en) Fine-grained per-vector scaling for neural network quantization
WO2020019102A1 (en) Methods, systems, articles of manufacture and apparatus to train a neural network
CN111027576A (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN110175641A (en) Image-recognizing method, device, equipment and storage medium
US20220067530A1 (en) Fine-grained per-vector scaling for neural network quantization
Zhao et al. PCA dimensionality reduction method for image classification
US20230196806A1 (en) Methods, systems, articles of manufacture and apparatus to extract region of interest text from receipts
CN116681083A (en) Text data sensitive detection method, device, equipment and medium
CN116883862B (en) Multi-scale target detection method and device for optical remote sensing image
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN115049546A (en) Sample data processing method and device, electronic equipment and storage medium
Liu et al. Multi-task learning based on geometric invariance discriminative features
CN115034225A (en) Word processing method and device applied to medical field, electronic equipment and medium
CN110826726B (en) Target processing method, target processing device, target processing apparatus, and medium
CN113238975A (en) Memory, integrated circuit and board card for optimizing parameters of deep neural network
CN113361656A (en) Feature model generation method, system, device and storage medium
CN114692715A (en) Sample labeling method and device
CN114580625A (en) Method, apparatus, and computer-readable storage medium for training neural network
CN112949672A (en) Commodity identification method, commodity identification device, commodity identification equipment and computer readable storage medium
US11972188B2 (en) Rail power density aware standard cell placement for integrated circuits
US20230376659A1 (en) Vlsi placement optimization using self-supervised graph clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant