CN116883862B - Multi-scale target detection method and device for optical remote sensing image - Google Patents
- Publication number
- CN116883862B (application CN202310885531.2A)
- Authority
- CN
- China
- Prior art keywords
- feature
- target
- adaptive
- channel
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/13—Satellite images
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
- G06V10/765—Classification using rules for classification or partitioning the feature space
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Recognition or understanding using neural networks
- G06V2201/07—Target detection
Abstract
The invention discloses a multi-scale target detection method and device for optical remote sensing images. During the training iterations of the target detection algorithm, the method applies a function of target size — an adaptive adjustment factor — to the position loss value, balancing the loss contributions of targets at different scales during training and alleviating the long-tail distribution of the data without adding computational cost. A plug-and-play attention feature enhancement module addresses the gradual loss of small-target semantic features as the network deepens: the network adaptively adjusts channel-direction and spatial-direction weights to improve small-target feature extraction, thereby improving the network's detection of small-scale targets.
Description
Technical Field
The invention relates to the technical field of optical remote sensing image target recognition, and in particular to a multi-scale target detection method and device for optical remote sensing images that alleviates the long-tail distribution of data and performs well at detecting small-scale targets.
Background
Remote sensing images carry rich surface-feature information and have become an important means for acquiring geospatial information, with wide application in environmental monitoring, natural disaster monitoring, agriculture, urban planning, and other fields. In recent years, with the rapid development of remote sensing satellites and sensor technology, high-resolution optical remote sensing images have become easier to generate and acquire; at the same time, target detection — an important method for interpreting such images and a key task in most application fields — has attracted broad attention.
High-resolution optical remote sensing images typically cover a wide spatial extent and contain many small, densely distributed targets. Existing target detection methods still handle multi-scale variation of targets poorly, which in particular limits detection accuracy for small-scale targets.
Researchers have done extensive work on data augmentation, label assignment mechanisms, feature enhancement networks, and loss function design to address these problems.
For example, Mate Kisantal et al., in the paper "Augmentation for small object detection", propose a copy-and-paste augmentation strategy for small objects in a sample. Yukang Chen et al., in the paper "Dynamic Scale Training for Object Detection", propose a collage data augmentation method driven by feedback from the loss ratio of targets.
However, these methods still treat the data in the same way as natural images and cannot be applied directly to multi-scale target detection in high-resolution optical remote sensing images.
Chang Xu et al., in the paper "Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark", propose a normalized Wasserstein distance and a ranking-based assignment strategy, which improves label assignment, provides sufficient supervision to the network, and improves small-target detection; however, when applied directly to remote sensing image target detection, it can still easily cause missed and false detections of medium and large targets.
Disclosure of Invention
In view of the foregoing, the present invention provides a method and apparatus for multi-scale object detection of an optical remote sensing image that overcomes or at least partially solves the foregoing problems. The method has better multi-scale target detection and recognition capability.
The invention provides the following scheme:
an optical remote sensing image multi-scale target detection method comprises the following steps:
1. the method for detecting the multi-scale target of the optical remote sensing image is characterized by comprising the following steps of:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection and obtain a target detection result; wherein an adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer of the network structure of the target detection algorithm; the adaptive feature enhancement module extracts a spatial attention weight and a channel attention weight, and applies adaptive weighting in the spatial and channel directions to the original feature map output by the first feature layer.
Preferably: the adaptive feature enhancement module is specifically configured to: generate a spatial attention weight matrix by pooling and convolution operations along the channel direction on the original feature map output by the first feature layer;
and multiply the spatial attention weight matrix with the original feature map to perform adaptive feature weighting in the spatial direction.
Preferably: when generating the spatial attention weight matrix, the adaptive feature enhancement module is specifically configured to:
obtain the original feature map output by the first feature layer, and apply maximum pooling and average pooling to it along the channel direction respectively;
concatenate the pooled results along the channel direction, perform channel adjustment and feature fusion by convolution, and normalize the features with an activation function to obtain the spatial attention weight matrix.
Preferably: the adaptive feature enhancement module is specifically configured to: perform further pooling, convolution and/or up-sampling operations on the spatially weighted feature map to obtain a channel attention weight matrix;
and multiply the channel attention weight matrix with the spatially weighted feature map to realize adaptive weighting of features in the channel direction.
Preferably: when generating the channel attention weight matrix, the adaptive feature enhancement module is specifically configured to:
average-pool the spatially weighted feature map along the width and height directions to generate multi-region channel attention weights matched to the target size;
apply a 1×1 convolution to the pooled features to activate the channel information, and normalize with an activation function to obtain the channel attention weight matrix.
Preferably: the target detection algorithm is a YOLOX standard model whose network structure has the attention-based adaptive feature enhancement module added between its first feature layer and its feature utilization layer.
Preferably: during the training iterations of the target detection algorithm, the loss function of the target detection algorithm adds, on top of the position loss function, an adjustment factor function based on target-size adaptive feedback; the adjustment factor function adjusts the balance among position loss, confidence loss, and classification loss, dynamically adjusting the position loss weights of targets at different scales during training.
Preferably: during the training iterations of the target detection algorithm, the position loss of each target is multiplied by the adjustment factor function, wherein, under the action of the adjustment factor function, the smaller the target, the larger its position loss weight; conversely, the larger the target, the smaller its position loss weight.
Preferably: the loss function adds the adjustment factor based on target-size adaptive feedback on top of the position loss function used by the YOLOX standard model.
An optical remote sensing image multi-scale target detection device, comprising:
an HDMI-to-CSI video interface adapter board, a TX2 processor carrier board, and a TX2 processor; the HDMI-to-CSI adapter board converts a standard HDMI video input source to a CSI-2 video interface and connects it to the TX2 carrier board; the TX2 processor performs real-time target detection on the input video images using the above optical remote sensing image multi-scale target detection method.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the method and the device for detecting the multi-scale targets of the optical remote sensing image, provided by the embodiment of the application, under the condition that calculation cost is not increased, in the training iteration process of a target detection algorithm, a function related to the target size, namely a self-adaptive adjusting factor is adopted to adjust the position loss function value, so that the loss proportion of targets with different scales in the training process is balanced, and the problem of long tail distribution of data is solved; the plug-and-play attention feature enhancement module solves the problem that the semantic features of the small targets are gradually lost along with deepening of the network hierarchy, so that the network adaptively adjusts the channel and the space direction weight to improve the extraction capacity of the features of the small targets, and the monitoring and detection effects of the network on the small-scale targets are improved.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart of a method for detecting a multi-scale object in an optical remote sensing image according to an embodiment of the present invention;
FIG. 2 is a flow chart of OSA Loss construction provided by an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an improved YOLOX provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of an optical remote sensing image multi-scale target detection device according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an optical remote sensing image multi-scale target detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
Referring to fig. 1, a method for detecting a multi-scale object of an optical remote sensing image according to an embodiment of the present invention, as shown in fig. 1, may include:
s101: determining an optical remote sensing image to be subjected to target detection;
s102: inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result;
the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer.
Further, for the specific adaptive weighting process, the embodiments of the present application may provide that the adaptive feature enhancement module is specifically configured to:
generate a spatial attention weight matrix by pooling and convolution operations along the channel direction on the original feature map output by the first feature layer;
and multiply the spatial attention weight matrix with the original feature map to perform adaptive feature weighting in the spatial direction.
When generating the spatial attention weight matrix, the adaptive feature enhancement module is specifically configured to:
obtain the original feature map output by the first feature layer, and apply maximum pooling and average pooling to it along the channel direction respectively;
concatenate the pooled results along the channel direction, perform channel adjustment and feature fusion by convolution, and normalize the features with an activation function to obtain the spatial attention weight matrix.
The adaptive feature enhancement module is further specifically configured to:
perform further pooling, convolution and/or up-sampling operations on the spatially weighted feature map to obtain a channel attention weight matrix;
and multiply the channel attention weight matrix with the spatially weighted feature map to realize adaptive weighting of features in the channel direction.
When generating the channel attention weight matrix, the adaptive feature enhancement module is specifically configured to:
average-pool the spatially weighted feature map along the width and height directions to generate multi-region channel attention weights matched to the target size;
apply a 1×1 convolution to the pooled features to activate the channel information, and normalize with an activation function to obtain the channel attention weight matrix.
The target detection algorithm is a YOLOX standard model whose network structure has the attention-based adaptive feature enhancement module added between its first feature layer and its feature utilization layer.
It will be appreciated that other similar object detection models may be used in the object detection algorithm provided in the embodiments of the present application.
In the optical remote sensing image multi-scale target detection method provided by the embodiments of the present application, an adaptive feature enhancement module based on an attention mechanism — the Adaptive Feature Enhancement Module (AFEM) — is introduced between the first feature layer of the backbone network (CSPDarknet) and the feature utilization layer (PAN). The module generates a spatial attention weight matrix by pooling and convolving the feature map along the channel direction.
This weight matrix is then multiplied with the original feature map to achieve adaptive spatial weighting. Next, a channel attention weight matrix is obtained through further pooling, convolution, and up-sampling operations. Finally, the channel weight matrix is multiplied with the spatially weighted feature map to realize adaptive weighting of features in the channel direction.
So that the target detection algorithm provided by the embodiments of the present application can adaptively adjust the position loss value according to target size without adding computational cost — balancing the loss contributions of targets at different scales during training and alleviating the long-tail distribution of the data — the embodiments further provide that, during the training iterations, the loss function adds an adjustment factor function based on target-size adaptive feedback on top of the position loss function. The adjustment factor function adjusts the balance among position loss, confidence loss, and classification loss, dynamically adjusting the position loss weights of targets at different scales during training.
Further, the loss function is a function of adding the adjustment factor based on the target size adaptive feedback based on a position loss function used by the YOLOX standard model.
By designing an adjustment factor function based on target-size adaptive feedback and combining it with the position loss in YOLOX, the embodiments of the present application propose a loss function named Object Scale Adaptive Loss (OSA Loss). During training, it dynamically adjusts the weight of the position loss for targets of different scales in the YOLOX algorithm, so that the supervision signals for targets of different scales are more sufficient and the training of the target detection network is more balanced.
The optical remote sensing image multi-scale target detection method provided by the embodiments of the present application is described in detail below and verified, taking a target detection algorithm built on the network structure of the YOLOX standard model as an example, with reference to the accompanying drawings.
Compared with other YOLO-series methods, the YOLOX standard model adopted by the embodiments of the present application introduces a decoupled head and an anchor-free mechanism, which accelerate network convergence and improve algorithm performance and efficiency while adding only a small number of parameters. Specifically, CSPDarknet-53 with a Focus structure is first used as the backbone network module to extract features from the input image in the feature extraction stage; second, in the feature enhancement stage, a Path Aggregation Network (PAN) performs feature enhancement; the enhanced features are then fed to the detection head to finally obtain predictions of target position coordinates, category, and confidence.
The method provided by the embodiments of the present application improves the YOLOX standard model in two ways: an attention-based adaptive feature enhancement module is added between the first feature layer and the feature utilization layer, and an adjustment factor function based on target-size adaptive feedback is added on top of the position loss function. The two improvements are described in detail below.
An OSA Loss workflow diagram is shown in fig. 2. OSA Loss is built on the IoU position loss used by the YOLOX standard model, using a function of target size — an adaptive adjustment factor — to weight the loss contributions of targets at different scales.
The adjustment factor function can be expressed as Equation 1:

f(x) = α ln(2 − x)  (1)

where x is the area of the target's ground-truth box, normalized to [0, 1], in the training sample, and α is a hyperparameter used to adjust the balance among position loss, confidence loss, and classification loss.
During each training iteration, the position loss of each target is multiplied by the adjustment factor. Under the action of the adjustment factor function, the smaller the target, the larger its position loss weight; conversely, the larger the target, the smaller its position loss weight. In this way, the supervision signals for targets of different scales are more sufficient, and the training of the target detection network is more balanced. OSA Loss can be expressed as Equation 2.
Loss_OSA = f(x) × (1 − IoU²) = α ln(2 − x) × (1 − IoU²)  (2)
Since ln(2 − x) ∈ [0, ln 2], setting α to 1 would reduce the overall position loss value and destroy the original balance among the loss terms. To determine an appropriate value of α, extensive experiments and mathematical analysis were carried out.
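As a concrete illustration, the adjustment factor and OSA Loss of Equations 1 and 2 can be sketched in plain Python. The tuned value of α is not given in this excerpt, so it is left as a parameter; the IoU is taken as a given input (computing it is standard box geometry and is omitted here):

```python
import math

def osa_loss(box_area_norm, iou, alpha=1.0):
    """OSA Loss per Equations 1-2: alpha*ln(2 - x) * (1 - IoU^2).

    box_area_norm: ground-truth box area normalized to [0, 1].
    iou: IoU between prediction and ground truth, in [0, 1].
    alpha: hyperparameter balancing position/confidence/classification
           losses (its tuned value is not stated in this excerpt).
    """
    factor = alpha * math.log(2.0 - box_area_norm)  # Eq. 1: larger for small targets
    return factor * (1.0 - iou ** 2)                # Eq. 2

# A small target (x near 0) receives a larger position loss weight
# than a large target (x near 1), at equal IoU.
small = osa_loss(0.01, iou=0.5)
large = osa_loss(0.9, iou=0.5)
```

This makes the behaviour described above visible: because ln(2 − x) decreases as x grows, `small > large`, so small targets contribute more to the total position loss during training.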
As shown in fig. 3, the network structure diagram of the improved YOLOX, the method establishes an attention-based adaptive feature enhancement module between the first feature layer of the original YOLOX backbone CSPDarknet and the PAN layer, specifically as follows:
step one: spatial attention weight extraction.
First, the feature map output by the first feature layer of CSPDarknet is obtained, and maximum pooling and average pooling are applied to it along the channel direction — that is, the maximum and the mean are taken at each pixel location across channels — which aggregates the channel information of the feature map and highlights salient feature regions.
The pooled results are then concatenated along the channel direction, and a 7×7 convolution performs channel adjustment and feature fusion.
Finally, a Sigmoid activation function normalizes the features to obtain the spatial attention weight matrix, which is multiplied with the original feature map to realize adaptive weighting in the spatial direction.
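Step one can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the learned 7×7 convolution weights are stood in for by a random `kernel` argument, and the convolution is written as a naive loop for clarity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2D convolution of a (C_in, H, W) input with a
    (C_in, k, k) kernel, producing a single-channel (H, W) map."""
    c, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def spatial_attention(feat, kernel):
    """AFEM step one: channel-wise max/avg pooling, concatenation, 7x7
    convolution, Sigmoid, then element-wise reweighting of the input.
    `kernel` stands in for the learned 7x7 weights, shape (2, 7, 7)."""
    mx = feat.max(axis=0, keepdims=True)   # channel-direction max pooling -> (1, H, W)
    av = feat.mean(axis=0, keepdims=True)  # channel-direction average pooling
    stacked = np.concatenate([mx, av], axis=0)       # (2, H, W)
    weights = sigmoid(conv2d_same(stacked, kernel))  # (H, W) spatial weights in (0, 1)
    return feat * weights[None, :, :]                # adaptive spatial weighting

# Example on a random (C=8, H=16, W=16) feature map
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = spatial_attention(feat, kernel)
```

Because the Sigmoid keeps every spatial weight in (0, 1), the output is a per-location attenuation of the input feature map, matching the "adaptive weighting" described above.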
Step two: channel attention weight extraction.
First, the spatially weighted features output by step one are average-pooled along the width and height directions to generate channel attention weights over a 4×4 grid of regions, which aggregates the spatial information of the feature map and increases the information capacity of the channel-dimension weights.
Then, a 1×1 convolution is applied to the pooled features to fully activate the channel information. To simplify the adaptive channel weighting, the 4×4 feature is transformed back to the input H×W by up-sampling.
Finally, a Sigmoid function normalizes the result to obtain the channel attention weight matrix, which is multiplied with the spatially weighted feature map to realize adaptive weighting in the channel direction.
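Step two can likewise be sketched in NumPy. Again this is a hedged illustration: the learned 1×1 convolution is stood in for by a channel-mixing matrix `mix`, nearest-neighbour up-sampling is assumed, and H and W are assumed divisible by 4:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, mix):
    """AFEM step two: average-pool to a 4x4 grid of regions, mix channels
    with a 1x1 convolution (`mix`, a (C, C) matrix standing in for its
    learned weights), up-sample back to HxW, Sigmoid, then reweight.

    feat: spatially weighted feature map, shape (C, H, W), H and W
          divisible by 4 (an assumption of this sketch).
    """
    c, h, w = feat.shape
    bh, bw = h // 4, w // 4
    # Region-wise average pooling to a (C, 4, 4) grid
    pooled = feat.reshape(c, 4, bh, 4, bw).mean(axis=(2, 4))
    # A 1x1 convolution is per-position channel mixing
    mixed = np.einsum('oc,chw->ohw', mix, pooled)
    # Nearest-neighbour up-sampling back to (C, H, W)
    up = np.repeat(np.repeat(mixed, bh, axis=1), bw, axis=2)
    weights = sigmoid(up)                 # channel attention weights in (0, 1)
    return feat * weights                 # adaptive channel weighting

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 16, 16))  # e.g. the output of step one
mix = np.eye(8) * 0.5                    # placeholder 1x1-conv weights
out = channel_attention(feat, mix)
```

The 4×4 pooling grid means each channel receives a different weight in each of 16 spatial regions, which is what gives the channel weights their "multi-region" information capacity.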
In summary, the workflow of the optical remote sensing image multi-scale target detection method based on the improved YOLOX provided in the embodiment of the application includes the following steps:
The first step: select one or more of the public optical remote sensing image target detection datasets, such as NWPU VHR-10, LEVIR, DOTA, or AI-TOD, as required by the task.
In this example the AI-TOD dataset (28,036 images) was selected and divided into training, validation, and test sets in the ratio 4:1:5.
Here vt denotes targets no larger than 8×8 pixels, t targets larger than 8×8 and up to 16×16 pixels, s targets larger than 16×16 and up to 32×32 pixels, and m targets larger than 32×32 pixels.
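These size buckets can be written as a simple helper. The exact boundary conventions are elided in the original text, so the inclusive upper bounds and the use of the longer box side are assumptions of this sketch:

```python
def size_category(width_px, height_px):
    """Classify a target into the AI-TOD-style buckets used above:
    vt (<= 8x8), t (<= 16x16), s (<= 32x32), m (> 32x32).
    Uses the longer box side (an assumption; the text elides this)."""
    side = max(width_px, height_px)
    if side <= 8:
        return 'vt'
    if side <= 16:
        return 't'
    if side <= 32:
        return 's'
    return 'm'
```

Such a helper is useful when reporting per-scale detection accuracy, as in the tables below.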
And a second step of: an improved YOLOX optical remote sensing image target detection model is built according to fig. 2 and 3. The method mainly comprises the steps of replacing original position Loss with an OSA Loss function, and embedding AFEM between a first feature layer and a feature utilization layer (PAN) of a backbone network (CSPdark).
The third step: input the training set divided in the first step and perform data enhancement, including unifying image resolution, data normalization, random rotation, random scale transformation, random hue transformation, Mosaic augmentation, and the like.
The fourth step: initialize the network weights with a Gaussian distribution; set the total number of training iterations to 300; adjust the model learning rate with a cosine annealing strategy every 30 iterations, with an initial learning rate of 0.01; perform gradient updates with stochastic gradient descent; and introduce a learning-rate warmup strategy with the corresponding warmup coefficient set to 0.000005.
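The learning-rate schedule of the fourth step can be sketched as follows. This is only an interpretation of the description: the warmup length (5 iterations here) is an assumed value, and the restart-every-30-iterations reading of "cosine annealing every 30 iterations" is likewise an assumption.

```python
import math

def learning_rate(it, base_lr=0.01, period=30,
                  warmup_iters=5, warmup_factor=0.000005):
    # Linear warmup from warmup_factor * base_lr up to base_lr, then
    # cosine annealing restarted every `period` iterations.
    # warmup_iters = 5 is an assumed value, not from the patent.
    if it < warmup_iters:
        start = base_lr * warmup_factor
        return start + (base_lr - start) * it / warmup_iters
    t = (it - warmup_iters) % period
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / period))

print(learning_rate(0), learning_rate(5), learning_rate(20))
```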
The fifth step: calculate the loss, plot the P-R curve, compare the model predictions with the ground truth, and save the weight file with the smallest loss on the validation set.
The sixth step: perform model verification on the test set using the weight file obtained in the fifth step; the resulting target detection results are shown in Tables 1 and 2.
TABLE 1 Improved YOLOX target detection results
TABLE 2 Improved YOLOX detection accuracy for different target classes
Experimental results show that the method provided by the embodiment of the application can effectively improve the target detection precision of the optical remote sensing image.
In summary, the optical remote sensing image multi-scale target detection method provided by the embodiments of the application can adaptively adjust the position loss value according to target size without additional computational cost, balancing the loss proportions of targets at different scales during training and alleviating the long-tail distribution problem of the data. The plug-and-play attention feature enhancement module addresses the gradual loss of small-target semantic features as the network hierarchy deepens: the network adaptively adjusts the channel- and spatial-direction weights to improve small-target feature extraction, thereby improving the network's detection of small-scale targets.
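The size-adaptive adjustment, f(x) = α ln(2 − x) with x the normalized ground-truth box area (given in claim 1 below), can be evaluated directly to see the behavior described here; α = 1.0 is only an illustrative choice.

```python
import math

def osa_adjustment(x, alpha=1.0):
    # f(x) = alpha * ln(2 - x), x in [0, 1]: monotonically decreasing,
    # so smaller targets (x near 0) receive a larger position-loss
    # weight and the largest targets (x near 1) a weight near zero.
    assert 0.0 <= x <= 1.0
    return alpha * math.log(2.0 - x)

for area in (0.0, 0.25, 1.0):
    print(area, round(osa_adjustment(area), 4))  # 0.6931, 0.5596, 0.0
```

Because f is computed once per target from its box area, this weighting adds no meaningful computational cost to the training loop.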
As shown in fig. 4, an embodiment of the present application may further provide an optical remote sensing image multi-scale target detection device, including: an HDMI-CSI video interface adapter board, a TX2 processor carrier board, and a TX2 processor. The HDMI-CSI video interface adapter board converts a standard HDMI video input source into a CSI-2 video interface and connects it to the TX2 carrier board; the TX2 processor performs real-time target detection on the input video images using the above optical remote sensing image multi-scale target detection method.
When actually implemented, the device provided in the embodiments of the application may further include any other necessary hardware. In a practical application, for example, the device may include an HDMI-CSI video interface adapter board, a TX2 processor carrier board, a TX2 processor, and a four-way USB expansion board. The HDMI-CSI video interface adapter board converts a standard HDMI (Type A) video input source into a CSI-2 video interface, which is connected to the TX2 carrier board, and the TX2 processor performs real-time target detection on the input video images. The four-way USB expansion board uses two USB-to-RS422 serial cables: one connects to the display and control equipment for bidirectional RS422 communication, and the other connects to the data link for bidirectional RS422 communication. The results processed by the core processor (the TX2 processor) are output to a display through Micro HDMI (Type D) and sent to the display controller via serial communication, realizing display of the target detection results.
The circuit board related to the device mainly comprises four parts:
First, the HDMI-to-CSI-2 interface board is 49 mm long and 35 mm wide. A TOSHIBA TC358743XBG chip is selected for the unidirectional HDMI-to-CSI-2 conversion circuit, with a core voltage of 1.2 V, an IO voltage of 1.8-3.3 V, an HDMI voltage of 3.3 V, and an APLL voltage of 3.3 V/2.5 V; it provides an I2C interface and is packaged in BGA64 with a 0.65 mm pin pitch.
Second, the Jetson TX2 core carrier board is 87 mm long and 63 mm wide. The connector model is SEAM-50-02.0-S-08-2-A-K-TR; there are four mounting holes of 3.5 mm diameter, 4 mm from the board edge; the front-side height is 28 mm, and the back side is limited to 4 mm in height.
Third, the four-port USB docking board is 29.5 mm long and 18.9 mm wide. An FE1.1S is used as the main control IC to provide four USB 2.0 interfaces, featuring high performance, low power consumption, and low cost. It adopts the STT data transmission mode, supports signal transmission over distances up to 10 meters, and comes in an SSOP package with a 0.64 mm pin pitch.
Fourth, a MAX3490EESA is selected as the main control chip for the two RS422 serial circuits, with a 3.3 V supply voltage and an SOIC8 package with a 1.27 mm pin pitch; the accompanying level-conversion chip is the TXS0102DCT, with a supply voltage range of 1.65-5.5 V and an SM8 package.
The embodiment of the application can also provide an optical remote sensing image multi-scale target detection device, which comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the steps of the optical remote sensing image multi-scale target detection method according to the instructions in the program codes.
As shown in fig. 5, an optical remote sensing image multi-scale target detection device provided in an embodiment of the present application may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all complete communication with each other through a communication bus 13.
In the present embodiment, the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), another programmable logic device, or the like.
The processor 10 may call a program stored in the memory 11; in particular, the processor 10 may perform the operations in the embodiments of the optical remote sensing image multi-scale target detection method.
The memory 11 is used for storing one or more programs, and the programs may include program codes, where the program codes include computer operation instructions, and in this embodiment, at least the programs for implementing the following functions are stored in the memory 11:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result; the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer.
And/or a program for implementing the following functions:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result; in the training iteration process of the target detection algorithm, the loss function of the target detection algorithm is obtained by adding, on the basis of the position loss function, an adjustment factor function based on target-size adaptive feedback, the adjustment factor function being used to adjust the balance among the position loss, the confidence loss, and the classification loss so as to dynamically adjust the position-loss weights occupied by targets of different scales during training.
In one possible implementation, the memory 11 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a file creation function or a data read-write function), and the data storage area may store data created during use, such as initialization data.
In addition, the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module for interfacing with other devices or systems.
Of course, it should be noted that the structure shown in fig. 5 does not limit the optical remote sensing image multi-scale target detection device in the embodiments of the present application; in practical applications, the device may include more or fewer components than those shown in fig. 5, or some components may be combined.
Embodiments of the present application may also provide a computer readable storage medium storing program code for performing the steps of the above-described optical remote sensing image multi-scale object detection method.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the description of the embodiments above, it will be apparent to those skilled in the art that the present application may be implemented in software plus the necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.
Claims (7)
1. The method for detecting the multi-scale target of the optical remote sensing image is characterized by comprising the following steps of:
determining an optical remote sensing image to be subjected to target detection;
inputting the optical remote sensing image into a target detection algorithm to perform multi-scale target detection so as to obtain a target detection result;
the self-adaptive feature enhancement module based on an attention mechanism is arranged between a first feature layer and a feature utilization layer in the network structure of the target detection algorithm; the self-adaptive feature enhancement module is used for extracting the spatial attention weight and the channel attention weight, and is used for carrying out self-adaptive weighting on the spatial direction and the channel direction on the original feature map output by the first feature layer;
the adaptive feature enhancement module is specifically configured to:
carrying out further pooling, convolution and/or up-sampling operation on the feature map subjected to the self-adaptive weighting of the space direction to obtain a channel attention weight matrix;
multiplying the channel weight matrix with the feature map subjected to the self-adaptive weighting of the space direction to realize the self-adaptive weighting of the features of the channel direction;
the target detection algorithm is as follows: adding the adaptive feature enhancement module based on the attention mechanism between a first feature layer and a feature utilization layer of a network structure of the YOLOX standard model;
in the training iteration process of the target detection algorithm, the loss function of the target detection algorithm is obtained by adding, on the basis of the position loss function, an adjustment factor function based on target-size adaptive feedback, the adjustment factor function being used to adjust the balance among the position loss, the confidence loss, and the classification loss so as to dynamically adjust the position-loss weights occupied by targets of different scales during training;
the adjustment factor function based on the target size adaptive feedback is represented by the following formula:
f(x) = α ln(2 − x)
wherein x represents the area of the target real frame normalized to [0,1] in the training sample, and alpha is a super parameter used for adjusting the balance between the position loss, the confidence loss and the classification loss.
2. The method for detecting a multi-scale object in an optical remote sensing image according to claim 1, wherein the adaptive feature enhancement module is specifically configured to:
generating a spatial attention weight matrix by carrying out pooling and convolution operation on the original feature map output by the first feature layer in the channel direction;
multiplying the spatial attention weight matrix with the original feature map, and carrying out feature adaptive weighting on the spatial direction.
3. The method according to claim 2, wherein the adaptive feature enhancement module is specifically configured to, when generating the spatial attention weighting matrix:
obtaining an original feature map output by the first feature layer, and respectively carrying out maximum pooling and average pooling on the original feature map in the channel direction;
splicing the pooled results along the channel direction, carrying out channel adjustment and feature fusion in a convolution mode, and normalizing the features by adopting an activation function to obtain the weight matrix of the spatial attention.
4. The method for multi-scale object detection of an optical remote sensing image according to claim 1, wherein the adaptive feature enhancement module is specifically configured to, when generating the channel attention weight matrix:
carrying out average pooling on the feature map subjected to self-adaptive weighting of the space direction along the width direction and the height direction to generate the attention weight of the multi-region channel with the target size;
the 1 multiplied by 1 convolution is adopted to act on the pooled characteristics so as to activate the channel information therein, and the activation function is adopted to perform normalization processing so as to obtain the weight matrix of the channel attention.
5. The method according to claim 1, wherein, in the training iteration process of the target detection algorithm, the position loss of each target is multiplied by the adjustment factor function; under the action of the adjustment factor function, the smaller the target size, the more its position loss is increased in proportion, and the larger the target size, the more its position loss is reduced.
6. The method for multi-scale object detection of an optical remote sensing image according to claim 1, wherein the loss function is a function of an adjustment factor based on adaptive feedback of object size added on the basis of a position loss function used by a YOLOX standard model.
7. An optical remote sensing image multi-scale target detection device, comprising:
HDMI-CSI video interface adapter plate, TX2 processor carrier plate and TX2 processor; the HDMI-CSI video interface adapter plate is used for converting a standard HDMI video input source into a CSI-2 video interface and accessing the CSI-2 video interface to the TX2 functional carrier plate; the TX2 processor is configured to perform real-time object detection using the optical remote sensing image multi-scale object detection method of any one of claims 1 to 6 based on an input video image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310885531.2A CN116883862B (en) | 2023-07-19 | 2023-07-19 | Multi-scale target detection method and device for optical remote sensing image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116883862A CN116883862A (en) | 2023-10-13 |
CN116883862B true CN116883862B (en) | 2024-02-23 |
Family
ID=88256458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310885531.2A Active CN116883862B (en) | 2023-07-19 | 2023-07-19 | Multi-scale target detection method and device for optical remote sensing image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116883862B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325120A (en) * | 2020-02-09 | 2020-06-23 | 南通大学 | Target detection method suitable for embedded system |
WO2021139069A1 (en) * | 2020-01-09 | 2021-07-15 | 南京信息工程大学 | General target detection method for adaptive attention guidance mechanism |
CN114187268A (en) * | 2021-12-04 | 2022-03-15 | 北京工业大学 | Obstacle detection method based on target detection and semantic segmentation fusion |
CN114283336A (en) * | 2021-12-27 | 2022-04-05 | 中国地质大学(武汉) | Anchor-frame-free remote sensing image small target detection method based on mixed attention |
CN115082855A (en) * | 2022-06-20 | 2022-09-20 | 安徽工程大学 | Pedestrian occlusion detection method based on improved YOLOX algorithm |
CN115239946A (en) * | 2022-06-30 | 2022-10-25 | 锋睿领创(珠海)科技有限公司 | Small sample transfer learning training and target detection method, device, equipment and medium |
CN115761409A (en) * | 2022-11-24 | 2023-03-07 | 天翼数字生活科技有限公司 | Fire detection method, device, equipment and medium based on deep learning |
CN115830449A (en) * | 2022-12-01 | 2023-03-21 | 北京理工大学重庆创新中心 | Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement |
CN115861853A (en) * | 2022-11-22 | 2023-03-28 | 西安工程大学 | Transmission line bird nest detection method in complex environment based on improved yolox algorithm |
CN115908295A (en) * | 2022-11-10 | 2023-04-04 | 长春工业大学 | Power grid insulator defect detection method and system based on deep learning |
CN115995041A (en) * | 2022-12-30 | 2023-04-21 | 清华大学深圳国际研究生院 | Attention mechanism-based SAR image multi-scale ship target detection method and device |
CN116258941A (en) * | 2023-03-13 | 2023-06-13 | 西安电子科技大学 | Yolox target detection lightweight improvement method based on Android platform |
CN116385873A (en) * | 2023-03-11 | 2023-07-04 | 北京理工大学 | SAR small target detection based on coordinate-aware attention and spatial semantic context |
CN116385876A (en) * | 2023-03-29 | 2023-07-04 | 中国人民解放军战略支援部队信息工程大学 | Optical remote sensing image ground object detection method based on YOLOX |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325111A (en) * | 2020-01-23 | 2020-06-23 | 同济大学 | Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision |
US20210383533A1 (en) * | 2020-06-03 | 2021-12-09 | Nvidia Corporation | Machine-learning-based object detection system |
US20230041290A1 (en) * | 2021-08-06 | 2023-02-09 | Yaim Cooper | Training and generalization of a neural network |
- 2023-07-19 CN CN202310885531.2A patent/CN116883862B/en active Active
Non-Patent Citations (5)
Title |
---|
ResiDualGAN: Resize-Residual DualGAN for Cross-Domain Remote Sensing Images Semantic Segmentation; Yang Zhao et al.; Remote Sensing; pp. 1-20 *
Remote sensing image target detection based on a dual attention mechanism; Zhou Xing; Chen Lifu; Computer and Modernization (Issue 08); full text *
Multi-target detection in complex scenes based on improved YOLOv5; Qiang Dong et al.; Electronic Measurement Technology; pp. 82-90 *
Improved SSD traffic sign target detection algorithm; Xiao Dandong; Chen Jinjie; Software Guide (Issue 05); full text *
A review of improvements to typical deep learning target detection algorithms; Wang Xinpeng et al.; Computer Engineering and Applications; pp. 42-57 *
Also Published As
Publication number | Publication date |
---|---|
CN116883862A (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11430134B2 (en) | Hardware-based optical flow acceleration | |
GB2571825A (en) | Semantic class localization digital environment | |
CN111352965B (en) | Training method of sequence mining model, and processing method and equipment of sequence data | |
US20210294945A1 (en) | Neural network control variates | |
US20220067512A1 (en) | Fine-grained per-vector scaling for neural network quantization | |
WO2020019102A1 (en) | Methods, systems, articles of manufacture and apparatus to train a neural network | |
CN111027576A (en) | Cooperative significance detection method based on cooperative significance generation type countermeasure network | |
CN110175641A (en) | Image-recognizing method, device, equipment and storage medium | |
US20220067530A1 (en) | Fine-grained per-vector scaling for neural network quantization | |
Zhao et al. | PCA dimensionality reduction method for image classification | |
US20230196806A1 (en) | Methods, systems, articles of manufacture and apparatus to extract region of interest text from receipts | |
CN116681083A (en) | Text data sensitive detection method, device, equipment and medium | |
CN116883862B (en) | Multi-scale target detection method and device for optical remote sensing image | |
CN113610856B (en) | Method and device for training image segmentation model and image segmentation | |
CN115049546A (en) | Sample data processing method and device, electronic equipment and storage medium | |
Liu et al. | Multi-task learning based on geometric invariance discriminative features | |
CN115034225A (en) | Word processing method and device applied to medical field, electronic equipment and medium | |
CN110826726B (en) | Target processing method, target processing device, target processing apparatus, and medium | |
CN113238975A (en) | Memory, integrated circuit and board card for optimizing parameters of deep neural network | |
CN113361656A (en) | Feature model generation method, system, device and storage medium | |
CN114692715A (en) | Sample labeling method and device | |
CN114580625A (en) | Method, apparatus, and computer-readable storage medium for training neural network | |
CN112949672A (en) | Commodity identification method, commodity identification device, commodity identification equipment and computer readable storage medium | |
US11972188B2 (en) | Rail power density aware standard cell placement for integrated circuits | |
US20230376659A1 (en) | Vlsi placement optimization using self-supervised graph clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||