CN117789039B

CN117789039B - Remote sensing image target detection method based on context information distinguishing and utilizing

Info

Publication number: CN117789039B
Application number: CN202410213682.8A
Authority: CN
Inventors: 王永成; 张玉溪
Original assignee: Changchun Institute of Optics Fine Mechanics and Physics of CAS
Current assignee: Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority date: 2024-02-27
Filing date: 2024-02-27
Publication date: 2024-05-28
Anticipated expiration: 2044-02-27
Also published as: CN117789039A

Abstract

The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image target detection method based on context information distinguishing and utilizing. Comprising the following steps: s1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map; s2: constructing a two-stage target detection network based on the context information distinguishing and utilizing, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network; s3: constructing an overall loss function, and training a two-stage target detection network by utilizing the overall loss function to obtain a trained two-stage target detection network; s4: inputting the image to be detected into a trained two-stage target detection network for detection, and obtaining a final detection result. The invention improves the detection accuracy of the ground object target of the remote sensing image.

Description

Remote sensing image target detection method based on context information distinguishing and utilizing

Technical Field

The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image target detection method based on context information distinguishing and utilizing.

Background

The ground surface space background of the remote sensing image is wide and complex, a large amount of information is contained, the ground object targets in the wide background contain less information relative to the surrounding environment, the characteristic performance is poor, the remote sensing image is easily interfered by environmental factors such as illumination intensity, weather and the like, the image quality difference is large, and the detection difficulty of the ground object targets is large. In order to increase the effective information of detecting the ground object target, reduce the uncertainty of the ground object target, improve the detection precision, many scholars research the contribution of the context information to the feature expression and the target detection. In some cases, visual objects will often appear in a particular environment, sometimes with other related objects, i.e. the context information has a complementary effect on the object information. However, the complex spatial pattern of the remote sensing image formed by the intersection of the ground object target and the ground space background can result in the target object being submerged in the background, so not all the contextual information is helpful for detection. Context information that is too similar to the target area can bring information noise, weakening the characteristic expression capability of the target object. Therefore, how to fully and reasonably utilize the context information in the complex background of the remote sensing image to provide help for detecting the ground object target is a critical problem to be solved urgently.

Disclosure of Invention

The invention provides a remote sensing image target detection method based on context information distinguishing utilization, which aims to solve the defect that the prior art cannot reasonably utilize context information in a complex background in a remote sensing image, so that the context information cannot be effectively assisted in detection of a ground object target.

The invention provides a remote sensing image target detection method based on context information distinguishing and utilizing, which specifically comprises the following steps:

s1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map;

the backbone network adopts ConvNeXt networks, and the neck network adopts FPN networks;

S2: constructing a two-stage target detection network based on the distinguishing and utilizing of the context information, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on the supplementing of the context information and a detection module based on the suppressing of the context information;

the first stage detection network adopts an RPN network;

S3: constructing an overall loss function, and training a two-stage target detection network by utilizing the overall loss function to obtain a trained two-stage target detection network;

The step S3 specifically comprises the following steps:

s31: inputting the multi-scale feature map into a trained first-stage detection network to perform convolution operation to obtain a target suggestion region, and expanding the length and width of the target suggestion region by one time to obtain a context region;

S32: creating an overall similarity evaluation formula, performing similarity evaluation on the target suggestion region and the context region by using the overall similarity evaluation formula, and constructing a low-similarity target marking frame and a high-similarity target marking frame on the multi-scale feature map according to a similarity evaluation result;

The step S32 specifically includes the following steps:

S321: the average gray level of the target advice region and the context region is calculated, and the luminance similarity L of the target advice region and the context region is calculated by the following formula:

（1）；

Wherein, For the gray average value of the target advice region,/>The gray average value of the context area, sigma is a minimum value for avoiding denominator 0;

S322: the contrast similarity D of the target suggested region and the context region is calculated by:

（2）；

（3）；

（4）；

Wherein, Suggesting contrast of region for target,/>For the contrast of the context region,/>The number of all pixels of the region is suggested for the target,/>For the number of all pixels in the context area, x is the value of the pixel point in the target suggestion area, i and j are the abscissa of the pixel point, and y is the value of the pixel point in the context area;

s323: the smoothness similarity P of the target suggested region and the context region is calculated by:

（5）；

（6）；

（7）；

Wherein, Suggesting contrast of region for target,/>Contrast for the context region;

s324: the texture feature similarity T of the target suggestion region and the context region is calculated by:

（8）；

（9）；

Wherein X is the LBP characteristic histogram of the target advice region, Y is the LBP characteristic histogram of the context region, Chi-square distance of LBP characteristic histogram for target suggestion region and context region;

S325: based on the luminance similarity, contrast similarity, smoothness similarity, and texture feature similarity, the overall similarity S of the target suggested region and the context region is calculated by:

（10）；

S326: respectively calculating probability density distribution of brightness similarity, contrast similarity, smoothness similarity and texture feature similarity, and correspondingly obtaining median values of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity;

S327: taking the product of the median values of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity as a threshold value of the overall similarity;

S328: constructing a high-similarity target annotation frame in an image area with overall similarity higher than a threshold value, and constructing a low-similarity target annotation frame in an image area with overall similarity lower than the threshold value;

s33: taking an image area marked by a low-similarity target marking frame on the multi-scale feature image as a low-similarity target feature image, and supplementing context information to the low-similarity target feature image by using a context information supplementing-based detection module to obtain a first detection value;

The step S33 specifically includes the following steps:

S331: resampling the low-similarity target feature map to obtain target to-be-identified areas, and expanding the length and width of each target to-be-identified area by one time to obtain context supplementing areas;

s332: inputting the target region to be identified and the context supplementing region into a first full-connection layer and a second full-connection layer respectively, and correspondingly obtaining a first characteristic vector and a second characteristic vector;

S333: adding the first feature vector and the second feature vector, and then processing the added first feature vector and the added second feature vector through a third full connection layer to obtain a third feature vector;

S334: inputting the third feature vector into a classification full-connection layer and a regression full-connection layer respectively to identify the category and the position of the target marking frame, and obtaining a first detection value;

S34: taking an image area marked by the high-similarity target marking frame on the multi-scale feature image as a high-similarity target feature image, and inhibiting the context information of the high-similarity target feature image by using a detection module based on the context information inhibition to obtain a second detection value;

the step S34 specifically includes the following steps:

S341: extracting a feature map A ₁ and a feature map B ₁ on the high-similarity target feature map, wherein the size of the feature map A ₁ is 1/4 of the input image, and the size of the feature map B ₁ is 1/8 of the input image;

S342: inputting the feature map A ₁ and the feature map B ₁ into a first convolution sub-module and a second convolution sub-module respectively for convolution processing, and correspondingly obtaining a feature map A ₂ and a feature map B ₂;

S343: performing up-sampling operation on the feature map A ₂ to obtain a feature map A ₃ with the same size as the feature map B ₁, adding the feature map A ₃ and the feature map B ₂, and processing by a softmax function to obtain a significance mask map;

S344: multiplying the saliency mask map and the feature map B ₁ to obtain a saliency feature map, and resampling the saliency feature map to obtain a feature map C;

s345: respectively inputting the feature map C into a classification full-connection layer and a regression full-connection layer to identify the category and the position of the target marking frame, and obtaining a second detection value;

S35: performing non-maximum suppression on the first detection value and the second detection value to obtain a final detection result;

s4: inputting the image to be detected into a trained two-stage target detection network for detection, and obtaining a final detection result.

Preferably, the first convolution sub-module and the second convolution sub-module are each comprised of 3*3 and 1*1 convolution layers in cascade.

Preferably, the calculation formula of the saliency mask map is:

（11）；

where M is a saliency mask map, For the convolution operation of the first convolution sub-module, f ₂ is the convolution operation of the second convolution sub-module,/>Is a characteristic diagram A ₁,/>For feature map B ₁, u is the upsampling operation and σ is a softmax function.

Preferably, the overall loss function is:

（12）；

（13）；

（14）；

Wherein, As the integral loss function, λ1, λ2, λ3 are loss balance coefficients, and λ1, λ2, λ3 are all set to 1,For the classification loss function, N is the number of positive samples,/>For the classification predictive value of the ith sample,/>Classification tag for the ith sample,/>For regression loss function,/>Equation is indicated in parentheses for Ai Fosen, and i is a positive sample,/>Greater than 0, ai Fosen brackets indicate that the equation has a value of 1, otherwise a value of 0,/>Is the position predictor of the i-th sample,/>For the position tag of the i-th sample,/>、/>、/>Loss detection in the first stage, loss detection in the second stage, loss detection,/>, respectivelyIs a significant loss.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention provides an overall similarity evaluation formula, which evaluates the overall similarity degree of a target suggestion region and a context region by comprehensively considering the brightness, the contrast, the smoothness and the texture characteristics of the target suggestion region and the context region, thereby obtaining a target with low similarity to the context region and a target with high similarity to the context region.

(2) The invention provides a two-stage target detection network based on the differentiated utilization of context information, wherein in the second-stage detection network, context information is supplemented to a low-similarity target feature map through a detection module based on context information supplementation, and context information is restrained to a high-similarity target feature map through a detection module based on context information restraint, so that the full utilization of the context information is realized.

Drawings

Fig. 1 is a schematic flow chart of a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention;

FIG. 2 is a network block diagram of a two-phase object detection network based on context information differentiated exploitation according to an embodiment of the present invention;

FIG. 3 is a network block diagram of a detection module based on context information supplementation provided in accordance with an embodiment of the present invention;

FIG. 4 is a network block diagram of a detection module based on context information suppression provided in accordance with an embodiment of the present invention;

Fig. 5 is a schematic diagram of a result of detecting a DOTA dataset by using a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a detection result of a DIOR-R dataset by a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention;

fig. 7 is a schematic diagram of a result of detecting UCAS-AOD datasets according to a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.

Detailed Description

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the following description, like modules are denoted by like reference numerals. In the case of the same reference numerals, their names and functions are also the same. Therefore, a detailed description thereof will not be repeated.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.

According to the invention, through comprehensively considering brightness, contrast, smoothness and texture characteristics, an overall similarity evaluation formula is designed, a target area is expanded to generate a context area, then the overall similarity evaluation formula is utilized to evaluate the similarity of the target area and the context area, and a low-similarity marking frame and a high-similarity marking frame are obtained according to a similarity evaluation result. The invention also provides a two-stage target detection network based on the context information distinguishing and utilizing, and the context information supplementing is carried out on the low-similarity characteristic diagram by utilizing a detection module based on the context information supplementing, and the context information is restrained on the high-similarity characteristic diagram by utilizing a detection module based on the context information restraining, so that the full utilization of the context information is realized.

Fig. 1 illustrates a flow of a remote sensing image object detection method based on context information discrimination and utilization according to an embodiment of the present invention, fig. 2 illustrates a network structure of a two-stage object detection network based on context information discrimination and utilization according to an embodiment of the present invention, fig. 3 illustrates a network structure of a detection module based on context information supplementation according to an embodiment of the present invention, and fig. 4 illustrates a network structure of a detection module based on context information suppression according to an embodiment of the present invention.

As shown in fig. 1 to fig. 4, the remote sensing image target detection method based on the context information distinguishing and utilizing provided by the embodiment of the invention specifically includes the following steps:

s1: and acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map.

The backbone network adopts ConvNeXt network, the neck network adopts FPN network, and the first stage detection network uses RPN network.

S2: and constructing a two-stage target detection network based on the context information distinguishing and utilizing, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplement and a detection module based on context information suppression.

S3: and constructing an overall loss function, and training the two-stage target detection network by utilizing the overall loss function to obtain a trained two-stage target detection network.

The step S3 specifically comprises the following steps:

S31: inputting the multi-scale feature map into a trained first-stage detection network to perform convolution operation to obtain a target suggestion region, and expanding the length and width of the target suggestion region by one time to obtain a context region.

S32: creating an overall similarity evaluation formula, performing similarity evaluation on the target suggestion region and the context region by using the overall similarity evaluation formula, and constructing a low-similarity target labeling frame and a high-similarity target labeling frame on the multi-scale feature map according to the similarity evaluation result.

The step S32 specifically includes the following steps:

（1）；

Wherein, For the gray average value of the target advice region,/>The gray average value of the context area, σ is a minimum value for avoiding denominator of 0.

（2）；

（3）；

（4）；

Wherein, Suggesting contrast of region for target,/>For the contrast of the context region,/>The number of all pixels of the region is suggested for the target,/>For the number of all pixels of the context region, x is the value of the pixel point in the target suggestion region, i and j are the abscissa of the pixel point, respectively, and y is the value of the pixel point in the context region.

（5）；

（6）；

（7）；

Wherein, Suggesting contrast of region for target,/>Is the contrast of the context area.

（8）；

（9）；

Wherein X is the LBP characteristic histogram of the target advice region, Y is the LBP characteristic histogram of the context region, Chi-square distance of LBP feature histograms for the target suggestion region and the context region.

（10）；

S326: and respectively calculating probability density distribution of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, and correspondingly obtaining the median value of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity.

S327: the product of the median of the luminance similarity, contrast similarity, smoothness similarity and texture feature similarity is taken as the threshold for overall similarity.

S328: and constructing a high-similarity target annotation frame in an image area with overall similarity higher than a threshold value, and constructing a low-similarity target annotation frame in an image area with overall similarity lower than the threshold value.

S33: taking an image area marked by the low-similarity target marking frame on the multi-scale feature map as a low-similarity target feature map, and supplementing context information to the low-similarity target feature map by using a context information supplementing-based detection module to obtain a first detection value.

The step S33 specifically includes the following steps:

S331: resampling the low-similarity target feature map to obtain target to-be-identified areas, and expanding the length and width of the target to-be-identified areas by one time to obtain context supplementing areas.

S332: and respectively inputting the target region to be identified and the context supplementing region into the first full-connection layer and the second full-connection layer to correspondingly obtain a first characteristic vector and a second characteristic vector.

S333: and adding the first feature vector and the second feature vector, and then processing the added first feature vector and the second feature vector through a third full connection layer to obtain a third feature vector.

S334: and respectively inputting the third feature vector into the classification full-connection layer and the regression full-connection layer to identify the category and the position of the target labeling frame, so as to obtain a first detection value.

S34: and taking the image area marked by the high-similarity target marking frame on the multi-scale feature image as a high-similarity target feature image, and inhibiting the context information of the high-similarity target feature image by using a detection module based on the context information inhibition to obtain a second detection value.

The step S34 specifically includes the following steps:

S341: feature map a ₁ and feature map B ₁ are extracted on the high-similarity target feature map, the size of feature map a ₁ is 1/4 of the input image, and the size of feature map B ₁ is 1/8 of the input image.

S342: and respectively inputting the characteristic diagram A ₁ and the characteristic diagram B ₁ into a first convolution sub-module and a second convolution sub-module to carry out convolution processing, and correspondingly obtaining the characteristic diagram A ₂ and the characteristic diagram B ₂.

The first convolution sub-module and the second convolution sub-module are each comprised of 3*3 and 1*1 convolution layers in cascade.

S343: the feature map a ₂ is subjected to an up-sampling operation to obtain a feature map a ₃ having the same size as the feature map B ₁, and the feature map a ₃ and the feature map B ₂ are added and processed by a softmax function to obtain a saliency mask map.

S344: and multiplying the saliency mask map by the feature map B ₁ to obtain a saliency feature map, and resampling the saliency feature map to obtain a feature map C.

S345: and respectively inputting the feature map C into a classification full-connection layer and a regression full-connection layer to identify the category and the position of the target labeling frame, and obtaining a second detection value.

S35: and performing non-maximum suppression on the first detection value and the second detection value to obtain a final detection result.

The calculation formula of the saliency mask map is as follows:

（11）；

The feature map A ₃ and the feature map B ₂ are added and processed through a softmax function, namely, the context information is restrained through a mask at a pixel level, the target area is highlighted, the influence of surrounding environment information which is easily confused with the target on detection is weakened, and therefore the detection effect is improved.

The overall loss function proposed by the embodiment of the invention consists of detection loss and pixel-level significance loss.

And training the two-stage target detection network based on the context information distinguishing and utilizing by utilizing the integral loss function until the preset iteration times are reached or convergence is carried out.

The overall loss function is:

（12）；

（13）；

（14）；

Fig. 5 shows a result of detecting a DOTA dataset by using a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.

As shown in fig. 5, the remote sensing image object detection method based on the context information discrimination and utilization provided by the embodiment of the present invention should detect the following 15 kinds of objects, including an airplane (PL), a baseball field, a bridge, a sports field, a small vehicle, a large vehicle, a ship, a tennis court, a basketball court, a storage tank, a soccer field, a ring island, a port, a swimming pool, and a helicopter. In the DOTA data set, although the background of the remote sensing image is wide and complex, the scale difference of the targets is large, and the targets have any directions, the remote sensing image target detection method based on the context information distinguishing and utilizing provided by the embodiment of the invention can accurately mark the position of each target by using the rotary rectangular frame in most scenes, and the visual result achieves a satisfactory effect.

Fig. 6 shows a result of detecting a DIOR-R dataset by a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.

As shown in fig. 6, the remote sensing image object detection method based on the context information discrimination provided by the embodiment of the present invention should detect 20 kinds of objects including an airplane, an airport, a baseball field, a basketball court, a bridge, a chimney, a dam, a highway service area, a highway toll gate, a golf course, a ground runway, a harbor, an Overpass (OP), a ship, a stadium, a storage tank, a tennis court, a train station, a vehicle, and a windmill. In the DIOR-R data set, although the target categories are various, the intra-category difference is large, the background is complex, and the detection difficulty is large, the remote sensing image target detection method based on the context information distinguishing and utilizing can still finish target detection based on a rotating frame with high quality, and the visual result achieves an ideal effect.

Fig. 7 shows a result of detecting UCAS-AOD datasets according to a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.

As shown in fig. 7, the remote sensing image target detection method based on the context information discrimination and utilization provided by the embodiment of the invention is used for detecting the automobile and the airplane in different scenes, and ideal visual results are obtained.

It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. The remote sensing image target detection method based on the context information distinguishing and utilizing is characterized by comprising the following steps:

s1: acquiring an input image, and sequentially inputting the input image into a main network and a neck network for processing to obtain a multi-scale feature map;

s2: constructing a two-stage target detection network based on the context information distinguishing utilization, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplement and a detection module based on context information suppression;

the first-stage detection network adopts an RPN network;

S3: constructing an overall loss function, and training the two-stage target detection network by utilizing the overall loss function to obtain a trained two-stage target detection network;

The step S3 specifically includes the following steps:

S31: inputting the multi-scale feature map to a trained first-stage detection network for convolution operation to obtain a target suggestion region, and expanding the length and width of the target suggestion region by one time to obtain a context region;

S32: creating an overall similarity evaluation formula, performing similarity evaluation on the target suggestion region and the context region by using the overall similarity evaluation formula, and constructing a low-similarity target labeling frame and a high-similarity target labeling frame on the multi-scale feature map according to a similarity evaluation result;

the step S32 specifically includes the following steps:

s321: calculating a gray average value of the target suggested area and the context area, and calculating a brightness similarity L of the target suggested area and the context area by the following formula:

（1）；

Wherein, Suggesting a gray average value of the region for the target,/>Sigma is a minimum value for avoiding denominator of 0 as a gray average value of the context area;

s322: calculating the contrast similarity D of the target suggested region and the context region by the following formula:

（2）；

（3）；

（4）；

Wherein, Suggesting a contrast of the region for the target,/>For the contrast of the context area,/>The number of all pixels of the region is suggested for the target,/>For the number of all pixels in the context area, x is the value of the pixel point in the target suggestion area, i and j are the abscissa of the pixel point, and y is the value of the pixel point in the context area;

s323: calculating the smoothness similarity P of the target suggested region and the context region by the following formula:

（5）；

（6）；

（7）；

Wherein, Suggesting a contrast of the region for the target,/>Contrast for the context region;

s324: calculating the texture feature similarity T of the target suggestion region and the context region by the following formula:

（8）；

（9）；

wherein X is the LBP characteristic histogram of the target suggested area, Y is the LBP characteristic histogram of the context area, Chi-square distance of LBP characteristic histogram of the target suggestion region and the context region;

S325: calculating the overall similarity S of the target suggested region and the context region according to the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity by the following formula:

（10）；

S326: respectively calculating probability density distribution of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, and correspondingly obtaining a median value of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity;

S327: taking the product of the luminance similarity, the contrast similarity, the smoothness similarity and the median of the texture feature similarity as a threshold value of the overall similarity;

S328: constructing a high-similarity target labeling frame in an image area with the overall similarity higher than the threshold value, and constructing a low-similarity target labeling frame in an image area with the overall similarity lower than the threshold value;

S33: taking an image area marked by the low-similarity target marking frame on the multi-scale feature image as a low-similarity target feature image, and supplementing context information to the low-similarity target feature image by utilizing the context information supplementing-based detection module to obtain a first detection value;

The step S33 specifically includes the following steps:

S331: resampling the low-similarity target feature map to obtain target areas to be identified, and expanding the length and width of the target areas to be identified by one time to obtain context supplement areas;

s332: inputting the target region to be identified and the context supplement region into a first full-connection layer and a second full-connection layer respectively, and correspondingly obtaining a first feature vector and a second feature vector;

s333: adding the first feature vector and the second feature vector, and then processing the added first feature vector and the added second feature vector through a third full-connection layer to obtain a third feature vector;

S334: the third feature vector is respectively input into a classification full-connection layer and a regression full-connection layer to identify the category and the position of the target marking frame, and a first detection value is obtained;

S34: taking an image area marked by the high-similarity target marking frame on the multi-scale feature image as a high-similarity target feature image, and performing context information suppression on the high-similarity target feature image by using the context information suppression-based detection module to obtain a second detection value;

the step S34 specifically includes the following steps:

S342: inputting the feature map a ₁ and the feature map B ₁ to a first convolution sub-module and a second convolution sub-module to carry out convolution processing respectively, and correspondingly obtaining a feature map a ₂ and a feature map B ₂;

S343: performing up-sampling operation on the feature map A ₂ to obtain a feature map A ₃ with the same size as the feature map B ₁, adding the feature map A ₃ and the feature map B ₂, and processing by a softmax function to obtain a saliency mask map;

S345: inputting the feature map C to a classification full-connection layer and a regression full-connection layer respectively to identify the category and the position of the target marking frame, and obtaining a second detection value;

2. The method for detecting a target in a remote sensing image based on context information discrimination according to claim 1, wherein the first convolution sub-module and the second convolution sub-module are each composed of 3*3 convolution layers and 1*1 convolution layers in cascade.

3. The method for detecting a remote sensing image target based on context information discrimination utilization according to claim 1, wherein a calculation formula of the significance mask map is:

（11）；

where M is a saliency mask map, For the convolution operation of the first convolution sub-module, f ₂ is the convolution operation of the second convolution sub-module,/>For the characteristic diagram A ₁,/>For the feature map B ₁, u is the upsampling operation and σ is a softmax function.

4. The method for detecting a target of a remote sensing image based on the distinguishing and utilizing of context information according to claim 1, wherein the overall loss function is:

（12）；

（13）；

（14）；

Wherein, As the integral loss function, λ1, λ2 and λ3 are loss balance coefficients, and λ1, λ2 and λ3 are all 1,/>For the classification loss function, N is the number of positive samples,/>For the classification predictive value of the ith sample,/>Classification tag for the ith sample,/>For regression loss function,/>Equation is indicated in parentheses for Ai Fosen, and i is a positive sample,/>Above 0, the Ai Fosen brackets indicate that the equation has a value of 1, otherwise a value of 0,/>Is the position predictor of the i-th sample,/>For the position tag of the i-th sample,/>、/>、/>Loss detection in the first stage, loss detection in the second stage, loss detection,/>, respectivelyIs a significant loss.