CN115131568B - Space target segmentation method and device based on non-local attention mechanism

Info

Publication number: CN115131568B
Application number: CN202211050721.4A
Authority: CN
Inventors: 李磊; 胡玉新; 丁昊; 喻小东; 闫国刚; 高斌; 崔婷婷; 刘怡丹
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2022-08-31
Filing date: 2022-08-31
Publication date: 2022-12-27
Anticipated expiration: 2042-08-31
Also published as: CN115131568A

Abstract

The invention provides a space target segmentation method and device based on a non-local attention mechanism, relates to the technical field of computer vision, and aims to solve the technical problems that existing image segmentation algorithms based on the Non-Local attention mechanism have a large number of weight matrix parameters, a limited application range, and weight matrix parameters that are difficult to compress. The method comprises the following steps: constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage; embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network; and acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of the space target. The method lightens the computation of the attention mechanism, compresses the parameter count of the weight matrix, and broadens the applicability of the attention mechanism.

Description

Space target segmentation method and device based on non-local attention mechanism

Technical Field

The invention relates to the technical field of computer vision, in particular to the field of space target image segmentation, and more particularly relates to a space target segmentation method and device based on a non-local attention mechanism.

Background

In recent years, aerospace activity has become increasingly frequent and the number of on-orbit space targets of various countries has risen sharply, so collision early warning for space targets is of great significance for guaranteeing the on-orbit safety of China's space stations and other high-value space targets. Space situational awareness technology determines the state, attributes and intent of a non-cooperative space target by monitoring its position and motion state over long periods, and is currently the main countermeasure and precaution against space safety problems. Space-based optical observation is an important technical means of obtaining space target information and, unlike ground-based optical observation, is not limited by factors such as atmospheric interference and meteorological conditions.

The main task of space target segmentation is to segment space targets and target component information from the starry-sky background so that target information (attributes, functions and intent) can subsequently be interpreted; space target segmentation is therefore a basic key technology of space situational awareness. At present, mainstream image segmentation techniques target street scenes, automobiles, airplanes and ships, and no dedicated segmentation method for space targets yet exists in the industry. Compared with common natural images and high-resolution remote sensing images, space-based optical observation images are characterized by wide scenes, uneven illumination, blurred edges and severe overexposure. These characteristics seriously interfere with image feature extraction, so designing a dedicated space target segmentation method is a challenging and urgent problem.

Given the data characteristics of space images, an attention mechanism can be used to design a dedicated space target segmentation method. Since image segmentation algorithms based on the Non-Local attention mechanism were proposed, they have been widely applied to natural image and high-resolution remote sensing image segmentation because of their strong ability to extract correlated features. However, constrained by the principle of the Non-Local attention mechanism and the characteristics of space image data, directly applying a conventional Non-Local image segmentation algorithm to the space target image segmentation task has the following defects:

1) The weight matrix has a large number of parameters and easily occupies too much video memory, causing the GPU (Graphics Processing Unit) to run out of video memory. The weight matrix has $O(HW \times HW)$ space complexity, that is, the parameter count and the amount of computation are proportional to the square of the image size $HW$, so the storage cost during network computation is high;

2) The application range is limited. Because of the feature map size constraint, use of the Non-Local attention mechanism in the network feature extraction stage of dense prediction tasks (such as semantic segmentation) is greatly restricted, and the Non-Local structure is difficult to apply to large feature maps such as those of space targets;

3) The weight matrix parameters are difficult to compress, so lightweight network parameters are hard to achieve and the demands on server performance are high.
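To make the quadratic growth in defect 1) concrete, the following minimal sketch counts the storage needed for one dense $HW \times HW$ weight matrix. The 4-byte (float32) element size and the feature map sizes are illustrative assumptions, not values from the source.

```python
def attention_matrix_bytes(height: int, width: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one dense HW x HW attention weight matrix."""
    points = height * width              # number of feature points HW
    return points * points * bytes_per_value


# Illustrative square feature maps; float32 storage is an assumption.
for side in (16, 64, 128):
    mib = attention_matrix_bytes(side, side) / 2**20
    print(f"{side}x{side} feature map -> {mib:.2f} MiB per weight matrix")
```

Doubling the side length of the feature map multiplies the storage by 16, which is why large feature maps are hard to handle with a dense weight matrix.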

Therefore, the existing image segmentation algorithm based on the Non-Local attention mechanism has some defects and needs to be further improved.

Disclosure of Invention

In view of the above problems, the present invention provides a space target segmentation method and device based on a non-local attention mechanism.

The invention provides a space target segmentation method based on a non-local attention mechanism, which comprises the following steps: constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for performing spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features; embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network; and acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of the space target.

The invention also provides a space target segmentation device based on a non-local attention mechanism, which comprises: a structure construction module, configured to construct a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for performing spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features, and the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features; a structure embedding module, configured to embed the two-stage local attention mechanism structure into the image semantic segmentation network to obtain an updated image semantic segmentation network; and a space target segmentation module, configured to acquire a space image to be processed, input the space image to be processed into the updated image semantic segmentation network, and output a segmentation result of the space target.

The present invention also provides an electronic device, comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the non-local attention mechanism based spatial object segmentation method described above.

Compared with the prior art, the space target segmentation method and device based on the non-local attention mechanism provided by the invention at least have the following beneficial effects:

(1) A two-stage local attention mechanism structure (TLA structure) is proposed, which markedly reduces the number of feature points involved in each calculation relative to the conventional Non-Local attention mechanism structure, lightens the computation of the attention mechanism, compresses the parameter count of the weight matrix, and broadens the applicability of the attention mechanism;

(2) A 'local first, then global' feature modeling strategy is adopted: the feature correlation calculation and aggregation process of the conventional Non-Local attention mechanism is divided into the two stages of 'local feature aggregation' and 'cross-local feature aggregation', making the sampling of calculated feature points sparse;

(3) The TLA structure is embedded into the deep layers of the convolutional network to aggregate features of the deep feature map, which suppresses the adverse effects of uneven illumination, blurred edges and similar properties of space images, enhances feature separability, and improves space target segmentation accuracy.

Drawings

The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates a flow chart of a method for spatial object segmentation based on a non-local attention mechanism according to an embodiment of the present invention;

FIG. 2 schematically illustrates a schematic diagram of a TLA architecture in accordance with an embodiment of the present invention;

FIG. 3 schematically shows a flow diagram of a local feature aggregation phase according to an embodiment of the invention;

FIG. 4 schematically shows a flow diagram of a cross-local feature aggregation phase according to an embodiment of the invention;

FIG. 5 schematically illustrates a schematic diagram of a two-stage feature aggregation process according to an embodiment of the invention;

FIG. 6 schematically illustrates a schematic diagram of a spatial target segmentation method based on a non-local attention mechanism, according to an embodiment of the present invention;

FIG. 7 schematically illustrates a block diagram of a spatial target segmentation apparatus based on a non-local attention mechanism according to an embodiment of the present invention;

FIG. 8 schematically shows a block diagram of an electronic device adapted to implement the space target segmentation method based on a non-local attention mechanism according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

To address the defects of image segmentation algorithms based on the Non-Local attention mechanism, the technical scheme of the invention adopts a 'local first, then global' strategy. First, a Two-stage Local Attention mechanism structure (TLA structure for short) is proposed to replace the conventional Non-Local attention mechanism structure; the TLA structure is then embedded into an existing image segmentation network (such as segmentation networks of the Encoder-decoder series or the DeepLab series), yielding a space target segmentation method based on a lightweight Non-Local attention mechanism. The method adapts to the data characteristics of space images, makes the attention computation lightweight, compresses the weight matrix parameters, and can be dedicated to space target segmentation tasks.

Fig. 1 schematically shows a flow chart of a spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the present invention.

As shown in FIG. 1, the method for segmenting a spatial object based on a non-local attention mechanism according to the embodiment may include operations S110 to S130.

In operation S110, a two-stage local attention mechanism structure including a local feature aggregation stage and a cross-local feature aggregation stage is constructed, where the local feature aggregation stage is configured to perform spatial region grouping on a complete feature map embedded in a feature space, and extract local correlation features; and the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features.

In operation S120, the two-stage local attention mechanism structure is embedded into the image semantic segmentation network to obtain an updated image semantic segmentation network.

In operation S130, a spatial image to be processed is acquired, the spatial image to be processed is input into the updated image semantic segmentation network, and a segmentation result of the spatial object is output.

In the embodiment of the invention, the TLA structure adopts a 'local first, then global' feature modeling strategy: the feature correlation calculation and aggregation process of the conventional Non-Local attention mechanism structure is divided into a two-stage feature aggregation process of 'local feature aggregation' and 'cross-local feature aggregation'. This markedly reduces the number of feature points involved in each calculation on the feature map, makes the sampling of calculated feature points sparse, and makes the Non-Local attention computation lightweight and efficient. The detailed principle of the TLA structure is as follows.

Figure 2 schematically illustrates a schematic diagram of a TLA architecture according to an embodiment of the present invention.

As shown in FIG. 2, compared with the conventional Non-Local attention mechanism structure, the core of the first stage of the TLA structure, namely the local feature aggregation stage, is that before the attention weight matrix is calculated, the complete feature maps embedded in the feature space are first grouped into $N \times N$ regions along the spatial dimension.

Then, feature correlation is calculated separately within each of the $N^2$ local feature groups, using only features from the same local space, to generate $N^2$ local weight matrices $V_{Li}(X)$; this completes the modeling of the correlation between feature points within each of the $N^2$ local region spaces.

Finally, within each local region group, the corresponding weight matrix is used to reconstruct the original feature map by matrix multiplication, generating a new local reconstruction feature map $O_L(X)$.

Fig. 3 schematically shows a flow chart of a local feature aggregation phase according to an embodiment of the invention.

Referring to fig. 3 in conjunction with fig. 2, in the embodiment of the present invention, the local feature aggregation stage in operation S110 may be specifically performed according to the following operations S1111 to S1114.

In operation S1111, convolutional layers are used to perform feature mapping on the input feature map $X$ separately, to obtain the complete feature maps embedded in the feature space.

In this operation, the convolutional layers are 1 × 1 convolutional layers. The complete feature maps obtained by embedding the input feature map into the feature space are calculated according to the following formula:

In the formula, $H$, $W$ and $C$ respectively denote the height, width and number of feature channels of the input feature $X$; the remaining symbols respectively denote the parameters of the 1 × 1 convolutional layers, the number of channels of the complete feature maps after feature mapping, and a matrix over the real number field with the indicated numbers of rows and columns.
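As a minimal sketch of the feature mapping in operation S1111: a 1 × 1 convolution without bias is a per-pixel linear map over channels. The names `theta`, `phi`, `g`, `W_theta`, `W_phi`, `W_g` and `C_hat`, the use of three embeddings, and all sizes are assumptions borrowed from common Non-Local notation; the source's own symbols are not reproduced in this text.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Apply a bias-free 1x1 convolution to an (H, W, C) feature map.

    weight has shape (C, C_out); each pixel is mapped independently.
    """
    return x @ weight


rng = np.random.default_rng(0)
H, W, C, C_hat = 16, 16, 8, 4            # illustrative sizes, not from the source
X = rng.standard_normal((H, W, C))

# Hypothetical parameter names for the three 1x1 convolutional layers.
W_theta = rng.standard_normal((C, C_hat))
W_phi = rng.standard_normal((C, C_hat))
W_g = rng.standard_normal((C, C_hat))

theta, phi, g = conv1x1(X, W_theta), conv1x1(X, W_phi), conv1x1(X, W_g)
print(theta.shape, phi.shape, g.shape)
```

Each embedded map keeps the spatial size of the input and changes only the channel count.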

In operation S1112, the complete feature maps are divided into $N \times N$ local feature groups along the spatial dimension, each local feature group comprising local sub-features, where $N$ is the adjustable local grouping number and $1 \le i \le N^2$.

In this operation, $N$ is a hyperparameter representing the adjustable number of local groups. For each complete feature map, $N^2$ local sub-features are generated, as shown in the following formula:
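The spatial grouping of operation S1112 can be sketched as a reshape. This is an illustration only: it assumes the groups are contiguous, equally sized blocks and that $H$ and $W$ are divisible by $N$; the function name `split_into_groups` is hypothetical.

```python
import numpy as np


def split_into_groups(fmap: np.ndarray, n: int) -> np.ndarray:
    """Split an (H, W, C) map into n*n contiguous spatial blocks.

    Returns shape (n*n, (H//n)*(W//n), C): one row of flattened
    feature points per local feature group, in row-major block order.
    """
    h, w, c = fmap.shape
    assert h % n == 0 and w % n == 0, "H and W must be divisible by N"
    bh, bw = h // n, w // n
    blocks = fmap.reshape(n, bh, n, bw, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(n * n, bh * bw, c)


fmap = np.arange(16 * 16).reshape(16, 16, 1)   # toy single-channel map
groups = split_into_groups(fmap, 4)
print(groups.shape)
```

With $N = 4$ on a 16 × 16 map this yields 16 groups of 16 feature points each.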

In operation S1113, for any $i$-th local feature group, local similarity calculation is performed on two of its local sub-features to obtain the $i$-th local weight matrix $V_{Li}(X)$, and the $i$-th local weight matrix $V_{Li}(X)$ is feature-aggregated with the other local sub-feature to obtain the $i$-th local reconstruction feature map $O_{Li}(X)$.

Specifically, for any $i$-th local feature group, the following matrix multiplication is performed to calculate the $i$-th local weight matrix $V_{Li}(X)$:

In the formula, $(\cdot)^{T}$ denotes the matrix transpose.

Then, $V_{Li}(X)$ and the other local sub-feature are subjected to the following matrix multiplication to calculate the $i$-th local reconstruction feature map $O_{Li}(X)$:

In operation S1114, the local reconstruction feature maps of the $N^2$ local feature groups are combined to generate the local reconstruction feature map $O_L(X)$ of the input feature map $X$.

In this operation, operation S1113 is repeated for each of the $N^2$ local feature groups to calculate the local reconstruction feature maps of the other local feature groups, which are then combined to form the local reconstruction feature map $O_L(X)$ of the input feature map $X$, as shown in the following formula:
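The per-group computation of operations S1113 and S1114 can be sketched as batched matrix products. The names `theta`, `phi` and `g` for the three grouped embeddings are assumed notation, and the sketch applies no normalization (such as softmax) because the text describes only matrix multiplications.

```python
import numpy as np


def local_aggregation(theta: np.ndarray, phi: np.ndarray, g: np.ndarray):
    """theta, phi, g: (G, P, C) arrays of G local groups with P points each.

    Returns the per-group weight matrices, shape (G, P, P), and the
    per-group reconstructed features, shape (G, P, C).
    """
    v_local = theta @ phi.transpose(0, 2, 1)   # similarity within each group
    o_local = v_local @ g                      # aggregation within each group
    return v_local, o_local


rng = np.random.default_rng(0)
G, P, C = 16, 16, 4        # e.g. N = 4 on a 16 x 16 map: 16 groups of 16 points
theta, phi, g = rng.standard_normal((3, G, P, C))
V_L, O_L = local_aggregation(theta, phi, g)
print(V_L.shape, O_L.shape, V_L.size)
```

Stacking the $N^2$ outputs back into their spatial positions gives the local reconstruction feature map; the total number of weight entries is `V_L.size`, far fewer than the $(HW)^2$ entries of a dense weight matrix.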

In the above calculation process, the total parameter size of all local weight matrices $V_L(X)$ is:

$$N^2 \times \left(\frac{HW}{N^2} \times \frac{HW}{N^2}\right) = \frac{(HW)^2}{N^2}$$

Through the embodiment of the invention, the spatial local grouping strategy of the local feature aggregation stage effectively reduces the computational complexity and the size of the weight matrix. However, this strategy also leaves the feature points 'densely connected within each local region but unconnected between local regions', so feature aggregation can only occur inside a local region, which is very unfavorable for extracting global attention features.

Specifically, the local feature aggregation stage densely models the correlation between feature points within each local region at low spatial and computational complexity, but loses the ability to model correlations between feature points across local regions. The next stage, cross-local feature aggregation, mainly addresses this missing cross-region correlation modeling, so that the local correlation features extracted in the local feature aggregation stage are extended to the whole feature map to extract global correlation features.

Therefore, the embodiment of the invention also introduces a cross-local feature aggregation stage, which takes the local reconstruction feature map $O_L(X)$ of the local feature aggregation stage as its input feature map. The specific calculation process is described in detail below.

Fig. 4 schematically shows a flow diagram of a cross-local feature aggregation phase according to an embodiment of the invention. FIG. 5 schematically illustrates a schematic diagram of a two-stage feature aggregation process according to an embodiment of the invention.

Referring to fig. 4 in conjunction with fig. 2 and fig. 5, in the embodiment of the present invention, the cross-local feature aggregation stage in the operation S110 may be specifically performed according to the following operations S1121 to S1125.

In operation S1121, convolutional layers are used to perform feature mapping on the local reconstruction feature map $O_L(X)$ separately, to obtain the embedded feature maps of the feature space.

In this operation, the convolutional layers are also 1 × 1 convolutional layers. The embedded feature maps obtained by embedding the local reconstruction feature map into the feature space are calculated according to the following formula:

In the formula, the symbols each denote the parameters of a 1 × 1 convolutional layer.

In operation S1122, the embedded feature maps are divided into $N \times N$ sub-regions along the spatial dimension, each sub-region having feature points indexed by $j$, where $1 \le j \le HW/N^2$, and $H$ and $W$ respectively denote the height and width of the input feature $X$.

In this operation, the spatial regions are grouped again to generate $N^2$ sub-regions, each of which then has $HW/N^2$ feature points, as shown in the following formula:

In operation S1123, any $j$-th feature point in the 1st sub-region is taken as a reference point $R_{pj}$, the feature points in the remaining $N^2-1$ sub-regions whose relative positions are consistent with that of the reference point $R_{pj}$ are collectively called anchor points $A_{pj}$, and the reference point $R_{pj}$ and the corresponding $N^2-1$ anchor points $A_{pj}$ together form the $j$-th group of cross-region feature sets; the embedded feature maps contain $HW/N^2$ groups of cross-region feature sets.

In this operation, any $j$-th feature point in the 1st sub-region (i.e. the upper-left sub-region in FIG. 5) is taken as a reference point $R_{pj}$ (Reference Point), and the feature points in the remaining $N^2-1$ sub-regions whose relative positions are consistent with that of the reference point $R_{pj}$ are collectively called anchor points $A_{pj}$ (Anchor Point), as shown in the following formula:

In the formula, the term indexed by $k$ denotes the $k$-th anchor point corresponding to the reference point $R_{pj}$, with $k = 2, 3, \ldots, N^2$.

With continued reference to FIG. 5, since each remaining sub-region has one anchor point corresponding to the reference point, any reference point $R_{pj}$ has $N^2-1$ anchor points $A_{pj}$. The embodiment of the invention forms the $j$-th group of cross-region feature sets from the reference point $R_{pj}$ and its corresponding $N^2-1$ anchor points $A_{pj}$, so each set contains $N^2$ feature points. Each sub-region of the embedded feature maps has $HW/N^2$ feature points, so there are $HW/N^2$ groups of cross-region feature sets.
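The reference-point and anchor-point grouping can be made visible on a toy example in which each 'feature' is simply its own (row, column) coordinate. This is a sketch under the assumptions of contiguous, equally sized sub-regions and row-major ordering; the variable names are illustrative.

```python
import numpy as np

H = W = 4
N = 2
bh, bw = H // N, W // N

# Each "feature" is its (row, col) coordinate, so the grouping can be read off.
coords = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1)

# Sub-regions: shape (N*N, P, 2) with P = HW / N^2 points per sub-region.
regions = (coords.reshape(N, bh, N, bw, 2)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(N * N, bh * bw, 2))

# Cross-region sets: shape (P, N*N, 2). Row 0 of set j is the reference point
# (from the upper-left sub-region); rows 1.. are its N*N - 1 anchor points.
sets = regions.transpose(1, 0, 2)
reference, anchors = sets[:, 0], sets[:, 1:]
print(sets[3].tolist())
```

For the last set, the reference point is (1, 1) and its anchors are (1, 3), (3, 1) and (3, 3): the same relative position in each of the other three sub-regions.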

In operation S1124, for any $j$-th group of cross-region feature sets, local similarity calculation is performed on two of its feature points to obtain the $j$-th group cross-region weight matrix $V_{Gj}(X)$, and the $j$-th group cross-region weight matrix $V_{Gj}(X)$ is feature-aggregated with the other feature point to obtain the $j$-th group cross-region reconstruction feature map $O_{Gj}(X)$.

Specifically, for any $j$-th group of cross-region feature sets, the following matrix multiplication is performed to calculate the $j$-th group cross-region weight matrix $V_{Gj}(X)$:

Then, $V_{Gj}(X)$ and the other feature point are subjected to the following matrix multiplication to calculate the $j$-th group cross-region reconstruction feature map $O_{Gj}(X)$:

In operation S1125, the cross-region reconstruction feature maps of the $HW/N^2$ groups of cross-region feature sets are combined to generate the global reconstruction feature map $O_G(X)$ of the local reconstruction feature map $O_L(X)$.

In this operation, operation S1124 is repeated for each of the $HW/N^2$ groups of cross-region feature sets to calculate the cross-region reconstruction feature maps of the other groups, which are finally combined to generate the global reconstruction feature map $O_G(X)$ of the local reconstruction feature map $O_L(X)$, as shown in the following formula:
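Operations S1124 and S1125 can be sketched in the same batched style, now attending across sub-regions at a fixed relative position. The names `theta`, `phi` and `g` for the three embeddings of $O_L(X)$ are assumed notation, the sizes are illustrative, and no normalization is applied because the text describes only matrix multiplications.

```python
import numpy as np


def cross_local_aggregation(theta: np.ndarray, phi: np.ndarray, g: np.ndarray):
    """theta, phi, g: (G, P, C) embeddings, G sub-regions of P points each.

    Returns cross-region weight matrices, shape (P, G, G), and the
    reconstructed features laid out again as (G, P, C).
    """
    # Regroup so that axis 0 indexes the P cross-region sets of G points each.
    th, ph, gg = (a.transpose(1, 0, 2) for a in (theta, phi, g))
    v_cross = th @ ph.transpose(0, 2, 1)       # one (G, G) matrix per set
    o_cross = v_cross @ gg                     # aggregation within each set
    return v_cross, o_cross.transpose(1, 0, 2)


rng = np.random.default_rng(0)
G, P, C = 64, 4, 4         # e.g. N = 8 on a 16 x 16 map: 64 sub-regions of 4 points
theta, phi, g = rng.standard_normal((3, G, P, C))
V_G, O_G = cross_local_aggregation(theta, phi, g)
print(V_G.shape, O_G.shape, V_G.size)
```

Here each of the $HW/N^2$ sets produces an $N^2 \times N^2$ weight matrix, so the stage stores $HW \cdot N^2$ weight entries in total.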

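In the above calculation process, the total parameter size of all cross-region weight matrices $V_G(X)$ is:

$$\frac{HW}{N^2} \times \left(N^2 \times N^2\right) = HW \cdot N^2$$

Through the embodiment of the present invention, the total weight matrix parameter size of the two stages $V_L(X)$ and $V_G(X)$ in the TLA structure is:

$$\frac{(HW)^2}{N^2} + HW \cdot N^2$$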

With continued reference to FIG. 5, the embodiment of the present invention takes any reference point $R_{pj}$ in the figure as an example to show how the TLA structure establishes the correlation between $R_{pj}$ and arbitrary feature points in different sub-regions. The features of the $j$-th reference point $R_{pj}$ and of the corresponding $N^2-1$ anchor points $A_{pj}$ together form the $j$-th group of cross-region feature sets.

In the local feature aggregation stage, correlations are established within each local feature group between any feature point and the anchor point $A_{pj}$, and between any feature point and the reference point $R_{pj}$. In the cross-local feature aggregation stage, correlations are established between the reference point $R_{pj}$ and the anchor points $A_{pj}$ of the other sub-regions. The anchor point features act as relays: the 'correlation between anchor point $A_{pj}$ and a feature point' and the 'correlation between anchor point $A_{pj}$ and reference point $R_{pj}$' are converted into the correlation between the reference point $R_{pj}$ and any feature point, thereby modeling the correlation between any two feature points in the feature map.
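The relay argument can be checked numerically by treating each stage as a connectivity mask and composing the two. This is a sketch under the assumption of contiguous, equally sized sub-regions; the map size and $N$ are illustrative.

```python
import numpy as np

H = W = 8
N = 2
bh, bw = H // N, W // N

rows, cols = np.divmod(np.arange(H * W), W)
region = (rows // bh) * N + (cols // bw)     # sub-region index of each point
offset = (rows % bh) * bw + (cols % bw)      # relative position inside it

# Stage 1 connects points in the same sub-region;
# stage 2 connects points at the same relative position across sub-regions.
local_mask = region[:, None] == region[None, :]
cross_mask = offset[:, None] == offset[None, :]

# Point a reaches point b if some relay k shares a's relative position
# (stage 2) and b's sub-region (stage 1).
two_stage = (cross_mask.astype(int) @ local_mask.astype(int)) > 0
print(local_mask.mean(), cross_mask.mean(), two_stage.all())
```

Each stage alone is sparse, yet their composition connects every pair of feature points, which is the sense in which anchor points act as relays.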

The weight matrix parameter size of the conventional Non-Local attention mechanism is $O(HW \times HW)$, so the ratio of the weight matrix parameter size of the TLA structure to that of the conventional attention mechanism is:

$$\frac{\dfrac{(HW)^2}{N^2} + HW \cdot N^2}{(HW)^2} = \frac{1}{N^2} + \frac{N^2}{HW}$$

In practice, the hyperparameter $N$ is generally chosen as $N = 4$ to $8$, and the formula gives the following results.

When the input feature map size is $H = W = 16$ and $N = 4$:

$$\frac{1}{16} + \frac{16}{256} = \frac{1}{8}$$

When the input feature map size is $H = W = 16$ and $N = 8$:

$$\frac{1}{64} + \frac{64}{256} = \frac{17}{64} \approx \frac{1}{4}$$

Therefore, the weight matrix parameters of the TLA structure amount to only about 1/8 to 1/4 of the original parameter count, so the number of weight matrix parameters is markedly reduced, making both the computation and the parameter count of the Non-Local attention mechanism lightweight.
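The two worked cases can be reproduced by counting weight entries directly: $N^2$ local matrices over $HW/N^2$ points each, plus $HW/N^2$ cross-region matrices over $N^2$ points each, compared with one dense $HW \times HW$ matrix. The function name is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def tla_to_nonlocal_ratio(height: int, width: int, n: int) -> Fraction:
    """Ratio of TLA weight-matrix entries to the dense HW x HW matrix."""
    hw = height * width
    points_per_group = hw // (n * n)
    local = n * n * points_per_group ** 2        # N^2 matrices of (HW/N^2)^2 entries
    cross = points_per_group * (n * n) ** 2      # HW/N^2 matrices of (N^2)^2 entries
    return Fraction(local + cross, hw * hw)


for n in (4, 8):
    ratio = tla_to_nonlocal_ratio(16, 16, n)
    print(f"N={n}: ratio = {ratio} = {float(ratio):.4f}")
```

For a 16 × 16 feature map this gives 1/8 at $N = 4$ and 17/64 (roughly 1/4) at $N = 8$.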

With the embodiment of the invention, the computational complexity is lower than that of the conventional Non-Local attention mechanism. With network performance held consistent, the parameter count and the amount of parameter computation are greatly reduced, which lowers GPU video memory usage during network training; depending on the parameter settings, the number of weight matrix parameters is compressed by a factor of several to several tens.

Compared with the conventional Non-Local attention mechanism, the weight matrix has fewer parameters and occupies less storage. The embodiment of the invention greatly improves the adaptability of the Non-Local structure to large feature maps, and can serve as a dedicated segmentation method for wide-scene space target images.

FIG. 6 schematically illustrates a schematic diagram of a spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the present invention.

As shown in FIG. 6, in the embodiment of the present invention, the image semantic segmentation network in operation S120 includes a fully convolutional neural network with ResNet-50 as the backbone network; the fully convolutional neural network may be, for example, a convolutional neural network of the Encoder-decoder series or the DeepLab series.

Therefore, the TLA structure is embedded into the existing image semantic segmentation network to obtain the updated image semantic segmentation network.

Further, in operation S120, embedding the two-stage local attention mechanism structure into the image semantic segmentation network specifically includes: embedding the two-stage local attention mechanism structure between the last convolutional layer of the fully convolutional neural network and the classifier to obtain the global correlation features. In this way, the lightweight TLA structure performs feature aggregation on the deep features of the space target to obtain global correlation features, suppressing the adverse effects of uneven illumination, edge blurring and similar properties of the space image on space target segmentation.

Further, as shown in FIG. 6, the output of the classifier is then upsampled by a factor of 8 to obtain an output image of the same size as the space image to be processed.
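The final 8× upsampling step can be sketched as follows. The source does not state the interpolation method; nearest-neighbour repetition is used here purely for brevity and is an assumption, as are the 64 × 64 score map, the two classes and the function name.

```python
import numpy as np


def upsample_nearest(score_map: np.ndarray, factor: int = 8) -> np.ndarray:
    """Upsample an (h, w, classes) score map by an integer factor."""
    return score_map.repeat(factor, axis=0).repeat(factor, axis=1)


rng = np.random.default_rng(0)
scores = rng.standard_normal((64, 64, 2))    # hypothetical classifier output
full = upsample_nearest(scores, 8)           # back to the input resolution
labels = full.argmax(axis=-1)                # per-pixel class index
print(full.shape, labels.shape)
```

A 64 × 64 score map becomes a 512 × 512 label map, matching an input image eight times larger in each dimension.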

Based on the method disclosed above, the present invention further provides a spatial object segmentation apparatus based on a non-local attention mechanism, which will be described in detail below with reference to fig. 7.

Fig. 7 schematically shows a block diagram of a spatial object segmentation apparatus based on a non-local attention mechanism according to an embodiment of the present invention.

As shown in fig. 7, the non-local attention mechanism-based spatial object segmentation apparatus 700 according to this embodiment includes a structure construction module 710, a structure embedding module 720, and a spatial object segmentation module 730.

The structure construction module 710 is configured to construct a two-stage local attention mechanism structure including a local feature aggregation stage and a cross-local feature aggregation stage, where the local feature aggregation stage is configured to perform spatial region grouping on a complete feature map embedded in a feature space, and extract local correlation features; and the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features.

The structure embedding module 720 is configured to embed the two-stage local attention mechanism structure into the image semantic segmentation network to obtain an updated image semantic segmentation network.

The space target segmentation module 730 is configured to acquire a space image to be processed, input the space image to be processed into the updated image semantic segmentation network, and output a segmentation result of the space target.

It should be noted that the embodiment of the apparatus portion is similar to the embodiment of the method portion, and the achieved technical effects are also similar, for details, please refer to the method embodiment, which is not described herein again.

According to an embodiment of the present invention, any two or more of the structure construction module 710, the structure embedding module 720 and the space target segmentation module 730 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present invention, at least one of the structure construction module 710, the structure embedding module 720 and the space target segmentation module 730 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application specific integrated circuit (ASIC); may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit; or may be implemented in any one of software, hardware and firmware, or in any suitable combination of them. Alternatively, at least one of the structure construction module 710, the structure embedding module 720 and the space target segmentation module 730 may be implemented at least partially as a computer program module which, when executed, performs the corresponding function.

As shown in Fig. 8, an electronic device 800 according to an embodiment of the present invention includes a processor 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or a related chipset, and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)). The processor 801 may also include on-board memory for caching purposes. The processor 801 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.

The RAM 803 stores various programs and data necessary for the operation of the electronic device 800. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.

According to an embodiment of the present invention, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 810 as necessary, so that a computer program read from it can be installed into the storage section 808 as necessary.

Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of the present invention may be implemented in hardware and/or in software (including firmware, microcode, etc.). Furthermore, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. Further, the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

The above embodiments further illustrate the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit it; any modifications, equivalent replacements, improvements and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A space target segmentation method based on a non-local attention mechanism, characterized by comprising the following steps:

constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for carrying out spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding the local correlation features and extracting global correlation features;

embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network;

acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of a space target;

wherein the local feature aggregation stage is performed in the following manner:

performing feature mapping on the input feature map X using convolutional layers, respectively, to obtain complete feature maps embedded in a feature space;

dividing the complete feature maps in the spatial dimension into N×N local feature groups, each local feature group comprising local sub-features, wherein N is the adjustable number of local groups and 1 ≤ i ≤ N²;

for any i-th local feature group, performing local similarity calculation on two of its local sub-features to obtain an i-th local weight matrix V_Li(X), and performing feature aggregation on the i-th local weight matrix V_Li(X) and the other local sub-feature to obtain an i-th local reconstruction feature map O_Li(X);

combining the local reconstruction feature maps of the N² local feature groups to generate a local reconstruction feature map O_L(X) of the input feature map X;

the i-th local weight matrix V_Li(X) and the i-th local reconstruction feature map O_Li(X) being calculated according to the following formula:

[formula not reproduced in the source text]

in the formula, (·)^T denotes the transpose of a matrix; H and W denote the height and width of the input feature map X, respectively; [symbol not reproduced] denotes the number of channels of the complete feature maps embedded in the feature space; and [symbol not reproduced] denotes a matrix over the real number domain with the indicated numbers of rows and columns.
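The local feature aggregation stage recited above can be sketched in NumPy as follows. Because the formula itself is not reproduced in this text, the sketch assumes the commonly used non-local form: three 1×1 convolutions (written here as per-pixel matrix products with hypothetical weights `w_theta`, `w_phi` and `w_g`), a softmax-normalised dot-product similarity for V_Li(X), and a weighted sum for O_Li(X). It also assumes that H and W are divisible by N. It is an illustration under those assumptions, not the claimed formula.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_aggregation(x, w_theta, w_phi, w_g, n):
    """x: (H, W, C) feature map; w_*: (C, C_emb) assumed 1x1-conv weights; n: groups per side."""
    h, w, _ = x.shape
    assert h % n == 0 and w % n == 0, "this sketch needs H and W divisible by N"
    # Complete feature maps embedded in the feature space (assumed three embeddings).
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    bh, bw = h // n, w // n
    out = np.empty_like(g)
    for r in range(n):
        for c in range(n):
            rows = slice(r * bh, (r + 1) * bh)
            cols = slice(c * bw, (c + 1) * bw)
            t = theta[rows, cols].reshape(bh * bw, -1)
            p = phi[rows, cols].reshape(bh * bw, -1)
            v = g[rows, cols].reshape(bh * bw, -1)
            weights = softmax(t @ p.T)  # V_Li(X), shape (HW/N^2, HW/N^2)
            out[rows, cols] = (weights @ v).reshape(bh, bw, -1)  # O_Li(X)
    return out  # O_L(X): the N^2 local reconstructions merged back in place
```

Each output position depends only on the positions of its own local group, which is why a second, cross-local stage is needed to recover global correlation.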

2. The space target segmentation method based on a non-local attention mechanism according to claim 1, wherein the cross-local feature aggregation stage is performed in the following manner:

performing feature mapping on the local reconstruction feature map O_L(X) using convolutional layers, respectively, to obtain embedded feature maps in a feature space;

dividing the embedded feature maps in the spatial dimension into N×N sub-regions, each sub-region having feature points indexed by j, wherein 1 ≤ j ≤ HW/N², and H and W denote the height and width of the input feature map X, respectively;

taking any j-th feature point in the 1st sub-region as a reference point R_pj, and collectively referring to the feature points in the remaining N²-1 sub-regions whose relative positions are consistent with that of the reference point R_pj as anchor points A_pj, the reference point R_pj and the corresponding N²-1 anchor points A_pj together forming a j-th cross-region feature set, so that the embedded feature maps contain HW/N² cross-region feature sets;

for any j-th cross-region feature set, performing local similarity calculation on two of the feature points therein to obtain a j-th cross-region weight matrix V_Gj(X), and performing feature aggregation on the j-th cross-region weight matrix V_Gj(X) and another feature point to obtain a j-th cross-region reconstruction feature map O_Gj(X);

combining the cross-region reconstruction feature maps of the HW/N² cross-region feature sets to generate a global reconstruction feature map O_G(X) of the local reconstruction feature map O_L(X).
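The grouping of reference points and anchor points recited above can be sketched in NumPy as follows. As in the local stage, the formula is not reproduced in this text, so the sketch assumes three per-pixel embeddings with hypothetical weights `w_theta`, `w_phi` and `w_g`, and a softmax-normalised dot-product similarity for V_Gj(X). The helper names `to_cross_sets` and `from_cross_sets` are illustrative, and H and W are assumed to be divisible by N.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def to_cross_sets(f, n):
    """(H, W, C) -> (HW/N^2, N^2, C): row j holds the reference point and its N^2-1 anchors."""
    h, w, c = f.shape
    bh, bw = h // n, w // n
    return f.reshape(n, bh, n, bw, c).transpose(1, 3, 0, 2, 4).reshape(bh * bw, n * n, c)

def from_cross_sets(s, n, h, w):
    """Inverse of to_cross_sets: put every feature point back at its spatial position."""
    bh, bw = h // n, w // n
    return s.reshape(bh, bw, n, n, -1).transpose(2, 0, 3, 1, 4).reshape(h, w, -1)

def cross_local_aggregation(o_l, w_theta, w_phi, w_g, n):
    """o_l: (H, W, C) local reconstruction O_L(X); w_*: (C, C_emb) assumed 1x1-conv weights."""
    h, w, _ = o_l.shape
    assert h % n == 0 and w % n == 0, "this sketch needs H and W divisible by N"
    theta, phi, g = (to_cross_sets(o_l @ m, n) for m in (w_theta, w_phi, w_g))
    weights = softmax(theta @ phi.transpose(0, 2, 1))  # V_Gj(X) for all j: (HW/N^2, N^2, N^2)
    return from_cross_sets(weights @ g, n, h, w)       # O_G(X)
```

Batching over j replaces an explicit loop over the HW/N² cross-region feature sets; each set attends only over its own N² points.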

3. The space target segmentation method based on a non-local attention mechanism according to claim 2, wherein the j-th cross-region weight matrix V_Gj(X) and the j-th cross-region reconstruction feature map O_Gj(X) are calculated according to the following formula:

[formula not reproduced in the source text]

in the formula, (·)^T denotes the transpose of a matrix; [symbol not reproduced] denotes the number of channels of the complete feature maps embedded in the feature space; and [symbol not reproduced] denotes a matrix over the real number domain with the indicated numbers of rows and columns.
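The effect of the grouping on the size of the weight matrices can be checked with simple arithmetic. The sketch below assumes that each weight matrix is a dense pairwise similarity matrix over the feature points of its group, so that a local matrix V_Li(X) has (HW/N²)×(HW/N²) entries and a cross-region matrix V_Gj(X) has N²×N² entries. The function name `weight_elements` and the 64×64, N = 8 example are illustrative.

```python
def weight_elements(h: int, w: int, n: int) -> dict:
    """Count weight-matrix entries for one H x W feature map (dense matrices assumed)."""
    assert h % n == 0 and w % n == 0, "this sketch needs H and W divisible by N"
    hw = h * w
    per_group = hw // (n * n)  # feature points in each local group
    return {
        "non_local": hw * hw,                          # one HW x HW matrix
        "local": n * n * per_group * per_group,        # N^2 matrices of (HW/N^2)^2 entries
        "cross_local": per_group * (n * n) * (n * n),  # HW/N^2 matrices of (N^2)^2 entries
    }

print(weight_elements(64, 64, 8))
# {'non_local': 16777216, 'local': 262144, 'cross_local': 262144}
```

For a 64×64 feature map with N = 8, the two stages together hold 524,288 weight entries, 32 times fewer than the single 4096×4096 matrix of an ungrouped non-local block.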

4. The space target segmentation method based on a non-local attention mechanism according to claim 1, wherein the image semantic segmentation network comprises a fully convolutional neural network with ResNet-50 as the backbone network.

5. The space target segmentation method based on a non-local attention mechanism according to claim 4, wherein embedding the two-stage local attention mechanism structure into the image semantic segmentation network specifically comprises:

embedding the two-stage local attention mechanism structure between the last convolutional layer of the fully convolutional neural network and a classifier to obtain the global correlation feature.
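The placement recited above amounts to a three-stage composition, which can be sketched as follows. The names `embed_attention`, `Stage` and `updated_network` are hypothetical, and the three stand-in stages only record the order of execution; no real backbone, attention structure or classifier is modelled here.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]

def embed_attention(backbone: Stage, attention: Stage, classifier: Stage) -> Stage:
    """Return an updated network that runs backbone -> attention -> classifier."""
    def updated_network(image: Any) -> Any:
        features = backbone(image)      # output of the last convolutional layer
        attended = attention(features)  # two-stage structure yields the global correlation feature
        return classifier(attended)
    return updated_network

# Hypothetical stand-ins that append their own name to a trace list.
net = embed_attention(lambda t: t + ["backbone"],
                      lambda t: t + ["attention"],
                      lambda t: t + ["classifier"])
print(net([]))  # ['backbone', 'attention', 'classifier']
```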

6. The space target segmentation method based on a non-local attention mechanism according to claim 5, wherein the classifier obtains an output image of the same size as the space image to be processed through 8× upsampling.

7. A space target segmentation apparatus based on a non-local attention mechanism, comprising:

the structure construction module is used for constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for carrying out space region grouping on a complete feature map embedded into a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding the local correlation features and extracting global correlation features;

the structure embedding module is used for embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network;

the space target segmentation module is used for acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network and outputting a segmentation result of a space target;

wherein the local feature aggregation stage is performed in the following manner:

performing feature mapping on the input feature map X using convolutional layers, respectively, to obtain complete feature maps embedded in a feature space;

dividing the complete feature maps in the spatial dimension into N×N local feature groups, each local feature group comprising local sub-features, wherein N is the adjustable number of local groups and 1 ≤ i ≤ N²;

for any i-th local feature group, performing local similarity calculation on two of its local sub-features to obtain an i-th local weight matrix V_Li(X), and performing feature aggregation on the i-th local weight matrix V_Li(X) and the other local sub-feature to obtain an i-th local reconstruction feature map O_Li(X);

the i-th local weight matrix V_Li(X) and the i-th local reconstruction feature map O_Li(X) being calculated according to the following formula:

[formula not reproduced in the source text]

in the formula, (·)^T denotes the transpose of a matrix; H and W denote the height and width of the input feature map X, respectively; [symbol not reproduced] denotes the number of channels of the complete feature maps embedded in the feature space; and [symbol not reproduced] denotes a matrix over the real number domain with the indicated numbers of rows and columns.

8. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1 to 6.