CN115131568B - Space target segmentation method and device based on non-local attention mechanism

Publication number
CN115131568B
Authority
CN
China
Prior art keywords
local
feature
attention mechanism
space
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211050721.4A
Other languages
Chinese (zh)
Other versions
CN115131568A (en
Inventor
李磊
胡玉新
丁昊
喻小东
闫国刚
高斌
崔婷婷
刘怡丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202211050721.4A priority Critical patent/CN115131568B/en
Publication of CN115131568A publication Critical patent/CN115131568A/en
Application granted granted Critical
Publication of CN115131568B publication Critical patent/CN115131568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a space target segmentation method and device based on a Non-Local attention mechanism, relates to the technical field of computer vision, and aims to solve the technical problems that existing image segmentation algorithms based on the Non-Local attention mechanism have a large number of weight matrix parameters, a limited application range, and weight matrix parameters that are difficult to compress. The method comprises the following steps: constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage; embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network; and acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of the space target. The method can reduce the amount of computation of the attention mechanism, compress the number of weight matrix parameters, and broaden the applicability of the attention mechanism.

Description

Space target segmentation method and device based on non-local attention mechanism
Technical Field
The invention relates to the technical field of computer vision, in particular to the field of space target image segmentation, and more particularly relates to a space target segmentation method and device based on a non-local attention mechanism.
Background
In recent years, human aerospace exploration activities are increasingly frequent, the number of targets in orbit space of various countries is increased sharply, and collision early warning on the space targets has important significance for guaranteeing the on-orbit safety of space stations and various high-value space targets in China. The spatial situation awareness technology is used for determining the state, the attribute and the intention of a non-cooperative space target by monitoring the position and motion state information of the non-cooperative space target for a long time, and is a main countermeasure and precaution means for the space safety problem at present. At present, space-based optical observation is an important technical means for obtaining space target information, and compared with ground-based optical observation, the space-based optical observation is not limited by factors such as atmospheric interference and meteorological conditions.
The main task of space target segmentation is to segment space targets and their components from the starry-sky background so that target information (attributes, functions and intentions) can subsequently be interpreted; space target segmentation is therefore a basic key technology of space situational awareness. At present, mainstream image segmentation techniques target street scenes, automobiles, airplanes and ships, and a segmentation method dedicated to space targets is still lacking in the industry. Compared with common natural images and high-resolution remote sensing images, space-based optical observation images are characterized by wide scenes, uneven illumination, blurred edges and severe overexposure. These characteristics seriously interfere with image feature extraction, so designing a dedicated space target segmentation method is a challenging and urgent problem.
Given these data characteristics of space images, an attention mechanism can be introduced when designing a dedicated space target segmentation method. Since image segmentation algorithms based on the Non-Local attention mechanism were proposed, they have been widely applied to the segmentation of natural images and high-resolution remote sensing images because of their strong ability to extract correlated features. However, owing to the principle of the Non-Local attention mechanism and the characteristics of space image data, the following defects arise when a conventional Non-Local attention segmentation algorithm is applied directly to the space target image segmentation task:
1) The weight matrix has a large number of parameters and easily occupies too much video memory, exhausting the memory of the GPU (Graphics Processing Unit). The weight matrix has a space complexity of O(HW×HW), i.e. the number of parameters and the amount of computation are proportional to the square of the image size HW, so the storage cost during network computation is high;
2) The application range is limited. Because of the feature-map size limitation, use of the Non-Local attention mechanism in the feature extraction stage of dense prediction tasks (such as semantic segmentation) is greatly restricted, and the Non-Local structure is difficult to use on large feature maps such as those of space targets;
3) The weight matrix parameters are difficult to compress, making a lightweight network hard to achieve and placing high demands on server performance.
Therefore, the existing image segmentation algorithm based on the Non-Local attention mechanism has some defects and needs to be further improved.
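To make the first defect concrete, the following minimal sketch counts the entries of a dense HW×HW weight matrix for a few feature-map sizes. The function names and the assumption of 4-byte (float32) entries are illustrative and do not come from the patent text.

```python
def nonlocal_weight_entries(height: int, width: int) -> int:
    """Number of entries in a dense Non-Local weight matrix of size HW x HW."""
    points = height * width
    return points * points


def weight_matrix_mib(height: int, width: int, bytes_per_entry: int = 4) -> float:
    """Memory of one weight matrix in MiB (assumption: float32 entries)."""
    return nonlocal_weight_entries(height, width) * bytes_per_entry / 2**20


if __name__ == "__main__":
    for side in (16, 32, 64, 128):
        entries = nonlocal_weight_entries(side, side)
        print(f"{side}x{side}: {entries} entries, {weight_matrix_mib(side, side):.2f} MiB")
```

Doubling the side length of the feature map multiplies the entry count by 16; under the float32 assumption, a single 128×128 feature map already needs 1024 MiB for one weight matrix.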
Disclosure of Invention
In view of the above problems, the present invention provides a method and an apparatus for spatial object segmentation based on a non-local attention mechanism.
The invention provides a space target segmentation method based on a non-local attention mechanism, which comprises the following steps: constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for performing spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features; embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network; and acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of the space target.
The invention also provides a space target segmentation device based on a non-local attention mechanism, comprising: a structure construction module for constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for performing spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features, and the cross-local feature aggregation stage is used for expanding the local correlation features and extracting global correlation features; a structure embedding module for embedding the two-stage local attention mechanism structure into the image semantic segmentation network to obtain an updated image semantic segmentation network; and a space target segmentation module for acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of the space target.
The present invention also provides an electronic device, comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the non-local attention mechanism based spatial object segmentation method described above.
Compared with the prior art, the space target segmentation method and device based on the non-local attention mechanism provided by the invention at least have the following beneficial effects:
(1) The two-stage Local attention mechanism structure (TLA structure) is provided, the number of overall calculation characteristic points of the conventional Non-Local attention mechanism structure can be obviously reduced, the calculation amount of the attention mechanism is lightened, the parameter amount of a weight matrix is compressed, and the application of the attention mechanism is expanded;
(2) The method adopts a characteristic modeling strategy of 'Local area first and then integral', and divides the conventional Non-Local attention mechanism characteristic correlation calculation and aggregation process into two-stage calculation processes of 'Local characteristic aggregation' and 'cross-Local characteristic aggregation', so as to realize the sparseness of the sampling of the calculated characteristic points;
(3) The TLA structure is embedded into the deep layer of the convolution network, so that feature aggregation of a deep feature map is realized, adverse effects of uneven illumination, fuzzy edge and the like of a space image are inhibited, feature separability is enhanced, and space target segmentation precision is improved.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a flow chart of a method for spatial object segmentation based on a non-local attention mechanism according to an embodiment of the present invention;
FIG. 2 schematically illustrates a schematic diagram of a TLA architecture in accordance with an embodiment of the present invention;
FIG. 3 schematically shows a flow diagram of a local feature aggregation phase according to an embodiment of the invention;
FIG. 4 schematically shows a flow diagram of a cross-local feature aggregation phase according to an embodiment of the invention;
FIG. 5 schematically illustrates a schematic diagram of a two-stage feature aggregation process according to an embodiment of the invention;
FIG. 6 schematically illustrates a schematic diagram of a spatial target segmentation method based on a non-local attention mechanism, according to an embodiment of the present invention;
FIG. 7 schematically illustrates a block diagram of a spatial target segmentation apparatus based on a non-local attention mechanism according to an embodiment of the present invention;
FIG. 8 schematically shows a block diagram of an electronic device adapted to implement the space target segmentation method based on a non-local attention mechanism according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In view of the defects of image segmentation algorithms based on the Non-Local attention mechanism, the technical scheme of the invention adopts a "local first, then whole" strategy. First, a Two-stage Local Attention mechanism structure (TLA structure for short) is proposed to replace the conventional Non-Local attention mechanism structure; then the TLA structure is embedded into an existing image segmentation network (such as networks of the Encoder-decoder series or the DeepLab series), yielding a space target segmentation method based on a lightweight Non-Local attention mechanism. The method adapts to the data characteristics of space images, makes the attention computation lightweight, compresses the weight matrix parameters, and can be used specifically for space target segmentation tasks.
Fig. 1 schematically shows a flow chart of a spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the present invention.
As shown in FIG. 1, the method for segmenting a spatial object based on a non-local attention mechanism according to the embodiment may include operations S110 to S130.
In operation S110, a two-stage local attention mechanism structure including a local feature aggregation stage and a cross-local feature aggregation stage is constructed, where the local feature aggregation stage is configured to perform spatial region grouping on a complete feature map embedded in a feature space, and extract local correlation features; and the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features.
In operation S120, the two-stage local attention mechanism structure is embedded into the image semantic segmentation network to obtain an updated image semantic segmentation network.
In operation S130, a spatial image to be processed is acquired, the spatial image to be processed is input into the updated image semantic segmentation network, and a segmentation result of the spatial object is output.
In the embodiment of the invention, the TLA structure adopts a 'Local area first and then integral' feature modeling strategy, and the feature correlation calculation and aggregation process of the conventional Non-Local attention mechanism structure is divided into a 'Local feature calculation aggregation' and a 'cross-Local feature calculation aggregation' two-stage feature aggregation process, so that the number of calculated feature points in a feature map is remarkably reduced, the sampling sparseness of the calculated feature points is realized, and the light weight and high efficiency of the Non-Local attention mechanism calculation are achieved. The detailed principle of the TLA structure is as follows.
Figure 2 schematically illustrates a schematic diagram of a TLA architecture according to an embodiment of the present invention.
As shown in FIG. 2, compared with the conventional Non-Local attention mechanism structure, the core of the first stage of the TLA structure, i.e. the local feature aggregation stage, is that before the attention weight matrix is calculated, the complete feature maps embedded in the feature space, denoted here $\theta(X)$, $\phi(X)$ and $g(X)$, are first grouped into $N \times N$ regions in the spatial dimension.
Then, feature correlation is calculated among the features of each local feature group within its own local space, generating $N^2$ local weight matrices $V_{Li}(X)$ and thus completing the modeling of the correlation between feature points within each of the $N^2$ local region spaces.
Finally, within each local region group, the original feature map is reconstructed by matrix multiplication with the corresponding weight matrix, generating a new local reconstruction feature map $O_L(X)$.
Fig. 3 schematically shows a flow chart of a local feature aggregation phase according to an embodiment of the invention.
Referring to fig. 3 in conjunction with fig. 2, in the embodiment of the present invention, the local feature aggregation stage in operation S110 may be specifically performed according to the following operations S1111 to S1114.
In operation S1111, feature mapping is performed on the input feature map $X$ using convolutional layers respectively, to obtain the complete feature maps embedded in the feature space, denoted here $\theta(X)$, $\phi(X)$ and $g(X)$.
In this operation, the convolutional layers are 1 × 1 convolutional layers. From the input feature map $X \in \mathbb{R}^{H \times W \times C}$, the complete feature maps $\theta(X), \phi(X), g(X) \in \mathbb{R}^{H \times W \times C'}$ embedded in the feature space are calculated according to the following formulas:
$$\theta(X) = W_\theta X$$
$$\phi(X) = W_\phi X$$
$$g(X) = W_g X$$
where $H$, $W$ and $C$ respectively represent the height, width and number of feature channels of the input feature $X$; $W_\theta$, $W_\phi$ and $W_g$ respectively represent the parameters of the 1 × 1 convolutional layers; $C'$ represents the number of channels of the complete feature maps $\theta(X)$, $\phi(X)$, $g(X)$ after feature mapping; and $\mathbb{R}$ denotes the real number field, its superscript giving the array dimensions.
In operation S1112, the complete feature maps $\theta(X)$, $\phi(X)$ and $g(X)$ are divided in the spatial dimension into $N \times N$ local feature groups, each local feature group comprising the local sub-features $\theta_i(X)$, $\phi_i(X)$ and $g_i(X)$, where $N$ is the adjustable number of local groups and $1 \le i \le N^2$.
In this operation, $N$ is a hyperparameter representing the adjustable number of local groups. From the complete feature maps $\theta(X)$, $\phi(X)$ and $g(X)$, $N^2$ local sub-features are respectively generated, as shown below:
$$\theta(X) \rightarrow \{\theta_i(X)\}_{i=1}^{N^2}, \quad \theta_i(X) \in \mathbb{R}^{\frac{HW}{N^2} \times C'}$$
$$\phi(X) \rightarrow \{\phi_i(X)\}_{i=1}^{N^2}, \quad \phi_i(X) \in \mathbb{R}^{\frac{HW}{N^2} \times C'}$$
$$g(X) \rightarrow \{g_i(X)\}_{i=1}^{N^2}, \quad g_i(X) \in \mathbb{R}^{\frac{HW}{N^2} \times C'}$$
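The spatial grouping into N×N local feature groups can be sketched as an index mapping. The helper names below are hypothetical, and the sketch assumes that the height and width are divisible by N and that groups are numbered row by row; the patent text does not state either convention.

```python
from collections import Counter


def group_index(row: int, col: int, height: int, width: int, n: int) -> int:
    """Return the 1-based index i (1 <= i <= n*n) of the local group holding (row, col).

    Assumptions: height and width are divisible by n; groups are numbered row-major.
    """
    if height % n or width % n:
        raise ValueError("height and width must be divisible by n")
    block_h, block_w = height // n, width // n
    return (row // block_h) * n + (col // block_w) + 1


def group_sizes(height: int, width: int, n: int) -> Counter:
    """Count how many feature points fall into each local group."""
    return Counter(
        group_index(r, c, height, width, n)
        for r in range(height)
        for c in range(width)
    )


if __name__ == "__main__":
    sizes = group_sizes(16, 16, 4)
    print(len(sizes), "groups,", set(sizes.values()), "points per group")
```

For a 16×16 feature map with N = 4, this yields N² = 16 groups of HW/N² = 16 feature points each.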
In operation S1113, for any $i$-th local feature group, local similarity calculation is performed on two of its local sub-features, $\theta_i(X)$ and $\phi_i(X)$, to obtain the $i$-th local weight matrix $V_{Li}(X)$, and feature aggregation is performed on the $i$-th local weight matrix $V_{Li}(X)$ and the other local sub-feature $g_i(X)$ to obtain the $i$-th local reconstruction feature map $O_{Li}(X)$.
Specifically, for any $i$-th local feature group, the following matrix multiplication is performed to calculate the $i$-th local weight matrix $V_{Li}(X)$:
$$V_{Li}(X) = \theta_i(X)\,\phi_i(X)^T$$
where $(\cdot)^T$ represents the transpose of the matrix.
Then, $V_{Li}(X)$ and $g_i(X)$ are subjected to the following matrix multiplication to calculate the $i$-th local reconstruction feature map $O_{Li}(X)$:
$$O_{Li}(X) = V_{Li}(X)\,g_i(X)$$
In operation S1114, the local reconstruction feature maps of the $N^2$ local feature groups are combined to generate the local reconstruction feature map $O_L(X)$ of the input feature map $X$.
In this operation, operation S1113 is repeated for the $N^2$ local feature groups to calculate the local reconstruction feature maps of the other local feature groups, which are then combined to form the local reconstruction feature map $O_L(X)$ of the input feature map $X$, as shown below:
$$O_L(X) = \{O_{L1}(X), O_{L2}(X), \ldots, O_{LN^2}(X)\}$$
In the above calculation process, the total parameter size of all local weight matrices $V_L(X)$ is:
$$N^2 \times \frac{HW}{N^2} \times \frac{HW}{N^2} = \frac{(HW)^2}{N^2}$$
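The local feature aggregation stage (operations S1111 to S1114) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function and variable names are hypothetical, the 1 × 1 convolutions are modelled as per-pixel linear maps over channels, H and W are assumed divisible by N, and the weight matrix is the plain matrix product described in the text, without any normalization such as the softmax used in many Non-Local implementations.

```python
import numpy as np


def split_into_groups(feat: np.ndarray, n: int) -> np.ndarray:
    """Reshape an (H, W, C) map into (n*n, H*W/(n*n), C) local feature groups."""
    h, w, c = feat.shape
    bh, bw = h // n, w // n
    blocks = feat.reshape(n, bh, n, bw, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(n * n, bh * bw, c)


def merge_groups(groups: np.ndarray, h: int, w: int, n: int) -> np.ndarray:
    """Inverse of split_into_groups: rebuild the (H, W, C) map."""
    c = groups.shape[-1]
    bh, bw = h // n, w // n
    blocks = groups.reshape(n, n, bh, bw, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(h, w, c)


def local_feature_aggregation(x, w_theta, w_phi, w_g, n):
    """Return (O_L, V_L): the local reconstruction map and the local weight matrices."""
    h, w, _ = x.shape
    # S1111: a 1x1 convolution is a linear map applied to each pixel's channels.
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    # S1112: split each embedded map into n*n local groups.
    theta_i, phi_i, g_i = (split_into_groups(t, n) for t in (theta, phi, g))
    # S1113: one (HW/N^2 x HW/N^2) weight matrix per group, then aggregation.
    v_l = theta_i @ np.swapaxes(phi_i, 1, 2)
    o_l = v_l @ g_i
    # S1114: put the per-group results back into one feature map.
    return merge_groups(o_l, h, w, n), v_l


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 16, 8))
    w_theta, w_phi, w_g = (rng.standard_normal((8, 4)) for _ in range(3))
    o_l, v_l = local_feature_aggregation(x, w_theta, w_phi, w_g, n=4)
    print(o_l.shape, v_l.shape, v_l.size)
```

With H = W = 16 and N = 4, the weight matrices hold (HW)²/N² = 4096 entries in total, compared with 65536 for a single dense HW×HW matrix. Because each group is processed independently, a change to a pixel in one group leaves the output of every other group unchanged, which is the "no connection between local areas" limitation discussed next.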
by the embodiment of the invention, the space local grouping strategy in the local feature aggregation stage effectively reduces the calculation complexity of parameters and the space size of the weight matrix. However, the strategy also causes the problem of "dense connection in local area, no connection between local areas" of feature points in space, so that the feature aggregation process can only occur in a local area, which is very unfavorable for the global attention feature extraction.
Specifically, the local feature aggregation stage densely models the correlation of feature points within each local region at low space and computational complexity, but loses the ability to model correlations between feature points across local regions. The subsequent cross-local feature aggregation stage mainly addresses this missing cross-region modeling ability, extending the local correlation features extracted in the local feature aggregation stage to the whole feature map so as to extract global correlation features.
Therefore, the embodiment of the invention also introduces a cross-local feature aggregation stage, which takes the local reconstruction feature map $O_L(X)$ of the local feature aggregation stage as its input feature map. The specific calculation process is described in detail below.
Fig. 4 schematically shows a flow diagram of a cross-local feature aggregation phase according to an embodiment of the invention. FIG. 5 schematically illustrates a schematic diagram of a two-stage feature aggregation process according to an embodiment of the invention.
Referring to fig. 4 in conjunction with fig. 2 and fig. 5, in the embodiment of the present invention, the cross-local feature aggregation stage in the operation S110 may be specifically performed according to the following operations S1121 to S1125.
In operation S1121, feature mapping is performed on the local reconstruction feature map $O_L(X)$ using convolutional layers respectively, to obtain the embedded feature maps of the feature space, denoted here $\tilde{\theta}(O_L)$, $\tilde{\phi}(O_L)$ and $\tilde{g}(O_L)$.
In this operation, the convolutional layers are again 1 × 1 convolutional layers. From the local reconstruction feature map $O_L(X)$, the embedded feature maps $\tilde{\theta}(O_L)$, $\tilde{\phi}(O_L)$ and $\tilde{g}(O_L)$ of the feature space are calculated according to the following formulas:
$$\tilde{\theta}(O_L) = \tilde{W}_\theta\,O_L(X)$$
$$\tilde{\phi}(O_L) = \tilde{W}_\phi\,O_L(X)$$
$$\tilde{g}(O_L) = \tilde{W}_g\,O_L(X)$$
where $\tilde{W}_\theta$, $\tilde{W}_\phi$ and $\tilde{W}_g$ each represent the parameters of a 1 × 1 convolutional layer.
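In operation S1122, the embedded feature maps $\tilde{\theta}(O_L)$, $\tilde{\phi}(O_L)$ and $\tilde{g}(O_L)$ are divided in the spatial dimension into $N \times N$ sub-regions, each sub-region having feature points $\tilde{\theta}_{k,j}$, $\tilde{\phi}_{k,j}$ and $\tilde{g}_{k,j}$, where $1 \le k \le N^2$ indexes the sub-region, $1 \le j \le HW/N^2$, and $H$, $W$ respectively represent the height and width of the input feature $X$.
In this operation, the spatial region is grouped again to generate $N^2$ sub-regions, each of which then has $HW/N^2$ feature points, as shown below:
$$\tilde{\theta}(O_L) \rightarrow \{\tilde{\theta}_{k,j}\}, \quad k = 1, \ldots, N^2, \; j = 1, \ldots, HW/N^2$$
$$\tilde{\phi}(O_L) \rightarrow \{\tilde{\phi}_{k,j}\}, \quad k = 1, \ldots, N^2, \; j = 1, \ldots, HW/N^2$$
$$\tilde{g}(O_L) \rightarrow \{\tilde{g}_{k,j}\}, \quad k = 1, \ldots, N^2, \; j = 1, \ldots, HW/N^2$$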
In operation S1123, taking any $j$-th feature point in the 1st sub-region as a reference point $R_{pj}$, the feature points in the remaining $N^2 - 1$ sub-regions whose relative positions are consistent with the reference point $R_{pj}$ are collectively called anchor points $A_{pj}$; the reference point $R_{pj}$ and its corresponding $N^2 - 1$ anchor points $A_{pj}$ together form the $j$-th cross-region feature set, and $HW/N^2$ cross-region feature sets exist in the embedded feature maps.
In this operation, taking any $j$-th feature point in the 1st sub-region (i.e. the upper-left sub-region in FIG. 5) as the reference point $R_{pj}$ (Reference Point), the feature points in the remaining $N^2 - 1$ sub-regions whose relative positions are consistent with the reference point $R_{pj}$ are collectively called anchor points $A_{pj}$ (Anchor Point), expressed as:
$$A_{pj} = \{A_{pj}^{k}\}, \quad k = 2, 3, \ldots, N^2$$
where $A_{pj}^{k}$ is the $k$-th anchor point corresponding to the reference point $R_{pj}$, $k = 2, 3, \ldots, N^2$.
Continuing to refer to FIG. 5, since each remaining sub-region has one anchor point corresponding to the reference point, any reference point $R_{pj}$ has $N^2 - 1$ anchor points $A_{pj}$. In the embodiment of the invention, the reference point $R_{pj}$ and its corresponding $N^2 - 1$ anchor points $A_{pj}$ together form the $j$-th cross-region feature set, which contains $N^2$ feature points. Each sub-region of the embedded feature maps has $HW/N^2$ feature points, so $HW/N^2$ cross-region feature sets exist.
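The construction of a cross-region feature set from a reference point and its anchor points can be sketched as follows. The function name and the use of 0-based (row, column) offsets inside the first sub-region are illustrative assumptions, as is the requirement that the height and width be divisible by N.

```python
def cross_region_set(j_row: int, j_col: int, height: int, width: int, n: int):
    """Return (reference_point, anchor_points) for one cross-region feature set.

    (j_row, j_col) is the 0-based offset of the reference point inside the
    first (upper-left) sub-region. Assumes height and width divisible by n.
    """
    block_h, block_w = height // n, width // n
    if not (0 <= j_row < block_h and 0 <= j_col < block_w):
        raise ValueError("reference point must lie in the first sub-region")
    # One point per sub-region, all at the same relative position.
    points = [
        (a * block_h + j_row, b * block_w + j_col)
        for a in range(n)
        for b in range(n)
    ]
    return points[0], points[1:]


if __name__ == "__main__":
    ref, anchors = cross_region_set(1, 2, height=16, width=16, n=4)
    print("reference:", ref)
    print("anchors:", len(anchors))
```

For a 16×16 map with N = 4, every reference point has N² − 1 = 15 anchor points, and the 16 possible reference positions give HW/N² = 16 disjoint cross-region feature sets that together cover the whole map.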
In operation S1124, for any $j$-th cross-region feature set, local similarity calculation is performed on two of its groups of feature points, $\tilde{\theta}_j$ and $\tilde{\phi}_j$, to obtain the $j$-th cross-region weight matrix $V_{Gj}(X)$, and feature aggregation is performed on the $j$-th cross-region weight matrix $V_{Gj}(X)$ and the other group of feature points $\tilde{g}_j$ to obtain the $j$-th cross-region reconstruction feature map $O_{Gj}(X)$. Here $\tilde{\theta}_j$, $\tilde{\phi}_j$ and $\tilde{g}_j$ collect the $N^2$ feature points of the $j$-th cross-region feature set from the three embedded feature maps.
Specifically, for any $j$-th cross-region feature set, the following matrix multiplication is performed to calculate the $j$-th cross-region weight matrix $V_{Gj}(X)$:
$$V_{Gj}(X) = \tilde{\theta}_j\,\tilde{\phi}_j^T$$
Then, $V_{Gj}(X)$ and $\tilde{g}_j$ are subjected to the following matrix multiplication to calculate the $j$-th cross-region reconstruction feature map $O_{Gj}(X)$:
$$O_{Gj}(X) = V_{Gj}(X)\,\tilde{g}_j$$
In operation S1125, the cross-region reconstruction feature maps of the $HW/N^2$ cross-region feature sets are combined to generate the global reconstruction feature map $O_G(X)$ of the local reconstruction feature map $O_L(X)$.
In this operation, operation S1124 is repeated for the $HW/N^2$ cross-region feature sets to calculate the cross-region reconstruction feature maps of the other cross-region feature sets, which are finally combined to generate the global reconstruction feature map $O_G(X)$ of the local reconstruction feature map $O_L(X)$, as shown below:
$$O_G(X) = \{O_{G1}(X), O_{G2}(X), \ldots, O_{G(HW/N^2)}(X)\}$$
In the above calculation process, the total parameter size of all cross-region weight matrices $V_G(X)$ is:
$$\frac{HW}{N^2} \times N^2 \times N^2 = HW \cdot N^2$$
Thus, in the embodiment of the present invention, the total weight matrix parameter size of $V_L(X)$ and $V_G(X)$ over the two stages of the TLA structure is:
$$\frac{(HW)^2}{N^2} + HW \cdot N^2$$
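The cross-local feature aggregation stage (operations S1121 to S1125) can be sketched with NumPy in the same spirit. As before, this is an illustration under assumptions rather than the patented implementation: the names are hypothetical, the 1 × 1 convolutions are modelled as per-pixel linear maps, H and W are assumed divisible by N, and no normalization is applied to the weight matrices.

```python
import numpy as np


def split_into_cross_region_sets(feat: np.ndarray, n: int) -> np.ndarray:
    """Reshape an (H, W, C) map into (H*W/(n*n), n*n, C) cross-region sets.

    Set j holds the points that share relative position j in every sub-region.
    """
    h, w, c = feat.shape
    bh, bw = h // n, w // n
    blocks = feat.reshape(n, bh, n, bw, c).transpose(1, 3, 0, 2, 4)
    return blocks.reshape(bh * bw, n * n, c)


def merge_cross_region_sets(sets: np.ndarray, h: int, w: int, n: int) -> np.ndarray:
    """Inverse of split_into_cross_region_sets."""
    c = sets.shape[-1]
    bh, bw = h // n, w // n
    blocks = sets.reshape(bh, bw, n, n, c).transpose(2, 0, 3, 1, 4)
    return blocks.reshape(h, w, c)


def cross_local_feature_aggregation(o_l, w_theta, w_phi, w_g, n):
    """Return (O_G, V_G): the global reconstruction map and cross-region weights."""
    h, w, _ = o_l.shape
    # S1121: 1x1 convolutions as per-pixel linear maps over channels.
    theta, phi, g = o_l @ w_theta, o_l @ w_phi, o_l @ w_g
    # S1122-S1123: gather reference point and anchor points into each set.
    theta_j, phi_j, g_j = (
        split_into_cross_region_sets(t, n) for t in (theta, phi, g)
    )
    # S1124: one (N^2 x N^2) weight matrix per cross-region set, then aggregation.
    v_g = theta_j @ np.swapaxes(phi_j, 1, 2)
    o_g = v_g @ g_j
    # S1125: combine the per-set results into one feature map.
    return merge_cross_region_sets(o_g, h, w, n), v_g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    o_l = rng.standard_normal((16, 16, 4))
    w_theta, w_phi, w_g = (rng.standard_normal((4, 4)) for _ in range(3))
    o_g, v_g = cross_local_feature_aggregation(o_l, w_theta, w_phi, w_g, n=8)
    print(o_g.shape, v_g.shape, v_g.size)
```

With H = W = 16 and N = 8 there are HW/N² = 4 cross-region sets of N² = 64 points each, so the cross-region weight matrices hold HW·N² = 16384 entries in total.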
With continued reference to FIG. 5, the embodiment of the present invention takes an arbitrary reference point $R_{pj}$ in the figure as an example to show how the TLA structure builds the correlation between $R_{pj}$ and arbitrary feature points in different sub-regions. The feature of the $j$-th reference point $R_{pj}$ and the features of its corresponding $N^2 - 1$ anchor points $A_{pj}$ together form the $j$-th cross-region feature set.
In the local feature aggregation stage, the correlations between any feature point in each local feature group and the anchor point $A_{pj}$ feature, and between any feature point and the reference point $R_{pj}$ feature, are constructed. In the cross-local feature aggregation stage, the correlations between the reference point $R_{pj}$ feature and the anchor point $A_{pj}$ features of the other sub-regions are established. The anchor point $A_{pj}$ features act as relays, so that the correlation is transferred from "anchor point $A_{pj}$ to feature point" and "anchor point $A_{pj}$ to reference point $R_{pj}$" to "reference point $R_{pj}$ to any feature point", thereby modeling the correlation of any two feature points in the feature map.
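The relay argument can be checked by tracing which input positions can influence a given output position after both stages. The sketch below is a connectivity check only, with a hypothetical function name, and assumes that the height and width are divisible by N.

```python
def receptive_field(point, height: int, width: int, n: int) -> set:
    """Input positions that can influence `point` after both TLA stages."""
    bh, bw = height // n, width // n
    row, col = point
    # Cross-local stage: the point is aggregated with the points at the same
    # relative position in every sub-region (reference point plus anchors).
    cross_set = [
        (a * bh + row % bh, b * bw + col % bw)
        for a in range(n)
        for b in range(n)
    ]
    # Local stage: each of those points has already aggregated its whole
    # sub-region, so every position of that sub-region is reachable.
    field = set()
    for r, c in cross_set:
        top, left = (r // bh) * bh, (c // bw) * bw
        field.update(
            (top + dr, left + dc) for dr in range(bh) for dc in range(bw)
        )
    return field


if __name__ == "__main__":
    print(len(receptive_field((3, 5), height=16, width=16, n=4)))
```

Because each cross-region feature set contains exactly one point from every sub-region, the union of the sub-regions reached through the anchors is the full feature map, so every point is connected to every other point.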
The weight matrix parameter of the conventional Non-Local attention mechanism is as followsO(HW×HW) Thus, the TLA structure to conventional attention mechanism weight matrix parameter size ratio is:
Figure 536482DEST_PATH_IMAGE050
In general, the hyperparameter N is chosen as 4 to 8 in practical implementations; evaluating the above formula gives the following results.
When the input feature map size is H = W = 16 and N = 4:
Figure 962DEST_PATH_IMAGE051
When the input feature map size is H = W = 16 and N = 8:
Figure 873103DEST_PATH_IMAGE052
Therefore, the weight matrix parameters of the TLA structure amount to only 1/8 to 1/4 of the original parameter quantity. The weight matrix parameter quantity is significantly reduced, which makes the Non-Local attention mechanism lightweight in both computation and parameter count.
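The two cases can be reproduced with a few lines of arithmetic. The formulas themselves are rendered as images in the source, so the counts below are an assumption derived from the group sizes stated in the text: N² local weight matrices of size (HW/N²)×(HW/N²) and HW/N² cross-region weight matrices of size N²×N². The function names are hypothetical.

```python
from fractions import Fraction


def tla_weight_entries(H, W, N):
    """Total weight-matrix entries of the two TLA stages (assumed counts)."""
    points = H * W
    per_group = points // (N * N)       # feature points per local group
    local = (N * N) * per_group ** 2    # N^2 matrices with (HW/N^2)^2 entries
    cross = per_group * (N * N) ** 2    # HW/N^2 matrices with (N^2)^2 entries
    return local + cross


def ratio_to_non_local(H, W, N):
    """Exact ratio to the (HW x HW) matrix of a conventional Non-Local block."""
    return Fraction(tla_weight_entries(H, W, N), (H * W) ** 2)


print(ratio_to_non_local(16, 16, 4))  # 1/8
print(ratio_to_non_local(16, 16, 8))  # 17/64, roughly 1/4
```

Under these assumptions the ratio simplifies to 1/N² + N²/(HW), which matches the stated range of 1/8 to 1/4 for a 16×16 feature map.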
According to the embodiment of the present invention, the computational complexity is lower than that of the conventional Non-Local attention mechanism. With network performance kept consistent, the number of parameters and the amount of parameter computation are greatly reduced, which lowers GPU memory usage during network training; depending on the parameter settings, the number of weight matrix parameters is compressed by a factor of several to several tens.
Compared with the conventional Non-Local attention mechanism, the weight matrix has fewer parameters and occupies less storage. The embodiment of the present invention greatly enhances the adaptability of the Non-Local structure to large feature maps and can serve as a dedicated segmentation method for wide space target images.
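To give a feel for the storage saving on larger feature maps, the sketch below estimates the weight-matrix footprint in MiB. It is illustrative only: it assumes 32-bit floating-point values, one weight matrix set per image, and the same group-size-based counts as described in the text; activations, batch size and framework overhead are ignored. The function name `weight_matrix_mib` is hypothetical.

```python
def weight_matrix_mib(H, W, N=None, bytes_per_value=4):
    """Weight-matrix storage in MiB; N=None means conventional Non-Local."""
    points = H * W
    if N is None:
        entries = points ** 2                      # one HW x HW matrix
    else:
        per_group = points // (N * N)
        entries = (N * N) * per_group ** 2 + per_group * (N * N) ** 2
    return entries * bytes_per_value / 2 ** 20


for size in (64, 128):
    full = weight_matrix_mib(size, size)
    tla = weight_matrix_mib(size, size, N=8)
    print(size, full, tla, full / tla)
```

With N = 8 this gives 64 MiB versus 2 MiB for a 64×64 feature map and 1024 MiB versus 20 MiB for a 128×128 feature map, that is, compression factors of 32 and about 51, in line with the "several to several tens" range stated above.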
FIG. 6 schematically illustrates a spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the present invention.
As shown in FIG. 6, in the embodiment of the present invention, the image semantic segmentation network in operation S120 includes a fully convolutional neural network with ResNet-50 as the backbone network; the fully convolutional neural network may be, for example, a convolutional neural network of the Encoder-Decoder series or the DeepLab series.
Therefore, the TLA structure is embedded into the existing image semantic segmentation network to obtain the updated image semantic segmentation network.
Further, in operation S120, embedding the two-stage local attention mechanism structure into the image semantic segmentation network specifically includes embedding the two-stage local attention mechanism structure between the last convolutional layer of the fully convolutional neural network and the classifier to obtain the global correlation features. The lightweight TLA structure thus performs feature aggregation on the deep features of the space target to obtain global correlation features, which suppresses the adverse effects of uneven illumination, edge blurring and similar degradations of the space image on space target segmentation.
Further, as shown in FIG. 6, the output of the classifier is upsampled by a factor of 8 to obtain an output image of the same size as the spatial image to be processed.
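The final upsampling step can be illustrated on a toy class-index map. The source does not state which interpolation is used, so the sketch assumes nearest-neighbor repetition purely for simplicity; the function name `upsample_nearest` and the 2×2 input are hypothetical.

```python
def upsample_nearest(label_map, factor=8):
    """Repeat every element `factor` times along both spatial axes."""
    out = []
    for row in label_map:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


coarse = [[0, 1],
          [1, 0]]  # hypothetical 2x2 classifier output (class indices)
full = upsample_nearest(coarse, 8)
print(len(full), len(full[0]))  # 16 16
```

An 8× factor that restores the input size implies that the classifier operates at 1/8 of the input resolution in each spatial dimension.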
Based on the method disclosed above, the present invention further provides a spatial object segmentation apparatus based on a non-local attention mechanism, which will be described in detail below with reference to fig. 7.
Fig. 7 schematically shows a block diagram of a spatial object segmentation apparatus based on a non-local attention mechanism according to an embodiment of the present invention.
As shown in fig. 7, the non-local attention mechanism-based spatial object segmentation apparatus 700 according to this embodiment includes a structure construction module 710, a structure embedding module 720, and a spatial object segmentation module 730.
The structure construction module 710 is configured to construct a two-stage local attention mechanism structure including a local feature aggregation stage and a cross-local feature aggregation stage, where the local feature aggregation stage is configured to perform spatial region grouping on a complete feature map embedded in a feature space, and extract local correlation features; and the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features.
The structure embedding module 720 is configured to embed the two-stage local attention mechanism structure into the image semantic segmentation network to obtain an updated image semantic segmentation network.
The space target segmentation module 730 is configured to acquire a space image to be processed, input it into the updated image semantic segmentation network, and output a segmentation result of the space target.
It should be noted that the apparatus embodiments are similar to the method embodiments and achieve similar technical effects; for details, refer to the method embodiments, which are not repeated here.
According to an embodiment of the present invention, any plurality of the structure construction module 710, the structure embedding module 720 and the spatial object segmentation module 730 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present invention, at least one of the structure building module 710, the structure embedding module 720 and the spatial object segmentation module 730 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, at least one of the structure building module 710, the structure embedding module 720 and the spatial object segmentation module 730 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
FIG. 8 schematically shows a block diagram of an electronic device adapted to implement the spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the present invention.
As shown in FIG. 8, an electronic device 800 according to an embodiment of the present invention includes a processor 801, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or related chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include on-board memory for caching purposes. The processor 801 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the present invention, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read from it can be installed into the storage section 808 as necessary.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of the present invention may be implemented in hardware and/or in software (including firmware, microcode, etc.). Furthermore, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. Further, the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A space target segmentation method based on a non-local attention mechanism is characterized by comprising the following steps:
constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for carrying out spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding the local correlation features and extracting global correlation features;
embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network;
acquiring a spatial image to be processed, inputting the spatial image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of a spatial target;
wherein the local feature aggregation stage is performed in the following manner:
performing feature mapping on the input feature map X using convolutional layers, respectively, to obtain complete feature maps embedded in the feature space
Figure 62634DEST_PATH_IMAGE001
dividing the complete feature maps
Figure 917457DEST_PATH_IMAGE002
in the spatial dimension into N×N local feature groups, each local feature group including local sub-features
Figure 697194DEST_PATH_IMAGE003
wherein N is an adjustable number of local groups and 1 ≤ i ≤ N²;
for any i-th local feature group, performing local similarity calculation on two of its local sub-features
Figure 967770DEST_PATH_IMAGE004
to obtain the i-th local weight matrix V_Li(X), and performing feature aggregation on the i-th local weight matrix V_Li(X) and the other local sub-feature
Figure 782142DEST_PATH_IMAGE005
to obtain the i-th local reconstructed feature map O_Li(X);
combining the local reconstructed feature maps of the N² local feature groups to generate the local reconstructed feature map O_L(X) of the input feature map X;
wherein the i-th local weight matrix V_Li(X) and the i-th local reconstructed feature map O_Li(X) are calculated according to the following formulas:
Figure 542288DEST_PATH_IMAGE006
Figure 809321DEST_PATH_IMAGE007
in the formulas, (·)^T denotes the matrix transpose; H and W denote the height and width of the input feature map X, respectively;
Figure 942975DEST_PATH_IMAGE008
denotes the number of channels of the complete feature map
Figure 611853DEST_PATH_IMAGE009
embedded in the feature space; and
Figure 542900DEST_PATH_IMAGE010
denotes a matrix over the real number field with the indicated numbers of rows and columns.
2. The non-local attention mechanism-based spatial object segmentation method according to claim 1, wherein the cross-local feature aggregation stage is performed in the following manner:
performing feature mapping on the local reconstructed feature map O_L(X) using convolutional layers, respectively, to obtain embedded feature maps in the feature space
Figure 297230DEST_PATH_IMAGE011
dividing the embedded feature maps
Figure 971925DEST_PATH_IMAGE012
in the spatial dimension into N×N sub-regions, each sub-region having feature points
Figure 495310DEST_PATH_IMAGE013
wherein 1 ≤ j ≤ HW/N², and H and W denote the height and width of the input feature map X, respectively;
taking any j-th feature point in the 1st sub-region as a reference point R_pj, and collectively referring to the feature points in the remaining N²−1 sub-regions whose relative positions are consistent with that of the reference point R_pj as anchor points A_pj, the reference point R_pj and the corresponding N²−1 anchor points A_pj together forming the j-th group of cross-region feature sets, so that the embedded feature maps
Figure 862837DEST_PATH_IMAGE014
contain HW/N² groups of cross-region feature sets;
for any j-th group of cross-region feature sets, performing local similarity calculation on two of its feature points
Figure 104463DEST_PATH_IMAGE015
to obtain the j-th group cross-region weight matrix V_Gj(X), and performing feature aggregation on the j-th group cross-region weight matrix V_Gj(X) and the other feature point
Figure 317269DEST_PATH_IMAGE016
to obtain the j-th group cross-region reconstructed feature map O_Gj(X);
combining the cross-region reconstructed feature maps of the HW/N² groups of cross-region feature sets to generate the global reconstructed feature map O_G(X) of the local reconstructed feature map O_L(X).
3. The non-local attention mechanism-based spatial object segmentation method of claim 2, wherein the j-th group cross-region weight matrix V_Gj(X) and the j-th group cross-region reconstructed feature map O_Gj(X) are calculated according to the following formulas:
Figure 960740DEST_PATH_IMAGE017
Figure 233590DEST_PATH_IMAGE018
in the formulas, (·)^T denotes the matrix transpose;
Figure 962511DEST_PATH_IMAGE019
denotes the number of channels of the complete feature map
Figure 979009DEST_PATH_IMAGE020
embedded in the feature space; and
Figure 476986DEST_PATH_IMAGE021
denotes a matrix over the real number field with the indicated numbers of rows and columns.
4. The non-local attention mechanism-based spatial object segmentation method according to claim 1, wherein the image semantic segmentation network comprises a fully convolutional neural network with ResNet-50 as a backbone network.
5. The spatial object segmentation method based on the non-local attention mechanism according to claim 4, wherein the embedding the two-stage local attention mechanism structure into the image semantic segmentation network specifically includes:
embedding the two-stage local attention mechanism structure between the last convolutional layer in the fully convolutional neural network and a classifier to obtain the global correlation feature.
6. The non-local attention mechanism-based spatial target segmentation method according to claim 5, wherein the output of the classifier is upsampled by a factor of 8 to obtain an output image with the same size as the spatial image to be processed.
7. A spatial object segmentation apparatus based on a non-local attention mechanism, comprising:
the structure construction module is used for constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for carrying out space region grouping on a complete feature map embedded into a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding the local correlation features and extracting global correlation features;
the structure embedding module is used for embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network;
the space target segmentation module is used for acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network and outputting a segmentation result of a space target;
wherein the local feature aggregation stage is performed in the following manner:
performing feature mapping on the input feature map X using convolutional layers, respectively, to obtain complete feature maps embedded in the feature space
Figure 186316DEST_PATH_IMAGE001
dividing the complete feature maps
Figure 136955DEST_PATH_IMAGE002
in the spatial dimension into N×N local feature groups, each local feature group including local sub-features
Figure 954214DEST_PATH_IMAGE003
wherein N is an adjustable number of local groups and 1 ≤ i ≤ N²;
for any i-th local feature group, performing local similarity calculation on two of its local sub-features
Figure 306698DEST_PATH_IMAGE004
to obtain the i-th local weight matrix V_Li(X), and performing feature aggregation on the i-th local weight matrix V_Li(X) and the other local sub-feature
Figure 186929DEST_PATH_IMAGE005
to obtain the i-th local reconstructed feature map O_Li(X);
combining the local reconstructed feature maps of the N² local feature groups to generate the local reconstructed feature map O_L(X) of the input feature map X;
wherein the i-th local weight matrix V_Li(X) and the i-th local reconstructed feature map O_Li(X) are calculated according to the following formulas:
Figure 624863DEST_PATH_IMAGE006
Figure 983164DEST_PATH_IMAGE007
in the formulas, (·)^T denotes the matrix transpose; H and W denote the height and width of the input feature map X, respectively;
Figure 455733DEST_PATH_IMAGE008
denotes the number of channels of the complete feature map
Figure 506866DEST_PATH_IMAGE009
embedded in the feature space; and
Figure 432097DEST_PATH_IMAGE010
denotes a matrix over the real number field with the indicated numbers of rows and columns.
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1 to 6.
CN202211050721.4A 2022-08-31 2022-08-31 Space target segmentation method and device based on non-local attention mechanism Active CN115131568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211050721.4A CN115131568B (en) 2022-08-31 2022-08-31 Space target segmentation method and device based on non-local attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211050721.4A CN115131568B (en) 2022-08-31 2022-08-31 Space target segmentation method and device based on non-local attention mechanism

Publications (2)

Publication Number Publication Date
CN115131568A CN115131568A (en) 2022-09-30
CN115131568B true CN115131568B (en) 2022-12-27

Family

ID=83386946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211050721.4A Active CN115131568B (en) 2022-08-31 2022-08-31 Space target segmentation method and device based on non-local attention mechanism

Country Status (1)

Country Link
CN (1) CN115131568B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797779A (en) * 2020-07-08 2020-10-20 兰州交通大学 Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion
CN114549439A (en) * 2022-02-11 2022-05-27 中北大学 RGB-D image semantic segmentation method based on multi-modal feature fusion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11087174B2 (en) * 2018-09-25 2021-08-10 Nec Corporation Deep group disentangled embedding and network weight generation for visual inspection
CN113486897A (en) * 2021-07-29 2021-10-08 辽宁工程技术大学 Semantic segmentation method for convolution attention mechanism up-sampling decoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
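Research on image semantic segmentation methods based on adaptive non-local attention networks; 李秉嶷; China Master's Theses Full-text Database, Information Science and Technology; 2022-03-15 (No. 03); page 5, third paragraph from the end; page 7, paragraph 2; page 18, paragraph 2 to page 30, paragraph 1; page 36, paragraphs 1-3; Fig. 3-1 *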

Also Published As

Publication number Publication date
CN115131568A (en) 2022-09-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant