CN115131568B - Space target segmentation method and device based on non-local attention mechanism - Google Patents
Space target segmentation method and device based on non-local attention mechanism
- Publication number
- CN115131568B (application CN202211050721.4A)
- Authority
- CN
- China
- Prior art keywords
- local
- feature
- attention mechanism
- space
- stage
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
Abstract
The invention provides a space target segmentation method and device based on a Non-Local attention mechanism, in the technical field of computer vision. It addresses three technical problems of existing image segmentation algorithms based on the Non-Local attention mechanism: the weight matrix has a large number of parameters, the range of application is limited, and the weight matrix parameters are difficult to compress. The method comprises: constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage; embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network; and acquiring a space image to be processed, inputting it into the updated image semantic segmentation network, and outputting a segmentation result of the space target. The method lightens the computational load of the attention mechanism, compresses the number of weight matrix parameters, and broadens the applicability of the attention mechanism.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to the field of space target image segmentation, and more particularly relates to a space target segmentation method and device based on a non-local attention mechanism.
Background
In recent years, human space exploration activities have become increasingly frequent, and the number of objects that various countries have placed in orbit has risen sharply. Collision early warning for these space targets is therefore important for ensuring the on-orbit safety of China's space station and other high-value space assets. Space situational awareness technology determines the state, attributes and intent of non-cooperative space targets by monitoring their position and motion over long periods, and it is currently the main countermeasure and precaution against space safety threats. Space-based optical observation is an important means of obtaining space target information; unlike ground-based optical observation, it is not limited by factors such as atmospheric interference and weather conditions.
The main task of space target segmentation is to separate space targets and their component parts from the starry-sky background so that target information (attributes, functions and intent) can be interpreted afterwards. Space target segmentation is therefore a basic key technology of space situational awareness. Mainstream image segmentation techniques currently target street scenes, automobiles, aircraft and ships, and the industry still lacks a segmentation method dedicated to space targets. Compared with ordinary natural images and high-resolution remote sensing images, space-based optical observation images are characterized by wide fields of view, uneven illumination, blurred edges and severe overexposure. These characteristics seriously interfere with image feature extraction, so designing a dedicated space target segmentation method is both challenging and urgent.
Given these data characteristics, the attention mechanism is a natural basis for a dedicated space target segmentation method. Since their introduction, image segmentation algorithms based on the Non-Local attention mechanism have been widely applied to natural image and high-resolution remote sensing image segmentation because of their strong ability to extract correlated features. However, owing to the principle of the Non-Local attention mechanism and the characteristics of space image data, applying a conventional Non-Local attention segmentation algorithm directly to space target segmentation has the following drawbacks:
1) The weight matrix has a large number of parameters and easily occupies too much video memory, exhausting the memory of the GPU (Graphics Processing Unit). The weight matrix has $O(HW \times HW)$ space complexity: both the parameter count and the computation grow with the square of $HW$, the number of spatial positions in the feature map, so storage costs during network computation are high;
2) The range of application is limited. Because of the feature map size, the use of the Non-Local attention mechanism in the feature extraction stage of dense prediction tasks (such as semantic segmentation) is severely restricted, and a Non-Local structure is difficult to apply to large feature maps such as those of space targets;
3) The weight matrix parameters are difficult to compress, so the network is hard to make lightweight and places high demands on server performance.
Existing image segmentation algorithms based on the Non-Local attention mechanism therefore need further improvement.
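To put drawback 1) in numbers, the sketch below computes the memory taken by a single dense $HW \times HW$ weight matrix. It assumes float32 storage, one image and one attention block; these are illustrative assumptions, not figures from the patent.

```python
def nonlocal_weight_bytes(h: int, w: int, bytes_per_elem: int = 4) -> int:
    """Bytes for one dense (HW x HW) Non-Local weight matrix (one image, one block)."""
    hw = h * w                       # number of spatial positions in the feature map
    return hw * hw * bytes_per_elem  # grows with the square of HW

for side in (16, 32, 64, 128):
    mib = nonlocal_weight_bytes(side, side) / 2**20
    print(f"{side}x{side}: {mib:g} MiB")
```

Under these assumptions a 16×16 map needs 0.25 MiB, while a 128×128 map already needs 1024 MiB (1 GiB) for one matrix, before batching or gradients are counted.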
Disclosure of Invention
In view of the above problems, the present invention provides a method and an apparatus for spatial object segmentation based on a non-local attention mechanism.
The invention provides a space target segmentation method based on a non-local attention mechanism, which comprises the following steps: constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for performing spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features; embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network; and acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of the space target.
The invention also provides a space target segmentation device based on a non-local attention mechanism, comprising: a structure construction module, configured to construct a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage performs spatial region grouping on a complete feature map embedded in a feature space and extracts local correlation features, and the cross-local feature aggregation stage expands the local correlation features and extracts global correlation features; a structure embedding module, configured to embed the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network; and a space target segmentation module, configured to acquire a space image to be processed, input it into the updated image semantic segmentation network, and output a segmentation result of the space target.
The present invention also provides an electronic device, comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the non-local attention mechanism based spatial object segmentation method described above.
Compared with the prior art, the space target segmentation method and device based on the non-local attention mechanism provided by the invention at least have the following beneficial effects:
(1) A two-stage local attention mechanism structure (TLA structure) is proposed. It significantly reduces the number of feature points over which a conventional Non-Local attention structure computes correlations at once, lightening the computation of the attention mechanism, compressing the weight matrix parameters and broadening the applicability of the attention mechanism;
(2) The method adopts a "local region first, then whole" feature modeling strategy and splits the feature correlation computation and aggregation of the conventional Non-Local attention mechanism into two stages, "local feature aggregation" and "cross-local feature aggregation", which sparsifies the sampling of the feature points used in the computation;
(3) Embedding the TLA structure in the deep layers of the convolutional network aggregates the features of the deep feature maps, suppresses the adverse effects of uneven illumination, blurred edges and similar properties of space images, enhances feature separability and improves space target segmentation accuracy.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a flow chart of a method for spatial object segmentation based on a non-local attention mechanism according to an embodiment of the present invention;
FIG. 2 schematically illustrates a schematic diagram of a TLA architecture in accordance with an embodiment of the present invention;
FIG. 3 schematically shows a flow diagram of a local feature aggregation phase according to an embodiment of the invention;
FIG. 4 schematically shows a flow diagram of a cross-local feature aggregation phase according to an embodiment of the invention;
FIG. 5 schematically illustrates a schematic diagram of a two-stage feature aggregation process according to an embodiment of the invention;
FIG. 6 schematically illustrates a schematic diagram of a spatial target segmentation method based on a non-local attention mechanism, according to an embodiment of the present invention;
FIG. 7 schematically illustrates a block diagram of a spatial target segmentation apparatus based on a non-local attention mechanism according to an embodiment of the present invention;
FIG. 8 schematically shows a block diagram of an electronic device adapted to implement the spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
To address the shortcomings of image segmentation algorithms based on the Non-Local Attention mechanism, the technical solution of the invention adopts a "local region first, then whole" strategy. First, a Two-stage Local Attention mechanism structure (TLA structure for short) is proposed to replace the conventional Non-Local Attention structure; the TLA structure is then embedded into an existing image segmentation network (for example an Encoder-decoder series or DeepLab series segmentation network), yielding a space target segmentation method based on a lightweight Non-Local Attention mechanism. The method adapts to the data characteristics of space images, lightens the attention computation, compresses the weight matrix parameters, and can be used specifically for space target segmentation tasks.
Fig. 1 schematically shows a flow chart of a spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the present invention.
As shown in FIG. 1, the method for segmenting a spatial object based on a non-local attention mechanism according to the embodiment may include operations S110 to S130.
In operation S110, a two-stage local attention mechanism structure including a local feature aggregation stage and a cross-local feature aggregation stage is constructed, where the local feature aggregation stage is configured to perform spatial region grouping on a complete feature map embedded in a feature space, and extract local correlation features; and the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features.
In operation S120, the two-stage local attention mechanism structure is embedded into the image semantic segmentation network to obtain an updated image semantic segmentation network.
In operation S130, a spatial image to be processed is acquired, the spatial image to be processed is input into the updated image semantic segmentation network, and a segmentation result of the spatial object is output.
In the embodiment of the invention, the TLA structure adopts a "local region first, then whole" feature modeling strategy: the feature correlation computation and aggregation of the conventional Non-Local attention structure is split into two stages, "local feature aggregation" and "cross-local feature aggregation". This significantly reduces the number of feature points involved in each correlation computation, sparsifies the sampling of the computed feature points, and makes the Non-Local attention computation lightweight and efficient. The principle of the TLA structure is detailed below.
Figure 2 schematically illustrates a schematic diagram of a TLA architecture according to an embodiment of the present invention.
As shown in FIG. 2, compared with the conventional Non-Local attention structure, the core of the first stage of the TLA structure, the local feature aggregation stage, is to group the complete feature map embedded in the feature space into $N \times N$ regions along the spatial dimensions before the attention weight matrix is computed.
Then, feature correlations are computed separately within each of the $N^2$ local regions, producing $N^2$ local weight matrices $V_{L_i}(X)$ and thereby modeling the correlations between feature points within each of the $N^2$ local regions.
Finally, within each local group, the corresponding weight matrix is used to reconstruct the original features by matrix multiplication, producing a new local reconstructed feature map $O_L(X)$.
Fig. 3 schematically shows a flow chart of a local feature aggregation phase according to an embodiment of the invention.
Referring to fig. 3 in conjunction with fig. 2, in the embodiment of the present invention, the local feature aggregation stage in operation S110 may be specifically performed according to the following operations S1111 to S1114.
In operation S1111, convolutional layers are applied to the input feature map $X$ to perform separate feature mappings, yielding complete feature maps embedded in the feature space.
In this operation, the convolutional layers are 1×1 convolutional layers, and each complete feature map embedded in the feature space is obtained by applying one of them to the input feature map $X$. Here $H$, $W$ and $C$ denote the height, width and number of feature channels of $X$; each 1×1 convolutional layer has its own parameters; and the embedded complete feature maps have their own number of channels, with all maps expressed as real-valued matrices.
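A 1×1 convolution applies the same channel-mixing linear map at every pixel, so it can be sketched as a single `einsum`. The use of three embeddings and the names `theta`, `phi` and `g` are assumptions borrowed from the usual Non-Local convention, and all sizes are illustrative.

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution on an (H, W, C) map: one C -> C_emb linear map applied at every pixel."""
    return np.einsum("hwc,cd->hwd", x, weight) + bias

rng = np.random.default_rng(0)
H, W, C, C_EMB = 8, 8, 16, 4            # illustrative sizes, not from the patent
x = rng.standard_normal((H, W, C))

# Three independent embeddings, as in the usual Non-Local design (names are hypothetical).
embedded = {
    name: conv1x1(x, rng.standard_normal((C, C_EMB)), np.zeros(C_EMB))
    for name in ("theta", "phi", "g")
}
for name, fmap in embedded.items():
    print(name, fmap.shape)             # each (8, 8, 4)
```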
In operation S1112, the complete embedded feature maps are divided along the spatial dimensions into $N \times N$ local feature groups, each containing the corresponding local sub-features of the embedded maps, where $N$ is an adjustable number of local groups and $1 \le i \le N^2$.
In this operation, the hyperparameter $N$ sets the adjustable number of local groups, and each complete embedded feature map is split into $N^2$ local sub-features, one per local feature group.
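The spatial grouping can be sketched with a reshape and a transpose. The sketch assumes an $(H, W, C)$ layout, $H$ and $W$ divisible by $N$, and row-major numbering of the groups; the function names are hypothetical.

```python
import numpy as np

def split_into_local_groups(feat: np.ndarray, n: int) -> np.ndarray:
    """Split an (H, W, C) map into n*n non-overlapping spatial blocks.

    Returns shape (n*n, (H//n)*(W//n), C): group i holds the flattened
    pixels of the i-th block, blocks numbered in row-major order.
    """
    h, w, c = feat.shape
    assert h % n == 0 and w % n == 0, "H and W must be divisible by N"
    bh, bw = h // n, w // n
    blocks = feat.reshape(n, bh, n, bw, c)      # (block_row, y, block_col, x, C)
    blocks = blocks.transpose(0, 2, 1, 3, 4)    # (block_row, block_col, y, x, C)
    return blocks.reshape(n * n, bh * bw, c)

def merge_local_groups(groups: np.ndarray, h: int, w: int, n: int) -> np.ndarray:
    """Inverse of split_into_local_groups."""
    c = groups.shape[-1]
    blocks = groups.reshape(n, n, h // n, w // n, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(h, w, c)

x = np.arange(16 * 16 * 3).reshape(16, 16, 3)
groups = split_into_local_groups(x, n=4)
print(groups.shape)  # (16, 16, 3): N^2 = 16 groups of HW/N^2 = 16 pixels each
```

Because the split is a pure index permutation, `merge_local_groups` restores the original map exactly.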
In operation S1113, for any $i$-th local feature group, a local similarity computation is performed between two of its local sub-features to obtain the $i$-th local weight matrix $V_{L_i}(X)$, and $V_{L_i}(X)$ is then used to aggregate the remaining local sub-feature, yielding the $i$-th local reconstructed feature map $O_{L_i}(X)$.
Specifically, for any $i$-th local feature group, the $i$-th local weight matrix $V_{L_i}(X)$ is computed by multiplying one local sub-feature by the transpose $(\cdot)^T$ of the other.
Then, $V_{L_i}(X)$ is multiplied with the remaining local sub-feature to compute the $i$-th local reconstructed feature map $O_{L_i}(X)$.
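Operations S1112 to S1114 can be sketched for all $N^2$ groups at once with batched matrix products. Two assumptions are made here: the embedded maps are named `theta`, `phi` and `g` (hypothetical), and the weight matrix is softmax-normalized as in the common embedded-Gaussian Non-Local block; the text above only states the matrix multiplications, so normalization can be switched off.

```python
import numpy as np

def softmax(a: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def split(feat, n):
    """(H, W, C) -> (n*n groups, HW/n^2 pixels, C), row-major block order."""
    h, w, c = feat.shape
    return (feat.reshape(n, h // n, n, w // n, c).transpose(0, 2, 1, 3, 4)
                .reshape(n * n, (h // n) * (w // n), c))

def merge(groups, h, w, n):
    """Inverse of split."""
    c = groups.shape[-1]
    return (groups.reshape(n, n, h // n, w // n, c).transpose(0, 2, 1, 3, 4)
                  .reshape(h, w, c))

def local_aggregation(theta, phi, g, n, normalize=True):
    """Stage 1: attention restricted to each of the n*n local groups.

    theta, phi, g: (H, W, C') embedded maps (hypothetical names).
    Returns O_L with shape (H, W, C') and the stacked V_Li, shape (n*n, m, m), m = HW/n^2.
    """
    h, w, _ = theta.shape
    t, p, v = split(theta, n), split(phi, n), split(g, n)
    weights = t @ p.transpose(0, 2, 1)   # V_Li = t_i p_i^T for every group at once
    if normalize:                        # softmax normalization is an assumption
        weights = softmax(weights)
    return merge(weights @ v, h, w, n), weights   # O_Li = V_Li g_i, then reassembled

rng = np.random.default_rng(0)
theta, phi, g = (rng.standard_normal((16, 16, 4)) for _ in range(3))
o_local, v_local = local_aggregation(theta, phi, g, n=4)
print(o_local.shape, v_local.shape)   # (16, 16, 4) (16, 16, 16)
```

The stacked weights hold $N^2 \cdot (HW/N^2)^2 = 16 \cdot 16^2 = 4096$ entries here, against $256^2 = 65536$ for a dense matrix, and an output pixel only sees pixels of its own group.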
In operation S1114, the local reconstructed feature maps of the $N^2$ local feature groups are combined to generate the local reconstructed feature map $O_L(X)$ of the input feature map $X$.
In this operation, operation S1113 is repeated for each of the $N^2$ local feature groups to compute their local reconstructed feature maps, which are then combined into the local reconstructed feature map $O_L(X)$ of the input feature map $X$.
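In the above computation, each local group contains $HW/N^2$ feature points, so each $V_{L_i}(X)$ is an $(HW/N^2) \times (HW/N^2)$ matrix, and the total parameter size of all local weight matrices $V_L(X)$ is:
$$N^2 \times \left(\frac{HW}{N^2}\right)^2 = \frac{(HW)^2}{N^2}$$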
With this embodiment, the spatial local grouping strategy of the local feature aggregation stage effectively reduces the computational complexity and the size of the weight matrix. However, the strategy also leaves feature points "densely connected within a local region but unconnected across local regions", so feature aggregation can only take place inside a local region, which is very unfavorable for extracting global attention features.
Specifically, the local feature aggregation stage densely models the correlations between feature points within each local region at low spatial and computational complexity, but it loses the ability to model correlations between feature points in different local regions. The subsequent cross-local feature aggregation stage mainly addresses this missing cross-region correlation modeling, expanding the local correlation features extracted in the local stage to the whole feature map in order to extract global correlation features.
The embodiment of the invention therefore also introduces a cross-local feature aggregation stage, which takes the local reconstructed feature map $O_L(X)$ produced by the local feature aggregation stage as its input feature map. The computation is described in detail below.
Fig. 4 schematically shows a flow diagram of a cross-local feature aggregation phase according to an embodiment of the invention. FIG. 5 schematically illustrates a schematic diagram of a two-stage feature aggregation process according to an embodiment of the invention.
Referring to fig. 4 in conjunction with fig. 2 and fig. 5, in the embodiment of the present invention, the cross-local feature aggregation stage in the operation S110 may be specifically performed according to the following operations S1121 to S1125.
In operation S1121, convolutional layers are applied to the local reconstructed feature map $O_L(X)$ to perform separate feature mappings, yielding feature maps embedded in the feature space.
In this operation, the convolutional layers are again 1×1 convolutional layers; each embedded feature map is obtained by applying one of them to the local reconstructed feature map $O_L(X)$, and each 1×1 convolutional layer has its own parameters.
In operation S1122, each embedded feature map is divided along the spatial dimensions into $N \times N$ sub-regions whose feature points are indexed by $j$, where $1 \le j \le HW/N^2$ and $H$, $W$ denote the height and width of the input feature map $X$.
In this operation, the spatial regions are grouped again to generate $N^2$ sub-regions, each containing $HW/N^2$ feature points.
In operation S1123, the $j$-th feature point of the 1st sub-region is taken as a reference point $R_{p_j}$, and the feature points in the remaining $N^2-1$ sub-regions whose relative positions match that of $R_{p_j}$ are collectively called anchor points $A_{p_j}$. The reference point $R_{p_j}$ and its $N^2-1$ anchor points $A_{p_j}$ together form the $j$-th cross-region feature set, so the embedded feature map contains $HW/N^2$ cross-region feature sets.
In this operation, the $j$-th feature point of the first sub-region (the upper-left sub-region in FIG. 5) is taken as the reference point $R_{p_j}$ (Reference Point), and the feature points in the remaining $N^2-1$ sub-regions at the same relative position as $R_{p_j}$ are collectively called anchor points $A_{p_j}$ (Anchor Point). Each remaining sub-region contributes one anchor point corresponding to $R_{p_j}$, indexed by $k = 2, 3, \ldots, N^2$.
Continuing with FIG. 5: each remaining sub-region contains one anchor point corresponding to the reference point, so every reference point $R_{p_j}$ has $N^2-1$ anchor points $A_{p_j}$. The embodiment uses the reference point $R_{p_j}$ and its $N^2-1$ anchor points $A_{p_j}$ to form the $j$-th cross-region feature set, which contains $N^2$ feature points. Since each sub-region of the embedded feature map has $HW/N^2$ feature points, there are $HW/N^2$ cross-region feature sets.
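Collecting each reference point with its anchors amounts to grouping pixels by their position inside a sub-region, which is the block split from the local stage with its first two axes swapped. The sketch assumes row-major numbering of sub-regions and of positions within a sub-region; `cross_region_sets` is a hypothetical name.

```python
import numpy as np

def cross_region_sets(feat: np.ndarray, n: int) -> np.ndarray:
    """Group pixels sharing the same relative position inside each of the n*n sub-regions.

    Returns shape (HW/n^2, n*n, C): set j holds the reference point R_pj
    (member 0, from the top-left sub-region) followed by its n*n - 1 anchor points A_pj.
    """
    h, w, c = feat.shape
    bh, bw = h // n, w // n
    blocks = (feat.reshape(n, bh, n, bw, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(n * n, bh * bw, c))   # (sub-region, position in region, C)
    return blocks.transpose(1, 0, 2)           # (position in region j, sub-region, C)

H = W = 8
coords = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1)  # (row, col) per pixel
sets = cross_region_sets(coords, n=2)
print(sets.shape)        # (16, 4, 2): HW/N^2 = 16 sets of N^2 = 4 points
print(sets[5].tolist())  # [[1, 1], [1, 5], [5, 1], [5, 5]]
```

For $N = 2$ on an 8×8 map, set 5 pairs the reference point at (1, 1) with the anchors at the same offset in the other three sub-regions.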
In operation S1124, for any $j$-th cross-region feature set, a similarity computation is performed between two of its embedded features to obtain the $j$-th cross-region weight matrix $V_{G_j}(X)$, and $V_{G_j}(X)$ is then used to aggregate the remaining embedded feature, yielding the $j$-th cross-region reconstructed feature map $O_{G_j}(X)$.
Specifically, for any $j$-th cross-region feature set, the $j$-th cross-region weight matrix $V_{G_j}(X)$ is computed by matrix multiplication of two of its embedded features, analogously to the local stage.
Then, $V_{G_j}(X)$ is multiplied with the remaining embedded feature to compute the $j$-th cross-region reconstructed feature map $O_{G_j}(X)$.
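Operations S1122 to S1125 mirror the local stage, except that attention now runs inside each cross-region set. As before, the names `theta`, `phi` and `g` and the softmax normalization are assumptions, not details confirmed by the text above.

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def to_sets(feat, n):
    """(H, W, C) -> (HW/n^2 sets, n*n members, C); member 0 is the reference point."""
    h, w, c = feat.shape
    return (feat.reshape(n, h // n, n, w // n, c).transpose(0, 2, 1, 3, 4)
                .reshape(n * n, (h // n) * (w // n), c).transpose(1, 0, 2))

def from_sets(sets, h, w, n):
    """Inverse of to_sets."""
    c = sets.shape[-1]
    return (sets.transpose(1, 0, 2).reshape(n, n, h // n, w // n, c)
                .transpose(0, 2, 1, 3, 4).reshape(h, w, c))

def cross_region_aggregation(theta, phi, g, n, normalize=True):
    """Stage 2: attention among each reference point and its n*n - 1 anchor points."""
    h, w, _ = theta.shape
    t, p, v = (to_sets(a, n) for a in (theta, phi, g))   # each (HW/n^2, n*n, C')
    weights = t @ p.transpose(0, 2, 1)                   # one (n*n x n*n) V_Gj per set
    if normalize:                                        # softmax: an assumption
        weights = softmax(weights)
    return from_sets(weights @ v, h, w, n), weights      # O_Gj = V_Gj g_j, reassembled

rng = np.random.default_rng(0)
theta, phi, g = (rng.standard_normal((16, 16, 4)) for _ in range(3))
o_global, v_global = cross_region_aggregation(theta, phi, g, n=4)
print(o_global.shape, v_global.shape)   # (16, 16, 4) (16, 16, 16)
```

Here the stacked cross-region weights hold $(HW/N^2) \cdot N^4 = 16 \cdot 256 = 4096$ entries, matching $HW \cdot N^2$.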
In operation S1125, the cross-region reconstructed feature maps of the $HW/N^2$ cross-region feature sets are combined to generate the global reconstructed feature map $O_G(X)$ of the local reconstructed feature map $O_L(X)$.
In this operation, operation S1124 is repeated for all $HW/N^2$ cross-region feature sets to compute their cross-region reconstructed feature maps, which are finally combined into the global reconstructed feature map $O_G(X)$ of the local reconstructed feature map $O_L(X)$.
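In the above computation, each cross-region feature set contains $N^2$ feature points, so each $V_{G_j}(X)$ is an $N^2 \times N^2$ matrix, and the total parameter size of all cross-region weight matrices $V_G(X)$ is:
$$\frac{HW}{N^2} \times (N^2)^2 = HW \cdot N^2$$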
With this embodiment, the combined weight matrix parameter size of the two stages $V_L(X)$ and $V_G(X)$ in the TLA structure is:
$$\frac{(HW)^2}{N^2} + HW \cdot N^2$$
Continuing with FIG. 5, the embodiment takes an arbitrary reference point $R_{p_j}$ in the figure as an example to show how the TLA structure establishes correlations between $R_{p_j}$ and arbitrary feature points in different sub-regions. The feature of the $j$-th reference point $R_{p_j}$ and the features of its $N^2-1$ anchor points $A_{p_j}$ together form the $j$-th cross-region feature set.
In the local feature aggregation stage, correlations are established within each local feature group between every feature point and the anchor point $A_{p_j}$, and between every feature point and the reference point $R_{p_j}$ in its own group. In the cross-local feature aggregation stage, correlations are established between the reference point $R_{p_j}$ and the anchor points $A_{p_j}$ in the other sub-regions. The anchor points act as relays: the "anchor point to feature point" correlation and the "anchor point to reference point" correlation are combined into a "reference point to any feature point" correlation, so that the correlation between any two feature points in the feature map can be modeled.
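The relay argument can be checked numerically. The sketch below chains both stages with random 1×1-convolution embeddings and perturbs one input pixel; a far-away output pixel in a different sub-region changes, which a purely local stage could not produce. This is a hypothetical `TLASketch`, not the patented implementation: the three embeddings per stage, softmax normalization and the absence of any residual or output projection are all assumptions.

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def to_groups(x, n):
    """(H, W, C) -> (n*n sub-regions, HW/n^2 positions, C)."""
    h, w, c = x.shape
    return (x.reshape(n, h // n, n, w // n, c).transpose(0, 2, 1, 3, 4)
             .reshape(n * n, (h // n) * (w // n), c))

def from_groups(gr, h, w, n):
    """Inverse of to_groups."""
    c = gr.shape[-1]
    return (gr.reshape(n, n, h // n, w // n, c).transpose(0, 2, 1, 3, 4)
              .reshape(h, w, c))

def attend(t, p, v):
    """softmax(t p^T) v, batched over the leading axis."""
    return softmax(t @ p.transpose(0, 2, 1)) @ v

class TLASketch:
    """Hypothetical two-stage local attention block with random 1x1-conv embeddings."""

    def __init__(self, c_in, c_emb, n, seed=0):
        rng = np.random.default_rng(seed)
        self.n = n
        # Three bias-free 1x1 convs per stage: stage 1 acts on X, stage 2 on O_L(X).
        self.w1 = [rng.standard_normal((c_in, c_emb)) / np.sqrt(c_in) for _ in range(3)]
        self.w2 = [rng.standard_normal((c_emb, c_emb)) / np.sqrt(c_emb) for _ in range(3)]

    def __call__(self, x):
        h, w, _ = x.shape
        n = self.n
        # Stage 1: local feature aggregation inside each sub-region -> O_L(X).
        t, p, v = (to_groups(x @ wk, n) for wk in self.w1)
        o_local = from_groups(attend(t, p, v), h, w, n)
        # Stage 2: cross-local aggregation over reference/anchor sets -> O_G(X).
        t, p, v = (to_groups(o_local @ wk, n).transpose(1, 0, 2) for wk in self.w2)
        o_global = attend(t, p, v).transpose(1, 0, 2)
        return from_groups(o_global, h, w, n)

tla = TLASketch(c_in=8, c_emb=4, n=4)
x = np.random.default_rng(1).standard_normal((16, 16, 8))
x_perturbed = x.copy()
x_perturbed[0, 0] += 1.0                 # change only the top-left pixel
y, y_perturbed = tla(x), tla(x_perturbed)
print(y.shape)                           # (16, 16, 4)
# The bottom-right output lies in a different sub-region, yet it still changes:
print(np.abs(y_perturbed[15, 15] - y[15, 15]).max())
```

The change reaches (15, 15) because pixel (3, 3) of the top-left sub-region relays it: stage 1 mixes (0, 0) into (3, 3), and stage 2 links (3, 3) to (15, 15) as members of the same cross-region set.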
The weight matrix of the conventional Non-Local attention mechanism has $HW \times HW$ parameters, i.e. $O(HW \times HW)$, so the ratio of the TLA weight matrix size to that of the conventional attention mechanism is:
$$\frac{(HW)^2/N^2 + HW \cdot N^2}{(HW)^2} = \frac{1}{N^2} + \frac{N^2}{HW}$$
In practice, the hyperparameter $N$ is generally set to 4 to 8. Evaluating the ratio:
For an input feature map of size $H = W = 16$ and $N = 4$: $\frac{1}{16} + \frac{16}{256} = \frac{1}{8}$
For an input feature map of size $H = W = 16$ and $N = 8$: $\frac{1}{64} + \frac{64}{256} = \frac{17}{64} \approx \frac{1}{4}$
The TLA weight matrix parameters are therefore only about 1/8 to 1/4 of the original parameter count, a significant reduction that makes both the computation and the parameter count of the Non-Local attention mechanism lightweight.
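The ratio can be evaluated exactly with Python's `fractions` module, using the per-stage sizes derived above ($N^2$ local matrices of side $HW/N^2$ plus $HW/N^2$ cross-region matrices of side $N^2$). The function name is hypothetical.

```python
from fractions import Fraction

def tla_param_ratio(h: int, w: int, n: int) -> Fraction:
    """Ratio of TLA weight-matrix size to the dense Non-Local (HW x HW) size."""
    hw = h * w
    local = n**2 * Fraction(hw, n**2) ** 2   # N^2 local matrices, each (HW/N^2)^2 entries
    cross = Fraction(hw, n**2) * n**4        # HW/N^2 cross-region matrices, each N^2 x N^2
    return (local + cross) / hw**2

for n in (4, 8):
    r = tla_param_ratio(16, 16, n)
    print(f"N={n}: ratio = {r} ~ {float(r):.3f}")
# N=4: ratio = 1/8 ~ 0.125
# N=8: ratio = 17/64 ~ 0.266
```

Since the ratio is $1/N^2 + N^2/HW$, it is smallest when $N^4 = HW$; for a 16×16 feature map that is $N = 4$, which is why $N = 4$ beats $N = 8$ in this example.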
Compared with the conventional Non-Local attention mechanism, this embodiment has lower computational complexity. While network performance is kept consistent, the parameter count and computation are greatly reduced, lowering GPU memory usage during network training; depending on the parameter settings, the number of weight matrix parameters is compressed by a factor of several to several tens.
Compared with the conventional Non-Local attention mechanism, the weight matrix has fewer parameters and occupies less storage. The embodiment thus greatly improves the adaptability of the Non-Local structure to large feature maps and can serve as a dedicated segmentation method for wide-field space target images.
FIG. 6 schematically illustrates a schematic diagram of a spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the present invention.
As shown in FIG. 6, in the embodiment of the invention, the image semantic segmentation network in operation S120 is a fully convolutional network with ResNet-50 as the backbone, for example a network of the Encoder-decoder series or the DeepLab series.
Therefore, the TLA structure is embedded into the existing image semantic segmentation network to obtain the updated image semantic segmentation network.
Further, in operation S120, embedding the two-stage local attention mechanism structure into the image semantic segmentation network specifically comprises embedding it between the last convolutional layer of the fully convolutional network and the classifier to obtain global correlation features. The lightweight TLA structure thus aggregates the deep features of the space target into global correlation features, suppressing the adverse effects that uneven illumination, edge blurring and similar properties of space images have on space target segmentation.
Further, as shown in FIG. 6, the classifier then produces, through 8× upsampling, an output image of the same size as the space image to be processed.
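Where the TLA block sits, and why 8× upsampling restores the input size, can be traced with a small shape check. This sketch goes beyond what the text states and assumes that the ResNet-50 backbone runs at an output stride of 8 (a common setting for dilated backbones in DeepLab-style networks), that the TLA block preserves the spatial size of its input, and that the classifier keeps the feature-map size. The function name `trace_shapes` is hypothetical. It also makes explicit a constraint implied by the grouping: the deep feature map entering the TLA block must split evenly into N×N sub-regions.

```python
def trace_shapes(height: int, width: int, n: int, output_stride: int = 8):
    """Return (stage, (H, W)) pairs for backbone -> TLA -> classifier -> upsampling.

    Assumptions: backbone output stride 8; TLA and classifier keep spatial size.
    """
    if height % output_stride or width % output_stride:
        raise ValueError("input size must be divisible by the output stride")
    fh, fw = height // output_stride, width // output_stride
    if fh % n or fw % n:
        raise ValueError("deep feature map must split evenly into an N x N grid")
    return [
        ("input image", (height, width)),
        ("last conv layer of backbone", (fh, fw)),
        ("after TLA block", (fh, fw)),      # attention re-weights, size unchanged
        ("classifier logits", (fh, fw)),    # e.g. a 1x1 conv over the classes
        ("after 8x upsampling", (fh * output_stride, fw * output_stride)),
    ]


if __name__ == "__main__":
    for stage, size in trace_shapes(512, 512, n=8):
        print(f"{stage:28s} {size}")
```

With a 512×512 input, the TLA block would see a 64×64 map, which N = 8 divides into 8×8 sub-regions; a group count such as N = 6 would not divide it evenly and is rejected.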
Based on the method disclosed above, the present invention further provides a spatial object segmentation apparatus based on a non-local attention mechanism, which will be described in detail below with reference to fig. 7.
Fig. 7 schematically shows a block diagram of a spatial object segmentation apparatus based on a non-local attention mechanism according to an embodiment of the present invention.
As shown in fig. 7, the non-local attention mechanism-based spatial object segmentation apparatus 700 according to this embodiment includes a structure construction module 710, a structure embedding module 720, and a spatial object segmentation module 730.
The structure construction module 710 is configured to construct a two-stage local attention mechanism structure including a local feature aggregation stage and a cross-local feature aggregation stage, where the local feature aggregation stage is configured to perform spatial region grouping on a complete feature map embedded in a feature space, and extract local correlation features; and the cross-local feature aggregation stage is used for expanding local correlation features and extracting global correlation features.
The structure embedding module 720 is configured to embed the two-stage local attention mechanism structure into the image semantic segmentation network to obtain an updated image semantic segmentation network.
The space target segmentation module 730 is configured to acquire a space image to be processed, input it into the updated image semantic segmentation network, and output a segmentation result of the space target.
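The local and cross-local aggregation stages described above can be illustrated on a toy feature map. This is a minimal sketch under assumptions, not the patented implementation: the weight-matrix formulas referred to in the claims are not reproduced in this text, so the sketch assumes an embedded-Gaussian style similarity (softmax over dot products), replaces the learned convolutional embeddings with the identity, and omits any residual connection. The names `attend` and `two_stage_attention` are hypothetical. Stage 1 attends within each of the N×N sub-regions; stage 2 attends across the N² points that share the same relative position, i.e. a reference point and its N²−1 anchor points.

```python
import math

Vector = list[float]
FeatureMap = list[list[Vector]]  # fmap[y][x] is the channel vector at (y, x)


def attend(points: list[Vector]) -> list[Vector]:
    """Non-local aggregation within one group (assumed softmax of dot products)."""
    out = []
    for query in points:
        scores = [sum(a * b for a, b in zip(query, key)) for key in points]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(wt * value[c] for wt, value in zip(weights, points)) / total
            for c in range(len(query))
        ])
    return out


def two_stage_attention(fmap: FeatureMap, n: int) -> FeatureMap:
    """Local stage over N x N sub-regions, then cross-local stage over aligned points."""
    h, w = len(fmap), len(fmap[0])
    if h % n or w % n:
        raise ValueError("N must divide both H and W")
    bh, bw = h // n, w // n  # size of one sub-region

    def run_groups(src: FeatureMap, groups) -> FeatureMap:
        dst: FeatureMap = [[[] for _ in range(w)] for _ in range(h)]
        for coords in groups:
            for (y, x), v in zip(coords, attend([src[y][x] for y, x in coords])):
                dst[y][x] = v
        return dst

    # Stage 1: one group per sub-region, i.e. N^2 groups of HW/N^2 points.
    local_groups = [
        [(gy * bh + dy, gx * bw + dx) for dy in range(bh) for dx in range(bw)]
        for gy in range(n) for gx in range(n)
    ]
    # Stage 2: one group per relative offset (reference point + N^2 - 1 anchors),
    # i.e. HW/N^2 groups of N^2 points.
    cross_groups = [
        [(gy * bh + dy, gx * bw + dx) for gy in range(n) for gx in range(n)]
        for dy in range(bh) for dx in range(bw)
    ]
    return run_groups(run_groups(fmap, local_groups), cross_groups)


if __name__ == "__main__":
    fmap = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
    out = two_stage_attention(fmap, n=2)
    print([round(c, 3) for c in out[0][0]])
```

After both stages, every output position has indirectly drawn on every input position: stage 2 mixes one point from each sub-region, and each of those points already summarizes its whole sub-region from stage 1. With N = 1 the sketch reduces to a single full Non-Local aggregation, while larger N keeps every individual attention matrix small.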
It should be noted that the apparatus embodiment corresponds to the method embodiment and achieves similar technical effects; for details, refer to the method embodiment, which are not repeated here.
According to an embodiment of the present invention, any two or more of the structure construction module 710, the structure embedding module 720 and the spatial object segmentation module 730 may be combined into a single module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present invention, at least one of the structure construction module 710, the structure embedding module 720 and the spatial object segmentation module 730 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system in a package or an application specific integrated circuit (ASIC); by hardware or firmware in any other reasonable manner of integrating or packaging a circuit; by any one of software, hardware and firmware; or by any suitable combination of them. Alternatively, at least one of the structure construction module 710, the structure embedding module 720 and the spatial object segmentation module 730 may be implemented at least partially as a computer program module which, when executed, performs the corresponding function.
Fig. 8 schematically shows a block diagram of an electronic device adapted to implement the spatial object segmentation method based on a non-local attention mechanism according to an embodiment of the invention.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present invention includes a processor 801 which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or related chip sets and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)), among others. The processor 801 may also include on-board memory for caching purposes. The processor 801 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 803, various programs and data necessary for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of the present invention may be implemented in hardware and/or in software (including firmware, microcode, etc.). Furthermore, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. Further, the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A space target segmentation method based on a non-local attention mechanism is characterized by comprising the following steps:
constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for carrying out spatial region grouping on a complete feature map embedded in a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding the local correlation features and extracting global correlation features;
embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network;
acquiring a spatial image to be processed, inputting the spatial image to be processed into the updated image semantic segmentation network, and outputting a segmentation result of a spatial target;
wherein the local feature aggregation stage is performed in the following manner:
performing feature mapping on the input feature map $X$ with convolutional layers respectively, to obtain complete feature maps embedded in a feature space;
dividing the complete feature maps in the spatial dimension into $N \times N$ local feature groups, each local feature group including local sub-features, where $N$ is an adjustable number of local groups and $1 \le i \le N^2$;
for any $i$-th local feature group, performing a local similarity calculation on two of the local sub-features to obtain the $i$-th local weight matrix $V_{Li}(X)$, and performing feature aggregation of the $i$-th local weight matrix $V_{Li}(X)$ with another local sub-feature to obtain the $i$-th local reconstructed feature map $O_{Li}(X)$;
combining the local reconstructed feature maps of the $N^2$ local feature groups to generate the local reconstructed feature map $O_L(X)$ of the input feature map $X$;
wherein the local weight matrix $V_{Li}(X)$ and the $i$-th local reconstructed feature map $O_{Li}(X)$ are calculated according to the following formula:
2. The non-local attention mechanism-based spatial object segmentation method according to claim 1, wherein the cross-local feature aggregation stage is performed in the following manner:
performing feature mapping on the local reconstructed feature map $O_L(X)$ with convolutional layers respectively, to obtain embedded feature maps in a feature space;
dividing the embedded feature maps in the spatial dimension into $N \times N$ sub-regions, each sub-region having feature points indexed by $j$, where $1 \le j \le HW/N^2$, and $H$ and $W$ respectively denote the height and width of the input feature map $X$;
taking the $j$-th feature point in the 1st sub-region as a reference point $R_{pj}$, the feature points in the remaining $N^2-1$ sub-regions whose relative positions coincide with that of the reference point $R_{pj}$ are collectively called anchor points $A_{pj}$; the reference point $R_{pj}$ and the corresponding $N^2-1$ anchor points $A_{pj}$ together form the $j$-th cross-region feature set, so that the embedded feature maps contain $HW/N^2$ cross-region feature sets;
for any $j$-th cross-region feature set, performing a similarity calculation on two of its feature points to obtain the $j$-th cross-region weight matrix $V_{Gj}(X)$, and performing feature aggregation of the $j$-th cross-region weight matrix $V_{Gj}(X)$ with another feature point to obtain the $j$-th cross-region reconstructed feature map $O_{Gj}(X)$;
combining the cross-region reconstructed feature maps of the $HW/N^2$ cross-region feature sets to generate the global reconstructed feature map $O_G(X)$ of the local reconstructed feature map $O_L(X)$.
3. The non-local attention mechanism-based spatial object segmentation method of claim 2, wherein the $j$-th cross-region weight matrix $V_{Gj}(X)$ and the $j$-th cross-region reconstructed feature map $O_{Gj}(X)$ are calculated according to the following formula:
4. The non-local attention mechanism-based spatial object segmentation method according to claim 1, wherein the image semantic segmentation network comprises a fully convolutional neural network with ResNet-50 as a backbone network.
5. The spatial object segmentation method based on the non-local attention mechanism according to claim 4, wherein the embedding the two-stage local attention mechanism structure into the image semantic segmentation network specifically includes:
embedding the two-stage local attention mechanism structure between the last convolutional layer in the fully convolutional neural network and a classifier to obtain the global correlation feature.
6. The non-local attention mechanism-based spatial target segmentation method according to claim 5, wherein the classifier obtains, through 8× upsampling, an output image with the same size as the spatial image to be processed.
7. A spatial object segmentation apparatus based on a non-local attention mechanism, comprising:
the structure construction module is used for constructing a two-stage local attention mechanism structure comprising a local feature aggregation stage and a cross-local feature aggregation stage, wherein the local feature aggregation stage is used for carrying out space region grouping on a complete feature map embedded into a feature space and extracting local correlation features; the cross-local feature aggregation stage is used for expanding the local correlation features and extracting global correlation features;
the structure embedding module is used for embedding the two-stage local attention mechanism structure into an image semantic segmentation network to obtain an updated image semantic segmentation network;
the space target segmentation module is used for acquiring a space image to be processed, inputting the space image to be processed into the updated image semantic segmentation network and outputting a segmentation result of a space target;
wherein the local feature aggregation stage is performed in the following manner:
performing feature mapping on the input feature map $X$ with convolutional layers respectively, to obtain complete feature maps embedded in a feature space;
dividing the complete feature maps in the spatial dimension into $N \times N$ local feature groups, each local feature group including local sub-features, where $N$ is an adjustable number of local groups and $1 \le i \le N^2$;
for any $i$-th local feature group, performing a local similarity calculation on two of the local sub-features to obtain the $i$-th local weight matrix $V_{Li}(X)$, and performing feature aggregation of the $i$-th local weight matrix $V_{Li}(X)$ with another local sub-feature to obtain the $i$-th local reconstructed feature map $O_{Li}(X)$;
combining the local reconstructed feature maps of the $N^2$ local feature groups to generate the local reconstructed feature map $O_L(X)$ of the input feature map $X$;
wherein the local weight matrix $V_{Li}(X)$ and the $i$-th local reconstructed feature map $O_{Li}(X)$ are calculated according to the following formula:
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211050721.4A CN115131568B (en) | 2022-08-31 | 2022-08-31 | Space target segmentation method and device based on non-local attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211050721.4A CN115131568B (en) | 2022-08-31 | 2022-08-31 | Space target segmentation method and device based on non-local attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115131568A CN115131568A (en) | 2022-09-30 |
CN115131568B true CN115131568B (en) | 2022-12-27 |
Family ID: 83386946
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211050721.4A Active CN115131568B (en) | 2022-08-31 | 2022-08-31 | Space target segmentation method and device based on non-local attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115131568B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111797779A (en) * | 2020-07-08 | 2020-10-20 | Lanzhou Jiaotong University | Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion |
CN114549439A (en) * | 2022-02-11 | 2022-05-27 | North University of China | RGB-D image semantic segmentation method based on multi-modal feature fusion |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11087174B2 (en) * | 2018-09-25 | 2021-08-10 | Nec Corporation | Deep group disentangled embedding and network weight generation for visual inspection |
CN113486897A (en) * | 2021-07-29 | 2021-10-08 | Liaoning Technical University | Semantic segmentation method for convolution attention mechanism up-sampling decoding |
2022-08-31: application CN202211050721.4A (CN115131568B), CN, Active
Non-Patent Citations (1)
Title |
---|
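Research on Image Semantic Segmentation Method Based on Adaptive Non-Local Attention Network; Li Bingyi (李秉嶷); China Master's Theses Full-text Database, Information Science and Technology; 2022-03-15 (Issue 03); p. 5, third paragraph from the end; p. 7, paragraph 2; p. 18, paragraph 2 to p. 30, paragraph 1; p. 36, paragraphs 1-3; Fig. 3-1 * |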
Also Published As
Publication number | Publication date |
---|---|
CN115131568A (en) | 2022-09-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10740897B2 (en) | Method and device for three-dimensional feature-embedded image object component-level semantic segmentation | |
CN111369440B (en) | Model training and image super-resolution processing method, device, terminal and storage medium | |
CN109522874B (en) | Human body action recognition method and device, terminal equipment and storage medium | |
US20190355103A1 (en) | Guided hallucination for missing image content using a neural network | |
CN106991665B (en) | Parallel computing method based on CUDA image fusion | |
DE102019130889A1 (en) | ESTIMATE THE DEPTH OF A VIDEO DATA STREAM TAKEN BY A MONOCULAR RGB CAMERA | |
CN110659664B (en) | SSD-based high-precision small object identification method | |
DE102018113845A1 (en) | Systems and methods for training neural networks with sparse data | |
CN112861729B (en) | Real-time depth completion method based on pseudo-depth map guidance | |
CN107563405A (en) | Garage automatic Pilot semantic objects recognition methods based on multiresolution neutral net | |
CN107688783B (en) | 3D image detection method and device, electronic equipment and computer readable medium | |
CN115035295B (en) | Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function | |
CN110084181B (en) | Remote sensing image ship target detection method based on sparse MobileNet V2 network | |
DE102018128699A1 (en) | Adjusting an angular sampling rate during rendering using gaze information | |
US11682212B2 (en) | Hierarchical data organization for dense optical flow processing in a computer vision system | |
DE102018123761A1 (en) | FUSE PROTECTION IN AN ERROR CORRECTION CODE (ECC) IMPLEMENTED IN A MOTOR VEHICLE SYSTEM | |
DE112019001978T5 (en) | IMPROVING THE REALISM OF SCENES WITH WATER SURFACES DURING RENDERING | |
CN112991537A (en) | City scene reconstruction method and device, computer equipment and storage medium | |
CN114283217A (en) | Method, device and equipment for training reconstruction model of three-dimensional electron microscope image | |
CN114998671A (en) | Visual feature learning device based on convolution mask, acquisition device and storage medium | |
DE102020101525A1 (en) | BLIND-SPOT FOLDING ARCHITECTURES AND BAYESE IMAGE RECOVERY | |
CN114648640A (en) | Target object monomer method, device, equipment and storage medium | |
CN115131568B (en) | Space target segmentation method and device based on non-local attention mechanism | |
DE102021114013A1 (en) | TECHNIQUES FOR EFFICIENT SCANNING OF AN IMAGE | |
CN113837941A (en) | Training method and device for image hyper-resolution model and computer readable storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |