CN111915613A - Image instance segmentation method, device, equipment and storage medium - Google Patents


Publication number: CN111915613A (application CN202010798878.XA; granted publication CN111915613B)
Authority: CN (China)
Prior art keywords: feature map, feature, segmentation, image, level
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202010798878.XA
Other languages: Chinese (zh)
Other versions: CN111915613B (en)
Inventors: 杜吉祥, 程海峰, 张洪博, 翟传敏
Current and original assignee: Huaqiao University
Application filed by Huaqiao University; priority to CN202010798878.XA; published as CN111915613A, granted as CN111915613B.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06T 5/70
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Abstract

The invention provides an image instance segmentation method, device, equipment and storage medium. The method comprises the following steps: extracting features from the image to obtain a feature map; extracting multi-scale information from the feature map through a self-attention module to obtain feature maps of different scales, and performing dot multiplication with a feature map whose dimensionality has been adjusted by a 1 × 1 convolution, thereby introducing pixel-level attention to the feature map; compressing the spatial dimension of the feature map into real numbers and using them to weight the low-level feature map; fusing the weighted feature map with the high-level feature map to obtain a new feature map; classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps; and performing image segmentation based on the obtained target detection and semantic segmentation feature maps to obtain a segmentation result. Through the self-attention module and the path enhancement module, the invention provides better semantic guidance for subsequent feature fusion and reduces noise in the training and learning of the subsequent network.

Description

Image instance segmentation method, device, equipment and storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to an image instance segmentation method, device, equipment and storage medium.
Background
The image instance segmentation problem is an extremely important sub-problem in the field of computer vision. Its main purpose is to accurately extract objects of interest from an image, avoiding background interference and facilitating subsequent processing. Instance segmentation itself comprises two branch problems: target detection and semantic segmentation. Improvement in either branch is extremely important for the final result, so the improvements described herein start from these two aspects.
Early image segmentation did not distinguish between instance segmentation and semantic segmentation; its aim was to separate foreground from background. Early image segmentation techniques developed mainly in the following two directions:
Image segmentation algorithms based on thresholds
Threshold-based segmentation is one of the most commonly used image segmentation approaches. Its main idea is to compare the gray value of each pixel with a threshold and classify accordingly, yielding a final binary image that separates the target from the background. The emphasis of thresholding methods is therefore on finding the optimal gray-level threshold. The classical algorithm is Otsu's method (OTSU), which finds the optimal threshold by maximizing the variance between classes. Besides this, there are various algorithms such as the averaging method, the bimodal histogram method, and the iterative threshold method.
Thresholding methods are fast to compute and simple to implement, and they work well when the difference between target and background is large or the contrast is obvious. However, when the image content is complex or noise is severe, thresholding cannot achieve a satisfactory segmentation result.
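Otsu's method mentioned above can be sketched in a few lines. The toy image, variable names and threshold search below are illustrative only and are not part of the patent:

```python
import numpy as np

def otsu_threshold(gray):
    """Find the gray level that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # background mean
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1   # target mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A toy image: dark background (values near 30) and bright target (near 200).
img = np.array([[30, 32, 200, 202],
                [31, 29, 198, 201],
                [28, 33, 199, 203],
                [30, 31, 200, 198]], dtype=np.uint8)
t = otsu_threshold(img)
binary = (img >= t).astype(np.uint8)   # 1 = target, 0 = background
```

With well-separated intensity clusters the returned threshold falls between them, which is exactly the regime where, as the text notes, thresholding works well.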
Image segmentation algorithms based on edge information
Edges carry more information than raw pixel values: along an edge, features such as intensity and gradient change most severely, so segmentation based on edge information is one of the most natural and longest-studied approaches.
In general, pixels within a target are uniform or change gently, while the change between different objects is extremely sharp, so edge information is typically high-frequency information. One can use this property to compute all candidate boundaries in an image and segment according to them. Commonly used edge detection algorithms include Sobel and Canny. However, when the target's own pixel values vary greatly or contain many fine textures, a single edge detection algorithm can hardly recover the complete boundary of the target. Similarly, when the image suffers from severe noise, edge detection fails, because noise is unrelated to the image content yet is also high-frequency information, and it strongly affects detection.
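As an illustration of the edge-based approach (not part of the patent), a minimal Sobel gradient-magnitude sketch on a synthetic step edge; the unpadded convolution and test image are assumptions for the example:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via Sobel kernels (no padding: output is (H-2)x(W-2))."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                          # vertical gradient
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

# A vertical step edge: strong response only near the boundary column.
img = np.zeros((5, 6))
img[:, 3:] = 100.0
mag = sobel_magnitude(img)
```

The response is zero inside the uniform regions and large only at the intensity jump, matching the observation above that edges are high-frequency information; adding noise to `img` would raise responses everywhere, which is why noisy images defeat simple edge detectors.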
Early image segmentation algorithms focused mainly on surface information of images, such as pixels, edges and gradients. With the development of deep learning, neural networks can extract higher-level semantic information from images, which greatly expands the amount of information available to algorithms and advances image segmentation; segmentation then progressed in two directions, semantic segmentation and instance segmentation.
Semantic segmentation aims to classify every pixel in an image so as to distinguish objects of different classes in it. Current semantic segmentation algorithms have developed very rapidly, achieving end-to-end training and prediction with continually improving accuracy. The development of semantic segmentation with neural networks began with the FCN paper, which uses a conventional neural network as the encoding part to obtain a low-resolution but information-rich feature map, then decodes this feature map with transposed convolutions, and finally outputs a segmentation map with the same resolution as the input image. This was the first realization of end-to-end training, and it set the direction for semantic segmentation. Subsequent semantic segmentation papers largely kept this overall idea: the first half is an encoder, in which a convolutional neural network encodes the image information, and the second half is a decoder, which decodes the encoded feature map into the final result map. Later improvements also focused mainly on these two parts. On the encoder side, stronger networks such as GoogLeNet, ResNet and SENet are used; on the decoder side, better methods such as U-Net add skip connections and reuse intermediate features. Other methods improve processing details, such as the dilated (atrous) convolution proposed by DeepLab, which enlarges the receptive field and enriches the information in the feature map.
In the field of instance segmentation, the basic idea is target detection plus semantic segmentation. The early SDS method generated proposal regions using MCG and then segmented them in a bottom-up manner, but this is computationally expensive and generally inefficient. HyperColumns then introduced the hypercolumn concept into the classifier on the basis of SDS to refine the ROI. The subsequent CFM began to use masks to assist segmentation. Through continuous iterative development, Mask-RCNN by Kaiming He et al. advanced instance segmentation by a large step: it uses Faster-RCNN, the most accurate detector at the time, as the target detection part, and adds a semantic segmentation branch on this basis to segment each detected target region. Accuracy is thus ensured in both the target detection stage and the semantic segmentation stage. Later developments mainly address the remaining problems of Mask-RCNN, such as the fully connected information added by PANet to assist the segmentation branch, or the mechanism added by MS-RCNN to score the segmentation results.
Instance segmentation is used in a wide range of real-world scenarios, and because it interacts with users intuitively, higher-precision instance segmentation is demanded. Among current methods, Mask-RCNN is the closest to the present invention: in its semantic segmentation stage, the model uses four fully convolutional layers followed by a final deconvolution to produce the mask map. The present method differs in that a custom feature fusion module is used in the semantic segmentation stage, which exploits feature map information more efficiently, so the final mask map is richer in information and more accurate.
Disclosure of Invention
The invention aims to provide an image instance segmentation method, device, equipment and storage medium for realizing higher-precision instance segmentation.
In order to achieve the above object, an embodiment of the present invention provides an image instance segmentation method, including the following steps:
extracting features from the image input into the convolutional neural network to obtain five feature maps of different resolutions;
extracting multi-scale information of the feature map through a self-attention module to obtain feature maps of different scales;
based on the obtained feature maps of different scales, performing dot multiplication with the feature map whose dimensionality has been adjusted by a 1 × 1 convolution, thereby introducing pixel-level attention to the feature map;
compressing the spatial dimension of the feature map into real numbers, and weighting the low-level feature map with these real numbers;
fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps;
and carrying out image segmentation based on the obtained target detection and the semantic segmentation characteristic graph to obtain a segmentation result.
Furthermore, the multi-scale feature map is obtained through a U-shaped structure; the U-shaped structure fuses features of different pyramid scales, and the pyramid uses different convolution kernels to extract features and obtain the feature maps.
Further, the obtained feature maps of different scales are dot-multiplied with the feature map whose dimensionality has been adjusted by a 1 × 1 convolution, and the result is processed by global pooling to obtain the feature map with the strongest global information.
Further, the weighting process is as follows:
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

\tilde{u}_c(i, j) = z_c \cdot u_c(i, j)

wherein z_c is the real-number output for channel c, H and W are the height and width of the feature map, i and j index spatial positions, and u_c is channel c of the input feature map. The operation converts an H × W × C input feature map into a 1 × 1 × C output, each element of which represents the global information of the corresponding channel over the spatial dimensions.
The present invention also provides an image instance segmentation apparatus, comprising:
the first extraction module is used for extracting features from the image input into the convolutional neural network and acquiring five feature maps of different resolutions;
the second extraction module is used for extracting multi-scale information of the feature map through the self-attention module to obtain feature maps with different scales;
the dot multiplication module is used for performing dot multiplication between the obtained feature maps of different scales and the feature map whose dimensionality has been adjusted by a 1 × 1 convolution, so as to introduce pixel-level attention to the feature map;
the weighting module is used for compressing the spatial dimension of the feature map into real numbers and weighting the low-level feature map with these real numbers;
the fusion module is used for fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
the classification module is used for classifying the fused feature map through target detection and semantic segmentation to obtain a target detection and semantic segmentation feature map;
and the segmentation module is used for carrying out image segmentation on the acquired target detection and semantic segmentation characteristic graph to obtain a segmentation result.
Further, the multi-scale feature map is obtained through a U-shaped structure; the U-shaped structure fuses features of different pyramid scales, and the pyramid uses different convolution kernels to extract features and obtain the feature maps.
Further, the obtained feature maps of different scales are dot-multiplied with the feature map whose dimensionality has been adjusted by a 1 × 1 convolution, and the result is processed by global pooling to obtain the feature map with the strongest global information.
Further, the weighting process is as follows:
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

\tilde{u}_c(i, j) = z_c \cdot u_c(i, j)

wherein z_c is the real-number output for channel c, H and W are the height and width of the feature map, i and j index spatial positions, and u_c is channel c of the input feature map. The operation converts an H × W × C input feature map into a 1 × 1 × C output, each element of which represents the global information of the corresponding channel over the spatial dimensions.
The invention also provides an image example segmentation device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor is used for operating the computer program to realize the image example segmentation method.
The invention also provides a storage medium for storing at least one program and at least one instruction, wherein the at least one program and the instruction are executed to realize the image instance segmentation method.
The invention has the beneficial technical effects that:
the invention provides a self-attention module and a path enhancement structure in the characteristic fusion stage, plays a great role in improving the final result of the network, and can realize higher-precision example segmentation. The information level of the feature map of the highest layer in the backbone is improved through the self-attention module, multi-scale information and pixel level attention are introduced, and more excellent semantic guidance is provided for subsequent feature fusion. The path enhancement structure introduces attention of space dimensionality, global information obtained by the high-level feature map is used as weight to weight the low-level feature map, so that the network is emphasized in use of information, and certain noise influence is reduced for training and learning of a subsequent network.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating an image example segmentation method according to a first embodiment of the present invention.
Fig. 2 is a flowchart illustrating an image example segmentation method according to a first embodiment of the present invention.
Fig. 3 is a schematic diagram of a self-attention module according to a first embodiment of the invention.
Fig. 4 is a schematic diagram of a path enhancing structure according to a first embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the understanding of the present invention, the original Mask-RCNN is described first, in which a classic Feature Pyramid Network (FPN) merges the five feature maps of different resolutions extracted from the backbone. This fusion scheme has been widely applied since it was proposed and greatly improves network performance. It fuses high-level semantic information with low-level spatial information, balancing feature maps whose information is originally unbalanced. In ResNet, the fifth-layer feature map has the richest semantics and the strongest representation, but its spatial information is almost lost because its resolution is too small. By fusing it with lower-level feature maps, the feature pyramid compensates for this lack of spatial information to a certain extent, while its rich semantics provide semantic guidance for the low-level feature maps and reduce the noise in them.
Therefore, the function of the Feature Pyramid Network (FPN) is to guide low-level spatial information with high-level semantic information, reducing noise, improving the information quality of the feature maps and balancing the information differences between them. The key role is played by the feature map at the highest level of the network, whose semantic information is strongest but which, due to the design of the backbone, lacks multi-scale information. In the current structure, the pyramid can extract feature information at different scales, but the structure is flat and lacks global context information.
The early Feature Pyramid (FPN) fuses the feature maps so that their information is more balanced and noise is suppressed. However, the feature pyramid only fuses feature map information; it neither selects information nor applies any emphasis. As a result, noise and unimportant information are retained in the feature maps, which amounts to introducing noise interference into the network's later processing. If, while fusing the feature maps, their information could also be screened and weighted, suppressing unimportant and less important information and enhancing the weight of useful information, the effect of feature fusion would be more prominent, and subsequent processing in the network would also avoid noise interference.
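The pyramid-style merging discussed above can be illustrated with a minimal sketch. Nearest-neighbour upsampling and element-wise addition stand in for FPN's learned lateral 1 × 1 convolutions, and the shapes and channel count are arbitrary; this is an assumption-laden illustration, not the patented structure:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a C x H x W feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(features):
    """Merge a pyramid of feature maps: each lower level receives the
    upsampled merged map of the level above it.
    `features` is ordered low level (high resolution) to high level (low resolution)."""
    merged = [features[-1]]                       # start from the semantically richest map
    for f in reversed(features[:-1]):
        merged.append(f + upsample2x(merged[-1])) # inject high-level information downward
    return merged[::-1]                           # back to low-to-high order

c = 4
feats = [np.ones((c, 16, 16)), np.ones((c, 8, 8)), np.ones((c, 4, 4))]
out = fpn_topdown(feats)
```

Note that the addition treats every pixel and channel equally, which is exactly the "no selection, no emphasis" limitation the text identifies and the invention's weighting addresses.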
The following detailed description describes specific embodiments of the present invention.
As shown in fig. 1-2, a first embodiment of the present invention provides an image instance segmentation method, including the steps of:
s11, extracting features from the image through a convolutional neural network to obtain five feature maps of different resolutions;
s12, extracting multi-scale information of the feature map through a self-attention module to obtain feature maps with different scales;
in the present embodiment, as shown in fig. 3, the self-attention module can fuse three features with different pyramid scales through a U-shaped structure (e.g., a feature pyramid). To better extract the multi-scale context information of an image through a pyramid structure, the pyramid structure can use three different convolution kernels for feature extraction, 3 × 3, 5 × 5, and 7 × 7 respectively. The scale information of three different targets in large, medium and small sizes can be extracted from the original feature map through the three feature maps with different scales, the information level of the feature map is enriched, and meanwhile, the structure can bring pixel-level attention to the feature map. The self-attention module can introduce finer pixel-level attention information for the network while fusing multi-scale information, which can provide more accurate semantic guidance for the low-level feature map in subsequent feature fusion. It should be understood that the U-shaped structure may be of other types, and these embodiments are all within the scope of the present invention.
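The multi-scale branch plus pixel-level attention of this embodiment might be sketched as follows. Box filters stand in for the learned 3 × 3, 5 × 5 and 7 × 7 convolutions, addition stands in for the U-shaped fusion, and a sigmoid squashes the fused map into per-pixel weights; all of these substitutions are assumptions for illustration, not the patented module:

```python
import numpy as np

def box_conv(x, k):
    """Same-size k x k box filter per channel (stand-in for a learned k x k conv)."""
    pad = k // 2
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + k, j:j + k].mean(axis=(1, 2))
    return out

def pyramid_attention(feat):
    """Fuse three scale branches, squash to (0, 1), and apply the result
    as pixel-level attention on the input feature map."""
    multi = box_conv(feat, 3) + box_conv(feat, 5) + box_conv(feat, 7)
    attn = 1.0 / (1.0 + np.exp(-multi))   # sigmoid -> per-pixel attention weights
    return feat * attn                     # element-wise (dot) multiplication

feat = np.random.default_rng(0).standard_normal((2, 8, 8))
out = pyramid_attention(feat)
```

Because every attention weight lies in (0, 1), the output never exceeds the input in magnitude: the map re-weights pixels rather than adding new information, which is the sense in which the structure "brings pixel-level attention to the feature map".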
S13, based on the obtained feature maps of different scales, multiplying the feature maps by the feature map point obtained by adjusting the dimensionality through 1 × 1 convolution, thereby introducing pixel-level attention to the feature maps;
In this embodiment, the feature map from the last layer of ResNet-101 is fed into the module and split into two paths. One path extracts multi-scale information through a pyramid structure; the labels 4 × 4, 8 × 8 and 16 × 16 after each pyramid level denote the resolution of the feature map after processing at that level. The other path adjusts the dimensionality with a 1 × 1 convolution. Because the resolution of the input feature map is small, using large convolution kernels here does not introduce an excessive computational load to the network. The pyramid structure extracts information at different scales step by step and combines context information of different scales in a cascaded manner. After the two paths are processed, the two feature maps are dot-multiplied, and this step introduces pixel-level attention into the original feature map.
Preferably, the two feature maps are subjected to point multiplication and then subjected to global pooling, so that the feature maps contain the strongest global information, and the branch can provide the global information for the module, thereby further improving the performance of the module.
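The dimensionality-adjusting 1 × 1 convolution on the second path is simply a per-pixel linear map across channels, as the following sketch shows; the channel counts (256 in, 64 out) and random weights are illustrative assumptions, not values from the patent:

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution = the same linear map applied at every spatial position.
    x: (C_in, H, W); weight: (C_out, C_in)."""
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, -1)).reshape(weight.shape[0], h, w)

rng = np.random.default_rng(1)
x = rng.standard_normal((256, 4, 4))        # hypothetical top-level feature map
w = rng.standard_normal((64, 256)) * 0.01   # adjust 256 channels down to 64
y = conv1x1(x, w)
```

Because the kernel covers a single pixel, the cost is one small matrix multiply per position regardless of image content, which is why adjusting dimensionality this way adds little computation even on larger maps.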
S14, compressing the spatial dimension of the feature map into real numbers and weighting the low-level feature map with these real numbers;
In this embodiment, as shown in fig. 4, in the path enhancement structure the real numbers obtained by compressing the feature map have a global receptive field, and the output dimension matches the channel dimension of the input feature map. These values characterize the global distribution along the feature dimension, allowing the global receptive field to be passed to the next-level feature map; such global information is important for many tasks.
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

wherein z_c is the real-number output for channel c, H and W are the height and width of the feature map, i and j index spatial positions, and u_c is channel c of the input feature map. The operation converts an H × W × C input feature map into a 1 × 1 × C output, each element of which represents the global information of the corresponding channel over the spatial dimensions.
The high-level feature map is converted into corresponding global information through global pooling; these real numbers are equivalent to weight information extracted by the network. The low-level feature map is then dot-multiplied with them, weighting the low-level feature map. The calculation process is as follows:
\tilde{u}_c(i, j) = z_c \cdot u_c(i, j)
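The global-pooling squeeze and the channel-wise weighting described above can be sketched directly; the channel count and the constant values in the toy maps are illustrative assumptions:

```python
import numpy as np

def squeeze(u):
    """Global average pooling: for u of shape (C, H, W), produce one real
    number per channel, z_c = (1/(H*W)) * sum_{i,j} u_c(i, j)."""
    return u.mean(axis=(1, 2))                    # -> shape (C,)

def weight_low_level(low, high):
    """Weight the low-level map with global information squeezed from the high level."""
    z = squeeze(high)                             # (C,) global weights
    return low * z[:, None, None]                 # broadcast channel-wise multiplication

high = np.full((3, 2, 2), 2.0)
high[1] = 0.5                                     # channel 1 carries weaker global response
low = np.ones((3, 4, 4))
out = weight_low_level(low, high)
```

Channels whose high-level global response is strong are amplified in the low-level map and weak ones are suppressed, which is the "attention along the spatial dimension" that the path enhancement structure contributes.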
s15, fusing the weighted feature graph with the high-level feature graph to obtain a new feature graph corresponding to the low level;
In this embodiment, the above algorithm adds an adjustment along the spatial dimension: the high-level feature map with rich semantic information is used to weight the low-level feature map, providing guidance information for it with fairly simple processing. This fills the gap left by the original feature pyramid in the spatial dimension and improves the overall quality of the feature fusion process.
S16, classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps;
and S17, carrying out image segmentation based on the acquired target detection and the semantic segmentation characteristic graph to obtain a segmentation result.
A second embodiment of the present invention provides an image instance segmentation apparatus, as shown in the figure, comprising:
a first extraction module 110, for extracting features from the image input into the convolutional neural network and acquiring five feature maps of different resolutions;
a second extraction module 120, which is used for extracting the multi-scale information of the feature map through the self-attention module to obtain feature maps with different scales;
in this embodiment, as shown in fig. 3, the self-attention module can fuse three features with different pyramid scales through a U-shaped structure (e.g., a feature pyramid). To better extract the multi-scale context information of an image through a pyramid structure, the pyramid structure can use three different convolution kernels for feature extraction, 3 × 3, 5 × 5, and 7 × 7 respectively. The scale information of three different targets in large, medium and small sizes can be extracted from the original feature map through the three feature maps with different scales, the information level of the feature map is enriched, and meanwhile, the structure can bring pixel-level attention to the feature map. The self-attention module can introduce finer pixel-level attention information for the network while fusing multi-scale information, which can provide more accurate semantic guidance for the low-level feature map in subsequent feature fusion. It should be understood that the U-shaped structure may be of other types, and these embodiments are all within the scope of the present invention.
a dot multiplication module 130, for performing dot multiplication between the obtained feature maps of different scales and the feature map whose dimensionality has been adjusted by a 1 × 1 convolution, thereby introducing pixel-level attention to the feature map;
In this embodiment, the feature map from the last layer of ResNet-101 is fed into the module and split into two paths. One path extracts multi-scale information through a pyramid structure; the labels 4 × 4, 8 × 8 and 16 × 16 after each pyramid level denote the resolution of the feature map after processing at that level. The other path adjusts the dimensionality with a 1 × 1 convolution. Because the resolution of the input feature map is small, using large convolution kernels here does not introduce an excessive computational load to the network. The pyramid structure extracts information at different scales step by step and combines context information of different scales in a cascaded manner. After the two paths are processed, the two feature maps are dot-multiplied, and this step introduces pixel-level attention into the original feature map.
Preferably, the two feature maps are subjected to point multiplication and then subjected to global pooling, so that the feature maps contain the strongest global information, and the branch can provide the global information for the module, thereby further improving the performance of the module.
a weighting module 140, for compressing the spatial dimension of the feature map into real numbers and weighting the low-level feature map with these real numbers;
In this embodiment, as shown in fig. 4, the real numbers obtained by compressing the feature map have a global receptive field, and the output dimension matches the channel dimension of the input feature map. These values characterize the global distribution along the feature dimension, allowing the global receptive field to be passed to the next-level feature map; such global information is important for many tasks.
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

wherein z_c is the real-number output for channel c, H and W are the height and width of the feature map, i and j index spatial positions, and u_c is channel c of the input feature map. The operation converts an H × W × C input feature map into a 1 × 1 × C output, each element of which represents the global information of the corresponding channel over the spatial dimensions.
The high-level feature map is thus converted into corresponding global information through global pooling; these real numbers are equivalent to weight information extracted by the network. The low-level feature map is then point-multiplied with this result, thereby weighting the low-level feature map. The calculation process is as follows:
\tilde{x}_c = F_{scale}(x_c, z_c) = z_c \cdot x_c

wherein x_c denotes the c-th channel of the low-level feature map.
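The squeeze and weighting steps can be sketched directly from the two formulas above. This is an illustrative NumPy sketch under the assumption that the high-level and low-level maps share the same channel count; the function names are hypothetical, not from the patent.

```python
import numpy as np

def squeeze(u):
    """Global average pooling: (H, W, C) -> (C,) real-number array z_c,
    i.e. z_c = (1 / (H * W)) * sum_{i,j} u_c(i, j)."""
    H, W, C = u.shape
    return u.sum(axis=(0, 1)) / (H * W)

def weight_low_level(low, high):
    """Weight the low-level map with global info from the high-level map.

    low  : (H, W, C) low-level feature map (x_c per channel).
    high : (h, w, C) high-level feature map (same channel count assumed).
    """
    z = squeeze(high)   # one real number per channel, global receptive field
    return low * z      # channel-wise point multiplication: z_c * x_c

low = np.random.rand(8, 8, 4)
high = np.random.rand(4, 4, 4)
weighted = weight_low_level(low, high)
print(weighted.shape)  # (8, 8, 4)
```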
150: the fusion module is used for fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
In this embodiment, the algorithm adds an adjustment in the spatial dimension: the low-level feature map is weighted by the high-level feature map, which is rich in semantic information, so that comparatively simple processing provides guidance information for the low-level feature map. This fills the gap left by the original feature pyramid in the spatial dimension and improves the overall quality of the feature-fusion processing.
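The fusion step can be sketched as follows. This is an illustrative NumPy sketch: nearest-neighbour upsampling and element-wise addition are assumptions standing in for whatever interpolation and fusion operators the pipeline actually uses.

```python
import numpy as np

def fuse(weighted_low, high):
    """Fuse the weighted low-level map with the upsampled high-level map.

    weighted_low : (H, W, C) low-level map after channel-wise weighting.
    high         : (h, w, C) high-level map, with H and W integer
                   multiples of h and w (assumed here).
    """
    scale = weighted_low.shape[0] // high.shape[0]
    # Nearest-neighbour upsampling of the high-level map to (H, W, C).
    up = high.repeat(scale, axis=0).repeat(scale, axis=1)
    # Element-wise addition produces the new low-level feature map.
    return weighted_low + up

low = np.random.rand(8, 8, 4)
high = np.random.rand(4, 4, 4)
fused = fuse(low, high)
print(fused.shape)  # (8, 8, 4)
```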
160: the classification module is used for classifying the fused feature map through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps;
170: the segmentation module is used for performing image segmentation on the obtained target detection and semantic segmentation feature maps to obtain a segmentation result.
A third embodiment of the present invention provides an image instance segmentation apparatus, which includes a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to implement the image instance segmentation method.
The fourth embodiment of the present invention further provides a storage medium, where the storage medium stores a computer program, and the computer program can be executed by a processor of a device in which the storage medium is located, so as to implement the image instance segmentation method.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus and method embodiments described above are illustrative only. The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems which perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An image instance segmentation method is characterized by comprising the following steps:
extracting features from an image input into the convolutional neural network to obtain five feature maps with different resolutions;
extracting multi-scale information of the feature map through a self-attention module to obtain feature maps of different scales;
point-multiplying the obtained feature maps of different scales with the feature map whose dimensionality is adjusted by a 1 × 1 convolution, thereby introducing pixel-level attention to the feature map;
compressing the spatial dimension of the feature map, converting the feature map into real numbers, and weighting the low-level feature map based on the dimension and the real numbers;
fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps;
and performing image segmentation based on the obtained target detection and semantic segmentation feature maps to obtain a segmentation result.
2. The image instance segmentation method according to claim 1, wherein the multi-scale feature map is obtained by a U-shaped structure, the U-shaped structure fuses features of a plurality of different pyramid scales, and the pyramid extracts features using a plurality of different convolution kernels to obtain a feature map.
3. The image instance segmentation method according to claim 1, wherein the obtained feature maps of different scales are point-multiplied with the feature map whose dimensionality is adjusted by a 1 × 1 convolution, and the point-multiplied feature map is processed through global pooling to obtain the feature map with the strongest global information.
4. An image instance segmentation method as claimed in claim 1, wherein the weighting procedure is:
z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

\tilde{x}_c = F_{scale}(x_c, z_c) = z_c \cdot x_c
wherein z_c represents the output real-number array, H and W are the height and width of the feature map, i and j denote coordinates on the feature map, u_c is the c-th channel of the feature map, and x_c denotes the c-th channel of the low-level feature map; the H × W × C input feature map is converted into a 1 × 1 × C output, and the result represents the global information of each layer of the feature map in the spatial dimension.
5. An image instance segmentation apparatus, comprising:
the first extraction module is used for extracting features from an image input into the convolutional neural network to obtain five feature maps with different resolutions;
the second extraction module is used for extracting multi-scale information of the feature map through the self-attention module to obtain feature maps with different scales;
the dot multiplication module is used for point-multiplying the obtained feature maps of different scales with the feature map whose dimensionality is adjusted by a 1 × 1 convolution, so as to introduce pixel-level attention to the feature map;
the weighting module is used for compressing the spatial dimension of the feature map, converting the feature map into real numbers, and weighting the low-level feature map based on the dimension and the real numbers;
the fusion module is used for fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
the classification module is used for classifying the fused feature map through target detection and semantic segmentation to obtain a target detection and semantic segmentation feature map;
and the segmentation module is used for performing image segmentation on the obtained target detection and semantic segmentation feature maps to obtain a segmentation result.
6. The image instance segmentation device according to claim 5, wherein the multi-scale feature map is obtained by a U-shaped structure, the U-shaped structure fuses features of different pyramid scales, and the pyramid uses different convolution kernels to extract features to obtain a feature map.
7. The image instance segmentation device according to claim 5, wherein the obtained feature maps of different scales are point-multiplied with the feature map whose dimensionality is adjusted by a 1 × 1 convolution, and the point-multiplied feature map is processed through global pooling to obtain the feature map with the strongest global information.
8. An image instance segmentation apparatus as claimed in claim 5, wherein the weighting procedure is:
z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

\tilde{x}_c = F_{scale}(x_c, z_c) = z_c \cdot x_c
wherein z_c represents the output real-number array, H and W are the height and width of the feature map, i and j denote coordinates on the feature map, u_c is the c-th channel of the feature map, and x_c denotes the c-th channel of the low-level feature map; the H × W × C input feature map is converted into a 1 × 1 × C output, and the result represents the global information of each layer of the feature map in the spatial dimension.
9. An image instance segmentation apparatus comprising a memory in which a computer program is stored and a processor for running the computer program to implement an image instance segmentation method according to any one of claims 1 to 4.
10. A storage medium, characterized in that the storage medium stores a computer program executable by a processor of a device on which the storage medium is located to implement an image instance segmentation method according to any one of claims 1 to 4.
CN202010798878.XA 2020-08-11 2020-08-11 Image instance segmentation method, device, equipment and storage medium Active CN111915613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010798878.XA CN111915613B (en) 2020-08-11 2020-08-11 Image instance segmentation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010798878.XA CN111915613B (en) 2020-08-11 2020-08-11 Image instance segmentation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111915613A true CN111915613A (en) 2020-11-10
CN111915613B CN111915613B (en) 2023-06-13

Family

ID=73284821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010798878.XA Active CN111915613B (en) 2020-08-11 2020-08-11 Image instance segmentation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111915613B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446308A (en) * 2020-11-16 2021-03-05 北京科技大学 Semantic enhancement-based pedestrian detection method based on multi-scale feature pyramid fusion
CN112700462A (en) * 2020-12-31 2021-04-23 北京迈格威科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN112819032A (en) * 2021-01-11 2021-05-18 平安科技(深圳)有限公司 Multi-model-based slice feature classification method, device, equipment and medium
CN113159061A (en) * 2021-03-10 2021-07-23 哈尔滨工业大学 Actual tunnel surrounding rock fragment identification method based on example segmentation
CN113344935A (en) * 2021-06-30 2021-09-03 山东建筑大学 Image segmentation method and system based on multi-scale difficulty perception
CN113421267A (en) * 2021-05-07 2021-09-21 江苏大学 Point cloud semantic and instance joint segmentation method and system based on improved PointConv
CN113469942A (en) * 2021-06-01 2021-10-01 天津大学 CT image lesion detection method
WO2021208726A1 (en) * 2020-11-23 2021-10-21 平安科技(深圳)有限公司 Target detection method and apparatus based on attention mechanism, and computer device
CN113591859A (en) * 2021-06-23 2021-11-02 北京旷视科技有限公司 Image segmentation method, apparatus, device and medium
CN113610085A (en) * 2021-10-10 2021-11-05 成都千嘉科技有限公司 Character wheel image identification method based on attention mechanism
CN114511046A (en) * 2022-04-19 2022-05-17 阿里巴巴(中国)有限公司 Object recognition method and device
CN114913325A (en) * 2022-03-24 2022-08-16 北京百度网讯科技有限公司 Semantic segmentation method, device and computer program product
CN114972376A (en) * 2022-05-16 2022-08-30 北京医准智能科技有限公司 Coronary calcified plaque segmentation method, segmentation model training method and related device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866855A (en) * 2015-05-07 2015-08-26 华为技术有限公司 Image feature extraction method and apparatus
CN106157319A (en) * 2016-07-28 2016-11-23 哈尔滨工业大学 The significance detection method that region based on convolutional neural networks and Pixel-level merge
CN107330873A (en) * 2017-05-05 2017-11-07 浙江大学 Objective evaluation method for quality of stereo images based on multiple dimensioned binocular fusion and local shape factor
CN109525290A (en) * 2019-01-11 2019-03-26 电子科技大学 Real number feedback iteration channel estimation methods based on MIMO-FBMC system
CN110084210A (en) * 2019-04-30 2019-08-02 电子科技大学 The multiple dimensioned Ship Detection of SAR image based on attention pyramid network
CN110096994A (en) * 2019-04-28 2019-08-06 西安电子科技大学 A kind of small sample PolSAR image classification method based on fuzzy label semanteme priori
CN111242071A (en) * 2020-01-17 2020-06-05 陕西师范大学 Attention remote sensing image target detection method based on anchor frame
CN111292330A (en) * 2020-02-07 2020-06-16 北京工业大学 Image semantic segmentation method and device based on coder and decoder

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866855A (en) * 2015-05-07 2015-08-26 华为技术有限公司 Image feature extraction method and apparatus
CN106157319A (en) * 2016-07-28 2016-11-23 哈尔滨工业大学 The significance detection method that region based on convolutional neural networks and Pixel-level merge
CN107330873A (en) * 2017-05-05 2017-11-07 浙江大学 Objective evaluation method for quality of stereo images based on multiple dimensioned binocular fusion and local shape factor
CN109525290A (en) * 2019-01-11 2019-03-26 电子科技大学 Real number feedback iteration channel estimation methods based on MIMO-FBMC system
CN110096994A (en) * 2019-04-28 2019-08-06 西安电子科技大学 A kind of small sample PolSAR image classification method based on fuzzy label semanteme priori
CN110084210A (en) * 2019-04-30 2019-08-02 电子科技大学 The multiple dimensioned Ship Detection of SAR image based on attention pyramid network
CN111242071A (en) * 2020-01-17 2020-06-05 陕西师范大学 Attention remote sensing image target detection method based on anchor frame
CN111292330A (en) * 2020-02-07 2020-06-16 北京工业大学 Image semantic segmentation method and device based on coder and decoder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TING ZHAO, XIANGQIAN WU: "Pyramid Feature Attention Network for Saliency detection", CVPR, pages 1 - 10 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446308A (en) * 2020-11-16 2021-03-05 北京科技大学 Semantic enhancement-based pedestrian detection method based on multi-scale feature pyramid fusion
WO2021208726A1 (en) * 2020-11-23 2021-10-21 平安科技(深圳)有限公司 Target detection method and apparatus based on attention mechanism, and computer device
CN112700462A (en) * 2020-12-31 2021-04-23 北京迈格威科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN112819032B (en) * 2021-01-11 2023-10-27 平安科技(深圳)有限公司 Multi-model-based slice feature classification method, device, equipment and medium
CN112819032A (en) * 2021-01-11 2021-05-18 平安科技(深圳)有限公司 Multi-model-based slice feature classification method, device, equipment and medium
CN113159061A (en) * 2021-03-10 2021-07-23 哈尔滨工业大学 Actual tunnel surrounding rock fragment identification method based on example segmentation
CN113421267A (en) * 2021-05-07 2021-09-21 江苏大学 Point cloud semantic and instance joint segmentation method and system based on improved PointConv
CN113421267B (en) * 2021-05-07 2024-04-12 江苏大学 Point cloud semantic and instance joint segmentation method and system based on improved PointConv
CN113469942A (en) * 2021-06-01 2021-10-01 天津大学 CT image lesion detection method
CN113591859A (en) * 2021-06-23 2021-11-02 北京旷视科技有限公司 Image segmentation method, apparatus, device and medium
CN113344935A (en) * 2021-06-30 2021-09-03 山东建筑大学 Image segmentation method and system based on multi-scale difficulty perception
CN113610085B (en) * 2021-10-10 2021-12-07 成都千嘉科技有限公司 Character wheel image identification method based on attention mechanism
CN113610085A (en) * 2021-10-10 2021-11-05 成都千嘉科技有限公司 Character wheel image identification method based on attention mechanism
CN114913325A (en) * 2022-03-24 2022-08-16 北京百度网讯科技有限公司 Semantic segmentation method, device and computer program product
CN114511046A (en) * 2022-04-19 2022-05-17 阿里巴巴(中国)有限公司 Object recognition method and device
CN114972376A (en) * 2022-05-16 2022-08-30 北京医准智能科技有限公司 Coronary calcified plaque segmentation method, segmentation model training method and related device
CN114972376B (en) * 2022-05-16 2023-08-25 北京医准智能科技有限公司 Coronary calcified plaque segmentation method, segmentation model training method and related device

Also Published As

Publication number Publication date
CN111915613B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN111915613A (en) Image instance segmentation method, device, equipment and storage medium
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
KR20220066945A (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN105184763B (en) Image processing method and device
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN106156777B (en) Text picture detection method and device
CN113096140B (en) Instance partitioning method and device, electronic device and storage medium
CN111444919A (en) Method for detecting text with any shape in natural scene
CN116258719B (en) Flotation foam image segmentation method and device based on multi-mode data fusion
CN112365514A (en) Semantic segmentation method based on improved PSPNet
Tang et al. R²MRF: Defocus Blur Detection via Recurrently Refining Multi-Scale Residual Features
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111860683A (en) Target detection method based on feature fusion
CN114693924A (en) Road scene semantic segmentation method based on multi-model fusion
CN111553351A (en) Semantic segmentation based text detection method for arbitrary scene shape
CN112927209A (en) CNN-based significance detection system and method
CN111914947B (en) Image instance segmentation method, device, equipment and storage medium based on feature fusion
CN113591831A (en) Font identification method and system based on deep learning and storage medium
CN113591719A (en) Method and device for detecting text with any shape in natural scene and training method
CN112802039B (en) Panorama segmentation method based on global edge attention
CN113392854A (en) Image texture feature extraction and classification method
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN111738069A (en) Face detection method and device, electronic equipment and storage medium
JP2011170890A (en) Face detecting method, face detection device, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant