CN111915613B - Image instance segmentation method, device, equipment and storage medium


Info

Publication number
CN111915613B
Authority
CN
China
Prior art keywords
feature
feature map
feature images
images
segmentation
Prior art date
Legal status
Active
Application number
CN202010798878.XA
Other languages
Chinese (zh)
Other versions
CN111915613A
Inventor
杜吉祥
程海峰
张洪博
翟传敏
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202010798878.XA priority Critical patent/CN111915613B/en
Publication of CN111915613A publication Critical patent/CN111915613A/en
Application granted granted Critical
Publication of CN111915613B publication Critical patent/CN111915613B/en
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/70: Denoising; Smoothing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform


Abstract

The invention provides an image instance segmentation method, device, equipment and storage medium, wherein the method comprises the following steps: extracting features from the image to obtain a feature map; extracting multi-scale information from the feature map through a self-attention module to obtain feature maps of different scales, then point-multiplying them with a feature map whose dimensionality has been adjusted by a 1×1 convolution, thereby introducing pixel-level attention into the feature map; compressing the spatial dimensions of the feature map into per-channel real numbers and using them to weight the low-level feature map; fusing the weighted feature map with the high-level feature map to obtain a new feature map; classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps; and performing image segmentation based on the obtained target detection and semantic segmentation feature maps to obtain a segmentation result. Through the self-attention module and the path enhancement module, the invention provides better semantic guidance for subsequent feature fusion and reduces noise interference in the training and learning of subsequent networks.

Description

Image instance segmentation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method, apparatus, device, and storage medium for image instance segmentation.
Background
Image instance segmentation is an extremely important sub-problem in the field of computer vision. Its main purpose is to accurately extract objects of interest from an image, avoiding background interference and facilitating subsequent processing. Instance segmentation itself involves two sub-problems: target detection and semantic segmentation. Improvement in either component is crucial to the final result, so the improvements described here can come from both of these components.
Early image segmentation did not distinguish between instance segmentation and semantic segmentation; the aim was simply to separate the foreground from the background. Early image segmentation techniques developed mainly in the following two directions:
Threshold-based image segmentation algorithms
Thresholding is one of the most common image segmentation approaches. The main idea is to compare the gray value of each pixel with a threshold and classify pixels accordingly, yielding a binary image that separates the object from the background. The emphasis of thresholding methods is therefore on finding the optimal gray threshold. The classical algorithm is Otsu's method, which finds the best threshold by maximizing the variance between the different classes. Other algorithms include the mean method, the histogram bimodal method, and iterative threshold methods.
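The between-class-variance criterion of Otsu's method can be sketched in a few lines of NumPy. This is an illustrative implementation (function name and details are ours, not from the patent):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray threshold that maximizes between-class variance."""
    # Histogram over the 256 gray levels of an 8-bit image
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    cum_count = np.cumsum(hist)                     # pixels with value < t
    cum_mean = np.cumsum(hist * np.arange(256))     # running intensity sum
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0 = cum_count[t - 1] / total               # background weight
        w1 = 1.0 - w0                               # foreground weight
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = cum_mean[t - 1] / cum_count[t - 1]    # background mean
        mu1 = (cum_mean[-1] - cum_mean[t - 1]) / (total - cum_count[t - 1])
        var_between = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Applying `gray >= otsu_threshold(gray)` then yields the binary foreground mask described above.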
Thresholding is computationally fast and simple to implement, and works very well when the difference between target and background is large or the contrast is obvious. However, when the image itself is complex or suffers from severe occlusion or noise, thresholding cannot achieve a satisfactory segmentation result.
Image segmentation algorithm based on edge information
Edges carry more information than raw pixel values: at edges, the pixel values, gradients, and other characteristics change most sharply. Segmentation based on edge information is therefore one of the most intuitive and longest-studied approaches.
In general, pixels inside an object are uniform or change gently, while the change between different objects is extremely sharp, so edge information is usually high-frequency information. From this property one can compute all candidate boundaries in the image and segment accordingly. Common edge detection operators include Sobel and Canny. However, when an object's own pixel values vary widely or contain many fine textures, a single edge detection algorithm can hardly recover the object boundary completely. Similarly, when the image is heavily corrupted by noise, edge detection fails: noise is irrelevant to the image content yet is itself high-frequency, so it disturbs detection severely.
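As a concrete illustration of detecting edges from pixel gradients, here is a minimal Sobel gradient-magnitude sketch (naive loops for clarity; production code would use an optimized convolution routine):

```python
import numpy as np

# Standard Sobel kernels for horizontal and vertical gradients
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image via Sobel operators."""
    h, w = img.shape
    out = np.zeros((h, w))
    padded = np.pad(img.astype(float), 1, mode="edge")
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx = np.sum(patch * SOBEL_X)   # horizontal gradient
            gy = np.sum(patch * SOBEL_Y)   # vertical gradient
            out[i, j] = np.hypot(gx, gy)
    return out
```

A step edge produces a strong response only near the boundary, which is exactly the high-frequency behaviour the paragraph above describes.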
Early image segmentation algorithms focused mainly on surface information such as pixels, edges, and gradients. With the development of deep learning, neural networks can extract higher-level semantic information from images, which greatly expands the information available to an algorithm and has strongly advanced image segmentation, which then split into two directions: semantic segmentation and instance segmentation.
The purpose of semantic segmentation is simpler: classify every pixel in the image, thereby distinguishing objects of different classes. Semantic segmentation has developed very rapidly; models are now trained and predicted end-to-end, and accuracy records are continually refreshed. Its development on neural networks started with the FCN paper, which uses a conventional convolutional network as the encoder to obtain a low-resolution but information-rich feature map, decodes that feature map with transposed convolutions, and finally outputs a segmentation map of the same size as the input image. This achieved end-to-end training for the first time and set the direction of development for semantic segmentation. Subsequent semantic segmentation work keeps the same main idea: the first half is an encoder, a convolutional neural network that encodes the image information, and the second half is a decoder that decodes the encoded feature map into the final result map. Later improvements also focus on these two parts: the encoder side adopts stronger networks such as GoogLeNet, ResNet, and SENet, while the decoder side adds techniques such as the skip connections of U-Net, which reuse intermediate features. Other work improves processing details, such as the dilated (atrous) convolution proposed by DeepLab, which enlarges the receptive field and enriches the feature map information.
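The FCN-style encoder-decoder idea described above can be sketched as a toy PyTorch module. This is a minimal illustration under our own assumptions, not the actual FCN architecture:

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Toy encoder-decoder: strided convs downsample 4x, a transposed
    conv decodes back to the input resolution with per-pixel class scores."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # H/2
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # H/4
        )
        # Transposed convolution upsamples 4x back to the input size
        self.decoder = nn.ConvTranspose2d(32, num_classes, kernel_size=4, stride=4)

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

The output has one channel of logits per class at every pixel, matching the "classify all pixels" formulation of semantic segmentation.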
In the field of instance segmentation, the basic idea is target detection plus semantic segmentation. The early SDS (Simultaneous Detection and Segmentation) generated proposal regions with MCG and then segmented them bottom-up, but this was computationally expensive and the segmentation quality was mediocre. Hypercolumns were later introduced into the classifier on top of SDS, enabling refinement of the ROI. CFM then began using masks to assist segmentation. Through continuous iteration, the Mask R-CNN of He Kaiming finally advanced instance segmentation by a large step: it uses Faster R-CNN, the most accurate detector at the time, as the target detection part, and adds a semantic segmentation branch on top to segment each detected target region. Accuracy is thus ensured in both the target detection stage and the semantic segmentation stage. Subsequent development mainly improves on the problems of Mask R-CNN, such as the fully connected information added by PANet to assist the semantic segmentation branch, or the mechanism for scoring segmentation results added by MS R-CNN.
Instance segmentation is used in a very wide range of scenarios in actual production and life; it is more intuitive in interaction with the user and demands higher-precision segmentation. At present, Mask R-CNN is the method most similar to the present invention. In the semantic segmentation stage, that model processes features with a four-layer fully convolutional network followed by a final deconvolution to obtain the mask map. The present invention differs in that it uses a custom feature fusion module in the semantic segmentation stage, so the feature map information is used more efficiently and the final mask map is richer in information and more accurate.
Disclosure of Invention
The invention aims to provide an image instance segmentation method, an image instance segmentation device and a storage medium, which are used for realizing higher-precision instance segmentation.
To achieve the above object, an embodiment of the present invention provides an image instance segmentation method, including the steps of:
extracting features from an image input to a convolutional neural network, and obtaining five feature maps with different resolutions;
extracting multi-scale information of the feature images through a self-attention module to obtain feature images with different scales;
based on the obtained feature maps with different scales, point-multiplying them with a feature map whose dimensionality has been adjusted by a 1×1 convolution, thereby introducing pixel-level attention into the feature map;
compressing the spatial dimensions of the feature map into per-channel real numbers, and weighting the low-level feature map based on these real numbers;
fusing the weighted feature images with the high-level feature images to obtain new feature images corresponding to the low level;
classifying the fused feature images through target detection and semantic segmentation to obtain target detection and semantic segmentation feature images;
and performing image segmentation based on the obtained target detection and the semantic segmentation feature map to obtain a segmentation result.
Furthermore, the multi-scale feature map is obtained through a U-shaped structure which fuses features of several different pyramid scales, the pyramid using several different convolution kernels for feature extraction to obtain the feature map.
Further, the obtained feature maps with different scales are point-multiplied with the feature map whose dimensionality has been adjusted by a 1×1 convolution, and the product is then processed by global pooling to obtain the feature map with the strongest global information.
Further, the weighting process is as follows:
z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

\tilde{x}_c = F_{scale}(x_c, z_c) = z_c \cdot x_c
wherein z_c denotes the output real-number array, H and W are the height and width of the feature map, i and j are the feature map coordinates, and u_c is the c-th channel of the input feature map; the operation converts an H×W×C input feature map into a 1×1×C output, and this result represents the global information of each channel of the feature map over the spatial dimensions.
The invention also provides an image instance segmentation device, comprising:
the first extraction module is used for extracting features from an image input to the convolutional neural network and obtaining five feature maps with different resolutions;
the second extraction module is used for extracting multi-scale information of the feature images through the self-attention module to obtain feature images with different scales;
the point multiplication module is used for point-multiplying the obtained feature maps with different scales by the feature map whose dimensionality has been adjusted by a 1×1 convolution, so as to introduce pixel-level attention into the feature map;
the weighting module is used for compressing the spatial dimensions of the feature map into per-channel real numbers and weighting the low-level feature map based on these real numbers;
the fusion module is used for fusing the weighted feature images with the high-level feature images to obtain new feature images corresponding to the low level;
the classification module is used for classifying the fused feature images through target detection and semantic segmentation to obtain target detection and semantic segmentation feature images;
the segmentation module is used for performing image segmentation based on the obtained target detection and semantic segmentation feature maps, obtaining the segmentation result.
Further, the multi-scale feature map is obtained through a U-shaped structure, the U-shaped structure fuses features of various pyramid scales, and the pyramid uses various convolution kernels to perform feature extraction to obtain the feature map.
Further, the feature map based on the obtained feature maps with different scales is subjected to dot multiplication with the obtained feature map with dimension adjusted through 1×1 convolution, and the feature map after dot multiplication is subjected to global pooling processing to obtain the feature map with the strongest global information.
Further, the weighting process is as follows:
z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

\tilde{x}_c = F_{scale}(x_c, z_c) = z_c \cdot x_c
wherein z_c denotes the output real-number array, H and W are the height and width of the feature map, i and j are the feature map coordinates, and u_c is the c-th channel of the input feature map; the operation converts an H×W×C input feature map into a 1×1×C output, and this result represents the global information of each channel of the feature map over the spatial dimensions.
The invention also provides an image instance segmentation device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor is used for running the computer program to realize the image instance segmentation method.
The present invention also provides a storage medium storing at least one program and at least one instruction, the program and instructions being executed to implement the image instance segmentation method.
The beneficial technical effects of the invention are as follows:
the invention provides a self-attention module and a path enhancement structure in the feature fusion stage, plays a great role in the final result of the network, and can realize higher-precision instance segmentation. The self-attention module improves the information level of the highest-layer feature map in the backstone, introduces multi-scale information and pixel-level attention, and provides more excellent semantic guidance for subsequent feature fusion. The path enhancement structure introduces the attention of space dimension, and uses the high-level feature map to acquire global information as weight to weight the low-level feature map, so that the network has a certain emphasis on the use of information, and a certain noise influence is reduced for training and learning of the subsequent network.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart illustrating an image instance segmentation method according to a first embodiment of the present invention.
Fig. 2 is a flowchart illustrating an image instance segmentation method according to a first embodiment of the present invention.
Fig. 3 is a schematic diagram of a self-attention module according to a first embodiment of the present invention.
Fig. 4 is a schematic diagram of a path enhancement structure according to a first embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To facilitate an understanding of the present invention, the original Mask R-CNN is described below, in which a classical Feature Pyramid Network (FPN) is used to fuse the five feature maps of different resolutions extracted by the backbone. This fusion approach has been widely used since it was proposed and has indeed improved network performance. It fuses high-level semantic information with low-level spatial information, so that feature maps whose information was originally unbalanced become more balanced. In ResNet, the fifth-layer feature map has the richest semantic information and the strongest performance, but because its resolution is too small, its spatial information is almost lost. Fusing it with lower-level feature maps via the feature pyramid can compensate for this lack of spatial information to a certain extent. Its rich semantic information in turn provides semantic guidance for the low-level feature maps and reduces the noise they contain.
The Feature Pyramid Network (FPN) thus uses high-level semantic information to guide low-level spatial information, reducing noise, improving the information quality of the feature maps, and balancing the information differences between them. The key is the highest-level feature map of the network, whose semantic information is the strongest but which, due to the design of the backbone, carries no multi-scale information. In the current structure, the pyramid can extract feature information of different scales, but the structure is flat and lacks global context information.
The early Feature Pyramid Network (FPN) fuses the feature maps through a bottom-up structure, making the feature map information more balanced and suppressing noise. However, the feature pyramid only fuses the feature map information; it neither selects the information nor gives it any emphasis. As a result, noise and unimportant information in the feature maps are also preserved, which amounts to introducing noise interference for later processing in the network. If, while fusing the feature maps, their information could also be screened and weighted, suppressing unimportant and less important information and strengthening the weight of useful information, the effect of feature fusion would be more prominent and noise interference could be avoided in later processing.
Details of embodiments of the present invention are described below.
As shown in fig. 1-2, a first embodiment of the present invention provides an image instance segmentation method, including the steps of:
s11, extracting features from an image input to a convolutional neural network, and obtaining five feature maps with different resolutions;
s12, extracting multi-scale information of the feature map through a self-attention module to obtain feature maps with different scales;
in this embodiment, as shown in fig. 3, the self-attention module may merge three features with different pyramid dimensions through a U-shaped structure (e.g., feature pyramid). To better extract the multi-scale context information of the image through the pyramid structure, the pyramid structure can use three different convolution kernels for feature extraction, 3×3, 5×5, 7×7, respectively. The three different scale feature images can be used for extracting scale information of large, medium and small targets from the original feature images, enriching the information hierarchy of the feature images, and simultaneously, the structure can bring pixel-level attention to the feature images. The self-attention module can introduce finer pixel-level attention information to the network while fusing multi-scale information, which can provide more accurate semantic guidance for low-level feature graphs in subsequent feature fusion. Of course, it should be noted that other types of U-shaped structures may be used, and these solutions are all within the scope of the present invention.
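A hedged PyTorch sketch of such a self-attention module follows, assuming cascaded 3×3/5×5/7×7 convolutions for the pyramid branch, a 1×1-convolution branch for the point-wise product, and a global-pooling branch; all structural details here are our simplifications of the text, not the patent's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionModule(nn.Module):
    """Illustrative sketch: pyramid branch (3x3/5x5/7x7 cascade) multiplied
    point-wise with a 1x1-conv branch, plus a global-pooling context branch."""
    def __init__(self, channels):
        super().__init__()
        self.conv1x1 = nn.Conv2d(channels, channels, 1)           # dim adjust
        self.pyr3 = nn.Conv2d(channels, channels, 3, padding=1)   # small scale
        self.pyr5 = nn.Conv2d(channels, channels, 5, padding=2)   # medium scale
        self.pyr7 = nn.Conv2d(channels, channels, 7, padding=3)   # large scale
        self.global_branch = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        # Cascaded pyramid: each level adds context at a new scale,
        # fused U-style by addition
        p3 = F.relu(self.pyr3(x))
        p5 = F.relu(self.pyr5(p3))
        p7 = F.relu(self.pyr7(p5))
        pyramid = p3 + p5 + p7
        # Point-wise product with the 1x1 branch = pixel-level attention
        attended = pyramid * self.conv1x1(x)
        # Global pooling branch injects image-level context
        g = self.global_branch(x.mean(dim=(2, 3), keepdim=True))
        return attended + g
```

In the patent the pyramid levels also downsample (16×16, 8×8, 4×4) and upsample back; that resampling is omitted here for brevity.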
S13, based on the feature images with different scales, multiplying the feature images with feature image points with dimension adjusted through 1X 1 convolution, and thus introducing pixel-level attention to the feature images;
in this embodiment, the feature map of the last layer of ResNet-101 is fed into the module and split into two paths. One path extracts multi-scale information through the pyramid structure; the 4×4, 8×8, and 16×16 labels after each pyramid level denote the resolution of the feature map after that level of processing. The other path has its dimensionality adjusted by a 1×1 convolution. Because the input feature map resolution is already small, using large convolution kernels here does not introduce excessive computation into the network. The pyramid structure extracts information of different scales step by step and combines context of different scales precisely in a cascaded manner. After the two paths are processed, the two feature maps are point-multiplied, which introduces pixel-level attention into the original feature map.
Preferably, the global pooling processing can be performed after the dot multiplication of the two feature images, so that the feature images contain the strongest global information, and the branch can provide global information for the module, thereby further improving the performance of the module.
S14, compressing the spatial dimensions of the feature map into per-channel real numbers, and weighting the lower-layer feature map based on these real numbers;
in this embodiment, as shown in fig. 4, in the path enhancement structure the real numbers obtained by squeezing the feature map have a global receptive field, and their dimensionality matches the channel dimension of the input feature map. They can therefore characterize the global distribution over the feature dimension, and this global receptive field can be passed on to the next layer of feature maps; such global information is important for many tasks.
z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)
wherein z_c denotes the output real-number array, H and W are the height and width of the feature map, i and j are the feature map coordinates, and u_c is the c-th channel of the input feature map; the operation converts an H×W×C input feature map into a 1×1×C output, and this result represents the global information of each channel of the feature map over the spatial dimensions.
The high-level feature map is converted into corresponding global information through global pooling; these real numbers are equivalent to weight information extracted by the network. They are then point-multiplied with the bottom-layer feature map, weighting the lower-layer feature map, as computed by the following formula:
\tilde{x}_c = F_{scale}(x_c, z_c) = z_c \cdot x_c
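The squeeze-and-weight step above can be illustrated in NumPy with a channel-last layout; this is an illustrative sketch assuming the high- and low-level maps share the same channel count, as in an FPN where all levels have equal channels:

```python
import numpy as np

def squeeze(u):
    """Global average pooling: (H, W, C) feature map -> (1, 1, C) real numbers,
    matching z_c = (1/(H*W)) * sum_{i,j} u_c(i, j)."""
    h, w, c = u.shape
    return u.mean(axis=(0, 1)).reshape(1, 1, c)

def weight_low_level(high, low):
    """Weight a low-level map channel-wise by the global statistics of a
    high-level map (the patent's x~_c = z_c * x_c, broadcast over space)."""
    z = squeeze(high)
    return low * z   # broadcasting performs the point-wise multiplication
```

Each channel of the low-level map is scaled by one real number summarizing the corresponding high-level channel, which is how the global receptive field is passed down.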
s15, fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
in this embodiment, the algorithm above adds an adjustment in the spatial dimension, weighting the low-level feature map with the semantically rich high-level feature map and thus providing guidance information for the low-level feature map with fairly simple processing. It fills the gap of the original feature pyramid in the spatial dimension and improves the overall quality of the feature fusion processing.
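The subsequent fusion with the high-level map can likewise be sketched, assuming FPN-style nearest-neighbour upsampling plus element-wise addition (the exact fusion operator is our assumption, not stated in the patent):

```python
import numpy as np

def fuse_with_high_level(weighted_low, high):
    """Upsample the high-level map (H, W, C) to the low-level resolution by
    nearest-neighbour repetition, then add element-wise (FPN-style fusion)."""
    rh = weighted_low.shape[0] // high.shape[0]   # integer scale factors
    rw = weighted_low.shape[1] // high.shape[1]
    up = np.repeat(np.repeat(high, rh, axis=0), rw, axis=1)
    return weighted_low + up
```

The result is the "new feature map corresponding to the low level" of step S15, carrying both the weighted low-level detail and the upsampled high-level semantics.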
S16, classifying the fused feature images through target detection and semantic segmentation to obtain target detection and semantic segmentation feature images;
and S17, image segmentation is carried out based on the acquisition target detection and the semantic segmentation feature map, and a segmentation result is obtained.
A second embodiment of the present invention provides an image instance segmentation device, as shown in the figure, including:
110, a first extraction module for extracting features from an image input to the convolutional neural network and obtaining five feature maps with different resolutions;
the second extraction module is used for extracting multi-scale information of the feature map through the self-attention module to obtain feature maps with different scales;
in this embodiment, as shown in fig. 3, the self-attention module may merge three features with different pyramid dimensions through a U-shaped structure (e.g., feature pyramid). To better extract the multi-scale context information of the image through the pyramid structure, the pyramid structure can use three different convolution kernels for feature extraction, 3×3, 5×5, 7×7, respectively. The three different scale feature images can be used for extracting scale information of large, medium and small targets from the original feature images, enriching the information hierarchy of the feature images, and simultaneously, the structure can bring pixel-level attention to the feature images. The self-attention module can introduce finer pixel-level attention information to the network while fusing multi-scale information, which can provide more accurate semantic guidance for low-level feature graphs in subsequent feature fusion. Of course, it should be noted that other types of U-shaped structures may be used, and these solutions are all within the scope of the present invention.
A point multiplication module, configured to multiply the obtained feature map with the dimension adjusted by 1×1 convolution based on the obtained feature map with different dimensions, so as to introduce pixel-level attention to the feature map;
in this embodiment, the feature map of the last layer of ResNet-101 is fed into the module and split into two paths. One path extracts multi-scale information through the pyramid structure; the 4×4, 8×8, and 16×16 labels after each pyramid level denote the resolution of the feature map after that level of processing. The other path has its dimensionality adjusted by a 1×1 convolution. Because the input feature map resolution is already small, using large convolution kernels here does not introduce excessive computation into the network. The pyramid structure extracts information of different scales step by step and combines context of different scales precisely in a cascaded manner. After the two paths are processed, the two feature maps are point-multiplied, which introduces pixel-level attention into the original feature map.
Preferably, the global pooling processing can be performed after the dot multiplication of the two feature images, so that the feature images contain the strongest global information, and the branch can provide global information for the module, thereby further improving the performance of the module.
140: the weighting module is used for compressing the spatial dimensions of the feature map into per-channel real numbers and weighting the lower-layer feature map based on these real numbers;
in this embodiment, as shown in fig. 4, the real numbers obtained by converting the feature map have a global receptive field, and the number of output real numbers matches the number of input feature channels, so the real numbers can characterize the global distribution over the feature dimensions; the global receptive field can thus be passed on to the next layer of feature maps, and such global information is important for many tasks.
z_c = (1 / (H × W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)
wherein z_c represents the output real number array, H and W are the height and width of the feature map, (i, j) are coordinates on the feature map, and u_c is the c-th channel of the feature map. This converts an H×W×C input feature map into a 1×1×C output, and the result represents the global information of each channel of the feature map in the spatial dimension.
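A small numeric check of the squeeze formula above (array names are illustrative): global average pooling turns an H×W×C feature map into one real number per channel.

```python
# Global average pooling ("squeeze"): one real number z_c per channel.
import numpy as np

H, W, C = 4, 4, 3
u = np.arange(H * W * C, dtype=float).reshape(C, H, W)  # channels-first
z = u.mean(axis=(1, 2))   # z_c = (1/(H*W)) * sum_ij u_c(i, j)
print(z)  # [ 7.5 23.5 39.5]
```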
The high-level feature map is converted into corresponding global information through global pooling, and the resulting real numbers are equivalent to weight information extracted by the network. These real numbers are then dot-multiplied with the bottom-layer feature map to weight the lower-layer feature map, as calculated by the following formula:
x̃_c = z_c · x_c

wherein x_c is the c-th channel of the lower-layer feature map and x̃_c is the weighted result.
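Numerically, the weighting step amounts to channel-wise scaling (array names here are mine, not the patent's): each real number z_c scales the corresponding channel of the low-level feature map.

```python
# Channel-wise weighting of a low-level feature map by real numbers z_c.
import numpy as np

low = np.ones((3, 4, 4))           # low-level feature map, C x H x W
z = np.array([0.5, 1.0, 2.0])      # real numbers from the high level
weighted = z[:, None, None] * low  # broadcasts z_c over each channel
print(weighted[0, 0, 0], weighted[2, 0, 0])  # 0.5 2.0
```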
a fusion module, which is used for fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
in this embodiment, the above algorithm adds an adjustment in the spatial dimension: the high-level feature map, rich in semantic information, weights the low-level feature map, providing guidance information for it with fairly simple processing. This fills the gap left by the original feature pyramid in the spatial dimension and improves the overall quality of the feature fusion processing.
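Putting the weighting and fusion steps together, a minimal sketch (function and array names assumed; nearest-neighbour upsampling chosen here only for brevity) might look like:

```python
# Sketch of the fusion step: the low-level map, weighted by real numbers
# squeezed from the high-level map, is fused with the upsampled high-level map.
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling along both spatial axes.
    return x.repeat(2, axis=1).repeat(2, axis=2)

high = np.random.rand(3, 2, 2)   # high-level map, lower resolution
low = np.random.rand(3, 4, 4)    # low-level map
z = high.mean(axis=(1, 2))       # squeeze: one real number per channel
fused = z[:, None, None] * low + upsample2x(high)
print(fused.shape)  # (3, 4, 4)
```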
The classification module is used for classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps;
170: the segmentation module is used for performing image segmentation based on the obtained target detection and semantic segmentation feature maps, to obtain a segmentation result.
A third embodiment of the present invention provides an image instance segmentation apparatus, including a memory, and a processor, where the memory stores a computer program, and the processor is configured to run the computer program to implement the image instance segmentation method.
The fourth embodiment of the present invention further provides a storage medium, where a computer program is stored, where the computer program can be executed by a processor of a device where the storage medium is located, so as to implement the image instance segmentation method.
In the several embodiments provided by the embodiments of the present invention, it should be understood that the provided apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. An image instance segmentation method, comprising the steps of:
extracting features of an image input to a convolutional neural network, and obtaining five feature maps of different resolutions;
extracting multi-scale information from the feature maps through a self-attention module to obtain feature maps of different scales; the feature maps of different scales are obtained through a U-shaped structure, the U-shaped structure fuses features of different pyramid scales, and the pyramid performs feature extraction using different convolution kernels to obtain the feature maps;
based on the obtained feature maps of different scales, dot-multiplying them with the feature map whose dimension has been adjusted by a 1×1 convolution, thereby introducing pixel-level attention into the feature map; obtaining the feature map with the strongest global information through global pooling, on the basis of the feature maps of different scales and the feature map obtained through dimension adjustment by the 1×1 convolution; specifically, the feature map of the last layer of ResNet-101 is input to the algorithm module and split into two paths, wherein one path extracts multi-scale information through a pyramid structure, the 4×4, 8×8 and 16×16 after each pyramid level denoting the resolution of the feature map after that level's processing, and the other path adjusts the dimension through a 1×1 convolution; the pyramid structure extracts information of different scales step by step and combines context information of different scales in a cascaded manner; after the two paths of processing, the two feature maps are dot-multiplied, introducing pixel-level attention into the feature map; compressing the spatial dimensions of the feature map, converting the feature map into real numbers, and weighting the low-level feature map based on these real numbers;
fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level; the weighting process comprises:
z_c = (1 / (H × W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)

x̃_c = z_c · x_c

wherein z_c represents the output real number array, H and W are the height and width of the feature map, (i, j) represents coordinates on the feature map, and u_c is the c-th channel of the feature map; an H×W×C input feature map is converted into a 1×1×C output, the result representing the global information corresponding to each channel of the feature map in the spatial dimension;
classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps;
and performing image segmentation based on the obtained target detection and semantic segmentation feature maps to obtain a segmentation result.
2. An image instance segmentation apparatus, comprising:
the first extraction module is used for extracting features of an image input to a convolutional neural network and obtaining five feature maps of different resolutions;
the second extraction module is used for extracting multi-scale information from the feature maps through the self-attention module to obtain feature maps of different scales; the feature maps of different scales are obtained through a U-shaped structure, the U-shaped structure fuses features of different pyramid scales, and the pyramid performs feature extraction using different convolution kernels to obtain the feature maps;
the dot multiplication module is used for dot-multiplying the obtained feature maps of different scales with the feature map whose dimension has been adjusted by a 1×1 convolution, so as to introduce pixel-level attention into the feature map; the dot multiplication module is specifically used for dot-multiplying the obtained feature maps of different scales with the feature map obtained through dimension adjustment by the 1×1 convolution, and obtaining the feature map with the strongest global information by performing global pooling on the dot-multiplied feature map; specifically, the feature map of the last layer of ResNet-101 is input to the algorithm module and split into two paths, wherein one path extracts multi-scale information through a pyramid structure, the 4×4, 8×8 and 16×16 after each pyramid level denoting the resolution of the feature map after that level's processing, and the other path adjusts the dimension through a 1×1 convolution; the pyramid structure extracts information of different scales step by step and combines context information of different scales in a cascaded manner; after the two paths of processing, the two feature maps are dot-multiplied, introducing pixel-level attention into the feature map;
the weighting module is used for compressing the spatial dimensions of the feature map, converting the feature map into real numbers, and weighting the low-level feature map based on these real numbers; the weighting process comprises:
z_c = (1 / (H × W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)

x̃_c = z_c · x_c

wherein z_c represents the output real number array, H and W are the height and width of the feature map, (i, j) represents coordinates on the feature map, and u_c is the c-th channel of the feature map; an H×W×C input feature map is converted into a 1×1×C output, the result representing the global information corresponding to each channel of the feature map in the spatial dimension;
the fusion module is used for fusing the weighted feature map with the high-level feature map to obtain a new feature map corresponding to the low level;
the classification module is used for classifying the fused feature maps through target detection and semantic segmentation to obtain target detection and semantic segmentation feature maps;
the segmentation module is used for performing image segmentation based on the obtained target detection and semantic segmentation feature maps to obtain a segmentation result.
3. An image instance segmentation apparatus comprising a memory and a processor, the memory having a computer program stored therein, the processor being operable to execute the computer program to implement an image instance segmentation method as claimed in claim 1.
4. A storage medium storing a computer program executable by a processor of a device in which the storage medium is located to implement an image instance segmentation method according to claim 1.
CN202010798878.XA 2020-08-11 2020-08-11 Image instance segmentation method, device, equipment and storage medium Active CN111915613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010798878.XA CN111915613B (en) 2020-08-11 2020-08-11 Image instance segmentation method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111915613A CN111915613A (en) 2020-11-10
CN111915613B true CN111915613B (en) 2023-06-13

Family

ID=73284821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010798878.XA Active CN111915613B (en) 2020-08-11 2020-08-11 Image instance segmentation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111915613B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446308B (en) * 2020-11-16 2024-09-13 北京科技大学 Pedestrian detection method based on semantic enhancement multi-scale feature pyramid fusion
CN112396115B (en) * 2020-11-23 2023-12-22 平安科技(深圳)有限公司 Attention mechanism-based target detection method and device and computer equipment
CN112700462B (en) * 2020-12-31 2024-09-17 北京迈格威科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN112819032B (en) * 2021-01-11 2023-10-27 平安科技(深圳)有限公司 Multi-model-based slice feature classification method, device, equipment and medium
CN112906525B (en) * 2021-02-05 2024-10-18 广州市百果园信息技术有限公司 Age identification method and device and electronic equipment
CN113159061B (en) * 2021-03-10 2022-02-08 哈尔滨工业大学 Actual tunnel surrounding rock fragment identification method based on example segmentation
CN113421267B (en) * 2021-05-07 2024-04-12 江苏大学 Point cloud semantic and instance joint segmentation method and system based on improved PointConv
CN113469942B (en) * 2021-06-01 2022-02-22 天津大学 CT image lesion detection method
CN113591859A (en) * 2021-06-23 2021-11-02 北京旷视科技有限公司 Image segmentation method, apparatus, device and medium
CN113344935B (en) * 2021-06-30 2023-02-03 山东建筑大学 Image segmentation method and system based on multi-scale difficulty perception
CN113610085B (en) * 2021-10-10 2021-12-07 成都千嘉科技有限公司 Character wheel image identification method based on attention mechanism
CN114913325B (en) * 2022-03-24 2024-05-10 北京百度网讯科技有限公司 Semantic segmentation method, semantic segmentation device and computer program product
CN114511046A (en) * 2022-04-19 2022-05-17 阿里巴巴(中国)有限公司 Object recognition method and device
CN114972376B (en) * 2022-05-16 2023-08-25 北京医准智能科技有限公司 Coronary calcified plaque segmentation method, segmentation model training method and related device
CN118334322B (en) * 2024-06-12 2024-08-27 浙江大华技术股份有限公司 Camouflage target detection method, camouflage target detection device, computer equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866855A (en) * 2015-05-07 2015-08-26 华为技术有限公司 Image feature extraction method and apparatus
CN106157319A (en) * 2016-07-28 2016-11-23 哈尔滨工业大学 The significance detection method that region based on convolutional neural networks and Pixel-level merge
CN107330873A (en) * 2017-05-05 2017-11-07 浙江大学 Objective evaluation method for quality of stereo images based on multiple dimensioned binocular fusion and local shape factor
CN109525290A (en) * 2019-01-11 2019-03-26 电子科技大学 Real number feedback iteration channel estimation methods based on MIMO-FBMC system
CN110084210A (en) * 2019-04-30 2019-08-02 电子科技大学 The multiple dimensioned Ship Detection of SAR image based on attention pyramid network
CN110096994A (en) * 2019-04-28 2019-08-06 西安电子科技大学 A kind of small sample PolSAR image classification method based on fuzzy label semanteme priori
CN111242071A (en) * 2020-01-17 2020-06-05 陕西师范大学 Attention remote sensing image target detection method based on anchor frame
CN111292330A (en) * 2020-02-07 2020-06-16 北京工业大学 Image semantic segmentation method and device based on coder and decoder


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ting Zhao, Xiangqian Wu; "Pyramid Feature Attention Network for Saliency Detection"; CVPR; pp. 1-10 *


Similar Documents

Publication Publication Date Title
CN111915613B (en) Image instance segmentation method, device, equipment and storage medium
Tong et al. Saliency detection with multi-scale superpixels
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN111553837B (en) Artistic text image generation method based on neural style migration
CN105184763B (en) Image processing method and device
CN112150493B (en) Semantic guidance-based screen area detection method in natural scene
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN112800964B (en) Remote sensing image target detection method and system based on multi-module fusion
CN111462126A (en) Semantic image segmentation method and system based on edge enhancement
CN108898145A (en) A kind of image well-marked target detection method of combination deep learning
CN109978848B (en) Method for detecting hard exudation in fundus image based on multi-light-source color constancy model
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN112365514A (en) Semantic segmentation method based on improved PSPNet
CN105160355B (en) A kind of method for detecting change of remote sensing image based on region correlation and vision word
CN111914947B (en) Image instance segmentation method, device, equipment and storage medium based on feature fusion
CN106778768A (en) Image scene classification method based on multi-feature fusion
CN105046278B (en) The optimization method of Adaboost detection algorithm based on Haar feature
CN110706235A (en) Far infrared pedestrian detection method based on two-stage cascade segmentation
CN112802039B (en) Panorama segmentation method based on global edge attention
Song et al. Carvingnet: content-guided seam carving using deep convolution neural network
Al-Amaren et al. RHN: A residual holistic neural network for edge detection
CN113392854A (en) Image texture feature extraction and classification method
CN116071339A (en) Product defect identification method based on improved whale algorithm optimization SVM
CN114708615A (en) Human body detection method based on image enhancement in low-illumination environment, electronic equipment and storage medium
CN111242120B (en) Character detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant