WO2022217876A1 - Instance segmentation method and apparatus, electronic device, and storage medium (实例分割方法及装置、电子设备及存储介质)

实例分割方法及装置、电子设备及存储介质 (Instance segmentation method and apparatus, electronic device, and storage medium)

Info

Publication number
WO2022217876A1
Authority
WO
WIPO (PCT)
Prior art keywords
instance
mask
feature
semantic
stage
Prior art date
Application number
PCT/CN2021/124726
Other languages
English (en)
French (fr)
Inventor
张刚
李全全
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2022217876A1


Classifications

    • G06T7/11 — Image analysis: region-based segmentation (G06T — image data processing or generation, in general; G06T7/10 — segmentation; edge detection)
    • G06T7/187 — Segmentation; edge detection involving region growing, region merging, or connected component labelling
    • G06T2207/10004 — Still image; photographic image
    • G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06F18/241 — Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 — Neural networks: combinations of networks
    • G06N3/08 — Neural networks: learning methods
    • G06V10/28 — Image or video recognition: quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/751 — Matching: comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Definitions

  • The embodiments of the present disclosure relate to the technical field of instance segmentation, and relate to, but are not limited to, an instance segmentation method and apparatus, an electronic device, and a storage medium.
  • The goal of object detection and instance segmentation is to detect objects in images and to segment the objects' pixels.
  • High-quality instance segmentation requires a model that can not only segment objects in images, but also achieve high accuracy at the pixel level.
  • In related approaches, features are extracted for each object based on its detection box, and down-sampling operations are used during feature extraction to handle objects of different scales; the resulting loss of detail makes it difficult to achieve high instance segmentation accuracy.
  • To address this, the embodiments of the present disclosure provide a technical solution for instance segmentation.
  • Embodiments of the present disclosure provide an instance segmentation method, which includes:
  • acquiring first semantic information of an image to be processed, a first instance feature of an instance to be segmented in the image to be processed, and a first instance mask corresponding to the first instance feature; and performing at least two stages of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance mask;
  • wherein the instance feature output by the semantic fusion processing of a previous stage is up-sampled to obtain the instance feature of the next stage, a corresponding instance mask is obtained based on the instance feature of the next stage, and the instance feature, instance mask, and semantic information of the next stage are used as the input features of the next stage's semantic fusion processing; and, in the input features of each stage's semantic fusion processing, the resolution of the semantic information is the same as the resolution of the instance features.
  • In some embodiments, performing at least two stages of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain the second instance mask includes: performing a first-stage semantic fusion process based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance feature; and performing at least one further stage of semantic fusion processing based on the second instance feature, the stage instance mask corresponding to the second instance feature, and second semantic information to obtain the second instance mask, where the resolution of the second semantic information is the same as the resolution of the second instance feature.
  • In this way, multi-stage refinement is performed on the first semantic information, the first instance feature, and the first instance mask, with the resolution of each stage's output greater than that of the previous stage, so that a high-resolution instance mask is output for each instance to be segmented.
  • In some embodiments, performing the first-stage semantic fusion process based on the first semantic information, the first instance feature, and the first instance mask to obtain the second instance feature includes: fusing the first semantic feature in the first semantic information with the first instance feature to obtain a first fusion feature; and concatenating the first fusion feature, the first semantic mask in the first semantic information, and the first instance mask to obtain the second instance feature.
  • By using the semantic fusion module to fuse the features and masks of the instance to be segmented, a finer-grained second instance feature can be obtained.
  • In some embodiments, fusing the first semantic feature in the first semantic information with the first instance feature to obtain the first fusion feature includes: processing the first semantic feature and the first instance feature with a first convolution operation to obtain a first convolution feature; processing the first convolution feature with a plurality of second convolution operations, respectively, to obtain a plurality of second convolution results, where the convolution kernel of the first convolution operation is smaller than that of the second convolution operations and the holes (dilations) of the plurality of second convolution operations differ in size; and determining the first fusion feature based on the plurality of second convolution results. In this way, the obtained fusion feature fully retains the local detail of the instance to be segmented.
  • In some embodiments, performing at least one stage of semantic fusion processing based on the second instance feature, the stage instance mask corresponding to the second instance feature, and the second semantic information to obtain the second instance mask includes: performing a second-stage semantic fusion process on the second instance feature, the stage instance mask, and the second semantic information to obtain a third instance feature and a first hole mask corresponding to the third instance feature; determining a third instance mask based on the first hole mask and the stage instance mask; performing a third-stage semantic fusion process on the third instance feature, the first hole mask, and third semantic information to obtain a fourth instance feature and a second hole mask corresponding to the fourth instance feature; and determining the second instance mask based on the second hole mask and the third instance mask.
  • In this way, the edge region of the instance can be segmented more accurately, thereby greatly improving the segmentation effect.
  • In some embodiments, performing the second-stage semantic fusion process on the second instance feature, the stage instance mask, and the second semantic information to obtain the third instance feature and the first hole mask corresponding to the third instance feature includes: performing the second-stage semantic fusion process on the second instance feature, the stage instance mask, and the second semantic information to obtain the third instance feature; and predicting the edge region in the third instance feature to obtain the first hole mask.
  • In some embodiments, determining the third instance mask based on the first hole mask and the stage instance mask includes: determining an edge region in the stage instance mask; determining, based on the edge region and the first hole mask, an edge mask describing the edge region of the instance to be segmented; determining, based on the edge region and the stage instance mask, a non-edge mask describing the non-edge region of the instance to be segmented; and determining the third instance mask based on the non-edge mask and the edge mask.
  • In some embodiments, determining, based on the edge region and the first hole mask, the edge mask describing the edge region of the instance to be segmented includes: up-sampling the edge region in the stage instance mask based on the resolution of the first hole mask to obtain a first edge region; and obtaining the edge mask based on the first edge region and the first hole mask. In this way, by combining the first edge region of the stage instance mask with the first hole mask that predicts the edge region, the edge region of the instance to be segmented can be predicted more accurately.
  • In some embodiments, determining, based on the edge region and the stage instance mask, the non-edge mask describing the non-edge region of the instance to be segmented includes: up-sampling the stage instance mask based on the resolution of the first hole mask to obtain a magnified instance mask; performing an inversion operation on the first edge region to obtain an inversion mask; and obtaining the non-edge mask based on the inversion mask and the magnified instance mask.
  • In this way, the third instance mask can accurately describe the complete shape of the instance to be segmented.
  • In some embodiments, determining the edge region in the stage instance mask includes: determining the edge line of the instance to be segmented based on the stage instance mask; determining, in the image to be processed, a set of pixels whose minimum distance to the edge line is less than a preset distance; and determining the edge region in the stage instance mask based on the set of pixels. In this way, by analyzing the distance between each pixel and the edge line of the instance to be segmented, the detail of the edge region of the instance to be segmented can be more fully preserved.
  • In some embodiments, before determining the first semantic information of the image to be processed, the method further includes: performing feature extraction on the image to be processed using a feature map pyramid network to obtain an image feature set containing multiple image features at different resolutions; and determining the semantic information of the image to be processed based on a target image feature in the image feature set whose resolution satisfies a preset threshold. In this way, richer semantic information and more accurate instance features and instance masks can be obtained.
  • In some embodiments, determining the semantic information of the image to be processed based on the target image feature whose resolution satisfies the preset threshold includes: performing semantic segmentation on the image to be processed based on the target image feature to obtain semantic features; determining, based on the semantic features, the probability that each pixel in the image to be processed belongs to the instance to be segmented; determining a semantic mask of the image to be processed based on the probabilities; and using the semantic features and the semantic mask as the semantic information. In this way, semantic information rich in detail can be obtained.
  • In some embodiments, acquiring the first semantic information of the image to be processed, the first instance feature of the instance to be segmented in the image to be processed, and the first instance mask corresponding to the first instance feature includes: using a region-of-interest alignment operation to select, from the feature map set of the image to be processed, a first image feature satisfying a preset resolution; determining the first instance feature and the first instance mask based on the first image feature; and using the region-of-interest alignment operation to select, from the semantic information, the first semantic information whose resolution is the preset resolution.
  • In this way, by using the region-of-interest alignment operation to select semantic information, instance features, and instance masks at a matching resolution, the loss of detail can be further compensated.
  • Embodiments of the present disclosure provide an instance segmentation apparatus, the apparatus comprising:
  • a first acquisition module configured to acquire first semantic information of an image to be processed, a first instance feature of an instance to be segmented in the to-be-processed image, and a first instance mask corresponding to the first instance feature
  • a first processing module configured to perform at least two stages of semantic fusion processing based on the first semantic information, the first instance feature and the first instance mask to obtain a second instance mask;
  • wherein the instance feature output by the semantic fusion processing of a previous stage is up-sampled to obtain the instance feature of the next stage, a corresponding instance mask is obtained based on the instance feature of the next stage, and the instance feature, instance mask, and semantic information of the next stage are used as the input features of the next stage's semantic fusion processing; and, in the input features of each stage's semantic fusion processing, the resolution of the semantic information is the same as the resolution of the instance features.
  • the first processing module includes:
  • a first processing submodule configured to perform first-stage semantic fusion processing based on the first semantic information, the first instance feature and the first instance mask to obtain a second instance feature
  • a second processing submodule configured to perform at least one stage of semantic fusion processing based on the second instance feature, the stage instance mask corresponding to the second instance feature, and the second semantic information to obtain the second instance mask; wherein the resolution of the second semantic information is the same as the resolution of the second instance feature.
  • the first processing sub-module includes:
  • a first fusion unit configured to fuse the first semantic feature in the first semantic information with the first instance feature to obtain a first fusion feature
  • the first connecting unit is configured to connect the first fusion feature, the first semantic mask in the first semantic information, and the first instance mask to obtain the second instance feature.
  • the first fusion unit includes:
  • a first convolution subunit configured to use a first convolution operation to process the first semantic feature and the first instance feature to obtain a first convolution feature
  • a second convolution subunit configured to process the first convolution feature with a plurality of second convolution operations, respectively, to obtain a plurality of second convolution results; wherein the convolution kernel of the first convolution operation is smaller than that of the second convolution operations, and the holes of the plurality of second convolution operations differ in size;
  • the first determination subunit is configured to determine the first fusion feature based on the plurality of second convolution results.
  • the second processing sub-module includes:
  • a first processing unit configured to perform a second-stage semantic fusion process on the second instance feature, the stage instance mask, and the second semantic information to obtain a third instance feature and a first hole mask corresponding to the third instance feature;
  • a first determining unit configured to determine a third instance mask based on the first hole mask and the stage instance mask
  • a second processing unit configured to perform a third-stage semantic fusion process on the third instance feature, the first hole mask, and the third semantic information to obtain a fourth instance feature and a second hole mask corresponding to the fourth instance feature;
  • a second determination unit configured to determine the second instance mask based on the second hole mask and the third instance mask.
  • the first processing unit includes:
  • a first processing subunit configured to perform a second-stage semantic fusion process on the second instance feature, the stage instance mask, and the second semantic information to obtain the third instance feature
  • the first prediction subunit is configured to predict the edge region in the third instance feature to obtain the first hole mask.
  • the second determining unit includes:
  • a second determination subunit configured to determine the edge region in the stage instance mask
  • a third determining subunit configured to determine an edge mask describing the edge region of the instance to be segmented based on the edge region and the first hole mask
  • a fourth determination subunit configured to determine a non-edge mask describing the non-edge region of the instance to be segmented based on the edge region and the stage instance mask;
  • a fifth determination subunit is configured to determine the third instance mask based on the non-edge mask and the edge mask.
  • the third determination subunit is further configured to: up-sample the edge region in the stage instance mask based on the resolution of the first hole mask to obtain the first edge region; and obtain the edge mask based on the first edge region and the first hole mask.
  • the fourth determination subunit is further configured to: up-sample the stage instance mask based on the resolution of the first hole mask to obtain an enlarged instance mask; perform an inversion operation on the first edge region to obtain an inversion mask; and obtain the non-edge mask based on the inversion mask and the enlarged instance mask.
  • the second determination subunit is further configured to: determine the edge line of the instance to be segmented based on the stage instance mask; determine, in the image to be processed, a set of pixels whose minimum distance to the edge line is less than a preset distance; and determine the edge region in the stage instance mask based on the set of pixels.
  • the apparatus further includes:
  • a first extraction module configured to use a feature map pyramid network to perform feature extraction on the to-be-processed image to obtain an image feature set including multiple image features with different resolutions
  • the first determining module is configured to determine the semantic information of the to-be-processed image based on the target image feature whose resolution satisfies a preset threshold in the image feature set.
  • the first determining module includes:
  • a first segmentation sub-module configured to perform semantic segmentation on the to-be-processed image based on the target image feature to obtain semantic features
  • a first determination submodule configured to determine the probability that each pixel in the to-be-processed image belongs to the to-be-segmented instance based on the semantic feature
  • a second determination submodule configured to determine the semantic mask of the to-be-processed image based on the probability
  • the third determining submodule is configured to use the semantic feature and the semantic mask as the semantic information.
  • the first obtaining module includes:
  • a first alignment sub-module configured to use a region of interest alignment operation to select a first image feature that satisfies a preset resolution in the feature map set of the to-be-processed image
  • a fourth determination submodule configured to determine the first instance feature and the first instance mask based on the first image feature
  • the second alignment sub-module is configured to use the region of interest alignment operation to select the first semantic information whose resolution is the preset resolution in the semantic information.
  • An embodiment of the present disclosure provides a computer storage medium storing computer-executable instructions that, when executed, implement the steps of the above method.
  • An embodiment of the present disclosure provides a computer device including a memory and a processor, where the memory stores computer-executable instructions and the processor implements the steps of the above method when executing the computer-executable instructions stored on the memory.
  • An embodiment of the present disclosure provides a computer program product including computer-executable instructions that, when executed, implement the instance segmentation method described in any one of the foregoing.
  • Embodiments of the present disclosure provide an instance segmentation method and apparatus, electronic device, and storage medium.
  • First, the first semantic information of an image to be processed and the first instance feature and first instance mask of each instance to be segmented are acquired; then, based on the first semantic information, the first instance feature, and the first instance mask, at least two stages of semantic fusion processing are performed to obtain a second instance mask describing the image area where the instance to be segmented is located; the instance feature output by each stage's semantic fusion processing is up-sampled to obtain the instance feature of the next stage, and that instance feature, its corresponding instance mask, and the corresponding semantic information are used as the input features of the next stage's semantic fusion processing. In this way, the semantic information, instance features, and instance masks of the instance to be segmented are refined over multiple stages, and each stage receives both the instance features output by the previous stage and the detail supplemented by semantic segmentation, which can greatly improve the segmentation of the instance to be segmented.
  • FIG. 1A is a schematic diagram of a system architecture to which an instance segmentation method according to an embodiment of the present disclosure can be applied;
  • FIG. 1B is a schematic diagram of an implementation flowchart of an instance segmentation method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of another implementation of an instance segmentation method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of instance segmentation results in different manners provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic framework diagram of the refinement mask (RefineMask) provided by an embodiment of the present disclosure;
  • FIG. 5 is a schematic diagram of the composition and structure of a semantic fusion module provided by an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of the inference process of the second stage of instance segmentation according to an embodiment of the present disclosure;
  • FIG. 7 is a schematic diagram of an application scenario of an instance edge region provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural composition diagram of an instance segmentation apparatus according to an embodiment of the present disclosure;
  • FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
  • The terms "first/second/third" are used only to distinguish similar objects and do not denote a specific ordering of objects; it is understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein.
  • Semantic segmentation: segments all kinds of content in an image; for example, the outline of a person is marked in red and the road is marked in purple, but different people cannot be distinguished. It is equivalent to classifying each pixel of the image, with the outline of each labeled region matching the object's outer contour.
  • Instance segmentation: each pixel in the image is assigned both a category and a specific instance of that category. Instance segmentation therefore requires not only pixel-level classification but also distinguishing different instances within the same category. For example, if an image contains multiple people A, B, and C, their semantic segmentation results are all "person", but the instance segmentation results are different instances.
  • FPN: Feature Pyramid Networks.
  • The device provided by the embodiments of the present disclosure may be implemented as various types of user terminals, such as a notebook computer, a tablet computer, a desktop computer, a camera, or a mobile device (for example, a personal digital assistant, a dedicated messaging device, or a portable game device), and may also be implemented as a server. Exemplary applications for the device implemented as a terminal or a server are described below.
  • The embodiments of the present disclosure provide an instance segmentation method that can be applied to a computer device; the functions implemented by the method are realized by a processor in the computer device calling program code, and the program code can be stored in a computer storage medium. The computer device thus includes at least a processor and a storage medium.
  • FIG. 1A is a schematic diagram of a system architecture to which an instance segmentation method according to an embodiment of the present disclosure can be applied; as shown in FIG. 1A , the system architecture includes: an image acquisition terminal 11 , a network 12 and an instance segmentation terminal 13 .
  • the image acquisition terminal 11 and the instance segmentation terminal 13 may establish a communication connection through the network 12 , and the image acquisition terminal 11 reports the collected to-be-processed images to the instance segmentation terminal 13 through the network 12 .
  • For an image to be processed received by the instance segmentation terminal 13: first, the first semantic information of the image, the first instance feature of the instance to be segmented in the image, and the corresponding first instance mask are determined; then, multi-stage semantic fusion processing is performed on the determined first semantic information, first instance feature, and first instance mask to obtain a second instance mask; finally, the instance segmentation terminal 13 uploads the second instance mask, supplemented with detail, to the network 12, which sends it to the image acquisition terminal 11. In this way, by introducing semantic information matching the resolution of the first instance feature, detail can be supplemented when segmenting the instance to be segmented, greatly improving the segmentation effect.
  • the image acquisition terminal 11 may include an image acquisition device, and the instance segmentation terminal 13 may include a processing device or a remote server with information processing capabilities.
  • the network 12 can be wired or wireless.
  • When the instance segmentation terminal is a processing device, the image acquisition terminal 11 can communicate with the processing device through a wired connection, such as data communication through a bus; when the instance segmentation terminal 13 is a remote server, the image acquisition terminal 11 can exchange data with the remote server through a wireless network.
  • The image acquisition terminal 11 may be a vision processing device with an image acquisition module, implemented, for example, as a host with a camera.
  • the instance segmentation method of the embodiment of the present application may be executed by the instance segmentation terminal 13 , and the above-mentioned system architecture may not include the network and the image acquisition terminal 11 .
  • the embodiment of the present application provides an instance segmentation method, as shown in FIG. 1B , which is described in conjunction with the steps shown in FIG. 1B :
  • Step S101 acquiring first semantic information of the image to be processed, a first instance feature of an instance to be segmented in the image to be processed, and a first instance mask corresponding to the first instance feature.
  • the image to be processed may be an image including multiple instances or one instance to be segmented, an image with a complex appearance, or an image with a simple appearance.
  • The image to be processed may be an image collected by any collection device in any scene containing an instance to be segmented.
  • the to-be-segmented instance in the to-be-processed image can be any instance that matches the application scene. For example, if the application scenario is human body segmentation, the instance to be segmented is the human body in the image to be processed; if the application scenario is vehicle segmentation, the instance to be segmented is the vehicle in the image to be processed.
  • the semantic information of the image to be processed represents the category description of the image to be processed at the pixel level.
  • each pixel in the image is divided into a corresponding category, and a pixel-level classification result is obtained.
  • the first semantic information includes semantic features of the image to be processed and a semantic mask of the image to be processed.
  • the resolution of the first semantic information is the same as the resolution of the first instance feature; the first instance mask is used to describe the image area corresponding to the instance to be segmented, that is, to describe the complete shape of the instance to be segmented.
  • In some embodiments, in step S101, feature extraction is performed on the image to be processed using a feature map pyramid network to obtain richer semantic information and more accurate instance features and instance masks; that is, the above step S101 can be implemented through the following process:
  • Step S111 using a feature map pyramid network to perform feature extraction on the to-be-processed image to obtain an image feature set including multiple image features with different resolutions.
  • First, features of the image to be processed are extracted bottom-up; second, the extracted high-level feature maps are up-sampled in a top-down manner, with each low-resolution feature map up-sampled by a factor of 2 (for example, using nearest-neighbor up-sampling); third, through lateral connections, each up-sampled map is merged with the bottom-up feature map of the same size by element-wise addition. This process iterates until the finest-resolution map is generated, yielding the image feature set (see the sketch below).
  • multiple images of the image to be processed at different resolutions may be acquired, and feature extraction is performed on the multiple images to obtain an image feature set including multiple image features with different resolutions.
  • The number of different resolutions can be set to match the number of layers of the feature pyramid network; for example, if the feature pyramid network has 4 layers, 5 different resolutions, from large to small, can be set.
  • a fixed scaling ratio may be used to scale the image to be processed, so as to obtain multiple images with different resolutions.
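  • As a non-limiting illustration, the following is a minimal sketch of the top-down merge described above, written against torchvision's FeaturePyramidNetwork; the feature names, channel counts, and input sizes are assumptions made for the example and are not part of the disclosure:

```python
# Minimal FPN sketch (PyTorch/torchvision). All shapes are illustrative.
from collections import OrderedDict

import torch
from torchvision.ops import FeaturePyramidNetwork

# Bottom-up backbone features at strides 4, 8, 16, 32 (assumed shapes).
feats = OrderedDict([
    ("c2", torch.randn(1, 256, 200, 200)),
    ("c3", torch.randn(1, 512, 100, 100)),
    ("c4", torch.randn(1, 1024, 50, 50)),
    ("c5", torch.randn(1, 2048, 25, 25)),
])

# Internally, each coarser map is up-sampled by 2x and merged with the
# laterally connected same-size map by element-wise addition.
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048],
                            out_channels=256)
pyramid = fpn(feats)  # OrderedDict of 256-channel maps, one per level
print([(name, tuple(f.shape)) for name, f in pyramid.items()])
```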
  • Step S112 Determine the semantic information of the to-be-processed image based on the target image feature whose resolution meets a preset threshold in the image feature set.
  • the preset threshold may be a resolution threshold set according to the resolution size of each feature map in the image feature set; for example, in the image feature set, it is determined that the image feature with the highest resolution is the target image feature.
  • Through the target image feature, the semantic information of the full image to be processed is analyzed. In this way, by using the high-resolution target image feature as the input of the semantic segmentation branch network to predict the full-image semantic information of the image to be processed, more detail can be provided for subsequent instance segmentation.
  • Step S113 using a region of interest alignment operation to select a first image feature satisfying a preset resolution from the feature map set of the image to be processed.
  • The operation in step S113 may be a region-of-interest alignment (Region of Interest Align, RoI-Align) operation.
  • First, the feature map inside each detection box is taken from the feature map pyramid; then, the feature map corresponding to each detection box is resized to a preset size, yielding the first image feature that satisfies the preset resolution.
  • For example, a 14×14 region-of-interest alignment operation is used to produce a 14×14 feature map as the first image feature (see the sketch below).
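  • For illustration, the following is a hedged sketch of this step using torchvision's roi_align; the box coordinates, feature level, and stride are placeholders assumed for the example:

```python
# Extracting a fixed 14x14 instance feature with RoI-Align (PyTorch).
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 100, 100)  # one pyramid level, stride 8 (assumed)
# One detection box: (batch_index, x1, y1, x2, y2) in image coordinates.
boxes = torch.tensor([[0.0, 16.0, 24.0, 80.0, 96.0]])

# spatial_scale maps image coordinates onto this feature level;
# output_size pins every instance feature to the same 14x14 resolution.
inst_feat = roi_align(feature_map, boxes, output_size=(14, 14),
                      spatial_scale=1.0 / 8, sampling_ratio=2, aligned=True)
print(inst_feat.shape)  # torch.Size([1, 256, 14, 14])
```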
  • The instance segmentation branch network may be implemented by a fully convolutional instance segmentation network, such as a fully convolutional network (FCN), a Mask Region Convolutional Neural Network (Mask R-CNN), or an Efficient Residual Factorized ConvNet.
  • Step S114 determining the first instance feature and the first instance mask based on the first image feature.
  • The first instance feature is obtained by performing convolution operations on the first image feature that has been resized to the preset size; the complete shape of the instance to be segmented is then predicted based on the first instance feature to obtain the first instance mask. In this way, by using feature map pyramids with multiple resolutions to predict the first instance feature and the first instance mask of the instance to be segmented, the accuracy of the predicted first instance mask can be improved.
  • Step S115 using the region of interest alignment operation to select the first semantic information whose resolution is the preset resolution from the semantic information.
  • the first semantic information is obtained by selecting a first semantic feature and a first semantic mask with the same resolution as that of the first instance feature in the semantic information by using the same region of interest alignment operation.
  • In this way, a coarse but complete instance mask is obtained, so that subsequent stages can refine the segmentation based on this coarse, complete first instance mask and further compensate for lost detail.
  • Step S102 based on the first semantic information, the first instance feature and the first instance mask, perform at least two stages of semantic fusion processing to obtain a second instance mask.
  • Here, the instance feature obtained by up-sampling the instance feature output from the previous semantic fusion processing, together with its corresponding instance mask, is used as the input of the subsequent semantic fusion processing; and the resolution of the semantic information in the input features of each semantic fusion process is the same as the resolution of the instance features.
  • For example, the first semantic information, the first instance feature, and the first instance mask are used as the input of the first semantic fusion processing; the instance feature output by the first semantic fusion processing is then up-sampled to obtain a new instance feature, and this instance feature, its corresponding instance mask, and the semantic feature and semantic mask at the same resolution are used as the input of the second semantic fusion processing.
  • In this way, a second instance mask that accurately and completely describes the shape of the instance to be segmented is obtained.
  • That is, for an acquired image to be processed that includes an instance to be segmented, the semantic information, the first instance feature, and the first instance mask are subjected to multiple stages of semantic fusion processing to obtain the second instance mask.
  • The instance feature obtained by up-sampling the instance feature output from the previous semantic fusion processing, together with its corresponding instance mask, serves as the input features of the subsequent semantic fusion processing; by introducing semantic information that matches the resolution of the instance features, detail can be supplemented when segmenting the instance to be segmented, greatly improving the segmentation effect (a stage-by-stage sketch follows below).
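  • The following is a minimal, self-contained sketch of this stage-by-stage flow; the stand-in fusion layer, the random tensors substituting for the RoI-aligned semantic branch output, and the choice of three stages are assumptions for the example, not the disclosed modules:

```python
# Illustrative multi-stage refinement loop (PyTorch); shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Stage(nn.Module):
    """One refinement stage: fuse, up-sample 2x, re-predict the mask."""
    def __init__(self, channels=256):
        super().__init__()
        # Stand-in fusion: mixes instance/semantic features and both masks.
        self.fuse = nn.Conv2d(2 * channels + 2, channels, 3, padding=1)
        self.mask_head = nn.Conv2d(channels, 1, 1)

    def forward(self, inst_feat, sem_feat, inst_mask, sem_mask):
        x = torch.cat([inst_feat, sem_feat, inst_mask, sem_mask], dim=1)
        x = F.relu(self.fuse(x))
        # Up-sample so the next stage works at twice the resolution.
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        return x, torch.sigmoid(self.mask_head(x))

stages = nn.ModuleList(Stage() for _ in range(3))
inst_feat = torch.randn(1, 256, 14, 14)
inst_mask = torch.rand(1, 1, 14, 14)
for stage in stages:
    # Semantic feature/mask must be pooled at the current resolution;
    # random tensors stand in for the RoI-aligned semantic branch output.
    size = inst_feat.shape[-1]
    sem_feat = torch.randn(1, 256, size, size)
    sem_mask = torch.rand(1, 1, size, size)
    inst_feat, inst_mask = stage(inst_feat, sem_feat, inst_mask, sem_mask)
print(inst_mask.shape)  # torch.Size([1, 1, 112, 112])
```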
  • semantic segmentation is performed on the high-resolution image features in the input feature map pyramid to obtain semantic information describing the image to be processed, that is, the above step S112 can be implemented by the following steps:
  • the first step is to perform semantic segmentation on the to-be-processed image based on the target image features to obtain semantic features.
  • the target image features are input into the semantic segmentation branch network, which may include four convolutional layers to extract the semantic features of the entire image.
  • In the second step, based on the semantic features, the probability that each pixel in the image to be processed belongs to the instance to be segmented is determined.
  • A binary classifier predicts the probability that each pixel belongs to the instance, that is, the probability that each pixel in the image to be processed belongs to the instance to be segmented. For example, if the instance to be segmented is a vehicle, a binary classifier is used to predict the probability that each pixel belongs to a vehicle, so as to predict the semantic mask of the image to be processed.
  • a semantic mask of the image to be processed is determined, and the semantic feature and the semantic mask are used as the semantic information.
  • A high-resolution semantic mask of the entire image is predicted under the supervision of a binary cross-entropy loss.
  • In this way, the semantic features of the image are obtained by semantically segmenting the high-resolution image features, and the semantic mask of the image is predicted using a binary cross-entropy loss, yielding detailed semantic information (a minimal sketch of this branch follows below).
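  • A minimal sketch of such a semantic segmentation branch follows; the channel widths and the use of PyTorch are assumptions, while the four convolution layers and the binary cross-entropy supervision follow the description above:

```python
# Semantic branch sketch: 4 conv layers + per-pixel binary classifier.
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    def __init__(self, in_channels=256, mid_channels=256):
        super().__init__()
        layers = []
        for i in range(4):  # "four convolutional layers" per the text
            layers += [nn.Conv2d(in_channels if i == 0 else mid_channels,
                                 mid_channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(mid_channels, 1, 1)  # instance vs. background

    def forward(self, x):
        sem_feat = self.features(x)          # semantic features
        logits = self.classifier(sem_feat)   # per-pixel instance logits
        return sem_feat, logits

head = SemanticHead()
sem_feat, logits = head(torch.randn(1, 256, 200, 200))
# Supervise the full-image semantic mask with binary cross-entropy.
target = torch.randint(0, 2, (1, 1, 200, 200)).float()
loss = nn.functional.binary_cross_entropy_with_logits(logits, target)
```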
  • In some embodiments, multi-stage refinement is performed on the first semantic information, the first instance feature, and the first instance mask, with the resolution of each stage's output greater than that of the previous stage, so that a high-resolution instance mask can be output for each instance to be segmented. That is, the above step S102 can be implemented by the steps shown in FIG. 2, a schematic flowchart of another implementation of the instance segmentation method provided by an embodiment of the present disclosure; the following description is given in conjunction with the steps shown in FIGS. 1B and 2:
  • Step S201 based on the first semantic information, the first instance feature and the first instance mask, perform a first-stage semantic fusion process to obtain a second instance feature.
  • For example, the first instance feature may be a 14×14 instance feature, and the first instance mask is the 14×14 instance mask corresponding to the instance feature; the first semantic information is a 14×14 first semantic feature and a 14×14 first semantic mask.
  • Step S202 performing at least one-stage fusion processing based on the second instance feature, the stage instance mask corresponding to the second instance feature, and the second semantic information to obtain the second instance mask.
  • the resolution of the second semantic information is the same as the resolution of the second instance feature.
  • The second semantic information may be the semantic feature and semantic mask, at the same resolution as the second instance feature, selected by a region-of-interest alignment operation from the semantic information obtained by semantically segmenting the image to be processed through the semantic segmentation branch network. The second instance feature output from the first semantic fusion processing is up-sampled, and the up-sampled instance feature, the stage instance mask at the same resolution, and the second semantic information at the same resolution are input into the second-stage semantic fusion module, where the second semantic fusion processing is performed.
  • In this way, a hole mask describing the edge region of the instance to be segmented is obtained, and the hole mask is combined with the stage instance mask describing the complete shape to obtain the second instance mask, which can describe the complete shape of the instance to be segmented.
  • In some embodiments, the first fusion feature is obtained as follows; that is, the above step S201 can be implemented by the following steps:
  • Step S211 fuse the first semantic feature in the first semantic information with the first instance feature to obtain a first fused feature.
  • The first semantic information includes a first semantic feature and a first semantic mask; the first semantic feature has the same resolution as the first instance feature, and the first semantic mask has the same resolution as the first instance mask.
  • The semantic information output by the semantic segmentation branch network consists of semantic features and semantic masks. Since the resolution of the first instance feature is 14×14, a 14×14 RoI-Align operation is first used to select, from the semantic features output by the semantic segmentation branch network, the first semantic feature with a resolution of 14×14. Then, a first semantic mask matching the resolution of the first instance mask is determined: since the resolution of the first instance mask is 14×14, the 14×14 RoI-Align operation is used to select, from the semantic masks output by the branch network, the first semantic mask with a resolution of 14×14.
  • The fusion of the first semantic feature and the first instance feature may be performed by inputting both into convolution layers, convolving them with multiple convolutions of different coverage, and summing the results element by element to obtain the first fusion feature. In some possible implementations, this can be achieved by the following steps:
  • a first convolution operation is used to process the first semantic feature and the first instance feature to obtain a first convolution feature.
  • The first convolution operation may be a convolution network whose kernel is smaller than a certain threshold, for example, a 1×1 convolution layer.
  • The first semantic feature, the first instance feature, the first semantic mask, and the first instance mask are jointly input to the semantic fusion module; a 1×1 convolution layer convolves the input first semantic feature and first instance feature to obtain the first convolution feature.
  • multiple second convolution operations are respectively used to process the first convolution feature to obtain multiple second convolution results.
  • Here, the convolution kernel of the first convolution operation is smaller than that of the second convolution operations, and the holes (dilations) of the plurality of second convolution operations differ in size; that is, when the plurality of second convolution operations convolve the input features, each single convolution covers a different range. For example, three parallel 3×3 convolution layers with different hole sizes are used to process the first convolution feature to obtain multiple second convolution results.
  • the first fusion feature is obtained based on the plurality of second convolution results.
  • the multiple second convolution results are summed element by element to obtain the first fusion feature.
  • In this way, the input features are first convolved by a layer with a smaller kernel, which reduces the channel size; the features are then further processed by the parallel convolutions; finally, the multiple convolution results are fused, so that the obtained fusion feature fully retains the local detail of the instance to be segmented.
  • Step S212 connecting the first fusion feature, the first semantic mask in the first semantic information, and the first instance mask to obtain the second instance feature.
  • In the semantic fusion module, after the input first semantic feature and first instance feature are fused, the resolutions of the first semantic mask and first instance mask input to the module are first enlarged; then, the first fusion feature obtained by fusion, the resolution-enlarged semantic mask, and the instance mask are concatenated, in order from front to back or in any order, to obtain the second instance feature (a sketch of such a module follows below).
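  • The following hedged sketch shows one plausible shape for such a semantic fusion module; the dilation rates (1, 3, 5) and channel widths are illustrative assumptions, not values fixed by the disclosure:

```python
# Semantic fusion module sketch: 1x1 conv, three parallel dilated 3x3
# convs summed element-wise, then concatenation with both masks.
import torch
import torch.nn as nn

class SemanticFusionModule(nn.Module):
    def __init__(self, channels=256, dilations=(1, 3, 5)):
        super().__init__()
        # Small-kernel convolution first, reducing the channel size.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Parallel 3x3 convolutions with different hole (dilation) sizes.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, inst_feat, sem_feat, inst_mask, sem_mask):
        x = self.relu(self.reduce(torch.cat([inst_feat, sem_feat], dim=1)))
        # Element-wise summation of the parallel convolution results
        # yields the first fusion feature.
        fused = self.relu(sum(branch(x) for branch in self.branches))
        # Concatenate the fusion feature with both masks to obtain the
        # next-stage instance feature.
        return torch.cat([fused, sem_mask, inst_mask], dim=1)

sfm = SemanticFusionModule()
out = sfm(torch.randn(1, 256, 14, 14), torch.randn(1, 256, 14, 14),
          torch.rand(1, 1, 14, 14), torch.rand(1, 1, 14, 14))
print(out.shape)  # torch.Size([1, 258, 14, 14])
```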
  • In some embodiments, the second-stage semantic fusion processing is performed on the second instance feature, the stage instance mask, and the second semantic information, with attention paid to the edge region of the instance to be segmented, so that the edge region of the instance can be segmented more accurately and the segmentation effect greatly improved; that is, the above step S202 can be realized by the following steps:
  • Step S221: performing a second-stage semantic fusion process on the second instance feature, the stage instance mask, and the second semantic information to obtain a third instance feature and a first hole mask corresponding to the third instance feature.
  • the resolution of the second instance feature is greater than the resolution of the first instance feature.
  • The edge area is an area composed of pixels whose distance to the edge line of the instance to be segmented is less than a certain distance threshold.
  • For example, the edge area is an image area centered on the edge line of the vehicle in the image to be processed, including part of the foreground (i.e., the vehicle image area) and part of the background.
  • Step S222 based on the first hole mask and the stage instance mask, determine a third instance mask.
  • The first hole mask and the stage instance mask are input into a boundary-aware refinement module; in this module, the predicted first hole mask is combined with the stage instance mask, which can predict the complete shape of the instance to be segmented, to obtain the third instance mask.
  • In step S222, by combining the full-shape first instance mask output by the first stage with the first hole mask describing the edge region output by the second stage, a third instance mask that describes the complete shape more precisely is obtained; that is, the above step S222 can be achieved by the following steps:
  • In the first step, the edge region in the stage instance mask is determined, i.e., the edge region of the instance's shape in the image to be processed is predicted.
  • First, the edge line of the instance to be segmented is determined based on the stage instance mask; for example, the edge line is determined by analyzing the complete shape of the instance to be segmented represented by the first instance mask.
  • Then, in the image to be processed, a set of pixels whose minimum distance to the edge line is less than a preset distance is determined; for example, for each pixel in the image to be processed, the distance to the nearest point on the edge line is determined, and the pixels whose distance is less than the preset distance form the pixel set.
  • Finally, the edge region in the stage instance mask is determined based on the pixel set. For example, the pixels in the set are fitted to form an image area, i.e., the edge area; the edge area includes both the part of the image where the edge line borders the background and the part where it borders the instance itself. In this way, by analyzing the distance between each pixel and the edge line of the instance to be segmented, the detail of the edge region can be more fully preserved (a small sketch of this band extraction follows below).
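  • A small illustrative sketch of extracting such an edge band with a distance transform follows; the one-pixel erosion used to recover the edge line and the band width are assumptions for the example:

```python
# Edge band = pixels whose distance to the edge line is below a threshold.
import numpy as np
from scipy import ndimage

def edge_band(mask: np.ndarray, width: int = 2) -> np.ndarray:
    """mask: binary HxW instance mask; returns a binary edge-region mask."""
    # The edge line: foreground pixels removed by a one-pixel erosion.
    eroded = ndimage.binary_erosion(mask)
    edge_line = mask & ~eroded
    # Distance of every pixel to the nearest edge-line pixel.
    dist = ndimage.distance_transform_edt(~edge_line)
    # The band straddles the line: part foreground, part background.
    return dist < width

mask = np.zeros((28, 28), dtype=bool)
mask[8:20, 8:20] = True
band = edge_band(mask, width=2)
```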
  • an edge mask describing the edge region of the instance to be segmented is determined based on the edge region and the first hole mask.
  • an edge mask describing the edge region of the segmented instance is obtained by up-sampling the edge region and fusing the up-sampled region with the first hole mask.
  • First, the edge region in the stage instance mask is up-sampled based on the resolution of the first hole mask to obtain the first edge region; for example, if the resolution of the first hole mask is 28×28, the predicted edge region is up-sampled to this resolution to obtain a 28×28 first edge region.
  • the edge mask is obtained based on the first edge region and the first hole mask. For example, element-wise multiplication of the first edge region and the first hole mask is performed to obtain the edge mask. In this way, by combining the first edge region of the stage instance mask and the first hole mask for predicting the edge region of the instance to be segmented, the edge region of the instance to be segmented can be predicted more accurately.
  • a non-edge mask describing the non-edge region of the instance to be segmented is determined.
  • For example, a non-edge mask describing the non-edge region of the instance to be segmented is obtained by element-wise multiplication of the inverted, resolution-enlarged edge region with the resolution-enlarged stage instance mask.
  • the second step above can be achieved through the following process:
  • First, the stage instance mask is up-sampled based on the resolution of the first hole mask to obtain an enlarged instance mask; for example, the stage instance mask is up-sampled to the resolution of the first hole mask to obtain the enlarged instance mask.
  • Next, an inversion operation is performed on the up-sampled first edge region to obtain an inversion mask: each element of the up-sampled mask has the value 0 or 1, and the inversion changes each 1 to 0 and each 0 to 1.
  • the non-edge mask is obtained based on the inversion mask and the enlarged instance mask. For example, element-wise multiplication of the inversion mask and the magnified instance mask results in a non-edge mask that does not contain edge regions.
  • the third instance mask is determined based on the non-edge mask and the edge mask.
  • the non-edge mask and the edge mask are added element-wise to obtain a third instance mask that can accurately describe the complete shape of the instance to be segmented.
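  • The element-wise combination described in the steps above can be sketched as follows; the tensor names and sizes (14×14 stage mask, 28×28 hole mask) are illustrative assumptions:

```python
# Boundary-aware combination: the hole mask refines only the edge band,
# while the up-sampled stage mask supplies the non-edge interior.
import torch
import torch.nn.functional as F

def combine(stage_mask, edge_region, hole_mask):
    # Up-sample the coarse stage mask and its edge band to the hole
    # mask's resolution (e.g. 14x14 -> 28x28).
    size = hole_mask.shape[-2:]
    up_mask = F.interpolate(stage_mask, size=size, mode="bilinear",
                            align_corners=False)
    up_edge = F.interpolate(edge_region, size=size, mode="nearest")
    edge_mask = up_edge * hole_mask       # edge area: trust the hole mask
    non_edge = (1.0 - up_edge) * up_mask  # inversion keeps the interior
    return non_edge + edge_mask           # element-wise sum: full shape

stage_mask = torch.rand(1, 1, 14, 14)
edge_region = (torch.rand(1, 1, 14, 14) > 0.7).float()
hole_mask = torch.rand(1, 1, 28, 28)
refined = combine(stage_mask, edge_region, hole_mask)
```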
  • Step S223: performing a third-stage semantic fusion process on the third instance feature, the first hole mask, and the third semantic information to obtain a fourth instance feature and a second hole mask corresponding to the fourth instance feature.
  • The third instance feature output in the previous stage, the first hole mask describing the edge region, and the third semantic information at the same resolution are input into the semantic fusion module to obtain the fourth instance feature of the instance to be segmented and a second hole mask predicted from the edge region of the fourth instance feature.
  • Step S224: determining the second instance mask based on the second hole mask and the third instance mask.
  • the second hole mask describing the edge region and the third instance mask describing the full shape are combined to obtain a second instance mask that can segment the instance to be segmented more accurately.
  • In this way, by explicitly predicting the edge region of the instance to be segmented, an accurate edge region can be predicted for each instance to be segmented.
  • the goal of general object detection and instance segmentation is to detect the objects in an image and to segment the pixels belonging to each object.
  • High-quality instance segmentation requires the model to not only segment the objects in the image, but also achieve high accuracy at the pixel level, especially the edge regions of the objects.
  • the former requires the model to extract high-level semantic information, while the latter requires the model to preserve detailed information as much as possible.
  • the two-stage instance segmentation algorithm extracts features for each object from the feature pyramid based on the object detection box, and applies down-sampling to handle objects of different scales during feature extraction; both the use of the feature pyramid and the down-sampling operations cause a loss of detail, making it difficult for the final model to achieve high accuracy at the pixel level.
  • instance segmentation is used to assign each pixel to a specific semantic category and to distinguish instances within the same category. Taking Mask R-CNN as an example: first, an instance detector is used to generate high-quality bounding boxes; then, a parallel segmentation branch is introduced to predict a binary mask for each instance inside its bounding box; next, a pooling operation such as RoI-Align extracts instance features from the feature pyramid; finally, pixel-wise classification is performed on the output features of the instance segmentation branch network.
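For readers unfamiliar with the pooling step mentioned above, the snippet below sketches how per-instance features might be extracted with RoI-Align; it uses torchvision.ops.roi_align, one public implementation of the operation, and the tensor shapes and the stride-8 scale are illustrative assumptions rather than values taken from the patent.

```python
import torch
from torchvision.ops import roi_align

# One feature map from a feature pyramid level: (batch, channels, height, width).
features = torch.randn(1, 256, 100, 152)

# Detected boxes in (batch_index, x1, y1, x2, y2) format, in input-image coordinates.
boxes = torch.tensor([[0, 10.0, 20.0, 90.0, 120.0]])

# Pool a fixed 14x14 feature per box; spatial_scale maps image coordinates to feature
# coordinates (1/8 assumes this pyramid level has stride 8); aligned=True matches RoI-Align.
instance_feats = roi_align(features, boxes, output_size=(14, 14),
                           spatial_scale=1.0 / 8, sampling_ratio=2, aligned=True)
print(instance_feats.shape)  # torch.Size([1, 256, 14, 14])
```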
  • semantic segmentation classifies each pixel into a fixed set of classes without distinguishing instances. Since semantic segmentation does not require extremely high-level features to distinguish large instances, high-resolution features can be fully exploited. Semantic segmentation methods in the related art utilize high-resolution features to generate high-quality semantic representations and to segment clear instance boundaries, as shown by giraffes 321 and 322 in Figure 3(b).
  • the embodiments of the present disclosure provide a high-quality instance segmentation framework for performing high-quality instance segmentation on instances and scenes, and merging fine-grained features in a multi-stage manner during the instance-level segmentation process.
  • by progressively fusing more detailed information, the high-quality instance segmentation framework is able to refine high-quality masks.
  • the segmentation accuracy at the pixel level can be effectively improved while retaining the advantages of existing algorithms, thereby achieving high-quality instance segmentation.
  • Embodiments of the present disclosure perform instance segmentation by adopting the current two-stage method to distinguish instances and supplement lost details with fine-grained features during instance segmentation.
  • to this end, an embodiment of the present disclosure proposes a new framework called RefineMask.
  • RefineMask builds a new semantic segmentation branch network on the highest-resolution feature map of the feature pyramid to generate fine-grained semantic features. These fine-grained features are used to supplement the details lost during the per-instance segmentation process.
  • after the region-of-interest alignment operation, RefineMask gradually enlarges the prediction size and incorporates the fine-grained features, which alleviates the loss of detail when predicting high-quality instance masks.
  • in addition, RefineMask uses a boundary-aware refinement strategy that focuses on edge regions, enabling more accurate boundaries to be predicted.
  • by iteratively fusing more fine-grained features and explicitly focusing on edge regions, RefineMask is able to produce higher-quality masks.
  • as shown in Figure 3(c), the high-quality segmentation results output by RefineMask are shown as giraffes 331 and 332, which indicates that RefineMask obtains sufficient detail even in hard regions such as instance boundaries.
  • FIG. 4 is a schematic diagram of the RefineMask framework provided by an embodiment of the present disclosure.
  • the RefineMask framework is built on the detector's feature map pyramid network 401 and achieves high-quality instance segmentation through two small network modules, namely the semantic segmentation branch network 402 and the instance segmentation branch network.
  • the semantic segmentation branch network 402 takes as input the highest resolution feature map from the feature pyramid of the detector feature map pyramid network 401 and performs semantic segmentation.
  • the output of the semantic segmentation branch network maintains the same resolution as the input without using spatial compression operations (e.g., downsampling).
  • the fine-grained features generated by the semantic segmentation branch network are used to facilitate instance segmentation in the instance segmentation branch network.
  • the instance segmentation branch network performs instance segmentation in a multi-stage manner. At each stage, the instance segmentation branch network incorporates the semantic features and semantic masks extracted from the fine-grained features and increases the spatial size of the features, enabling better instance mask prediction. In addition, a boundary-aware refinement strategy is proposed in the instance segmentation branch network, which explicitly focuses on edge regions and predicts sharper boundaries.
  • the semantic segmentation branch network is a fully convolutional neural network whose input is the highest resolution feature map of the feature map pyramid network.
  • the semantic segmentation branch network consists of four convolutional layers that extract semantic features of the whole image, and a binary classifier predicts the probability that each pixel belongs to an object. A high-resolution semantic mask for the entire image is predicted under the supervision of a binary cross-entropy loss. The fine-grained features are defined as the union of the semantic features and the semantic mask. These fine-grained features can also be used to supplement the details lost in the instance segmentation branch network, enabling high-quality mask prediction.
  • the highest resolution feature map of the feature map pyramid network 401 is input into the semantic segmentation branch network 402 , and the semantic features and semantic masks 403 are output.
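A minimal PyTorch sketch of such a semantic segmentation branch is given below. The channel widths and the use of 3×3 convolutions are assumptions; the text above fixes only the four-layer structure, the binary classifier, and the resolution-preserving behaviour.

```python
import torch
import torch.nn as nn

class SemanticBranch(nn.Module):
    """Four conv layers plus a binary classifier; no down-sampling is used,
    so the output keeps the resolution of the highest-resolution pyramid level."""
    def __init__(self, in_channels=256, mid_channels=256):
        super().__init__()
        layers = []
        for i in range(4):
            layers += [nn.Conv2d(in_channels if i == 0 else mid_channels,
                                 mid_channels, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(mid_channels, 1, kernel_size=1)  # per-pixel logit

    def forward(self, p2):                       # p2: (B, C, H, W) highest-resolution level
        sem_feat = self.convs(p2)                # semantic features, same H x W
        sem_mask = torch.sigmoid(self.classifier(sem_feat))  # P(pixel belongs to an object)
        return sem_feat, sem_mask                # their union forms the fine-grained features
```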
  • the instance segmentation branch is a fully convolutional instance segmentation branch network.
  • in the instance segmentation branch network, the features extracted by a 14×14 region-of-interest alignment operation are first fed into two 3×3 convolutional layers to generate instance features.
  • a 1×1 convolutional layer is then used to predict an instance mask with a spatial size of 14×14. This coarse mask is used as the starting mask for the subsequent refinement stages.
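A corresponding sketch of this initial mask head, under the same caveats (PyTorch, assumed channel width):

```python
import torch.nn as nn

class InitialMaskHead(nn.Module):
    """Two 3x3 convs on 14x14 RoI features, then a 1x1 conv that predicts
    the coarse 14x14 instance mask used by the refinement stages."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.mask_pred = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, roi_feats):                # (N, C, 14, 14) from RoI-Align
        inst_feat = self.convs(roi_feats)        # instance features
        coarse_mask = self.mask_pred(inst_feat)  # (N, 1, 14, 14) coarse mask logits
        return inst_feat, coarse_mask
```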
  • embodiments of the present disclosure propose a multi-stage optimization process to optimize the rough instance mask in an iterative manner.
  • the input of each stage consists of four parts: the instance features and instance mask obtained in the previous stage, and the semantic features and semantic mask pooled from the output of the semantic segmentation branch network.
  • these inputs are first integrated by the semantic fusion module, and the fused features are then up-sampled proportionally to a larger spatial size.
  • the instance segmentation branch network runs this optimization process repeatedly and outputs high-quality instance masks with resolutions of up to 112×112.
  • before being up-sampled to the higher spatial size, the fused features in the semantic fusion module are compressed with a 1×1 convolutional layer that halves their channels; therefore, although the spatial size of the features keeps growing, the extra computational cost introduced is very low.
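The multi-stage optimization can be pictured as the following loop, a sketch only: the fusion modules and mask heads are assumed to exist per stage, `pool` is a hypothetical RoI-Align helper that crops the full-image semantic outputs to the instance box, and the 14 → 28 → 56 → 112 progression follows the resolutions stated in the text.

```python
import torch
import torch.nn.functional as F

def refine(inst_feat, inst_mask, sem_feat_full, sem_mask_full,
           fusion_modules, mask_heads, pool):
    """Multi-stage refinement: the feature size doubles at every stage."""
    for fuse, head in zip(fusion_modules, mask_heads):   # three refinement stages
        size = inst_feat.shape[-1]                       # 14, then 28, then 56
        sem_feat = pool(sem_feat_full, size)             # pooled fine-grained features
        sem_mask = pool(sem_mask_full, size)
        fused = fuse(inst_feat, inst_mask, sem_feat, sem_mask)  # semantic fusion module
        inst_feat = F.interpolate(fused, scale_factor=2, mode='bilinear',
                                  align_corners=False)   # enlarge the prediction size
        inst_mask = torch.sigmoid(head(inst_feat))       # full mask or edge (hole) mask
    return inst_mask                                     # up to 112x112
```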
  • a region-of-interest alignment operation is performed on the feature pyramid of the feature map pyramid network 401 to obtain an instance feature 404 of a fixed size, and a convolution operation is performed on the instance feature 404 to obtain a convolved instance feature 405.
  • mask prediction is performed based on the convolved instance features 405, resulting in a 14×14 initial mask.
  • in the first stage, a region-of-interest alignment operation is used to extract semantic features and semantic masks of size 14×14 from the semantic features and semantic masks 403 pooled from the output of the semantic segmentation branch network 402; the convolved instance features 405, the 14×14 initial mask, and the 14×14 semantic features and semantic masks are input into the first-stage semantic fusion module 411; the semantic fusion module 411 then fuses these four parts and up-samples the fused features proportionally to a higher spatial size, outputting a 28×28 instance feature 406, based on which a complete 28×28 instance mask is predicted.
  • in the second stage, a region-of-interest alignment operation is used to extract semantic features and semantic masks of size 28×28 from the semantic features and semantic masks 403 pooled from the output of the semantic segmentation branch network 402; the instance feature 406, the complete 28×28 instance mask, and the 28×28 semantic features and semantic masks are input into the second-stage semantic fusion module 412; the semantic fusion module 412 then fuses these four parts and up-samples the fused features proportionally to a higher spatial size, outputting a 56×56 instance feature 407; based on the 56×56 instance feature, an instance mask 409 for the edge region of the instance feature is predicted.
  • boundary-aware refinement (BAR) combines the complete 28×28 instance mask obtained in the first stage with the instance mask 409 to obtain a 56×56 instance mask that characterizes the complete shape of the instance; in this way, by further raising the resolution of the instance features and supplementing detail information, the resulting mask characterizes the complete shape of the instance better.
  • in the third stage, a region-of-interest alignment operation is used to extract semantic features and semantic masks of size 56×56 from the semantic features and semantic masks 403 pooled from the output of the semantic segmentation branch network 402;
  • the instance feature 407, the instance mask 409, and the 56×56 semantic features and semantic masks are input into the third-stage semantic fusion module 413; the semantic fusion module 413 then fuses these four parts and up-samples the fused features proportionally to a higher spatial size, outputting a 112×112 instance feature 408; based on the 112×112 instance feature, an instance mask 410 for the edge region of the instance feature is predicted.
  • by combining the 56×56 instance mask characterizing the complete shape obtained in the second stage with the instance mask 410, a 112×112 instance mask that characterizes the complete shape of the instance is obtained; in this way, by further raising the resolution of the instance features and supplementing detail information, the resulting 112×112 instance mask characterizes the complete shape of the instance more precisely.
  • an embodiment of the present disclosure proposes a semantic fusion module, so that each neuron in the instance segmentation branch network can perceive its surrounding environment.
  • the semantic fusion module connects four input parts 51 to 54.
  • in each of the stages above, these inputs are first fused after a 1×1 convolutional layer, producing a fused instance feature 501 (corresponding to the first convolution feature in the above embodiments) with a reduced channel size; three parallel 3×3 convolutional layers with different dilation (hole) sizes, namely 1, 3, and 5, are then applied to the fused instance feature 501 to obtain convolution results 502, 503, and 504, which are summed element-wise to obtain the first fusion feature, fusing context around each single neuron while preserving local detail; finally, the instance mask and the semantic mask are concatenated with the first fusion feature again, serving as a guide 505 for the subsequent prediction.
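Putting the above together, the following is a sketch of the semantic fusion module. The dilation rates 1, 3, and 5 follow the description; the channel counts and single-channel masks are assumptions.

```python
import torch
import torch.nn as nn

class SemanticFusionModule(nn.Module):
    """Concatenate the four inputs, fuse with a 1x1 conv (halving channels),
    then sum three parallel 3x3 convs with dilations 1, 3 and 5 so that each
    neuron sees context at several ranges while keeping local detail."""
    def __init__(self, inst_ch=256, sem_ch=256):
        super().__init__()
        in_ch = inst_ch + sem_ch + 2             # + instance mask + semantic mask
        self.fuse = nn.Conv2d(in_ch, inst_ch // 2, kernel_size=1)
        self.dilated = nn.ModuleList([
            nn.Conv2d(inst_ch // 2, inst_ch // 2, 3, padding=d, dilation=d)
            for d in (1, 3, 5)])

    def forward(self, inst_feat, inst_mask, sem_feat, sem_mask):
        x = torch.cat([inst_feat, inst_mask, sem_feat, sem_mask], dim=1)
        fused = self.fuse(x)                                # feature 501
        summed = sum(conv(fused) for conv in self.dilated)  # 502 + 503 + 504
        # Re-attach the two masks as a guide for the following prediction (505).
        return torch.cat([summed, inst_mask, sem_mask], dim=1)
```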
  • the embodiments of the present disclosure propose a boundary-aware refinement strategy to focus on edge regions, which can accurately predict the boundary of instance masks.
  • for each instance, the first stage outputs a coarse and complete instance mask M_1 of size 28×28 and generates its boundary mask B̄_1.
  • a finer and more complete instance mask M'_k (the final output of stage k) is generated in each subsequent stage, which can be expressed as equations (1) and (2):

    M'_1 = M_1    (1)
    M'_k = f_up(M'_{k-1}) ⊙ (1 − f_up(B̄_{k-1})) + M_k ⊙ f_up(B̄_{k-1})    (2)

    where ⊙ denotes pixel-wise multiplication, f_up denotes up-sampling, B̄_{k-1} denotes the edge region of the mask predicted at stage k−1, and M_k denotes the mask predicted at the current stage over the edge region.
  • FIG. 6 is a schematic diagram of the reasoning process of the second stage of instance segmentation provided by the embodiment of the present disclosure.
  • first, from the 28×28 instance mask 601 obtained in the first stage (the stage preceding the current one), the edge region 602 of the mask is derived, and the 28×28 instance mask 601 is up-sampled to a 56×56 instance mask 611; next, the edge region 602 is up-sampled into the 56×56 pixel space to obtain the up-sampled boundary mask 603 (corresponding to the first edge region in the above embodiments), and an inversion operation is applied to the boundary mask 603 (each element valued 1 becomes 0 and each element valued 0 becomes 1) to obtain the mask 604; the mask 604 is then multiplied element-wise with the up-sampled 56×56 instance mask 611 to obtain one product (corresponding to the non-edge mask in the above embodiments); the up-sampled boundary mask 603 is multiplied with the first hole mask 605 generated in the second stage (i.e., the current stage) to obtain another product (corresponding to the edge mask in the above embodiments); finally, these two products are summed element-wise to obtain the 56×56 complete and fine second instance mask 606, and the process shown in Figure 6 is repeated until the best mask is obtained.
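In terms of the combine_masks sketch given earlier, this inference step corresponds to a call like the following; the variable names mirror the figure's element numbers and are illustrative only.

```python
# mask_601: 28x28 stage-1 mask; edge_602: its edge region; hole_605: 56x56 stage-2 prediction.
mask_606 = combine_masks(stage_mask=mask_601, edge_region=edge_602, hole_mask=hole_605)
```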
  • Mask R-CNN is adopted as the base model, and its default instance segmentation branch is replaced with the multi-stage refinement branch; by default, the instance segmentation branch network has three refinement stages.
  • the training process for optimizing the edge region of the instance to be segmented in the embodiments of the present disclosure is as follows:
  • let M_k denote the binary instance mask of stage k; the spatial size of the instance mask is 14·2^k × 14·2^k, where k = 1, 2, 3.
  • the edge region of M_k is defined as the area consisting of pixels whose distance to the mask contour is less than a width threshold of ω pixels.
  • a binary mask B_k is used to represent the edge region of M_k.
  • B_k can be expressed as formula (3):

    B_k(i, j) = 1 if d_ij ≤ ω, and B_k(i, j) = 0 otherwise    (3)

    where (i, j) denotes the position of pixel p_ij in M_k, and d_ij denotes the Euclidean distance from pixel p_ij to the pixel nearest to it on the mask contour.
  • FIG. 7 is a schematic diagram of an application scenario of an instance edge region provided by an embodiment of the present disclosure;
  • d_ij is the distance from a pixel p_ij in the picture to the contour 701 nearest to that pixel; the area between the contour 701 and the boundary line 702, together with the area between the contour 701 and the boundary line 703, makes up the edge region 704.
  • embodiments of the present disclosure approximate the edge region with a convolution operator, so that d_ij can be determined efficiently.
  • since instances have different scales, the instance mask is first resized to a fixed size; for example, the mask boundaries are determined using 28×28 in the first stage and 56×56 in the second stage. As shown in FIG. 4, in the second stage, the complete 28×28 instance mask and the instance mask 409 are input into the edge refinement module 421, which predicts the complete instance mask of the instance; then, in the third stage, the complete instance mask and the instance mask 410 are input into the edge refinement module 421, which predicts the complete and fine instance mask of the instance.
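The edge-region definition above can be realized in several ways; the text mentions approximating it with a convolution operator, while the sketch below uses a distance transform instead. SciPy is assumed to be available, and w corresponds to the width threshold ω.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_region(mask, w=2):
    """Binary edge region of a binary mask: pixels (inside or outside)
    whose Euclidean distance to the mask contour is at most w."""
    mask = mask.astype(bool)
    dist_inside = distance_transform_edt(mask)    # distance to nearest background pixel
    dist_outside = distance_transform_edt(~mask)  # distance to nearest foreground pixel
    # A pixel belongs to the edge region if it lies within w of the contour on either side.
    return ((mask & (dist_inside <= w)) | (~mask & (dist_outside <= w))).astype(np.uint8)
```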
  • in the first stage of the framework shown in FIG. 4, a complete instance mask of size 28×28 is predicted; in the two subsequent stages, whose outputs are of size 56×56 and 112×112, the edge regions are supervised during training. These edge regions R_k are determined by the ground-truth mask and the mask predicted at the preceding stage, as shown in equation (4):

    R_k = f_up(B̄_{k-1}^{gt}) ∨ f_up(B̄_{k-1}^{pred})    (4)

    where f_up represents a bilinear up-sampling operation with a scale factor of 2, B̄_{k-1}^{gt} denotes the edge region of the stage-(k−1) annotated (ground-truth) mask, B̄_{k-1}^{pred} denotes the edge region of the stage-(k−1) predicted mask, and ∨ represents the union of the two edge regions.
  • the training loss L_k of the k-th stage (k = 2, 3), whose output is of size S_k × S_k, averages the binary cross-entropy loss over the supervised edge region and can be expressed as equation (5):

    L_k = (1/N) · Σ_n [ Σ_{(i,j)} R_k^n(i, j) · l_nij / Σ_{(i,j)} R_k^n(i, j) ]    (5)

    where N is the number of instances and l_nij is the binary cross-entropy loss of instance n at pixel position (i, j).
  • the loss defined in equation (5) is used for the last two refinement stages.
  • an average binary cross-entropy loss is employed for the semantic segmentation branch network and other mask prediction stages.
  • the loss weights for the initial mask prediction stage and the three refinement stages are set to 0.25, 0.5, 0.75, and 1.0, respectively.
  • to balance the detection head and the mask head, the loss weight of the detection head, which includes both the classification and the regression losses, is set to 2.0; the edge-region width ω is set to 2 in the training phase and to 1 in the inference phase.
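Combining the weights above, the following sketches how the total training loss might be assembled. It is a sketch only: the stage bookkeeping, tensor layouts, and the assumption that the edge-region average of equation (5) applies exactly to the last two stages are assumptions made here, not the patent's code.

```python
import torch.nn.functional as F

STAGE_WEIGHTS = (0.25, 0.5, 0.75, 1.0)   # initial mask stage + three refinement stages
DET_WEIGHT = 2.0                          # detection head (classification + regression)

def total_loss(det_loss, sem_logits, sem_gt, stage_logits, stage_gts, edge_regions):
    # Semantic branch: plain average binary cross-entropy over the whole image.
    loss = DET_WEIGHT * det_loss
    loss = loss + F.binary_cross_entropy_with_logits(sem_logits, sem_gt)
    for k, (logits, gt) in enumerate(zip(stage_logits, stage_gts)):
        if k < 2:   # initial prediction and first refinement stage: average BCE
            stage = F.binary_cross_entropy_with_logits(logits, gt)
        else:       # last two refinement stages: BCE averaged over the edge region R_k
            per_pixel = F.binary_cross_entropy_with_logits(logits, gt, reduction='none')
            r = edge_regions[k]
            stage = (per_pixel * r).sum() / r.sum().clamp(min=1)
        loss = loss + STAGE_WEIGHTS[k] * stage
    return loss
```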
  • the lost detail information is supplemented stage by stage in the segmentation process, so that the model can more accurately segment the edge region of the instance, thereby greatly improving the final segmentation effect.
  • FIG. 8 is a schematic structural diagram of the instance segmentation apparatus according to an embodiment of the present disclosure. As shown in FIG. 8 , the instance segmentation apparatus 800 includes:
  • the first acquisition module 801 is configured to acquire first semantic information of the image to be processed, and a first instance feature of an instance to be segmented in the to-be-processed image and a first instance mask corresponding to the first instance feature;
  • the first processing module 802 is configured to perform at least two stages of semantic fusion processing based on the first semantic information, the first instance feature and the first instance mask to obtain a second instance mask;
  • wherein the first instance feature output by the semantic fusion processing of the previous stage is up-sampled to obtain the instance feature of the subsequent stage, and the corresponding instance mask is obtained based on the instance feature of the subsequent stage; the instance feature of the subsequent stage, the instance mask of the subsequent stage, and the semantic information corresponding to the subsequent stage are used as the input features of the semantic fusion processing of the subsequent stage; and, in the input features of the semantic fusion processing of each stage, the resolution of the semantic information is the same as the resolution of the instance features.
  • the first processing module 802 includes:
  • a first processing submodule configured to perform first-stage semantic fusion processing based on the first semantic information, the first instance feature and the first instance mask to obtain a second instance feature
  • the second processing submodule is configured to perform at least one stage of semantic fusion processing based on the second instance feature, the stage instance mask corresponding to the second instance feature, and the second semantic information to obtain the second instance mask; wherein the resolution of the second semantic information is the same as the resolution of the second instance feature.
  • the first processing sub-module includes:
  • a first fusion unit configured to fuse the first semantic feature in the first semantic information with the first instance feature to obtain a first fusion feature
  • the first connection unit is configured to connect the first fusion feature, the first semantic mask in the first semantic information, and the first instance mask to obtain the second instance feature.
  • the first fusion unit includes:
  • a first convolution subunit configured to use a first convolution operation to process the first semantic feature and the first instance feature to obtain a first convolution feature
  • the second convolution subunit is configured to process the first convolution feature by using a plurality of second convolution operations respectively to obtain a plurality of second convolution results; wherein the convolution kernel of the first convolution operation is smaller than the convolution kernels of the second convolution operations, and the dilation (hole) sizes of the plurality of second convolution operations are different;
  • the first determination subunit is configured to determine the first fusion feature based on the plurality of second convolution results.
  • the second processing sub-module includes:
  • a first processing unit configured to perform second-stage semantic fusion processing on the second instance feature, the stage instance mask, and the second semantic information to obtain a third instance feature and a first hole mask corresponding to the third instance feature;
  • a first determining unit configured to determine a third instance mask based on the first hole mask and the stage instance mask
  • the second processing unit is configured to perform third-stage semantic fusion processing on the third instance feature, the first hole mask, and the third semantic information to obtain a fourth instance feature and a second hole mask corresponding to the fourth instance feature;
  • a second determination unit configured to determine the second instance mask based on the second hole mask and the third instance mask.
  • the first processing unit includes:
  • a first processing subunit configured to perform a second-stage semantic fusion process on the second instance feature, the stage instance mask, and the second semantic information to obtain the third instance feature
  • the first prediction subunit is configured to predict the edge region in the third instance feature to obtain the first hole mask.
  • the second determining unit includes:
  • a second determination subunit configured to determine the edge region in the stage instance mask
  • a third determining subunit configured to determine an edge mask describing the edge region of the instance to be segmented based on the edge region and the first hole mask
  • a fourth determination subunit configured to determine a non-edge mask describing the non-edge region of the instance to be segmented based on the edge region and the stage instance mask;
  • a fifth determination subunit is configured to determine the third instance mask based on the non-edge mask and the edge mask.
  • the third determination subunit is further configured to: up-sample the edge region in the stage instance mask based on the resolution of the first hole mask to obtain the first edge region; and obtain the edge mask based on the first edge region and the first hole mask.
  • the fourth determining subunit is further configured to: up-sample the stage instance mask based on the resolution of the first hole mask to obtain an enlarged instance mask; perform an inversion operation on the first edge region to obtain an inversion mask; and obtain the non-edge mask based on the inversion mask and the enlarged instance mask.
  • the second determining subunit is further configured to: determine the edge line of the instance to be segmented based on the stage instance mask; determine, in the image to be processed, a set of pixels whose minimum distance to the edge line is less than a preset distance; and determine the edge region in the stage instance mask based on the set of pixels.
  • the apparatus further includes:
  • a first extraction module configured to use a feature map pyramid network to perform feature extraction on the to-be-processed image to obtain an image feature set including multiple image features with different resolutions
  • the first determining module is configured to determine the semantic information of the to-be-processed image based on the target image feature whose resolution meets a preset threshold in the image feature set.
  • the first determining module includes:
  • a first segmentation sub-module configured to perform semantic segmentation on the to-be-processed image based on the target image feature to obtain semantic features
  • a first determination submodule configured to determine the probability that each pixel in the to-be-processed image belongs to the to-be-segmented instance based on the semantic feature
  • a second determination submodule configured to determine the semantic mask of the to-be-processed image based on the probability
  • the third determining submodule is configured to use the semantic feature and the semantic mask as the semantic information.
  • the first obtaining module 801 includes:
  • a first alignment sub-module configured to use a region of interest alignment operation to select a first image feature that satisfies a preset resolution in the feature map set of the to-be-processed image
  • a fourth determination submodule configured to determine the first instance feature and the first instance mask based on the first image feature
  • the second alignment sub-module is configured to use the region of interest alignment operation to select the first semantic information whose resolution is the preset resolution in the semantic information.
  • if the above-mentioned instance segmentation method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present disclosure, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a terminal, a server, etc.) to execute all or part of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.
  • an embodiment of the present disclosure further provides a computer program product, wherein the computer program product includes computer-executable instructions, and after the computer-executable instructions are executed, the steps in the instance segmentation method provided by the embodiment of the present disclosure can be implemented.
  • an embodiment of the present disclosure further provides a computer storage medium, where computer-executable instructions are stored on the computer storage medium, and when the computer-executable instructions are executed by a processor, the steps of the instance segmentation method provided by the foregoing embodiments are implemented.
  • FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
  • the computer device 900 includes: a processor 901, at least one communication bus, a communication interface 902, at least one external communication interface, and a memory 903.
  • the communication interface 902 is configured to realize connection and communication between these components.
  • the communication interface 902 may include a display screen, and the external communication interface may include a standard wired interface and a wireless interface.
  • the processor 901 is configured to execute an image processing program in the memory, so as to implement the steps of the instance segmentation method provided by the foregoing embodiments.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function; in actual implementation there may be other division manners.
  • for example, multiple units or components may be combined, or integrated into another system, or some features may be ignored or not performed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.
  • each functional unit in the embodiments of the present disclosure may be fully integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented either in the form of hardware or in the form of hardware plus a software functional unit.
  • those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the aforementioned program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed; and the aforementioned storage medium includes media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
  • if the above-mentioned integrated units of the present disclosure are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • on this understanding, the technical solutions of the embodiments of the present disclosure, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media that can store program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
  • Embodiments of the present disclosure provide an instance segmentation method and apparatus, an electronic device, and a storage medium, wherein first semantic information of an image to be processed is acquired, together with a first instance feature of an instance to be segmented in the image to be processed and a first instance mask corresponding to the first instance feature; at least two stages of semantic fusion processing are performed based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance mask; wherein the instance feature output by the semantic fusion processing of the previous stage is up-sampled to obtain the instance feature of the subsequent stage, the corresponding instance mask is obtained based on the instance feature of the subsequent stage, and the instance feature of the subsequent stage, the instance mask of the subsequent stage, and the semantic information corresponding to the subsequent stage are used as the input features of the semantic fusion processing of the subsequent stage; and, in the input features of the semantic fusion processing of each stage, the resolution of the semantic information is the same as the resolution of the instance features.


Abstract

Embodiments of the present disclosure provide an instance segmentation method and apparatus, an electronic device, and a storage medium, wherein first semantic information of an image to be processed is acquired, together with a first instance feature of an instance to be segmented in the image to be processed and a first instance mask corresponding to the first instance feature; at least two stages of semantic fusion processing are performed based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance mask; wherein the first instance feature output by the semantic fusion processing of the previous stage is up-sampled to obtain the instance feature of the subsequent stage, the corresponding instance mask is obtained based on the instance feature of the subsequent stage, and the instance feature of the subsequent stage, the instance mask of the subsequent stage, and the semantic information corresponding to the subsequent stage are used as the input features of the semantic fusion processing of the subsequent stage; and, in the input features of the semantic fusion processing of each stage, the resolution of the semantic information is the same as the resolution of the instance features. In this way, the semantic information, instance features, and instance masks of the instance to be segmented are refined over multiple stages, with each stage receiving the instance features output by the previous stage and the detail information supplemented by semantic segmentation, which greatly improves the segmentation of the instance to be segmented.

在一些实施例中,在实例分割分支网络中,按照阶段实例掩模中的第一实例掩模所表征的待分割实例的形状,预测该形状在待处理图像中的边缘区域。在一些可能的实现方式中,首先,基于所述阶段实例掩模,确定所述待分割实例的边缘线;比如,通过分析第一实例掩模所表征的待分割实例的完整形状,确定该实例的边缘线。然后,在所述待处理图像中,确定与所述边缘线之间的最小距离小于预设距离的像素点集合;比如,在待处理图像中,分别确定每一像素点到离该像素点最近的边缘线,之间的距离;将距离小于预设距离的像素点,组成像素点集合。最后,基于所述像素点集合,确定所述阶段实例掩模中的边缘区域。比如,通过对该像素点集合中的像素点进行拟合,形成一个图像区域,即边缘区域;该边缘区域中包括边缘线与背景相邻的图像区域,以及,边缘线与实例本身相邻的图像区域。如此,通过分析像素点与待分割实例的边缘线之间的距离,能够更加充分地保留待分割实例的边缘区域的细节信息。
第二步,基于所述边缘区域和所述第一空洞掩模,确定描述所述待分割实例的边缘区域的边缘掩模。
在一些实施例中,通过对该边缘区域进行上采样,并将上采样后的区域与第一空洞掩模进行融合,得到描述分割实例的边缘区域的边缘掩模。在一些可能的实现方式中,首先,基于所述第一空洞掩模的分辨率,对所述阶段实例掩模中的边缘区域进行上采样,得到第一边缘区域;比如,第一空洞掩模的分辨率为28×28,按照该分辨率对预测的边缘区域进行上采样,得到28×28的第一边缘区域。然后,基于所述第一边缘区域和所述第一空洞掩模,得到所述边缘掩模。比如,将第一边缘区域和第一空洞掩模进行逐元素相乘,得到该边缘掩模。如此,通过结合阶段实例掩模的第一边缘区域和对待分割实例进行边缘区域预测的第一空洞掩模,能够更加准确的预测待分割实例的边缘区域。
第三步,基于所述边缘区域和所述阶段实例掩模,确定描述所述待分割实例的非边缘区域的非边缘掩模。
在一些实施例中,将分辨率放大后的边缘区域与分辨率放大后的阶段实例掩模进行逐元素相乘,得到描述待分割实例的非边缘区域的非边缘掩模。
在一些可能的实现方式中,可以通过以下过程实现上述第二步:
首先,基于所述第一空洞掩模的分辨率对所述阶段实例掩模进行上采样,得到放大实例掩模;比如,按照第一空洞掩模的分辨率,对所述第一实例掩模的分辨率进行上采样,得到放大实例掩模。
然后,对第一边缘区域进行反转操作,得到反转掩模;比如,首先,基于第一实例掩模,分析边缘区域所在的边缘掩模;其次,将该边缘掩模的分辨率进行上采样,使得上采样后的掩模分辨率与放大实例掩模的分辨率相同;最后,对上采样后的掩模进行反转操作,得到反转掩模。比如,上采样后的掩模中的元素值为0或1;将上采样后的掩模中的元素值为1的变为0;元素值为0的变为1。
最后,基于所述反转掩模和所述放大实例掩模,得到所述非边缘掩模。比如,将反 转掩模和放大实例掩模进行逐元素相乘,得到不包含边缘区域的非边缘掩模。
第四步,基于所述非边缘掩模和所述边缘掩模,确定所述第三实例掩模。
在一些实施例中,将非边缘掩模和边缘掩模进行逐元素相加,得到能够准确描述待分割实例的完整形状的第三实例掩模。
步骤S223,对所述第三实例特征、所述第一空洞掩模和第三语义信息进行第三阶段的语义融合处理,得到第四实例特征和所述第四实例特征对应的第二空洞掩模。
在一些实施例中,将上一阶段输出的第三实例特征、描述边缘区域的第一空洞掩模和分辨率相同的第三语义信息输入语义融合模块中,得到待分割实例的第四实例特征,和对该第四实例特征的边缘区域进行预测得到的第二空洞掩模。
步骤S224,基于所述第二空洞掩模和所述第三实例掩模,确定所述第二实例掩模。
在一些实施例中,将描述边缘区域的第二空洞掩模和描述完整形状的第三实例掩模相结合,得到能够更加精确分割待分割实例的第二实例掩模。
在本公开实施例中,通过对语义信息、实例特征和实例掩模进行多阶段融合处理的过程中,预测待分割实例的边缘区域,能够为每一个待分割实例预测准确的边缘区域。
下面,将说明本公开实施例在一个实际的应用场景中的示例性应用,以采用高质量实例分割框架实现高质量实例分割为例,进行说明。
通用物体检测和实例分割的目标是检测出图片中的物体并要求分割出物体的像素。高质量的实例分割要求模型不仅能够分割出图片中的物体,而且要在像素级别达到高准确率,尤其是物体的边缘区域。前者要求模型提取高层次语义信息,而后者要求模型尽可能地保留细节信息。在相关技术中,两阶段实例分割算法基于物体检测框在特征金字塔上分别为每个物体提取特征,并且在提取特征过程中采用了下采样操作以处理不同尺度的物体,而使用特征金字塔和下采样操作都会导致细节信息的损失,使得最终模型难以在像素级别达到高准确率。
相关技术中,实例分割用于将每个像素分配到特定的语义类别中,并区分同一类别中的实例。例如,以Mask R-CNN为例,首先,使用实例检测器生成高质量的边界框;然后,引入并行分割分支以预测边界框内每个实例的二分类掩模;在后面的步骤中,例如,RoI-Align的合并操作,从特征金字塔提取实例特;最后,基于实例分割分支网络的输出特征执行逐像素分类。
尽管实例检测器提供了强大的定位和区分实例的能力,但Mask R-CNN却丢失图像细节,这对于高质量的实例分割任务是必不可少的,如图3(a)所示,长颈鹿301和302丢失了图像细节。细节的丢失主要是由于两个因素:首先,馈入合并操作的特征来自特征金字塔的多个级别,而较高级别的特征通常会导致较粗糙的空间分辨率。对于这些高级特征,在将掩模预测映射回输入空间时,很难保留细节。其次,合并操作进一步将特征的空间尺寸减小到较小的尺寸,也会导致信息丢失。
与实例分割相反,语义分割是在不区分实例的情况下将每个像素分类为一组固定的类别。由于语义分段不需要极端的高级特征来区分大型实例,因此可以充分利用高分辨率特征。相关技术中的语义分割方法利用高分辨率特征来生成高质量的语义表示,分割清晰的实例边界。如图3(b)中的长颈鹿321和322所示。
基于此,本公开实施例提供一种高质量实例分割框架,用于对实例和场景进行高质量的实例分割,在实例级分割过程中以多阶段方式合并细粒度的特征。通过逐步融合更详细的信息,高质量实例分割框架能够完善高质量的掩模。如此,通过在分割过程中补充损失的细节信息,可以有效地提升像素级别的分割准确率,同时保留现有算法的优势,从而实现高质量实例分割。
本公开实施例通过采用当前两阶段方法来进行实例分割,以区分实例,并在实例分割过程中用细粒度特征补充丢失的细节。为此本公开实施例提出细化掩模(Refine Mask) 的新框架。细化掩模在要素金字塔的最高分辨率要素图上构建新的语义分割分支网络,以生成细粒度的语义特征。这些细粒度的特征用于在按实例分割过程中补充丢失的细节。在感兴趣区域对齐操作之后,细化掩模逐步扩大预测尺寸并合并细粒度特征,能够减轻高质量实例掩模预测的细节损失。此外,细化掩模使用边界感知细化策略将重点放在边缘区域上,能够预测更准确的边界。通过迭代地融合更多细粒度的特征并明确地关注边缘区域,细化掩模能够实现更高质量的掩模。如图3(c)所示,细化掩模输出的质量分割结果如长颈鹿331和332所示,由此可见,细化掩模在实例边界之类的硬区域中可以获得充分的细节特征。
图4为本公开实施例提供的细化掩模的框架示意图,如图4所示,细化掩模框架基于检测器特征图金字塔网络401,通过两个小的网络模块,即语义分割分支网络402和实例分割分支网络实现高质量实例细分。
语义分割分支网络402将来自检测器特征图金字塔网络401的特征金字塔的最高分辨率特征图作为输入,并执行语义分割。语义分割分支网络的输出保持与输入相同的分辨率,而无需使用空间压缩操作(例如,下采样)。语义分割分支网络生成的细粒度特征用于促进实例分割分支网络中的实例分割。
实例分割分支网络以多阶段方式执行实例分割。在每个阶段,实例分割分支网络都包含语义特征和从细粒度特征中提取的语义掩模,并增加了特征的空间大小,从而能够进行更好的实例掩模预测。除此之外,在实例分割分支网络中提出了一种边界感知的细化策略,明确地专注于边缘区域,预测更清晰的边界。
在本公开实施例中,语义分割分支网络是输入为特征图金字塔网络的最高分辨率特征图的完全卷积神经网络。语义分割分支网络由四个卷积层组成,以提取整个图像的语义特征,并通过二分类的分类器预测每个像素属于物体的概率。在二分类交叉熵损失的监督下,预测整个图像的高分辨率语义掩模。将细粒度特征定义为语义特征和语义掩模的并集。这些细粒度特征还可以用来补充实例分割分支网络中丢失的细节,从而实现高质量的语义掩模预测。如图4所示,将特征图金字塔网络401的最高分辨率特征图输入语义分割分支网络402中,输出语义特征和语义掩模403。
实例分割分支是一个完全卷积的实例分割分支网络。在实例分割分支网络中,首先,将通过14×14感兴趣区域对齐操作提取的特征馈送到两个3×3卷积层中以生成实例特征。然后,采用1×1的卷积层来预测实例掩模,但是该掩模的空间大小为14×14。该粗略掩模用作以后的精细化阶段的掩模。
经过上述过程,能够得到粗糙的实例掩模。接下来,本公开实施例提出了一个多阶段的优化过程,以迭代的方式来优化粗糙的实例掩模。每个阶段的输入均由四个部分组成,包括:在前一阶段获得的实例特征和实例掩模,以及,从语义分割分支网络的输出中汇集的语义特征和语义掩模。比如,首先,使用语义融合模块集成这些输入;然后,将融合后的特征按比例上采样到更大的空间。实例分割分支网络反复运行此优化过程,并输出分辨率高达112×112的高质量实例掩模。在按比例缩放到更高的空间之前,语义融合模块中的融合特征使用1×1卷积层压缩以将其通道减半。因此,尽管特征的空间大小越来越大,但是引入的额外计算成本却非常低。如图4所示,对特征图金字塔网络401的特征金字塔进行感兴趣区域对齐操作,得到固定大小的实例特征404,对该实例特征404进行卷积操作,得到卷积后的实例特征405。基于卷积后的实例特征405进行掩模预测,得到14×14的初始掩模。在第一阶段中,采用感兴趣区域对齐操作从语义分割分支网络402的输出中汇集的语义特征和语义掩模403中取出大小为14×14语义特征和语义掩模。将特征图金字塔网络401的卷积后的实例特征405、14×14初始掩模、14×14语义特征和语义掩模,输入第一阶段的语义融合模块411中;然后,语义融合模块411对这四部分的内容进行融合,将融合后的特征按比例上采样到更高的空间,输出28×28的 实例特征406;并基于28×28的实例特征,预测该实例特征的完整的28×28的实例掩模。
在第二阶段中,采用感兴趣区域对齐操作从语义分割分支网络402的输出中汇集的语义特征和语义掩模403中取出大小为28×28语义特征和语义掩模;将实例特征406、完整的28×28的实例掩模、28×28语义特征和语义掩模,输入第二阶段的语义融合模块412中;然后,语义融合模块412对这四部分的内容进行融合,将融合后的特征按比例上采样到更高的空间,输出56×56的实例特征407;并基于56×56的实例特征,预测该实例特征的边缘区域的实例掩模409。通过采用边界感知细化(Boundary-Aware Refinement,BAR)对第一阶段得到表征实例完整形状的28×28的实例掩模与实例掩模409相结合,得到能够表征实例完整形状的56×56的实例掩模;这样,通过进一步提高实例特征的分辨率,以及对细节信息的补充,使得到的实例掩模表征的实例完整形状的效果更好。
在第三阶段中,采用感兴趣区域对齐操作从语义分割分支网络402的输出中汇集的语义特征和语义掩模403中取出大小为56×56语义特征和语义掩模;将实例特征407、实例掩模409、56×56语义特征和语义掩模,输入第三阶段的语义融合模块413中;然后,语义融合模块413对这四部分的内容进行融合,将融合后的特征按比例上采样到更高的空间,输出112×112的实例特征408;并基于112×112的实例特征,预测该实例特征的边缘区域的实例掩模410。通过将第二阶段得到表征实例完整形状的56×56的实例掩模与实例掩模410相结合,得到能够表征实例完整形状112×112的实例掩模;这样,通过更进一步提高实例特征的分辨率,以及对细节信息的补充,使得到的112×112的实例掩模表征的实例完整形状更加精确。
为了更好地集成细粒度特征,本公开实施例提出语义融合模块,以使得实例分割分支网络中的每个神经元都能感知其周围环境。如图5所示,语义融合模块连接了四个输入部分51至54,首先,在上述每个阶段中,在1×1卷积层之后融合这些特征,得到融合的实例特征501(对应于上述实施例中的第一卷积特征),并降低通道大小。然后,通过使用具有不同空洞大小的三个并行3×3卷积层(其中,一个卷积层的空洞为1,一个卷积层的空洞为3,一个卷积层的空洞为5),分别对融合的实例特征501进行卷积操作,得到卷积结果502、503和504;将卷积结果502、503和504进行逐元素求和,得到第一融合特征;这样,将卷积结果502、503和504融合到单个神经元周围,同时能够保留局部细节。最后,将实例掩模和语义掩模再次与第一融合特征连接起来,得到能够作为后续预测的指引505。
本公开实施例提出了一种边界感知的细化策略来关注边缘区域,能够准确的预测实例掩模的边界。对于每一实例,第一阶段输出大小为28×28的粗略且完整的实例掩模M 1,并且生成其边界掩模
Figure PCTCN2021124726-appb-000001
在后续阶段中生成更精细和更完整的实例掩模M' k(阶段k的最终输出),可以表示为公式(1)和(2)所示:
M' k=M 1   (1);
Figure PCTCN2021124726-appb-000002
其中,
Figure PCTCN2021124726-appb-000003
表示逐像素乘法,
Figure PCTCN2021124726-appb-000004
表示第k-1阶段预测掩模的边缘区域。如图6所示,图6为本公开实施例提供的实例分割的第二阶段的推理过程示意图,首先,基于第一阶段(即当前阶段的上一阶段)得到的28×28的实例掩模601,得到表示实例掩模的边缘区域602,并将28×28的实例掩模601上采样为56×56的实例掩模611;其次,将边缘区域602上采样到56×56的像素空间中,得到已上采样边界掩模603(对应于上述实施例中的第一边缘区域);并对已上采样边界掩模603进行反转操作,即将已上采样边界掩 模603中的元素值为1的反转为0;元素值为0的反转为1,得到掩模604;即对上一阶段的阶段实例掩模和边缘区域分别上采样,得到与当前实例掩模分辨率相同的实例掩模611和已上采样边界掩模603。再次,将掩模604与当前阶段输出的56×56的实例掩模611进行逐元素相乘,得到相乘结果(对应于上述实施例中的非边缘掩模);再将已上采样边界掩模603与第二阶段(即当前阶段)产生的第一空洞掩模605相乘,得到另一相乘结果(对应于上述实施例中的边缘掩模);最后,将这两个相乘结果进行逐元素求和,得到56×56的完整且精细的第二实例掩模606,重复图6所示的过程,直到获得最好的掩模。
采用Mask R-CNN作为基础,并用多级细化分支替换默认的实例分割分支网络,在默认的情况下,实例分割分支网络中有三个细化阶段。
在本公开实施例中,过引入高分辨率的语义分割特征,在分割过程中逐阶段补充损失的细节信息,使得模型能够更加准确地分割物体边缘区域,从而大幅度提升了最终的分割效果。
在一些实施例中,对于本公开实施例中关于对待分割实例的边缘区域进行优化的训练过程如下:
令M k表示阶段k的二分类实例掩模,实例掩模的空间大小可以表示为14·2 k×14·2 k,其中k=1,2,3。M k的边缘区域定义为由与掩模轮廓的距离小于
Figure PCTCN2021124726-appb-000005
个像素的像素构成的区域。采用二分类掩模B k来表示M k的边缘区域,并且B k可以表示为公式(3)所示:
Figure PCTCN2021124726-appb-000006
其中,(i,j)表示像素p ij在M k中的位置,d ij表示从像素p ij到其在掩模轮廓上最近的像素的欧几里得距离。如图7所示,图7为本公开实施例提供的实例边缘区域的应用场景示意图,d ij为从画面中的像素p ij到距离该像素最近的轮廓701的距离,从轮廓701到边界线702形成的区域,以及从轮廓701到边界线703形成的区域,这两个区域组成边缘区域704。本公开实施例采用卷积算子来近似得到边缘区域,从而能够有效确定d ij。由于实例具有不同的比例,因此,首先将实例掩模调整为固定大小。例如,通过在第一阶段中使用28×28,在第二阶段中使用56×56,确定掩模边界。如图4所示,首先,在第二阶段中,将完整的28×28的实例掩模与实例掩模409输入边缘细化模块421中,通过边缘细化模块421预测该实例的完整的实例掩模410。然后,在三阶段中,将完整的实例掩模410与有实例掩模410输入边缘细化模块421中,通过边缘细化模块421预测该实例完整且精细的实例掩模。
在图4所示的框架示意图的第一阶段中,预测了大小为28×28的完整实例掩模。在输出大小为56×56和112×112的两个后续阶段中,对边缘区域进行监督训练。这些边缘区域R k由真值掩模与该阶段的前一阶段的预测掩模确定,如公式(4)所示:
Figure PCTCN2021124726-appb-000007
其中,f up表示比例因子为2的双线性上采样操作,
Figure PCTCN2021124726-appb-000008
表示第k-1阶段标注掩模的边缘区域,
Figure PCTCN2021124726-appb-000009
表示第k-1阶段预测掩模的边缘区域,∨表示以上两个边缘区域的并集。输出大小为S k×S k的第k个阶段(k=2,3)的训练损失L k,可以表示为如公式(5)和(6)所示:
Figure PCTCN2021124726-appb-000010
Figure PCTCN2021124726-appb-000011
其中,N是实例数,l nij是实例n在像素位置(i,j)的二分类交叉熵损失。
在本公开实施例中,对于最后两个精炼阶段,采用公式(5)中定义的损失。对于语义分割分支网络和其他掩模预测阶段,采用平均二分类交叉熵损失。初始掩模预测阶段和三个细化阶段的损失权重分别设置为0.25、0.5、0.75和1.0。为了平衡检测头和掩模头之间的损失,将检测头的损失权重设置为2.0,其中,包括分类和回归损失。在训练阶段中设置
Figure PCTCN2021124726-appb-000012
为2,在推理阶段中设置
Figure PCTCN2021124726-appb-000013
为1。
在本公开实施例中,通过引入高分辨率的语义分割特征,在分割过程中逐阶段补充损失的细节信息,使得模型能够更加准确地分割实例的边缘区域,从而大幅度提升了最终的分割效果。
本公开实施例提供一种实例分割装置,图8为本公开实施例实例分割装置的结构组成示意图,如图8所示,所述实例分割装置800包括:
第一获取模块801,配置为获取待处理图像的第一语义信息,和所述待处理图像中的待分割实例的第一实例特征和与所述第一实例特征对应的第一实例掩模;
第一处理模块802,配置为基于所述第一语义信息、所述第一实例特征和所述第一实例掩模,进行至少两个阶段的语义融合处理,得到第二实例掩模;
其中,将前一阶段所述语义融合处理输出的第一实例特征进行上采样得到后一阶段的实例特征,并基于所述后一阶段的实例特征得到其对应的实例掩模,将所述后一阶段的所述实例特征、所述后一阶段的实例掩膜和所述后一阶段对应的语义信息作为后一阶段语义融合处理的输入特征;且,每一阶段所述语义融合处理的输入特征中的语义信息的分辨率与实例特征的分辨率相同。
在一些实施例中,所述第一处理模块802,包括:
第一处理子模块,配置为基于所述第一语义信息、所述第一实例特征和所述第一实例掩模,进行第一阶段的语义融合处理,得到第二实例特征;
第二处理子模块,配置为基于所述第二实例特征、与所述第二实例特征对应的阶段实例掩模和第二语义信息进行至少一阶段的语义融合处理,得到所述第二实例掩模;其中,所述第二语义信息的分辨率与所述第二实例特征的分辨率相同。
在一些实施例中,所述第一处理子模块,包括:
第一融合单元,配置为将所述第一语义信息中的第一语义特征和所述第一实例特征进行融合,得到第一融合特征;
第一连接单元,配置为将所述第一融合特征、所述第一语义信息中第一语义掩模和 所述第一实例掩模相连接,得到所述第二实例特征。
在一些实施例中,所述第一融合单元,包括:
第一卷积子单元,配置为采用第一卷积操作,对所述第一语义特征和所述第一实例特征进行处理,得到第一卷积特征;
第二卷积子单元,配置为分别采用多个第二卷积操作,对所述第一卷积特征进行处理,得到多个第二卷积结果;其中,所述第一卷积操作的卷积核小于所述第二卷积的卷积核,且所述多个第二卷积操作的空洞大小不同;
第一确定子单元,配置为基于所述多个第二卷积结果,确定所述第一融合特征。
在一些实施例中,所述第二处理子模块,包括:
第一处理单元,配置为对所述第二实例特征、所述阶段实例掩模和所述第二语义信息进行第二阶段的语义融合处理,得到第三实例特征和所述第三实例特征对应的第一空洞掩模;
第一确定单元,配置为基于所述第一空洞掩模和所述阶段实例掩模,确定第三实例掩模;
第二处理单元,配置为对所述第三实例特征、所述第一空洞掩模和第三语义信息进行第三阶段的语义融合处理,得到第四实例特征和所述第四实例特征对应的第二空洞掩模;
第二确定单元,配置为基于所述第二空洞掩模和所述第三实例掩模,确定所述第二实例掩模。
在一些实施例中,所述第一处理单元,包括:
第一处理子单元,配置为对所述第二实例特征、所述阶段实例掩模和所述第二语义信息进行第二阶段的语义融合处理,得到所述第三实例特征;
第一预测子单元,配置为对所述第三实例特征中的边缘区域进行预测,得到所述第一空洞掩模。
在一些实施例中,所述第二确定单元,包括:
第二确定子单元,配置为确定所述阶段实例掩模中的边缘区域;
第三确定子单元,配置为基于所述边缘区域和所述第一空洞掩模,确定描述所述待分割实例的边缘区域的边缘掩模;
第四确定子单元,配置为基于所述边缘区域和所述阶段实例掩模,确定描述所述待分割实例的非边缘区域的非边缘掩模;
第五确定子单元,配置为基于所述非边缘掩模和所述边缘掩模,确定所述第三实例掩模。
在一些实施例中,所述第三确定子单元,还配置为:基于所述第一空洞掩模的分辨率,对所述阶段实例掩模中的边缘区域进行上采样,得到第一边缘区域;基于所述第一边缘区域和所述第一空洞掩模,得到所述边缘掩模。
在一些实施例中,所述第四确定子单元,还配置为:基于所述第一空洞掩模的分辨率对所述阶段实例掩模进行上采样,得到放大实例掩模;对第一边缘区域进行反转操作,得到反转掩模;基于所述反转掩模和所述放大实例掩模,得到所述非边缘掩模。
在一些实施例中,所述第二确定子单元,还配置为:基于所述阶段实例掩模,确定所述待分割实例的边缘线;在所述待处理图像中,确定与所述边缘线之间的最小距离小于预设距离的像素点集合;基于所述像素点集合,确定所述阶段实例掩模中的边缘区域。
在一些实施例中,所述装置还包括:
第一提取模块,配置为采用特征图金字塔网络,对所述待处理图像进行特征提取,得到包括分辨率不同的多个图像特征的图像特征集合;
第一确定模块,配置为基于所述图像特征集合中分辨率满足预设阈值的目标图像特 征,确定所述待处理图像的语义信息。
在一些实施例中,所述第一确定模块,包括:
第一分割子模块,配置为基于所述目标图像特征,对所述待处理图像进行语义分割,得到语义特征;
第一确定子模块,配置为基于所述语义特征,确定所述待处理图像中每一像素属于所述待分割实例的概率;
第二确定子模块,配置为基于所述概率,确定所述待处理图像的语义掩模;
第三确定子模块,配置为将所述语义特征和所述语义掩模,作为所述语义信息。
在一些实施例中,所述第一获取模块801,包括:
第一对齐子模块,配置为采用感兴趣区域对齐操作,在所述待处理图像的特征图集合中选择满足预设分辨率的第一图像特征;
第四确定子模块,配置为基于所述第一图像特征,确定所述第一实例特征和所述第一实例掩模;
第二对齐子模块,配置为采用所述感兴趣区域对齐操作,在所述语义信息中选择分辨率为所述预设分辨率的所述第一语义信息。
需要说明的是,以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本公开装置实施例中未披露的技术细节,请参照本公开方法实施例的描述而理解。
需要说明的是,本公开实施例中,如果以软件功能模块的形式实现上述的实例分割方法,并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是终端、服务器等)执行本公开各个实施例所述方法的全部或部分。而前述的存储介质包括:U盘、运动硬盘、只读存储器(Read Only Memory,ROM)、磁碟或者光盘等各种可以存储程序代码的介质。这样,本公开实施例不限制于任何特定的硬件和软件结合。
对应地,本公开实施例再提供一种计算机程序产品,所述计算机程序产品包括计算机可执行指令,该计算机可执行指令被执行后,能够实现本公开实施例提供的实例分割方法中的步骤。
相应的,本公开实施例再提供一种计算机存储介质,所述计算机存储介质上存储有计算机可执行指令,所述该计算机可执行指令被处理器执行时实现上述实施例提供的实例分割方法的步骤。
相应的,本公开实施例提供一种计算机设备,图9为本公开实施例计算机设备的组成结构示意图,如图9所示,所述计算机设备900包括:一个处理器901、至少一个通信总线、通信接口902、至少一个外部通信接口和存储器903。其中,通信接口902配置为实现这些组件之间的连接通信。其中,通信接口902可以包括显示屏,外部通信接口可以包括标准的有线接口和无线接口。其中所述处理器901,配置为执行存储器中图像处理程序,以实现上述实施例提供的实例分割方法的步骤。
以上实例分割装置、计算机设备和存储介质实施例的描述,与上述方法实施例的描述是类似的,具有同相应方法实施例相似的技术描述和有益效果,限于篇幅,可案件上述方法实施例的记载,故在此不再赘述。对于本公开实例分割装置、计算机设备和存储介质实施例中未披露的技术细节,请参照本公开方法实施例的描述而理解。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”意味着与实施例有关的特定特征、结构或特性包括在本公开的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”未必一定指相同的实施例。此外,这些 特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。应理解,在本公开的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本公开实施例的实施过程构成任何限定。上述本公开实施例序号仅仅为了描述,不代表实施例的优劣。需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列特征的过程、方法、物品或者装置不仅包括那些特征,而且还包括没有明确列出的其他特征,或者是还包括为这种过程、方法、物品或者装置所固有的特征。在没有更多限制的情况下,由语句“包括一个……”限定的特征,并不排除在包括该特征的过程、方法、物品或者装置中还存在另外的相同特征。
在本公开所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的单元可以是、或也可以不是物理上分开的,作为单元显示的部件可以是、或也可以不是物理单元;既可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部单元来实现本实施例方案的目的。另外,在本公开各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中;上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、只读存储器(Read Only Memory,ROM)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本公开上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本公开各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、磁碟或者光盘等各种可以存储程序代码的介质。以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以所述权利要求的保护范围为准。
Industrial Applicability
Embodiments of the present disclosure provide an instance segmentation method and apparatus, an electronic device, and a storage medium, in which first semantic information of an image to be processed, a first instance feature of an instance to be segmented in the image to be processed, and a first instance mask corresponding to the first instance feature are acquired; at least two stages of semantic fusion processing are performed based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance mask; where the first instance feature output by the semantic fusion processing of a previous stage is upsampled to obtain the instance feature of the subsequent stage, the instance mask corresponding to the subsequent stage is obtained based on that instance feature, and the subsequent stage's instance feature, instance mask, and corresponding semantic information serve as the input features of the subsequent stage's semantic fusion processing; moreover, in the input features of each stage of semantic fusion processing, the resolution of the semantic information is the same as the resolution of the instance feature.
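The staged loop just described can be sketched as follows; the stage count, the 2x upsampling factor, and the fuse callable are assumptions of this sketch rather than fixed by the disclosure:

```python
import torch
import torch.nn.functional as F

def cascade(instance_feature, instance_mask, semantics_by_resolution, fuse, num_stages=3):
    """`fuse` is any module mapping (feature, mask, semantic_info) -> (feature, mask);
    `semantics_by_resolution` returns semantic information matching a given size."""
    for stage in range(num_stages):
        # Each stage consumes semantic information at the feature's own resolution.
        semantic_info = semantics_by_resolution[instance_feature.shape[-1]]
        instance_feature, instance_mask = fuse(instance_feature, instance_mask, semantic_info)
        if stage < num_stages - 1:
            # The previous stage's output feature, upsampled, becomes the next
            # stage's input feature; its mask follows to the same resolution.
            instance_feature = F.interpolate(instance_feature, scale_factor=2,
                                             mode="bilinear", align_corners=False)
            instance_mask = F.interpolate(instance_mask, scale_factor=2,
                                          mode="bilinear", align_corners=False)
    return instance_mask
```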

Claims (20)

  1. An instance segmentation method, the method being executed by an electronic device, the method comprising:
    acquiring first semantic information of an image to be processed, and a first instance feature of an instance to be segmented in the image to be processed and a first instance mask corresponding to the first instance feature;
    performing at least two stages of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance mask;
    wherein the first instance feature output by the semantic fusion processing of a previous stage is upsampled to obtain the instance feature of a subsequent stage, an instance mask corresponding to the subsequent stage's instance feature is obtained based on the subsequent stage's instance feature, and the subsequent stage's instance feature, the subsequent stage's instance mask, and the semantic information corresponding to the subsequent stage serve as the input features of the subsequent stage's semantic fusion processing; and, in the input features of each stage of semantic fusion processing, the resolution of the semantic information is the same as the resolution of the instance feature.
  2. The method according to claim 1, wherein the performing at least two stages of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance mask comprises:
    performing a first stage of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance feature;
    performing at least one stage of semantic fusion processing based on the second instance feature, a stage instance mask corresponding to the second instance feature, and second semantic information to obtain the second instance mask; wherein the resolution of the second semantic information is the same as the resolution of the second instance feature.
  3. The method according to claim 2, wherein the performing a first stage of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance feature comprises:
    fusing a first semantic feature in the first semantic information with the first instance feature to obtain a first fused feature;
    concatenating the first fused feature, a first semantic mask in the first semantic information, and the first instance mask to obtain the second instance feature.
  4. The method according to claim 2 or 3, wherein the fusing a first semantic feature in the first semantic information with the first instance feature to obtain a first fused feature comprises:
    processing the first semantic feature and the first instance feature by using a first convolution operation to obtain a first convolution feature;
    processing the first convolution feature by using multiple second convolution operations respectively to obtain multiple second convolution results; wherein the convolution kernel of the first convolution operation is smaller than the convolution kernels of the second convolution operations, and the multiple second convolution operations have different dilation sizes;
    determining the first fused feature based on the multiple second convolution results.
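For illustration, one way the convolution structure of claim 4 could look in PyTorch; the kernel sizes, channel counts, and the set of dilation rates are assumptions of this sketch, since the claim only fixes that the first kernel is smaller and that the dilations differ:

```python
import torch
import torch.nn as nn

class DilatedFusion(nn.Module):
    def __init__(self, channels: int = 256, dilations=(1, 2, 4)):
        super().__init__()
        # First convolution: small (1x1) kernel over the concatenated
        # semantic and instance features.
        self.first = nn.Conv2d(channels * 2, channels, kernel_size=1)
        # Second convolutions: parallel 3x3 kernels with different dilations.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, semantic_feature, instance_feature):
        first_conv_feature = self.first(torch.cat([semantic_feature, instance_feature], dim=1))
        results = [branch(first_conv_feature) for branch in self.branches]
        # Determine the first fused feature from the second convolution results.
        return torch.stack(results).sum(dim=0)
```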
  5. The method according to claim 2, wherein the performing at least one stage of semantic fusion processing based on the second instance feature, the stage instance mask corresponding to the second instance feature, and the second semantic information to obtain the second instance mask comprises:
    performing a second stage of semantic fusion processing on the second instance feature, the stage instance mask, and the second semantic information to obtain a third instance feature and a first hole mask corresponding to the third instance feature;
    determining a third instance mask based on the first hole mask and the stage instance mask;
    performing a third stage of semantic fusion processing on the third instance feature, the first hole mask, and third semantic information to obtain a fourth instance feature and a second hole mask corresponding to the fourth instance feature;
    determining the second instance mask based on the second hole mask and the third instance mask.
  6. The method according to claim 5, wherein the performing a second stage of semantic fusion processing on the second instance feature, the stage instance mask, and the second semantic information to obtain a third instance feature and a first hole mask corresponding to the third instance feature comprises:
    performing the second stage of semantic fusion processing on the second instance feature, the stage instance mask, and the second semantic information to obtain the third instance feature;
    predicting the edge region in the third instance feature to obtain the first hole mask.
  7. The method according to claim 5 or 6, wherein the determining a third instance mask based on the first hole mask and the stage instance mask comprises:
    determining the edge region in the stage instance mask;
    determining, based on the edge region and the first hole mask, an edge mask describing the edge region of the instance to be segmented;
    determining, based on the edge region and the stage instance mask, a non-edge mask describing the non-edge region of the instance to be segmented;
    determining the third instance mask based on the non-edge mask and the edge mask.
  8. The method according to claim 7, wherein the determining, based on the edge region and the first hole mask, an edge mask describing the edge region of the instance to be segmented comprises:
    upsampling the edge region in the stage instance mask based on the resolution of the first hole mask to obtain a first edge region;
    obtaining the edge mask based on the first edge region and the first hole mask.
  9. The method according to claim 7 or 8, wherein the determining, based on the edge region and the stage instance mask, a non-edge mask describing the non-edge region of the instance to be segmented comprises:
    upsampling the stage instance mask based on the resolution of the first hole mask to obtain an enlarged instance mask;
    performing an inversion operation on the first edge region to obtain an inverted mask;
    obtaining the non-edge mask based on the inverted mask and the enlarged instance mask.
  10. The method according to any one of claims 7 to 9, wherein the determining the edge region in the stage instance mask comprises:
    determining the edge line of the instance to be segmented based on the stage instance mask;
    determining, in the image to be processed, a set of pixels whose minimum distance to the edge line is less than a preset distance;
    determining the edge region in the stage instance mask based on the set of pixels.
  11. The method according to any one of claims 1 to 10, wherein, before the determining of the first semantic information of the image to be processed, the method further comprises:
    performing feature extraction on the image to be processed by using a feature pyramid network to obtain an image feature set including multiple image features of different resolutions;
    determining the semantic information of the image to be processed based on a target image feature in the image feature set whose resolution satisfies a preset threshold.
  12. The method according to claim 11, wherein the determining the semantic information of the image to be processed based on a target image feature in the image feature set whose resolution satisfies a preset threshold comprises:
    performing semantic segmentation on the image to be processed based on the target image feature to obtain a semantic feature;
    determining, based on the semantic feature, the probability that each pixel in the image to be processed belongs to the instance to be segmented;
    determining the semantic mask of the image to be processed based on the probability;
    taking the semantic feature and the semantic mask as the semantic information.
  13. The method according to claim 11 or 12, wherein the acquiring of the first semantic information of the image to be processed, and the first instance feature of the instance to be segmented in the image to be processed and the first instance mask corresponding to the first instance feature comprises:
    selecting, by using a region-of-interest alignment operation, a first image feature satisfying a preset resolution from the feature map set of the image to be processed;
    determining the first instance feature and the first instance mask based on the first image feature;
    selecting, by using the region-of-interest alignment operation, the first semantic information whose resolution is the preset resolution from the semantic information.
  14. An instance segmentation apparatus, wherein the apparatus comprises:
    a first acquisition module, configured to acquire first semantic information of an image to be processed, and a first instance feature of an instance to be segmented in the image to be processed and a first instance mask corresponding to the first instance feature;
    a first processing module, configured to perform at least two stages of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance mask;
    wherein the first instance feature output by the semantic fusion processing of a previous stage is upsampled to obtain the instance feature of a subsequent stage, an instance mask corresponding to the subsequent stage's instance feature is obtained based on the subsequent stage's instance feature, and the subsequent stage's instance feature, the subsequent stage's instance mask, and the semantic information corresponding to the subsequent stage serve as the input features of the subsequent stage's semantic fusion processing; and, in the input features of each stage of semantic fusion processing, the resolution of the semantic information is the same as the resolution of the instance feature.
  15. The apparatus according to claim 14, wherein the first processing module comprises:
    a first processing submodule, configured to perform a first stage of semantic fusion processing based on the first semantic information, the first instance feature, and the first instance mask to obtain a second instance feature;
    a second processing submodule, configured to perform at least one stage of semantic fusion processing based on the second instance feature, a stage instance mask corresponding to the second instance feature, and second semantic information to obtain the second instance mask; wherein the resolution of the second semantic information is the same as the resolution of the second instance feature.
  16. The apparatus according to claim 15, wherein the first processing submodule comprises:
    a first fusion unit, configured to fuse a first semantic feature in the first semantic information with the first instance feature to obtain a first fused feature;
    a first concatenation unit, configured to concatenate the first fused feature, a first semantic mask in the first semantic information, and the first instance mask to obtain the second instance feature.
  17. The apparatus according to claim 15 or 16, wherein the first fusion unit comprises:
    a first convolution subunit, configured to process the first semantic feature and the first instance feature by using a first convolution operation to obtain a first convolution feature;
    a second convolution subunit, configured to process the first convolution feature by using multiple second convolution operations respectively to obtain multiple second convolution results; wherein the convolution kernel of the first convolution operation is smaller than the convolution kernels of the second convolution operations, and the multiple second convolution operations have different dilation sizes;
    a first determining subunit, configured to determine the first fused feature based on the multiple second convolution results.
  18. A computer storage medium, wherein the computer storage medium stores computer-executable instructions which, when executed, can implement the steps of the method according to any one of claims 1 to 13.
  19. An electronic device, wherein the electronic device comprises a memory and a processor, the memory storing computer-executable instructions, and the processor, when running the computer-executable instructions on the memory, being capable of implementing the steps of the method according to any one of claims 1 to 13.
  20. A computer program product, wherein the computer program product comprises computer-executable instructions which, when executed, can implement the steps of the method according to any one of claims 1 to 13.
PCT/CN2021/124726 2021-04-15 2021-10-19 Instance segmentation method and apparatus, electronic device, and storage medium WO2022217876A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110407978.X 2021-04-15
CN202110407978.XA CN113096140B (zh) 2021-04-15 2021-04-15 Instance segmentation method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022217876A1 true WO2022217876A1 (zh) 2022-10-20

Family

ID=76677976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124726 WO2022217876A1 (zh) 2021-10-19 Instance segmentation method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN113096140B (zh)
WO (1) WO2022217876A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096140B (zh) * 2021-04-15 2022-11-22 Beijing SenseTime Technology Development Co., Ltd. Instance segmentation method and apparatus, electronic device, and storage medium
CN113792738A (zh) * 2021-08-05 2021-12-14 Beijing Kuangshi Technology Co., Ltd. Instance segmentation method and apparatus, electronic device, and computer-readable storage medium
US11976940B2 (en) * 2021-09-30 2024-05-07 Woven By Toyota, Inc. Vehicle data collection system and method of using
WO2023083231A1 (en) * 2021-11-12 2023-05-19 Huawei Technologies Co., Ltd. System and methods for multiple instance segmentation and tracking
CN115578564B (zh) * 2022-10-25 2023-05-23 Beijing Yizhun Medical AI Technology Co., Ltd. Training method and apparatus for an instance segmentation model, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN109801307A (zh) * 2018-12-17 2019-05-24 中国科学院深圳先进技术研究院 一种全景分割方法、装置及设备
CN110008808A (zh) * 2018-12-29 2019-07-12 北京迈格威科技有限公司 全景分割方法、装置和系统及存储介质
US20200134365A1 (en) * 2018-02-09 2020-04-30 Beijing Sensetime Technology Development Co., Ltd. Instance segmentation methods and apparatuses, electronic devices, programs, and media
CN112053358A (zh) * 2020-09-28 2020-12-08 腾讯科技(深圳)有限公司 图像中像素的实例类别确定方法、装置、设备及存储介质
CN113096140A (zh) * 2021-04-15 2021-07-09 北京市商汤科技开发有限公司 实例分割方法及装置、电子设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368893B (zh) * 2020-02-27 2023-07-25 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image recognition method and apparatus, electronic device, and storage medium
CN111414963B (zh) * 2020-03-19 2024-05-17 Beijing SenseTime Technology Development Co., Ltd. Image processing method, apparatus, device, and storage medium
CN111862140B (zh) * 2020-06-11 2023-08-18 Sun Yat-sen University Panoptic segmentation network and method based on collaborative module-level search

Also Published As

Publication number Publication date
CN113096140B (zh) 2022-11-22
CN113096140A (zh) 2021-07-09

Similar Documents

Publication Publication Date Title
WO2022217876A1 (zh) Instance segmentation method and apparatus, electronic device, and storage medium
CN111104962B (zh) Image semantic segmentation method and apparatus, electronic device, and readable storage medium
CN109255352B (zh) Target detection method, apparatus, and system
Hsu et al. Ratio-and-scale-aware YOLO for pedestrian detection
CN110070511B (zh) Image processing method and apparatus, electronic device, and storage medium
US9633282B2 Cross-trained convolutional neural networks using multimodal images
US20220108454A1 Segmentation for image effects
US8761446B1 Object detection with false positive filtering
CN112070044B (zh) Video object classification method and apparatus
US9443287B2 Image processing method and apparatus using trained dictionary
CN110533046B (zh) Image instance segmentation method and apparatus, computer-readable storage medium, and electronic device
CN110781980B (zh) Training method for a target detection model, target detection method, and apparatus
US20210183014A1 Determination of disparity
CN112200115B (zh) Face recognition training method, recognition method, apparatus, device, and storage medium
CN110807384A (zh) Small-target detection method and system under low-visibility conditions
EP3836083A1 Disparity estimation system and method, electronic device and computer program product
CN110765903A (zh) Pedestrian re-identification method, apparatus, and storage medium
CN111079864A (zh) Short-video classification method and system based on optimized video key-frame extraction
CN113393434A (zh) RGB-D saliency detection method based on an asymmetric dual-stream network architecture
US20220301106A1 Training method and apparatus for image processing model, and image processing method and apparatus
CN114332993A (zh) Face recognition method and apparatus, electronic device, and computer-readable storage medium
Niemeijer et al. A review of neural network based semantic segmentation for scene understanding in context of the self driving car
EP4383183A1 Data processing method and apparatus
WO2022033088A1 (zh) Image processing method and apparatus, electronic device, and computer-readable medium
CN111340044A (zh) Image processing method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21936743; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21936743; Country of ref document: EP; Kind code of ref document: A1)