CN110675407B - Image instance segmentation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110675407B
Authority
CN
China
Prior art keywords
image
target
feature map
sub
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910875535.6A
Other languages
Chinese (zh)
Other versions
CN110675407A (en)
Inventor
孙阳
宋丛礼
郑文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910875535.6A
Publication of CN110675407A
Application granted
Publication of CN110675407B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to an image instance segmentation method, an image instance segmentation apparatus, an electronic device, and a storage medium in the technical field of computer vision, intended to solve the problem of inaccurate instance segmentation in the related art. The image instance segmentation method includes: inputting a feature map of a target into a trained segmentation network and determining a sub-image of the target in the feature map based on the segmentation network, wherein the segmentation network is trained as follows: down-sampling a sample image; processing the down-sampled sample image with dilated convolution kernels of different dilation rates to determine sample images with different receptive fields; fusing the sample images with different receptive fields to obtain a fused sample image; up-sampling the fused sample image and performing instance segmentation on it to determine a first sub-image; and training an initial segmentation network according to the first sub-image and a labeled second sub-image. The sub-image is then labeled at its position in the image. Because each pixel point in the feature maps of different receptive fields carries more pixel information, the instance segmentation is more accurate.

Description

Image instance segmentation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to an image instance segmentation method and apparatus, an electronic device, and a storage medium.
Background
Instance segmentation is an important component of computer vision, and much research work aims to improve its segmentation accuracy. The earliest instance segmentation methods were bottom-up, such as the DeepMask method, which first predicts segmentation candidate regions and then classifies each region with a classification network; because the segmentation process precedes the classification process, these methods are slow and their instance segmentation precision is low. The later FCIS (Fully Convolutional Instance-aware Semantic Segmentation) scheme combines the segmentation candidate regions with an object detection system and can speed up instance segmentation, but this type of method performs poorly on overlapping objects and its precision is low. Another class of methods is based on semantic segmentation: each pixel is first classified, and pixels of the same class are then clustered into different instances, but the accuracy of this class of methods is also not high.
The Mask R-CNN method follows a detection-first approach: it first detects objects, then extracts features of each detected object for segmentation. Thanks to its efficiency and accuracy it is currently a good choice for instance segmentation, and later instance segmentation methods are generally improvements built on it. An instance segmentation pipeline with Mask R-CNN includes: detecting each human body box in the image, down-sampling each human body box, performing a RoIAlign operation on the feature map corresponding to the down-sampled human body box to generate a segmentation mask separating background from foreground, obtaining the segmented instance (that is, the human body image contained in the human body box corresponding to the foreground), and pasting the segmented instance back at the corresponding position in the original image.
However, when a human body box undergoes instance segmentation in this way, the segmentation is simple and coarse, and the edges of the segmented instance/human body are not accurate enough, so fine segmentation cannot be guaranteed.
Disclosure of Invention
The disclosure provides an image instance segmentation method, an image instance segmentation apparatus, an electronic device, and a storage medium, which are used to solve the problem in the related art that instance edges are not segmented accurately enough during instance segmentation. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided an image instance segmentation method, including:
carrying out target detection on an image to be segmented, and determining a feature map corresponding to each target detected in the image to be segmented;
inputting the feature map of each target into a trained segmentation network, and determining a target sub-image corresponding to the target in the feature map based on the segmentation network, wherein the segmentation network is trained as follows: an initial segmentation network down-samples a sample feature map; the down-sampled sample feature map is processed with dilated convolution kernels of different dilation rates to determine sample feature maps with different receptive fields; the sample feature maps with different receptive fields are fused to obtain a fused sample feature map; the fused sample feature map is up-sampled and instance-segmented to determine a first sub-image corresponding to a target in the sample feature map; and the initial segmentation network is trained according to the first sub-image and a labeled second sub-image corresponding to the target in the sample feature map;
and labeling the corresponding position of each target sub-image in the image to be segmented.
Optionally, the performing target detection on the image to be segmented, and determining the feature map corresponding to each target in the image to be segmented includes:
carrying out target detection on an image to be segmented, and determining a detection frame corresponding to each target in the image to be segmented;
if it is determined that detection frames with an overlapping area exist, determining, for the overlapping area of the detection frames, each pixel point located in the overlapping area, and determining the probability value that each pixel point belongs to each target corresponding to the detection frames with the overlapping area;
determining the target to which each pixel point belongs according to the probability value that the pixel point belongs to each target; and for a detection frame with an overlapping area, determining the feature map of the target to which the detection frame belongs according to the non-overlapping area of the detection frame and the pixel points belonging to that target.
Optionally, the image instance segmentation method further includes:
and aiming at the detection frame without the overlapping area, determining the area corresponding to the detection frame as the feature map of the target to which the detection frame belongs.
Optionally, the determining, according to the probability value that each pixel belongs to each target, a target to which each pixel belongs includes:
for each pixel point, attributing the pixel point to a target with the maximum probability value in the probability values of each target, and determining the target to which the pixel point belongs;
and if at least two equal maximum probability values exist among the probability values that the pixel point belongs to each target, determining, according to the confidence coefficients of the targets tied at the maximum probability value, the target with the highest confidence coefficient as the target to which the pixel point belongs.
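The pixel-assignment rule above (maximum probability, with detection confidence breaking ties) can be sketched in Python/NumPy. This is an illustrative sketch only; the function name is hypothetical and the patent does not specify how the probability or confidence values are produced:

```python
import numpy as np

def assign_pixels(prob, conf):
    """Assign each overlap pixel to one target.

    prob: (P, T) array - probability that pixel p belongs to target t
          (hypothetical values; the patent does not fix how they are computed).
    conf: (T,) array - detection confidence of each target, used only to
          break ties between equal maximum probabilities.
    Returns: (P,) array of target indices.
    """
    prob = np.asarray(prob, dtype=float)
    conf = np.asarray(conf, dtype=float)
    assigned = np.empty(prob.shape[0], dtype=int)
    for p in range(prob.shape[0]):
        row = prob[p]
        best = np.flatnonzero(row == row.max())  # targets tied at the maximum
        if len(best) == 1:
            assigned[p] = best[0]
        else:
            # tie: pick the tied target with the highest detection confidence
            assigned[p] = best[np.argmax(conf[best])]
    return assigned
```

For example, a pixel with probabilities (0.7, 0.3) goes to target 0, while a pixel tied at (0.5, 0.5) goes to whichever of the two targets has the higher confidence.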
Optionally, training the initial segmentation network according to the first sub-image and a labeled second sub-image corresponding to the target in the sample feature map includes:
for each sample feature map:
if the matching degree of the first sub-image and the second sub-image in the sample feature map is not higher than a set threshold, determining a loss function value of the sample feature map according to the sample feature map, the first sub-image and the second sub-image in the sample feature map, and a loss function in the initial segmentation network; backpropagating the loss function value through the neural network, and continuing to train the initial segmentation network with the sample feature map;
and if the matching degree of the first sub-image and the second sub-image in the sample feature map is higher than the set threshold, setting the loss function value of the sample feature map as the set loss function value, and continuing to train the initial segmentation network by adopting other sample feature maps until the matching degree of the first sub-image and the second sub-image in each sample feature map is higher than the set threshold.
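The threshold-gated training rule above can be sketched as follows. This is a hedged illustration: the patent does not name the matching-degree metric, so mask IoU is assumed here, and `threshold` and `fixed_loss` are illustrative parameters, not values from the patent:

```python
import numpy as np

def mask_matching_degree(pred_mask, gt_mask):
    """Matching degree between the predicted (first) and labeled (second)
    sub-image, measured here as mask IoU - an assumption, since the patent
    does not specify the exact metric."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def training_step(pred_mask, gt_mask, threshold=0.9, fixed_loss=0.0):
    """Decide how to treat one sample feature map during training.

    Returns (loss_value, keep_training_on_this_sample):
    - matching degree above the threshold: the loss is set to the fixed
      value and training moves on to other sample feature maps;
    - otherwise: None is returned as a placeholder for the real loss to be
      computed and backpropagated, and this sample keeps being trained on.
    """
    degree = mask_matching_degree(pred_mask, gt_mask)
    if degree > threshold:
        return fixed_loss, False   # well matched: fixed loss, next sample
    return None, True              # poorly matched: compute real loss
```

The gate makes the network spend its updates on samples that are still segmented poorly, which is the stated intent of the threshold rule.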
Optionally, before the feature map of each target is input into the trained segmentation network, the image instance segmentation method further includes:
and for the feature map of each target, scaling the feature map according to the length-width ratio of the feature map, wherein the length of the long side of the scaled feature map is a set length.
Optionally, after scaling the feature map according to the length-width ratio of the feature map, the image instance segmentation method further includes:
and if it is determined that the side length of the short side of the scaled feature map is not the set length, padding the scaled feature map with pixel points of a set color, wherein the side length of each side of the padded feature map is the set length.
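The aspect-ratio-preserving scaling and padding steps above can be sketched as below. `set_len` and the fill value are illustrative defaults (the patent specifies neither), and nearest-neighbour resampling is assumed only to keep the sketch dependency-free:

```python
import numpy as np

def scale_and_pad(feat, set_len=28, fill=0):
    """Scale a feature map so its long side equals `set_len`, preserving
    the aspect ratio, then pad the short side with a fixed fill value so
    that every side of the result equals `set_len`."""
    feat = np.asarray(feat)
    h, w = feat.shape[:2]
    scale = set_len / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    # nearest-neighbour resize via index sampling
    rows = (np.arange(nh) * (h / nh)).astype(int)
    cols = (np.arange(nw) * (w / nw)).astype(int)
    resized = feat[rows][:, cols]
    # pad the short side with the fill value up to set_len x set_len
    out = np.full((set_len, set_len) + feat.shape[2:], fill, dtype=feat.dtype)
    out[:nh, :nw] = resized
    return out
```

A 2x4 map scaled to `set_len=4` keeps its 2x4 content and gains two padded rows, so every input to the segmentation network has the same square size.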
According to a second aspect of the embodiments of the present disclosure, there is provided an image instance segmentation apparatus including:
the detection determining unit is configured to perform target detection on an image to be segmented and determine a feature map corresponding to each target detected in the image to be segmented;
an instance segmentation unit configured to, for the feature map of each target, input the feature map of the target into a trained segmentation network and determine a target sub-image corresponding to the target in the feature map based on the segmentation network, wherein the segmentation network is trained as follows: an initial segmentation network down-samples a sample feature map; the down-sampled sample feature map is processed with dilated convolution kernels of different dilation rates to determine sample feature maps with different receptive fields; the sample feature maps with different receptive fields are fused to obtain a fused sample feature map; the fused sample feature map is up-sampled and instance-segmented to determine a first sub-image corresponding to a target in the sample feature map; and the initial segmentation network is trained according to the first sub-image and a labeled second sub-image corresponding to the target in the sample feature map;
and the labeling unit is configured to perform labeling at the corresponding position of each target sub-image in the image to be segmented.
Optionally, the detection determining unit is specifically configured to perform:
carrying out target detection on an image to be segmented, and determining a detection frame corresponding to each target in the image to be segmented;
if it is determined that detection frames with an overlapping area exist, determining, for the overlapping area of the detection frames, each pixel point located in the overlapping area, and determining the probability value that each pixel point belongs to each target corresponding to the detection frames with the overlapping area;
determining the target to which each pixel point belongs according to the probability value that the pixel point belongs to each target; and for a detection frame with an overlapping area, determining the feature map of the target to which the detection frame belongs according to the non-overlapping area of the detection frame and the pixel points belonging to that target.
Optionally, the detection determining unit is further configured to execute, for a detection frame without an overlapping region, determining a region corresponding to the detection frame as a feature map of a target to which the detection frame belongs.
Optionally, the detection determining unit is specifically configured to perform:
for each pixel point, attributing the pixel point to a target with the maximum probability value in the probability values of each target, and determining the target to which the pixel point belongs;
and if at least two equal maximum probability values exist among the probability values that the pixel point belongs to each target, determining, according to the confidence coefficients of the targets tied at the maximum probability value, the target with the highest confidence coefficient as the target to which the pixel point belongs.
Optionally, the image instance segmenting device further includes:
a training unit configured to perform, for each sample feature map: if the matching degree of the first sub-image and the second sub-image in the sample feature map is not higher than a set threshold, determining a loss function value of the sample feature map according to the sample feature map, the first sub-image and the second sub-image in the sample feature map, and a loss function in the initial segmentation network; backpropagating the loss function value through the neural network, and continuing to train the initial segmentation network with the sample feature map; and if the matching degree of the first sub-image and the second sub-image in the sample feature map is higher than the set threshold, setting the loss function value of the sample feature map to the set loss function value, and continuing to train the initial segmentation network with other sample feature maps until the matching degree of the first sub-image and the second sub-image in each sample feature map is higher than the set threshold.
Optionally, the image instance segmenting device further includes:
and the updating unit is configured to execute scaling of the feature map according to the length-width ratio of the feature map aiming at each target, wherein the length of the long side of the scaled feature map is a set length.
Optionally, the updating unit is further configured to perform, if it is determined that the side length of the short side of the feature map after scaling is not the set length, filling the feature map after scaling with a pixel point of a set color, where the side length of each side of the feature map after filling is the set length.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image instance segmentation method described above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the image instance segmentation method described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising: program code for enabling an electronic device to carry out the image instance segmentation method described above, when said computer program product is executed by said electronic device.
The technical scheme provided in the embodiment of the disclosure at least brings the following beneficial effects:
the method comprises the steps of determining a target sub-image corresponding to a target in a feature map corresponding to the detected target based on a trained segmentation network, obtaining sample feature maps of different receptive fields based on a cavity convolution kernel containing different contrast rates in the segmentation network in the training process, fusing the sample feature maps of the different receptive fields, and obtaining a segmentation network with finer segmentation based on more pixel information contained in each pixel point in the fused sample feature map, so that the edge of the target sub-image/example obtained by segmentation based on the trained segmentation network is more accurate and the finer segmentation can be ensured.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of image instance segmentation in accordance with an exemplary embodiment;
FIG. 2 is a block diagram illustrating an example image segmentation apparatus according to one illustrative embodiment;
FIG. 3 is a block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the related art, when a human body box is instance-segmented, only simple coarse segmentation (such as a RoIAlign operation) is performed, so the edge of the segmented instance (i.e., the sub-image corresponding to the target contained in the detection frame corresponding to the foreground, whose contour edge should theoretically coincide with the actual contour edge of the target) is not accurate enough. The image instance segmentation method provided in the embodiments of the present disclosure can perform instance segmentation on targets of different types; the embodiments take only the human body as the target type, and the image instance segmentation process for targets of other types (such as animals, balloons, furniture, and the like) can refer to the process for the human body, so no repeated description is given in the embodiments of the present disclosure.
Fig. 1 is a flowchart illustrating an image example segmentation method according to an exemplary embodiment, as shown in fig. 1, the image example segmentation method is applied to an image example segmentation apparatus, such as an electronic device, and the image example segmentation method includes the following steps:
in step S101, target detection is performed on an image to be segmented, and a feature map corresponding to each target detected in the image to be segmented is determined.
In step S102, for the feature map of each target, the feature map of the target is input into a trained segmentation network, and a target sub-image corresponding to the target in the feature map is determined based on the segmentation network, wherein the segmentation network is trained as follows: an initial segmentation network down-samples a sample feature map, processes the down-sampled sample feature map with dilated convolution kernels of different dilation rates to determine sample feature maps with different receptive fields, fuses the sample feature maps with different receptive fields to obtain a fused sample feature map, up-samples the fused sample feature map and performs instance segmentation on it to determine a first sub-image corresponding to a target in the sample feature map, and is trained according to the first sub-image and a labeled second sub-image corresponding to the target in the sample feature map.
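The benefit of dilated convolution kernels with different dilation rates is that a k x k kernel with dilation rate d covers an effective spatial extent of k + (k - 1)(d - 1), so stacking kernels with larger rates enlarges the receptive field without further down-sampling. A small sketch of this standard computation (illustrative only, not code from the patent):

```python
def effective_kernel_size(k, d):
    """Spatial extent covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, dilation, stride) layers,
    using the standard recurrence rf_out = rf_in + (k_eff - 1) * jump,
    where jump is the product of strides of the preceding layers."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel_size(k, d) - 1) * jump
        jump *= s
    return rf
```

For instance, a 3x3 kernel with dilation 2 covers a 5x5 extent, and stacking a dilation-1 and a dilation-2 layer yields a 7-pixel receptive field, which is why branches with different dilation rates see different receptive fields of the same sample feature map.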
In step S103, labeling a corresponding position of each target sub-image in the image to be segmented.
The trained segmentation network is obtained by training the initial segmentation network and contains dilated convolution kernels with different dilation rates, so feature maps with different receptive fields can be determined based on the segmentation network. Each pixel point in the feature maps of different receptive fields contains more pixel information, so instance segmentation based on these feature maps can be finer: the edges of the target sub-image (i.e., the target instance) obtained by segmentation are more precise, finer segmentation is realized, and a better segmentation effect is achieved.
In the embodiment of the present disclosure, the target may also be understood as a segmentation object, and the target sub-image corresponding to the target may also be understood as a segmentation result of the segmentation object, that is, an example obtained by the segmentation.
Instance segmentation is the task of identifying target contours at the pixel level. The identified contour may be the contour of the target instance corresponding to a target, or the contour of the target sub-image corresponding to the target. The closer the contour edge of the target sub-image or target instance is to the contour edge of the corresponding target, the more accurate the instance edge obtained by segmentation, the finer the instance segmentation, and the better the segmentation effect.
The image instance segmentation device can detect the target of the acquired image to be segmented, detect one or more targets in the image to be segmented and determine a feature map (feature map) corresponding to each target.
For example, the image instance segmentation apparatus may perform target detection on the image to be segmented and detect one or more target detection frames. It may directly use the area (or image area) corresponding to a target's detection frame as the feature map corresponding to the target, or it may process the target's detection frame to determine the feature map corresponding to the target. In some scenes occlusion exists between human bodies, so the detection frames of two human bodies have an overlapping area; if the area corresponding to a target's detection frame is used directly as the target's feature map, the subsequently segmented instance edge may differ too much from the target's actual contour edge. If instead the target's detection frame is processed to determine the target's feature map, the segmented instance edge can be closer to the target's actual contour edge, making the instance segmentation more accurate. The process by which the image instance segmentation apparatus determines the feature map corresponding to a target is described in detail below.
In an example, the image instance segmentation device trains a human body detection task by using a convolutional neural network, so as to achieve target detection in a plurality of complex scenes with occlusion, different illumination, and the like, for example, the image instance segmentation device performs target detection on an image to be segmented by using an FPN (feature pyramid network), and the detection accuracy of human bodies with different proportions can be improved by performing target detection on the image by using the FPN.
For convenience of distinguishing it from subsequent feature maps, the feature map corresponding to the target determined by the image instance segmentation apparatus may be referred to as a first feature map corresponding to the target.
The specific training process of the trained segmentation network relies on a sample feature map of each sample target in a training set. To distinguish them from subsequent sample feature maps, the sample feature map of each sample target in the training set may be referred to as a first sample feature map. Each first sample feature map is manually labeled with a second sub-image, which is the sub-image corresponding to the target in the first sample feature map; that is, the second sub-image is the instance manually labeled in the first sample feature map.
In the process of training the segmentation network, each first sample feature map and the second sub-image in each first sample feature map are input into the initial segmentation network. The initial segmentation network down-samples the first sample feature maps; it contains dilated convolution kernels with different dilation rates and processes the down-sampled first sample feature maps with these kernels to determine second sample feature maps with different receptive fields; it fuses the second sample feature maps with different receptive fields to obtain a fused third sample feature map; it up-samples the third sample feature map and performs instance segmentation on the up-sampled third sample feature map to determine a first sub-image corresponding to the target in each first sample feature map; and it is trained according to the first sub-images corresponding to the targets in the first sample feature maps and the labeled second sub-images. After the training of the initial segmentation network is completed, the segmentation network used for instance segmentation is obtained.
The number of dilated convolution kernels with different dilation rates included in the segmentation network, and the values of those dilation rates, may be set according to the actual segmentation requirements, which is not limited in the embodiments of the present disclosure. The segmentation network may further include a network for performing downsampling processing and/or a network for performing upsampling processing; for example, the network for performing downsampling processing may be a pooling layer, and the network for performing upsampling processing may be an upsampling (e.g., interpolation or deconvolution) layer.
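To illustrate how the dilation rate enlarges the receptive field without adding kernel weights, the following pure-Python sketch implements a 1-D dilated ("hole") convolution and a simple element-wise fusion of branch outputs. This is a toy illustration of the principle only, not the disclosed network; the summation-based fusion and all function names are assumptions.

```python
# Toy 1-D dilated ("hole") convolution sketch. Illustrative only; the
# disclosed segmentation network is 2-D and its fusion scheme is not
# specified beyond "fusing feature maps of different receptive fields".

def effective_span(kernel_size, rate):
    # Receptive field of one dilated kernel: k + (k - 1) * (rate - 1)
    return (kernel_size - 1) * rate + 1

def dilated_conv1d(x, w, rate):
    """'Valid' dilated convolution: kernel taps are spaced `rate` apart."""
    span = effective_span(len(w), rate)
    return [sum(w[j] * x[i + j * rate] for j in range(len(w)))
            for i in range(len(x) - span + 1)]

def fuse(branch_outputs):
    """Fuse equal-length branch outputs by element-wise summation
    (one simple choice, assumed here for illustration)."""
    return [sum(vals) for vals in zip(*branch_outputs)]
```

With a 3-tap kernel, rate 1 covers 3 input samples while rate 2 covers 5, so parallel branches with different rates see the input at different scales before fusion.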
Segmentation errors usually occur at pixels that are difficult to classify (i.e., hard to decide whether they belong to the target). Therefore, during training, the loss values of pixels that are misclassified and/or have low confidence are back-propagated, so that training focuses on those pixels, while the loss values of pixels that are correctly classified and/or have high confidence need not be computed and propagated. In this way, the segmentation network can concentrate its learning during training and achieve a better training effect. Accordingly, training the initial segmentation network according to the first sub-image and the labeled second sub-image corresponding to a target in a sample feature map includes:
for each sample feature map:
if the matching degree between the first sub-image and the second sub-image in the sample feature map is not higher than a set threshold, determining the loss function value of the sample feature map according to the sample feature map, the first sub-image and the second sub-image in the sample feature map, and the loss function in the initial segmentation network; back-propagating the loss function value through the neural network, and continuing to train the initial segmentation network with the sample feature map;
and if the matching degree between the first sub-image and the second sub-image in the sample feature map is higher than the set threshold, setting the loss function value of the sample feature map to the set loss function value, and continuing to train the initial segmentation network with the other sample feature maps, until the matching degree between the first sub-image and the second sub-image in every sample feature map is higher than the set threshold.
Therefore, in the embodiment of the present disclosure, if the matching degree between the first sub-image and the second sub-image of a sample feature map is not higher than the set threshold, it may be considered that many of the pixels at the manually labeled second sub-image are misclassified and/or have low confidence; training can then focus on this portion of pixels, so as to achieve a better training effect.
The value of the set threshold is not limited in the embodiments of the present disclosure.
Alternatively, the loss function in the segmentation network may be CrossEntropyLoss (the cross-entropy loss function).
Alternatively, the set loss function value may be 0.
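The loss gating described above can be sketched as follows. The disclosure does not fix a particular matching-degree metric, so intersection-over-union (IoU) and the threshold value 0.9 used here are assumptions; the set loss value of 0 follows the optional embodiment above.

```python
# Sketch of the matching-degree loss gate. IoU as the matching metric and
# threshold=0.9 are assumptions for illustration; set loss value is 0.

def mask_iou(pred, label):
    """Matching degree between predicted and labeled binary masks,
    measured here as intersection-over-union."""
    inter = sum(1 for p, l in zip(pred, label) if p and l)
    union = sum(1 for p, l in zip(pred, label) if p or l)
    return inter / union if union else 1.0

def gated_loss(pred_mask, label_mask, raw_loss, threshold=0.9):
    # Well-matched samples receive the set loss value (0), so only poorly
    # matched samples contribute gradients and training focuses on them.
    if mask_iou(pred_mask, label_mask) > threshold:
        return 0.0
    return raw_loss
```

A sample whose first sub-image already matches its labeled second sub-image well thus stops contributing to back-propagation, while hard samples keep their full loss.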
The training of the initial segmentation network may include training the parameters of the downsampling network, the parameters of the dilated convolution kernels, and the parameters of the upsampling network in the initial segmentation network.
When the segmentation network trained by the above process is applied (that is, during instance segmentation), the first feature map of each target is input into the trained segmentation network. The trained segmentation network downsamples the first feature map, processes the downsampled first feature map with the dilated convolution kernels of different dilation rates to determine second feature maps with different receptive fields, fuses the second feature maps with different receptive fields to obtain a fused third feature map, upsamples the third feature map, and performs instance segmentation on the upsampled third feature map to determine the target sub-image corresponding to the target in the first feature map.
The number of times the image instance segmentation apparatus downsamples and upsamples the feature map can be set according to the actual segmentation requirements.
In a possible scenario, overlapping pixel points are likely to occur where the detection frames of multiple human bodies intersect. In this case, performing target detection on the image to be segmented and determining the feature map corresponding to each target in the image includes:
carrying out target detection on an image to be segmented, and determining a detection frame corresponding to each target in the image to be segmented;
if the detection frame with the overlapping area is determined to exist, determining each pixel point located in the overlapping area according to the overlapping area of the detection frame, and determining the probability that each pixel point belongs to each target corresponding to the detection frame with the overlapping area;
determining the target to which each pixel belongs according to the probability of the pixel belonging to each target; aiming at a detection frame with an overlapping area, determining a feature map of a target to which the detection frame belongs according to a non-overlapping area of the detection frame and pixel points belonging to the target to which the detection frame belongs;
and aiming at the detection frame without the overlapping area, determining the area corresponding to the detection frame as the feature map of the target to which the detection frame belongs.
For example, the detection frame corresponding to the detected object may be an ROI (region of interest).
Overlapping pixel points easily appear where the detection frames of multiple human bodies intersect, and for each pixel point in the intersecting part, the instance with the largest probability value is selected. After the image instance segmentation apparatus determines the detection frame corresponding to each target in the image, for at least two first detection frames that have an overlapping area, it determines the probability that each pixel point in the overlapping area belongs to each target corresponding to those first detection frames, and attributes the pixel point to the target with the largest probability value among them. For example, suppose an overlapping area exists between the detection frame of target A and the detection frame of target B. For a first pixel point located in the overlapping area, if the probability that the first pixel point belongs to target A is determined to be 0.2 and the probability that it belongs to target B is 0.8, the target to which the first pixel point belongs may be determined to be target B, and the feature map corresponding to target B then includes at least the non-overlapping area of B's detection frame and the first pixel point.
Regardless of whether the detection frames of human bodies intersect elsewhere, for a detection frame without an overlapping area, the area corresponding to that detection frame can be directly determined as the feature map of the target to which the detection frame belongs. For example, if there is no overlap between the detection frame of target C and the detection frame of any other target, the region (image region) corresponding to the detection frame of target C may be determined as the feature map of target C.
Optionally, determining the target to which each pixel point belongs according to the probability values of the pixel point belonging to each target includes:
for each pixel point, attributing the pixel point to the target with the highest probability value among the probability values of the pixel point belonging to each target, thereby determining the target to which the pixel point belongs;
and if at least two maximum and same probability values exist in the probability values of the pixel point belonging to each target, determining the target with the highest confidence coefficient as the target to which the pixel point belongs according to the confidence coefficients of the pixel point belonging to the targets with the at least two maximum and same probability values.
During target detection, the probabilities that a pixel point belongs to different targets may be equal. In this scenario, the target with the higher confidence may be selected as the target to which the pixel point belongs, and the image instance segmentation apparatus may also determine the confidence that the pixel point belongs to each target. For example, suppose the detection frame of target A and the detection frame of target B have an overlapping area, and for a second pixel point located in the overlapping area, the probability that the second pixel point belongs to target A is determined to be 0.8 and the probability that it belongs to target B is also 0.8. If the confidence that the second pixel point belongs to target A is higher than the confidence that it belongs to target B, the target to which the second pixel point belongs may be determined to be target A; otherwise, it may be determined to be target B.
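The pixel-attribution rule above (maximum probability, with ties broken by confidence) can be sketched as follows. The dictionary-based interface and function name are illustrative assumptions, not part of the disclosure.

```python
# Sketch of per-pixel target attribution for overlapping detection frames:
# highest probability wins; equal maxima are resolved by confidence.

def assign_pixel(probs, confs):
    """probs / confs map each candidate target to the probability /
    confidence that this pixel point belongs to it."""
    best_p = max(probs.values())
    tied = [t for t, p in probs.items() if p == best_p]
    if len(tied) == 1:
        return tied[0]                            # unique maximum probability
    return max(tied, key=lambda t: confs[t])      # tie broken by confidence
```

With the values from the examples above, the first pixel point (0.2 vs. 0.8) goes to target B, and the tied second pixel point (0.8 vs. 0.8) goes to whichever target has the higher confidence.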
For each feature map (the first feature map during instance segmentation and/or the first sample feature map during segmentation network training), the image instance segmentation apparatus may input the feature map into the segmentation network. Optionally, before the image instance segmentation apparatus inputs the feature map of each target into the segmentation network, the image instance segmentation method further includes:
for the feature map of each target, scaling the feature map according to its aspect ratio so that the long side of the scaled feature map has a set length. The scaled feature map may be referred to as a fourth feature map and/or a fourth sample feature map; the image instance segmentation apparatus may update the feature map with the scaled feature map and input the updated feature map into the trained segmentation network. Compared with the prior art, which directly scales the feature map to a 28 × 28 square, this avoids the error caused by forcibly scaling to a square (the human body is generally not square), preserves the original proportions of the human body, and thus enables finer segmentation.
The set length may be related to the input size of the segmentation network; for example, if the input is 256 × 256, the set length may be 256. This increases the size of the feature map and further improves segmentation accuracy.
Alternatively, the set length may be not less than 128 in order to ensure the accuracy of the segmentation.
Generally, the input of the segmentation network is m × m. If the side length of the short side of the scaled feature map (i.e., the fourth feature map) is determined not to equal the set length, the scaled feature map is padded with pixel points of a set color until the side length of every side of the padded feature map equals the set length. The pixel points of the set color used to pad the fourth feature map may be black, white, or another color. The image instance segmentation apparatus may update the feature map of the target with the padded fourth feature map and input the updated feature map into the trained segmentation network.
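The aspect-ratio-preserving scaling and set-color padding above can be sketched as follows, using the example set length of 256; the function names and the size-only interface are illustrative assumptions.

```python
# Sketch of aspect-ratio-preserving scaling to a set length, followed by
# padding the short side (with set-color pixels) to reach an m x m input.

def scale_to_set_length(w, h, set_length=256):
    """Scale so the long side equals set_length, keeping the aspect ratio."""
    s = set_length / max(w, h)
    return round(w * s), round(h * s)

def pad_to_square(w, h, set_length=256):
    """Padding (in set-color pixels) needed on each axis to reach m x m."""
    return set_length - w, set_length - h
```

For example, a 100 × 50 feature map is scaled to 256 × 128, then 128 pixels of the set color are added along the short side so every side of the padded feature map has the set length.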
The image instance segmentation apparatus labels the corresponding position of each target sub-image in the image, and may use one or a combination of labeling modes such as contour lines, text labels, and coloring for each target sub-image.
In the embodiment of the present disclosure, for the feature map corresponding to a detected target, the target sub-image corresponding to the target in the feature map is determined by the trained segmentation network. During training, the segmentation network obtains sample feature maps with different receptive fields from dilated convolution kernels with different dilation rates and fuses them, so that each pixel point in the fused sample feature map contains more pixel information, and a segmentation network capable of finer segmentation can be trained. Therefore, the edges of the target sub-image obtained by segmentation with the trained segmentation network are more accurate, and finer segmentation can be ensured.
FIG. 2 is a block diagram illustrating an image instance segmentation apparatus according to an exemplary embodiment. Referring to FIG. 2, the apparatus includes: a detection determining unit 21, an instance segmentation unit 22, and a labeling unit 23.
The detection determining unit 21 is configured to perform target detection on an image to be segmented, and determine a feature map corresponding to each target detected in the image to be segmented;
the instance segmentation unit 22 is configured to, for the feature map of each target, input the feature map of the target into a trained segmentation network and determine the target sub-image corresponding to the target in the feature map based on the segmentation network, where the segmentation network is trained as follows: an initial segmentation network downsamples a sample feature map, processes the downsampled sample feature map with dilated (hole) convolution kernels of different dilation rates to determine sample feature maps with different receptive fields, fuses the sample feature maps with different receptive fields to obtain a fused sample feature map, and performs upsampling processing and instance segmentation on the fused sample feature map to determine a first sub-image corresponding to the target in the sample feature map; the initial segmentation network is trained according to the first sub-image and a labeled second sub-image corresponding to the target in the sample feature map;
the labeling unit 23 is configured to perform labeling at a corresponding position of each target sub-image in the image to be segmented.
Further, the detection determining unit 21 is specifically configured to:
carrying out target detection on an image to be segmented, and determining a detection frame corresponding to each target in the image;
if the detection frame with the overlapping area is determined to exist, determining each pixel point located in the overlapping area according to the overlapping area of the detection frame, and determining the probability that each pixel point belongs to each target corresponding to the detection frame with the overlapping area;
determining the target to which each pixel belongs according to the probability of the pixel belonging to each target; aiming at a detection frame with an overlapping area, determining a feature map of a target to which the detection frame belongs according to a non-overlapping area of the detection frame and pixel points belonging to the target to which the detection frame belongs;
and aiming at the detection frame without the overlapped area, determining the area corresponding to the detection frame as the feature map of the target to which the detection frame belongs.
Further, the detection determining unit 21 is specifically configured to:
for each pixel point, attributing the pixel point to a target with the maximum probability value in the probability values of each target, and determining the target to which the pixel point belongs;
and if at least two maximum and same probability values exist in the probability values of the pixel point belonging to each target, determining the target with the highest confidence coefficient as the target to which the pixel point belongs according to the confidence coefficients of the pixel point belonging to the targets with the at least two maximum and same probability values.
Further, the image instance segmentation device further includes:
the training unit 24 is configured to perform, for each sample feature map: if the matching degree of the first sub-image and the second sub-image in the sample feature map is not higher than a set threshold value, determining a loss function value of the sample feature map according to the sample feature map, the first sub-image and the second sub-image in the sample feature map and a loss function in the initial segmentation network; calculating the loss function value through back propagation of a neural network, and continuing to train the initial segmentation network by adopting the sample feature map; and if the matching degree of the first sub-image and the second sub-image in the sample feature map is higher than the set threshold, setting the loss function value of the sample feature map as the set loss function value, and continuing to train the initial segmentation network by adopting other sample feature maps until the matching degree of the first sub-image and the second sub-image in each sample feature map is higher than the set threshold.
Further, the image instance segmentation device further includes:
the updating unit 25 is configured to perform scaling on the feature map of each object according to the length-width ratio of the feature map, wherein the length of the long side of the scaled feature map is a set length.
Further, the updating unit 25 is further configured to perform, if it is determined that the side length of the short side of the feature map after scaling is not the set length, filling the feature map after scaling with a pixel point of a set color, where the side length of each side of the feature map after filling is the set length.
FIG. 3 is a block diagram of an electronic device according to an exemplary embodiment. The electronic device comprises a processor 31 and a memory 32 for storing instructions executable by the processor 31, wherein the processor 31 is configured to execute the instructions to implement the image instance segmentation method described above.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory 32 comprising instructions, executable by a processor 31 of an electronic device to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In some possible implementation embodiments, various aspects of image instance segmentation provided by the present disclosure may also be implemented in the form of a computer program product including program code for causing an electronic device to perform the steps in image instance segmentation according to various exemplary embodiments of the present disclosure described above in this specification when the computer program product is run on the electronic device, for example, the electronic device may perform steps S101, S102, and S103 as shown in fig. 1.
The computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer program product for an image segmentation example of embodiments of the present disclosure may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device. However, the computer program product of the present disclosure is not so limited, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. An image instance segmentation method, comprising:
carrying out target detection on an image to be segmented, and determining a feature map corresponding to each target detected in the image to be segmented;
inputting the feature map of each target into a trained segmentation network, and determining a target sub-image corresponding to the target in the feature map based on the segmentation network, wherein the segmentation network is trained according to the following modes: the method comprises the steps that an initial segmentation network carries out downsampling processing on a sample characteristic graph, the downsampled sample characteristic graph is processed on the basis of a hole convolution kernel with different expansion rates, the sample characteristic graphs of different receptive fields are determined, the sample characteristic graphs of the different receptive fields are fused to obtain a fused sample characteristic graph, the fused sample characteristic graph is subjected to upsampling processing and example segmentation, a first sub-image corresponding to a target in the sample characteristic graph is determined, and the initial segmentation network is trained according to the first sub-image and a marked second sub-image corresponding to the target in the sample characteristic graph;
labeling the corresponding position of each target sub-image in the image to be segmented;
training the initial segmentation network according to the first sub-image and a labeled second sub-image corresponding to a target in a sample feature map comprises:
for each sample feature map:
if the matching degree of the first sub-image and the second sub-image in the sample feature map is not higher than a set threshold value, determining a loss function value of the sample feature map according to the sample feature map, the first sub-image and the second sub-image in the sample feature map and a loss function in the initial segmentation network; calculating the loss function value through back propagation of a neural network, and continuing to train the initial segmentation network by adopting the sample feature map;
and if the matching degree of the first sub-image and the second sub-image in the sample feature map is higher than the set threshold, setting the loss function value of the sample feature map as the set loss function value, and continuing to train the initial segmentation network by adopting other sample feature maps until the matching degree of the first sub-image and the second sub-image in each sample feature map is higher than the set threshold.
2. The image instance segmentation method according to claim 1, wherein the performing target detection on the image to be segmented and determining the feature map corresponding to each target in the image to be segmented comprises:
carrying out target detection on an image to be segmented, and determining a detection frame corresponding to each target in the image to be segmented;
if the detection frame with the overlapping area is determined to exist, determining each pixel point located in the overlapping area aiming at the overlapping area of the detection frame, and determining the probability value of each target corresponding to the detection frame with the overlapping area to which each pixel point belongs;
determining the target to which each pixel belongs according to the probability value of each pixel belonging to each target; and aiming at the detection frame with the overlapping area, determining the characteristic diagram of the target to which the detection frame belongs according to the non-overlapping area of the detection frame and the pixel point belonging to the target to which the detection frame belongs.
3. The image instance segmentation method of claim 2, further comprising:
and aiming at the detection frame without the overlapped area, determining the area corresponding to the detection frame as the feature map of the target to which the detection frame belongs.
4. The image instance segmentation method of claim 2, wherein the determining the target to which each pixel belongs according to the probability value to which each pixel belongs to each target comprises:
for each pixel point, attributing the pixel point to a target with the maximum probability value in the probability values of each target, and determining the target to which the pixel point belongs;
and if at least two maximum and same probability values exist in the probability values of the pixel point belonging to each target, determining the target with the highest confidence coefficient as the target to which the pixel point belongs according to the confidence coefficients of the pixel point belonging to the targets with the at least two maximum and same probability values.
5. The image instance segmentation method according to any one of claims 1 to 4, wherein before the feature map of each object is input into the trained segmentation network, the image instance segmentation method further comprises:
and for the feature map of each target, scaling the feature map according to the length-width ratio of the feature map, wherein the length of the long side of the scaled feature map is a set length.
6. The method of claim 5, wherein after scaling the feature map according to the aspect ratio of the feature map, the image instance segmentation method further comprises:
and if the length of the short side of the feature map after scaling is determined not to be the set length, filling the feature map after scaling by adopting a pixel point with a set color, wherein the length of each side of the feature map after filling is the set length.
7. An image instance segmentation apparatus, comprising:
the detection determining unit is configured to perform target detection on an image to be segmented and determine a feature map corresponding to each target detected in the image to be segmented;
an example segmentation unit configured to perform feature maps for each target, input the feature maps of the targets into a trained segmentation network, and determine target sub-images corresponding to the targets in the feature maps based on the segmentation network, wherein the segmentation network is trained according to the following method: the method comprises the steps that an initial segmentation network carries out downsampling processing on a sample characteristic graph, the sample characteristic graph after downsampling is processed on the basis of a hole convolution kernel of different expansion rates, the sample characteristic graphs of different receptive fields are determined, the sample characteristic graphs of the different receptive fields are fused to obtain a fused sample characteristic graph, the fused sample characteristic graph is subjected to upsampling processing and example segmentation, a first sub-image corresponding to a target in the sample characteristic graph is determined, and the initial segmentation network is trained according to the first sub-image and a marked second sub-image corresponding to the target in the sample characteristic graph;
the labeling unit is configured to perform labeling at a corresponding position of each target sub-image in the image to be segmented;
the image instance segmentation device further comprises:
a training unit configured to perform, for each sample feature map: if the matching degree of the first sub-image and the second sub-image in the sample feature map is not higher than a set threshold value, determining a loss function value of the sample feature map according to the sample feature map, the first sub-image and the second sub-image in the sample feature map and a loss function in the initial segmentation network; calculating the loss function value through back propagation of a neural network, and continuing to train the initial segmentation network by adopting the sample feature map; and if the matching degree of the first sub-image and the second sub-image in the sample feature map is higher than the set threshold, setting the loss function value of the sample feature map as the set loss function value, and continuing to train the initial segmentation network by adopting other sample feature maps until the matching degree of the first sub-image and the second sub-image in each sample feature map is higher than the set threshold.
8. The image instance segmentation apparatus according to claim 7, wherein the detection determination unit is specifically configured to perform:
carrying out target detection on an image to be segmented, and determining a detection frame corresponding to each target in the image to be segmented;
if the detection frame with the overlapping area is determined to exist, determining each pixel point located in the overlapping area aiming at the overlapping area of the detection frame, and determining the probability value of each target corresponding to the detection frame with the overlapping area to which each pixel point belongs;
determining the target to which each pixel belongs according to the probability value of each pixel belonging to each target; and aiming at the detection frame with the overlapping area, determining the characteristic diagram of the target to which the detection frame belongs according to the non-overlapping area of the detection frame and the pixel point belonging to the target to which the detection frame belongs.
9. The apparatus according to claim 8, wherein the detection determining unit is further configured to perform, for a detection frame having no overlapping area, determining an area corresponding to the detection frame as the feature map of the target to which the detection frame belongs.
10. The image instance segmentation apparatus according to claim 8, wherein the detection determination unit is specifically configured to perform:
for each pixel point, attributing the pixel point to the target with the maximum probability value among the probability values of the pixel point belonging to each target, thereby determining the target to which the pixel point belongs;
and if at least two equal maximum probability values exist among the probability values of the pixel point belonging to each target, determining, according to the confidence coefficients of the pixel point belonging to the targets sharing the maximum probability value, the target with the highest confidence coefficient as the target to which the pixel point belongs.
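A minimal sketch of the assignment rule in claim 10 (all names illustrative; `probs` and `confidences` would come from the detection network):

```python
def assign_pixel(probs, confidences):
    """Pick the target a pixel belongs to: the maximum probability wins,
    and an exact tie is broken by the targets' detection confidences."""
    best = max(probs.values())
    tied = [t for t, p in probs.items() if p == best]
    if len(tied) == 1:
        return tied[0]
    # At least two targets share the maximum probability: fall back on
    # the confidence coefficients and take the most confident target.
    return max(tied, key=lambda t: confidences[t])

# No tie: target 'b' has the highest probability.
a1 = assign_pixel({'a': 0.3, 'b': 0.6}, {'a': 0.9, 'b': 0.5})
# Tie at 0.5: the more confident target 'a' wins.
a2 = assign_pixel({'a': 0.5, 'b': 0.5}, {'a': 0.9, 'b': 0.5})
```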
11. The image instance segmentation apparatus according to any one of claims 7 to 10, wherein the image instance segmentation apparatus further comprises:
an updating unit configured to perform, for each target, scaling the feature map according to the length-width ratio of the feature map, wherein the side length of the long side of the scaled feature map is a set length.
12. The image instance segmentation apparatus according to claim 11, wherein the updating unit is further configured to perform, if the side length of the short side of the scaled feature map is determined not to be the set length, padding the scaled feature map with pixel points of a set color, wherein the side length of each side of the padded feature map is the set length.
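Claims 11 and 12 together describe an aspect-preserving resize to a square. A small sketch, with `target` standing in for the claims' "set length" (the patent does not fix an actual value):

```python
def scale_and_pad(width, height, target=64):
    """Scale a feature map so its long side equals `target`, then report
    how many pixels of padding the short side needs (claim 12 fills that
    border with pixel points of a set color) so every side of the
    result equals `target`."""
    scale = target / max(width, height)
    scaled_w, scaled_h = round(width * scale), round(height * scale)
    pad = target - min(scaled_w, scaled_h)  # 0 when already square
    return scaled_w, scaled_h, pad
```

For example, a 100x50 feature map scales to 64x32, leaving 32 pixels of set-color padding on the short side to reach a 64x64 square.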
13. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image instance segmentation method according to any one of claims 1 to 6.
14. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image instance segmentation method according to any one of claims 1 to 6.
CN201910875535.6A 2019-09-17 2019-09-17 Image instance segmentation method and device, electronic equipment and storage medium Active CN110675407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910875535.6A CN110675407B (en) 2019-09-17 2019-09-17 Image instance segmentation method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110675407A (en) 2020-01-10
CN110675407B (en) 2022-08-05

Family

ID=69078061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910875535.6A Active CN110675407B (en) 2019-09-17 2019-09-17 Image instance segmentation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110675407B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429463A (en) * 2020-03-04 2020-07-17 Beijing Sankuai Online Technology Co Ltd Instance segmentation method and device, electronic equipment and storage medium
CN111951269B (en) * 2020-10-16 2021-01-05 Shenzhen Intellifusion Technologies Co Ltd Image processing method and related equipment
CN112651974A (en) * 2020-12-29 2021-04-13 Shanghai United Imaging Intelligence Co Ltd Image segmentation method and system, electronic device and storage medium
CN113066048A (en) * 2021-02-27 2021-07-02 Huawei Technologies Co Ltd Segmentation map confidence determination method and device
CN113947771B (en) * 2021-10-15 2023-06-27 Beijing Baidu Netcom Science and Technology Co Ltd Image recognition method, apparatus, device, storage medium, and program product
CN116368537A (en) * 2021-10-28 2023-06-30 BOE Technology Group Co Ltd Training method and device of target detection model, and target detection method and device
CN114612790A (en) * 2022-03-30 2022-06-10 Beijing Institute of Surveying and Mapping Image processing method and device, electronic equipment and storage medium
CN116310358B (en) * 2023-02-23 2023-12-15 Harbin Kejia General Electromechanical Co Ltd Method, storage medium and equipment for detecting bolt loss of railway wagon

Citations (3)

Publication number Priority date Publication date Assignee Title
CN108229478A (en) * 2017-06-30 2018-06-29 Shenzhen Sensetime Technology Co Ltd Image semantic segmentation and training method and device, electronic equipment, storage medium and program
CN108898610A (en) * 2018-07-20 2018-11-27 University of Electronic Science and Technology of China Object contour extraction method based on Mask-RCNN
CN109325534A (en) * 2018-09-22 2019-02-12 Tianjin University Semantic segmentation method based on a bidirectional multi-scale pyramid

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10147193B2 (en) * 2017-03-10 2018-12-04 TuSimple System and method for semantic segmentation using hybrid dilated convolution (HDC)


Non-Patent Citations (2)

Title
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs; Liang-Chieh Chen et al.; https://arxiv.org/pdf/1606.00915.pdf; 2017-05-31; pp. 5, 8 *
Mask R-CNN; Kaiming He et al.; https://arxiv.org/pdf/1703.06870.pdf; 2018-01-31; p. 3 and Fig. 1 *


Similar Documents

Publication Publication Date Title
CN110675407B (en) Image instance segmentation method and device, electronic equipment and storage medium
CN109685060B (en) Image processing method and device
CN108229455B (en) Object detection method, neural network training method and device and electronic equipment
US10614574B2 (en) Generating image segmentation data using a multi-branch neural network
US11270158B2 (en) Instance segmentation methods and apparatuses, electronic devices, programs, and media
US11475681B2 (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN111145209B (en) Medical image segmentation method, device, equipment and storage medium
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN110472599B (en) Object quantity determination method and device, storage medium and electronic equipment
CN109300151B (en) Image processing method and device and electronic equipment
CN109285181B (en) Method and apparatus for recognizing image
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN111382647B (en) Picture processing method, device, equipment and storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
CN112784750B (en) Fast video object segmentation method and device based on pixel and region feature matching
CN111814754A (en) Single-frame image pedestrian detection method and device for night scene
CN111898659A (en) Target detection method and system
CN111191482B (en) Brake lamp identification method and device and electronic equipment
CN117315406A (en) Sample image processing method, device and equipment
CN112598687A (en) Image segmentation method and device, storage medium and electronic equipment
CN116403062A (en) Point cloud target detection method, system, equipment and medium
CN116433903A (en) Instance segmentation model construction method, system, electronic equipment and storage medium
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN111860261B (en) Passenger flow value statistical method, device, equipment and medium
KR20190093752A (en) Method and system for scene text detection using deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant