CN113192060A - Image segmentation method and device, electronic equipment and storage medium


Info

Publication number
CN113192060A
Authority
CN
China
Prior art keywords
feature
image
edge
sample
segmentation
Prior art date
Legal status
Pending
Application number
CN202110570239.2A
Other languages
Chinese (zh)
Inventor
李祥泰
程光亮
Current Assignee
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority to CN202110570239.2A
Publication of CN113192060A
Legal status: Pending (Current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration by the use of local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The present disclosure provides an image segmentation method, apparatus, electronic device and storage medium, wherein the method comprises: performing feature extraction on an image to be segmented to obtain a feature map of the image to be segmented; performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature of the image to be segmented, where the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature; and determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and the feature map of the image to be segmented. Because segmentation combines the two edge features (the first edge feature and the second edge feature) with the feature map of the image to be segmented, the edge contour of the target object can be readily revealed in the image to be segmented, which improves the accuracy of the image segmentation.

Description

Image segmentation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image segmentation method, an image segmentation apparatus, an electronic device, and a storage medium.
Background
Image segmentation is a process of dividing an image into a plurality of specific regions with unique properties and extracting an object of interest. The extracted target can be applied to the fields of image semantic recognition, image search and the like.
The basic idea of edge-detection-based segmentation is to detect the edge points existing in an image using feature inconsistency between regions, and then connect these points into lines according to a given strategy until a closed region is formed.
However, when edges are complicated, and especially when the image background is complicated, pixels that actually belong to the image background are easily assigned to the target object, so segmentation accuracy is low.
Disclosure of Invention
The embodiments of the present disclosure provide at least an image segmentation method, an image segmentation apparatus, an electronic device, and a storage medium, so that image edges are segmented more accurately and overall segmentation accuracy is higher.
In a first aspect, an embodiment of the present disclosure provides a method for image segmentation, where the method includes:
performing feature extraction on an image to be segmented to obtain a feature map of the image to be segmented;
performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature for the image to be segmented; the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature;
and determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and the feature map of the image to be segmented.
With this image segmentation method, once the image to be segmented is obtained, feature extraction can be performed on it, and edge feature extraction can then be performed on the extracted feature map to obtain a first edge feature and a second edge feature whose feature vector directions differ. Because segmentation combines the two edge features (the first edge feature and the second edge feature) with the feature map of the image to be segmented, the edge contour of the target object can be readily revealed in the image to be segmented, which improves the accuracy of the image segmentation.
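For illustration only, the three claimed steps can be sketched in a few lines (a minimal sketch assuming PyTorch-style callables; backbone, bcn and seg_head are hypothetical placeholder modules, not components defined by this disclosure):

```python
# Minimal sketch of the three-step method, assuming PyTorch-style callables.
# backbone, bcn and seg_head are hypothetical placeholders.
def segment(image, backbone, bcn, seg_head):
    feature_map = backbone(image)                   # step 1: feature extraction
    first_edge, second_edge = bcn(feature_map)      # step 2: two edge features
                                                    # with differing directions
    fused = first_edge + second_edge + feature_map  # step 3: combine and segment
    return seg_head(fused)                          # image segmentation result
```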
In a possible implementation manner, performing feature extraction on an image to be segmented to obtain a feature map of the image to be segmented includes:
extracting features of an image to be segmented to obtain an original multi-scale feature map; the original multi-scale feature map comprises feature maps of multiple scales;
selecting one feature map with the largest scale from the feature maps with the multiple scales as a bottom-layer single-scale feature map;
the performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature for the image to be segmented includes:
performing sum-value operation on the original multi-scale feature map and the bottom single-scale feature map to obtain a first sum-value feature map;
and performing edge feature extraction on the first sum feature graph to obtain the first edge feature and the second edge feature.
The original multi-scale feature map obtained here can represent features of the target object at each scale, and matching target objects of different sizes to different scales improves segmentation accuracy as much as possible. Extracting the two edge features via a sum operation over the two feature maps also gives stronger feature expression.
In a possible implementation manner, the performing edge feature extraction on the first sum feature map to obtain the first edge feature and the second edge feature includes:
performing cascade operation based on the first sum characteristic diagram and the original multi-scale characteristic diagram to obtain a cascade characteristic diagram;
inputting the cascade characteristic diagram into different convolution layers of the edge compression network for edge characteristic extraction to obtain two characteristic vectors with different directions;
and performing bilinear interpolation operation on the first sum value characteristic graph based on the two characteristic vectors respectively to obtain the first edge characteristic and the second edge characteristic.
Here, the extraction of the two edge features can be realized by bilinear interpolation operations between the two feature vectors with different directions and the first sum feature map. One of the two extracted edge features highlights dilation-related information and the other highlights erosion-related information, so the contour of the target object can be extracted effectively, further improving the accuracy of the subsequent segmentation result.
In a possible implementation manner, the determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature, and a feature map of the image to be segmented includes:
performing sum-value operation on the first edge feature, the second edge feature and the original multi-scale feature map to obtain a second sum-value feature map;
and carrying out image segmentation on the image to be segmented based on the second sum characteristic graph to obtain an image segmentation result.
The image segmentation is realized by combining the two edge features and the original multi-scale feature map, and the determined image segmentation result is more accurate because the edge features can highlight the outline of the target object.
In one possible implementation, the edge feature extraction of the obtained feature map is performed by an edge compression network in a neural network; performing image segmentation on the image to be segmented based on the second sum-value feature map by a main segmentation network in the neural network;
training the neural network as follows:
acquiring a plurality of image samples and original annotation information for each image sample;
performing feature extraction on the image sample to obtain a sample feature map;
and training a neural network based on the extracted sample feature map and the original labeling information of the image sample.
In one possible embodiment, the training the neural network based on the extracted sample feature map and the original labeling information of the image sample includes:
carrying out image segmentation on the extracted sample characteristic graph by using the neural network, outputting an image segmentation result, and comparing the image segmentation result with original labeling information of the image sample;
and adjusting the network parameter value of at least one of the main segmentation network and the edge compression network based on the comparison result.
Here, the network adjustment can be realized based on the comparison result between the image segmentation result output by the main segmentation network and the original annotation information of the image sample, and the operation is simple.
In one possible embodiment, the method further comprises:
determining first object labeling information, second object labeling information and third object labeling information of a target object included in each image sample based on the original labeling information of the image sample; the first object labeling information is used for indicating that the labeling value of the edge pixel points of the target object is different from the labeling values of other non-edge pixel points, the second object labeling information is used for indicating that the labeling value of the corresponding erosion region inside the target object is different from the labeling values of other non-erosion regions, and the third object labeling information is used for indicating that the labeling value of the corresponding dilation region outside the target object is different from the labeling values of other non-dilation regions;
the neural network further comprises a first sub-segmentation network, a second sub-segmentation network and a third sub-segmentation network;
the training the neural network based on the extracted sample feature map and the original labeling information of the image sample comprises:
performing edge feature extraction on the sample feature graph to obtain a first sample edge feature and a second sample edge feature; the direction of the feature vector corresponding to the first sample edge feature is different from that of the feature vector corresponding to the second sample edge feature;
and training the neural network based on the first sample edge feature, the second sample edge feature, and first object labeling information, second object labeling information and third object labeling information of a target object included in the image sample.
In one possible embodiment, the training the neural network based on the first sample edge feature, the second sample edge feature, and first object labeling information, second object labeling information, and third object labeling information of a target object included in the image sample includes:
performing a sum-value operation on the first sample edge feature and the second sample edge feature, inputting the obtained sum-value edge feature into the first sub-segmentation network to obtain a first object segmentation result for the target object output by the first sub-segmentation network, and comparing the first object segmentation result with the first object labeling information of the target object included in the image sample; and
inputting the first sample edge feature into the second sub-segmentation network to obtain a second object segmentation result for the target object output by the second sub-segmentation network, and comparing the second object segmentation result with the second object labeling information of the target object included in the image sample; and
inputting the edge characteristics of a second sample into the third sub-segmentation network to obtain a third object segmentation result output by the third sub-segmentation network and aiming at the target object, and comparing the third object segmentation result with third object labeling information of the target object included in the image sample;
and if any comparison result corresponding to any of the sub-segmentation networks or the main segmentation network is inconsistent, adjusting the network parameter values of at least one of the main segmentation network, the edge compression network and the three sub-segmentation networks until the comparison results are consistent, to obtain a trained neural network.
Here, the network training may be implemented jointly with the sub-segmentation networks. The sub-segmentation networks constrain the edge pixel points of the target object, the corresponding dilation region outside the target object and the corresponding erosion region inside the target object, so they can assist the main segmentation network in accurately segmenting the target object from the image sample, yielding high segmentation accuracy.
In one possible embodiment, the neural network further comprises a target detection network; the training the neural network based on the extracted sample feature map and the original labeling information of the image sample comprises:
carrying out target detection on the extracted sample characteristic graph by using the target detection network, outputting a target detection result, and comparing the target detection result with original labeling information of the image sample;
and adjusting a network parameter value of at least one of the main segmentation network, the edge compression network and the target detection network based on the comparison result.
In a possible implementation manner, the extracting features from the image sample to obtain a sample feature map includes:
extracting sample feature maps of multiple scales from the image sample;
the training the neural network based on the extracted sample feature map and the original labeling information of the image sample comprises:
selecting a sample feature map with the largest scale from the sample feature maps with the plurality of scales; selecting a corresponding sample feature map from the sample feature maps with the multiple scales based on the pre-marked scale of the target object in the image sample;
and training the neural network based on the two selected sample feature maps and the original labeling information of the image sample.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for image segmentation, where the apparatus includes:
the first extraction module is used for extracting the features of the image to be segmented to obtain a feature map of the image to be segmented;
the second extraction module is used for extracting edge features of the obtained feature graph to obtain a first edge feature and a second edge feature of the image to be segmented; the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature;
and the segmentation module is used for determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and the feature map of the image to be segmented.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method of image segmentation according to the first aspect and any of its various embodiments.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the method for image segmentation according to the first aspect and any of its various embodiments.
For the description of the effects of the image segmentation apparatus, the electronic device, and the computer-readable storage medium, reference is made to the description of the image segmentation method, and details are not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings here are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 illustrates a flow chart of a method of image segmentation provided by an embodiment of the present disclosure;
fig. 2(a) shows a flowchart of a specific method for edge feature extraction in the method for image segmentation provided by the embodiment of the present disclosure;
fig. 2(b) shows a flowchart of a specific method for edge feature extraction in the method for image segmentation provided by the embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a specific method of network training in a method of image segmentation provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating an apparatus for image segmentation provided by an embodiment of the present disclosure;
fig. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
It has been found through research that the basic idea of the image segmentation method based on edge detection provided in the related art is to detect edge points existing in an image by using feature inconsistency between regions, and then connect all the points into a line according to a predetermined strategy until a closed region is formed.
However, when the edge is complicated, especially when the image background is complicated, the pixel that originally belongs to the image background is easily divided into the pixel of the target object, and the accuracy of the division is not high.
Based on the research, the present disclosure provides an image segmentation method, an image segmentation apparatus, an electronic device, and a storage medium, so that an image edge is more accurately segmented, and the segmentation accuracy is higher.
To facilitate understanding of the present embodiment, a detailed description is first given of an image segmentation method disclosed in the embodiments of the present disclosure, and an execution subject of the image segmentation method provided in the embodiments of the present disclosure is generally a computer device with certain computing power, where the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the method of image segmentation may be implemented by a processor calling computer readable instructions stored in a memory.
Referring to fig. 1, which is a flowchart of an image segmentation method provided in an embodiment of the present disclosure, the method includes steps S101 to S103, where:
s101: performing feature extraction on an image to be segmented to obtain a feature map of the image to be segmented;
s102: performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature of the image to be segmented; the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature;
s103: and determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and the feature map of the image to be segmented.
To facilitate understanding of the image segmentation method provided by the embodiments of the present disclosure, an application scenario of the method is first described. The method can be applied mainly in the field of computer vision, for example video-application-based security, industrial vision, medical imaging, and intelligent driving vision. In the related art, image segmentation is required to locate the edges of scenes and people accurately and to classify the categories within a scene according to the task requirements.
However, for complex edges, and especially complex image backgrounds, image segmentation schemes in the related art tend to assign pixels that actually belong to the image background to the target object, so segmentation accuracy is not high.
To solve the above problem, the embodiments of the present disclosure provide a scheme that performs image segmentation by combining edge features with image features, revealing the edge contour of the target object from the two perspectives of image erosion and image dilation and thereby improving segmentation accuracy.
The image to be segmented may be an image obtained in the various application scenes, and the image to be segmented obtained in different application scenes may be different, for example, an image related to sky and airplane elements, or an image related to people and lawn.
For the image to be segmented, the embodiments of the present disclosure may extract the feature map with various feature extraction methods: for example, features may be extracted by classical image processing, or the feature map may be extracted with a trained feature extraction network.
Considering that a feature extraction network can mine deeper image features, the embodiments of the present disclosure may use a feature extraction network to extract the feature map. The feature extraction network may be a Convolutional Neural Network (CNN). In a specific application, the feature map can be extracted with a CNN model such as a Feature Pyramid Network (FPN), which efficiently extracts features of each dimension from a picture.
In the embodiments of the present disclosure, two edge features with different vector directions can be extracted by an edge compression network. In practical applications, the two edge features (the first edge feature and the second edge feature) may be extracted by an edge compression network (BCN), and the directions of the corresponding feature vectors may be opposite so as to further enhance the features at the edge position of the target object; for example, the two edge features may be compressed from two directions, from outside to inside and from inside to outside, respectively.
In other words, in the embodiments of the present disclosure the edge features of the image are compressed by learning feature vectors in two different directions through the edge compression network, so that the segmentation implemented in combination with the feature map of the image to be segmented can effectively find the boundary of the target object, that is, its approximate contour, thereby significantly improving segmentation performance.
In the extraction process of the feature map, the embodiment of the disclosure can obtain an original multi-scale feature map based on the FPN extraction, where the original multi-scale feature map includes feature maps of multiple scales. The FPN herein may include a plurality of connected convolutional layers and a pooling layer corresponding to each convolutional layer.
When the image to be segmented is input into the FPN, the convolution features output by the multiple connected convolutional layers can be obtained; then, for each pooling layer, the pooling features output by that pooling layer are determined based on the pooling features output by the previous pooling layer and the convolution features output by the convolutional layer corresponding to that pooling layer; finally, the feature map of the image to be segmented is determined based on the pooling features output by each pooling layer.
The convolutional layers form a bottom-to-top path, which corresponds to condensing the expressed features layer by layer from bottom to top; the lower layers reflect shallower image information features, such as edges, while the higher layers reflect deeper image features, such as object contours or even categories. The pooling layers form a top-to-bottom path, in which each pooling layer processes its information with the higher-level information of the previous layer as input, producing feature maps of various scales (corresponding to the original multi-scale feature map).
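As a non-authoritative sketch of such a multi-scale extractor (assuming PyTorch; the channel counts are illustrative, and the top-down merge is written in the common FPN style rather than literally following the pooling-layer description above):

```python
# Minimal FPN-style multi-scale feature extraction, assuming PyTorch.
# Channel counts and module names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # 1x1 lateral convs align the channel count of each backbone stage
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convs smooth each merged map
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):
        # feats: bottom-up backbone outputs, ordered shallow -> deep
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down path: upsample deeper maps and add them to shallower ones
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]
```

In such a pyramid, the largest-scale (shallowest) output would play the role of the bottom-layer single-scale feature map described below.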
To improve the accuracy of the subsequent edge feature extraction to a certain extent, the feature map with the largest scale may be selected from the feature maps of multiple scales as the bottom-layer single-scale feature map, mainly because the bottom-layer single-scale feature map represents shallower image information features, such as edge features, and is therefore convenient for extracting edge features.
Under the condition of obtaining an original multi-scale feature map and a bottom-layer single-scale feature map, sum-value operation can be firstly carried out, and then the extraction of edge features is realized on the basis of the obtained first sum-value feature map and a trained edge compression network, and the method can be specifically realized through the following steps:
performing cascade operation based on a first sum characteristic diagram and an original multi-scale characteristic diagram to obtain a cascade characteristic diagram;
inputting the cascade characteristic diagram into different convolution layers of an edge compression network to carry out edge characteristic extraction, and obtaining two characteristic vectors in different directions;
and thirdly, performing bilinear interpolation operation on the first sum characteristic graph based on the two characteristic vectors respectively to obtain a first edge characteristic and a second edge characteristic.
Here, the cascade feature map may be obtained based on a cascade operation of the first sum feature map and the original multi-scale feature map, then two feature vectors with different directions are determined by using different convolution layers of the edge compression network, and finally two interpolated features are obtained based on a bilinear interpolation operation and serve as the two edge features.
Here, the cascade operation may be a concatenation along the channel dimension, which is different from the sum operation; the sum operation may be an addition and fusion of the feature values of corresponding dimensions.
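The difference between the two operations can be shown directly (a sketch assuming PyTorch tensors; the shapes are illustrative):

```python
# Sum vs. cascade (concatenation), assuming PyTorch; shapes are illustrative.
import torch

f_roi = torch.randn(1, 256, 14, 14)     # original multi-scale map (one scale)
f_bottom = torch.randn(1, 256, 14, 14)  # bottom-layer single-scale map

f_sum = f_roi + f_bottom                  # sum: element-wise add, (1, 256, 14, 14)
f_cas = torch.cat([f_sum, f_roi], dim=1)  # cascade: channel concat, (1, 512, 14, 14)
```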
To further explain the above edge feature extraction process, the following description refers to fig. 2(a) and fig. 2(b).
As shown in fig. 2(a), a sum operation (the "+" operation) on the original multi-scale feature map F_RoI and the bottom-layer single-scale feature map F_RoI' yields a first sum feature map F_sum, which can then be passed through a 2-layer convolution operation to obtain a convolved first sum feature map F'_sum. For the convolved first sum feature map F'_sum, the extraction of the two edge features (F_con and F_exp) can be implemented with the BCN. Fig. 2(b) illustrates this edge feature extraction process in detail.
As shown in fig. 2(b), the convolved first sum feature map F'_sum and the original multi-scale feature map F_RoI first undergo a cascade operation (the "C" operation); one of the two feature vectors with different directions is then learned through a 3 × 3 convolutional layer, the feature vector having dimension 14 × 14 × 2. A bilinear interpolation operation is then performed on the first sum feature map based on this feature vector, yielding an interpolated feature F_squ that serves as one of the edge features.
Here, after performing sum-value operation on the first edge feature, the second edge feature and the original multi-scale feature map, image segmentation on the image to be segmented can be realized based on the obtained second sum-value feature map, and an image segmentation result is obtained. In a specific application, the image segmentation process can be realized by using a main segmentation network.
In a specific application, the sum operation between the first edge feature and the second edge feature may be performed first to obtain the sum edge feature F_bou; this sum edge feature is then summed with the original multi-scale feature map, and the final sum result is input into the main segmentation network to determine the image segmentation result, as shown in fig. 2(b). The first edge feature and the second edge feature can manifest edge characteristics from a dilation angle and an erosion angle, respectively, which improves the accuracy of the final segmentation result.
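One plausible reading of this edge feature extraction pipeline is sketched below, assuming PyTorch. Interpreting the learned 14 × 14 × 2 feature vector as a per-pixel sampling direction field for bilinear interpolation is an assumption, and BCNBranch is a hypothetical name:

```python
# Speculative sketch of one branch of the edge compression network (BCN),
# assuming PyTorch. The learned 14x14x2 feature vector is read here as a
# per-pixel 2-D direction field that bilinearly resamples the first sum
# feature map; this is one plausible interpretation of the description.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BCNBranch(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # 3x3 conv learns a 2-channel direction field from the cascade map
        self.direction = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, f_sum, f_roi):
        cascade = torch.cat([f_sum, f_roi], dim=1)  # the "C" operation
        field = self.direction(cascade)             # (N, 2, 14, 14)
        n, _, h, w = field.shape
        # base identity grid in [-1, 1] normalized coordinates
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
        # shift the sampling locations along the learned direction field
        grid = base + field.permute(0, 2, 3, 1)
        # bilinear interpolation of the first sum feature map
        return F.grid_sample(f_sum, grid, mode="bilinear", align_corners=True)
```

Two such branches with direction fields learned by different convolutional layers would yield F_con and F_exp, respectively.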
In consideration of the key role of the training process of each network on the image segmentation method provided by the embodiment of the present disclosure, the following description may focus on the training process of each network.
Here, the edge compression network and the main segmentation network may be used as sub-networks of the neural network, and training of the neural network may be performed together.
The training process of the neural network can be realized by the following steps:
the method comprises the following steps of firstly, obtaining a plurality of image samples and original marking information for each image sample;
secondly, extracting the characteristics of the image sample to obtain a sample characteristic diagram;
and step three, training the neural network based on the extracted sample characteristic diagram and the original labeling information of the image sample.
Here, the image samples with the original annotation information may be obtained first, and then feature extraction may be performed on each image sample to obtain a sample feature map. Therefore, the trained main segmentation network and the edge compression network can be obtained by performing network training on the extracted sample feature map and the original labeling information aiming at the image sample.
The image samples may be images obtained from the specific application scene; for details, refer to the operation of obtaining the image to be segmented, which is not repeated here.
In the embodiments of the present disclosure, the sample feature maps may likewise be extracted with a feature extraction network; in the same way, sample feature maps of multiple scales can be extracted from an image sample. During neural network training, the embodiments of the present disclosure can implement the extraction of the sample feature maps in combination with a region feature aggregation mode.
The region feature aggregation mode may be ROI Align. As an improvement over ROI Pooling, ROI Align removes the quantization operation and uses bilinear interpolation to obtain image feature values at pixel points whose coordinates are floating-point numbers, turning the whole feature aggregation process into a continuous operation. Here, a sample feature map of the corresponding scale may be selected from the sample feature maps of multiple scales based on the pre-labeled scale of the target object, and training of the neural network may then be implemented in combination with the sample feature map of the largest scale.
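A small usage sketch of this aggregation step, assuming torchvision's roi_align (the box coordinates are illustrative):

```python
# Region feature aggregation with ROI Align, assuming torchvision.
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 50, 50)          # one scale of the sample feature map
boxes = [torch.tensor([[4.0, 4.0, 36.0, 36.0]])]   # one RoI per image, (x1, y1, x2, y2)

# bilinear sampling at floating-point coordinates -> fixed 14x14 output
pooled = roi_align(feature_map, boxes, output_size=(14, 14),
                   spatial_scale=1.0, sampling_ratio=2, aligned=True)
print(pooled.shape)  # torch.Size([1, 256, 14, 14])
```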
In the embodiment of the present disclosure, the network training process related to the main segmentation network and the edge compression network may be implemented by the following steps:
firstly, carrying out image segmentation on the extracted sample characteristic graph by using a neural network, outputting an image segmentation result, and comparing the image segmentation result with original annotation information of an image sample;
and secondly, adjusting a network parameter value of at least one of the main segmentation network and the edge compression network based on the comparison result.
The network parameter values may be adjusted based on a comparison between the image segmentation results output by the primary segmentation network and the original annotation information for the image sample.
In the embodiments of the present disclosure, considering the key influence of the two edge features on the accuracy of the segmentation result, the influence of the edge features may be enhanced through different supervision signals. While training the main segmentation network and the edge compression network, three sub-segmentation networks in the neural network can be trained synchronously, and the joint action of the three sub-segmentation networks further improves the segmentation accuracy of the main segmentation network.
Before performing the joint training of the three sub-segmentation networks, the first object labeling information, the second object labeling information, and the third object labeling information of the target object included in the image sample may be determined based on the original labeling information of each image sample.
The first object labeling information is used for indicating that the labeling value of the edge pixel points of the target object is different from the labeling values of other non-edge pixel points, the second object labeling information is used for indicating that the labeling value of the corresponding erosion region inside the target object is different from the labeling values of other non-erosion regions, and the third object labeling information is used for indicating that the labeling value of the corresponding dilation region outside the target object is different from the labeling values of other non-dilation regions.
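One way such labeling information could be derived from an ordinary binary object mask is sketched below, assuming OpenCV; the kernel size and the exact band definitions are assumptions, not specified by this disclosure:

```python
# Hedged sketch: deriving the three kinds of labeling information from a
# binary object mask via morphological erosion and dilation, assuming OpenCV.
import cv2
import numpy as np

def make_edge_labels(mask: np.ndarray, ksize: int = 5):
    """mask: uint8 binary map, 1 inside the target object, 0 elsewhere."""
    kernel = np.ones((ksize, ksize), np.uint8)
    eroded = cv2.erode(mask, kernel)    # shrinks the object inward
    dilated = cv2.dilate(mask, kernel)  # grows the object outward
    # first labeling info: edge pixels vs. non-edge pixels
    edge = dilated - eroded
    # second labeling info: erosion region inside the object (inner band)
    erosion_band = mask - eroded
    # third labeling info: dilation region outside the object (outer band)
    dilation_band = dilated - mask
    return edge, erosion_band, dilation_band
```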
Training of the neural network comprising the various sub-networks can then be performed as follows:
step one, performing edge feature extraction on a sample feature graph to obtain a first sample edge feature and a second sample edge feature; the direction of the feature vector corresponding to the first sample edge feature is different from that of the feature vector corresponding to the second sample edge feature;
and secondly, training the neural network based on the first sample edge feature, the second sample edge feature, and the first object labeling information, the second object labeling information and the third object labeling information of the target object included in the image sample.
Here, for the process of extracting the edge feature for the sample feature map, reference may be made to the above-mentioned process of extracting the edge feature for the feature map, and details are not described here again.
In the embodiment of the present disclosure, training of each network may be implemented based on two extracted sample edge features and three object labeling information according to the following steps:
performing a sum-value operation on the first sample edge feature and the second sample edge feature, inputting the obtained sum-value edge feature into the first sub-segmentation network to obtain a first object segmentation result for the target object output by the first sub-segmentation network, and comparing the first object segmentation result with the first object labeling information of the target object in the image sample; inputting the first sample edge feature into the second sub-segmentation network to obtain a second object segmentation result for the target object output by the second sub-segmentation network, and comparing the second object segmentation result with the second object labeling information of the target object included in the image sample; and inputting the second sample edge feature into the third sub-segmentation network to obtain a third object segmentation result for the target object output by the third sub-segmentation network, and comparing the third object segmentation result with the third object labeling information of the target object included in the image sample;
and step two, if any comparison result corresponding to any of the sub-segmentation networks or the main segmentation network is inconsistent, adjusting the network parameter values of at least one of the main segmentation network, the edge compression network and the three sub-segmentation networks until the comparison results are consistent, obtaining the trained neural network.
Here, the constraints for the different sub-segmentation networks may be implemented with the different object labeling information. The first sub-segmentation network can be constrained through the comparison result between the first object segmentation result it outputs for the sum edge feature (the sum of the first sample edge feature and the second sample edge feature) and the first object labeling information highlighting the edge pixel points; the second sub-segmentation network can be constrained through the comparison result between the second object segmentation result it outputs for the first sample edge feature and the second object labeling information highlighting the corresponding erosion region inside the target object; and the third sub-segmentation network can be constrained through the comparison result between the third object segmentation result it outputs for the second sample edge feature and the third object labeling information highlighting the corresponding dilation region outside the target object.
Here, no matter which of the sub-segmentation networks or the main segmentation network corresponds to the inconsistent comparison result, each network can be adjusted, and a trained neural network is finally obtained.
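The three constraints can be sketched as follows, assuming PyTorch; the head names and the cross-entropy comparison are assumptions, since the disclosure only states that each segmentation result is compared with its labeling information:

```python
# Sketch of the three sub-segmentation constraints, assuming PyTorch.
# Head names and the cross-entropy choice are assumptions; the disclosure
# only states that outputs are compared with the labeling information.
import torch.nn.functional as F

def sub_segmentation_losses(first_edge, second_edge, heads, labels):
    """first_edge, second_edge: the two sample edge features (N, C, H, W).
    heads: dict with the three sub-segmentation networks.
    labels: dict with the first/second/third object labeling information,
    given as integer class maps of shape (N, H, W)."""
    sum_edge = first_edge + second_edge               # sum-value edge feature
    l_bou = F.cross_entropy(heads["first"](sum_edge), labels["edge"])
    l_con = F.cross_entropy(heads["second"](first_edge), labels["erosion"])
    l_exp = F.cross_entropy(heads["third"](second_edge), labels["dilation"])
    return l_bou, l_con, l_exp
```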
Under the condition that the image to be segmented is input into a trained edge compression network, two edge features can be obtained. The main segmentation network can be used for combining the edge features and the feature graph of the image to be segmented to realize rapid and accurate image segmentation.
In the embodiment of the disclosure, the neural network further includes a target detection network, so that in the process of network training, the target detection network can be used to perform target detection on the extracted sample feature map, output a target detection result, compare the target detection result with the original labeling information of the image sample, and adjust a network parameter value of at least one of the main segmentation network, the edge compression network and the target detection network based on the comparison result.
It should be noted that in the image segmentation method provided by the embodiments of the present disclosure, training the neural network may mean training a plurality of sub-networks jointly; which sub-network or sub-networks are adjusted, and details such as the specific number of network parameter values to be adjusted, may be determined by the specific training requirements and are not limited here.
To facilitate a further understanding of the above network training process, further description is provided below in conjunction with fig. 3.
As shown in fig. 3, after the sample feature map passes through the feature extraction network and the RoIAlign operation, the original multi-scale feature map F_RoI of the sample feature map can be obtained; meanwhile, based on the RoIAlign operation and a Conv operation, the bottom-layer single-scale feature map F_RoI' can be obtained. After the edge compression network, the two sample edge features (F_con and F_exp) can be obtained.
After summing the two sample edge features, the sum edge feature F_bou can be obtained. The sum edge feature and the original multi-scale feature map F_RoI are input together into the main segmentation network, which determines a first loss function value Lseg (corresponding to the image segmentation result output by the main segmentation network). At the same time, the sum edge feature F_bou and the two sample edge features F_con and F_exp are input into the three corresponding sub-segmentation networks (the first, second and third sub-segmentation networks), determining three further loss function values (a second loss function value Lbou, a third loss function value Lcon and a fourth loss function value Lexp). Network training is carried out by combining the four loss function values, so that each trained network is obtained.
In the embodiment of the disclosure, the target detection task can be simultaneously performed while the image segmentation task is performed. The target detection can be performed on the feature map based on the trained target detection network, and the detection information of the target object in the image to be segmented is determined. The detection information may be embodied in the form of a target frame, so that the information such as the position and size of the target object in the image to be segmented can be determined.
The target detection network may be trained synchronously while the main segmentation network and the edge compression network are trained. As shown in fig. 3, a prediction box of the target object may be determined and compared with the ground-truth box (corresponding to a fifth loss function value Ldet), thereby implementing the training of the target detection network.
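Combining the loss function values named above might look like this (a sketch; the equal weighting is an assumption, as the disclosure does not specify how the terms are balanced):

```python
# Sketch of combining the five loss function values; unit weights are an
# assumption, since the disclosure does not specify how terms are balanced.
def total_loss(l_seg, l_bou, l_con, l_exp, l_det=None):
    loss = l_seg + l_bou + l_con + l_exp
    if l_det is not None:  # target detection head trained jointly
        loss = loss + l_det
    return loss
```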
It should be noted that the method for image segmentation provided by the embodiment of the present disclosure may be directly applied to application scenarios of some segmentation tasks, such as portrait segmentation and matting. In addition, the edge compression network in the embodiment of the present disclosure may be applied to any semantic segmentation and instance segmentation model separately, so as to further improve the segmentation effect.
It will be understood by those skilled in the art that in the above method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible internal logic.
Based on the same inventive concept, an image segmentation apparatus corresponding to the image segmentation method is also provided in the embodiments of the present disclosure, and because the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the method of the embodiments of the present disclosure for image segmentation, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 4, a schematic diagram of an image segmentation apparatus provided in an embodiment of the present disclosure is shown. The apparatus includes a first extraction module 401, a second extraction module 402 and a segmentation module 403, wherein:
the first extraction module 401 is configured to perform feature extraction on an image to be segmented to obtain a feature map of the image to be segmented;
a second extraction module 402, configured to perform edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature for the image to be segmented; the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature;
and a segmentation module 403, configured to determine an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature, and a feature map of the image to be segmented.
With this image segmentation apparatus, once the image to be segmented is acquired, feature extraction can be performed on it, and edge feature extraction can then be performed on the extracted feature map to obtain a first edge feature and a second edge feature whose feature vector directions differ. Because segmentation combines the two edge features (the first edge feature and the second edge feature) with the feature map of the image to be segmented, the edge contour of the target object can be readily revealed in the image to be segmented, which improves the accuracy of the image segmentation.
In a possible implementation manner, the first extraction module 401 is configured to perform feature extraction on an image to be segmented according to the following steps to obtain a feature map of the image to be segmented:
extracting features of an image to be segmented to obtain an original multi-scale feature map; the original multi-scale feature map comprises feature maps of multiple scales;
selecting one feature map with the largest scale from the feature maps with multiple scales as a bottom-layer single-scale feature map;
a second extraction module 402, configured to perform edge feature extraction on the obtained feature map according to the following steps, so as to obtain a first edge feature and a second edge feature for the image to be segmented:
performing sum-value operation on the original multi-scale feature map and the bottom single-scale feature map to obtain a first sum-value feature map;
and performing edge feature extraction on the first sum feature graph to obtain a first edge feature and a second edge feature.
In a possible implementation manner, the second extraction module 402 is configured to perform edge feature extraction on the first sum feature map according to the following steps to obtain a first edge feature and a second edge feature:
performing cascade operation based on the first sum characteristic diagram and the original multi-scale characteristic diagram to obtain a cascade characteristic diagram;
inputting the cascade characteristic diagram into different convolution layers of an edge compression network to carry out edge characteristic extraction, and obtaining two characteristic vectors in different directions;
and performing bilinear interpolation operation on the first sum characteristic graph based on the two characteristic vectors respectively to obtain a first edge characteristic and a second edge characteristic.
In a possible implementation, the segmentation module 403 is configured to determine an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature, and the feature map of the image to be segmented according to the following steps:
performing sum-value operation on the first edge feature, the second edge feature and the original multi-scale feature map to obtain a second sum-value feature map;
and carrying out image segmentation on the image to be segmented based on the second sum characteristic graph to obtain an image segmentation result.
In one possible implementation, the edge feature extraction of the obtained feature map is performed by an edge compression network in a neural network; performing image segmentation on the image to be segmented based on the second sum-value feature map by a main segmentation network in a neural network; also included is a training module 404;
a training module 404 for training the neural network according to the following steps:
acquiring a plurality of image samples and original annotation information aiming at each image sample; carrying out feature extraction on the image sample to obtain a sample feature map; and training the neural network based on the extracted sample characteristic diagram and the original labeling information of the image sample.
In one possible implementation, the training module 404 is configured to train the neural network based on the extracted sample feature map and the original labeling information of the image sample according to the following steps:
carrying out image segmentation on the extracted sample characteristic graph by using a neural network, outputting an image segmentation result, and comparing the image segmentation result with original labeling information of an image sample;
and adjusting the network parameter value of at least one of the main segmentation network and the edge compression network based on the comparison result.
In one possible embodiment, the neural network further comprises a first sub-segmentation network, a second sub-segmentation network and a third sub-segmentation network; the training module 404 is configured to train the neural network based on the extracted sample feature map and the original labeling information of the image sample according to the following steps:
determining first object labeling information, second object labeling information and third object labeling information of a target object in the image sample based on the original labeling information of each image sample; the first object labeling information indicates that the label value of the edge pixels of the target object differs from the label values of other, non-edge pixels; the second object labeling information indicates that the label value of the corresponding erosion region inside the target object differs from the label values of other, non-erosion regions; and the third object labeling information indicates that the label value of the corresponding dilation region outside the target object differs from the label values of other, non-dilation regions (see the morphological sketch after this list); and
performing edge feature extraction on the sample feature map to obtain a first sample edge feature and a second sample edge feature; the direction of the feature vector corresponding to the first sample edge feature is different from that of the feature vector corresponding to the second sample edge feature;
and training the neural network based on the first sample edge feature, the second sample edge feature, and the first object labeling information, the second object labeling information and the third object labeling information of the target object included in the image sample.
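The three kinds of labeling information can be illustrated with a morphological sketch: starting from a binary mask of the target object, erosion and dilation (here via scipy.ndimage) yield the edge, erosion-region and dilation-region label maps. The structuring-element size (iterations) is an assumption:

import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def make_label_maps(mask, iterations=3):
    # mask: boolean (H, W) array, True inside the target object.
    eroded = binary_erosion(mask, iterations=iterations)
    dilated = binary_dilation(mask, iterations=iterations)
    first = (mask & ~eroded).astype(np.uint8)   # edge pixels vs. the rest
    second = eroded.astype(np.uint8)            # erosion region inside object
    third = (dilated & ~mask).astype(np.uint8)  # dilation ring outside object
    return first, second, third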
In one possible implementation, the training module 404 is configured to train the neural network based on the first sample edge feature, the second sample edge feature, the first object labeling information, the second object labeling information, and the third object labeling information of the target object included in the image sample, according to the following steps:
performing a sum operation on the first sample edge feature and the second sample edge feature, inputting the resulting sum edge feature into the first sub-segmentation network to obtain a first object segmentation result for the target object output by the first sub-segmentation network, and comparing the first object segmentation result with the first object labeling information of the target object in the image sample; and
inputting the first sample edge feature into the second sub-segmentation network to obtain a second object segmentation result for the target object output by the second sub-segmentation network, and comparing the second object segmentation result with the second object labeling information of the target object included in the image sample; and
inputting the second sample edge feature into the third sub-segmentation network to obtain a third object segmentation result for the target object output by the third sub-segmentation network, and comparing the third object segmentation result with the third object labeling information of the target object in the image sample;
and if any of the comparison results corresponding to the sub-segmentation networks and the main segmentation network is inconsistent, adjusting the network parameter values of at least one of the main segmentation network, the edge compression network and the three sub-segmentation networks until the comparison results are consistent, thereby obtaining the trained neural network.
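A hedged sketch of the three auxiliary comparisons follows: each sub-segmentation network receives its designated edge feature(s) and is compared against the matching label map; the losses are summed so one backward pass can adjust any subset of the networks. Equal loss weights and the cross-entropy comparison are assumptions:

import torch.nn.functional as F

def auxiliary_losses(sub1, sub2, sub3, first_edge, second_edge,
                     first_lbl, second_lbl, third_lbl):
    sum_edge = first_edge + second_edge          # input to the first branch
    loss1 = F.cross_entropy(sub1(sum_edge), first_lbl)     # edge labels
    loss2 = F.cross_entropy(sub2(first_edge), second_lbl)  # erosion labels
    loss3 = F.cross_entropy(sub3(second_edge), third_lbl)  # dilation labels
    return loss1 + loss2 + loss3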
In one possible embodiment, the neural network further comprises a target detection network; a training module 404, configured to train a neural network based on the extracted sample feature map and the original labeling information of the image sample according to the following steps:
performing target detection on the extracted sample feature map using the target detection network, outputting a target detection result, and comparing the target detection result with the original labeling information of the image sample;
and adjusting the network parameter value of at least one of the main segmentation network, the edge compression network and the target detection network based on the comparison result.
In one possible implementation, the training module 404 is configured to train the neural network based on the extracted sample feature map and the original labeling information of the image sample according to the following steps:
extracting sample feature maps of multiple scales from the image sample;
selecting the sample feature map with the largest scale from the sample feature maps of the multiple scales, and selecting a corresponding sample feature map from the sample feature maps of the multiple scales based on the pre-labeled scale of the target object in the image sample;
and training the neural network based on the two selected sample feature maps and the original labeling information of the image sample.
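One plausible reading of this selection is sketched below: the highest-resolution map is always kept, and a second map is chosen whose stride best matches the pre-labeled object size. The stride-to-object-size matching factor is an assumption, not taken from the disclosure:

def select_feature_maps(feats_by_stride, object_size_px):
    # feats_by_stride: dict stride -> feature map, e.g. {4: f4, 8: f8, 16: f16}.
    strides = sorted(feats_by_stride)
    largest = feats_by_stride[strides[0]]  # smallest stride = largest scale
    # Pick the stride whose typical object scale (stride * 8, an assumed
    # anchor factor) is closest to the labeled object size.
    best = min(strides, key=lambda s: abs(s * 8 - object_size_px))
    return largest, feats_by_stride[best]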
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides an electronic device. As shown in fig. 5, a schematic structural diagram of the electronic device provided in the embodiment of the present disclosure, the electronic device includes: a processor 501, a memory 502, and a bus 503. The memory 502 stores machine-readable instructions executable by the processor 501 (for example, the execution instructions corresponding to the first extraction module 401, the second extraction module 402 and the segmentation module 403 in the apparatus in fig. 4). When the electronic device runs, the processor 501 and the memory 502 communicate via the bus 503, and when the machine-readable instructions are executed by the processor 501, the following processing is performed:
performing feature extraction on an image to be segmented to obtain a feature map of the image to be segmented;
performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature of the image to be segmented; the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature;
and determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and the feature map of the image to be segmented.
The embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the image segmentation method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure further provide a computer program product carrying a program code; the instructions included in the program code may be used to execute the steps of the image segmentation method described in the foregoing method embodiments, to which reference may be made for details; they are not repeated here.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a software development kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative; for example, the division of the units is only one logical division, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through communication interfaces, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the scope of protection of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes to them, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall all be covered by it. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.

Claims (13)

1. A method of image segmentation, the method comprising:
performing feature extraction on an image to be segmented to obtain a feature map of the image to be segmented;
performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature for the image to be segmented; the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature;
and determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and the feature map of the image to be segmented.
2. The method according to claim 1, wherein the extracting features of the image to be segmented to obtain the feature map of the image to be segmented comprises:
extracting features of an image to be segmented to obtain an original multi-scale feature map; the original multi-scale feature map comprises feature maps of multiple scales;
selecting the feature map with the largest scale from the feature maps of the multiple scales as the bottom-layer single-scale feature map;
the performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature for the image to be segmented includes:
performing a sum operation on the original multi-scale feature map and the bottom single-scale feature map to obtain a first sum feature map;
and performing edge feature extraction on the first sum feature map to obtain the first edge feature and the second edge feature.
3. The method of claim 2, wherein the performing edge feature extraction on the first sum feature map to obtain the first edge feature and the second edge feature comprises:
performing a cascade (concatenation) operation on the first sum feature map and the original multi-scale feature map to obtain a cascade feature map;
inputting the cascade feature map into different convolutional layers of the edge compression network for edge feature extraction, to obtain two feature vectors in different directions;
and performing a bilinear interpolation operation on the first sum feature map based on each of the two feature vectors, to obtain the first edge feature and the second edge feature.
4. The method according to claim 2 or 3, wherein the determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and a feature map of the image to be segmented comprises:
performing a sum operation on the first edge feature, the second edge feature and the original multi-scale feature map to obtain a second sum feature map;
and performing image segmentation on the image to be segmented based on the second sum feature map to obtain the image segmentation result.
5. The method of claim 4, wherein the edge feature extraction on the obtained feature map is performed by an edge compression network in a neural network, and the image segmentation of the image to be segmented based on the second sum feature map is performed by a main segmentation network in the neural network;
training the neural network as follows:
acquiring a plurality of image samples and original labeling information for each image sample;
performing feature extraction on the image sample to obtain a sample feature map;
and training the neural network based on the extracted sample feature map and the original labeling information of the image sample.
6. The method of claim 5, wherein the training the neural network based on the extracted sample feature map and the original labeling information of the image sample comprises:
performing image segmentation on the extracted sample feature map using the neural network, outputting an image segmentation result, and comparing the image segmentation result with the original labeling information of the image sample;
and adjusting the network parameter value of at least one of the main segmentation network and the edge compression network based on the comparison result.
7. The method of claim 6, further comprising:
determining first object labeling information, second object labeling information and third object labeling information of a target object included in each image sample based on the original labeling information of the image sample; wherein the first object labeling information indicates that the label value of the edge pixels of the target object differs from the label values of other, non-edge pixels; the second object labeling information indicates that the label value of the corresponding erosion region inside the target object differs from the label values of other, non-erosion regions; and the third object labeling information indicates that the label value of the corresponding dilation region outside the target object differs from the label values of other, non-dilation regions;
the neural network further comprises a first sub-segmentation network, a second sub-segmentation network and a third sub-segmentation network;
the training the neural network based on the extracted sample feature map and the original labeling information of the image sample comprises:
performing edge feature extraction on the sample feature map to obtain a first sample edge feature and a second sample edge feature; the direction of the feature vector corresponding to the first sample edge feature is different from that of the feature vector corresponding to the second sample edge feature;
and training the neural network based on the first sample edge feature, the second sample edge feature, and first object labeling information, second object labeling information and third object labeling information of a target object included in the image sample.
8. The method of claim 7, wherein the training the neural network based on the first sample edge feature, the second sample edge feature, first object labeling information, second object labeling information, and third object labeling information of a target object included in the image sample comprises:
performing a sum operation on the first sample edge feature and the second sample edge feature, inputting the resulting sum edge feature into the first sub-segmentation network to obtain a first object segmentation result for the target object output by the first sub-segmentation network, and comparing the first object segmentation result with the first object labeling information of the target object included in the image sample; and
inputting the first sample edge feature into the second sub-segmentation network to obtain a second object segmentation result for the target object output by the second sub-segmentation network, and comparing the second object segmentation result with the second object labeling information of the target object included in the image sample; and
inputting the second sample edge feature into the third sub-segmentation network to obtain a third object segmentation result for the target object output by the third sub-segmentation network, and comparing the third object segmentation result with the third object labeling information of the target object included in the image sample;
and if any of the comparison results corresponding to the sub-segmentation networks and the main segmentation network is inconsistent, adjusting network parameter values of at least one of the main segmentation network, the edge compression network and the three sub-segmentation networks until the comparison results are consistent, to obtain a trained neural network.
9. The method of any one of claims 6 to 8, wherein the neural network further comprises a target detection network; the training the neural network based on the extracted sample feature map and the original labeling information of the image sample comprises:
carrying out target detection on the extracted sample characteristic graph by using the target detection network, outputting a target detection result, and comparing the target detection result with original labeling information of the image sample;
and adjusting a network parameter value of at least one of the main segmentation network, the edge compression network and the target detection network based on the comparison result.
10. The method according to any one of claims 5 to 9, wherein the performing feature extraction on the image sample to obtain a sample feature map comprises:
extracting sample feature maps of multiple scales from the image sample;
the training the neural network based on the extracted sample feature map and the original labeling information of the image sample comprises:
selecting the sample feature map with the largest scale from the sample feature maps of the multiple scales; and selecting a corresponding sample feature map from the sample feature maps of the multiple scales based on the pre-labeled scale of the target object in the image sample;
and training the neural network based on the two selected sample feature maps and the original labeling information of the image sample.
11. An apparatus for image segmentation, the apparatus comprising:
the first extraction module is used for extracting the features of the image to be segmented to obtain a feature map of the image to be segmented;
the second extraction module is used for performing edge feature extraction on the obtained feature map to obtain a first edge feature and a second edge feature for the image to be segmented; the direction of the feature vector corresponding to the first edge feature is different from that of the feature vector corresponding to the second edge feature;
and the segmentation module is used for determining an image segmentation result of the image to be segmented based on the first edge feature, the second edge feature and the feature map of the image to be segmented.
12. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the method of image segmentation according to any one of claims 1 to 10.
13. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of image segmentation according to any one of claims 1 to 10.
CN202110570239.2A 2021-05-25 2021-05-25 Image segmentation method and device, electronic equipment and storage medium Pending CN113192060A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110570239.2A | 2021-05-25 | 2021-05-25 | Image segmentation method and device, electronic equipment and storage medium (published as CN113192060A)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110570239.2A | 2021-05-25 | 2021-05-25 | Image segmentation method and device, electronic equipment and storage medium (published as CN113192060A)

Publications (1)

Publication Number | Publication Date
CN113192060A | 2021-07-30

Family

ID=76985159

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110570239.2A | Image segmentation method and device, electronic equipment and storage medium (CN113192060A, pending) | 2021-05-25 | 2021-05-25

Country Status (1)

Country Link
CN (1) CN113192060A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120275664A1 (en) * 2011-04-28 2012-11-01 James Russell Bergen Method of pupil segmentation
CN103065136A (en) * 2013-01-29 2013-04-24 中国电子科技集团公司第二十八研究所 Method for recognizing collaborative target in SAR (Synthetic Aperture Radar) image based on visual attention mechanism
CN109472799A (en) * 2018-10-09 2019-03-15 清华大学 Image partition method and device based on deep learning
CN110517278A (en) * 2019-08-07 2019-11-29 北京旷视科技有限公司 Image segmentation and the training method of image segmentation network, device and computer equipment
CN111445493A (en) * 2020-03-27 2020-07-24 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN112150493A (en) * 2020-09-22 2020-12-29 重庆邮电大学 Semantic guidance-based screen area detection method in natural scene
CN112580642A (en) * 2020-11-27 2021-03-30 国网上海市电力公司 Image hybrid segmentation method and device combining edge features and texture features

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGYUN XU ET AL.: "Applying morphology to improve Canny operator's image segmentation method", 7TH INTERNATIONAL SYMPOSIUM ON TEST AUTOMATION AND INSTRUMENTATION (ISTAI 2018), 16 December 2019 (2019-12-16) *
YAO LIJUAN: "Research on morphology-based building segmentation methods for remote sensing images", China Master's Theses Full-text Database, Basic Sciences, vol. 2020, no. 04, 15 April 2020 (2020-04-15), pages 008-138 *
XU SHENGJUN; OUYANG PUYAN; GUO XUEYUAN; KHAN TAHA MUTHAR: "Building segmentation in remote sensing images based on a multi-scale feature fusion model", Computer Measurement & Control, no. 07, 25 July 2020 (2020-07-25) *

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination