CN113420827A - Semantic segmentation network training and image semantic segmentation method, device and equipment - Google Patents

Semantic segmentation network training and image semantic segmentation method, device and equipment

Info

Publication number
CN113420827A
CN113420827A (application CN202110771852.0A)
Authority
CN
China
Prior art keywords
segmentation
network
sample
image
sample image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110771852.0A
Other languages
Chinese (zh)
Inventor
杨昀欣
万建伟
贺凯
孙科
余非
裴卫民
冯文亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202110771852.0A
Publication of CN113420827A
Legal status: Pending

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F 18/00 Pattern recognition › G06F 18/20 Analysing › G06F 18/24 Classification techniques › G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/25 Fusion techniques › G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N 3/00 Computing arrangements based on biological models › G06N 3/02 Neural networks › G06N 3/04 Architecture, e.g. interconnection topology › G06N 3/045 Combinations of networks
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T 7/00 Image analysis › G06T 7/10 Segmentation; Edge detection › G06T 7/11 Region-based segmentation
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement › G06T 2207/20 Special algorithmic details › G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention disclose a semantic segmentation network training method, an image semantic segmentation method, and corresponding devices and equipment. The semantic segmentation network training method comprises the following steps: inputting a sample image into a classification network for feature extraction, and determining a seed region in the sample image according to the extracted classification sample feature map; inputting the sample image into a segmentation network for feature extraction, and determining the initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and the label corresponding to the sample image; and, taking the seed region as supervision information, training the segmentation network based on the initial segmentation result with a seed region growing model and a conditional random field model to obtain the trained semantic segmentation network. By training the segmentation network with the prior class information of the sample images, the technical scheme of the embodiments alleviates the problem of noisy class outputs from the segmentation network and improves the image semantic segmentation effect.

Description

Semantic segmentation network training and image semantic segmentation method, device and equipment
Technical Field
The embodiment of the invention relates to a computer vision technology, in particular to a semantic segmentation network training and image semantic segmentation method, device and equipment.
Background
As a key task in the field of computer vision, image semantic segmentation has become a research focus in recent years, and is widely applied in medical image diagnosis, automatic driving, geographic information annotation of satellite images, robotic semantic Simultaneous Localization and Mapping (SLAM), and other fields.
Today, a large number of semantic segmentation schemes are built around fully supervised convolutional neural networks. Such schemes require manual pixel-level annotation of sample images, which consumes a large amount of labor cost, so image semantic segmentation based on weakly supervised learning has become an important research direction.
The seed region growing method is a common approach in image semantic segmentation based on weakly supervised learning. In it, some seed point localization method is used to obtain an initial seed region, and a segmentation mask is then expanded from that seed region, gradually iterating to convergence along with the model to form the final segmentation result. A region-growing-based semantic segmentation method depends on the quality of the seed region and on the region growing condition, so obtaining a better seed region and optimizing the region growing condition are both very important for improving the semantic segmentation effect.
Disclosure of Invention
The embodiments of the invention provide a semantic segmentation network training and image semantic segmentation method, device, equipment and medium.
In a first aspect, an embodiment of the present invention provides a semantic segmentation network training method, where the method includes:
inputting a sample image into a classification network for feature extraction, and determining a seed region in the sample image according to an extracted classification sample feature map;
inputting the sample image into a segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and a label corresponding to the sample image;
and training the segmentation network by taking the seed region as supervision information and adopting a seed region growth model and a conditional random field model based on the initial segmentation result to obtain the trained semantic segmentation network.
In a second aspect, an embodiment of the present invention provides an image semantic segmentation method, where the method includes:
taking an image to be segmented as input of a semantic segmentation network, wherein the semantic segmentation network is trained by adopting a semantic segmentation network training method provided by any embodiment of the invention in advance;
and determining the segmentation result of the image to be segmented according to the output result of the semantic segmentation network.
In a third aspect, an embodiment of the present invention further provides a semantic segmentation network training apparatus, where the apparatus includes:
the seed region determining module is used for inputting the sample image into a classification network for feature extraction and determining a seed region in the sample image according to the extracted classification sample feature map;
the initial segmentation result determining module is used for inputting the sample image into the segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and the label corresponding to the sample image;
and the segmentation network training module is used for training a segmentation network by taking the seed region as supervision information and adopting a seed region growth model and a conditional random field model based on the initial segmentation result to obtain a trained semantic segmentation network.
In a fourth aspect, an embodiment of the present invention provides an image semantic segmentation apparatus, where the apparatus includes:
the system comprises an image to be segmented input module, a semantic segmentation network training module and a semantic segmentation network output module, wherein the image to be segmented input module is used for inputting an image to be segmented as a semantic segmentation network, and the semantic segmentation network is trained by adopting a semantic segmentation network training method provided by any embodiment of the invention in advance;
and the segmentation result determining module is used for determining the segmentation result of the image to be segmented according to the output result of the semantic segmentation network.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the semantic segmentation network training method or the image semantic segmentation method provided by any embodiment of the invention.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the semantic segmentation network training method or the image semantic segmentation method provided in any embodiment of the present invention.
According to the technical scheme, the sample image is input into a classification network for feature extraction, and a seed region in the sample image is determined according to the extracted classification sample feature map. The sample image is also input into a segmentation network for feature extraction, and the initial segmentation result output by the segmentation network is determined according to the extracted segmentation sample feature map and the label corresponding to the sample image. Finally, with the seed region as supervision information, the segmentation network is trained on the initial segmentation result using a seed region growing model and a conditional random field model, yielding the trained semantic segmentation network. Training the segmentation network with the prior class information of the sample images alleviates the problem of noisy class outputs from the segmentation network and improves the image semantic segmentation effect.
Drawings
FIG. 1a is a flowchart of a semantic segmentation network training method according to a first embodiment of the present invention;
FIG. 1b is a schematic diagram of semantic segmentation network training according to an embodiment of the present invention;
FIG. 2a is a flowchart of a semantic segmentation network training method according to a second embodiment of the present invention;
FIG. 2b is a schematic structural diagram of a non-local module according to a second embodiment of the present invention;
FIG. 3a is a flowchart of a semantic segmentation network training method according to a third embodiment of the present invention;
FIG. 3b is a schematic diagram of a classification network structure according to a third embodiment of the present invention;
FIG. 4a is a flowchart of a semantic segmentation method according to a fourth embodiment of the present invention;
FIG. 4b is a diagram illustrating semantic segmentation of an image according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a semantic segmentation network training apparatus according to a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an image semantic segmentation apparatus according to a sixth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an apparatus according to a seventh embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of a semantic segmentation network training method in an embodiment of the present invention. The technical solution of this embodiment is suitable for training a segmentation network with the prior class information of sample images. The method may be executed by a semantic segmentation network training apparatus, which may be implemented in software and/or hardware and integrated into various general-purpose computer devices. The semantic segmentation network training method in this embodiment specifically comprises the following steps:
and step 110, inputting the sample image into a classification network for feature extraction, and determining a seed region in the sample image according to the extracted classification sample feature map.
The sample image is used to train the segmentation network and specifically comprises an image, the classification labels of the image, and the position regions of the target objects in the image; for example, for a sample image of a person riding a bicycle, the corresponding labels are the image classes "person" and "bicycle" together with the position regions of the person and the bicycle in the image. The classification network is a network structure that classifies an input image and may be, for example, a convolutional neural network such as a residual network. The seed region is a partial region of the sample image belonging to a set class; for example, for an image containing a person, the obtained seed region may be the head region of the person. The seed region serves as supervision information for segmenting the complete region where the target object is located.
In this embodiment, a pre-collected and labeled sample image is input into a pre-trained classification network for feature extraction to obtain a classification sample feature map, from which the seed region in the sample image is then determined. Specifically, as shown in fig. 1b, feature extraction is performed on the input sample image by the multiple convolutional layers in the classification network; a weak localization method is applied to the classification sample feature map output by the last convolutional layer to generate the corresponding heat map; and finally the seed region in the sample image is determined from the response value at each pixel position in the heat map.
Exemplarily, the classification network includes a plurality of convolutional layers, a pooling layer, a fully-connected layer and an activation function. Convolution processing is performed on the input sample image by the convolutional layers to obtain the classification sample feature map output by the last convolutional layer. A heat map corresponding to the classification sample feature map is then obtained using the Class Activation Mapping (CAM) method or the Gradient-weighted Class Activation Mapping (Grad-CAM) method, and, given a preset segmentation threshold, the set of pixels whose response values in the heat map exceed the threshold is taken as the seed region.
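As a rough sketch of this step (assuming a global-average-pooling classifier so plain CAM applies; the array shapes and the 0.6 threshold are illustrative, not the patent's settings):

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """CAM: weight the last conv layer's feature maps by the classification
    layer's weights for one class, then normalise the responses to [0, 1].

    feature_maps: (C, H, W) array, output of the final convolutional layer.
    fc_weights:   (num_classes, C) array, weights of the classification layer.
    """
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)          # keep positive class evidence only
    if cam.max() > 0:
        cam = cam / cam.max()           # normalise responses to [0, 1]
    return cam

def seed_region(cam, threshold=0.6):
    """Pixels whose normalised response exceeds the threshold form the seed."""
    return cam > threshold

# toy example: 4 channels, 8x8 feature maps, 3 classes
rng = np.random.default_rng(0)
feats = rng.random((4, 8, 8))
weights = rng.random((3, 4))
cam = class_activation_map(feats, weights, class_idx=1)
seeds = seed_region(cam)
```

The Grad-CAM variant would replace the fully-connected weights with gradient-derived channel weights, but the thresholding of the resulting heat map is the same.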
In addition, to optimize the seed-region generation process and obtain a higher-quality seed region, the pooling layer in the classification network may be set as a spatial pyramid pooling layer: in it, the classification sample feature map output by the backbone network is divided into feature maps of different sizes, which are then processed and fused, so that a higher-quality seed region is obtained by fusing multi-scale semantics.
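A minimal numpy sketch of spatial pyramid pooling as described (max pooling and the grid levels 1, 2 and 4 are illustrative assumptions):

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over grids of several sizes and
    concatenate the cell results into one fixed-length vector, fusing
    multi-scale semantics regardless of the input's spatial size."""
    c, h, w = feature_map.shape
    cells = []
    for n in levels:
        rows = np.array_split(np.arange(h), n)   # n x n grid of cells
        cols = np.array_split(np.arange(w), n)
        for r in rows:
            for s in cols:
                cell = feature_map[:, r[0]:r[-1] + 1, s[0]:s[-1] + 1]
                cells.append(cell.max(axis=(1, 2)))   # one (C,) vector per cell
    return np.concatenate(cells)

fm = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
vec = spatial_pyramid_pool(fm)   # length C * (1 + 4 + 16) = 42
```

Because the number of grid cells is fixed, the output length is independent of H and W, which is what lets the fused vector feed a fully-connected classifier.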
Step 120, inputting the sample image into a segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and the label corresponding to the sample image.
The segmentation network is a network structure for performing semantic segmentation on an input image, and may be, for example, a convolutional neural network.
In this embodiment, a pre-collected and labeled sample image is input into the segmentation network for feature extraction to obtain a segmentation sample feature map. The pre-annotated label corresponding to the sample image is introduced as prior knowledge and applied to the segmentation sample feature map, forcing the network to learn the segmentation region corresponding to each class. Finally, the initial segmentation result output by the segmentation network is determined from the segmentation sample feature map and the preset label corresponding to the sample image. Specifically, after the sample image is input into the segmentation network, segmentation sample feature maps are output by its multiple convolutional layers, where the number of segmentation sample feature maps is one more than the total number of image classes and the extra feature map corresponds to the background of the sample image. Applying the pre-annotated label to the segmentation sample feature maps forces the segmentation network to learn the segmentation region of each class, suppressing the presence of noise classes in the segmentation network output.
Step 130, taking the seed region as supervision information, and training the segmentation network with a seed region growing model and a conditional random field model based on the initial segmentation result, to obtain the trained semantic segmentation network.
Region growing comes in two common forms. In the first, a small patch or seed region inside the target object to be segmented is given, and surrounding pixels are then repeatedly added to the seed region according to a given rule until all pixels representing the object are merged into one region. In the second, the image is first divided into many small regions of strong internal consistency, for example regions whose pixels share the same gray value, and these small regions are then fused into larger regions according to a given rule, thereby segmenting the image.
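The first variant can be sketched minimally in numpy; the similarity rule (value within a tolerance of the seed mean) and the 4-connected neighbourhood are illustrative assumptions, not the patent's growth condition:

```python
import numpy as np
from collections import deque

def grow_region(image, seed_mask, tol=0.15):
    """Grow a binary seed mask over a grayscale image: a 4-connected
    neighbour joins the region when its value is within `tol` of the mean
    value of the seed pixels."""
    h, w = image.shape
    region = seed_mask.copy()
    mean = image[seed_mask].mean()
    queue = deque(zip(*np.nonzero(seed_mask)))   # frontier of region pixels
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx] \
                    and abs(image[ny, nx] - mean) <= tol:
                region[ny, nx] = True
                queue.append((ny, nx))
    return region

# a bright 4x4 object on a dark background, seeded with one interior pixel
img = np.zeros((6, 6)); img[1:5, 1:5] = 1.0
seed = np.zeros((6, 6), dtype=bool); seed[2, 2] = True
mask = grow_region(img, seed)
```

In the weakly supervised setting of this patent the growth condition is learned rather than a fixed intensity tolerance, but the expansion-from-seeds structure is the same.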
In this embodiment, the seed region is used as supervision information: based on the initial segmentation result output by the segmentation network, a loss function is constructed using a seed region growing model and a conditional random field model, and the segmentation network is trained on this loss function.
According to the technical scheme, the sample image is input into a classification network for feature extraction, and a seed region in the sample image is determined according to the extracted classification sample feature map. The sample image is also input into a segmentation network for feature extraction, and the initial segmentation result output by the segmentation network is determined according to the extracted segmentation sample feature map and the label corresponding to the sample image. Finally, with the seed region as supervision information, the segmentation network is trained on the initial segmentation result using a seed region growing model and a conditional random field model, yielding the trained semantic segmentation network. Training the segmentation network with the prior class information of the sample images alleviates the problem of noisy class outputs from the segmentation network and improves the image semantic segmentation effect.
Example two
Fig. 2a is a flowchart of the semantic segmentation network training method in the second embodiment of the present invention, which further refines the above embodiments. It details the step of inputting the sample image into the classification network for feature extraction and determining the seed region in the sample image from the extracted classification sample feature map, as well as the step of inputting the sample image into the segmentation network for feature extraction and determining the initial segmentation result output by the segmentation network from the extracted segmentation sample feature map and the label corresponding to the sample image. The method, described below with reference to fig. 2a, includes the following steps:
step 210, inputting the sample image into a classification network for feature extraction, and obtaining a classification sample feature map.
In this embodiment, to determine the seed region in a sample image, the sample image is first input into the classification network for feature extraction to obtain a classification sample feature map; specifically, the sample image is input into the backbone network of the classification network for convolution processing. Illustratively, the backbone network is a convolutional neural network; in particular, ResNet-101 is adopted.
Step 220, generating a feature heat map from the classification sample feature map using a weak localization method.

In this embodiment, a weak localization method is applied to the classification sample feature map output by the backbone network of the classification network to generate the feature heat map corresponding to the sample image, so that the seed region in the sample image can be determined from the response value of each pixel in the feature heat map. Illustratively, the heat map corresponding to the classification sample feature map is obtained with the CAM or Grad-CAM method.
Step 230, segmenting the feature heat map according to the heat-map segmentation threshold, and determining the seed region in the sample image according to the segmentation result.

In this embodiment, the feature heat map is segmented according to a preset heat-map segmentation threshold, and the seed region in the sample image is determined from the segmentation result; specifically, after the heat map is segmented, the region of the heat map whose response values exceed the set threshold may be taken as the seed region.

For example, with heat-map segmentation thresholds of 0.3 and 0.6, the feature heat map can be split into three classes of response values, 0-0.3, 0.3-0.6 and 0.6-1, and the set of pixels with response values greater than 0.6 can then be taken as the seed region.
Optionally, before segmenting the feature heat map according to the heat-map segmentation threshold and determining the seed region in the sample image according to the segmentation result, the method further includes:

segmenting the feature heat map according to a preset initial segmentation threshold;

and determining the heat-map segmentation threshold by computing the intersection-over-union (IoU) of the segmentation result and the annotated semantic segmentation of the sample image.

This optional embodiment provides a way to determine the heat-map segmentation threshold: the feature heat map is segmented with a preset initial segmentation threshold, and the threshold is then corrected against the pre-annotated semantic segmentation of the sample image. Specifically, the IoU between the segmentation result and the pre-annotated segmentation is computed for each candidate, and the best candidate becomes the heat-map segmentation threshold. For example, the candidate thresholds may be swept over 0-1 in steps of 0.05.
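The threshold search described above might look like this in numpy (the 0.05 step matches the example; the toy heat map and ground-truth mask are illustrative):

```python
import numpy as np

def best_threshold(heat_map, gt_mask, step=0.05):
    """Sweep candidate split thresholds over [0, 1) in the given step and
    keep the one whose foreground mask has the highest intersection-over-
    union (IoU) with the annotated segmentation mask."""
    best_t, best_iou = 0.0, -1.0
    for t in np.arange(0.0, 1.0, step):
        pred = heat_map > t
        union = np.logical_or(pred, gt_mask).sum()
        iou = np.logical_and(pred, gt_mask).sum() / union if union else 0.0
        if iou > best_iou:
            best_t, best_iou = t, iou
    return best_t, best_iou

# toy heat map whose true object is exactly the region above 0.5
hm = np.linspace(0.0, 1.0, 100).reshape(10, 10)
gt = hm > 0.5
t, iou = best_threshold(hm, gt)
```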
Step 240, inputting the sample image into the segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and a label corresponding to the sample image.
Optionally, the segmentation network comprises a plurality of convolutional layers, a pooling layer, and a Non-local module, the Non-local module being arranged after a set number of convolutional layers;
inputting the sample image into the segmentation network for feature extraction, and determining the initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and the label corresponding to the sample image, includes:

inputting the sample image into the segmentation network, and performing feature extraction on the sample image with the plurality of convolutional layers and the Non-local module in the segmentation network to obtain a segmentation sample feature map;

determining the multi-hot vector of the label corresponding to the sample image;

and calculating the tensor product of the segmentation sample feature map and the multi-hot vector to obtain the initial segmentation result.
In this alternative embodiment, the segmentation network includes a plurality of convolutional layers, a pooling layer, and a Non-local module arranged after a set number of convolutional layers. For example, when the segmentation network comprises 5 convolutional layers, the Non-local module can be arranged after the 4th or 5th convolutional layer; it processes the segmentation sample feature map output by the convolutional layers and introduces spatial semantic information between pixels to improve the semantic segmentation effect. The structure of the Non-local module is shown in fig. 2b; consistent with the definitions below, it takes the standard residual self-attention form

z = x + α · W_σ · ( (W_g x) · softmax( (W_θ x)ᵀ (W_φ x) )ᵀ )

where x is the input matrix, W_θ, W_φ and W_g are convolution matrices used to reduce the number of channels, W_σ is used to fuse the channel information, and α is a learnable parameter.
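A numpy sketch of this attention form (shapes and names are illustrative, not the patent's implementation; the 1x1 convolutions are written as plain matrix products over flattened spatial positions):

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g, w_sigma, alpha):
    """Embedded-Gaussian Non-local block over a flattened feature map.

    x: (C, N) features, N = H * W spatial positions.
    w_theta, w_phi, w_g: (C', C) channel-reducing 1x1 convolutions.
    w_sigma: (C, C') projection fusing the channels back to C.
    alpha: learnable weight on the attention branch (residual connection).
    """
    theta, phi, g = w_theta @ x, w_phi @ x, w_g @ x   # (C', N) each
    attn = softmax(theta.T @ phi, axis=-1)            # (N, N) pairwise affinities
    y = w_sigma @ (g @ attn.T)                        # aggregate, restore channels
    return x + alpha * y

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 9))                # 4 channels, 3x3 spatial grid
w_t = rng.standard_normal((2, 4))
w_p = rng.standard_normal((2, 4))
w_g2 = rng.standard_normal((2, 4))
w_s = rng.standard_normal((4, 2))
z = non_local(x, w_t, w_p, w_g2, w_s, alpha=0.5)
z_id = non_local(x, w_t, w_p, w_g2, w_s, alpha=0.0)  # alpha=0 leaves x unchanged
```

Because every output position attends to every input position, the block links pixels regardless of spatial distance, which is exactly the spatial semantic information the text says the module introduces.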
This optional embodiment thus provides a specific manner of inputting the sample image into the segmentation network for feature extraction and determining the initial segmentation result from the extracted segmentation sample feature map and the label corresponding to the sample image. The sample image is input into the segmentation network, and feature extraction is performed on it by the plurality of convolutional layers; to introduce spatial semantic information between pixels, the Non-local module after the set number of convolutional layers processes their output to obtain the segmentation sample feature maps, whose number is one more than the total number of classes (the extra feature map corresponds to the background of the sample image). The multi-hot vector corresponding to the label of the sample image is then determined, and the tensor product of the segmentation sample feature map and the multi-hot vector is calculated to obtain the initial segmentation result:

g(X) = f(X) ⊗ v

where v is the multi-hot vector of the label corresponding to the sample image and f(X) is the segmentation sample feature map output by the backbone network of the segmentation network. Note that the value of the background dimension in the multi-hot vector is always 1.
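Masking the score maps with the multi-hot label vector can be sketched as follows (the array shapes are illustrative; the background channel is kept unconditionally, as stated above):

```python
import numpy as np

def mask_score_maps(f, multi_hot):
    """Zero out the score maps of classes absent from the image-level label.

    f:         (K + 1, H, W) per-class score maps, index 0 = background.
    multi_hot: (K + 1,) label vector; the background entry is forced to 1.
    """
    v = multi_hot.astype(f.dtype).copy()
    v[0] = 1.0                          # background channel is always kept
    return f * v[:, None, None]         # broadcast the vector over H and W

# 3 foreground classes + background; the image is labelled with class 2 only
rng = np.random.default_rng(2)
f = rng.random((4, 5, 5))
label = np.array([0, 0, 1, 0])          # [background, c1, c2, c3]
masked = mask_score_maps(f, label)
```

Because absent classes are forced to zero scores, the network cannot be rewarded for predicting them, which is how the prior label suppresses noise classes in the output.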
Step 250, taking the seed region as supervision information, and training the segmentation network with a seed region growing model and a conditional random field model based on the initial segmentation result, to obtain the trained semantic segmentation network.
In the technical scheme of this embodiment, the sample image is input into the classification network for feature extraction to obtain a classification sample feature map; a feature heat map is generated from the classification sample feature map using a weak localization method; and the feature heat map is segmented according to the heat-map segmentation threshold, with the seed region in the sample image determined from the segmentation result. The sample image is further input into the segmentation network for feature extraction, and the initial segmentation result output by the segmentation network is determined from the extracted segmentation sample feature map and the label corresponding to the sample image. Finally, the segmentation network is trained, based on the initial segmentation result, with a seed region growing model and a conditional random field model, obtaining the trained semantic segmentation network. Training the segmentation network with the prior class information of the sample image mitigates the output of noise classes by the segmentation network and improves the image semantic segmentation effect.
Example Three
Fig. 3a is a flowchart of a semantic segmentation network training method in a third embodiment of the present invention, which is further refined based on the above embodiments and provides the specific steps of training the segmentation network with a seed region growth model and a conditional random field model, using the seed region as supervision information and based on the initial segmentation result. The semantic segmentation network training method provided by the third embodiment of the present invention is described below with reference to fig. 3a, and includes the following steps:
And step 310, inputting the sample image into a classification network for feature extraction, and determining a seed region in the sample image according to the extracted classification sample feature map.
Optionally, the classification network includes a plurality of convolution layers, pyramid pooling layers, full-link layers, and activation functions;
before inputting the sample image into the classification network for feature extraction, the method further comprises the following steps:
carrying out feature extraction on the sample image through the plurality of convolution layers to obtain a classification sample feature map;
performing feature fusion on the classified sample feature map under different scales through a pyramid pooling layer to obtain sample feature vectors corresponding to the sample images;
processing the sample characteristic vectors through a full connection layer and an activation function to obtain a classification result of the sample image;
and constructing a classification loss function based on the classification result and the label of the sample image, and training the classification network based on the classification loss function.
In this optional embodiment, the classification network includes a plurality of convolution layers, a pyramid pooling layer, a fully connected layer, and an activation function, where the backbone network of the classification network may be a residual network, and the activation function may be a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU), or the like.
This optional embodiment further provides the training process of the classification network that takes place before the sample image is input into it for feature extraction, as shown in fig. 3b. First, feature extraction is performed on the sample image through the plurality of convolution layers to obtain a classification sample feature map. The pyramid pooling layer then divides the classification sample feature map at different scales, for example into grids of 1, 2, 3, 4, or 6 bins, to obtain multi-scale features, which are processed and concatenated into a sample feature vector that fuses the multiple scales. The sample feature vector is then processed through the fully connected layer and the activation function to obtain the classification result of the sample image. Finally, a classification loss function is constructed based on the classification result and the class label of the sample image, and the classification network is trained based on this loss function. The classification loss function is designed as follows:
$$L_{cls} = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right]$$

where n denotes the total number of target categories, $y_i$ is the true label for category i, and $\hat{y}_i$ is the score the classification network assigns to category i.
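A minimal NumPy sketch of this multi-label binary cross-entropy (the function name and the clipping constant are assumptions; sigmoid outputs are clipped only for numerical stability):

```python
import numpy as np

def classification_loss(y_true, y_score, eps=1e-12):
    """Binary cross-entropy averaged over the n target categories.

    y_true:  (n,) vector of 0/1 ground-truth labels y_i.
    y_score: (n,) vector of sigmoid outputs of the classification network.
    """
    y_score = np.clip(y_score, eps, 1.0 - eps)   # avoid log(0)
    per_class = y_true * np.log(y_score) + (1.0 - y_true) * np.log(1.0 - y_score)
    return -per_class.mean()
```

For an uncertain network that outputs 0.5 for every category, the loss is log 2 per category regardless of the label, which matches the formula term by term.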
In the optional embodiment, by introducing the pyramid pooling layer, the semantic features under different scales are fused, the training effect of the classification network is improved, and the quality of the obtained seed region is further improved.
And step 320, inputting the sample image into the segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and the label corresponding to the sample image.
And step 330, constructing a region growth loss function according to the probability that each pixel point in the initial segmentation result belongs to a set category.
In this embodiment, in order to train the segmentation network, a weakly supervised semantic segmentation method is adopted: Deep Seeded Region Growing (DSRG) serves as the dynamic supervision model, so that the segmentation network can expand the segmentation region to a sufficient size. Specifically, according to the probability that each pixel in the initial segmentation result belongs to a set category, the region growth loss function is constructed as follows:
$$L_{DSRG} = -\frac{1}{\sum_{c \in C}\lvert S_c \rvert}\sum_{c \in C}\sum_{u \in S_c}\log f_{u,c}(X) \;-\; \frac{1}{\lvert S_{\bar{c}} \rvert}\sum_{u \in S_{\bar{c}}}\log f_{u,\bar{c}}(X)$$

where C is the set of categories present in the sample image, $\bar{c}$ is the background category, $S_c$ is the set of pixel positions belonging to class c obtained by the weak localization procedure, and $f_{u,c}(X)$ is the probability of being classified as class c at position u in the initial segmentation result output by the segmentation network.
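A sketch of this balanced seeding loss in NumPy (the array layout, the background-last channel convention, and the function name are assumptions): foreground seed pixels are averaged jointly across classes, while background seed pixels are normalised separately.

```python
import numpy as np

def seeding_loss(probs, seeds, eps=1e-12):
    """Balanced seeding loss in the spirit of DSRG.

    probs: (C, H, W) per-pixel class probabilities f_{u,c}(X) from the
           segmentation network; the last channel is assumed to be background.
    seeds: dict mapping class index -> boolean (H, W) seed mask S_c.
    """
    bg = probs.shape[0] - 1
    fg_sum, fg_n, bg_sum, bg_n = 0.0, 0, 0.0, 0
    for c, mask in seeds.items():
        logs = np.log(probs[c][mask] + eps).sum()
        if c == bg:
            bg_sum += logs
            bg_n += int(mask.sum())
        else:
            fg_sum += logs
            fg_n += int(mask.sum())
    loss = 0.0
    if fg_n:
        loss -= fg_sum / fg_n      # foreground term, jointly normalised
    if bg_n:
        loss -= bg_sum / bg_n      # background term, separately normalised
    return loss
```

Only pixels inside the seed regions contribute, which is exactly how the seed region acts as supervision information for the otherwise unlabeled pixels.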
And step 340, constructing a conditional random field loss function according to the probability that each pixel point in the initial segmentation result belongs to the set category and the output of the conditional random field at each pixel point for the set category.
In this embodiment, the conditional random field model is also used to train the segmentation network. Specifically, according to the probability that each pixel point in the initial segmentation result belongs to the set category and the output of the conditional random field at each pixel point for the set category, the conditional random field loss function is constructed as follows:
$$L_{CRF} = \frac{1}{n}\sum_{u}\sum_{c} Q_{u,c}(X, f(X)) \log\frac{Q_{u,c}(X, f(X))}{f_{u,c}(X)}$$

where $Q_{u,c}(X, f(X))$ is the output of the conditional random field at position u for class c, $f_{u,c}(X)$ is the probability of being classified as class c at position u in the initial segmentation result output by the segmentation network, and n is the number of pixel positions.
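This term is a mean KL divergence between the CRF-refined distribution Q and the network output f, which can be sketched as follows (shapes and function names are assumptions):

```python
import numpy as np

def crf_constrain_loss(probs, crf_probs, eps=1e-12):
    """KL(Q || f) summed over classes and averaged over pixel positions.

    probs:     (C, H, W) probabilities f output by the segmentation network.
    crf_probs: (C, H, W) probabilities Q output by the CRF branch.
    """
    n_pixels = probs.shape[1] * probs.shape[2]
    kl = crf_probs * (np.log(crf_probs + eps) - np.log(probs + eps))
    return kl.sum() / n_pixels
```

The loss is zero when the network already agrees with the CRF output, so minimising it pulls the segmentation masks toward the boundary-aware CRF refinement.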
And step 350, determining a segmentation network loss function according to the region growth loss function and the conditional random field loss function.
In this embodiment, the initial segmentation result output by the segmentation network is fed into the deep seeded region growing branch and the conditional random field branch. The deep seeded region growing branch adopts the DSRG structure and performs region growing and loss calculation on the segmentation mask according to the output probability. The conditional random field branch combines color and position information in the sample image to refine the object boundaries of the segmentation mask output by the segmentation network.
Specifically, according to the region growing loss function and the conditional random field loss function, an overall segmentation network loss function is constructed as follows:
$$L = L_{DSRG} + L_{CRF} + 0.01\,\lVert w \rVert_2$$

where $\lVert w \rVert_2$ is the L2 norm of all trainable parameters.
And step 360, training the segmentation network by using a gradient descent method according to the segmentation network loss function to obtain the trained semantic segmentation network.
In this embodiment, after determining the segmentation network loss function, the gradient descent method is used to train the segmentation network, so as to obtain a trained semantic segmentation network.
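The overall objective and one plain gradient-descent update can be sketched as follows (all names are illustrative; whether the regulariser is the L2 norm or its square is read here as the stated "L2 norm"):

```python
import numpy as np

def total_loss(l_dsrg, l_crf, params):
    """Overall segmentation loss L = L_DSRG + L_CRF + 0.01 * ||w||_2."""
    w = np.concatenate([p.ravel() for p in params])  # flatten all parameters
    return l_dsrg + l_crf + 0.01 * np.linalg.norm(w)

def gradient_step(params, grads, lr=0.1):
    """One vanilla gradient-descent update of the trainable parameters."""
    return [p - lr * g for p, g in zip(params, grads)]
```

In practice both loss branches would be differentiated through an autodiff framework; the sketch only shows how the three terms combine and how a descent step looks.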
According to the technical scheme of the embodiment of the invention, a sample image is input into a classification network for feature extraction, and a seed region in the sample image is determined according to the extracted classification sample feature map. The sample image is then input into a segmentation network for feature extraction, and the initial segmentation result output by the segmentation network is determined according to the extracted segmentation sample feature map and the label corresponding to the sample image. Further, a region growth loss function is constructed according to the probability that each pixel point in the initial segmentation result belongs to a set category, a conditional random field loss function is constructed according to that probability together with the output of the conditional random field at each pixel point for the set category, and the segmentation network loss function is determined from the region growth loss function and the conditional random field loss function. Finally, the segmentation network is trained by a gradient descent method according to the segmentation network loss function. Training the segmentation network with a seed region growth model and a conditional random field model improves the semantic segmentation effect.
Example Four
Fig. 4a is a flowchart of an image semantic segmentation method in the fourth embodiment of the present invention, where the technical solution of this embodiment is suitable for performing semantic segmentation on an image to be segmented by a semantic segmentation network, and the method can be executed by an image semantic segmentation apparatus, and the apparatus can be implemented by software and/or hardware, and can be integrated in various general-purpose computer devices. The image semantic segmentation method in the embodiment specifically includes the following steps:
Step 410, taking an image to be segmented as the input of a semantic segmentation network, wherein the semantic segmentation network is trained in advance by the semantic segmentation network training method provided by any of the above embodiments;
in this embodiment, after the semantic segmentation network is obtained through training, the semantic segmentation network is used to perform semantic segmentation on the image to be segmented, specifically, as shown in fig. 4b, the image to be segmented is input to the semantic segmentation network, and the image to be segmented is processed through the semantic segmentation network.
And step 420, determining a segmentation result of the image to be segmented according to an output result of the semantic segmentation network.
In this embodiment, the feature map obtained by processing the image to be segmented through the segmentation network is passed through the conditional random field branch to obtain a segmentation result, forming the final semantic segmentation mask and thus determining the segmentation result of the image to be segmented.
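At inference time the refined per-class probability maps collapse into a mask by a per-pixel argmax; a minimal sketch (shapes and names assumed):

```python
import numpy as np

def predict_mask(refined_probs):
    """Turn (C, H, W) CRF-refined class probabilities into an (H, W)
    semantic segmentation mask of class indices."""
    return refined_probs.argmax(axis=0)
```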
According to the technical scheme of the embodiment of the invention, the image to be segmented is used as the input of the semantic segmentation network, wherein the semantic segmentation network is trained by adopting the semantic segmentation network training method provided by any embodiment in advance, and the segmentation result of the image to be segmented is determined according to the output result of the semantic segmentation network, so that the semantic segmentation effect is improved.
Example Five
Fig. 5 is a schematic structural diagram of a semantic segmentation network training device according to a fifth embodiment of the present invention, where the semantic segmentation network training device includes: a seed region determination module 510, an initial segmentation result determination module 520, and a segmented network training module 530.
A seed region determining module 510, configured to input the sample image into a classification network for feature extraction, and determine a seed region in the sample image according to an extracted classification sample feature map;
an initial segmentation result determining module 520, configured to input the sample image into a segmentation network for feature extraction, and determine an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and a label corresponding to the sample image;
and the segmentation network training module 530 is configured to train the segmentation network by using the seed region as the supervision information and based on the initial segmentation result and using a seed region growth model and a conditional random field model, so as to obtain a trained semantic segmentation network.
According to the technical scheme, a sample image is input into a classification network for feature extraction, and a seed region in the sample image is determined according to the extracted classification sample feature map. The sample image is then input into a segmentation network for feature extraction, and the initial segmentation result output by the segmentation network is determined according to the extracted segmentation sample feature map and the label corresponding to the sample image. Finally, with the seed region as supervision information and based on the initial segmentation result, a seed region growth model and a conditional random field model are adopted to train the segmentation network, so as to obtain a trained semantic segmentation network. Training the segmentation network with the prior class information of the sample image alleviates the problem of noisy class output from the segmentation network and improves the image semantic segmentation effect.
Optionally, the classification network includes a plurality of convolution layers, a pyramid pooling layer, a full-link layer, and an activation function;
the semantic segmentation network training device further comprises:
the classification sample feature map acquisition module is used for performing feature extraction on the sample image through the plurality of convolution layers before inputting the sample image into the classification network for performing feature extraction to obtain a classification sample feature map;
the sample feature vector acquisition module is used for performing feature fusion on the classified sample feature map under different scales through a pyramid pooling layer to obtain a sample feature vector corresponding to the sample image;
the classification result determining module is used for processing the sample feature vectors through a full connection layer and an activation function to obtain a classification result of the sample image;
and the classification network training module is used for constructing a classification loss function based on the classification result and the label of the sample image, and training the classification network based on the classification loss function.
Optionally, the seed region determining module 510 includes:
the classification sample characteristic diagram acquisition unit is used for inputting the sample image into a classification network for characteristic extraction and scoring a classification sample characteristic diagram;
the characteristic thermodynamic diagram generating unit is used for generating a characteristic thermodynamic diagram by adopting a weak positioning method aiming at the classification sample characteristic diagram;
and the seed region determining unit is used for segmenting the characteristic thermodynamic diagram according to a thermodynamic diagram segmentation threshold value and determining a seed region in the sample image according to a segmentation result.
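The thresholding performed by the seed region determining unit can be sketched as follows (max-normalising the heatmap before thresholding is an assumption, not stated in the patent):

```python
import numpy as np

def seeds_from_heatmap(heatmap, threshold):
    """Binarise a feature thermodynamic diagram (heatmap) into a seed mask:
    pixels whose max-normalised activation reaches `threshold` become seeds."""
    norm = heatmap / (heatmap.max() + 1e-12)  # scale strongest response to ~1
    return norm >= threshold
```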
Optionally, the segmentation network comprises a plurality of convolutional layers, a pooling layer, and a non-local module; the non-local module is arranged behind the convolutional layers with the set number of layers;
the initial segmentation result determining module 520 includes:
the segmentation sample feature map acquisition unit is used for inputting the sample image into a segmentation network, and performing feature extraction on the sample image by a plurality of convolution layers and a non-local module in the segmentation network to obtain a segmentation sample feature map;
the multi-heat vector determining unit is used for determining the multi-heat vectors of the labels corresponding to the sample images;
and the initial segmentation result determining unit is used for calculating the tensor product of the segmented sample characteristic diagram and the multiple heat vectors to obtain an initial segmentation result.
Optionally, the segmented network training module 530 includes:
the region growth loss function construction unit is used for constructing a region growth loss function according to the probability that each pixel point in the initial segmentation result belongs to a set category;
the conditional random field loss function building unit is used for building a conditional random field loss function according to the probability that each pixel point in the initial segmentation result belongs to the set category and the output of each pixel point aiming at the conditional random field of the set category;
the segmentation network loss function determining unit is used for determining a segmentation network loss function according to the region growth loss function and the conditional random field loss function;
and the segmentation network training unit is used for training the segmentation network by utilizing a gradient descent method according to the segmentation network loss function.
Optionally, the seed region determining module 510 further includes:
an initial segmentation unit, configured to segment the characteristic thermodynamic diagram according to a thermodynamic diagram segmentation threshold, and segment the characteristic thermodynamic diagram according to a preset initial segmentation threshold before determining a seed region in the sample image according to a segmentation result;
And the segmentation threshold determining unit is used for determining the thermodynamic diagram segmentation threshold by calculating the intersection-over-union between the segmentation result and the semantic segmentation result of the sample image.
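The intersection-over-union comparison used to pick the thermodynamic diagram segmentation threshold could look like this (the candidate-threshold sweep and the normalisation are assumptions for illustration):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def pick_threshold(heatmap, reference_mask, candidates):
    """Return the candidate threshold whose binarised heatmap best matches
    the reference semantic segmentation mask by intersection-over-union."""
    norm = heatmap / (heatmap.max() + 1e-12)
    return max(candidates, key=lambda t: iou(norm >= t, reference_mask))
```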
The semantic segmentation network training device provided by the embodiment of the invention can execute the semantic segmentation network training method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example Six
Fig. 6 is a schematic structural diagram of an image semantic segmentation apparatus according to a sixth embodiment of the present invention, where the image semantic segmentation apparatus includes: an image to be segmented input module 610 and a segmentation result determination module 620.
The image to be segmented input module 610 is configured to use an image to be segmented as input of a semantic segmentation network, where the semantic segmentation network is trained in advance by using a semantic segmentation network training method provided in any embodiment;
and a segmentation result determining module 620, configured to determine a segmentation result of the image to be segmented according to an output result of the semantic segmentation network.
According to the technical scheme of the embodiment of the invention, the image to be segmented is used as the input of the semantic segmentation network, wherein the semantic segmentation network is trained by adopting the semantic segmentation network training method provided by any embodiment in advance, and the segmentation result of the image to be segmented is determined according to the output result of the semantic segmentation network, so that the semantic segmentation effect is improved.
The image semantic segmentation device provided by the embodiment of the invention can execute the image semantic segmentation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example Seven
Fig. 7 is a schematic structural diagram of an electronic apparatus according to a seventh embodiment of the present invention, as shown in fig. 7, the electronic apparatus includes a processor 70, a memory 71, an input device 72, and an output device 73; the number of processors 70 in the device may be one or more, and one processor 70 is taken as an example in fig. 7; the processor 70, the memory 71, the input device 72 and the output device 73 of the apparatus may be connected by a bus or other means, as exemplified by the bus connection in fig. 7.
The memory 71 serves as a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the semantic segmentation network training method and the image semantic segmentation method in the embodiment of the present invention (for example, the seed region determination module 510, the initial segmentation result determination module 520, and the segmentation network training module 530 in the semantic segmentation network training device, or the image input module 610 to be segmented and the segmentation result determination module 620 in the image semantic segmentation device). The processor 70 executes various functional applications of the device and data processing, i.e., implements the semantic segmentation network training method or the image semantic segmentation method described above, by running software programs, instructions, and modules stored in the memory 71.
The semantic segmentation network training method comprises the following steps:
inputting a sample image into a classification network for feature extraction, and determining a seed region in the sample image according to an extracted classification sample feature map;
inputting the sample image into a segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and a label corresponding to the sample image;
and training the segmentation network by taking the seed region as supervision information and adopting a seed region growth model and a conditional random field model based on the initial segmentation result to obtain the trained semantic segmentation network.
The image semantic segmentation method comprises the following steps:
taking an image to be segmented as input of a semantic segmentation network, wherein the semantic segmentation network is trained by adopting a semantic segmentation network training method provided by any embodiment in advance;
and determining the segmentation result of the image to be segmented according to the output result of the semantic segmentation network.
The memory 71 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 71 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 71 may further include memory located remotely from the processor 70, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Example Eight
An eighth embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer processor, is configured to perform a semantic segmentation network training method or an image semantic segmentation method.
The semantic segmentation network training method comprises the following steps:
inputting a sample image into a classification network for feature extraction, and determining a seed region in the sample image according to an extracted classification sample feature map;
inputting the sample image into a segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and a label corresponding to the sample image;
and training the segmentation network by taking the seed region as supervision information and adopting a seed region growth model and a conditional random field model based on the initial segmentation result to obtain the trained semantic segmentation network.
The image semantic segmentation method comprises the following steps:
taking an image to be segmented as input of a semantic segmentation network, wherein the semantic segmentation network is trained by adopting a semantic segmentation network training method provided by any embodiment in advance;
and determining the segmentation result of the image to be segmented according to the output result of the semantic segmentation network.
Of course, the storage medium provided by the embodiment of the present invention and containing the computer-executable instructions is not limited to the method operations described above, and may also perform related operations in the image semantic segmentation method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, an application server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the semantic segmentation network training apparatus and the image semantic segmentation apparatus, each unit and each module included in the embodiment are only divided according to functional logic, but are not limited to the above division, as long as the corresponding function can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A semantic segmentation network training method is characterized by comprising the following steps:
inputting a sample image into a classification network for feature extraction, and determining a seed region in the sample image according to an extracted classification sample feature map;
inputting the sample image into a segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and a label corresponding to the sample image;
and training the segmentation network by taking the seed region as supervision information and adopting a seed region growth model and a conditional random field model based on the initial segmentation result to obtain the trained semantic segmentation network.
2. The method of claim 1, wherein the classification network comprises a plurality of convolutional layers, pyramid pooling layers, full-link layers, and activation functions;
before inputting the sample image into the classification network for feature extraction, the method further comprises the following steps:
carrying out feature extraction on the sample image through the plurality of convolution layers to obtain a classification sample feature map;
performing feature fusion on the classified sample feature map under different scales through a pyramid pooling layer to obtain sample feature vectors corresponding to the sample images;
processing the sample characteristic vector through a full connection layer and an activation function to obtain a classification result of the sample image;
and constructing a classification loss function based on the classification result and the label of the sample image, and training the classification network based on the classification loss function.
3. The method of claim 1, wherein inputting the sample image into a classification network for feature extraction, and determining a seed region in the sample image according to the extracted classification sample feature map comprises:
inputting the sample image into a classification network for feature extraction, and obtaining a classification sample feature map;
aiming at the classified sample feature map, generating a feature thermodynamic diagram by adopting a weak positioning method;
and segmenting the characteristic thermodynamic diagram according to a thermodynamic diagram segmentation threshold, and determining a seed region in the sample image according to a segmentation result.
4. The method of claim 1, wherein the segmentation network comprises a plurality of convolutional layers, pooling layers, and non-local modules; the non-local module is arranged behind the convolutional layers with the set number of layers;
inputting a sample image into a segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to an extracted segmentation sample feature map and a label corresponding to the sample image, wherein the initial segmentation result comprises the following steps:
inputting the sample image into a segmentation network, and performing feature extraction on the sample image by a plurality of convolution layers and a non-local module in the segmentation network to obtain a segmentation sample feature map;
determining a multi-hot vector of the label corresponding to the sample image;
and calculating a tensor product of the segmentation sample feature map and the multi-hot vector to obtain an initial segmentation result.
5. The method of claim 1, wherein training a segmentation network using a seed region growing model and a conditional random field model based on the initial segmentation result with the seed region as supervisory information comprises:
constructing a region growth loss function according to the probability that each pixel point in the initial segmentation result belongs to a set category;
constructing a conditional random field loss function according to the probability that each pixel point in the initial segmentation result belongs to the set category and the output of each pixel point aiming at the conditional random field of the set category;
determining a segmentation network loss function according to the region growth loss function and the conditional random field loss function;
and training the segmentation network by using a gradient descent method according to the segmentation network loss function.
6. The method of claim 3, further comprising, prior to segmenting the feature heat map according to the heat map segmentation threshold and determining the seed region in the sample image according to the segmentation result:
segmenting the feature heat map according to a preset initial segmentation threshold;
and determining the heat map segmentation threshold by calculating an intersection-over-union (IoU) of the resulting segmentation and the semantic segmentation result of the sample image.
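The threshold-selection step can be read as picking, from a set of candidates, the threshold whose binary segmentation best overlaps the reference segmentation under IoU. The candidate grid and names in this sketch are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two flat binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return inter / union if union else 0.0

def best_threshold(heat, reference, candidates):
    """Return the candidate threshold whose binarized heat map has the
    highest IoU with the reference segmentation."""
    def binarize(t):
        return [1 if v >= t else 0 for v in heat]
    return max(candidates, key=lambda t: iou(binarize(t), reference))

heat = [0.2, 0.6, 0.9, 0.4]
ref  = [0, 1, 1, 0]
t = best_threshold(heat, ref, [0.3, 0.5, 0.7])  # 0.5 matches ref exactly
```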
7. An image semantic segmentation method, comprising:
taking an image to be segmented as an input of a semantic segmentation network, the semantic segmentation network being trained in advance by the method according to any one of claims 1 to 6;
and determining the segmentation result of the image to be segmented according to the output result of the semantic segmentation network.
8. A semantic segmentation model training device, comprising:
the seed region determining module is used for inputting the sample image into a classification network for feature extraction and determining a seed region in the sample image according to the extracted classification sample feature map;
the initial segmentation result determining module is used for inputting the sample image into the segmentation network for feature extraction, and determining an initial segmentation result output by the segmentation network according to the extracted segmentation sample feature map and the label corresponding to the sample image;
and the segmentation network training module is used for training the segmentation network by taking the seed region as supervision information and adopting a seed region growing model and a conditional random field model based on the initial segmentation result, to obtain the trained semantic segmentation network.
9. An image semantic segmentation apparatus, comprising:
an image to be segmented input module, configured to use an image to be segmented as an input of a semantic segmentation network, where the semantic segmentation network is trained in advance by using the method according to any one of claims 1 to 6;
and the segmentation result determining module is used for determining the segmentation result of the image to be segmented according to the output result of the semantic segmentation network.
10. An electronic device, characterized in that the device comprises:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the semantic segmentation network training method of any one of claims 1-6 or the image semantic segmentation method of claim 7.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a semantic segmentation network training method according to any one of claims 1 to 6, or an image semantic segmentation method according to claim 7.
CN202110771852.0A 2021-07-08 2021-07-08 Semantic segmentation network training and image semantic segmentation method, device and equipment Pending CN113420827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110771852.0A CN113420827A (en) 2021-07-08 2021-07-08 Semantic segmentation network training and image semantic segmentation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110771852.0A CN113420827A (en) 2021-07-08 2021-07-08 Semantic segmentation network training and image semantic segmentation method, device and equipment

Publications (1)

Publication Number Publication Date
CN113420827A true CN113420827A (en) 2021-09-21

Family

ID=77720478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110771852.0A Pending CN113420827A (en) 2021-07-08 2021-07-08 Semantic segmentation network training and image semantic segmentation method, device and equipment

Country Status (1)

Country Link
CN (1) CN113420827A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222945A (en) * 2022-09-15 2022-10-21 深圳市软盟技术服务有限公司 Deep semantic segmentation network training method based on multi-scale self-adaptive course learning
CN116363362A (en) * 2023-03-08 2023-06-30 阿里巴巴(中国)有限公司 Image semantic segmentation method, object recognition method and computing device
CN116758093A (en) * 2023-05-30 2023-09-15 首都医科大学宣武医院 Image segmentation method, model training method, device, equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110349148A * 2019-07-11 2019-10-18 University of Electronic Science and Technology of China An image object detection method based on weakly supervised learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110349148A * 2019-07-11 2019-10-18 University of Electronic Science and Technology of China An image object detection method based on weakly supervised learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG, YUNXIN: "Research on Image Semantic Segmentation Methods Based on Weakly Supervised Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222945A (en) * 2022-09-15 2022-10-21 深圳市软盟技术服务有限公司 Deep semantic segmentation network training method based on multi-scale self-adaptive course learning
CN116363362A (en) * 2023-03-08 2023-06-30 阿里巴巴(中国)有限公司 Image semantic segmentation method, object recognition method and computing device
CN116363362B (en) * 2023-03-08 2024-01-09 阿里巴巴(中国)有限公司 Image semantic segmentation method, object recognition method and computing device
CN116758093A (en) * 2023-05-30 2023-09-15 首都医科大学宣武医院 Image segmentation method, model training method, device, equipment and medium
CN116758093B (en) * 2023-05-30 2024-05-07 首都医科大学宣武医院 Image segmentation method, model training method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
CN108491880B (en) Object classification and pose estimation method based on neural network
CN110909820B (en) Image classification method and system based on self-supervision learning
US8379994B2 (en) Digital image analysis utilizing multiple human labels
CN113420827A (en) Semantic segmentation network training and image semantic segmentation method, device and equipment
CN109492706B (en) Chromosome classification prediction device based on recurrent neural network
CN110991513B (en) Image target recognition system and method with continuous learning ability of human-like
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN111274994B (en) Cartoon face detection method and device, electronic equipment and computer readable medium
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN111696110B (en) Scene segmentation method and system
CN111507226B (en) Road image recognition model modeling method, image recognition method and electronic equipment
CN113378937B (en) Small sample image classification method and system based on self-supervision enhancement
CN111325766B (en) Three-dimensional edge detection method, three-dimensional edge detection device, storage medium and computer equipment
CN114821014A (en) Multi-mode and counterstudy-based multi-task target detection and identification method and device
CN114333062B (en) Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN116129426A (en) Fine granularity classification method for cervical cell smear 18 category
CN111652181A (en) Target tracking method and device and electronic equipment
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN113496148A (en) Multi-source data fusion method and system
CN112991281B (en) Visual detection method, system, electronic equipment and medium
CN116824330A (en) Small sample cross-domain target detection method based on deep learning
CN113111684A (en) Training method and device of neural network model and image processing system
CN114638953B (en) Point cloud data segmentation method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210921