CN115170809A - Image segmentation model training method, image segmentation device, image segmentation equipment and medium - Google Patents

Image segmentation model training method, image segmentation device, image segmentation equipment and medium Download PDF

Info

Publication number
CN115170809A
CN115170809A (application CN202211085858.3A)
Authority
CN
China
Prior art keywords
feature map
segmentation
prediction
image
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211085858.3A
Other languages
Chinese (zh)
Other versions
CN115170809B (en)
Inventor
俞元杰
付建海
吴立
颜成钢
李亮
殷海兵
熊剑平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211085858.3A priority Critical patent/CN115170809B/en
Publication of CN115170809A publication Critical patent/CN115170809A/en
Application granted granted Critical
Publication of CN115170809B publication Critical patent/CN115170809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Abstract

The application discloses an image segmentation model training method, an image segmentation device, equipment and a medium. A first segmentation module in an image segmentation model is trained on first sample images in a first training set and the first annotation information corresponding to those images; feature extraction is then performed on second sample images in a second training set by the trained first segmentation module to obtain a fusion feature map corresponding to each second sample image, and a second segmentation module in the image segmentation model is trained on these fusion feature maps. The parameters of the first segmentation module and the second segmentation module do not interfere with each other. Part of the parameters of the first segmentation module are shared by the first and second segmentation modules, and the trained first segmentation module improves the feature extraction effect on the second sample images, reduces the number of second sample images required, and reduces the annotation cost and data processing load.

Description

Image segmentation model training method, image segmentation device, image segmentation equipment and medium
Technical Field
The application relates to the technical field of machine vision, and in particular to an image segmentation model training method, an image segmentation device, image segmentation equipment and a medium.
Background
Deep convolutional neural networks have made significant breakthroughs in image segmentation and visual understanding tasks. These methods rely on the availability of large-scale sample images, such as the image recognition database ImageNet, whose sample images are used to train image segmentation models.
In the related art, an image segmentation model is trained on a large number of sample images of base categories and their corresponding annotation information. When sample images of a new category appear, they must be annotated, and the parameters of the image segmentation model must be retrained on the new-category sample images and their annotations together with the base-category sample images and their annotations. Recognizing the new category therefore requires annotating a large number of new-category sample images, whose acquisition and annotation costs are high, and retraining on large numbers of both base-category and new-category sample images also entails a large data processing load.
Disclosure of Invention
The embodiments of the application provide an image segmentation model training method, device, equipment and medium, aiming to solve the problems of high cost and large data processing load of image segmentation model training methods in the related art.
The application provides an image segmentation model training method, which comprises the following steps:
inputting a first sample image in a first training set and first annotation information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module;
inputting second sample images in a second training set and second labeling information corresponding to the second sample images into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample images based on the first segmentation module;
inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
Further, the first segmentation module comprises:
the device comprises a feature extraction network, a first class branch network, a first detection frame branch network and a mask branch network.
Further, the training process of the first segmentation module comprises:
determining a first depth feature map of the first sample image based on the feature extraction network, extracting a first detection frame in the first depth feature map, mapping the first detection frame to the first depth feature map, determining each first local feature map, and performing feature extraction on each first local feature map to obtain each first fused feature map;
inputting each first fusion feature map into a first class branch network and a first detection frame branch network respectively; determining first prediction category information corresponding to each first fusion feature map based on the first category branch network; determining a first detection frame area corresponding to each first fusion feature map based on the first detection frame branch network;
inputting the first depth feature map and each first detection frame area into the mask branch network, and determining first prediction mask information corresponding to each first detection frame area based on the mask branch network;
and training parameters of the feature extraction network, the first class branch network, the first detection frame branch network and the mask branch network according to the first prediction category information, the first prediction mask information and the first marking information.
Further, the second segmentation module comprises:
a class weight regularization network, a second class branch network, an uncertainty prediction network, a coordinate offset prediction network, and a second detection box branch network.
Further, the training process of the second segmentation module comprises:
determining a second depth feature map of the second sample image based on the feature extraction network, extracting a second detection frame in the second depth feature map, mapping the second detection frame to the second depth feature map, determining each second local feature map, and performing feature extraction on each second local feature map to obtain each second fusion feature map;
inputting each second fusion feature map into the category weight regularization network and the first detection frame branch network respectively; determining each category probability value corresponding to each second fusion feature map based on the category weight regularization network, inputting each category probability value into the second category branch network, and determining the second prediction category information corresponding to each second fusion feature map based on the second category branch network; determining the second detection frame area corresponding to each second fusion feature map based on the first detection frame branch network, determining the uncertainty value of each second detection frame area based on the uncertainty prediction network, and determining the coordinate offset value of each second detection frame area based on the coordinate offset prediction network; and, based on the second detection frame branch network, adjusting each second detection frame area whose uncertainty value is smaller than a set threshold value according to its coordinate offset value, to obtain each third detection frame area;
inputting the second depth feature map and each third detection box area into the mask branch network, and determining second prediction mask information corresponding to each third detection box area based on the mask branch network;
and training parameters of the class weight regularization network, the second class branch network, the uncertainty prediction network, the coordinate offset prediction network and the second detection frame branch network according to the second prediction class information, the second prediction mask information and the second marking information.
Further, the determining, based on the category weight regularization network, each category probability value corresponding to each second fused feature map includes:
determining the mean value of each category weight of the category weight regularization network and the diagonal covariance matrix of each category weight by adopting a Bayesian learning algorithm and a general variational framework;
and determining the probability value of each category corresponding to each second fusion feature map according to each second fusion feature map, the mean value of each category weight and the diagonal covariance matrix of each category weight.
Further, the determining, according to the second fused feature maps, the mean value of the class weights, and the diagonal covariance matrix of the class weights, the class probability values corresponding to the second fused feature maps includes:
and determining the probability value of each category corresponding to each second fusion feature map according to each second fusion feature map, the mean value of each category weight, the diagonal covariance matrix of each category weight and the sigmoid activation function.
In another aspect, the present application provides an image segmentation method, including:
acquiring an image to be processed, and inputting the image into a trained image segmentation model;
determining, based on a first segmentation module in the image segmentation model, third prediction category information, third prediction mask information, and a first probability value corresponding to the third prediction category information for the image;
determining fourth prediction category information, fourth prediction mask information and a second probability value corresponding to the fourth prediction category information of the image based on a second segmentation module in the image segmentation model;
if the first probability value is greater than the second probability value, using the third prediction category information and the third prediction mask information as a segmentation result;
if the first probability value is not greater than the second probability value, using the fourth prediction category information and the fourth prediction mask information as a segmentation result;
wherein the image segmentation model is determined by: inputting a first sample image in a first training set and first annotation information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module; inputting second sample images in a second training set and second labeling information corresponding to the second sample images into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample images based on the first segmentation module; inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
In yet another aspect, the present application provides an image segmentation model training apparatus, including:
the first training unit is used for inputting a first sample image in a first training set and first annotation information corresponding to the first sample image into a first segmentation module in an image segmentation model and training the first segmentation module;
the second training unit is used for inputting a second sample image in a second training set and second marking information corresponding to the second sample image into the trained first segmentation module, and determining a fusion feature map corresponding to the second sample image based on the first segmentation module;
and the third training unit is used for inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
A first training unit, configured to determine a first depth feature map of the first sample image based on the feature extraction network, extract a first detection frame in the first depth feature map, map the first detection frame to the first depth feature map, determine each first local feature map, and perform feature extraction on each first local feature map to obtain each first fused feature map;
inputting each first fusion feature map into a first class branch network and a first detection frame branch network respectively; determining first prediction category information corresponding to each first fusion feature map based on the first category branch network; determining a first detection frame area corresponding to each first fusion feature map based on the first detection frame branch network;
inputting the first depth feature map and each first detection frame area into the mask branch network, and determining first prediction mask information corresponding to each first detection frame area based on the mask branch network;
and training parameters of the feature extraction network, the first class branch network, the first detection frame branch network and the mask branch network according to the first prediction category information, the first prediction mask information and the first marking information.
A third training unit, configured to determine a second depth feature map of the second sample image based on the feature extraction network, extract a second detection frame in the second depth feature map, map the second detection frame to the second depth feature map, determine each second local feature map, and perform feature extraction on each second local feature map to obtain each second fused feature map;
inputting each second fusion feature map into the category weight regularization network and the first detection frame branch network respectively; determining each category probability value corresponding to each second fusion feature map based on the category weight regularization network, inputting each category probability value into the second category branch network, and determining the second prediction category information corresponding to each second fusion feature map based on the second category branch network; determining the second detection frame area corresponding to each second fusion feature map based on the first detection frame branch network, determining the uncertainty value of each second detection frame area based on the uncertainty prediction network, and determining the coordinate offset value of each second detection frame area based on the coordinate offset prediction network; and, based on the second detection frame branch network, adjusting each second detection frame area whose uncertainty value is smaller than a set threshold value according to its coordinate offset value, to obtain each third detection frame area;
inputting the second depth feature map and each third detection box area into the mask branch network, and determining second prediction mask information corresponding to each third detection box area based on the mask branch network;
and training parameters of the class weight regularization network, the second class branch network, the uncertainty prediction network, the coordinate offset prediction network and the second detection frame branch network according to each piece of second prediction class information, each piece of second prediction mask information and the second marking information.
The third training unit is specifically configured to determine the mean value of each class weight of the class weight regularization network and the diagonal covariance matrix of each class weight by adopting a Bayesian learning algorithm and a general variational framework;
and determining the probability value of each category corresponding to each second fusion feature map according to each second fusion feature map, the mean value of each category weight and the diagonal covariance matrix of each category weight.
And the third training unit is specifically configured to determine, according to the second fusion feature maps, the mean value of the class weights, the diagonal covariance matrix of the class weights, and the sigmoid activation function, the class probability values corresponding to the second fusion feature maps.
In yet another aspect, the present application provides an image segmentation apparatus, including:
the acquisition unit is used for acquiring an image to be processed and inputting the image into a trained image segmentation model;
a first determining unit, configured to determine, based on a first segmentation module in the image segmentation model, third prediction category information, third prediction mask information, and a first probability value corresponding to the third prediction category information of the image;
a second determining unit, configured to determine, based on a second segmentation module in the image segmentation model, fourth prediction category information, fourth prediction mask information, and a second probability value corresponding to the fourth prediction category information of the image;
an image segmentation unit configured to take the third prediction category information and the third prediction mask information as a segmentation result if the first probability value is greater than the second probability value; and, if the first probability value is not greater than the second probability value, to take the fourth prediction category information and the fourth prediction mask information as the segmentation result;
wherein the image segmentation model is determined by: inputting a first sample image in a first training set and first annotation information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module; inputting second sample images in a second training set and second labeling information corresponding to the second sample images into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample images based on the first segmentation module; inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
In another aspect, the present application provides an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of the above when executing a program stored in the memory.
In another aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method steps of any of the above.
The application provides an image segmentation model training method, an image segmentation device and a medium, wherein the method comprises the following steps: inputting a first sample image in a first training set and first marking information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module; inputting a second sample image in a second training set and second marking information corresponding to the second sample image into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample image based on the first segmentation module; inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
The technical scheme has the following advantages or beneficial effects:
according to the image segmentation method and device, training of a first segmentation module in an image segmentation model is completed based on a first sample image in a first training set and corresponding first labeling information of the first sample image, feature extraction is performed on a second sample image in a second training set based on the trained first segmentation module, a fusion feature map corresponding to the second sample image is obtained, and then training of a second segmentation module in the image segmentation model is performed based on the fusion feature map corresponding to the second sample image. This application can not adjust the parameter of the first segmentation module of having trained when training the second segmentation module, and the parameter of first segmentation module and second segmentation module is mutual noninterference. And partial parameters of the first segmentation module are shared by the first segmentation module and the second segmentation module, the characteristic extraction effect on the second sample image can be improved based on the trained first segmentation module, and compared with a scheme that parameters of an image segmentation model are retrained by combining a new type of sample image and corresponding annotation information as well as a basic type of sample image and corresponding annotation information in the related art, the number of the second sample images can be reduced, and the annotation cost and the data processing amount are reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an image segmentation model training process provided in the present application;
fig. 2 is a schematic structural diagram of a first segmentation module provided in the present application;
fig. 3 is a schematic structural diagram of another first segmentation module provided in the present application;
FIG. 4 is a structural diagram of a second segmentation module provided in the present application;
FIG. 5 is a schematic structural diagram of an image segmentation model provided in the present application;
FIG. 6 is a schematic diagram of an image segmentation process provided in the present application;
FIG. 7 is a schematic structural diagram of an image segmentation model training apparatus provided in the present application;
FIG. 8 is a schematic structural diagram of an image segmentation apparatus provided in the present application;
fig. 9 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The present application will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic diagram of an image segmentation model training process provided in the present application, where the process includes the following steps:
s101: inputting a first sample image in a first training set and first marking information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module.
S102: inputting a second sample image in a second training set and second marking information corresponding to the second sample image into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample image based on the first segmentation module.
S103: inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
The image segmentation method provided by the application is applied to an electronic device; the electronic device may be a PC (personal computer), a tablet computer or the like, or may be a server.
The electronic device obtains a first training set, which is a training set of sample images of the base categories. The first training set contains a large number of labeled first sample images; each first sample image has corresponding first annotation information, which comprises the mask information and category information of each instance in that first sample image.
The image segmentation model comprises a first segmentation module and a second segmentation module. The first segmentation module is obtained by training on the first training set, which contains sample images of the base categories; the second segmentation module is trained using a second training set containing sample images of the new categories, together with the trained first segmentation module.
After the electronic device obtains the first training set, each first sample image in the first training set and its corresponding first annotation information are input into the first segmentation module of the image segmentation model. The first segmentation module performs feature extraction on each first sample image and outputs prediction category information and prediction mask information. A loss function value is calculated from the prediction category information and prediction mask information corresponding to each first sample image and the labeled category information and mask information of that first sample image. After iterative computation, when the loss function value meets the set requirement, training of the first segmentation module is determined to be complete.
After training of the first segmentation module is finished, its parameters are fixed. Each second sample image in the second training set and its corresponding second annotation information are then input into the trained first segmentation module, which performs feature extraction on each second sample image to obtain the fusion feature map corresponding to each second sample image. Each fusion feature map and the corresponding second annotation information are then input into the second segmentation module of the image segmentation model. The second segmentation module performs feature extraction on each fusion feature map and outputs prediction category information and prediction mask information. For each fusion feature map, a loss function value is calculated from the prediction category information and prediction mask information corresponding to that fusion feature map and the labeled category information and mask information corresponding to the second sample image. After iterative computation, when the loss function value meets the set requirement, training of the second segmentation module is determined to be complete.
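To make this two-stage procedure concrete, the following PyTorch-style sketch shows the first segmentation module being trained on the first training set and then frozen while the second segmentation module is trained on the fusion feature maps it produces. The module interfaces (compute_loss, extract_fused_features), optimizer settings and epoch counts are illustrative assumptions, not values taken from the patent.

```python
import torch

def train_two_stage(first_module, second_module, first_loader, second_loader,
                    base_epochs=12, novel_epochs=12):
    # Stage 1: fully supervised training of the first (base-category) segmentation module.
    opt1 = torch.optim.SGD(first_module.parameters(), lr=0.02, momentum=0.9)
    for _ in range(base_epochs):
        for images, annotations in first_loader:       # first sample images + first annotation info
            predictions = first_module(images)          # predicted categories, boxes, masks
            loss = first_module.compute_loss(predictions, annotations)  # illustrative helper
            opt1.zero_grad()
            loss.backward()
            opt1.step()

    # Stage 2: freeze the first module so its parameters are no longer adjusted.
    for p in first_module.parameters():
        p.requires_grad = False
    first_module.eval()

    opt2 = torch.optim.SGD(second_module.parameters(), lr=0.01, momentum=0.9)
    for _ in range(novel_epochs):
        for images, annotations in second_loader:       # second sample images + second annotation info
            with torch.no_grad():
                fused = first_module.extract_fused_features(images)  # fusion feature maps
            predictions = second_module(fused)
            loss = second_module.compute_loss(predictions, annotations)
            opt2.zero_grad()
            loss.backward()
            opt2.step()
```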
In the application, training of the first segmentation module in the image segmentation model is completed based on the first sample images in the first training set and their corresponding first annotation information; feature extraction is performed on the second sample images in the second training set by the trained first segmentation module to obtain the fusion feature map corresponding to each second sample image, and the second segmentation module in the image segmentation model is then trained on these fusion feature maps. When the second segmentation module is trained, the parameters of the already-trained first segmentation module are not adjusted, so the parameters of the first and second segmentation modules do not interfere with each other. Part of the parameters of the first segmentation module are shared by the first and second segmentation modules, and the trained first segmentation module improves the feature extraction effect on the second sample images. Compared with the related-art scheme of retraining the parameters of an image segmentation model on the new-category sample images and their annotation information together with the base-category sample images and their annotation information, the number of second sample images can be reduced, which reduces the annotation cost and the data processing load.
Specifically, fig. 2 is a schematic structural diagram of a first segmentation module provided in the present application, and as shown in fig. 2, the first segmentation module includes:
the device comprises a feature extraction network, a first class branch network, a first detection frame branch network and a mask branch network.
The training process of the first segmentation module comprises the following steps:
determining a first depth feature map of the first sample image based on the feature extraction network, extracting a first detection frame in the first depth feature map, mapping the first detection frame to the first depth feature map, determining each first local feature map, and performing feature extraction on each first local feature map to obtain each first fused feature map;
inputting each first fusion feature map into a first class branch network and a first detection frame branch network respectively; determining first prediction category information corresponding to each first fusion feature map based on the first category branch network; determining a first detection frame area corresponding to each first fusion feature map based on the first detection frame branch network;
inputting the first depth feature map and each first detection frame area into the mask branch network, and determining first prediction mask information corresponding to each first detection frame area based on the mask branch network;
and training parameters of the feature extraction network, the first class branch network, the first detection frame branch network and the mask branch network according to the first prediction category information, the first prediction mask information and the first marking information.
The electronic device inputs each first sample image in the first training set and its corresponding first annotation information into the feature extraction network of the first segmentation module, and determines each first fusion feature map based on the feature extraction network. Optionally, as shown in the schematic structural diagram of the first segmentation module in fig. 3, the feature extraction network includes a full convolution network and a region-of-interest recommendation network. The full convolution network first performs full convolution processing on each first sample image to obtain each first depth feature map, and each first depth feature map is then input into the region-of-interest recommendation network. The region-of-interest recommendation network extracts the first detection frames in each first depth feature map, maps the first detection frames onto the corresponding first depth feature map to determine each first local feature map, and performs feature extraction on each first local feature map to obtain each first fusion feature map. For example, convolution processing may be performed on each first local feature map to obtain each first fusion feature map.
After each first fusion characteristic diagram is determined, each first fusion characteristic diagram is respectively input into a first class branch network and a first detection frame branch network; the first class branch network is used for determining first prediction class information corresponding to each first fusion characteristic graph, and the first detection frame branch network is used for determining a first detection frame area corresponding to each first fusion characteristic graph. After each first detection frame area is determined, the first depth feature map and each first detection frame area need to be input into a mask branch network, and the mask branch network is used for determining first prediction mask information corresponding to each first detection frame area.
The first prediction category information and the first prediction mask information are the prediction results of the first segmentation module. A loss function value is then calculated from the first prediction category information, the first prediction mask information, and the labeled category information and labeled mask information in the first annotation information, and the parameters of the feature extraction network, the first category branch network, the first detection frame branch network and the mask branch network are adjusted through multiple training iterations. When the loss function value meets the requirement, these parameters are fixed, at which point training of the first segmentation module is complete.
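A minimal sketch of this structure is given below, in the spirit of a Mask R-CNN-style head. The backbone and region-of-interest recommendation network are passed in as callables, and the layer sizes, pooled resolutions and head shapes are illustrative assumptions rather than values from the patent.

```python
import torch.nn as nn
from torchvision.ops import roi_align

class FirstSegmentationModule(nn.Module):
    def __init__(self, backbone, rpn, num_base_classes, channels=256):
        super().__init__()
        self.backbone = backbone                                   # full convolution network
        self.rpn = rpn                                             # region-of-interest recommendation network
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)    # produces first fusion feature maps
        self.cls_head = nn.Linear(channels * 7 * 7, num_base_classes)  # first class branch network
        self.box_head = nn.Linear(channels * 7 * 7, 4)                 # first detection frame branch network
        self.mask_head = nn.Sequential(                                # mask branch network
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, num_base_classes, 1))

    def forward(self, image):
        depth_map = self.backbone(image)                           # first depth feature map
        boxes = self.rpn(depth_map)                                # first detection frames, (K, 4)
        local_maps = roi_align(depth_map, [boxes], output_size=(7, 7))   # first local feature maps
        fused = self.fuse(local_maps)                              # first fusion feature maps
        cls_logits = self.cls_head(fused.flatten(1))               # first prediction category information
        box_regions = self.box_head(fused.flatten(1))              # first detection frame areas
        # Mask branch works on the depth feature map restricted to the detection frame areas.
        mask_feats = roi_align(depth_map, [boxes], output_size=(14, 14))
        masks = self.mask_head(mask_feats)                         # first prediction mask information
        return cls_logits, box_regions, masks
```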
Fig. 4 is a schematic structural diagram of a second segmentation module provided in the present application, and as shown in fig. 4, the second segmentation module includes:
a class weight regularization network, a second class branch network, an uncertainty prediction network, a coordinate offset prediction network, and a second detection box branch network.
The training process of the second segmentation module comprises the following steps:
determining a second depth feature map of the second sample image based on the feature extraction network, extracting a second detection frame in the second depth feature map, mapping the second detection frame to the second depth feature map, determining each second local feature map, and performing feature extraction on each second local feature map to obtain each second fusion feature map;
inputting each second fusion feature map into the category weight regularization network and the first detection frame branch network respectively; determining each category probability value corresponding to each second fusion feature map based on the category weight regularization network, inputting each category probability value into the second category branch network, and determining the second prediction category information corresponding to each second fusion feature map based on the second category branch network; determining the second detection frame area corresponding to each second fusion feature map based on the first detection frame branch network, determining the uncertainty value of each second detection frame area based on the uncertainty prediction network, and determining the coordinate offset value of each second detection frame area based on the coordinate offset prediction network; and, based on the second detection frame branch network, adjusting each second detection frame area whose uncertainty value is smaller than a set threshold value according to its coordinate offset value, to obtain each third detection frame area;
inputting the second depth feature map and each third detection box area into the mask branch network, and determining second prediction mask information corresponding to each third detection box area based on the mask branch network;
and training parameters of the class weight regularization network, the second class branch network, the uncertainty prediction network, the coordinate offset prediction network and the second detection frame branch network according to the second prediction class information, the second prediction mask information and the second marking information.
When the second segmentation module is trained, the second sample image is input into the feature extraction network of the first segmentation module. A second depth feature map of the second sample image is determined based on the full convolution network in the feature extraction network; based on the region-of-interest recommendation network in the feature extraction network, the second detection frames in the second depth feature map are extracted and mapped onto the second depth feature map, and each second local feature map is determined. Feature extraction is then performed by convolution processing on each second local feature map to obtain each second fusion feature map.
Each second fusion feature map is then input into the category weight regularization network and the first detection frame branch network respectively. The category weight regularization network is used to determine each category probability value corresponding to each second fusion feature map, and the second category branch network is used to determine the second prediction category information corresponding to each second fusion feature map from the determined category probability values; the category corresponding to the maximum probability value is determined as the second prediction category information.
The first detection frame branch network is used to determine the second detection frame area corresponding to each second fusion feature map, and the uncertainty prediction network determines the uncertainty value of each second detection frame area. Second detection frame areas whose uncertainty value is not less than the set threshold are deleted. The coordinate offset prediction network is used to determine the coordinate offset value of each second detection frame area; each retained second detection frame area (i.e. those whose uncertainty value is less than the set threshold) and its coordinate offset value are superimposed onto the second local feature map to obtain each adjusted detection frame area, i.e. each third detection frame area.
And respectively inputting the second depth feature map and each third detection frame area into a mask branch network, wherein the mask branch network is used for determining second prediction mask information corresponding to each third detection frame area.
And calculating a loss function value according to the second prediction category information, the second prediction mask information and the labeled category information and labeled mask information in the second labeled information, and adjusting parameters of a category weight regularization network, a second category branch network, an uncertainty prediction network, a coordinate offset prediction network and a second detection frame branch network through multiple iterative training. And when the loss function value meets the requirement, the second segmentation module completes training.
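The following sketch summarizes the forward pass just described: category probabilities from the class weight regularization network, uncertainty-based filtering of the detection frame areas, coordinate-offset refinement, and finally the mask branch. The callables and the threshold value are illustrative placeholders rather than the patent's exact components.

```python
import torch

def second_module_forward(fused_maps, box_areas, depth_map,
                          class_prob_net, uncertainty_net, offset_net, mask_branch,
                          threshold=0.5):
    # Category probability values from the class weight regularization / second class branch.
    class_probs = class_prob_net(fused_maps)              # (num_rois, num_novel_classes)
    pred_classes = class_probs.argmax(dim=1)               # second prediction category information

    # Per-region uncertainty and coordinate offsets from the corresponding prediction networks.
    uncertainty = uncertainty_net(fused_maps).squeeze(-1)  # (num_rois,)
    offsets = offset_net(fused_maps)                        # (num_rois, 4): dx, dy, dw, dh

    # Keep only second detection frame areas whose uncertainty is below the set threshold,
    # then superimpose the coordinate offsets to obtain the third detection frame areas.
    keep = uncertainty < threshold
    third_areas = box_areas[keep] + offsets[keep]

    # Mask branch runs on the depth feature map restricted to the refined areas.
    masks = mask_branch(depth_map, third_areas)             # second prediction mask information
    return pred_classes[keep], third_areas, masks
```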
In this application, in order to improve the segmentation effect of the second segmentation module during the training of the small sample, the determining, based on the class weight regularization network, each class probability value corresponding to each second fusion feature map includes:
determining the mean value of each category weight of the category weight regularization network and the diagonal covariance matrix of each category weight by adopting a Bayesian learning algorithm and a general variational framework;
and determining the probability value of each category corresponding to each second fusion characteristic diagram according to each second fusion characteristic diagram, the mean value of each category weight and the diagonal covariance matrix of each category weight.
A Bayesian method is adopted to learn the weight of each new category and provide a good prior estimate, so that the new-category weights on which the sigmoid activation function depends have good initial iteration values, greatly reducing the severe fluctuation of the optimization process and the instability of the model results (particularly new-category probability prediction) caused by having too few training samples.
In this application, in order to further improve the segmentation effect of the second segmentation module during the training of the small sample, determining, according to the second fusion feature maps, the mean value of the class weights, and the diagonal covariance matrix of the class weights, the class probability values corresponding to the second fusion feature maps includes:
and determining the probability value of each category corresponding to each second fusion feature map according to each second fusion feature map, the mean value of each category weight, the diagonal covariance matrix of each category weight and the sigmoid activation function.
In the present application, the first training set includes a large number of first sample images, while the second training set includes, for example, only one second sample image per category with its corresponding second annotation information. Because the samples are seriously unbalanced between the first training set and the second training set, a sigmoid activation function is adopted instead of a softmax activation function as the activation function for predicting the probability that an instance belongs to each class. The analysis is as follows:
Under the softmax activation function, the class probabilities must satisfy p1 + p2 + … + pn = 1, so an increase in the predicted probability of one class necessarily decreases the predicted probabilities of the other classes: one rises as the others fall. Moreover, this effect is influenced by the number of samples in different classes: the second training set has far less data than the first training set, so under the softmax activation function the predicted class results during training and testing are more likely to favor the data-rich classes of the first training set. Even with class-balancing measures, for example increasing the training weights and sampling proportion of the data-poor classes of the second training set, the problem cannot be effectively alleviated.
Such problems can be circumvented by using the sigmoid activation function, under which the sum of the class probabilities p1 + p2 + … + pn no longer needs to be a constant. At the same time, when the classes of the second training set are trained, the trained weights corresponding to the first training set do not need to be updated or adjusted at all; that is, the parameters of the first segmentation module and the second segmentation module do not interfere with each other.
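A small numerical illustration of this difference (the scores are arbitrary values chosen for illustration):

```python
import torch

logits = torch.tensor([2.0, 0.5, -1.0])   # per-class scores for one instance (arbitrary values)

softmax_p = torch.softmax(logits, dim=0)   # forced to sum to 1: classes compete with each other
sigmoid_p = torch.sigmoid(logits)          # each class scored independently; no fixed sum

print(softmax_p.tolist(), softmax_p.sum().item())   # ≈ [0.79, 0.18, 0.04], sum = 1.0
print(sigmoid_p.tolist(), sigmoid_p.sum().item())   # ≈ [0.88, 0.62, 0.27], sum ≈ 1.77
```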
To address the problems of severe fluctuation in the optimization process of the sigmoid activation function (the fewer the samples, the larger the fluctuation and the less stable the training; other activation functions are affected in the same way), poor regression precision and low accuracy of the predicted classification probabilities, a Bayesian method is adopted to learn the weight of each new class and provide a good prior estimate, so that the new-class weights on which the sigmoid activation function depends have good initial iteration values, greatly reducing the severe optimization fluctuation and unstable model results (particularly new-class probability prediction) caused by having too few training samples.
The following describes the image segmentation model training process in detail with reference to the algorithm core structure diagram.
Fig. 5 is a schematic structural diagram of an image segmentation model provided in the present application. When training the image segmentation model, a large amount of base-category sample data (the first training set) and new-category sample data (for example one image per category, forming the second training set), together with the corresponding annotation information, first need to be prepared and uploaded.
For the existing base categories, for which large numbers of samples and labels are easy to obtain, fully supervised training is carried out with massive data to obtain the first segmentation module; the training procedure is consistent with the Mask R-CNN algorithm. It should be noted that the base-category and new-category networks, i.e. the first segmentation module and the second segmentation module, share part of their structure and the corresponding parameters, embodied in the full convolution network, the feature kernel prediction module, the region-of-interest recommendation network, the first detection frame branch network and the mask branch network in fig. 5.
The process of training the second segmentation module is as follows:
step 1: the method comprises the steps of taking a full convolution network as a backbone network, extracting depth features of different scales of an image, unifying the scales of all the depth features, and then fusing the features, wherein the fusing process comprises but is not limited to the following steps: and obtaining a second depth feature map by means of feature splicing, feature addition/multiplication, feature fusion according to weight and the like.
Step 2: detection and mask branch training; the specific process is as follows:
2.1, the second depth feature map is passed through the region-of-interest recommendation network (RPN), which extracts a number of regions with the highest probability of containing a target in the form of rectangular detection frames; the second local feature map of each region of interest (ROI) is then obtained through ROIAlign (alignment of region-of-interest features, which is essentially an interpolation operation).
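For reference, torchvision's roi_align implements exactly this interpolation of region-of-interest features onto a fixed grid; the tensor sizes below are illustrative.

```python
import torch
from torchvision.ops import roi_align

depth_map = torch.randn(1, 256, 50, 50)               # second depth feature map (illustrative shape)
rois = torch.tensor([[0.0, 4.0, 4.0, 20.0, 28.0],      # (batch_index, x1, y1, x2, y2)
                     [0.0, 10.0, 8.0, 30.0, 40.0]])

# Each rectangular detection frame is interpolated onto a fixed 7x7 grid,
# giving one second local feature map per region of interest.
local_maps = roi_align(depth_map, rois, output_size=(7, 7),
                       spatial_scale=1.0, sampling_ratio=2, aligned=True)
print(local_maps.shape)                                # torch.Size([2, 256, 7, 7])
```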
2.2, the feature kernel prediction branch generates another set of depth features, which are used as convolution kernel weights in the next step.
2.3, taking the second local feature map as input and the output of the feature kernel prediction branch as the convolution kernel weight parameters, a convolution operation is performed to obtain the fused second fusion feature map.
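A sketch of step 2.3, where the output of the feature kernel prediction branch is used as a per-region convolution kernel (a 1×1 kernel shape is assumed here purely for illustration):

```python
import torch
import torch.nn.functional as F

def fuse_with_predicted_kernels(local_maps, predicted_kernels):
    # local_maps:        (num_rois, C, 7, 7)   second local feature maps
    # predicted_kernels: (num_rois, C, C, 1, 1) one 1x1 kernel per region, produced by the
    #                    feature kernel prediction branch
    fused = [F.conv2d(feat.unsqueeze(0), kernel)       # dynamic convolution per region
             for feat, kernel in zip(local_maps, predicted_kernels)]
    return torch.cat(fused, dim=0)                      # second fusion feature maps
```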
2.4, the second fusion feature map is taken as input to the class weight regularization network for the new-category features (in the training stage, a small number of samples are used for supervised training, for example only 1 image and its label per new category; in the application stage, the model obtained after training is used for prediction).
The weight regularization method is specifically as follows:
Bayesian learning of the class weight distribution is adopted, together with a general variational framework; that is, the mean value μ of each new-category weight and the diagonal covariance matrix Σ of each new-category weight are learned by minimizing the following variational objective. In the training phase, the loss function is formulated as follows:
[Equation: the variational objective, combining the detection loss L_d (evaluated via the posterior predictive distribution p(c | f, μ, Σ)) with the KL divergence between N(μ, Σ) and a standard normal prior; the individual terms are explained in (1)–(4) below.]
In the application stage, the probability that a region of interest belongs to the new categories is derived jointly from the depth feature f of the region of interest (its corresponding fusion feature map), the mean μ of the new-category weights and the diagonal covariance matrix Σ of the new-category weights. The new-category weights are used by performing a convolution operation with the depth feature f to obtain the probability that the target belongs to each category (the mean μ and diagonal covariance matrix Σ of the new-category weights being part of the overall mean / diagonal covariance matrix of all category weights).
Wherein:
(1) f is the depth feature obtained under the constraints of the detection box.
(2) c and c* refer to the predicted class and the true class (both are new-class labels of the small-sample set); L_d is the sigmoid focal loss corresponding to the predicted class c.
(3) p(c | f, μ, Σ) is the posterior predictive distribution of c given f, μ and Σ. To make this posterior predictive distribution easier to compute, the application seeks a good approximate expression for p(c | f, μ, Σ), as follows: let σ(·) denote the sigmoid activation function and Φ(·) the cumulative distribution function. Because the essence of the Bayesian learning here is to learn μ and Σ, class prediction is equivalent to a mapping problem, and fine-tuning the last layer of the classification head with Bayesian learning is equivalent to learning a Bayesian logistic regression, so that:
p(c | f, μ, Σ) = ∫ σ(fᵀw) · N(w | μ, Σ) dw
Let a = fᵀw; the cumulative distribution function Φ(λa) is used to approximate σ(a), giving a better approximate expression for p(c | f, μ, Σ):
p(c | f, μ, Σ) ≈ σ( fᵀμ / √(1 + λ² · fᵀΣf) )
In the application stage, this expression gives the probability distribution of the instance over the new categories; the category with the maximum probability value is selected to obtain the final new-category prediction result. This result is independent of the base-category prediction result, because the two are given by two non-interfering branches; the new-category prediction result and the base-category prediction result are finally merged, and the category holding the maximum probability value is selected as the final category prediction result. A code sketch of this approximation is given after item (4) below.
(4) KL is the Kullback-Leibler divergence (i.e. relative entropy, a measure of the difference between probability distributions): let P(X) and Q(X) be two probability distributions over a random variable X; the relative entropy is then defined as:
KL(P ‖ Q) = Σₓ P(x) · log( P(x) / Q(x) )
the value of KL divergence is always greater than 0, and KL divergence is equal to 0 if and only if the two distributions are the same; the higher the similarity between P (X) and Q (X), the smaller the KL distance.
In actual training, P(X) corresponds to N(μ, Σ), i.e. the weight parameter distribution of the new category, and Q(X) corresponds to a normal distribution with mean 0 and standard deviation 1.
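The two quantities above can be sketched as follows. The first function applies the standard probit-style approximation to the Bayesian logistic-regression predictive probability described in item (3) (a diagonal covariance is assumed, and λ is the scaling constant from Φ(λa) ≈ σ(a), commonly taken as λ² = π/8); the second computes the closed-form KL divergence of item (4) between N(μ, diag(Σ)) and the standard normal prior. Both are illustrative sketches, not the patent's exact formulation.

```python
import math
import torch

def novel_class_probability(f, mu, sigma_diag, lam_sq=math.pi / 8):
    # f:          (num_rois, D)     depth features under the detection-frame constraint
    # mu:         (num_classes, D)  mean of each new-category weight
    # sigma_diag: (num_classes, D)  diagonal of each category's covariance matrix
    mean_act = f @ mu.t()                       # f^T mu
    var_act = (f ** 2) @ sigma_diag.t()         # f^T Sigma f for a diagonal Sigma
    return torch.sigmoid(mean_act / torch.sqrt(1.0 + lam_sq * var_act))

def kl_to_standard_normal(mu, sigma_diag):
    # KL( N(mu, diag(sigma_diag)) || N(0, I) ), summed over the weight dimensions.
    return 0.5 * torch.sum(sigma_diag + mu ** 2 - 1.0 - torch.log(sigma_diag), dim=1)
```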
2.5, taking the second fusion feature map as input, regression prediction based on convolution operations yields the coordinate offset value and the uncertainty value of each second detection frame area. Second detection frame areas whose uncertainty value is not less than the set threshold are first deleted; the coordinate offset values of the remaining second detection frame areas are then superimposed onto the corresponding region-of-interest coordinate frames in the second local feature map to obtain the fine-tuned third detection frame areas.
2.6, the second depth feature map and each third detection frame area are taken as input to the mask branch network, where ROIAlign (alignment of region-of-interest features) and convolution operations fine-tune the shape of the mask, finally generating the second prediction mask information.
2.7, the final class prediction result is determined from the second prediction class information and the first prediction class information output by the first segmentation module; the second prediction mask information provides the final mask prediction result, giving the region (contour) within the target detection frame that belongs to the target (foreground) pixels, which yields the final instance segmentation result.
3. Deployment of the model: the NNX model is a generic chip-deployment model. It aims at unifying protocols and is an open file format designed for machine learning and used for storing a trained model. It enables different artificial-intelligence frameworks (such as a Caffe model, a PyTorch .pth model, a Microsoft ONNX model and the like) to store model data in the same format and interoperate, so that they can be uniformly converted into an NNX model, finally achieving an efficient solution of "store the model once, deploy on multiple platforms". After the image segmentation model is trained, it is uniformly converted into an NNX model and deployed with the NNX inference engine, where the NNX model is the unified forward-inference engine of the developed chip hardware; NNX is a unified inference deployment engine.
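As an illustration of the "store once, deploy on multiple platforms" idea: the snippet below exports a stand-in PyTorch model to the open ONNX format. The conversion to the vendor's NNX engine itself is proprietary and is not shown; the model, file name and opset here are illustrative assumptions.

```python
import torch

model = torch.nn.Conv2d(3, 8, kernel_size=3)       # stand-in for the trained segmentation model
model.eval()
dummy_input = torch.randn(1, 3, 512, 512)

torch.onnx.export(model, dummy_input, "segmentation_model.onnx",
                  input_names=["image"], output_names=["output"], opset_version=11)
```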
Fig. 6 is a schematic diagram of an image segmentation process provided in the present application, where the process includes the following steps:
S201: acquiring an image to be processed, and inputting the image into the trained image segmentation model.
S202: determining, based on the first segmentation module in the image segmentation model, third prediction category information, third prediction mask information, and a first probability value corresponding to the third prediction category information of the image.
S203: determining, based on the second segmentation module in the image segmentation model, fourth prediction category information, fourth prediction mask information, and a second probability value corresponding to the fourth prediction category information of the image.
S204: if the first probability value is greater than the second probability value, taking the third prediction category information and the third prediction mask information as the segmentation result; if the first probability value is not greater than the second probability value, taking the fourth prediction category information and the fourth prediction mask information as the segmentation result.
The image segmentation method provided by the application is applied to electronic equipment, and the electronic equipment can be equipment such as a PC (personal computer), a tablet personal computer and the like, and can also be a server. The electronic equipment is provided with a trained image segmentation model, and instance segmentation of an image to be processed is realized based on the image segmentation model. Wherein the image segmentation model is determined by: inputting a first sample image in a first training set and first marking information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module; inputting second sample images in a second training set and second labeling information corresponding to the second sample images into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample images based on the first segmentation module; inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
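To make the two-stage procedure above concrete, the following toy PyTorch sketch trains a stand-in first module on dummy base-category data, freezes it, and then trains only a second module on the fused features it produces; every layer size, category count and tensor shape here is an invented placeholder rather than the patent's actual network:

    import torch
    from torch import nn

    # Stand-ins: "first_module" maps an image to a fused feature vector,
    # "base_head" predicts base categories, "second_module" predicts new categories.
    first_module = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 16))
    base_head = nn.Linear(16, 10)      # 10 hypothetical base categories
    second_module = nn.Linear(16, 5)   # 5 hypothetical new categories
    loss_fn = nn.CrossEntropyLoss()

    # Stage 1: train the first segmentation module on (dummy) base-category data.
    opt1 = torch.optim.SGD(list(first_module.parameters()) + list(base_head.parameters()), lr=0.01)
    for _ in range(3):
        x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
        loss = loss_fn(base_head(first_module(x)), y)
        opt1.zero_grad(); loss.backward(); opt1.step()

    # Stage 2: freeze the first module and train only the second module on the
    # fused features it produces for the (small) new-category set.
    for p in first_module.parameters():
        p.requires_grad_(False)
    opt2 = torch.optim.SGD(second_module.parameters(), lr=0.01)
    for _ in range(3):
        x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 5, (8,))
        with torch.no_grad():
            fused = first_module(x)            # fusion feature vector from the frozen module
        loss = loss_fn(second_module(fused), y)
        opt2.zero_grad(); loss.backward(); opt2.step()

Freezing the first module in stage 2 mirrors the description above: only the second segmentation module is updated when the small new-category set is used.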
With the image segmentation model obtained by the scheme provided in this application, the training effect on small samples is greatly improved and the generalization ability is enhanced; compared with segmentation algorithms such as Mask R-CNN, it is less time-consuming and more effective, and is therefore better suited to online training. With only 300 samples, the generalization performance far exceeds the effect originally obtained with 3000 samples, and the method can be extended to various tasks and has passed verification in multiple projects. The instance segmentation algorithm uses a lightweight, high-performance network structure in which the learning rate and parameters are adjusted automatically, which greatly assists online training. The system covers the whole process from small-sample training to deployment, so that online training on small samples can be put into practical use.
Fig. 7 is a schematic structural diagram of an image segmentation model training apparatus provided in the present application, where the apparatus includes:
a first training unit 71, configured to input a first sample image in a first training set and first annotation information corresponding to the first sample image into a first segmentation module in an image segmentation model, and train the first segmentation module;
a second training unit 72, configured to input a second sample image in a second training set and second label information corresponding to the second sample image into a trained first segmentation module, and determine a fusion feature map corresponding to the second sample image based on the first segmentation module;
a third training unit 73, configured to input the fused feature map into a second segmentation module in the image segmentation model, determine a segmentation result of the fused feature map based on the second segmentation module, and train the second segmentation module according to the segmentation result and the second label information.
A first training unit 71, configured to determine a first depth feature map of the first sample image based on the feature extraction network, extract a first detection frame in the first depth feature map, map the first detection frame to the first depth feature map, determine each first local feature map, and perform feature extraction on each first local feature map to obtain each first fused feature map;
inputting each first fusion feature map into a first class branch network and a first detection frame branch network respectively; determining first prediction category information corresponding to each first fusion feature map based on the first category branch network; determining a first detection frame area corresponding to each first fusion feature map based on the first detection frame branch network;
inputting the first depth feature map and each first detection frame area into the mask branch network, and determining first prediction mask information corresponding to each first detection frame area based on the mask branch network;
and training parameters of the feature extraction network, the first class branch network, the first detection frame branch network and the mask branch network according to the first prediction category information, the first prediction mask information and the first marking information.
A third training unit 73, specifically configured to determine a second depth feature map of the second sample image based on the feature extraction network, extract a second detection frame in the second depth feature map, map the second detection frame to the second depth feature map, determine each second local feature map, and perform feature extraction on each second local feature map to obtain each second fusion feature map;
inputting each second fusion feature map into the category weight regularization network and the first detection frame branch network respectively; determining each category probability value corresponding to each second fusion feature map based on the category weight regularization network, inputting each category probability value into the second category branch network, and determining the second prediction category information corresponding to each second fusion feature map based on the second category branch network; determining second detection frame areas corresponding to the second fusion feature maps respectively based on the first detection frame branch network, determining uncertainty values of the second detection frame areas based on the uncertainty prediction network, and determining coordinate offset values of the second detection frame areas based on the coordinate offset prediction network; based on the second detection frame branch network, carrying out detection frame adjustment on each second detection frame area with an uncertainty value smaller than the set threshold value according to the coordinate offset value of each second detection frame area to obtain each third detection frame area;
inputting the second depth feature map and each third detection box area into the mask branch network, and determining second prediction mask information corresponding to each third detection box area based on the mask branch network;
and training parameters of the class weight regularization network, the second class branch network, the uncertainty prediction network, the coordinate offset prediction network and the second detection frame branch network according to the second prediction class information, the second prediction mask information and the second marking information.
The third training unit 73 is specifically configured to determine a mean value of each class weight of the class weight regularization network and a diagonal covariance matrix of each class weight by using a Bayesian learning algorithm and a generic variational framework;
and determine the probability value of each category corresponding to each second fusion feature map according to each second fusion feature map, the mean value of each category weight and the diagonal covariance matrix of each category weight.
The third training unit 73 is specifically configured to determine, according to the second fusion feature maps, the mean value of the class weights, the diagonal covariance matrix of the class weights, and the sigmoid activation function, the class probability values corresponding to the second fusion feature maps.
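One plausible realization of this computation is a Monte-Carlo estimate: sample class weight vectors from a Gaussian with the learned mean and diagonal covariance, score the fused feature with each sample, and average the sigmoid outputs. The patent only states that the mean, the diagonal covariance and a sigmoid activation are used, so the sampling-and-averaging scheme and every name below are assumptions:

    import numpy as np

    def class_probabilities(features, weight_mean, weight_var, n_samples=32, seed=0):
        # Monte-Carlo estimate of per-class probabilities when each class weight
        # vector follows N(mean, diag(var)).
        #   features:    (D,)   fused feature vector for one region of interest
        #   weight_mean: (C, D) mean of each class weight vector
        #   weight_var:  (C, D) per-dimension variance (diagonal covariance)
        rng = np.random.default_rng(seed)
        probs = np.zeros(weight_mean.shape[0])
        for _ in range(n_samples):
            # Draw one weight sample per class from its diagonal Gaussian.
            w = weight_mean + rng.standard_normal(weight_mean.shape) * np.sqrt(weight_var)
            logits = w @ features
            probs += 1.0 / (1.0 + np.exp(-logits))   # sigmoid activation
        return probs / n_samples

    features = np.random.default_rng(1).standard_normal(16)
    mean = np.zeros((5, 16))
    var = np.full((5, 16), 0.1)
    print(class_probabilities(features, mean, var))  # five values near 0.5 for a zero-mean prior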
Fig. 8 is a schematic structural diagram of an image segmentation apparatus provided in the present application, where the apparatus includes:
an obtaining unit 81, configured to acquire an image to be processed and input the image into a trained image segmentation model;
a first determining unit 82, configured to determine, based on a first segmentation module in the image segmentation model, third prediction category information, third prediction mask information, and a first probability value corresponding to the third prediction category information of the image;
a second determining unit 83, configured to determine, based on a second segmentation module in the image segmentation model, fourth prediction category information, fourth prediction mask information, and a second probability value corresponding to the fourth prediction category information of the image;
an image segmentation unit 84, configured to take the third prediction category information and the third prediction mask information as the segmentation result if the first probability value is greater than the second probability value; and if the first probability value is not greater than the second probability value, take the fourth prediction category information and the fourth prediction mask information as the segmentation result.
The present application also provides an electronic device, as shown in fig. 9, including: the system comprises a processor 401, a communication interface 402, a memory 403 and a communication bus 404, wherein the processor 401, the communication interface 402 and the memory 403 complete mutual communication through the communication bus 404;
the memory 403 has stored therein a computer program which, when executed by the processor 401, causes the processor 401 to perform any of the above method steps.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 402 is used for communication between the above-described electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a central processing unit, a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc.
A computer-readable storage medium is also provided, in which a computer program executable by an electronic device is stored; when the program is run on the electronic device, the electronic device is caused to perform any of the above method steps.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the present application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. An image segmentation model training method, characterized in that the method comprises:
inputting a first sample image in a first training set and first annotation information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module;
inputting a second sample image in a second training set and second marking information corresponding to the second sample image into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample image based on the first segmentation module;
inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
2. The method of claim 1, wherein the first segmentation module comprises:
the system comprises a feature extraction network, a first class branch network, a first detection frame branch network and a mask branch network.
3. The method of claim 2, wherein the training process of the first segmentation module comprises:
determining a first depth feature map of the first sample image based on the feature extraction network, extracting a first detection frame in the first depth feature map, mapping the first detection frame to the first depth feature map, determining each first local feature map, and performing feature extraction on each first local feature map to obtain each first fused feature map;
inputting each first fusion feature map into a first class branch network and a first detection frame branch network respectively; determining first prediction category information corresponding to each first fusion feature map based on the first category branch network; determining a first detection frame area corresponding to each first fusion feature map based on the first detection frame branch network;
inputting the first depth feature map and each first detection frame area into the mask branch network, and determining first prediction mask information corresponding to each first detection frame area based on the mask branch network;
and training parameters of the feature extraction network, the first class branch network, the first detection frame branch network and the mask branch network according to the first prediction category information, the first prediction mask information and the first marking information.
4. The method of claim 2, wherein the second segmentation module comprises:
a class weight regularization network, a second class branch network, an uncertainty prediction network, a coordinate offset prediction network, and a second detection box branch network.
5. The method of claim 4, wherein the training process of the second segmentation module comprises:
determining a second depth feature map of the second sample image based on the feature extraction network, extracting a second detection frame in the second depth feature map, mapping the second detection frame to the second depth feature map, determining each second local feature map, and performing feature extraction on each second local feature map to obtain each second fusion feature map;
inputting each second fusion feature map into a category weight regularization network and a first detection frame branch network respectively; determining each category probability value corresponding to each second fusion feature map based on the category weight regularization network, inputting each category probability value into the second category branch network, and determining the second prediction category information corresponding to each second fusion feature map based on the second category branch network; determining second detection frame areas corresponding to the second fusion feature maps respectively based on the first detection frame branch network, determining uncertainty values of the second detection frame areas based on the uncertainty prediction network, and determining coordinate offset values of the second detection frame areas based on the coordinate offset prediction network; based on the second detection frame branch network, carrying out detection frame adjustment on each second detection frame area with an uncertainty value smaller than a set threshold value according to the coordinate offset value of each second detection frame area to obtain each third detection frame area;
inputting the second depth feature map and each third detection box area into the mask branch network, and determining second prediction mask information corresponding to each third detection box area based on the mask branch network;
and training parameters of the class weight regularization network, the second class branch network, the uncertainty prediction network, the coordinate offset prediction network and the second detection frame branch network according to each piece of second prediction class information, each piece of second prediction mask information and the second marking information.
6. The method of claim 5, wherein the determining, based on the class weight regularization network, respective class probability values for each of the second fused feature maps comprises:
determining the mean value of each category weight of the category weight regularization network and the diagonal covariance matrix of each category weight by adopting a Bayesian learning algorithm and a generic variational framework;
and determining the probability value of each category corresponding to each second fusion feature map according to each second fusion feature map, the mean value of each category weight and the diagonal covariance matrix of each category weight.
7. The method of claim 6, wherein determining the respective class probability values corresponding to the respective second fused feature maps according to the respective second fused feature maps, the mean of the respective class weights, and the diagonal covariance matrix of the respective class weights comprises:
and determining the probability value of each category corresponding to each second fusion feature map according to each second fusion feature map, the mean value of each category weight, the diagonal covariance matrix of each category weight and the sigmoid activation function.
8. A method of image segmentation, the method comprising:
acquiring an image to be processed, and inputting the image into a trained image segmentation model;
determining third prediction category information, third prediction mask information and a first probability value corresponding to the third prediction category information of the image based on a first segmentation module in the image segmentation model;
determining fourth prediction category information, fourth prediction mask information and a second probability value corresponding to the fourth prediction category information of the image based on a second segmentation module in the image segmentation model;
if the first probability value is greater than the second probability value, using the third prediction category information and the third prediction mask information as a segmentation result;
if the first probability value is not greater than the second probability value, using the fourth prediction category information and the fourth prediction mask information as a segmentation result;
wherein the image segmentation model is determined by: inputting a first sample image in a first training set and first marking information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module; inputting a second sample image in a second training set and second marking information corresponding to the second sample image into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample image based on the first segmentation module; inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
9. An apparatus for training an image segmentation model, the apparatus comprising:
the first training unit is used for inputting a first sample image in a first training set and first marking information corresponding to the first sample image into a first segmentation module in an image segmentation model and training the first segmentation module;
the second training unit is used for inputting a second sample image in a second training set and second marking information corresponding to the second sample image into the trained first segmentation module, and determining a fusion feature map corresponding to the second sample image based on the first segmentation module;
and the third training unit is used for inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
10. An image segmentation apparatus, characterized in that the apparatus comprises:
the acquisition unit is used for acquiring an image to be processed and inputting the image into a trained image segmentation model;
a first determining unit, configured to determine, based on a first segmentation module in the image segmentation model, third prediction category information, third prediction mask information, and a first probability value corresponding to the third prediction category information of the image;
a second determining unit, configured to determine, based on a second segmentation module in the image segmentation model, fourth prediction category information, fourth prediction mask information, and a second probability value corresponding to the fourth prediction category information of the image;
an image segmentation unit configured to take the third prediction category information and the third prediction mask information as a segmentation result if the first probability value is greater than the second probability value; if the first probability value is not greater than the second probability value, using the fourth prediction category information and the fourth prediction mask information as a segmentation result;
wherein the image segmentation model is determined by: inputting a first sample image in a first training set and first marking information corresponding to the first sample image into a first segmentation module in an image segmentation model, and training the first segmentation module; inputting second sample images in a second training set and second labeling information corresponding to the second sample images into a trained first segmentation module, and determining a fusion feature map corresponding to the second sample images based on the first segmentation module; inputting the fusion feature map into a second segmentation module in the image segmentation model, determining a segmentation result of the fusion feature map based on the second segmentation module, and training the second segmentation module according to the segmentation result and the second labeling information.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1 to 7 or the method steps of claim 8 when executing a program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-7 or carries out the method steps of claim 8.
CN202211085858.3A 2022-09-06 2022-09-06 Image segmentation model training method, image segmentation device, image segmentation equipment and medium Active CN115170809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211085858.3A CN115170809B (en) 2022-09-06 2022-09-06 Image segmentation model training method, image segmentation device, image segmentation equipment and medium

Publications (2)

Publication Number Publication Date
CN115170809A true CN115170809A (en) 2022-10-11
CN115170809B CN115170809B (en) 2023-01-03

Family

ID=83480677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211085858.3A Active CN115170809B (en) 2022-09-06 2022-09-06 Image segmentation model training method, image segmentation device, image segmentation equipment and medium

Country Status (1)

Country Link
CN (1) CN115170809B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784424A (en) * 2019-03-26 2019-05-21 腾讯科技(深圳)有限公司 A kind of method of image classification model training, the method and device of image procossing
CN109800631A (en) * 2018-12-07 2019-05-24 天津大学 Fluorescence-encoded micro-beads image detecting method based on masked areas convolutional neural networks
CN110399816A (en) * 2019-07-15 2019-11-01 广西大学 A kind of high-speed train bottom foreign matter detecting method based on Faster R-CNN
CN112163634A (en) * 2020-10-14 2021-01-01 平安科技(深圳)有限公司 Example segmentation model sample screening method and device, computer equipment and medium
WO2021023202A1 (en) * 2019-08-07 2021-02-11 交叉信息核心技术研究院(西安)有限公司 Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method
CN113869371A (en) * 2021-09-03 2021-12-31 深延科技(北京)有限公司 Model training method, clothing fine-grained segmentation method and related device
CN113920370A (en) * 2021-10-25 2022-01-11 上海商汤智能科技有限公司 Model training method, target detection method, device, equipment and storage medium
CN114743045A (en) * 2022-03-31 2022-07-12 电子科技大学 Small sample target detection method based on double-branch area suggestion network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROLAND S: "Faster Training of Mask R-CNN by Focusing on Instance Boundaries", 《ARXIV》 *
彭帅等: "基于卷积神经网络的人体姿态估计算法综述", 《北京信息科技大学学报(自然科学版)》 *

Also Published As

Publication number Publication date
CN115170809B (en) 2023-01-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant