CN115170807A - Image segmentation and model training method, device, equipment and medium - Google Patents


Info

Publication number
CN115170807A
Authority
CN
China
Prior art keywords
image
feature map
classifier
depth
decoder
Prior art date
Legal status
Granted
Application number
CN202211076329.7A
Other languages
Chinese (zh)
Other versions
CN115170807B (en)
Inventor
付建海
俞元杰
吴立
颜成钢
李亮
殷海兵
熊剑平
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211076329.7A
Publication of CN115170807A
Application granted
Publication of CN115170807B
Active legal status
Anticipated expiration

Classifications

    • G06V10/00 Arrangements for image or video recognition or understanding (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING)
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/764 Arrangements using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image segmentation and model training method, device, equipment and medium. In the method and device, the initial parameters of the image segmentation model are first trained on sample images and then adjusted according to the image to be processed, so that an image segmentation model trained on only a small number of sample images can still segment the image to be processed accurately and effectively, reducing the cost of training the image segmentation model.

Description

Image segmentation and model training method, device, equipment and medium
Technical Field
The application relates to the technical field of machine vision, in particular to an image segmentation and model training method, device, equipment and medium.
Background
Deep convolutional neural networks have made major breakthroughs in image segmentation and visual understanding tasks. These methods rely on the availability of large-scale collections of sample images, such as the image recognition database ImageNet, whose sample images are used to train image segmentation models.
In the related art, training an image segmentation model requires classifying and labeling large-scale sample images; for dense prediction tasks such as semantic segmentation in particular, an even larger set of sample images must each be labeled before a model capable of semantically segmenting images can be obtained. The related art therefore suffers from the high cost of labeling large-scale sample images individually and from the large amount of data processed during training of the image segmentation model.
Disclosure of Invention
The embodiment of the application provides an image segmentation and model training method, device, equipment and medium, providing a scheme for training an image segmentation model from a small number of sample images and using it for image segmentation.
The application provides an image segmentation method, which comprises the following steps:
acquiring an image to be processed, and inputting the image into a trained image segmentation model, wherein the image segmentation model is obtained by training based on a sample image in a training set and a corresponding annotation image;
respectively determining an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, determining a probability distribution value according to the initial classification result and the thermodynamic diagram, adjusting parameters of the image segmentation model according to the probability distribution value, and segmenting the image according to the image segmentation model after the parameters are adjusted to obtain a segmented image.
Further, the image segmentation model comprises an encoder, a decoder, a first classifier, a depth self-attention transformer and a second classifier;
the encoder is connected with the decoder, the decoder is further connected with the first classifier and the depth self-attention transformer respectively, and the first classifier and the depth self-attention transformer are further connected with the second classifier respectively.
Further, the determining an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, respectively, determining a probability distribution value according to the initial classification result and the thermodynamic diagram, and adjusting parameters of the image segmentation model according to the probability distribution value includes:
inputting the image into an encoder in the image segmentation model, and determining a depth feature map corresponding to the image based on the encoder and a decoder;
determining an initial classification result of the depth feature map based on the first classifier, determining a thermodynamic diagram of the depth feature map based on the depth self-attention transformer;
and determining, by the second classifier, a probability distribution value based on the initial classification result and the thermodynamic diagram, and adjusting the parameters of the second classifier according to the probability distribution value.
Further, the adjusting the parameter of the second classifier according to the probability distribution value includes:
and judging whether the probability distribution value is larger than a set allowable threshold value; if so, adjusting the parameters of the second classifier according to a stochastic gradient descent method, and if not, stopping adjusting the parameters of the second classifier.
Further, the segmenting the image according to the image segmentation model after the parameter adjustment to obtain a segmented image includes:
performing first fusion processing on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, and performing second fusion processing on the depth feature map and the fusion feature map to obtain a thermodynamic map;
and performing segmentation processing on the thermodynamic diagram based on the parameters adjusted by the second classifier to obtain the segmentation image.
Further, the determining, based on the encoder and the decoder, a depth feature map corresponding to the image comprises:
obtaining a coding feature map corresponding to the image based on the encoder;
and performing multi-stage down-sampling, up-sampling, and channel expansion or compression operations on the coding feature map based on the decoder, and performing feature extraction on the coding feature map to obtain a depth feature map corresponding to the coding feature map.
Further, downsampling the encoded feature map based on the decoder comprises:
stacking pixels on the width dimension and the height dimension of the coding feature map onto the depth dimension based on a down-sampling layer of the decoder to obtain a down-sampling operation result;
the upsampling of the encoded feature map based on the decoder comprises:
and tiling pixels on the depth dimension of the coding feature map onto a width-height channel based on an upsampling layer of the decoder to obtain an upsampling operation result.
In another aspect, the present application provides an image segmentation model training method, including:
inputting a sample image in a training set and a corresponding labeled image into an encoder in an image segmentation model, obtaining a coding feature map corresponding to the sample image based on the encoder, inputting the coding feature map into a decoder, and obtaining a depth feature map based on the decoder;
respectively inputting the depth feature map into a first classifier and a depth self-attention transformer, obtaining an initial classification result based on the first classifier, and inputting the initial classification result into the depth self-attention transformer;
performing fusion processing on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, performing fusion processing on the fusion feature map and the depth feature map to obtain a thermodynamic diagram, and inputting the thermodynamic diagram into a second classifier;
obtaining a prediction segmentation image according to the thermodynamic diagram based on the second classifier;
and determining a loss function value according to the prediction segmentation image and the annotation image, and training parameters of the encoder, the decoder, the first classifier, the depth self-attention transformer and the second classifier according to the loss function value.
In another aspect, the present application provides an image segmentation apparatus, comprising:
the acquisition module is used for acquiring an image to be processed and inputting the image into a trained image segmentation model, wherein the image segmentation model is obtained by training based on a sample image in a training set and a corresponding labeled image;
and the segmentation module is used for respectively determining an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, determining a probability distribution value according to the initial classification result and the thermodynamic diagram, adjusting parameters of the image segmentation model according to the probability distribution value, and segmenting the image according to the image segmentation model after the parameters are adjusted to obtain a segmented image.
Further, the image segmentation model comprises an encoder, a decoder, a first classifier, a depth self-attention transformer and a second classifier; the encoder is connected with the decoder, the decoder is further connected with the first classifier and the depth self-attention transformer respectively, and the first classifier and the depth self-attention transformer are further connected with the second classifier respectively.
Further, the obtaining module is further configured to train the image segmentation model based on the sample images in the training set and the corresponding labeled images thereof, and obtain parameters of an encoder, a decoder, a first classifier, a depth self-attention transformer, and a second classifier in the image segmentation model.
Further, the segmentation module is specifically configured to input the image into an encoder in the image segmentation model, and determine, based on the encoder and a decoder, a depth feature map corresponding to the image; determine an initial classification result of the depth feature map based on the first classifier, and determine a thermodynamic diagram of the depth feature map based on the depth self-attention transformer; and determine, by the second classifier, a probability distribution value based on the initial classification result and the thermodynamic diagram, and adjust the parameters of the second classifier according to the probability distribution value.
Further, the segmentation module is specifically configured to determine whether the probability distribution value is greater than a set allowable threshold, adjust the parameter of the second classifier according to a stochastic gradient descent method if it is, and stop adjusting the parameter of the second classifier if it is not.
Further, the segmentation module is specifically configured to perform a first fusion process on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, and perform a second fusion process on the depth feature map and the fusion feature map to obtain a thermodynamic map; and performing segmentation processing on the thermodynamic diagram based on the parameters adjusted by the second classifier to obtain the segmentation image.
Further, the segmentation module is specifically configured to obtain, based on the encoder, an encoding feature map corresponding to the image; and perform multi-stage down-sampling, up-sampling, and channel expansion or compression operations on the coding feature map based on the decoder, and perform feature extraction on the coding feature map to obtain a depth feature map corresponding to the coding feature map.
Further, the segmentation module is specifically configured to stack pixels in a width-height dimension of the encoded feature map onto a depth dimension based on a downsampling layer of the decoder, so as to obtain a downsampling operation result; and tiling pixels on the depth dimension of the coding feature map onto a width-height channel based on an upsampling layer of the decoder to obtain an upsampling operation result.
In another aspect, the present application provides an image segmentation model training apparatus, including:
the first training module is used for inputting the sample images in the training set and the corresponding labeled images into an encoder in the image segmentation model, obtaining coding feature maps corresponding to the sample images based on the encoder, inputting the coding feature maps into a decoder, and obtaining depth feature maps based on the decoder;
the second training module is used for respectively inputting the depth feature map into the first classifier and the depth self-attention transformer, obtaining an initial classification result based on the first classifier and inputting the initial classification result into the depth self-attention transformer;
the third training module is used for carrying out fusion processing on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, carrying out fusion processing on the fusion feature map and the depth feature map to obtain a thermodynamic diagram, and inputting the thermodynamic diagram into the second classifier;
the fourth training module is used for obtaining a prediction segmentation image according to the thermodynamic diagram based on the second classifier;
and the fifth training module is used for determining a loss function value according to the prediction segmentation image and the annotation image and training parameters of the encoder, the decoder, the first classifier, the depth self-attention transformer and the second classifier according to the loss function value.
In another aspect, the present application provides an electronic device, which includes a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing any of the above method steps when executing a program stored in the memory.
In another aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method steps of any of the above.
The application provides an image segmentation and model training method, device, equipment and medium, wherein the method comprises the following steps: acquiring an image to be processed, and inputting the image into a trained image segmentation model, wherein the image segmentation model is obtained by training based on a sample image in a training set and a corresponding annotation image; respectively determining an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, determining a probability distribution value according to the initial classification result and the thermodynamic diagram, adjusting parameters of the image segmentation model according to the probability distribution value, and segmenting the image according to the image segmentation model after the parameters are adjusted to obtain a segmented image.
The technical scheme has the following advantages or beneficial effects:
according to the method, the image segmentation model is trained on the basis of the sample images in the training set and the corresponding labeled images thereof to obtain initial parameters of the image segmentation model, when the image segmentation is carried out on the basis of the image segmentation model, initial classification results and thermodynamic diagrams of the images are respectively determined, probability distribution values are determined according to the initial classification results and the thermodynamic diagrams, the parameters of the image segmentation model are adjusted according to the probability distribution values, and more characteristic information of the images to be processed and the sample images can be mined out through the parameter adjustment mode.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are clearly only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an image segmentation process provided herein;
FIG. 2 is a schematic diagram of an image to be processed according to the present application;
FIG. 3 is a diagram illustrating the segmentation effect of the image to be processed according to the present disclosure;
FIG. 4 is a schematic structural diagram of an image segmentation model provided in the present application;
FIG. 5 is a schematic diagram of an image segmentation model training process provided in the present application;
FIG. 6 is a block diagram of a decoder according to the present application;
FIG. 7 is a schematic diagram of an upsampling operation and a downsampling operation provided herein;
FIG. 8 is a flowchart of a system for training an image segmentation model provided herein;
FIG. 9 is a schematic structural diagram of an image segmentation apparatus provided in the present application;
FIG. 10 is a schematic structural diagram of an image segmentation model training apparatus provided in the present application;
fig. 11 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The present application will now be described in further detail with reference to the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments described here are only some of the embodiments of the present application; all other embodiments that a person skilled in the art can derive from them without creative effort shall fall within the protection scope of the present application.
Fig. 1 is a schematic diagram of an image segmentation process provided in the present application, where the process includes the following steps:
s101: and acquiring an image to be processed, and inputting the image into a trained image segmentation model, wherein the image segmentation model is obtained by training based on a sample image in a training set and a corresponding annotation image.
S102: respectively determining an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, determining a probability distribution value according to the initial classification result and the thermodynamic diagram, and adjusting parameters of the image segmentation model according to the probability distribution value.
S103: and according to the image segmentation model after the parameters are adjusted, carrying out segmentation processing on the image to obtain a segmented image.
The image segmentation method provided by the application is applied to electronic equipment; the electronic equipment can be a device such as a PC (personal computer) or a tablet computer, or a server.
The electronic equipment stores an image segmentation model which is trained in advance. When the image segmentation model is trained, each sample image in a training set and its corresponding annotation image are obtained and input into the image segmentation model, which outputs a predicted segmentation image. A loss function value is determined from the predicted segmentation image and the annotation image, and parameter training of the image segmentation model is completed by iterative training once the loss function value meets the requirement. The annotation image is labeled with the segmentation class information of each position area.
The electronic equipment inputs the image to be processed into the trained image segmentation model, respectively determines an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, and then determines a probability distribution value according to the initial classification result and the thermodynamic diagram. And judging whether the probability distribution value is greater than a set allowable threshold value, if so, adjusting the parameters of the image segmentation model, then re-determining the initial classification result and the thermodynamic diagram of the image based on the image segmentation model with the adjusted parameters, and stopping parameter adjustment of the image segmentation model when the probability distribution value determined according to the initial classification result and the thermodynamic diagram is not greater than the set allowable threshold value. The probability distribution value may be a cross entropy value, and preferably, the probability distribution value may be a maximum likelihood estimation value.
And the electronic equipment performs segmentation processing on the image according to the image segmentation model after the parameters are adjusted to obtain a segmented image. Fig. 2 is a schematic diagram of an image to be processed provided by the present application, fig. 3 is a diagram of a segmentation effect of the image to be processed provided by the present application, and in fig. 3, pixels with the same color represent objects of the same category.
According to the method, the image segmentation model is first trained on the sample images in the training set and their corresponding annotation images to obtain the initial parameters of the image segmentation model. When image segmentation is then carried out with the image segmentation model, an initial classification result and a thermodynamic diagram of the image are respectively determined, a probability distribution value is determined from the initial classification result and the thermodynamic diagram, and the parameters of the image segmentation model are adjusted according to the probability distribution value. This parameter adjustment mode mines more feature information from the image to be processed and the sample images.
Fig. 4 is a schematic structural diagram of an image segmentation model provided in the present application, and as shown in fig. 4, the image segmentation model includes an encoder, a decoder, a first classifier, a depth self-attention transformer, and a second classifier;
the encoder is connected with the decoder, the decoder is further connected with the first classifier and the depth self-attention transformer respectively, and the first classifier and the depth self-attention transformer are further connected with the second classifier respectively. The depth self-attention transformer is, for example, a Transformer network.
The encoder can adopt network structures of different depths according to the complexity of the task, choosing a base network such as resnet, mobilenet or shufflenet. The decoder network structure is an improved Unet network structure, which can effectively improve the generalization capability of the algorithm.
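As a concrete illustration of this five-part structure, a minimal PyTorch sketch is given below. The module sizes, the two-layer convolutional stand-in for the resnet/mobilenet/shufflenet backbone, the single attention layer, and all names are assumptions for illustration only; they are not the patented network.

```python
# Minimal sketch of the five-part model; sizes and internals are illustrative
# assumptions, not the patented network.
import torch
import torch.nn as nn

class SegModel(nn.Module):
    def __init__(self, in_ch=3, feat_ch=64, num_classes=21):
        super().__init__()
        # Encoder: stand-in for a resnet/mobilenet/shufflenet backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        # Decoder: stand-in for the improved Unet structure described later.
        self.decoder = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        # First classifier: initial per-pixel classification result S0.
        self.classifier1 = nn.Conv2d(feat_ch, num_classes, 1)
        # Depth self-attention transformer: fuses F with S0 into a heat map H0.
        self.class_embed = nn.Conv2d(num_classes, feat_ch, 1)
        self.transformer = nn.MultiheadAttention(feat_ch, num_heads=4, batch_first=True)
        # Second classifier: final segmentation from H0.
        self.classifier2 = nn.Conv2d(feat_ch, num_classes, 1)

    def forward(self, x):
        f = self.decoder(self.encoder(x))              # depth feature map F
        s0 = self.classifier1(f)                       # initial classification result S0
        b, c, h, w = f.shape
        q = f.flatten(2).transpose(1, 2)               # (B, H*W, C)
        kv = self.class_embed(s0).flatten(2).transpose(1, 2)
        fused, _ = self.transformer(q, kv, kv)         # first fusion: F with S0
        h0 = f + fused.transpose(1, 2).reshape(b, c, h, w)  # second fusion: F with fused map
        pred = self.classifier2(h0)                    # logits for the segmented image
        return s0, h0, pred
```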
Training the image segmentation model based on the sample images in the training set and the corresponding annotation images comprises the following steps:
and training the image segmentation model based on the sample images in the training set and the corresponding labeled images to obtain parameters of an encoder, a decoder, a first classifier, a depth self-attention transformer and a second classifier in the image segmentation model.
When the image segmentation model is trained, a sample image in the training set and its corresponding annotation image are input into the image segmentation model, features of the sample image are extracted based on the encoder, the decoder, the first classifier, the depth self-attention transformer and the second classifier in the image segmentation model to obtain a predicted segmentation image, and a loss function value is determined according to the predicted segmentation image and the annotation image. When the loss function value does not meet the requirement, parameters of at least one of the encoder, the decoder, the first classifier, the depth self-attention transformer and the second classifier are adjusted, and iterative training continues. Once the loss function value meets the requirement, parameter training of the image segmentation model is determined to be finished; at this point the parameters of the encoder, decoder, first classifier, depth self-attention transformer and second classifier in the image segmentation model are obtained.
Fig. 5 is a schematic diagram of an image segmentation model training process provided in the present application, where the process includes the following steps:
s201: inputting the sample images in the training set and the corresponding labeled images into an encoder in the image segmentation model, obtaining a coding feature map corresponding to the sample images based on the encoder, inputting the coding feature map into a decoder, and obtaining a depth feature map based on the decoder.
S202: and respectively inputting the depth feature map into a first classifier and a depth self-attention transformer, obtaining an initial classification result based on the first classifier, and inputting the initial classification result into the depth self-attention transformer.
S203: and performing fusion processing on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, performing fusion processing on the fusion feature map and the depth feature map to obtain a thermodynamic diagram, and inputting the thermodynamic diagram into a second classifier.
S204: and obtaining a prediction segmentation image according to the thermodynamic diagram based on the second classifier.
S205: and determining a loss function value according to the prediction segmentation image and the annotation image, and training parameters of the encoder, the decoder, the first classifier, the depth self-attention transformer and the second classifier according to the loss function value.
In this application, the determining an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, respectively, determining a probability distribution value according to the initial classification result and the thermodynamic diagram, and adjusting parameters of the image segmentation model according to the probability distribution value includes:
inputting the image into an encoder in the image segmentation model, and determining a depth feature map corresponding to the image based on the encoder and a decoder;
determining an initial classification result of the depth feature map based on the first classifier, determining a thermodynamic diagram of the depth feature map based on the depth self-attention transformer;
and determining, by the second classifier, a probability distribution value based on the initial classification result and the thermodynamic diagram, and adjusting the parameters of the second classifier according to the probability distribution value.
When adjusting the parameters of the image segmentation model, the parameters of the encoder, the decoder, the first classifier and the depth self-attention transformer are fixed, and only the parameters of the second classifier are adjusted. The parameters here may be weights.
The electronic equipment inputs the image to be processed into the encoder in the image segmentation model, encodes the image based on the encoder to obtain an encoding feature map, inputs the encoding feature map into the decoder, and decodes the encoding feature map based on the decoder to obtain a decoded depth feature map. The depth feature map is input to the first classifier and the depth self-attention transformer, respectively. An initial classification result of the depth feature map is determined based on the first classifier and input to the second classifier; a thermodynamic diagram of the depth feature map is determined based on the depth self-attention transformer and input to the second classifier. A probability distribution value is then determined by the second classifier from the initial classification result and the thermodynamic diagram, and the parameters of the second classifier are adjusted according to the probability distribution value.
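In code, fixing those four parts while leaving only the second classifier trainable could look like the short sketch below; the attribute names follow the illustrative SegModel skeleton above and are assumptions.

```python
# Freeze encoder, decoder, first classifier and transformer; only the second
# classifier (C2) remains trainable during test-time adjustment.
for module in (model.encoder, model.decoder, model.classifier1,
               model.class_embed, model.transformer):
    module.requires_grad_(False)
opt_c2 = torch.optim.SGD(model.classifier2.parameters(), lr=1e-3)
```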
The adjusting the parameter of the second classifier according to the probability distribution value comprises:
and judging whether the probability distribution value is larger than a set allowable threshold value; if so, adjusting the parameters of the second classifier according to a stochastic gradient descent method, and if not, stopping adjusting the parameters of the second classifier.
Specifically, when the probability distribution value is greater than the set allowable threshold, the parameters of the second classifier of the image segmentation model are adjusted; then, based on the second classifier with adjusted parameters, the probability distribution value is determined again from the initial classification result and the thermodynamic diagram. When the probability distribution value is not greater than the set allowable threshold, adjustment of the parameters of the second classifier stops. The parameters of the second classifier can be adjusted according to a stochastic gradient descent method.
In the application, a conventional training mode is adopted for the encoder, the decoder, the first classifier and the depth self-attention transformer to train the segmentation model, while meta-learning is performed on the second classifier, that is, learning on small-sample data that adaptively fine-tunes the parameters of the second classifier. The specific steps are as follows:
the weight of the image segmentation model network is divided into four parts, namely a coder-Decoder (Encoder-Decoder) weight A, a first classifier module weight C1, a second classifier module weight C2 and a depth self-attention Transformer (DEN) weight T.
Training stage: full-supervision training is carried out using the sample images (support images) and annotation images (masks) in the training set. After training is finished, the weights A, C1 and T remain locked, while the weight C2 undergoes adaptive learning adjustment in the testing stage.
Testing stage: for each image to be processed in the query set (query image), only the weight C2 is updated using a meta-learning method, and the image to be processed is then segmented.
Since the image to be processed does not contain any annotation data, the C2 weight cannot be learned and updated through fully supervised training. The meta-learning method is therefore adopted, with the following steps (a code sketch follows the steps):
Every time an image to be processed in the query set is introduced, the weight of the second classifier is first reset to C2.
Using the encoder-decoder (Encoder-Decoder) weight A, a depth feature map F of the image to be processed is extracted.
The depth feature map F is passed through the weight C1 of the first classifier to obtain an initial classification result S0.
The depth feature map F is passed through the weight T of the depth self-attention transformer to obtain a thermodynamic map H0 based on the attention mechanism.
The probability distribution value L of S0 and H0 is calculated through the weight C2 of the second classifier.
If L is smaller than the preset tolerance, iteration stops; otherwise, the weight C2 is updated by a stochastic gradient descent method.
The depth feature map F is passed through the fine-tuned weight C2 to obtain the final segmentation result S.
Throughout the testing stage, the weight C2 is reset for each query image and then adaptively updated in real time and dynamically, unlike the weights of the encoder, decoder, first classifier and depth self-attention transformer, which are locked once and then reused.
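Putting the steps above together, a minimal per-query adaptation loop might look like the sketch below. Treating the probability distribution value L as a cross-entropy between the C2 prediction on H0 and the initial result S0, along with the tolerance, learning rate and iteration cap, are all assumptions; the application leaves these choices open (it names cross entropy and maximum likelihood estimation as candidates).

```python
# Meta-learning test stage for one query image (assumed form of L and hyperparameters).
import copy
import torch
import torch.nn.functional as F

def adapt_and_segment(model, c2_state, image, tol=0.05, max_iters=20, lr=1e-3):
    # Step 1: reset the second classifier to its trained weights C2.
    model.classifier2.load_state_dict(copy.deepcopy(c2_state))
    opt = torch.optim.SGD(model.classifier2.parameters(), lr=lr)
    for _ in range(max_iters):
        s0, h0, pred = model(image)        # steps 2-4: F, S0, H0 via locked A, C1, T
        # Step 5: probability distribution value L via the weight C2 (assumed
        # here to be a cross-entropy between the C2 prediction and S0).
        loss = F.cross_entropy(pred, s0.argmax(dim=1))
        if loss.item() < tol:              # step 6: stop once L is within tolerance
            break
        opt.zero_grad()
        loss.backward()                    # stochastic gradient descent on C2 only
        opt.step()
    with torch.no_grad():                  # step 7: final segmentation result S
        _, _, final = model(image)
    return final.argmax(dim=1)
```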
Compared with traditional large-batch data learning, the advantage of the method is that meta-learning is suited to learning from small samples. Fusing meta-learning into the deep learning scheme effectively mines more feature information from the small samples through multi-stage feature fine-tuning, achieving an effect comparable to traditional large-batch data learning at a much better cost-performance ratio.
In the present application, the nnx model is a generic chip deployment model. It aims at a unified protocol: an open file format designed for machine learning and used for storing trained models. Different artificial intelligence frameworks (such as Caffe's caffemodel, PyTorch's pth model, Microsoft's onnx model and the like) can store model data in the same format and interact, and are further uniformly converted into an nnx model, finally achieving the efficient solution of "store the model once, deploy it on multiple platforms". For model deployment, after model training is finished, the model is uniformly converted into an nnx model and deployed uniformly with the nnx inference engine, where the nnx model runs on the unified inference forward engine of the developed chip hardware; NNX is a unified inference deployment engine.
In the application, after the parameter adjustment of the second classifier is finished, the image is segmented based on the parameters of the encoder, the decoder, the first classifier and the depth self-attention transformer together with the adjusted parameters of the second classifier, so that a segmented image is obtained.
The step of performing segmentation processing on the image according to the image segmentation model after the parameters are adjusted to obtain a segmented image comprises the following steps:
performing first fusion processing on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, and performing second fusion processing on the depth feature map and the fusion feature map to obtain a thermodynamic map;
and performing segmentation processing on the thermodynamic diagram based on the parameters adjusted by the second classifier to obtain the segmentation image.
According to the method and the device, a depth feature map is obtained based on the encoder and the decoder; after an initial classification result is obtained based on the first classifier, the depth self-attention transformer performs a first fusion processing on the depth feature map and the initial classification result to obtain a fusion feature map, and a second fusion processing is then performed on the depth feature map and the fusion feature map to obtain a thermodynamic diagram. The thermodynamic diagram is then input into the second classifier, and segmentation processing is performed on the thermodynamic diagram based on the adjusted parameters of the second classifier to obtain a segmented image. The first fusion processing and the second fusion processing in the present application include, but are not limited to: feature concatenation fusion, feature addition or multiplication fusion, and weighted feature fusion. The image segmentation model provided by the application is suitable for various scenes, such as: a new retail shelf segmentation scenario, intelligent water services (surface floating-object detection), and assisted driving (drivable area and lane line segmentation).
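The sketch below illustrates the three fusion variants named here on equally shaped feature maps; the exact operator used in each fusion stage is left open by the application, so this is purely illustrative.

```python
import torch

def fuse(a, b, mode="add", weight=0.5):
    # a, b: (B, C, H, W) feature maps of matching spatial size.
    if mode == "concat":        # feature concatenation fusion: stack along channels
        return torch.cat([a, b], dim=1)
    if mode == "add":           # feature addition fusion
        return a + b
    if mode == "mul":           # feature multiplication fusion
        return a * b
    if mode == "weighted":      # fusion according to weight
        return weight * a + (1.0 - weight) * b
    raise ValueError(f"unknown fusion mode: {mode}")
```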
In this application, determining, based on the encoder and the decoder, a depth feature map corresponding to the image includes:
obtaining a coding feature map corresponding to the image based on the encoder;
and performing multi-stage down-sampling, up-sampling, and channel expansion or compression operations on the coding feature map based on the decoder, and performing feature extraction on the coding feature map to obtain a depth feature map corresponding to the coding feature map.
After the encoder in the image segmentation model encodes the image, an encoding feature map is obtained and input into the decoder. Multi-stage down-sampling, up-sampling, and channel expansion or compression operations are performed on the coding feature map based on the decoder, and feature extraction is performed on the coding feature map to obtain the corresponding depth feature map. The down-sampling operation reduces the scale, the up-sampling operation increases the scale, and the two can be inverse operations of each other. The channel expansion or compression operation is a scale-invariant operation.
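The scale-preserving channel expansion or compression step is commonly realized with a 1x1 convolution; the snippet below is an assumed implementation whose channel counts match the example given later in this description.

```python
import torch.nn as nn

# Channel expansion/compression: resolution unchanged, channel count scaled.
expand = nn.Conv2d(256, 512, kernel_size=1)    # 256 x 80 x 80 -> 512 x 80 x 80
compress = nn.Conv2d(256, 128, kernel_size=1)  # 256 x 80 x 80 -> 128 x 80 x 80
```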
In this application, the down-sampling of the encoded feature map based on the decoder comprises:
stacking pixels on the width dimension and the height dimension of the coding feature map onto the depth dimension based on a down-sampling layer of the decoder to obtain a down-sampling operation result;
the upsampling operation of the encoded feature map based on the decoder comprises:
and tiling pixels on the depth dimension of the coding feature map onto a width-height channel based on an upsampling layer of the decoder to obtain an upsampling operation result.
The processing procedure of the decoder is explained below with reference to the drawings.
FIG. 6 is a block diagram of a decoder according to the present application. In FIG. 6, X0,0, X0,1, X1,0, ..., X0,4 are processing units in the decoder. Horizontal arrows indicate channel expansion or compression operations, typically one or more convolution operator operations; under a channel expansion or compression operation, the resolution of the feature map is unchanged (the width and height are consistent with the input), but the number of channels is increased or decreased. Diagonally downward arrows indicate the down-sampling operation, which reduces the resolution of the feature map. Diagonally upward arrows indicate the up-sampling operation, which increases the resolution of the feature map.
For ease of understanding, the following is illustrated.
For example, if the original feature size is Ci,j*640*640, then the 1/2-scale size is Ci,j*320*320, the 1/4-scale size is Ci,j*160*160, the 1/8-scale size is Ci,j*80*80, and the 1/16-scale size is Ci,j*40*40. The down-sampling operation, for example, changes the feature map from 3*640*640 to 3*320*320. The up-sampling operation, for example, changes the feature map from 64*160*160 to 64*320*320. The channel expansion or compression operation, for example, changes the feature map from 256*80*80 to 512*80*80, or from 256*80*80 to 128*80*80.
In fig. 6, when there are a plurality of inputs to one processing unit, a fusion process is first performed on the plurality of input feature maps, for example, a feature concatenation fusion process, a feature addition or multiplication fusion process, a feature-by-weight fusion process, or the like.
In the application, the unet network in the decoder is improved, and its shallow and deep features are deeply fused; the skip connections are redesigned to aggregate features of different semantic scales across the decoder unet network, resulting in a highly flexible feature fusion scheme. Meanwhile, super-resolution techniques are applied to the modification of the network structure: the original down-sampling layer in the unet network is replaced with a focus-shuffle layer, and the up-sampling layer is replaced with a pixel-shuffle layer. The two are inverse operations; the focus-shuffle layer is used for the down-sampling operation and the pixel-shuffle layer for the up-sampling operation.
Fig. 7 is a schematic diagram of an upsampling operation and a downsampling operation provided by the present application. As shown in fig. 7, the left-to-right conversion is a focus-shuffle operation, and the right-to-left conversion is a pixel-shuffle operation, which are inverse operations to each other.
The specific operation of the focus-shuffle layer is as follows. Let r be the down-sampling magnification (an integer greater than 1). For a feature map of dimension C*H*W (when designing the network structure, H and W are guaranteed to be integral multiples of r), the feature map is divided at intervals of r pixels along the H and W dimensions. Each unit obtained by this division contains r pixels along the height H dimension and r pixels along the width W dimension, i.e. r² pixels per unit in total. The pixels in each square unit are then stacked onto the depth dimension C from left to right and top to bottom, following the numbering order of the depth dimension C. This yields a rearranged feature map of dimension (C*r²)*(H/r)*(W/r).
In essence, the focus-shuffle layer "turns height and width into depth": it stacks the pixels of the H and W dimensions onto the depth channel C. The height and width dimensions are compressed and the depth dimension is expanded, but all pixel values are kept unchanged during the transformation; only their arrangement order is adjusted. The transformation is lossless and avoids the information loss caused by averaging or discarding pixel values in traditional down-sampling methods.
The specific operation of the pixel-shuffle layer is as follows. Let r be the up-sampling magnification (an integer greater than 1). For a feature map of dimension C*H*W (when designing the network structure, C is guaranteed to be an integral multiple of r²), at each given coordinate point (Hi, Wj) in the height H and width W dimensions, the r² pixels of a channel group along the C dimension are arranged, in channel numbering order, into an r*r square unit. This yields a "square matrix of units" with C/r² channels, each unit containing r pixels in both the H and W dimensions; in effect, a rearranged feature map of dimension (C/r²)*(H*r)*(W*r).
In essence, the pixel-shuffle layer "turns depth into height and width": it tiles the pixels of the depth dimension C onto the height H and width W dimensions. The depth channel is compressed and the height and width dimensions are expanded, but all pixel values are kept unchanged during the conversion; only their arrangement order is adjusted. The conversion is lossless and avoids the noise interference introduced by the interpolation used in traditional up-sampling methods.
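PyTorch exposes both rearrangements directly (nn.PixelUnshuffle for the focus-shuffle style down-sampling and nn.PixelShuffle for the up-sampling), so the lossless inverse relationship described above can be verified in a few lines. The exact channel numbering may differ from the description in this application, but the shape arithmetic and losslessness are the same.

```python
import torch
import torch.nn as nn

r = 2
focus = nn.PixelUnshuffle(r)   # focus-shuffle: C x H x W -> (C*r^2) x (H/r) x (W/r)
up = nn.PixelShuffle(r)        # pixel-shuffle: C x H x W -> (C/r^2) x (H*r) x (W*r)

x = torch.randn(1, 3, 640, 640)
y = focus(x)
print(y.shape)                 # torch.Size([1, 12, 320, 320])
assert torch.equal(up(y), x)   # lossless: pixel-shuffle inverts focus-shuffle
```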
Fig. 8 is a flowchart of a training system of an image segmentation model provided in the present application, as shown in fig. 8, including:
s301: and acquiring a sample image in the training set and a corresponding labeled image thereof.
The number of sample images is, for example, 300 or 400.
S302: and training an image segmentation model based on the sample images in the training set and the corresponding labeled images.
S303: and uniformly converting the image segmentation model into an nnx model, and uniformly deploying the nnx model into hardware equipment by adopting an nnx inference engine.
S304: and starting hardware equipment to run the nnx model to finish semantic segmentation of the image to be processed.
Fig. 9 is a schematic structural diagram of an image segmentation apparatus provided in the present application, where the apparatus includes:
an obtaining module 81, configured to obtain an image to be processed, and input the image into a trained image segmentation model, where the image segmentation model is obtained by training based on a sample image in a training set and a corresponding labeled image thereof;
and the segmentation module 82 is configured to determine an initial classification result and a thermodynamic diagram of the image based on the image segmentation model, determine a probability distribution value according to the initial classification result and the thermodynamic diagram, adjust parameters of the image segmentation model according to the probability distribution value, and segment the image according to the image segmentation model with the parameters adjusted to obtain a segmented image.
The image segmentation model comprises an encoder, a decoder, a first classifier, a depth self-attention transformer and a second classifier; the encoder is connected with the decoder, the decoder is further connected with the first classifier and the depth self-attention transformer respectively, and the first classifier and the depth self-attention transformer are further connected with the second classifier respectively.
The obtaining module 81 is further configured to train the image segmentation model based on the sample images in the training set and the corresponding labeled images thereof, so as to obtain parameters of an encoder, a decoder, a first classifier, a depth self-attention transformer, and a second classifier in the image segmentation model.
The segmentation module 82 is specifically configured to input the image into the encoder in the image segmentation model, and determine a depth feature map corresponding to the image based on the encoder and the decoder; determine an initial classification result of the depth feature map based on the first classifier, and determine a thermodynamic diagram of the depth feature map based on the depth self-attention transformer; and determine, by the second classifier, a probability distribution value based on the initial classification result and the thermodynamic diagram, and adjust the parameters of the second classifier according to the probability distribution value.
The segmentation module 82 is specifically configured to determine whether the probability distribution value is greater than a set allowable threshold, adjust the parameter of the second classifier according to a stochastic gradient descent method if it is, and stop adjusting the parameter of the second classifier if it is not.
The segmentation module 82 is specifically configured to perform a first fusion process on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, and perform a second fusion process on the depth feature map and the fusion feature map to obtain a thermodynamic diagram; and performing segmentation processing on the thermodynamic diagram based on the parameters adjusted by the second classifier to obtain the segmentation image.
The segmentation module 82 is specifically configured to obtain an encoding feature map corresponding to the image based on the encoder; and perform multi-stage down-sampling, up-sampling, and channel expansion or compression operations on the coding feature map based on the decoder, and perform feature extraction on the coding feature map to obtain a depth feature map corresponding to the coding feature map.
The segmentation module 82 is specifically configured to stack pixels in a width dimension and a height dimension of the encoded feature map onto a depth dimension based on a downsampling layer of the decoder, so as to obtain a downsampling operation result; and tiling pixels on the depth dimension of the coding feature map onto a width-height channel based on an upsampling layer of the decoder to obtain an upsampling operation result.
Fig. 10 is a schematic structural diagram of an image segmentation model training apparatus provided in the present application, where the apparatus includes:
the first training module 91 is configured to input the sample images in the training set and the corresponding labeled images into an encoder in the image segmentation model, obtain coding feature maps corresponding to the sample images based on the encoder, input the coding feature maps into a decoder, and obtain depth feature maps based on the decoder;
the second training module 92 is configured to input the depth feature map into the first classifier and the depth self-attention transformer, obtain an initial classification result based on the first classifier, and input the initial classification result into the depth self-attention transformer;
a third training module 93, configured to perform fusion processing on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fusion feature map, perform fusion processing on the fusion feature map and the depth feature map to obtain a thermodynamic diagram, and input the thermodynamic diagram into a second classifier;
a fourth training module 94, configured to obtain a predicted segmented image according to the thermodynamic diagram based on the second classifier;
a fifth training module 95, configured to determine a loss function value according to the prediction segmentation image and the annotation image, and train parameters of the encoder, the decoder, the first classifier, the depth self-attention transformer, and the second classifier according to the loss function value.
The present application also provides an electronic device, as shown in fig. 11, including: the system comprises a processor 401, a communication interface 402, a memory 403 and a communication bus 404, wherein the processor 401, the communication interface 402 and the memory 403 complete mutual communication through the communication bus 404;
the memory 403 has stored therein a computer program which, when executed by the processor 401, causes the processor 401 to perform any of the above method steps.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 402 is used for communication between the above-described electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a central processing unit, a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc.
The present application further provides a computer-readable storage medium having stored therein a computer program executable by an electronic device, the program, when run on the electronic device, causing the electronic device to perform any of the above method steps.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the present application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. An image segmentation method, characterized in that the method comprises:
acquiring an image to be processed, and inputting the image into a trained image segmentation model, wherein the image segmentation model is trained based on sample images in a training set and corresponding annotated images;
determining an initial classification result and a heat map of the image based on the image segmentation model, determining a probability distribution value according to the initial classification result and the heat map, and adjusting parameters of the image segmentation model according to the probability distribution value;
and segmenting the image according to the image segmentation model with the adjusted parameters to obtain a segmented image.
2. The method of claim 1, wherein the image segmentation model comprises an encoder, a decoder, a first classifier, a depth self-attention transformer, and a second classifier;
the encoder is connected to the decoder, the decoder is further connected to the first classifier and the depth self-attention transformer, respectively, and the first classifier and the depth self-attention transformer are each further connected to the second classifier.
3. The method of claim 2, wherein determining an initial classification result and a heat map of the image based on the image segmentation model, determining a probability distribution value according to the initial classification result and the heat map, and adjusting the parameters of the image segmentation model according to the probability distribution value comprises:
inputting the image into an encoder in the image segmentation model, and determining a depth feature map corresponding to the image based on the encoder and a decoder;
determining an initial classification result of the depth feature map based on the first classifier, and determining a heat map of the depth feature map based on the depth self-attention transformer;
and determining, based on the second classifier, a probability distribution value according to the initial classification result and the heat map, and adjusting the parameters of the second classifier according to the probability distribution value.
4. The method of claim 3, wherein adjusting the parameters of the second classifier according to the probability distribution value comprises:
determining whether the probability distribution value is greater than a set tolerance threshold; if so, adjusting the parameters of the second classifier by stochastic gradient descent; if not, stopping the adjustment of the parameters of the second classifier.
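Claims 1, 3, and 4 together describe a test-time adjustment loop: compute the probability distribution value from the initial classification result and the second classifier's output on the heat map, and update only the second classifier by stochastic gradient descent while the value exceeds the tolerance threshold. A minimal sketch of that loop follows; the KL-divergence form of the probability distribution value and the helper initial_and_heat_map are assumptions, since the claims fix neither.

```python
import torch
import torch.nn.functional as F

def adjust_second_classifier(model, image, threshold=0.1, lr=1e-3, max_steps=10):
    # Stochastic gradient descent over the second classifier only.
    optimizer = torch.optim.SGD(model.second_classifier.parameters(), lr=lr)
    for _ in range(max_steps):
        # Hypothetical helper returning the first classifier's initial
        # result and the transformer's heat map for this image.
        initial, heat_map = model.initial_and_heat_map(image)
        prediction = model.second_classifier(heat_map.detach())
        # Probability distribution value: divergence between the two
        # class-probability distributions (one assumed reading).
        value = F.kl_div(F.log_softmax(prediction, dim=1),
                         F.softmax(initial.detach(), dim=1),
                         reduction="batchmean")
        if value <= threshold:   # not greater than the tolerance: stop adjusting
            break
        optimizer.zero_grad()
        value.backward()
        optimizer.step()
    return model
```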
5. The method as claimed in claim 3, wherein segmenting the image according to the image segmentation model with the adjusted parameters to obtain a segmented image comprises:
performing first fusion processing on the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fused feature map, and performing second fusion processing on the depth feature map and the fused feature map to obtain a heat map;
and segmenting the heat map based on the adjusted parameters of the second classifier to obtain the segmented image.
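The two fusion stages of claim 5 can be sketched as a cross-attention block. Treating the initial classification result as the query source, the depth feature map as keys and values, and the second fusion as an additive residual are all assumptions; the claim fixes only that the two fusions take place in this order.

```python
import torch.nn as nn

class FusionTransformer(nn.Module):
    """Sketch of the first and second fusion processing. Assumes the
    initial classification result has shape (N, num_classes, H, W) and
    the depth feature map has shape (N, channels, H, W)."""

    def __init__(self, channels, num_classes, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(num_classes, channels, kernel_size=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, depth_feat, initial_result):
        n, c, h, w = depth_feat.shape
        q = self.embed(initial_result).flatten(2).transpose(1, 2)  # (N, H*W, C)
        kv = depth_feat.flatten(2).transpose(1, 2)                 # (N, H*W, C)
        fused, _ = self.attn(q, kv, kv)            # first fusion -> fused feature map
        fused = fused.transpose(1, 2).reshape(n, c, h, w)
        return fused + depth_feat                  # second fusion -> heat map
```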
6. The method of claim 3, wherein determining the depth feature map corresponding to the image based on the encoder and the decoder comprises:
obtaining an encoded feature map corresponding to the image based on the encoder;
and performing multi-stage down-sampling, up-sampling, and channel expansion or compression operations on the encoded feature map based on the decoder, and performing feature extraction on the encoded feature map to obtain a depth feature map corresponding to the encoded feature map.
7. The method of claim 6, wherein downsampling the encoded feature map based on the decoder comprises:
stacking pixels from the width and height dimensions of the encoded feature map onto the depth dimension based on a down-sampling layer of the decoder to obtain a down-sampling result;
the upsampling of the encoded feature map based on the decoder comprises:
and tiling pixels from the depth dimension of the encoded feature map onto the width and height dimensions based on an up-sampling layer of the decoder to obtain an up-sampling result.
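The stacking and tiling of claims 6 and 7 correspond to the standard space-to-depth and depth-to-space rearrangements, which PyTorch exposes as PixelUnshuffle and PixelShuffle; pairing them with a 1x1 convolution for the channel expansion or compression is an assumption. A short sketch with illustrative shapes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)      # an encoded feature map (illustrative shape)

# Down-sampling layer: stack each 2x2 block of width/height pixels onto
# the depth (channel) dimension.
down = nn.PixelUnshuffle(2)(x)      # -> (1, 256, 16, 16)

# Up-sampling layer: tile depth-dimension pixels back onto width and height.
up = nn.PixelShuffle(2)(down)       # -> (1, 64, 32, 32)

# Channel expansion or compression (claim 6), assumed here to be a
# 1x1 convolution over the stacked channels.
compress = nn.Conv2d(256, 128, kernel_size=1)(down)   # -> (1, 128, 16, 16)
```

Because the two rearrangements are exact inverses, no pixel information is discarded during down-sampling, which is consistent with the stacking-and-tiling description in claim 7.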
8. An image segmentation model training method, characterized in that the method comprises:
inputting a sample image in a training set and a corresponding annotated image into an encoder of an image segmentation model, obtaining an encoded feature map corresponding to the sample image based on the encoder, inputting the encoded feature map into a decoder, and obtaining a depth feature map based on the decoder;
inputting the depth feature map into a first classifier and a depth self-attention transformer respectively, obtaining an initial classification result based on the first classifier, and inputting the initial classification result into the depth self-attention transformer;
fusing the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fused feature map, fusing the fused feature map with the depth feature map to obtain a heat map, and inputting the heat map into a second classifier;
obtaining a predicted segmentation image from the heat map based on the second classifier;
and determining a loss function value according to the predicted segmentation image and the annotated image, and training the parameters of the encoder, the decoder, the first classifier, the depth self-attention transformer, and the second classifier according to the loss function value.
9. An image segmentation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring an image to be processed and inputting the image into a trained image segmentation model, wherein the image segmentation model is trained based on sample images in a training set and corresponding annotated images;
and the segmentation module is used for determining an initial classification result and a heat map of the image based on the image segmentation model, determining a probability distribution value according to the initial classification result and the heat map, adjusting the parameters of the image segmentation model according to the probability distribution value, and segmenting the image according to the image segmentation model with the adjusted parameters to obtain a segmented image.
10. An apparatus for training an image segmentation model, the apparatus comprising:
the first training module is used for inputting the sample images in the training set and the corresponding annotated images into an encoder of the image segmentation model, obtaining encoded feature maps corresponding to the sample images based on the encoder, inputting the encoded feature maps into a decoder, and obtaining depth feature maps based on the decoder;
the second training module is used for inputting the depth feature map into the first classifier and the depth self-attention transformer respectively, obtaining an initial classification result based on the first classifier, and inputting the initial classification result into the depth self-attention transformer;
the third training module is used for fusing the depth feature map and the initial classification result based on the depth self-attention transformer to obtain a fused feature map, fusing the fused feature map with the depth feature map to obtain a heat map, and inputting the heat map into the second classifier;
the fourth training module is used for obtaining a predicted segmentation image from the heat map based on the second classifier;
and the fifth training module is used for determining a loss function value according to the predicted segmentation image and the annotated image, and training the parameters of the encoder, the decoder, the first classifier, the depth self-attention transformer, and the second classifier according to the loss function value.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1 to 7 or the method steps of claim 8 when executing a program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-7 or carries out the method steps of claim 8.
CN202211076329.7A 2022-09-05 2022-09-05 Image segmentation and model training method, device, equipment and medium Active CN115170807B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211076329.7A CN115170807B (en) 2022-09-05 2022-09-05 Image segmentation and model training method, device, equipment and medium


Publications (2)

Publication Number Publication Date
CN115170807A true CN115170807A (en) 2022-10-11
CN115170807B CN115170807B (en) 2022-12-02

Family

ID=83480348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211076329.7A Active CN115170807B (en) 2022-09-05 2022-09-05 Image segmentation and model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115170807B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117111696A (en) * 2023-09-07 2023-11-24 脉得智能科技(无锡)有限公司 Medical image segmentation method and training method of medical image segmentation model

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260055A (en) * 2020-01-13 2020-06-09 腾讯科技(深圳)有限公司 Model training method based on three-dimensional image recognition, storage medium and equipment
US20210232932A1 (en) * 2020-06-08 2021-07-29 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating image, device and medium
CN112233125A (en) * 2020-10-15 2021-01-15 平安科技(深圳)有限公司 Image segmentation method and device, electronic equipment and computer readable storage medium
CN112287941A (en) * 2020-11-26 2021-01-29 国际关系学院 License plate recognition method based on automatic character region perception
CN112734769A (en) * 2020-12-31 2021-04-30 山东大学 Medical image segmentation and quantitative analysis method based on interactive information guided deep learning method, computer device and storage medium
CN113869396A (en) * 2021-09-26 2021-12-31 合肥高维数据技术有限公司 PC screen semantic segmentation method based on efficient attention mechanism
CN114092833A (en) * 2022-01-24 2022-02-25 长沙理工大学 Remote sensing image classification method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHENGDONG DU et al.: "Multivariate time series forecasting via attention-based encoder-decoder framework", Neurocomputing *
ZHOU Kai et al.: "Research on a deep knowledge tracing model based on an attention mechanism", Journal of Taiyuan University of Technology *

Also Published As

Publication number Publication date
CN115170807B (en) 2022-12-02

Similar Documents

Publication Publication Date Title
CN116258719B (en) Flotation foam image segmentation method and device based on multi-mode data fusion
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN113362242B (en) Image restoration method based on multi-feature fusion network
CN111627038B (en) Background removing method, device and equipment and readable storage medium
CN116229057B (en) Method and device for three-dimensional laser radar point cloud semantic segmentation based on deep learning
CN113076957A (en) RGB-D image saliency target detection method based on cross-modal feature fusion
CN115170807B (en) Image segmentation and model training method, device, equipment and medium
CN116958827A (en) Deep learning-based abandoned land area extraction method
CN113436287B (en) Tampered image blind evidence obtaining method based on LSTM network and coding and decoding network
CN114529793A (en) Depth image restoration system and method based on gating cycle feature fusion
CN116778165A (en) Remote sensing image disaster detection method based on multi-scale self-adaptive semantic segmentation
CN116051850A (en) Neural network target detection method, device, medium and embedded electronic equipment
CN116612416A (en) Method, device and equipment for dividing video target and readable storage medium
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
CN115909378A (en) Document text detection model training method and document text detection method
CN113344110B (en) Fuzzy image classification method based on super-resolution reconstruction
CN113554611B (en) Insulator self-explosion defect detection method and device, terminal and storage medium
CN115604476A (en) Variable-code-rate semantic structured image encoding and decoding method and system
CN116152263A (en) CM-MLP network-based medical image segmentation method
CN115187775A (en) Semantic segmentation method and device for remote sensing image
CN114359948A (en) Power grid wiring diagram primitive identification method based on overlapping sliding window mechanism and YOLOV4
CN112634153A (en) Image deblurring method based on edge enhancement
CN115631115B (en) Dynamic image restoration method based on recursion transform
CN116524199B (en) Image rain removing method and device based on PReNet progressive network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant