CN114627296B - Training method and device for image segmentation model, electronic equipment and storage medium - Google Patents

Training method and device for image segmentation model, electronic equipment and storage medium

Info

Publication number
CN114627296B
Authority
CN
China
Prior art keywords
image
sub
sample
network
segmentation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210282010.3A
Other languages
Chinese (zh)
Other versions
CN114627296A
Inventor
马璐
李小星
丁佳
吕晨翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Yizhun Intelligent Technology Co.,Ltd.
Original Assignee
Beijing Yizhun Medical AI Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yizhun Medical AI Co Ltd filed Critical Beijing Yizhun Medical AI Co Ltd
Priority to CN202210282010.3A
Publication of CN114627296A
Application granted
Publication of CN114627296B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Abstract

The present disclosure provides a training method for an image segmentation model, the method comprising: determining a first annotation prediction image corresponding to a first annotation sample image in a first training subset based on a first coding sub-network and a decoding sub-network; determining spatial feature vectors corresponding to all sample images in the first training subset based on the first coding sub-network, a second coding sub-network and a transformation sub-network; determining a first sub-loss value of the image segmentation model based on a first annotation image corresponding to the first annotation sample image and the first annotation prediction image; determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all the sample images in the first training subset; and training the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model. The first training subset comprises at least one group of sample image pairs, each group of sample image pairs comprises an annotated sample image and a non-standard sample image acquired from the same video, and the sample image pairs in different groups are derived from different videos.

Description

Training method and device for image segmentation model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a training method and apparatus for an image segmentation model, an electronic device, and a storage medium.
Background
In computer vision applications, high-quality annotation data is very scarce and costly to acquire; samples must be annotated manually by experienced technicians, which demands a high level of professional skill and consumes substantial human resources. Furthermore, image segmentation differs from image classification: classification only needs to assign one label to an image, whereas segmentation needs to assign a label to each pixel of the image, which undoubtedly increases the annotation burden.
Disclosure of Invention
The present disclosure provides a training method and apparatus for an image segmentation model, an electronic device, and a storage medium, so as to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, a method of training an image segmentation model, the image segmentation model comprising a first coding subnetwork, a decoding subnetwork, a second coding subnetwork, and a transformation subnetwork, is disclosed, the method comprising:
determining a first annotation predictive image corresponding to a first annotation sample image in a first training subset based on the first coding subnetwork and the decoding subnetwork;
determining spatial feature vectors corresponding to all sample images in the first training subset based on the first coding sub-network, the second coding sub-network and the transformation sub-network;
determining a first sub-loss value of the image segmentation model based on a first labeled image corresponding to the first labeled sample image and the first labeled predicted image; determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all sample images in the first training subset;
training the image segmentation model based on a first sub-loss value and a second sub-loss value of the image segmentation model;
the first training subset comprises at least one group of sample image pairs, and each group of sample image pairs comprises an annotated sample image and a non-standard sample image which are acquired from the same video; different sets of sample images differ from the corresponding video.
In the foregoing solution, before determining, based on the first encoding sub-network and the decoding sub-network, a first labeled prediction image corresponding to a first labeled sample image in the first training subset, the method further includes:
a training set of models is obtained from at least one video that includes the first training subset.
In the above solution, the obtaining the model training set including the first training subset from at least one video includes performing the following operations on each video of the at least one video:
the method comprises the steps of obtaining at least two images in any video based on a first time interval, labeling at least one region of a first labeled sample image in the at least two images, generating a first labeled image corresponding to the first labeled sample image based on a labeling result, and confirming that the images except the first labeled sample image in the at least two images are first non-standard sample images.
In the foregoing aspect, the determining a first labeled predicted image corresponding to a first labeled sample image in a first training subset based on the first encoding subnetwork and the decoding subnetwork includes:
inputting a first labeling sample image in a first training subset into the first coding sub-network to obtain a first labeling sample characteristic;
and inputting the first labeled sample characteristic into the decoding sub-network to obtain the first labeled predicted image.
In the above solution, the determining the spatial feature vectors corresponding to all the sample images in the first training subset based on the first coding sub-network, the second coding sub-network and the transformation sub-network includes performing the following operations on each sample image in the first training subset except the first labeled sample image:
inputting sample images except the first marked sample image in the sample images into the second coding sub-network, and acquiring sample characteristics corresponding to the sample images; inputting the sample characteristics to a first transformation layer included in the transformation sub-network, and acquiring corresponding spatial characteristic vectors;
inputting a first marking sample image in a first training subset into the first coding subnetwork to obtain a first marking sample characteristic; and inputting the first labeling sample feature into a second transformation layer included in the transformation sub-network, and acquiring a spatial feature vector corresponding to the first labeling sample image.
In the foregoing aspect, the determining a first sub-loss value of the image segmentation model based on the first labeled image corresponding to the first labeled sample image and the first labeled predicted image includes:
determining a first sub-loss value of the image segmentation model based on the prediction probability value, in the first annotation prediction image, of each pixel of the first annotated sample image, the total number of annotation classes, and the number of sample image pairs in the first training subset.
In the above scheme, the determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all the sample images includes:
and determining a second sub-loss value of the image segmentation model based on the distance between the spatial feature vector of the first labeled sample image and the spatial feature vectors corresponding to all the sample images.
In the foregoing solution, the training the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model includes:
updating model parameters of a first coding sub-network, a decoding sub-network and a second transform layer in the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model;
and performing a momentum update of the model parameters of the second coding sub-network and the first transformation layer based on the updated model parameters of the first coding sub-network.
In the above solution, the training the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model further includes:
training the image segmentation model based on the first coding sub-network, the second coding sub-network, the decoding sub-network and the transformation sub-network after the model parameters are updated, and a second training subset.
According to a second aspect of the present disclosure, there is provided an image segmentation method implemented based on the trained image segmentation model, the method including:
inputting an image to be segmented into a first coding sub-network included in the image segmentation model, and determining a first characteristic image corresponding to the image to be segmented;
and inputting the first characteristic image into a decoding sub-network included in the image segmentation model, and determining segmentation results of different classes in the image to be segmented.
According to a third aspect of the present disclosure, there is provided a training apparatus for an image segmentation model, the image segmentation model comprising a first coding subnetwork, a decoding subnetwork, a second coding subnetwork and a transformation subnetwork, the apparatus comprising:
a first determining unit, configured to determine, based on the first coding sub-network and the decoding sub-network, a first annotation prediction image corresponding to a first annotation sample image in a first training subset;
a second determining unit, configured to determine, based on the first coding sub-network, the second coding sub-network, and the transformation sub-network, spatial feature vectors corresponding to all sample images in the first training subset;
a third determining unit, configured to determine a first sub-loss value of the image segmentation model based on a first labeled image corresponding to the first labeled sample image and the first labeled predicted image; and determine a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all sample images in the first training subset;
a training unit, configured to train the image segmentation model based on a first sub-loss value and a second sub-loss value of the image segmentation model;
the first training subset comprises at least one group of sample image pairs, and each group of sample image pairs comprises an annotated sample image and a non-standard sample image which are acquired from the same video; the sample image pairs in different groups are derived from different videos.
According to a fourth aspect of the present disclosure, there is provided an image segmentation apparatus comprising:
an encoding unit, configured to input an image to be segmented into a first encoding subnetwork included in the image segmentation model and determine a first characteristic image corresponding to the image to be segmented;
and the decoding unit is used for inputting the first characteristic image into a decoding sub-network included in the image segmentation model and determining segmentation results of different categories in the image to be segmented.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image segmentation model training method and the image segmentation method of the present disclosure.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a training method of an image segmentation model and an image segmentation method according to the present disclosure.
The disclosed training method for an image segmentation model, the image segmentation model comprising a first coding subnetwork, a decoding subnetwork, a second coding subnetwork and a transformation subnetwork, comprises: determining a first annotation prediction image corresponding to a first annotation sample image in a first training subset based on the first coding sub-network and the decoding sub-network; determining spatial feature vectors corresponding to all sample images in the first training subset based on the first coding sub-network, the second coding sub-network and the transformation sub-network; determining a first sub-loss value of the image segmentation model based on a first annotation image corresponding to the first annotation sample image and the first annotation prediction image; determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all sample images in the first training subset; and training the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model. The first training subset comprises at least one group of sample image pairs, each group of sample image pairs comprises an annotated sample image and a non-standard sample image acquired from the same video, and the sample image pairs in different groups are derived from different videos. The image segmentation model is trained jointly on the annotated images and the non-standard (i.e., unannotated) images, so that the unannotated images are used effectively and the cost of image annotation is reduced; and the loss value of the image segmentation model is determined based on the spatial feature vectors corresponding to the annotated and unannotated images, which improves the expression capability and the prediction capability of the image segmentation model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 shows a structural diagram of a UNet model in the related art;
FIG. 2 is a schematic flow chart diagram illustrating an alternative method for training an image segmentation model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating an alternative image segmentation method provided by the embodiment of the present disclosure;
FIG. 4 is a schematic flow chart illustrating an alternative method for training an image segmentation model provided by the embodiment of the present disclosure;
FIG. 5 is a data diagram illustrating a training method of an image segmentation model provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating an alternative structure of a training apparatus for an image segmentation model provided by an embodiment of the present disclosure;
fig. 7 is a schematic diagram illustrating an alternative structure of an image segmentation apparatus provided in an embodiment of the present disclosure;
FIG. 8 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without making creative efforts fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", and the like are used only to distinguish similar objects and do not denote a particular order or sequence; where permissible, the objects so described may be interchanged, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
It should be understood that, in the various embodiments of the present application, the size of the serial number of each implementation process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
Fig. 1 shows a schematic structural diagram of a UNet model in the related art.
Most deep-learning-based semantic segmentation models are combinations of an Encoder and a Decoder: the Encoder extracts picture features through layer-by-layer downsampling, and the Decoder restores the extracted features through layer-by-layer upsampling to obtain a segmentation map at the size of the original image. In the semantic segmentation of biomedical images, the most classical model is the UNet model, as shown in fig. 1. UNet's Encoder downsamples 4 times, for a total downsampling factor of 16; symmetrically, its Decoder correspondingly upsamples 4 times, restoring the high-level semantic feature map obtained by the Encoder to the resolution of the original image. The loss is calculated from the obtained segmentation probability map and the segmentation gold standard.
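As an illustrative, non-limiting sketch of the related-art encoder-decoder pattern described above (a PyTorch implementation is assumed; the channel widths, input channels and class count are illustrative choices, not values fixed by this disclosure), such a UNet-style model could be written as follows:

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # two 3x3 convolutions, as in a classic UNet stage
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Encoder downsamples 4 times (16x overall); the decoder upsamples 4 times back."""
    def __init__(self, in_ch=1, num_classes=2, base=16):
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8, base * 16]
        self.enc = nn.ModuleList([conv_block(in_ch, chs[0])] +
                                 [conv_block(chs[i], chs[i + 1]) for i in range(4)])
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ModuleList([nn.ConvTranspose2d(chs[i + 1], chs[i], 2, stride=2)
                                 for i in reversed(range(4))])
        self.dec = nn.ModuleList([conv_block(chs[i] * 2, chs[i]) for i in reversed(range(4))])
        self.head = nn.Conv2d(chs[0], num_classes, 1)  # per-pixel class logits

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < 4:
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return self.head(x)  # segmentation logits at the input resolution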
In medical applications, for example, high-quality annotation data is scarce and costly to acquire, since manually annotating samples often requires highly experienced physicians. Meeting the need for large labeled data sets can therefore become quite difficult and can limit the application of deep learning in this field. Moreover, annotation for the semantic segmentation task differs from that for the classification task: classification only needs to assign one label to a given image, whereas segmentation needs to assign a label to each pixel of the image. This undoubtedly increases the annotation burden.
For example, current lesion segmentation based on breast ultrasound video generally depends on a large amount of data. For one video, the same lesion may appear in many frames, and annotating a segmentation gold standard for every frame would be an enormous workload. In practical applications, only one frame is usually annotated as training data, but this inevitably wastes a great deal of data and greatly reduces the amount of training data.
However, it should be understood by those skilled in the art that the above medical application, the lesion segmentation of breast ultrasound video, are only examples, and the training method of the image segmentation model and the image segmentation method provided by the present disclosure are not only applied in the medical field, or the lesion segmentation for breast ultrasound video, but also applied in other fields or applications where image segmentation needs to be performed.
In the present disclosure, a video is split into multiple frames of images and only one of those frames is annotated; the annotated image and the non-annotated images are then trained jointly: supervised semantic segmentation learning is performed on the annotated image, and contrastive learning is performed on positive and negative example data pairs constructed from the non-annotated images and the annotated images. This not only greatly reduces the cost of label generation and uses only limited labeled data, but also makes full use of the information in the non-annotated images and, through semi-supervised learning, greatly improves the effect of the semantic segmentation branch; in addition, the contrastive learning between annotated and unannotated images further improves the expression capability of the image segmentation model. Compared with a model trained purely on annotated images of the same data volume, the prediction capability is greatly improved.
Fig. 2 shows an alternative flow chart of the training method of the image segmentation model provided in the embodiment of the present disclosure, which will be described according to various steps.
Step S201, determining a first labeled predicted image corresponding to the first labeled sample image in the first training subset based on the first coding subnetwork and the decoding subnetwork.
In some embodiments, the image segmentation model comprises a first encoding sub-network, a decoding sub-network, a second encoding sub-network, and a transformation sub-network, wherein the transformation sub-network comprises a first transformation layer and a second transformation layer. A training device (hereinafter referred to as a first device) of the image segmentation model determines a first labeled predicted image corresponding to a first labeled sample image in the first training subset based on the first coding sub-network and the decoding sub-network. The first training subset comprises at least one group of sample image pairs, and each group of sample image pairs comprises an annotated sample image and a non-standard sample image which are acquired from the same video; the sample image pairs in different groups are derived from different videos.
In specific implementation, the first device inputs a first labeling sample image in a first training subset into the first coding sub-network to obtain a first labeling sample characteristic; and inputting the first labeled sample characteristic into the decoding sub-network to obtain the first labeled predicted image. Optionally, the first coding sub-network and the decoding sub-network are UNet model structures, and correspondingly, the first coding sub-network is an encoder and the decoding sub-network is a decoder.
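As an illustrative sketch only (attribute names such as first_encoder, decoder, second_encoder, first_transform and second_transform are assumptions introduced here for readability, not identifiers from this disclosure), the four sub-networks and the two forward paths could be organized as follows:

import torch.nn as nn

class SegmentationModel(nn.Module):
    """Illustrative composition of the four sub-networks named above."""
    def __init__(self, first_encoder, decoder, second_encoder,
                 first_transform, second_transform):
        super().__init__()
        self.first_encoder = first_encoder        # first coding sub-network
        self.decoder = decoder                    # decoding sub-network
        self.second_encoder = second_encoder      # second coding sub-network (momentum branch)
        self.first_transform = first_transform    # first transformation layer (momentum branch)
        self.second_transform = second_transform  # second transformation layer

    def forward_segmentation(self, labeled_images):
        # step S201: labeled image -> first coding sub-network -> decoding sub-network
        features = self.first_encoder(labeled_images)
        return self.decoder(features)             # first annotation prediction image

    def forward_contrast(self, labeled_images, other_images):
        # step S202: spatial feature vectors for the labeled images and the remaining images
        z_labeled = self.second_transform(self.first_encoder(labeled_images))
        z_others = self.first_transform(self.second_encoder(other_images))
        return z_labeled, z_others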
In some optional embodiments, before the first apparatus performs step S201, the first apparatus may further obtain a model training set including the first training subset from at least one video.
In specific implementation, the first device acquires at least two images in any one video based on a first time interval, annotates at least one region of a first annotation sample image in the at least two images, generates a first annotation image corresponding to the first annotation sample image based on an annotation result (the first annotation image may be shown as an output of a decoding sub-network in a segmentation supervision branch in fig. 5), and confirms that an image other than the first annotation sample image in the at least two images is a first non-standard sample image.
For example, taking the application of the image segmentation model to lesion segmentation of a breast ultrasound video frame in the medical field as an example, for a segment of complete lesion video, the first device extracts M frames at equal intervals as an image of a lesion corresponding to the lesion video, optionally labels a contour of a lesion region of one frame of image (i.e., labels a first region of any image), determines that the image after labeling is a first labeled sample image, and generates a first labeled image corresponding to the first labeled sample image based on a labeling result; and selecting one frame image from the rest M-1 frame images as a first nonstandard sample image corresponding to the first labeling sample image. Further, the first device executes the above operations on other videos to obtain a labeled sample image and a non-standard sample image corresponding to each video; and confirming that N marked sample images and N non-standard sample images corresponding to N videos are the first training subset (namely Batch), and confirming that the marked sample images and the non-standard sample images corresponding to all videos (the number of which is larger than N) are the model training set. The model training set may be divided into a plurality of training subsets (including the first training subset), and the number of labeled sample images in each training subset may be the same or different.
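The following is a minimal sketch, under the assumption that OpenCV (cv2) is available for frame extraction and that annotate_fn stands in for the manual contour annotation, of how one annotated frame and one non-standard (unannotated) partner frame could be drawn from a video as described above:

import random
import cv2  # assumed dependency for reading video frames

def build_pair_from_video(video_path, num_frames=60, annotate_fn=None):
    """Extract num_frames frames at equal intervals; one randomly chosen frame is
    annotated, one of the remaining frames becomes its unannotated partner."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_frames, 1)
    frames = []
    for idx in range(0, step * num_frames, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    labeled_idx = random.randrange(len(frames))
    labeled_image = frames[labeled_idx]
    mask = annotate_fn(labeled_image) if annotate_fn else None  # manual lesion contour
    unlabeled_pool = [f for i, f in enumerate(frames) if i != labeled_idx]
    unlabeled_image = random.choice(unlabeled_pool)
    return labeled_image, mask, unlabeled_image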
Step S202, determining spatial feature vectors corresponding to all sample images in the first training subset based on the first coding sub-network, the second coding sub-network, and the transformation sub-network.
In some embodiments, the first device inputs the first annotated sample image into the first coding subnetwork, obtaining a first annotated sample feature; and inputting the first labeling sample characteristic to a second transformation layer included in the transformation sub-network, and obtaining a spatial characteristic vector corresponding to the first labeling sample image.
In some embodiments, the first device inputs all sample images (excluding the first labeled sample image) into the second coding sub-network, and obtains corresponding sample features of all sample images; and inputting the sample features into a first transformation layer included in the transformation sub-network, and acquiring corresponding spatial feature vectors.
In a specific implementation, the first transformation layer or the second transformation layer may implement the mapping of a feature vector to a spatial feature vector, and either transformation layer may include two fully connected (FC) layers, a batch normalization (BN) layer, and an activation function (ReLU) layer. Specifically, the first transformation layer may include a first fully connected layer (first FC layer), a first batch normalization layer (first BN layer), a first activation function layer (first ReLU layer), and a second fully connected layer (second FC layer).
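A minimal sketch of such a transformation layer is given below; the input, hidden and output dimensions, and the global average pooling used to collapse the encoder feature map into a vector, are assumptions rather than values specified by this disclosure:

import torch.nn as nn

class Projector(nn.Module):
    """FC -> BN -> ReLU -> FC, mapping pooled encoder features to a spatial feature vector."""
    def __init__(self, in_dim=256, hidden_dim=256, out_dim=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # assumed: collapse the feature map first
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, feat_map):
        x = self.pool(feat_map).flatten(1)    # (B, in_dim)
        return self.net(x)                    # spatial feature vector (B, out_dim)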
Step S203, determining a first sub-loss value of the image segmentation model based on a first annotation image corresponding to the first annotation sample image and the first annotation predicted image; and determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all the sample images in the first training subset.
In some embodiments, the loss value of the image segmentation model comprises the first sub-loss value and the second sub-loss value, and the first device determines the first sub-loss value of the image segmentation model based on the first annotation image corresponding to the first annotated sample image and the first annotation prediction image.
In particular implementations, the first device determines a first sub-loss value of the image segmentation model based on the prediction probability value for each pixel in the first annotated sample image in the first annotated prediction image, the total number of annotation classes, and the number of sample image pairs in the first training subset.
In other embodiments, the first device determines the second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all the sample images.
In specific implementation, the first device determines the second sub-loss value of the image segmentation model based on the distance between the spatial feature vector of the first labeled sample image and the spatial feature vector of the first non-standard sample image corresponding to the first labeled sample image, and the distances between the spatial feature vector of the first labeled sample image and the spatial feature vectors of all the sample images (since the spatial feature vector of the first labeled sample image coincides with itself, i.e., its distance to itself is 0, the distances computed over all the sample images and the distances computed over all the sample images except the first labeled sample image are numerically the same).
Step S204, training the image segmentation model based on the loss value of the image segmentation model.
In some embodiments, the first device updates the model parameters of the first encoding subnetwork, the decoding subnetwork, and the second transformation layer in the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model; and performs a momentum update of the model parameters of the second coding sub-network and the first transformation layer based on the updated model parameters of the first coding sub-network.
In other embodiments, the first device trains the image segmentation model based on the first encoding subnetwork, the second encoding subnetwork, the decoding subnetwork, and the transformation sub-network after the model parameters are updated, and a second training subset.
The second training subset comprises at least one group of sample image pairs, and each group of sample image pairs comprises an annotated sample image and a non-standard sample image which are acquired from the same video; the sample image pairs in different groups are derived from different videos. The sample image pairs included in the first training subset and the sample image pairs included in the second training subset are derived from different videos.
As such, the training method for an image segmentation model provided by an embodiment of the present disclosure, in which the image segmentation model includes a first coding sub-network, a decoding sub-network, a second coding sub-network, and a transformation sub-network, includes: determining a first annotation prediction image corresponding to a first annotation sample image in a first training subset based on the first coding sub-network and the decoding sub-network; determining spatial feature vectors corresponding to all sample images in the first training subset based on the first coding sub-network, the second coding sub-network and the transformation sub-network; determining a first sub-loss value of the image segmentation model based on a first annotation image corresponding to the first annotation sample image and the first annotation prediction image; determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all sample images in the first training subset; and training the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model. The first training subset comprises at least one group of sample image pairs, each group of sample image pairs comprises an annotated sample image and a non-standard sample image acquired from the same video, and the sample image pairs in different groups are derived from different videos. The image segmentation model is trained jointly on the annotated images and the non-standard (i.e., unannotated) images, so that the unannotated images are used effectively and the cost of image annotation is reduced; and the loss value of the image segmentation model is determined based on the spatial feature vectors corresponding to the annotated and unannotated images, which improves the expression capability and the prediction capability of the image segmentation model.
Fig. 3 shows an alternative flowchart of the image segmentation method provided by the embodiment of the present disclosure, which will be described according to various steps.
Step S301, inputting an image to be segmented into a first coding sub-network included in the image segmentation model, and determining a first feature image corresponding to the image to be segmented.
In some embodiments, an image segmentation apparatus (hereinafter referred to as a second apparatus) inputs an image to be segmented into a first coding subnetwork based on the image segmentation model trained in steps S201 to S204, and determines a first feature image of the image to be segmented.
Step S302, inputting the first characteristic image into a decoding sub-network included in the image segmentation model, and determining segmentation results of different categories in the image to be segmented.
In some embodiments, the second device inputs the first feature image into a decoding sub-network included in the image segmentation model, and determines segmentation results of different classes in the image to be segmented. Wherein, the segmentation result may include a category of each pixel of the image to be segmented.
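As an illustrative sketch of this inference path (it assumes the SegmentationModel organization sketched earlier, with first_encoder and decoder attributes; these names are assumptions, not identifiers from this disclosure):

import torch

@torch.no_grad()
def segment(model, image):
    """Inference uses only the first coding sub-network and the decoding sub-network."""
    model.eval()
    feats = model.first_encoder(image.unsqueeze(0))   # first feature image
    logits = model.decoder(feats)                     # per-pixel class logits
    return logits.argmax(dim=1).squeeze(0)            # class index for every pixel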
Therefore, with the image segmentation method provided by the embodiment of the present disclosure, the segmentation results of different classes in the image to be segmented are determined based on the image segmentation model trained in steps S201 to S204, so that image segmentation can be performed more efficiently and accurately, providing strong support for subsequent applications of the image.
Fig. 4 shows another alternative flowchart of the training method of the image segmentation model provided by the embodiment of the present disclosure, and fig. 5 shows a data diagram of the training method of the image segmentation model provided by the embodiment of the present disclosure, which will be described with reference to fig. 4 and fig. 5.
Step S401, a model training set is obtained.
In some embodiments, the model training set is obtained based on at least two videos, the first device obtains at least two images in any one video based on a first time interval, annotates at least one region of a first annotated sample image in the at least two images, generates a first annotated image corresponding to the first annotated sample image based on an annotation result (the first annotated image may be shown as an output of a decoding sub-network in a segmentation supervision branch in fig. 5), and confirms that an image other than the first annotated sample image in the at least two images is a first non-annotated sample image. The first time interval may be set according to actual requirements.
For example, taking the application of the image segmentation model to lesion segmentation of a breast ultrasound video frame in the medical field as an example, for a segment of complete lesion video, the first device extracts M frames at equal intervals as an image of a lesion corresponding to the lesion video, optionally labels a contour of a lesion region (i.e., labels a first region of any image) in one frame of image, determines that the image after labeling is a first labeled sample image, and generates a first labeled image corresponding to the first labeled sample image based on a labeling result; and selecting one frame image from the rest M-1 frame images as a first non-standard sample image corresponding to the first labeling sample image. Further, the first device executes the above operations on other videos to obtain a labeled sample image and a non-standard sample image corresponding to each video; and confirming that N marked sample images and N non-standard sample images corresponding to N videos are the first training subset (namely Batch), and confirming that the marked sample images and the non-standard sample images corresponding to all videos (the number of which is larger than N) are the model training set. The model training set may be divided into a plurality of training subsets (including the first training subset), and the number of labeled sample images in each training subset may be the same or different.
Further, the description is given taking M = 60 as an example. The first device extracts 60 frames of images at equal intervals as the images of the lesion in the breast ultrasound video, randomly selects one frame and annotates the contour of its lesion region (i.e., annotates the first region of that image) to obtain an annotated image, uses the annotation result as the label for supervised image segmentation learning, and leaves the remaining 59 frames unannotated. The annotated image of a video and each unannotated image of the same video can form a sample image pair (a similar, positive example); two images derived from different videos form a dissimilar, negative example; contrastive learning based on these positive and negative examples can then be performed subsequently (i.e., step S403 and step S405).
In some alternative embodiments, the image segmentation model may be divided into two branches: the supervision branch and the comparative learning branch are segmented. Wherein the segmentation supervision branch may be composed of a first coding sub-network and a decoding sub-network, and the contrast learning branch may be composed of a second coding sub-network and a transformation sub-network.
For a Batch of images (the images in one training subset) comprising N annotated sample images and N non-standard sample images, each non-standard sample image in the Batch is obtained by randomly extracting one image from all non-standard sample images of the lesion video to which the corresponding annotated sample image belongs. An annotated sample image x_i and the non-standard sample image x_j belonging to the same video form a sample image pair <x_i, x_j>, and the two are positive examples of each other; x_i and the other 2N-2 images in the Batch (including annotated sample images and non-standard sample images, i.e., all images in the upper and lower branches except x_j) are negative examples of each other. The segmentation supervision branch performs supervised training on the annotated sample images: the annotated sample image x_i is input into the segmentation supervision branch, an annotation prediction image (also called a segmentation probability map) of the same size as the input is output, and the cross-entropy loss is calculated between this prediction and the annotation image of x_i. The contrast branch extracts spatial feature vectors from all sample images and performs feature contrastive learning together with the annotated sample images. The specific training mode is as follows:
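A minimal sketch of this batch composition is given below (the dictionary keys and the sampling strategy are assumptions introduced for illustration; the disclosure only requires that each pair come from one video and that different pairs come from different videos):

import random

def sample_batch(videos, n):
    """Pick n videos; take each video's annotated frame plus one of its unannotated
    frames. Frames originating from different videos act as negatives of each other."""
    chosen = random.sample(videos, n)
    labeled, masks, unlabeled = [], [], []
    for v in chosen:
        labeled.append(v["labeled_image"])
        masks.append(v["mask"])
        unlabeled.append(random.choice(v["unlabeled_images"]))
    return labeled, masks, unlabeled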
step S402, based on the first coding sub-network and the decoding sub-network, determining a first labeling predicted image corresponding to a first labeling sample image in the first training sub-set.
In some embodiments, the first device determines a first annotation predictive image corresponding to a first annotation sample image in the first training subset based on the first encoding subnetwork and the decoding subnetwork.
In specific implementation, the first device inputs a first labeling sample image in a first training subset into the first coding sub-network to obtain a first labeling sample characteristic; and inputting the first labeling sample characteristic into the decoding sub-network to obtain the first labeling prediction image. Optionally, the first coding sub-network and the decoding sub-network are UNet model structures, and correspondingly, the first coding sub-network is an encoder and the decoding sub-network is a decoder.
As shown in fig. 5, the first annotation sample image passes through the first encoding sub-network and then enters the decoding sub-network to obtain the first annotation prediction image, and a first sub-loss value is determined based on the first annotation image corresponding to the first annotation sample image and the first annotation prediction image.
Step S403, determining spatial feature vectors corresponding to all sample images in the first training subset based on the first coding subnetwork, the second coding subnetwork, and the transformation subnetwork.
In some embodiments, the first device inputs all sample images except the first labeled sample image into the second coding sub-network, and obtains sample features corresponding to all sample images except the first labeled sample image; and inputting the sample features into the first transform layer (Projector structure) to obtain corresponding spatial feature vectors.
In a specific implementation, the first transformation layer may implement mapping of a feature vector to a spatial feature vector, and the first transformation layer may include a first full connection layer, a first batch normalization layer, a first activation function layer, and a second full connection layer. And the first device inputs the sample features corresponding to all the sample images except the first marked sample image into the Projector structure, and after passing through a first fc layer, a first BN layer, a first ReLU layer and a second fc layer, the sample features are mapped into corresponding spatial feature vectors.
As shown in fig. 5, the first apparatus may input all sample images except the first labeled sample image in the first training subset to the first transform layer, and obtain corresponding spatial feature vectors.
In other embodiments, the apparatus may further input the output (i.e. the first labeled sample feature) of the first coding sub-network in step S402 into a second transformation layer included in the transformation sub-network, so as to obtain a spatial feature vector corresponding to the first labeled sample image.
Step S404, determining a first sub-loss value of the image segmentation model based on a first labeled image and the first labeled predicted image.
In some embodiments, the first means determines the first sub-loss value of the image segmentation model based on the Label class (Label) of each region in the first labeled image and the first labeled predicted image.
In some alternative embodiments, the first means determines the cross-entropy loss for the annotated Label of the first annotated image and the first annotated predictive image as the first sub-loss value.
In a specific implementation, the first device determines the first sub-loss value of the image segmentation model based on the labeling class of each pixel in the first labeled image, the prediction probability value of each pixel in the first labeled sample image in the first labeled predicted image, the total number of labeling classes, and the number of sample image pairs in the first training subset.
Specifically, the first device may determine the first sub-Loss value Loss1 according to the following formula:
Loss1 = -(1/N) × Σ_{i=1..N} Σ_{k=1..K} y_i,k × log(p_i,k)    (1)
wherein N is the number of sample image pairs in the first training subset; K is the total number of annotation classes (i.e., the total number of classes contained in the image; for example, if the image contains only background and lesion, the total number of annotation classes is 2, that is, the annotated label has only 2 classes; optionally, the background region in the final segmentation result is represented by pixel value 0 and the lesion region by pixel value 1, and those skilled in the art will understand that the pixel identifiers of different regions are not limited to 0 and 1, which are used only for distinction); y is the pixel value of a class (the pixel values 0 and 1 mentioned above), and y_i,k characterizes the pixel value of the k-th class in the first annotation image determined from the first annotation sample image x_i (which can be shown as the output of the decoding sub-network in the segmentation supervision branch in fig. 5); p is the prediction probability value, and p_i,k characterizes the prediction probability value of the k-th class in the first annotation prediction image (with the same index i as the first annotation sample image).
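A minimal sketch of this first sub-loss value, reconstructed from the description of formula (1) as a pixel-wise cross entropy averaged over the batch (tensor shapes are assumptions):

import torch.nn.functional as F

def first_sub_loss(pred_logits, label_mask):
    """Cross entropy between the annotation prediction image and the annotation image.
    pred_logits: (N, K, H, W) raw per-pixel class scores; label_mask: (N, H, W) class indices."""
    return F.cross_entropy(pred_logits, label_mask)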
Step S405, determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all the sample images.
In some embodiments, the first device determines a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all the sample images.
In specific implementation, the first device determines the second sub-loss value of the image segmentation model based on the distance between the spatial feature vector of the first labeled sample image and the spatial feature vector of the first non-standard sample image corresponding to the first labeled sample image, and the distances between the spatial feature vector of the first labeled sample image and the spatial feature vectors of all the sample images (since the spatial feature vector of the first labeled sample image coincides with itself, i.e., its distance to itself is 0, the distances can equivalently be determined over all the sample images or over all the sample images except the first labeled sample image, and the two results are numerically the same).
In some embodiments, the contrast learning branch (i.e., the branch formed by the second encoding subnetwork and the transformation subnetwork) adopts a symmetrical structure: a second coding subnetwork with a UNet-Encoder structure extracts the sample features corresponding to the sample images, and a linear transformation structure, the Projector (the first transformation layer), maps the sample features to the corresponding spatial feature vectors z. For example, the first non-standard sample image x_j corresponding to the first annotated sample image x_i passes through the second coding subnetwork and the first transformation layer to obtain the spatial feature vector z_j.
Further, the first apparatus performs contrast learning on the spatial feature vector corresponding to the first labeled sample image and the spatial feature vectors corresponding to all the sample images in the first training subset, and in order to make the positive example distance closer (i.e. the distance between the spatial feature vector of the first labeled sample image and the spatial feature vector of the first non-standard sample image is closest), and the negative example distance farther (i.e. the distance between the spatial feature vector of the first labeled sample image and the spatial feature vectors of other sample images is farther), the second sub-loss is determined based on the InfoNCE loss function, and can be represented by the following formula:
Loss2 = -(1/N) × Σ_{i=1..N} log( exp(sim(z_i, z_j)/τ) / (exp(sim(z_i, z_j)/τ) + Σ_n exp(sim(z_i, z_n)/τ)) )    (2)
wherein sim(z_i, z_j) = z_i^T z_j / (‖z_i‖_2 ‖z_j‖_2) characterizes the cosine similarity between the spatial feature vector z_i and the spatial feature vector z_j, z characterizes a spatial feature vector, z_i characterizes the spatial feature vector corresponding to the first annotated sample image, z_j characterizes the spatial feature vector corresponding to the first non-standard sample image, z_n characterizes the spatial feature vectors of the non-standard sample images in the first training subset other than the first non-standard sample image, and τ characterizes the temperature hyper-parameter; optionally, τ = 0.1 or τ = 0.2 in embodiments of the present disclosure. It should be understood by those skilled in the art that τ can take other set values, for example selected based on experimental results; 0.1 and 0.2 in the present disclosure are only examples and are not intended to limit the range of τ.
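A sketch of this second sub-loss value is given below, following formula (2) as reconstructed above in an InfoNCE style (the treatment of the other annotated images as additional negatives, and the tensor shapes, are assumptions):

import torch
import torch.nn.functional as F

def second_sub_loss(z_labeled, z_unlabeled, tau=0.1):
    """For the i-th annotated image the positive key is the unannotated frame from the
    same video (row i of z_unlabeled); the other images in the batch act as negatives.
    z_labeled, z_unlabeled: (N, D) spatial feature vectors."""
    q = F.normalize(z_labeled, dim=1)              # unit vectors so dot product = cosine sim
    k = F.normalize(z_unlabeled, dim=1)
    keys = torch.cat([q, k], dim=0)                # 2N candidate vectors
    logits = q @ keys.t() / tau                    # (N, 2N) similarities / temperature
    n = q.size(0)
    # exclude each query's similarity with itself (its own copy sits in column i)
    self_mask = torch.eye(n, 2 * n, dtype=torch.bool, device=logits.device)
    logits = logits.masked_fill(self_mask, float('-inf'))
    targets = torch.arange(n, device=logits.device) + n   # positive key is column n + i
    return F.cross_entropy(logits, targets)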
Step S406, training the image segmentation model based on the loss value of the image segmentation model.
In some embodiments, a loss value of the image segmentation model may be determined from the first sub-loss and the second sub-loss as shown in the following equation:
Loss = α × Loss1 + (1 - α) × Loss2    (3)
wherein, α represents a weight coefficient greater than 0 and less than or equal to 1, which can be set according to actual requirements, and optionally α =0.9. It should be understood by those skilled in the art that α can also be other set values, for example, it can be selected according to experimental results, and 0.9 in the present disclosure is only an example and is not used to limit the range of α.
In some embodiments, at the beginning of the training of the image segmentation model, the parameters of the image segmentation model are initialized randomly. When a Batch calculation (i.e., training on one training subset) is completed and back propagation is performed according to the loss value of the image segmentation model calculated as above (step S402 to step S406), the loss value is only used to optimize the segmentation supervision branch (the first coding sub-network and the decoding sub-network) and the predictor for the annotated images in the contrast learning branch (i.e., the second transformation layer). In the contrast learning branch, a Momentum update is adopted for the part that processes the unannotated sample images (the second coding sub-network and the first transformation layer):
ξ = m × ξ + (1 - m) × θ    (4)
where θ denotes the model parameters of the Encoder of the segmentation supervision branch (the first coding sub-network) and of the Projector structure of the contrast learning branch (the second transformation layer), ξ denotes the model parameters of the corresponding structures of the contrast learning branch (the second coding sub-network and the first transformation layer), and m is a weight adjustment coefficient; in the present disclosure m may be between 0.9 and 0.99, optionally m = 0.9. It will be understood by those skilled in the art that m can take other set values, for example selected based on experimental results; the range 0.9 to 0.99 and the value 0.9 in the present disclosure are only examples and are not intended to limit the range of m. This parameter updating mode allows the parameters of the unsupervised branch (i.e., the contrast learning branch) to be updated slowly and stably toward their optimal values, effectively preventing the model from falling into a local optimum.
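The combined loss of formula (3) and the momentum update of formula (4) could be tied together in one training step roughly as follows (a sketch only: the model and batch field names are assumptions carried over from the earlier sketches, the optimizer is assumed to hold only the parameters of first_encoder, decoder and second_transform, and α = 0.9, m = 0.9 follow the optional values given above):

import torch

def train_step(model, optimizer, batch, alpha=0.9, m=0.9):
    labeled_images, label_masks, unlabeled_images = batch

    # segmentation supervision branch (steps S402, S404)
    pred_logits = model.forward_segmentation(labeled_images)
    loss1 = first_sub_loss(pred_logits, label_masks)

    # contrast learning branch (steps S403, S405); the momentum side receives no gradient
    z_labeled = model.second_transform(model.first_encoder(labeled_images))
    with torch.no_grad():
        z_unlabeled = model.first_transform(model.second_encoder(unlabeled_images))
    loss2 = second_sub_loss(z_labeled, z_unlabeled)

    loss = alpha * loss1 + (1 - alpha) * loss2     # formula (3)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # formula (4): xi = m * xi + (1 - m) * theta
    with torch.no_grad():
        for theta, xi in zip(model.first_encoder.parameters(),
                             model.second_encoder.parameters()):
            xi.mul_(m).add_((1 - m) * theta)
        for theta, xi in zip(model.second_transform.parameters(),
                             model.first_transform.parameters()):
            xi.mul_(m).add_((1 - m) * theta)
    return loss.item()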
Based on the above steps S401 to S406, semi-supervised learning is jointly performed on the labeled image and the non-labeled image in the video, so that the corresponding image segmentation model can be obtained.
Therefore, according to the training method of the image segmentation model provided by the embodiment of the present disclosure, a video is split into multiple frames of images, only one frame is annotated, the annotated image and the non-annotated images are trained jointly, supervised semantic segmentation learning is performed on the annotated image, and positive and negative example data pairs are constructed from the non-annotated images and the annotated images for contrastive learning. This not only greatly reduces the cost of label generation and uses only limited labeled data (annotated images), but also makes full use of the information in the non-annotated images and, through semi-supervised learning, greatly improves the effect of the semantic segmentation branch.
Fig. 6 is a schematic diagram illustrating an alternative structure of a training apparatus for an image segmentation model according to an embodiment of the present disclosure, which will be described according to various parts.
In some embodiments, the training apparatus 600 of the image segmentation model may include a first determination unit 601, a second determination unit 602, a third determination unit 603, and a training unit 604.
The first determining unit 601 is configured to determine, based on the first coding sub-network and the decoding sub-network, a first labeled prediction image corresponding to a first labeled sample image in the first training sub-set;
the second determining unit 602 is configured to determine, based on the first coding sub-network, the second coding sub-network, and the transformation sub-network, spatial feature vectors corresponding to all sample images in the first training subset;
the third determining unit 603 is configured to determine a first sub-loss value of the image segmentation model based on a first labeled image corresponding to the first labeled sample image and the first labeled predicted image; determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all sample images in the first training subset;
the training unit 604 is configured to train the image segmentation model based on a first sub-loss value and a second sub-loss value of the image segmentation model;
the first training subset comprises at least one group of sample image pairs, and each group of sample image pairs comprises an annotated sample image and a non-standard sample image which are acquired from the same video; different sets of sample images differ from the corresponding video.
In some embodiments, the training apparatus 600 for an image segmentation model may further include an obtaining unit 605.
The obtaining unit 605 is specifically configured to obtain a model training set including a first training subset from at least one video before determining a first labeled predicted image corresponding to a first labeled sample image in the first training subset.
The obtaining unit 605 is specifically configured to perform the following operations on each of the at least one video:
the method comprises the steps of obtaining at least two images in any video based on a first time interval, labeling at least one region of a first labeled sample image in the at least two images, generating a first labeled image corresponding to the first labeled sample image based on a labeling result, and confirming that the images except the first labeled sample image in the at least two images are first non-standard sample images.
The first determining unit 601 is specifically configured to input a first labeled sample image in the first training subset into the first coding sub-network to obtain a first labeled sample feature, and to input the first labeled sample feature into the decoding sub-network to obtain the first labeled prediction image.
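The supervised branch described above can be sketched as follows, assuming a PyTorch implementation; the module name SupervisedBranch and the choice of concrete encoder/decoder architectures are assumptions, since the disclosure does not fix them.

```python
# Hedged sketch: first coding sub-network + decoding sub-network for the
# labeled image (module names are assumptions).
import torch
import torch.nn as nn

class SupervisedBranch(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # first coding sub-network
        self.decoder = decoder  # decoding sub-network

    def forward(self, labeled_image: torch.Tensor) -> torch.Tensor:
        feature = self.encoder(labeled_image)  # first labeled sample feature
        return self.decoder(feature)           # first labeled prediction image, (N, C, H, W)
```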
The second determining unit 602 is specifically configured to: input the sample images other than the first labeled sample image in the first training subset into the second coding sub-network to obtain the sample features corresponding to those sample images, and input the sample features into a first transformation layer included in the transformation sub-network to obtain the corresponding spatial feature vectors; and input the first labeled sample image in the first training subset into the first coding sub-network to obtain a first labeled sample feature, and input the first labeled sample feature into a second transformation layer included in the transformation sub-network to obtain the spatial feature vector corresponding to the first labeled sample image.
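The two transformation layers can be read as projection heads that map feature maps to spatial feature vectors. The sketch below assumes PyTorch and a simple pool-and-project head; the layer sizes and the L2 normalization are assumptions not stated in the disclosure.

```python
# Hedged sketch of a transformation layer as a pool-and-project head that maps
# a feature map to an L2-normalized spatial feature vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    def __init__(self, in_dim: int, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),      # pool the feature map to one vector per image
            nn.Flatten(),
            nn.Linear(in_dim, out_dim),
        )

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(feature_map), dim=1)  # spatial feature vector

# Assumed wiring: the second transformation layer receives first-encoder features
# of the labeled image; the first transformation layer receives second-encoder
# features of the unlabeled images.
```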
The third determining unit 603 is specifically configured to determine the first sub-loss value of the image segmentation model based on the prediction probability value, in the first labeled prediction image, of each pixel of the first labeled sample image, the total number of annotation classes, and the number of sample image pairs in the first training subset.
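One hedged reading of this first sub-loss is a pixel-wise cross-entropy over the first labeled prediction image, normalized by the quantities listed above; the exact weighting used in the disclosure may differ.

```python
# Hedged sketch of the first sub-loss (pixel-wise cross-entropy); the division
# by class and pair counts is an assumption based on the quantities listed above.
import torch
import torch.nn.functional as F

def first_sub_loss(prediction: torch.Tensor,  # (N, C, H, W) logits
                   target: torch.Tensor,      # (N, H, W) class indices
                   num_classes: int,
                   num_pairs: int) -> torch.Tensor:
    pixel_ce = F.cross_entropy(prediction, target, reduction="mean")
    return pixel_ce / (num_classes * max(num_pairs, 1))
```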
The third determining unit 603 is specifically configured to determine a second sub-loss value of the image segmentation model based on a distance between the spatial feature vector of the first labeled sample image and the spatial feature vectors corresponding to all the sample images.
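A plausible concrete form of this second sub-loss is an InfoNCE-style contrastive loss in which, for each labeled image's spatial feature vector, the vector of the unlabeled frame from the same video is the positive and vectors from other videos are negatives; the temperature value and this exact formulation are assumptions consistent with the contrastive-learning description above.

```python
# Hedged InfoNCE-style sketch of the second sub-loss: row i of `unlabeled`
# is the positive for row i of `labeled`; all other rows act as negatives.
import torch
import torch.nn.functional as F

def second_sub_loss(labeled: torch.Tensor,    # (N, D) spatial feature vectors of labeled images
                    unlabeled: torch.Tensor,  # (N, D) vectors of unlabeled images from the same videos
                    temperature: float = 0.07) -> torch.Tensor:
    logits = labeled @ unlabeled.t() / temperature   # similarities (vectors are L2-normalized)
    targets = torch.arange(labeled.size(0), device=labeled.device)
    return F.cross_entropy(logits, targets)          # diagonal entries are the positive pairs
```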
The training unit 604 is specifically configured to update model parameters of the first coding sub-network, the decoding sub-network, and the second transformation layer in the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model; and to update, by momentum, the model parameters of the second coding sub-network and of the first transformation layer based on the updated model parameters of the first coding sub-network.
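Updating parameters "by momentum" matches an exponential-moving-average scheme, as used in MoCo-style training. The sketch below assumes PyTorch; the momentum value 0.99, and the choice of letting the first transformation layer track the second one, are assumptions.

```python
# Hedged sketch of one optimization step with a momentum (EMA) update.
import torch

@torch.no_grad()
def momentum_update(online: torch.nn.Module, target: torch.nn.Module, m: float = 0.99):
    # Target parameters slowly track the online parameters.
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.data.mul_(m).add_(p_online.data, alpha=1.0 - m)

def train_step(optimizer, loss_1, loss_2, encoder_1, encoder_2, head_1, head_2):
    loss = loss_1 + loss_2          # first and second sub-loss values
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                # updates first encoder, decoder, second transformation layer
    momentum_update(encoder_1, encoder_2)  # second coding sub-network tracks the first
    momentum_update(head_2, head_1)        # assumption: first transformation layer tracks the second
    return loss.item()
```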
The training unit 604 is specifically configured to train the image segmentation model based on the first coding sub-network, the second coding sub-network, the decoding sub-network, and the transformation layers with updated model parameters, together with a second training subset.
Fig. 7 is a schematic diagram illustrating an alternative structure of an image segmentation apparatus provided in an embodiment of the present disclosure, which will be described according to various parts.
In some embodiments, the image segmentation apparatus 700 comprises an encoding unit 701 and a decoding unit 702.
The encoding unit 701 is configured to input an image to be segmented into a first coding sub-network included in the image segmentation model, and determine a first feature image corresponding to the image to be segmented;
the decoding unit 702 is configured to input the first feature image into a decoding sub-network included in the image segmentation model, and determine segmentation results of different categories in the image to be segmented.
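Inference with the trained model then reduces to encoding and decoding the image to be segmented and taking a per-pixel argmax; a minimal PyTorch sketch, assuming a single CHW tensor as input, is shown below.

```python
# Hedged inference sketch with the trained first coding sub-network and decoder.
import torch

@torch.no_grad()
def segment(image: torch.Tensor, encoder_1: torch.nn.Module, decoder: torch.nn.Module) -> torch.Tensor:
    feature = encoder_1(image.unsqueeze(0))   # first feature image
    logits = decoder(feature)                 # (1, num_classes, H, W)
    return logits.argmax(dim=1).squeeze(0)    # per-pixel class indices, (H, W)
```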
The present disclosure also provides an electronic device and a readable storage medium according to an embodiment of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the respective methods and processes described above, such as the training method of the image segmentation model or the image segmentation method. For example, in some embodiments, the training method of the image segmentation model or the image segmentation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method of the image segmentation model or the image segmentation method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method of the image segmentation model or the image segmentation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
The above description covers only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. A method of training an image segmentation model, the image segmentation model comprising a first coding sub-network, a decoding sub-network, a second coding sub-network, and a transformation sub-network, the method comprising:
determining a first annotation predictive image corresponding to a first annotation sample image in a first training subset based on the first coding subnetwork and the decoding subnetwork;
all sample images in the first training subset comprise a first labeled sample image and sample images in the first training subset except the first labeled sample image;
inputting the sample images other than the first labeled sample image among all the sample images into the second coding sub-network to obtain sample features corresponding to those sample images; inputting the sample features into a first transformation layer included in the transformation sub-network to obtain spatial feature vectors corresponding to those sample images;
inputting the first labeled sample image in the first training subset into the first coding sub-network to obtain a first labeled sample feature; inputting the first labeled sample feature into a second transformation layer included in the transformation sub-network to obtain a spatial feature vector corresponding to the first labeled sample image; wherein the spatial feature vectors corresponding to the sample images other than the first labeled sample image and the spatial feature vector corresponding to the first labeled sample image are the spatial feature vectors corresponding to all the sample images in the first training subset;
determining a first sub-loss value of the image segmentation model based on a first labeled image corresponding to the first labeled sample image and the first labeled predicted image; determining a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all sample images in the first training subset;
training the image segmentation model based on a first sub-loss value and a second sub-loss value of the image segmentation model;
the first training subset comprises at least one group of sample image pairs, each group of sample image pairs comprising a labeled sample image and an unlabeled sample image acquired from the same video; different groups of sample image pairs correspond to different videos.
2. The method of claim 1, wherein prior to determining the first annotated prediction image corresponding to the first annotated sample image in the first training subset based on the first encoding subnetwork and the decoding subnetwork, the method further comprises:
a training set of models is obtained from at least one video that includes the first training subset.
3. The method of claim 2, wherein the obtaining, from at least one video, the model training set including the first training subset comprises performing the following operations on each of the at least one video:
obtaining at least two images from any one of the videos based on a first time interval; labeling at least one region of a first labeled sample image among the at least two images; generating a first labeled image corresponding to the first labeled sample image based on the labeling result; and taking the images other than the first labeled sample image among the at least two images as first unlabeled sample images.
4. The method of claim 1, wherein determining the first annotated prediction image corresponding to the first annotated sample image in the first training subset based on the first encoding subnetwork and the decoding subnetwork comprises:
inputting a first labeling sample image in a first training subset into the first coding sub-network to obtain a first labeling sample characteristic;
and inputting the first labeled sample characteristic into the decoding sub-network to obtain the first labeled predicted image.
5. The method of claim 1, wherein the determining a first sub-loss value of the image segmentation model based on a first labeled image corresponding to the first labeled sample image and the first labeled predicted image comprises:
determining the first sub-loss value of the image segmentation model based on the prediction probability value, in the first labeled prediction image, of each pixel in the first labeled sample image, the total number of annotation classes, and the number of sample image pairs in the first training subset.
6. The method according to claim 5, wherein the determining a second sub-loss value of the image segmentation model based on the corresponding spatial feature vectors of all the sample images comprises:
and determining a second sub-loss value of the image segmentation model based on the distance between the spatial feature vector of the first labeled sample image and the spatial feature vectors corresponding to all the sample images.
7. The method of claim 1, wherein training the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model comprises:
updating model parameters of a first coding sub-network, a decoding sub-network and a second transform layer in the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model;
and updating, by momentum, the model parameters of the second coding sub-network and of the first transformation layer based on the updated model parameters of the first coding sub-network.
8. The method of claim 7, wherein training the image segmentation model based on the first sub-loss value and the second sub-loss value of the image segmentation model further comprises:
training the image segmentation model based on the first coding sub-network, the second coding sub-network, the decoding sub-network, and the transformation layers after updating the model parameters, and a second training subset.
9. An image segmentation method implemented on the basis of an image segmentation model trained by the method of any one of claims 1 to 8, the method comprising:
inputting an image to be segmented into a first coding sub-network included in the image segmentation model, and determining a first feature image corresponding to the image to be segmented;
inputting the first feature image into a decoding sub-network included in the image segmentation model, and determining segmentation results of different categories in the image to be segmented.
10. An apparatus for training an image segmentation model, wherein the image segmentation model comprises a first coding sub-network, a decoding sub-network, a second coding sub-network and a transformation sub-network, the apparatus comprising:
a first determining unit, configured to determine, based on the first coding sub-network and the decoding sub-network, a first labeled prediction image corresponding to a first labeled sample image in a first training subset;
all sample images in the first training subset comprise a first labeled sample image and sample images in the first training subset except the first labeled sample image;
a second determining unit, configured to input the sample images other than the first labeled sample image among all the sample images into the second coding sub-network to obtain sample features corresponding to those sample images, and input the sample features into a first transformation layer included in the transformation sub-network to obtain spatial feature vectors corresponding to those sample images; and input the first labeled sample image in the first training subset into the first coding sub-network to obtain a first labeled sample feature, and input the first labeled sample feature into a second transformation layer included in the transformation sub-network to obtain a spatial feature vector corresponding to the first labeled sample image; wherein the spatial feature vectors corresponding to the sample images other than the first labeled sample image and the spatial feature vector corresponding to the first labeled sample image are the spatial feature vectors corresponding to all the sample images in the first training subset;
a third determining unit, configured to determine a first sub-loss value of the image segmentation model based on a first labeled image corresponding to the first labeled sample image and the first labeled predicted image; and determine a second sub-loss value of the image segmentation model based on the spatial feature vectors corresponding to all the sample images in the first training subset;
a training unit, configured to train the image segmentation model based on a first sub-loss value and a second sub-loss value of the image segmentation model;
the first training subset comprises at least one group of sample image pairs, each group of sample image pairs comprising a labeled sample image and an unlabeled sample image acquired from the same video; different groups of sample image pairs correspond to different videos.
11. An image segmentation apparatus implemented based on an image segmentation model trained by the method of any one of claims 1 to 8, the apparatus comprising:
the image segmentation model comprises an encoding unit, a segmentation unit and a segmentation unit, wherein the encoding unit is used for inputting an image to be segmented into a first encoding sub-network included in the image segmentation model and determining a first characteristic image corresponding to the image to be segmented;
and the decoding unit is used for inputting the first characteristic image into a decoding sub-network included in the image segmentation model and determining segmentation results of different categories in the image to be segmented.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8;
or to perform the method of claim 9.
13. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8;
or to perform the method of claim 9.
CN202210282010.3A 2022-03-21 2022-03-21 Training method and device for image segmentation model, electronic equipment and storage medium Active CN114627296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210282010.3A CN114627296B (en) 2022-03-21 2022-03-21 Training method and device for image segmentation model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210282010.3A CN114627296B (en) 2022-03-21 2022-03-21 Training method and device for image segmentation model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114627296A CN114627296A (en) 2022-06-14
CN114627296B true CN114627296B (en) 2022-11-08

Family

ID=81903336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210282010.3A Active CN114627296B (en) 2022-03-21 2022-03-21 Training method and device for image segmentation model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114627296B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294400B (en) * 2022-08-23 2023-03-31 北京医准智能科技有限公司 Training method and device for image classification model, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583291A (en) * 2020-04-20 2020-08-25 中山大学 Layer segmentation method and system for retina layer and effusion region based on deep learning
CN111899244A (en) * 2020-07-30 2020-11-06 北京推想科技有限公司 Image segmentation method, network model training method, device and electronic equipment
US10839565B1 (en) * 2019-08-19 2020-11-17 Samsung Electronics Co., Ltd. Decoding apparatus and operating method of the same, and artificial intelligence (AI) up-scaling apparatus and operating method of the same
CN112085739A (en) * 2020-08-20 2020-12-15 深圳力维智联技术有限公司 Semantic segmentation model training method, device and equipment based on weak supervision
CN113935957A (en) * 2021-09-26 2022-01-14 平安科技(深圳)有限公司 Medical image comparison method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517759B (en) * 2019-08-29 2022-03-25 腾讯医疗健康(深圳)有限公司 Method for determining image to be marked, method and device for model training
CN114049516A (en) * 2021-11-09 2022-02-15 北京百度网讯科技有限公司 Training method, image processing method, device, electronic device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839565B1 (en) * 2019-08-19 2020-11-17 Samsung Electronics Co., Ltd. Decoding apparatus and operating method of the same, and artificial intelligence (AI) up-scaling apparatus and operating method of the same
CN111583291A (en) * 2020-04-20 2020-08-25 中山大学 Layer segmentation method and system for retina layer and effusion region based on deep learning
CN111899244A (en) * 2020-07-30 2020-11-06 北京推想科技有限公司 Image segmentation method, network model training method, device and electronic equipment
CN112085739A (en) * 2020-08-20 2020-12-15 深圳力维智联技术有限公司 Semantic segmentation model training method, device and equipment based on weak supervision
CN113935957A (en) * 2021-09-26 2022-01-14 平安科技(深圳)有限公司 Medical image comparison method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Representation Learning with Contrastive Predictive Coding";Aaron van den Oord et al.;《arXiv》;20190122;第1-13页 *
"Weakly-Supervised Ultrasound Video Segmentation with Minimal Annotations";Ruiheng Chang et al.;《MICCAI 2021, LNCS 12908》;20211231;第648-658页 *
"基于监督对比学习正则化的高分辨率SAR图像建筑物提取方法";康健 等;《雷达学报》;20220228;第11卷(第1期);第157-167页 *

Also Published As

Publication number Publication date
CN114627296A (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN113326764B (en) Method and device for training image recognition model and image recognition
CN113033622B (en) Training method, device, equipment and storage medium for cross-modal retrieval model
CN113590858B (en) Target object generation method and device, electronic equipment and storage medium
CN113159010B (en) Video classification method, device, equipment and storage medium
KR102576344B1 (en) Method and apparatus for processing video, electronic device, medium and computer program
CN114863437B (en) Text recognition method and device, electronic equipment and storage medium
CN113327599B (en) Voice recognition method, device, medium and electronic equipment
CN115376211B (en) Lip driving method, lip driving model training method, device and equipment
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
US20230102804A1 (en) Method of rectifying text image, training method, electronic device, and medium
CN115861462A (en) Training method and device for image generation model, electronic equipment and storage medium
CN114627296B (en) Training method and device for image segmentation model, electronic equipment and storage medium
CN113360683A (en) Method for training cross-modal retrieval model and cross-modal retrieval method and device
CN111312224B (en) Training method and device of voice segmentation model and electronic equipment
CN116579407B (en) Compression method, training method, processing method and device of neural network model
US11741713B2 (en) Method of detecting action, electronic device, and storage medium
CN114078097A (en) Method and device for acquiring image defogging model and electronic equipment
CN113360712B (en) Video representation generation method and device and electronic equipment
CN114550236B (en) Training method, device, equipment and storage medium for image recognition and model thereof
CN111311604A (en) Method and apparatus for segmenting an image
CN114663372B (en) Video-based focus classification method and device, electronic equipment and medium
CN114820686B (en) Matting method and device, electronic equipment and storage medium
CN116451770B (en) Compression method, training method, processing method and device of neural network model
CN115861684B (en) Training method of image classification model, image classification method and device
JP7403673B2 (en) Model training methods, pedestrian re-identification methods, devices and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 3011, 2nd Floor, Building A, No. 1092 Jiangnan Road, Nanmingshan Street, Liandu District, Lishui City, Zhejiang Province, 323000

Patentee after: Zhejiang Yizhun Intelligent Technology Co.,Ltd.

Address before: No. 1202-1203, 12 / F, block a, Zhizhen building, No. 7, Zhichun Road, Haidian District, Beijing 100083

Patentee before: Beijing Yizhun Intelligent Technology Co.,Ltd.

CP03 Change of name, title or address