CN115222639A - Image segmentation method and device, electronic device and storage medium

Image segmentation method and device, electronic device and storage medium

Info

Publication number
CN115222639A
Authority
CN
China
Prior art keywords
features
feature
module
image
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110410164.1A
Other languages
Chinese (zh)
Inventor
徐奕
王楠
李一鸣
唐洋
张佼
孙宝德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110410164.1A
Publication of CN115222639A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30136Metal

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image segmentation method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: encoding a target image through a feature extraction module of a segmentation model to obtain original coding features of multiple stages; decoding the original coding features of the multiple stages through a stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales; and fusing the intermediate features of the multiple scales through a multi-level feature aggregation module of the segmentation model and then predicting, to obtain an image segmentation result. By adopting an encoder-decoder framework with a symmetrical structure, the scheme ensures the continuity and high definition of foreground-object boundaries in the image segmentation result; the stage feature decoding module improves the ability to distinguish foreground objects from noise, and the multi-level feature aggregation module enhances the feature expression of multi-scale foreground objects in the decoder, thereby improving the segmentation model's ability to detect small-scale foreground objects.

Description

Image segmentation method and device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image segmentation method and apparatus, an electronic device, and a computer-readable storage medium.
Background
During metal solidification, control of the solidification microstructure is key to improving casting performance. Permeability is a key parameter controlling fluid flow through the solidified structures in the mushy zone, and it significantly affects heat and mass transfer in the liquid phase, thereby affecting the formation of the solidification structure. It is therefore important to determine the dynamic permeability in situ during solidification.
At present, by combining synchrotron radiation X-ray imaging with traditional image processing algorithms, the three-dimensional dendritic microstructure formed during solidification of a metal alloy can be reconstructed in situ based on the correspondence between imaging intensity and the three-dimensional microstructure of the solidification structure. In this approach, a traditional image segmentation algorithm separates the solid-phase dendrites from the other regions of the image, and the segmentation result is then used in subsequent image processing to obtain the three-dimensional solid-phase microstructure. From this solid-phase three-dimensional result, the permeability can be further derived and calculated using related theories in the materials field. The accuracy of the solid-liquid phase segmentation therefore directly determines the accuracy of the three-dimensional solid-phase microstructure, and is crucial to accurately determining the permeability.
In recent years, deep learning has achieved strong performance in image classification, semantic segmentation, object detection, natural language processing, recommendation, and personalization. Deep-learning-based semantic segmentation, however, typically relies on large-scale, finely labeled data sets. Dendrites have diverse morphologies, most of their boundaries grow radially, and both ambiguous semantic boundaries and coarse boundaries exist. In addition, texture, artifacts, and various kinds of semantic noise in the images blur the semantic boundaries. These problems lead to low accuracy when deep learning is applied to solid-liquid phase segmentation.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image segmentation method and apparatus, an electronic device, and a computer-readable storage medium, which are used for implementing solid-liquid phase segmentation on a dendrite image.
In one aspect, the present application provides an image segmentation method, including:
coding the target image through a feature extraction module of the segmentation model to obtain original coding features of multiple stages;
decoding the original coding features of the multiple stages through a stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales;
and predicting after fusing the intermediate features of the multiple scales through a multi-level feature aggregation module of the segmentation model to obtain an image segmentation result.
In one embodiment, the feature extraction module is constructed based on SE-ResNet;
the method for obtaining the original coding features of multiple stages by coding the target image through the feature extraction module of the segmentation model comprises the following steps:
and encoding the target image through the SE-ResNet-based feature extraction module to obtain original coding features of multiple stages, and transmitting the original coding features, stage by stage, to the stage feature decoding module of the same stage through skip connections.
In one embodiment, the stage feature decoding module comprises a plurality of sub-decoding modules;
the step of decoding the original coding features of the multiple stages by the stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales includes:
fusing, by each of the sub-decoding modules, the shallow feature and the deep feature corresponding to that sub-decoding module to obtain the intermediate feature corresponding to that sub-decoding module; the deep feature is the intermediate feature transmitted by the previous sub-decoding module or the original coding feature transmitted by the feature extraction module, and the shallow feature is the original coding feature transmitted over a skip connection by the feature extraction module at the same stage.
In an embodiment, the fusing the shallow feature and the deep feature corresponding to the sub-decoding module to obtain the intermediate feature corresponding to the sub-decoding module includes:
concatenating the deep feature and the shallow feature to obtain a concatenated feature;
compressing the concatenated feature using a global pooling layer;
determining a feature weight from the compressed concatenated feature through convolutional layers and activation layers;
and multiplying the feature weight by the shallow feature, and adding the weighted shallow feature to the deep feature to obtain the intermediate feature.
In one embodiment, the segmentation model is trained by:
and self-training a preset deep learning model by utilizing a plurality of sample images carrying labels and a plurality of test images to obtain the segmentation model.
In an embodiment, the deep learning model comprises a plurality of deep learning submodels;
the self-training of the preset deep learning model by utilizing a plurality of sample images carrying labels and a plurality of test images to obtain the segmentation model comprises the following steps:
self-training the multiple deep learning submodels by using the sample image and the test image, and integrating the multiple deep learning submodels in each iteration based on a cosine annealing learning rate adjustment strategy;
and determining a segmentation model from the trained multiple deep learning submodels through multiple iterations.
In another aspect, the present application further provides an image segmentation apparatus, including:
the encoding module is used for encoding the target image through the feature extraction module of the segmentation model to obtain original encoding features of multiple stages;
the decoding module is used for decoding the original coding features of the multiple stages through the stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales;
and the prediction module is used for fusing the intermediate features of the multiple scales through the multi-level feature aggregation module of the segmentation model and then predicting, so as to obtain an image segmentation result.
In one embodiment, the apparatus further comprises:
and the training module is used for self-training a preset deep learning model by utilizing a plurality of sample images carrying labels and a plurality of test images to obtain the segmentation model.
Further, the present application also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the image segmentation method described above.
In addition, the present application also provides a computer-readable storage medium storing a computer program executable by a processor to perform the above-mentioned image segmentation method.
In the above scheme, in view of images containing foreground objects of diverse shapes with rough boundaries against backgrounds containing strong semantic noise, an encoder-decoder framework with a symmetrical structure is adopted and low-level feature information is preserved to the maximum extent, which ensures the continuity and high definition of foreground-object boundaries in the image segmentation result. On this basis, the attention-based stage feature decoding module improves the segmentation model's ability to distinguish foreground objects from noise, and the multi-level feature aggregation module enhances the feature expression of multi-scale foreground objects in the decoder, thereby improving the segmentation model's ability to detect small-scale foreground objects. Together, these measures can greatly improve the accuracy of image segmentation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic view of an application scenario of an image segmentation method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 3 is a schematic flowchart of an image segmentation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process of a multi-level feature aggregation module according to an embodiment of the present application;
fig. 5 is a schematic processing diagram of a UNet-based segmentation model according to an embodiment of the present application;
fig. 6 is a processing diagram of a stage feature decoding module according to an embodiment of the present application;
FIG. 7 is a schematic view of a sample image and label provided by an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a training method of a segmentation model according to an embodiment of the present application;
fig. 9 is a block diagram of an image segmentation apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Like reference numerals and letters denote like items in the following figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. In the description of the present application, the terms "first", "second", and the like are used only to distinguish the description and are not to be construed as indicating or implying relative importance.
Fig. 1 is a schematic view of an application scenario of an image segmentation method provided in an embodiment of the present application. As shown in fig. 1, the application scenario includes a client 20 and a server 30; the client 20 may be a synchrotron radiation X-ray imaging device, and is configured to perform in-situ X-ray imaging on a metal sample in a solidification process, and transmit an obtained X-ray imaging image to the server 30; the server 30 may be a server, a server cluster, or a cloud computing center, and may perform image segmentation on the acquired X-ray imaging image.
As shown in fig. 2, the present embodiment provides an electronic device 1, including: at least one processor 11 and a memory 12, one processor 11 being taken as an example in fig. 2. The processor 11 and the memory 12 are connected by a bus 10, and the memory 12 stores instructions executable by the processor 11, and the instructions are executed by the processor 11, so that the electronic device 1 can execute all or part of the flow of the method in the embodiments described below. In an embodiment, the electronic device 1 may be the server 30 described above, and is configured to perform the image segmentation method.
The memory 12 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The present application also provides a computer readable storage medium storing a computer program executable by a processor 11 to perform the image segmentation method provided by the present application.
Referring to fig. 3, a flowchart of an image segmentation method according to an embodiment of the present application is shown, and as shown in fig. 3, the method may include the following steps 310 to 330.
Step 310: and coding the target image through a feature extraction module of the segmentation model to obtain original coding features of multiple stages.
The segmentation model is a deep learning model trained for image segmentation. In one embodiment, the segmentation model is constructed based on UNet and has a U-shaped structure. The feature extraction module is the encoder of the segmentation model, used to extract original coding features from the image to be segmented.
The target image is the currently received image on which image segmentation is to be performed. In one embodiment, the target image may be an X-ray imaging image of the metal solidification process.
The server can perform multi-stage encoding of the target image through the feature extraction module, thereby obtaining original coding features of multiple stages. During encoding, the original coding features of a later stage are obtained by encoding the original coding features of the previous stage.
Step 320: and decoding the original coding features of the multiple stages by a stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales.
The stage feature decoding module is a decoder of the segmentation model and is used for decoding original coding features.
The server side can decode the original coding features through the stage feature decoding module, and therefore the intermediate features of multiple scales are obtained.
Step 330: and (3) fusing the intermediate features of multiple scales through a multi-level feature aggregation module of the segmentation model and then predicting to obtain an image segmentation result.
The multi-level feature aggregation module (HFF) fuses intermediate features of different scales and performs prediction on the fused features, thereby obtaining the image segmentation result.
Referring to fig. 4, which is a schematic processing diagram of the multi-level feature aggregation module provided in an embodiment of the present application, as shown in fig. 4, a server may fuse multiple intermediate features (the intermediate features include features from a deep layer to a shallow layer) by using the multi-level feature aggregation module to obtain fused features, and perform prediction based on the fused features to obtain an image segmentation result of a target image.
Among the intermediate features, fine-scale shallow features learn more accurate spatial structure information, while coarse-scale deep features learn the rich semantic information carried by objects in the image (for an X-ray imaging image of the metal solidification process, for example, the objects in the image may be dendrites). The multi-level feature aggregation module fully aggregates the intermediate features output by the decoder at each level to enhance the feature representation of multi-scale objects in the image, so that the intermediate features at each scale contain both rich semantic information and fine localization information. This improves the decoder's perception of small-scale objects and thus improves the segmentation of small objects.
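To make the aggregation concrete, below is a minimal PyTorch sketch of a multi-level feature aggregation head. The patent does not publish code, so the fusion operator (upsample every intermediate feature to the finest scale, concatenate, then predict with a small convolutional head), the module name, and all hyperparameters are illustrative assumptions consistent with the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFeatureAggregation(nn.Module):
    """Hypothetical sketch of the multi-level feature aggregation (HFF) head:
    upsample every intermediate feature to the finest scale, fuse them,
    and predict a per-pixel solid/liquid segmentation map."""

    def __init__(self, channels_per_level, num_classes=2):
        super().__init__()
        fused = sum(channels_per_level)
        self.head = nn.Sequential(
            nn.Conv2d(fused, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, kernel_size=1),  # prediction layer
        )

    def forward(self, intermediates):
        # intermediates: decoder outputs ordered deep (coarse) -> shallow (fine)
        target = intermediates[-1].shape[-2:]  # finest spatial size
        upsampled = [
            F.interpolate(f, size=target, mode="bilinear", align_corners=False)
            for f in intermediates
        ]
        fused = torch.cat(upsampled, dim=1)  # aggregate all levels
        return self.head(fused)
```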
In one embodiment, the feature extraction module of the segmentation model may be constructed based on SE-ResNet. The SE-ResNet used to construct the feature extraction module removes the downsampling layers of the last two stages to obtain larger-scale original coding features. In addition, dilated (atrous) convolutions can be added to SE-ResNet to enlarge the receptive field.
The feature extraction module may use the stages of the SE-ResNet, adjusted as described above, as a multi-stage encoder.
The server encodes the target image through the SE-ResNet-based feature extraction module to obtain original coding features of multiple stages. During encoding, the first-stage encoder directly encodes the target image to obtain the first-stage original coding features; the second-stage encoder encodes the first-stage original coding features to obtain the second-stage original coding features, and so on, until the encoder of every stage has output its corresponding original coding features.
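The patent does not specify how the adjusted SE-ResNet encoder is built. As one hedged possibility, the timm library can produce an SE-ResNet backbone that exposes per-stage feature maps and replaces the last downsampling stages with dilated convolutions via a reduced output stride; the model name, output_stride value, and input size below are assumptions, not the patent's configuration.

```python
import torch
import timm

# Hypothetical encoder construction: an SE-ResNet backbone whose last two
# stages keep resolution by replacing stride-2 downsampling with dilated
# convolutions (output_stride 8 instead of the default 32).
# features_only=True exposes the feature map of every stage, which can
# then be passed to the same-stage decoders over skip connections.
encoder = timm.create_model(
    "seresnet50",
    pretrained=False,
    features_only=True,   # return per-stage feature maps
    output_stride=8,      # dilate the last two stages instead of downsampling
)

x = torch.randn(1, 3, 512, 512)   # a target image tensor (size illustrative)
stage_features = encoder(x)       # list: one original coding feature per stage
for f in stage_features:
    print(tuple(f.shape))         # last two stages keep the larger scale
```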
The server can transmit the original coding features, stage by stage, to the stage feature decoding module of the same stage through skip connections. Referring to fig. 5, a schematic processing diagram of the UNet-based segmentation model according to an embodiment of the present application: as shown in fig. 5, the UNet-based segmentation model has a U-shaped structure with the encoder on the left and the decoder on the right, and the numbers beside the arrows in fig. 5 indicate the computation processes the arrows represent. After obtaining the original coding features through the encoders at each stage of the feature extraction module, the server can transmit them to the stage feature decoding module of the same stage through the processing procedure numbered 530 in fig. 5.
In one embodiment, the stage feature decoding module includes a plurality of sub-decoding modules, where the sub-decoding modules are stage feature decoding modules of different stages.
When decoding the original coding features of the multiple stages through the stage feature decoding module of the segmentation model, the server can fuse, through each of the sub-decoding modules, the shallow feature and the deep feature corresponding to that sub-decoding module, obtaining its corresponding intermediate feature.
Here, deep and shallow features are relative concepts. For any sub-decoding module, the deep feature is the intermediate feature transmitted by the previous sub-decoding module or the original coding feature transmitted by the feature extraction module, and the shallow feature is the original coding feature transmitted over a skip connection by the feature extraction module at the same stage.
As shown in fig. 5, the stage feature decoding module includes 4 sub-decoding modules. The first sub-decoding module takes as its deep feature the original coding feature transmitted by the same-stage encoder (the original coding feature at the bottom of the U-shaped structure in fig. 5), and takes as its shallow feature the original coding feature transmitted over a skip connection by the same-stage encoder (the original coding feature transmitted in the bottommost step 530 in fig. 5); it fuses its deep and shallow features to obtain its intermediate feature. The second sub-decoding module takes the intermediate feature output by the first sub-decoding module as its deep feature and the original coding feature transmitted over a skip connection by the same-stage encoder (the third step 530 from top to bottom in fig. 5) as its shallow feature, and fuses them to obtain its intermediate feature. By analogy, the 4 sub-decoding modules successively decode to obtain their respective intermediate features, as sketched below.
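A hedged sketch of this decoder wiring follows. `AttentionFusionBlock` is the sub-decoding module sketched after the next passage; all class names and channel counts are illustrative, not taken from the patent.

```python
import torch.nn as nn

class StageFeatureDecoder(nn.Module):
    """Sketch of the U-shaped decoder chain: the first sub-decoding module
    takes the bottom encoder output as its deep feature; each later module
    takes the previous module's intermediate feature as its deep feature.
    The shallow feature is always the same-stage encoder output delivered
    over a skip connection. AttentionFusionBlock is sketched below."""

    def __init__(self, encoder_channels):
        super().__init__()
        # encoder_channels ordered shallow -> deep, e.g. [64, 128, 256, 512]
        chans = list(reversed(encoder_channels))  # deep -> shallow
        self.blocks = nn.ModuleList(
            [AttentionFusionBlock(deep_c, shallow_c)
             for deep_c, shallow_c in zip(chans[:-1], chans[1:])]
        )

    def forward(self, encoder_features):
        feats = list(reversed(encoder_features))  # deep -> shallow
        deep = feats[0]                           # bottom of the U
        intermediates = []
        for block, shallow in zip(self.blocks, feats[1:]):
            deep = block(deep, shallow)           # fused result feeds next stage
            intermediates.append(deep)
        return intermediates                      # multi-scale intermediate features
```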
In one embodiment, the stage feature decoding module is based on a channel attention mechanism. Referring to fig. 6, a schematic processing diagram of the stage feature decoding module according to an embodiment of the present application: as shown in fig. 6, the server can concatenate the deep feature and the shallow feature through the stage feature decoding module to obtain a concatenated feature. For any sub-decoding module in the stage feature decoding module, because its deep and shallow features have different scales, the server first upsamples the deep feature to the same scale as the shallow feature through the sub-decoding module, and then concatenates the processed deep feature with the shallow feature along the channel dimension, obtaining the concatenated feature.
The server can compress the concatenated feature through the global pooling layer in the sub-decoding module, and determine the feature weight from the compressed concatenated feature through the convolutional layers and activation layers in the sub-decoding module. As shown in fig. 6, the feature weight is obtained by passing the compressed concatenated feature through a convolutional layer, a ReLU (rectified linear unit) layer, another convolutional layer, and a Sigmoid layer in sequence. Here, the feature weight can be regarded as a matrix with the same width and height as the shallow feature, each element of which corresponds to the element at the same position in every channel of the shallow feature. Illustratively, if the scale of the shallow feature is denoted W × H × C in width × height × channels, then the scale of the feature weight is W × H × 1, and the element in the m-th row and n-th column of the feature weight is the weight of the element in the m-th row and n-th column of every channel of the shallow feature.
The server can multiply the feature weight by the shallow feature through the sub-decoding module, and add the weighted shallow feature to the deep feature to obtain the intermediate feature. Before the addition, the deep feature may be upsampled to the same scale as the shallow feature, and the upsampled deep feature is added element-wise to the weighted shallow feature, yielding the intermediate feature.
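Below is a hedged PyTorch sketch of one sub-decoding module following the steps above. One interpretive assumption is flagged: the description says the concatenated feature is compressed by a pooling layer yet yields a W × H × 1 weight map, so this sketch pools along the channel dimension to keep the spatial layout; the 1 × 1 projection of the deep feature, the hidden width, and the kernel sizes are likewise illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusionBlock(nn.Module):
    """Hypothetical sub-decoding module. Steps follow the description:
    (1) upsample the deep feature to the shallow feature's scale and
        concatenate along channels;
    (2) compress the concatenated feature with a pooling layer -- pooled
        here along the channel dimension so the result keeps the W x H
        layout of the described weight map (an interpretive assumption);
    (3) derive a W x H x 1 weight via conv -> ReLU -> conv -> Sigmoid;
    (4) weight the shallow feature and add the (projected) deep feature."""

    def __init__(self, deep_channels, shallow_channels, hidden=8):
        super().__init__()
        self.project = nn.Conv2d(deep_channels, shallow_channels, kernel_size=1)
        self.weight_net = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, deep, shallow):
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        deep = self.project(deep)                     # match shallow channels
        concat = torch.cat([deep, shallow], dim=1)    # concatenate along channels
        pooled = concat.mean(dim=1, keepdim=True)     # compress to W x H x 1
        weight = self.weight_net(pooled)              # feature weight, W x H x 1
        return weight * shallow + deep                # weighted shallow + deep
```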
In semantic segmentation models, deep features are generally considered to contain richer global context information, while shallow features contain more detailed information such as spatial structure and texture. Using deep features to assist in generating the feature weights makes it easier to exploit global context information to identify semantic noise that is locally similar to objects in the image. In an X-ray imaging image of a metal sample during solidification, the semantic noise shares a certain similarity with the foreground dendrites in local features and patterns, and the stage feature decoding module of the present application can use global context information to help the segmentation model better distinguish dendrites from semantic noise. Note that during metal solidification the solute in the liquid phase gradually precipitates to form the solid phase, so the dendrites in the image are not actually a pure solid phase but a mixture of solid and liquid phases.
The UNet-based segmentation model of the present application, an encoder-decoder framework with a symmetrical structure, preserves low-level feature information to the maximum extent, thereby ensuring the continuity and high definition of object boundaries in the segmentation result. On this basis, the feature expression capability for foreground objects of various shapes is enhanced by improving the encoder part (constructing the encoder based on SE-ResNet); the attention-based stage feature decoding module improves the segmentation model's ability to distinguish foreground objects from noise; and multi-level feature aggregation enhances the feature expression of multi-scale foreground objects in the decoder, thereby improving the model's ability to detect small foreground objects.
For an X-ray imaging image, the segmentation model can ensure the continuity and high definition of dendrite boundaries in the segmentation result, effectively distinguish dendrites from noise, and accurately detect newly formed small dendrites.
In an embodiment, the server may train to obtain the segmentation model before performing the image segmentation method.
The server can self-train a preset deep learning model using a plurality of sample images carrying labels and a plurality of test images, obtaining the segmentation model. The deep learning model can be built on the UNet framework. In one embodiment, the SE-ResNet with the downsampling layers of the last two stages removed and dilated convolutions added is used as the encoder of the deep learning model, the stage feature decoding module is used as the decoder of the deep learning model, and the multi-level feature aggregation module is connected to the decoder of the deep learning model.
The sample images and test images both contain the foreground objects to be segmented. For an X-ray image of the solidification process of a metal sample, the foreground objects are dendrites. The label of a sample image indicates the position of the foreground object in that image; for example, the label may be a binary image in which pixels with value 0 correspond to the foreground object in the sample image and pixels with value 1 correspond to the background. Referring to fig. 7, a schematic diagram of a sample image and its label according to an embodiment of the present application: the left image in fig. 7 is the sample image and the right image is the label indicating the location of the dendrites.
Referring to fig. 8, a schematic diagram of the training method of the segmentation model according to an embodiment of the present application: as shown in fig. 8, the server can first train the deep learning model on the sample images (train image in fig. 8) carrying labels (ground truth in fig. 8); when the downward trend of the loss function begins to flatten during training, the deep learning model can be considered converged for this round. The server can then run the preliminarily trained deep learning model on the multiple test images to obtain prediction results.
The server can screen out the prediction results whose confidence is greater than a preset confidence threshold as pseudo-labels for the corresponding test images. The confidence threshold may be an empirical value for screening reliable predictions; for example, with a threshold of 0.7, prediction results with confidence greater than 0.7 are used as pseudo-labels of the corresponding test images.
Treating the pseudo-labels of the test images as real labels, the server can add the pseudo-labeled test images to the training data and retrain the preliminarily trained deep learning model on the labeled sample images together with the pseudo-labeled test images, obtaining the secondarily trained deep learning model.
The server can run the secondarily trained deep learning model on the multiple test images to obtain prediction results, and again screen out those with confidence above the threshold as pseudo-labels of the test images.
The server can add the pseudo-labeled test images to the training data and retrain the secondarily trained deep learning model on the labeled sample images together with the pseudo-labeled test images, obtaining the deep learning model after a third round of training.
After multiple iterations of this process, the pseudo-labels gradually improve, and the segmentation model is finally obtained. A minimal outline of this loop is sketched below.
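The sketch below outlines the self-training loop in plain Python. `train`, `predict`, and `confidence` stand in for the actual training and inference routines, which the patent does not specify; only the control flow and the example 0.7 threshold come from the description above.

```python
# Hypothetical outline of the self-training loop described above.
# `train`, `predict`, and `confidence` are placeholders for the actual
# training/inference routines; only the control flow follows the text.

CONF_THRESHOLD = 0.7   # example value given in the description

def self_train(model, labeled_set, test_images, num_rounds):
    train_data = list(labeled_set)             # (image, label) pairs
    for _ in range(num_rounds):
        train(model, train_data)               # train until the loss flattens
        pseudo_labeled = []
        for image in test_images:
            pred = predict(model, image)       # segmentation prediction
            if confidence(pred) > CONF_THRESHOLD:
                pseudo_labeled.append((image, pred))   # keep as pseudo-label
        # retrain on labeled samples plus confidently pseudo-labeled ones
        train_data = list(labeled_set) + pseudo_labeled
    return model
```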
For some foreground objects, there may be large morphological differences between images from different batches. Adding the test images and their prediction results to the training data and retraining the deep learning model provides the segmentation model with richer data characteristics and supervision information about the foreground objects in the samples. By learning from mostly correct supervision information, the deep learning model gradually improves its understanding and recognition of semantic noise as well as its robustness and generalization. A stronger model in turn predicts more reliable and accurate pseudo-labels to serve as the real labels of the next iteration, which further improves model performance.
In an embodiment, when self-training the deep learning model with the sample images and test images, the server can integrate a plurality of deep learning submodels at each iteration based on a cosine annealing learning rate adjustment strategy.
The cosine annealing strategy gradually reduces the current learning rate within each adjustment period using a cosine function; when the current adjustment period ends, the next period resets the learning rate to its initial value.
Within one iteration of self-training, the cosine annealing learning rate adjustment strategy lets the loss function of the deep learning model converge to several local minima along its optimization path, and the server can store the model parameters corresponding to each local minimum, thereby obtaining multiple deep learning submodels. Here, a deep learning submodel is the model corresponding to one local minimum within one iteration of the training process.
One iteration can comprise several learning rate adjustment periods. Within one adjustment period, because the learning rate is gradually reduced, the deep learning model converges to some local minimum of the loss function, and the current model parameters are stored at the end of the period, yielding one deep learning submodel. When the next adjustment period begins, the learning rate is reset to its initial value; because the initial learning rate is large, the model is updated with large gradients at the start of the period, so it can escape the previously reached local minimum and gradually converge to another one.
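A hedged PyTorch sketch of this snapshot scheme follows, using the built-in CosineAnnealingWarmRestarts scheduler. `model`, `train_loader`, and `criterion` are assumed to exist, and the cycle length and epoch counts are illustrative; the ensemble-averaging helper at the end anticipates the prediction step described next.

```python
import copy
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Hypothetical snapshot-ensembling sketch: within one self-training
# iteration, the LR follows a cosine decay inside each adjustment period
# and is reset to its initial value when the next period starts, so the
# model settles into (and then escapes) a different local minimum per
# period. `model`, `train_loader`, and `criterion` are assumed to exist.

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)  # 10-epoch cycles

snapshots = []
for epoch in range(40):                      # 4 adjustment periods
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
    if (epoch + 1) % 10 == 0:                # end of an adjustment period:
        snapshots.append(copy.deepcopy(model.state_dict()))  # save a submodel

# Ensemble prediction: average the submodels' outputs for a test image,
# correcting single-model misjudgements before pseudo-labels are extracted.
@torch.no_grad()
def ensemble_predict(model, snapshots, image):
    outputs = []
    for state in snapshots:
        model.load_state_dict(state)
        model.eval()
        outputs.append(torch.softmax(model(image), dim=1))
    return torch.stack(outputs).mean(dim=0)
```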
The space the deep learning model fits is extremely high-dimensional, so models that converge to different local minima can have similar performance metrics on a test set of multiple test images while attending to different image features. For any test image, the server can take the average of the outputs of the multiple deep learning submodels as the prediction result; this corrects misjudgements of individual models, so a more accurate prediction is obtained after each iteration to serve as the pseudo-label of the next round of self-training.
After multiple iterations, the server can determine the segmentation model from the trained deep learning submodels; for example, it can select one submodel from those obtained in the last iteration as the segmentation model, completing the training.
Fig. 9 shows an image segmentation apparatus according to an embodiment of the present application; as shown in fig. 9, the apparatus may include:
the encoding module 910 is configured to perform encoding processing on the target image through the feature extraction module of the segmentation model to obtain original encoding features of multiple stages;
a decoding module 920, configured to decode the original coding features of the multiple stages through the stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales;
and a prediction module 930, configured to fuse the intermediate features of the multiple scales through the multi-level feature aggregation module of the segmentation model and then perform prediction, so as to obtain an image segmentation result.
In one embodiment, the feature extraction module is constructed based on SE-ResNet;
the encoding module 910 is further configured to perform encoding processing on the target image through the feature extraction module based on SE-ResNet to obtain original encoding features of multiple stages, and transmit the original encoding features to the stage feature decoding module of the same stage by stage through a skip connection.
In one embodiment, the stage feature decoding module comprises a plurality of sub-decoding modules;
the decoding module 920 is further configured to fuse, through each of the sub-decoding modules, the shallow feature and the deep feature corresponding to that sub-decoding module to obtain its corresponding intermediate feature; the deep feature is the intermediate feature transmitted by the previous sub-decoding module or the original coding feature transmitted by the feature extraction module, and the shallow feature is the original coding feature transmitted over a skip connection by the feature extraction module at the same stage.
In an embodiment, the decoding module 920 is further configured to:
concatenating the deep feature and the shallow feature to obtain a concatenated feature;
compressing the concatenated feature using a global pooling layer;
determining a feature weight from the compressed concatenated feature through the convolutional layers and activation layers;
and multiplying the feature weight by the shallow feature, and adding the weighted shallow feature to the deep feature to obtain the intermediate feature.
In one embodiment, the apparatus further comprises:
and a training module 940, configured to perform self-training on a preset deep learning model by using a plurality of sample images carrying labels and a plurality of test images, to obtain the segmentation model.
In an embodiment, the deep learning model comprises a plurality of deep learning submodels; the training module 940 is further configured to:
self-training the deep learning submodels by utilizing the sample image and the test image, and integrating the deep learning submodels in each iteration based on a cosine annealing learning rate adjusting strategy;
and determining a segmentation model from the trained multiple deep learning submodels through multiple iterations.
The implementation process of the functions and actions of each module in the above device is detailed in the implementation process of the corresponding step in the above image segmentation method, and is not described again here.
In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.

Claims (10)

1. An image segmentation method, comprising:
coding the target image through a feature extraction module of the segmentation model to obtain original coding features of multiple stages;
decoding the original coding features of the multiple stages through a stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales;
and predicting after fusing the intermediate features of the multiple scales through a multi-level feature aggregation module of the segmentation model to obtain an image segmentation result.
2. The method of claim 1, wherein the feature extraction module is constructed based on SE-ResNet;
the method for obtaining the original coding features of multiple stages by coding the target image through the feature extraction module of the segmentation model comprises the following steps:
and encoding the target image through the SE-ResNet-based feature extraction module to obtain original coding features of multiple stages, and transmitting the original coding features, stage by stage, to the stage feature decoding module of the same stage through skip connections.
3. The method of claim 1, wherein the stage feature decoding module comprises a plurality of sub-decoding modules;
the step of decoding the original coding features of the multiple stages by the stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales includes:
fusing, by each of the sub-decoding modules, the shallow feature and the deep feature corresponding to that sub-decoding module to obtain the intermediate feature corresponding to that sub-decoding module; the deep feature is the intermediate feature transmitted by the previous sub-decoding module or the original coding feature transmitted by the feature extraction module, and the shallow feature is the original coding feature transmitted over a skip connection by the feature extraction module at the same stage.
4. The method according to claim 3, wherein the fusing the shallow features and the deep features corresponding to the sub-decoding module to obtain the intermediate features corresponding to the sub-decoding module comprises:
concatenating the deep feature and the shallow feature to obtain a concatenated feature;
compressing the concatenated feature using a global pooling layer;
determining a feature weight from the compressed concatenated feature through the convolutional layers and activation layers;
and multiplying the feature weight by the shallow feature, and adding the weighted shallow feature to the deep feature to obtain the intermediate feature.
5. The method of claim 1, wherein the segmentation model is trained by:
and self-training a preset deep learning model by using a plurality of sample images carrying labels and a plurality of test images to obtain the segmentation model.
6. The method of claim 5, wherein the deep learning model comprises a plurality of deep learning submodels;
the self-training is carried out on the preset deep learning model by utilizing a plurality of sample images carrying labels and a plurality of test images to obtain the segmentation model, and the self-training comprises the following steps:
self-training the deep learning submodels by utilizing the sample image and the test image, and integrating the deep learning submodels in each iteration based on a cosine annealing learning rate adjusting strategy;
and determining a segmentation model from the trained multiple deep learning submodels through multiple iterations.
7. An image segmentation apparatus, comprising:
the encoding module is used for encoding the target image through the feature extraction module of the segmentation model to obtain original encoding features of multiple stages;
the decoding module is used for decoding the original coding features of the multiple stages through the stage feature decoding module of the segmentation model to obtain intermediate features of multiple scales;
and the prediction module is used for fusing the intermediate features of the multiple scales through the multi-level feature aggregation module of the segmentation model and then predicting, so as to obtain an image segmentation result.
8. The apparatus of claim 7, further comprising:
and the training module is used for carrying out self-training on a preset deep learning model by utilizing a plurality of sample images carrying labels and a plurality of test images to obtain the segmentation model.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the image segmentation method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the image segmentation method according to any one of claims 1 to 6.
CN202110410164.1A 2021-04-16 2021-04-16 Image segmentation method and device, electronic device and storage medium Pending CN115222639A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110410164.1A CN115222639A (en) 2021-04-16 2021-04-16 Image segmentation method and device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110410164.1A CN115222639A (en) 2021-04-16 2021-04-16 Image segmentation method and device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115222639A true CN115222639A (en) 2022-10-21

Family

ID=83604628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110410164.1A Pending CN115222639A (en) 2021-04-16 2021-04-16 Image segmentation method and device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115222639A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911804A (en) * 2023-05-09 2024-04-19 宁波大学 Semi-supervised segmentation model based on self-correcting pseudo-double model, training method and application


Similar Documents

Publication Publication Date Title
CN112001339B (en) Pedestrian social distance real-time monitoring method based on YOLO v4
CN112016500B (en) Group abnormal behavior identification method and system based on multi-scale time information fusion
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN110413838B (en) Unsupervised video abstract model and establishing method thereof
CN111696110B (en) Scene segmentation method and system
CN109886330B (en) Text detection method and device, computer readable storage medium and computer equipment
CN113538480A (en) Image segmentation processing method and device, computer equipment and storage medium
CN111079683A (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN112307883B (en) Training method, training device, electronic equipment and computer readable storage medium
CN111882620A (en) Road drivable area segmentation method based on multi-scale information
CN111062383A (en) Image-based ship detection depth neural network algorithm
CN114283350A (en) Visual model training and video processing method, device, equipment and storage medium
CN113065551A (en) Method for performing image segmentation using a deep neural network model
CN111047088A (en) Prediction image acquisition method and device, computer equipment and storage medium
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
CN112364933A (en) Image classification method and device, electronic equipment and storage medium
CN113822287B (en) Image processing method, system, device and medium
Zhou et al. Efficient traffic accident warning based on unsupervised prediction framework
CN115222639A (en) Image segmentation method and device, electronic device and storage medium
CN113468357A (en) Image description text generation method and device
CN111339950B (en) Remote sensing image target detection method
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN113570509A (en) Data processing method and computer device
CN116342624A (en) Brain tumor image segmentation method combining feature fusion and attention mechanism
CN116824291A (en) Remote sensing image learning method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination