CN115601542A - Image semantic segmentation method, system and equipment based on full-scale dense connection - Google Patents


Info

Publication number
CN115601542A
Authority
CN
China
Prior art keywords
image
semantic segmentation
full
convolution
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211229781.2A
Other languages
Chinese (zh)
Other versions
CN115601542B (en)
Inventor
熊炜
田紫欣
陈奕博
强观臣
郑大定
汪锋
邹勤
王松
李利荣
宋海娜
李婕
涂静敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN202211229781.2A priority Critical patent/CN115601542B/en
Publication of CN115601542A publication Critical patent/CN115601542A/en
Application granted granted Critical
Publication of CN115601542B publication Critical patent/CN115601542B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image semantic segmentation method, system and equipment based on full-scale dense connection, wherein an image to be segmented is preprocessed and cut or padded to a preset size; then, semantic segmentation of the image to be segmented is realized by using an image semantic segmentation network. In the image semantic segmentation network (UNet4+) of the present invention, through full-scale and dense skip connections, each node in the encoder receives intermediate aggregated feature maps from encoders of different scales, while each node in the decoder receives intermediate aggregated feature maps not only from encoders and decoders of different scales, but also from the encoder of the same scale. Thus, the aggregation layer in the decoder can learn to use all the feature maps collected at the node. The UNet4+ of the present invention alleviates the gradient vanishing problem and maximizes information flow in the network; meanwhile, feature propagation in the network is enhanced; and the model is more compact with extreme feature reusability.

Description

Image semantic segmentation method, system and equipment based on full-scale dense connection
Technical Field
The invention belongs to the technical field of artificial intelligence, deep learning and image processing, and relates to an image semantic segmentation method, system and equipment, in particular to an image semantic segmentation method, system and equipment based on a full-scale dense connection semantic segmentation network.
Background
Image Semantic Segmentation is an important link in image processing and machine vision technology concerning image understanding, and is also an important branch of the AI field. Semantic segmentation classifies each pixel in an image and determines the category of each point (for example, belonging to the background, a person or a vehicle), thereby performing region division. At present, semantic segmentation is widely applied in scenarios such as automatic driving and unmanned aerial vehicle landing-point determination.
At present, mainstream approaches to the image semantic segmentation problem adopt the UNet architecture and its variants such as UNet^e, UNet+, UNet++ and UNet3+.
The UNet architecture (O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 2015, conference proceedings, pp. 234-241.) has become a de facto standard for various image segmentation tasks and has met with great success. It is a typical encoder-decoder cascaded architecture, where the encoder (the contracting path) performs feature extraction and the decoder (the expanding path) performs resolution restoration. The UNet architecture is most attractive for its long skip connections, which allow information of the same scale to flow directly from the encoder to the decoder, enabling the model to make better predictions.
However, such a relatively fixed structure makes it difficult for the model to balance the receptive field size and the boundary segmentation accuracy. It is now generally accepted that deeper networks have better non-linear characterizations, which can learn more complex transformations, adapting to more complex features. But deeper networks introduce the so-called gradient vanishing problem and reduce the learning power of the shallow layers. When the network depth reaches a certain level, the segmentation performance does not improve, but may decrease.
To determine the optimal depth of the UNet architecture, Zhou et al. (Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "UNet++: Redesigning skip connections to exploit multiscale features in image segmentation," IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 1856-1867, 2020.) propose an ensemble architecture, UNet^e, which combines UNets of different depths into one unified architecture. Ensemble architectures benefit from knowledge sharing: all UNets within the UNet^e architecture share the encoder but have separate decoders. Since the decoders in this architecture are disconnected, a deeper UNet cannot provide a supervisory signal to its shallower counterpart. Therefore, explicit deep supervision is required in the combination.
Another solution to overcome the above limitation is to remove all long skip connections in the UNet^e structure and use short skip connections to connect each neighboring node, forming a nested structure called UNet+, so that gradient backpropagation passes from the deeper decoders to their shallower counterparts. This idea was proposed almost simultaneously by Yu et al. (F. Yu, D. Wang, E. Shelhamer, and T. Darrell, "Deep layer aggregation," in 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 2018, conference proceedings, pp. 2403-2412.) and Zhou et al. (Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "UNet++: A nested U-Net architecture for medical image segmentation," in 4th International Workshop on Deep Learning in Medical Image Analysis (DLMIA 2018) held in conjunction with MICCAI 2018, Granada, Spain, 2018, conference proceedings, pp. 3-11.), respectively.
Notably, each node in the UNet+ architecture integrates the feature maps of its neighboring ancestors on the same scale from a horizontal perspective, in conjunction with the feature maps of its neighboring ancestors on different scales from a vertical perspective. To ensure maximum information flow between UNets of all different depths within the UNet+ architecture, Zhou et al. also propose a nested UNet architecture with dense skip connections, called UNet++, whose decoders are densely connected on the same scale from a horizontal perspective. The redesigned same-scale skip connections make dense feature propagation more flexible, directly connecting all previous feature maps together.
Although convincing as a natural design, there is no solid theory to ensure that same-scale feature maps are the best match for feature fusion. To utilize full-scale features in image segmentation, Huang et al. (H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, X. Han, Y.-W. Chen, and J. Wu, "UNet 3+: A full-scale connected UNet for medical image segmentation," in 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), Barcelona, Spain, 2020, conference proceedings, pp. 1055-1059.) propose UNet3+, which combines fine-grained low-level detail feature maps with coarse-grained high-level feature maps of different scales. However, UNet3+ only partially redesigns the long skip connections between the encoder and decoder and the short skip connections within the decoder.
Although the use of different-scale feature maps in the decoder of the UNet3+ architecture is much less restrictive than the use of same-scale feature maps in the UNet, UNet+ and UNet++ architectures, there is still room for improvement.
Disclosure of Invention
In order to solve the above technical problem, the image semantic segmentation network adopted by the invention uses full-scale and dense skip connections both within and between the encoder and the decoder, thereby forming the image semantic segmentation network (UNet4+ architecture) of the invention.
The technical scheme adopted by the method is as follows: an image semantic segmentation method based on full-scale dense connection comprises the following steps:
step 1: preprocessing an image to be segmented, and cutting or filling the image to be segmented into a preset size;
Step 2: realizing semantic segmentation of the image to be segmented by using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense jump connection and full-scale deep supervision; the encoder consists of 5 coding convolution blocks, the 1st to 4th coding convolution blocks respectively comprise 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence and 1 downsampling layer MaxPoint, and the 5th coding convolution block only comprises 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence; the number of output channels of each coding convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of a maximum pooling kernel and the pooling step length are 2 multiplied by 2; the decoder is composed of 4 decoding convolution blocks, each decoding convolution block comprises 1 upsampling layer Biliner, 1 fusion layer Conscatenate and 2 convolution layers, all encoder feature maps or decoder feature maps positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 x 1 convolution layer, so that subsequent full-scale deep supervision is realized.
The technical scheme adopted by the system of the invention is as follows: an image semantic segmentation system based on full-scale dense connection comprises the following modules:
the module 1 is used for preprocessing an image to be segmented and cutting or filling the image to be segmented into a preset size;
the module 2 is used for realizing semantic segmentation of the image to be segmented by using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense jump connection and full-scale deep supervision; the encoder consists of 5 coding convolution blocks, the 1st to 4th coding convolution blocks respectively comprise 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence and 1 downsampling layer MaxPoint, and the 5th coding convolution block only comprises 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence; the number of output channels of each coding convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of a maximum pooling kernel and the pooling step length are 2 multiplied by 2; the decoder is composed of 4 decoding convolution blocks, each decoding convolution block comprises 1 upsampling layer Biliner, 1 fusion layer Conscatenate and 2 convolution layers, all encoder feature maps or decoder feature maps positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 x 1 convolution layer, so that subsequent full-scale deep supervision is realized.
The technical scheme adopted by the equipment of the invention is as follows: an image semantic segmentation device based on full-scale dense connection comprises:
one or more processors;
a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the full-scale dense connectivity-based image semantic segmentation method.
The image semantic segmentation network (UNet4+) provided by the invention has the following advantages:
(1) UNet4+ connects any two convolution blocks by a direct skip connection, thereby alleviating the gradient vanishing problem and maximizing information flow in the network.
(2) UNet4+ makes extensive use of feature concatenation, thereby enhancing feature propagation in the network.
(3) UNet4+ obtains a more compact model and extreme feature reusability by aggregating a large number of feature maps in the convolution blocks at the back end of the network.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an image semantic segmentation network (UNet 4 +) according to an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
Referring to fig. 1, the image semantic segmentation method based on full-scale dense connection provided by the present invention includes the following steps:
step 1: preprocessing an image to be segmented, and cutting or filling the image to be segmented into a preset size;
in this embodiment, the image to be segmented may be read in grayscale or color, where the number of channels of the grayscale image is 1 and the number of channels of the color image is 3. The input image resolution may be any size and is cropped into an image block of 512 x 512 size. When the image is cut, the overlapping area of the adjacent image blocks is recommended to be not less than 5% so as to avoid that the tiny targets at the edges of the image blocks cannot be completely detected. If the input image resolution is less than 512 x 512, the image block boundaries are filled with the mirror image.
Step 2: realizing semantic segmentation of an image to be segmented by using an image semantic segmentation network;
referring to fig. 2, the image semantic segmentation network of the present embodiment includes an encoder, a decoder, full-scale dense jump connection, and full-scale deep supervision; wherein the encoder is composed of 5 convolutional blocks, each of the 1st to 4th convolutional blocks includes 2 convolutional layers (Conv → InstanceNorm → leakyreu) and 1 downsampling layer (MaxPooling), and the 5th convolutional block includes only 2 convolutional layers. The number of output channels of each convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of the maximum pooling kernel and the pooling step length are 2 multiplied by 2. The decoder consists of 4 convolution blocks, each convolution block comprises 1 upsampling layer (upsampling Biliner), 1 fusion layer (conditioner) and 2 convolution layers, all the codec characteristic diagrams (downsampling or upsampling is needed if necessary to ensure the consistent characteristic diagram dimension) positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 × 1 convolution layer, so that the subsequent full-scale deep supervision is realized.
The full-scale dense skip connections are redesigned in the image semantic segmentation network (UNet4+ architecture) of this embodiment. Let node $X^i$ denote the node whose output feature map is $X^i$, where the superscript $i$ is indexed along the downsampling layers of the encoder and $N$ represents the depth of the network. The feature maps at the encoder side and the decoder side are denoted by $X_E^i$ and $X_D^i$ respectively, and can be expressed as:

$$X_E^i=\begin{cases}\mathcal{C}(X), & i=1\\ \mathcal{C}\Big(\big[\mathcal{D}(X_E^k)\big]_{k=1}^{i-1}\Big), & 1<i\le N\end{cases}$$

and

$$X_D^i=\mathcal{C}\Big(\Big[\big[\mathcal{D}(X_E^k)\big]_{k=1}^{i-1},\;X_E^i,\;\big[\mathcal{U}(X_E^k)\big]_{k=i+1}^{N},\;\big[\mathcal{U}(X_D^k)\big]_{k=i+1}^{N-1}\Big]\Big),\qquad 1\le i<N$$

where $c(\cdot)$ denotes a convolution layer, $\mathcal{C}(\cdot)$ denotes a convolution block formed of multiple successive convolution layers $c(\cdot)$, $\mathcal{D}(\cdot)$ and $\mathcal{U}(\cdot)$ denote a downsampling layer and an upsampling layer respectively (the number of output channels of the node following each sampling layer is adjusted by the convolution layer), and the symbol $[\cdot]$ denotes the cascade (concatenation) operation.

As shown in FIG. 2, only one input passes through the encoder node $X_E^1$ into the UNet4+ architecture proposed in this embodiment; every other encoder node $X_E^i$ located at layer $i>1$ receives only $i-1$ downsampled inputs from all upper nodes of the encoder. A decoder node $X_D^i$ located at layer $i<N$ receives $N-i-1$ upsampled inputs from the decoding side and $N$ inputs from the encoding side (of which $i-1$ are downsampled, 1 is of the same scale, and $N-i$ are upsampled). The main reason for designing all previous feature maps to be accumulated and cascaded to the current node is that this embodiment exploits dense skip connections both between the encoder and decoder and within each of them.
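The aggregation at one decoder node can be sketched as below. This is a hedged simplification: bilinear resizing stands in for the MaxPool/bilinear sampling layers of the actual architecture, and all names and channel counts are illustrative.

```python
# A PyTorch sketch of the full-scale dense skip connection at one decoder
# node: every incoming encoder/decoder feature map is resized to the node's
# scale and cascaded on the channel dimension.
import torch
import torch.nn.functional as F

def aggregate(node_hw, enc_maps, dec_maps):
    """Resize all incoming feature maps to node_hw and cascade them."""
    gathered = []
    for f in enc_maps + dec_maps:
        if f.shape[-2:] != node_hw:
            f = F.interpolate(f, size=node_hw, mode="bilinear",
                              align_corners=False)
        gathered.append(f)
    return torch.cat(gathered, dim=1)           # the [.] cascade operation

# Decoder node X_D^2 in a depth N=5 network with illustrative base C=8:
C = 8
enc = [torch.randn(1, C * 2 ** i, 64 // 2 ** i, 64 // 2 ** i)
       for i in range(5)]                       # X_E^1 .. X_E^5
dec = [torch.randn(1, C, 8, 8), torch.randn(1, C, 16, 16)]  # X_D^4, X_D^3
x = aggregate((32, 32), enc, dec)               # N + (N-i-1) = 7 inputs
```

The decoding block's two convolution layers then fuse the cascaded stack (264 channels at 32×32 here), and a 1×1 convolution aligns the channel number of the side output.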
The present embodiment introduces two distinct full-scale deep supervision mechanisms in the UNet4+ architecture.

Mechanism 1: Unlike UNet^e, UNet+ and UNet++, which perform deep supervision on intermediate same-scale feature maps, the proposed UNet4+ produces a side output at each decoding convolution block, similar to UNet3+, but with several subtle and important differences. In this embodiment, one bilinear-interpolation upsampling layer is appended to the side output ends of decoder nodes $X_D^2$, $X_D^3$ and $X_D^4$, so that their output feature maps have the same spatial resolution as node $X_D^1$. The 4 side outputs are then cascaded in the channel dimension or summed pixel by pixel, and a predicted image is output via one 3×3 convolution layer (Conv → Sigmoid), whose input is mapped into $[0,1]$ by the Sigmoid activation function.
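Mechanism 1 can be sketched as follows; the function name, channel counts and the use of a freshly constructed head are assumptions of this illustration, not the patented implementation.

```python
# A sketch of deep-supervision mechanism 1: the deeper side outputs are
# bilinearly upsampled to the resolution of X_D^1, cascaded on channels,
# and mapped to a prediction in [0, 1] by a 3x3 Conv -> Sigmoid head.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mechanism1(sides, n_classes=1):
    """sides: side outputs [X_D^1 .. X_D^4], shallow (largest) first."""
    target = sides[0].shape[-2:]
    ups = [s if s.shape[-2:] == target else
           F.interpolate(s, size=target, mode="bilinear", align_corners=False)
           for s in sides]
    fused = torch.cat(ups, dim=1)               # channel-dimension cascade
    head = nn.Conv2d(fused.shape[1], n_classes, kernel_size=3, padding=1)
    return torch.sigmoid(head(fused))           # predicted image in [0, 1]

K = 4                                           # side-output channels
sides = [torch.randn(1, K, 64 // 2 ** i, 64 // 2 ** i) for i in range(4)]
pred = mechanism1(sides)
```

Pixel-by-pixel summation of the four upsampled side outputs would replace the `torch.cat` with a plain `sum(ups)`.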
Mechanism 2: One bilinear-interpolation upsampling layer and one 1×1 convolution layer are appended to the side output of decoder node $X_D^4$, so that the output feature map has the same spatial resolution and channel dimension as node $X_D^3$, and a pixel-by-pixel multiplication or addition operation is then performed. The fused feature map passes through one bilinear-interpolation upsampling layer and one 1×1 convolution layer, so that its output has the same spatial resolution and channel dimension as node $X_D^2$, and a pixel-by-pixel multiplication or addition operation is performed again. The fused feature map is further processed by one bilinear-interpolation upsampling layer and one 1×1 convolution layer, so that its output has the same spatial resolution and channel dimension as node $X_D^1$, followed by a pixel-by-pixel multiplication or addition operation. Finally, a predicted image is output through one 3×3 convolution layer (Conv → Sigmoid).
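Mechanism 2 can be sketched as a cascade of fusion steps; all names are assumptions of this illustration, and freshly constructed convolutions stand in for the trained 1×1 alignment layers.

```python
# A sketch of deep-supervision mechanism 2: starting from the deepest side
# output, repeatedly upsample, align channels with a 1x1 convolution, fuse
# pixel by pixel with the next node's side output, and finish with a
# 3x3 Conv -> Sigmoid head.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mechanism2(sides, n_classes=1):
    """sides: side outputs [X_D^1 .. X_D^4], shallow (largest) first."""
    fused = sides[-1]                            # start at X_D^4
    for nxt in reversed(sides[:-1]):             # X_D^3, then X_D^2, X_D^1
        up = F.interpolate(fused, size=nxt.shape[-2:],
                           mode="bilinear", align_corners=False)
        align = nn.Conv2d(up.shape[1], nxt.shape[1], kernel_size=1)
        fused = align(up) + nxt                  # pixel-by-pixel addition
    head = nn.Conv2d(fused.shape[1], n_classes, kernel_size=3, padding=1)
    return torch.sigmoid(head(fused))

sides = [torch.randn(1, 4, 64 // 2 ** i, 64 // 2 ** i) for i in range(4)]
pred2 = mechanism2(sides)
```

Pixel-by-pixel multiplication would replace the `+` in the fusion step with `*`.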
The image semantic segmentation network is a trained image semantic segmentation network; this embodiment defines a mixed segmentation loss function, optimized as a weighted average of the binary cross-entropy (BCE) loss, the Dice similarity coefficient (DSC) loss, and the image average precision loss under different IoU thresholds.

The binary cross-entropy loss of this embodiment is defined as:

$$\ell_{\mathrm{BCE}}=-\frac{1}{N_p}\sum_{j=1}^{N_p}\Big[y_j\log\hat{y}_j+(1-y_j)\log\big(1-\hat{y}_j\big)\Big]$$

where $y$ and $\hat{y}$ are the GT binary label and the prediction segmentation probability map of the model, respectively, and $N_p$ is the number of pixels.

The Dice similarity coefficient loss of this embodiment is defined as:

$$\ell_{\mathrm{DSC}}=1-\frac{2\sum_{j} y_j\hat{y}_j}{\sum_{j} y_j+\sum_{j}\hat{y}_j}$$

where $y$ and $\hat{y}$ are, as above, the GT binary label and the prediction segmentation probability map of the model.

This embodiment also evaluates using image average precision values under different IoU thresholds $t$, ranging from 0.5 to 0.95 with a step size of 0.05 (i.e., 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95). For example, under a threshold of 0.5, a predicted label is considered a hit if its IoU with the GT label is greater than 0.5. The image average precision loss of this embodiment is therefore defined as:

$$\ell_{\mathrm{mAP}}=1-\frac{1}{|\mathrm{thresholds}|}\sum_{t}\mathrm{AP}\big(y,\hat{y}_t\big)$$

where $t$ runs over the different IoU thresholds, $\hat{y}_t$ denotes the prediction result of $\hat{y}$ under threshold $t$, and $|\mathrm{thresholds}|$ is the total number of different IoU thresholds.

Finally, combining all three loss terms, the mixed segmentation loss used in this embodiment is defined as:

$$\ell=\alpha_{\mathrm{BCE}}\,\ell_{\mathrm{BCE}}+\alpha_{\mathrm{DSC}}\,\ell_{\mathrm{DSC}}+\alpha_{\mathrm{mAP}}\,\ell_{\mathrm{mAP}}$$

In all experiments, the weighting coefficients $\alpha_{\mathrm{BCE}}$, $\alpha_{\mathrm{DSC}}$ and $\alpha_{\mathrm{mAP}}$ are set to 0.4, 0.2 and 0.4, respectively.
The present invention proposes to use all full-scale and dense skip connections within and between the encoder and decoder, thus forming the final UNet4+ architecture of this embodiment. With full-scale and dense skip connections, each node in the encoder receives intermediate aggregated feature maps from encoders of different scales, while each node in the decoder receives intermediate aggregated feature maps not only from encoders and decoders of different scales, but also from the encoder of the same scale. Thus, the aggregation layer in the decoder can learn to use all feature maps collected at the node. In contrast to UNet^e, none of UNet+, UNet++, UNet3+ or the proposed UNet4+ architecture requires explicit deep supervision.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A full-scale dense connection-based image semantic segmentation method is characterized by comprising the following steps:
step 1: preprocessing an image to be segmented, and cutting or filling the image to be segmented into a preset size;
step 2: realizing semantic segmentation of an image to be segmented by using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense jump connection and full-scale deep supervision; the encoder consists of 5 coding convolution blocks, the 1st to 4th coding convolution blocks respectively comprise 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence and 1 downsampling layer MaxPoint, and the 5th coding convolution block only comprises 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence; the number of output channels of each coding convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of a maximum pooling kernel and the pooling step length are 2 multiplied by 2; the decoder consists of 4 decoding convolution blocks, each decoding convolution block comprises 1 upsampling layer Biliner, 1 fusion layer Conscatenate and 2 convolution layers, all the encoder feature diagrams or decoder feature diagrams positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 multiplied by 1 convolution layer, so that subsequent full-scale deep supervision is realized.
2. The full-scale dense connection-based image semantic segmentation method according to claim 1, characterized in that: in step 1, if the resolution of the image to be segmented is larger than the preset size, the image to be segmented is cut into image blocks of the preset size; if the resolution of the image to be segmented is smaller than the preset size, the image block boundaries are padded by mirroring up to the preset size.
3. The full-scale dense connection-based image semantic segmentation method according to claim 1, characterized in that: in step 2, the feature maps at the encoder side and the decoder side of the image semantic segmentation network are denoted by $X_E^i$ and $X_D^i$ respectively; the input enters the image semantic segmentation network through encoder node $X_E^1$; every other encoder node $X_E^i$ located at layer $i>1$ receives only $i-1$ downsampled inputs from all upper nodes of the encoder; a decoder node $X_D^i$ located at layer $i<N$ receives $N-i-1$ upsampled inputs from the decoding side and $N$ inputs from the encoding side; wherein the superscript $i$ is indexed along the downsampling layers of the encoder, and $N$ represents the depth of the network;
the full-scale deep supervision appends one bilinear-interpolation upsampling layer to the side output ends of decoder nodes $X_D^2$, $X_D^3$ and $X_D^4$, so that their output feature maps have the same spatial resolution as node $X_D^1$; then, the 4 side outputs are cascaded in the channel dimension or added pixel by pixel, and a predicted image is output by one 3×3 convolution layer composed of Conv and Sigmoid.
4. The full-scale dense connection-based image semantic segmentation method according to claim 1, characterized in that: in step 2, the feature maps at the encoder side and the decoder side of the image semantic segmentation network are denoted by $X_E^i$ and $X_D^i$ respectively; the input enters the image semantic segmentation network through encoder node $X_E^1$; every other encoder node $X_E^i$ located at layer $i>1$ receives only $i-1$ downsampled inputs from all upper nodes of the encoder; a decoder node $X_D^i$ located at layer $i<N$ receives $N-i-1$ upsampled inputs from the decoding side and $N$ inputs from the encoding side; wherein the superscript $i$ is indexed along the downsampling layers of the encoder, and $N$ represents the depth of the network;
the full-scale deep supervision appends one bilinear-interpolation upsampling layer and one 1×1 convolution layer to the side output of decoder node $X_D^4$, so that the output feature map has the same spatial resolution and channel dimension as node $X_D^3$, and a pixel-by-pixel multiplication or addition operation is then performed; the fused feature map passes through one bilinear-interpolation upsampling layer and one 1×1 convolution layer, so that its output has the same spatial resolution and channel dimension as node $X_D^2$, and a pixel-by-pixel multiplication or addition operation is performed again; the fused feature map is further processed by one bilinear-interpolation upsampling layer and one 1×1 convolution layer, so that its output has the same spatial resolution and channel dimension as node $X_D^1$, followed by a pixel-by-pixel multiplication or addition operation; finally, a predicted image is output through one 3×3 convolution layer composed of Conv and Sigmoid.
5. The full-scale dense connection-based image semantic segmentation method according to any one of claims 1 to 4, characterized in that: the image semantic segmentation network is a trained image semantic segmentation network; the loss function adopted in training is a mixed segmentation loss, namely a weighted average of the binary cross-entropy (BCE) loss, the Dice similarity coefficient (DSC) loss, and the image average-precision loss under different IoU thresholds;
the binary cross-entropy BCE loss is defined as:

$$L_{BCE} = -\frac{1}{M}\sum_{j=1}^{M}\left[y_j \log \hat{y}_j + (1 - y_j)\log(1 - \hat{y}_j)\right]$$

wherein $y$ and $\hat{y}$ are the GT binary label and the corresponding predicted segmentation probability map of the image semantic segmentation network, respectively, and $M$ is the number of pixels;
the Dice similarity coefficient DSC loss is defined as:

$$L_{DSC} = 1 - \frac{2\sum_{j} y_j \hat{y}_j}{\sum_{j} y_j + \sum_{j} \hat{y}_j}$$
the image average-precision loss under the different IoU thresholds is defined as:

$$L_{mAP} = 1 - \frac{1}{|\text{thresholds}|}\sum_{t} \mathrm{AP}\!\left(y, \hat{y}_t\right)$$

wherein $t$ ranges over the different IoU thresholds, from 0.5 to 0.95 with a step of 0.05; $\hat{y}_t$ represents the prediction result under the threshold $t$, and $|\text{thresholds}|$ is the total number of different IoU thresholds;
finally, combining all three loss terms, the mixed segmentation loss is obtained as:

$$L_{seg} = \alpha_{BCE} L_{BCE} + \alpha_{DSC} L_{DSC} + \alpha_{mAP} L_{mAP}$$

wherein $\alpha_{BCE}$, $\alpha_{DSC}$ and $\alpha_{mAP}$ are the respective weighting coefficients.
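Under the definitions of claim 5, the three loss terms and their weighted combination can be sketched with NumPy as follows; the pixel-wise reduction and the IoU-based form of the per-threshold precision term are assumptions, since the claim fixes only the symbols and the threshold grid (0.5 to 0.95, step 0.05):

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-7):
    """Binary cross-entropy between GT labels y and predicted probabilities y_hat."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def dsc_loss(y, y_hat, eps=1e-7):
    """Soft Dice similarity coefficient loss."""
    inter = np.sum(y * y_hat)
    return float(1.0 - 2.0 * inter / (np.sum(y) + np.sum(y_hat) + eps))

def map_loss(y, y_hat):
    """Average-precision loss over IoU thresholds 0.5:0.05:0.95 (assumed form)."""
    thresholds = np.arange(0.5, 0.96, 0.05)   # 10 thresholds
    mask = y_hat >= 0.5                        # binarized prediction
    inter = np.logical_and(y > 0, mask).sum()
    union = np.logical_or(y > 0, mask).sum()
    iou = inter / union if union else 1.0
    # the prediction counts as correct at threshold t iff its IoU exceeds t
    return float(1.0 - np.mean(iou > thresholds))

def mixed_loss(y, y_hat, a_bce=1.0, a_dsc=1.0, a_map=1.0):
    """Weighted average of the three terms, as in the mixed segmentation loss."""
    total = a_bce + a_dsc + a_map
    return (a_bce * bce_loss(y, y_hat)
            + a_dsc * dsc_loss(y, y_hat)
            + a_map * map_loss(y, y_hat)) / total
```

A near-perfect probability map drives all three terms, and hence the mixed loss, toward zero, which is the sanity check one would expect from a segmentation loss.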
6. An image semantic segmentation system based on full-scale dense connection is characterized by comprising the following modules:
module 1, configured to preprocess the image to be segmented by cropping or padding it to a preset size;
module 2, configured to perform semantic segmentation of the image to be segmented using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense skip connections, and full-scale deep supervision; the encoder consists of 5 encoding convolution blocks; the 1st to 4th encoding convolution blocks each comprise 2 sequentially connected convolution layers composed of Conv, InstanceNorm and LeakyReLU, followed by 1 down-sampling layer MaxPool, while the 5th encoding convolution block comprises only the 2 sequentially connected convolution layers composed of Conv, InstanceNorm and LeakyReLU; the numbers of output channels of the encoding convolution blocks are C, 2C, 4C, 8C and 16C, respectively, all convolution kernels are 3×3, and the max-pooling kernel size and pooling stride are both 2×2; the decoder is composed of 4 decoding convolution blocks, each comprising 1 up-sampling layer Bilinear, 1 fusion layer Concatenate and 2 convolution layers; all the encoder feature maps, together with the decoder feature maps preceding the current decoding block, are concatenated through the full-scale dense skip connections, and the side output of each decoding convolution block is passed through a 1×1 convolution layer for channel-number alignment, enabling the subsequent full-scale deep supervision.
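The encoder schedule of this module (output channels C, 2C, 4C, 8C, 16C; 3×3 convolutions preserving resolution; 2×2 max pooling after the first four blocks) can be traced with a short sketch; the base width C = 32 and the 256×256 input resolution are illustrative assumptions, not values fixed by the patent:

```python
def encoder_schedule(c: int, h: int, w: int):
    """Return (block index, out_channels, height, width) for the 5 encoding blocks.

    Blocks 1-4: two Conv-InstanceNorm-LeakyReLU layers (3x3, padding 1,
    resolution preserved) followed by 2x2 max pooling with stride 2;
    block 5 has the two convolution layers only, with no final pooling.
    """
    channels = [c, 2 * c, 4 * c, 8 * c, 16 * c]
    shapes = []
    for idx, ch in enumerate(channels, start=1):
        shapes.append((idx, ch, h, w))  # resolution after the block's convolutions
        if idx < 5:                     # blocks 1-4 end with 2x2 max pooling
            h, w = h // 2, w // 2
    return shapes

# Example: base width C = 32 on a 256x256 input (illustrative values)
for block in encoder_schedule(32, 256, 256):
    print(block)
# (1, 32, 256, 256) through (5, 512, 16, 16)
```

Each halving of the spatial resolution is paired with a doubling of the channel count, the usual trade-off in U-Net-style encoders.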
7. An image semantic segmentation device based on full-scale dense connection is characterized by comprising:
one or more processors;
storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the full-scale dense-connectivity-based image semantic segmentation method according to any one of claims 1 to 5.
CN202211229781.2A 2022-10-08 2022-10-08 Image semantic segmentation method, system and equipment based on full-scale dense connection Active CN115601542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211229781.2A CN115601542B (en) 2022-10-08 2022-10-08 Image semantic segmentation method, system and equipment based on full-scale dense connection


Publications (2)

Publication Number Publication Date
CN115601542A true CN115601542A (en) 2023-01-13
CN115601542B CN115601542B (en) 2023-07-21

Family

ID=84846535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211229781.2A Active CN115601542B (en) 2022-10-08 2022-10-08 Image semantic segmentation method, system and equipment based on full-scale dense connection

Country Status (1)

Country Link
CN (1) CN115601542B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909001A (en) * 2023-03-09 2023-04-04 和普威视光电股份有限公司 Target detection method and system fusing dense nested jump connection

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490884A (en) * 2019-08-23 2019-11-22 北京工业大学 An adversarial-learning-based lightweight network semantic segmentation method
US20190385021A1 (en) * 2018-06-18 2019-12-19 Drvision Technologies Llc Optimal and efficient machine learning method for deep semantic segmentation
US20200380695A1 (en) * 2019-05-28 2020-12-03 Zongwei Zhou Methods, systems, and media for segmenting images
CN113807355A (en) * 2021-07-29 2021-12-17 北京工商大学 Image semantic segmentation method based on coding and decoding structure
CN114220098A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Improved multi-scale full-convolution network semantic segmentation method
CN114283164A (en) * 2022-03-02 2022-04-05 华南理工大学 Breast cancer pathological section image segmentation prediction system based on UNet3+
CN114332117A (en) * 2021-12-23 2022-04-12 杭州电子科技大学 Post-earthquake landform segmentation method based on UNET3+ and full-connection condition random field fusion
CN114677671A (en) * 2022-02-18 2022-06-28 深圳大学 Automatic identifying method for old ribs of preserved szechuan pickle based on multispectral image and deep learning
CN114863274A (en) * 2022-04-26 2022-08-05 北京市测绘设计研究院 Surface green net thatch cover extraction method based on deep learning


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HUANG et al.: "UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation", ICASSP 2020 *
JUAN WANG et al.: "Image Semantic Segmentation Algorithm Based on Self-learning Super-Pixel Feature Extraction", EIDWT 2018 *
LI Wanqi; LI Kejian; CHEN Shaobo: "Multi-modal fusion based semantic segmentation method for high-resolution remote sensing images", Journal of South-Central Minzu University (Natural Science Edition), no. 04
TIAN Qichuan; MENG Ying: "Image semantic segmentation technology based on convolutional neural networks", Journal of Chinese Computer Systems, no. 06
ZHENG Kai; LI Jiansheng: "A survey of image semantic segmentation based on deep neural networks", Geomatics & Spatial Information Technology, no. 10
MA Zhenhuan; GAO Hongju; LEI Tao: "Semantic segmentation algorithm based on enhanced feature fusion decoder", Computer Engineering, no. 05


Also Published As

Publication number Publication date
CN115601542B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN111325751B (en) CT image segmentation system based on attention convolution neural network
CN110232394B (en) Multi-scale image semantic segmentation method
CN111144329B (en) Multi-label-based lightweight rapid crowd counting method
CN111179167B (en) Image super-resolution method based on multi-stage attention enhancement network
CN110084234B (en) Sonar image target identification method based on example segmentation
CN111091130A (en) Real-time image semantic segmentation method and system based on lightweight convolutional neural network
CN114283164B (en) Breast cancer pathological section image segmentation prediction system based on UNet3+
CN115457498A (en) Urban road semantic segmentation method based on double attention and dense connection
CN114549439A (en) RGB-D image semantic segmentation method based on multi-modal feature fusion
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN112883887B (en) Building instance automatic extraction method based on high spatial resolution optical remote sensing image
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN116469100A (en) Dual-band image semantic segmentation method based on Transformer
CN115601723A (en) Night thermal infrared image semantic segmentation enhancement method based on improved ResNet
CN115601542B (en) Image semantic segmentation method, system and equipment based on full-scale dense connection
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN114693929A (en) Semantic segmentation method for RGB-D bimodal feature fusion
CN117078930A (en) Medical image segmentation method based on boundary sensing and attention mechanism
CN115527096A (en) Small target detection method based on improved YOLOv5
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN116630704A (en) Ground object classification network model based on attention enhancement and intensive multiscale
CN115082928A (en) Method for asymmetric double-branch real-time semantic segmentation of network for complex scene
CN112418229A (en) Unmanned ship marine scene image real-time segmentation method based on deep learning
CN116542988A (en) Nodule segmentation method, nodule segmentation device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant