CN115601542A - Image semantic segmentation method, system and equipment based on full-scale dense connection - Google Patents
Image semantic segmentation method, system and equipment based on full-scale dense connection
- Publication number
- CN115601542A CN115601542A CN202211229781.2A CN202211229781A CN115601542A CN 115601542 A CN115601542 A CN 115601542A CN 202211229781 A CN202211229781 A CN 202211229781A CN 115601542 A CN115601542 A CN 115601542A
- Authority
- CN
- China
- Prior art keywords
- image
- semantic segmentation
- full
- convolution
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image semantic segmentation method, system and equipment based on full-scale dense connection. An image to be segmented is preprocessed and cropped or padded to a preset size; semantic segmentation of the image is then performed by an image semantic segmentation network. Through full-scale dense skip connections, each node in the encoder of the network (UNet4+) of the present invention receives intermediate aggregated feature maps from encoder nodes of different scales, while each node in the decoder receives intermediate aggregated feature maps not only from encoder and decoder nodes of different scales but also from the encoder node of the same scale. Thus, the aggregation layer in each decoder node can learn to use all of the feature maps collected at that node. UNet4+ of the present invention alleviates the vanishing-gradient problem and maximizes information flow in the network; it also enhances feature propagation, yields a more compact model, and achieves extreme feature reusability.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, deep learning and image processing, and relates to an image semantic segmentation method, system and equipment, in particular to an image semantic segmentation method, system and equipment based on a full-scale dense connection semantic segmentation network.
Background
Image semantic segmentation is a key component of image understanding in image processing and machine vision technology, and an important branch of the AI field. Semantic segmentation classifies each pixel in an image, determining the category of each point (e.g., background, person, or vehicle) and thereby dividing the image into regions. At present, semantic segmentation is widely applied in scenarios such as autonomous driving and landing-point determination for unmanned aerial vehicles.
Current approaches to image semantic segmentation mainly adopt the UNet architecture and its variants, such as UNet^e, UNet+, UNet++, and UNet3+.
The UNet architecture (O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 2015, conference proceedings, pp. 234-241.) has become a de facto standard for various image segmentation tasks and has met with great success. It is a typical cascaded encoder-decoder architecture, in which the encoder (the contracting path) performs feature extraction and the decoder (the expanding path) performs resolution restoration. The UNet architecture is most attractive for its long skip connections, which allow information of the same scale to flow directly from the encoder to the decoder, enabling the model to make better predictions.
However, such a relatively fixed structure makes it difficult for the model to balance receptive-field size against boundary-segmentation accuracy. It is generally accepted that deeper networks have better nonlinear representation ability, can learn more complex transformations, and can adapt to more complex features. But deeper networks introduce the so-called vanishing-gradient problem and reduce the learning ability of the shallow layers: when the network depth reaches a certain level, segmentation performance no longer improves and may even decrease.
To determine the optimal depth of the UNet architecture, Zhou et al. (Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "Unet++: Redesigning skip connections to exploit multiscale features in image segmentation," IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 1856-1867, 2020.) proposed an integrated architecture, UNet^e, which combines UNets of different depths into one unified architecture. Integrated architectures benefit from knowledge sharing: all UNets within the UNet^e architecture share the encoder but have separate decoders. Since the decoders in this architecture are disconnected, a deeper UNet cannot provide a supervisory signal to its shallower counterparts; therefore, explicit deep supervision is required in the combination.
Another solution to overcome the above limitation is to remove all long skip connections in the UNet^e structure and use short skip connections to connect each neighboring node, forming a nested structure called UNet+, so that gradients back-propagate from deeper decoder nodes to their shallower counterparts. This idea was addressed almost simultaneously by Yu et al. (F. Yu, D. Wang, E. Shelhamer, and T. Darrell, "Deep layer aggregation," in 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 2018, conference proceedings, pp. 2403-2412.) and Zhou et al. (Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "Unet++: A nested u-net architecture for medical image segmentation," in 4th International Workshop on Deep Learning in Medical Image Analysis (DLMIA 2018), held in conjunction with MICCAI 2018, Granada, Spain, 2018, conference proceedings, pp. 3-11.), respectively.
Notably, each node in the UNet+ architecture integrates the feature maps of its neighboring ancestors of different scales, from a horizontal perspective, with the feature maps of its neighboring ancestors of the same scale, from a vertical perspective. To ensure maximum information flow between the UNets of all different depths within the UNet+ architecture, Zhou et al. also proposed a nested UNet architecture with dense skip connections, called UNet++, whose decoders are densely connected at the same scale from a horizontal perspective. The redesigned same-scale skip connections make dense feature propagation more flexible by directly connecting all previous feature maps together.
Although convincing as a natural design, there is no solid theory ensuring that same-scale feature maps are the best match for feature fusion. To utilize full-scale features in image segmentation, Huang et al. (H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, X. Han, Y.-W. Chen, and J. Wu, "Unet 3+: A full-scale connected unet for medical image segmentation," in 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), Barcelona, Spain, 2020, conference proceedings, pp. 1055-1059.) proposed UNet3+, which combines fine-grained low-level detail feature maps with coarse-grained high-level feature maps of different scales. However, UNet3+ only partially redesigns the long skip connections between the encoder and decoder and the short skip connections within the decoder.
Although the UNet3+ architecture's use of different-scale feature maps in the decoder is much less restrictive than the same-scale usage of the UNet, UNet+, and UNet++ architectures, there is still room for improvement.
Disclosure of Invention
In order to solve the above technical problem, the image semantic segmentation network adopted by the invention uses full-scale and dense skip connections both within and between the encoder and the decoder, thereby forming the image semantic segmentation network of the invention (the UNet4+ architecture).
The technical scheme adopted by the method is as follows: an image semantic segmentation method based on full-scale dense connection comprises the following steps:
step 1: preprocessing an image to be segmented, and cutting or filling the image to be segmented into a preset size;
and 2, step: realizing semantic segmentation of an image to be segmented by using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense jump connection and full-scale deep supervision; the encoder consists of 5 coding convolution blocks, the 1st to 4th coding convolution blocks respectively comprise 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence and 1 downsampling layer MaxPoint, and the 5th coding convolution block only comprises 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence; the number of output channels of each coding convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of a maximum pooling kernel and the pooling step length are 2 multiplied by 2; the decoder is composed of 4 decoding convolution blocks, each decoding convolution block comprises 1 upsampling layer Biliner, 1 fusion layer Conscatenate and 2 convolution layers, all encoder feature maps or decoder feature maps positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 x 1 convolution layer, so that subsequent full-scale deep supervision is realized.
The technical scheme adopted by the system of the invention is as follows: an image semantic segmentation system based on full-scale dense connection comprises the following modules:
the module 1 is used for preprocessing an image to be segmented and cutting or filling the image to be segmented into a preset size;
the module 2 is used for realizing semantic segmentation of the image to be segmented by using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense jump connection and full-scale deep supervision; the encoder consists of 5 coding convolution blocks, the 1st to 4th coding convolution blocks respectively comprise 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence and 1 downsampling layer MaxPoint, and the 5th coding convolution block only comprises 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence; the number of output channels of each coding convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of a maximum pooling kernel and the pooling step length are 2 multiplied by 2; the decoder is composed of 4 decoding convolution blocks, each decoding convolution block comprises 1 upsampling layer Biliner, 1 fusion layer Conscatenate and 2 convolution layers, all encoder feature maps or decoder feature maps positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 x 1 convolution layer, so that subsequent full-scale deep supervision is realized.
The technical scheme adopted by the equipment of the invention is as follows: an image semantic segmentation device based on full-scale dense connection comprises:
one or more processors;
a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the full-scale dense connectivity-based image semantic segmentation method.
The image semantic segmentation network (UNet4+) provided by the invention has the following advantages:
(1) UNet4+ connects any two convolution blocks by a direct skip connection, thereby alleviating the vanishing-gradient problem and maximizing information flow in the network.
(2) UNet4+ makes extensive use of feature concatenation, thereby enhancing feature propagation in the network.
(3) UNet4+ achieves a more compact model and extreme feature reusability by aggregating a large number of feature maps in the convolution blocks at the back end of the network.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an image semantic segmentation network (UNet 4 +) according to an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
Referring to fig. 1, the image semantic segmentation method based on full-scale dense connection provided by the present invention includes the following steps:
step 1: preprocessing an image to be segmented, and cutting or filling the image to be segmented into a preset size;
in this embodiment, the image to be segmented may be read in grayscale or color, where the number of channels of the grayscale image is 1 and the number of channels of the color image is 3. The input image resolution may be any size and is cropped into an image block of 512 x 512 size. When the image is cut, the overlapping area of the adjacent image blocks is recommended to be not less than 5% so as to avoid that the tiny targets at the edges of the image blocks cannot be completely detected. If the input image resolution is less than 512 x 512, the image block boundaries are filled with the mirror image.
Step 2: realizing semantic segmentation of an image to be segmented by using an image semantic segmentation network;
referring to fig. 2, the image semantic segmentation network of the present embodiment includes an encoder, a decoder, full-scale dense jump connection, and full-scale deep supervision; wherein the encoder is composed of 5 convolutional blocks, each of the 1st to 4th convolutional blocks includes 2 convolutional layers (Conv → InstanceNorm → leakyreu) and 1 downsampling layer (MaxPooling), and the 5th convolutional block includes only 2 convolutional layers. The number of output channels of each convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of the maximum pooling kernel and the pooling step length are 2 multiplied by 2. The decoder consists of 4 convolution blocks, each convolution block comprises 1 upsampling layer (upsampling Biliner), 1 fusion layer (conditioner) and 2 convolution layers, all the codec characteristic diagrams (downsampling or upsampling is needed if necessary to ensure the consistent characteristic diagram dimension) positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 × 1 convolution layer, so that the subsequent full-scale deep supervision is realized.
The full-scale dense skip connections are redesigned in the image semantic segmentation network (UNet4+ architecture) of this embodiment. Let node X^i denote the node whose output feature map is x^i, where the superscript i is indexed along the downsampling layers of the encoder and N denotes the depth of the network layers. Denoting the feature maps at the encoder side and the decoder side by $x_{En}^{i}$ and $x_{De}^{i}$ respectively, they can be expressed as:

$$x_{En}^{i} = \begin{cases} \mathcal{H}\left(x_{input}\right), & i = 1 \\ \mathcal{H}\left(\left[\mathcal{D}(x_{En}^{1}), \ldots, \mathcal{D}(x_{En}^{i-1})\right]\right), & 1 < i \le N \end{cases}$$

and

$$x_{De}^{i} = \mathcal{H}\left(\left[\mathcal{D}(x_{En}^{1}), \ldots, \mathcal{D}(x_{En}^{i-1}),\; x_{En}^{i},\; \mathcal{U}(x_{En}^{i+1}), \ldots, \mathcal{U}(x_{En}^{N}),\; \mathcal{U}(x_{De}^{i+1}), \ldots, \mathcal{U}(x_{De}^{N-1})\right]\right), \quad 1 \le i < N$$

where $\mathcal{C}(\cdot)$ denotes a convolution layer, $\mathcal{H}(\cdot)$ denotes a convolution block formed of a plurality of successive convolution layers, $\mathcal{D}(\cdot)$ and $\mathcal{U}(\cdot)$ denote a downsampling layer and an upsampling layer respectively, the number of output channels of the node following each sampling layer is adjusted by the convolution layer, and the symbol $[\cdot]$ denotes the concatenation operation.
As shown in FIG. 2, only one input enters the UNet4+ architecture proposed in this embodiment, through encoder node $X_{En}^{1}$. Every other encoder node $X_{En}^{i}$ at layer i > 1 receives only i-1 downsampled inputs from all the encoder nodes above it. A decoder node $X_{De}^{i}$ at layer i < N receives N-i-1 upsampled inputs from the decoding side and N inputs from the encoding side (of which i-1 are downsampled, 1 is of the same scale, and N-i are upsampled). All previous feature maps are accumulated and concatenated to the current node because this embodiment uses dense skip connections both between and within the encoder and decoder.
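The feature-gathering rule above — every preceding encoder or decoder feature map is downsampled, passed through, or upsampled to the current node's resolution and then concatenated — can be sketched as follows. This is an illustrative PyTorch fragment; the helper names, and the choice of max pooling for downsampling and bilinear interpolation for upsampling, follow the description but are otherwise our own assumptions.

```python
import torch
import torch.nn.functional as F

def resample_to(feat: torch.Tensor, target_hw: tuple[int, int]) -> torch.Tensor:
    """Bring a feature map to the target spatial resolution:
    max-pool finer maps, bilinearly upsample coarser ones."""
    h, w = feat.shape[-2:]
    th, tw = target_hw
    if (h, w) == (th, tw):
        return feat                                   # same scale: pass through
    if h > th:                                        # finer scale: downsample
        return F.adaptive_max_pool2d(feat, (th, tw))
    return F.interpolate(feat, size=(th, tw),         # coarser scale: upsample
                         mode="bilinear", align_corners=False)

def aggregate(node_inputs: list[torch.Tensor], target_hw) -> torch.Tensor:
    """Concatenate all incoming feature maps along the channel dimension."""
    return torch.cat([resample_to(f, target_hw) for f in node_inputs], dim=1)
```

In the full network, the concatenated tensor would then pass through the decoding block's two convolution layers, which also adjust the channel count.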
The present embodiment introduces two distinct full-scale deep supervision mechanisms into the UNet4+ architecture.
Mechanism 1: with UNet e UNet + and UNet + + pairs of intermediate same-scale feature mapsInstead of performing deep supervision, the proposed UNet4+ produces a side output at each decoded volume block, similar to UNet3+, but with several subtle and important differences. This embodiment is implemented in the decoder nodeAndthe side output ends of the nodes are added with 1 up-sampling layer of bilinear interpolation, so that the output characteristic graphs of the nodes have AND nodesThe same spatial resolution. The 4 side outputs are then cascaded or summed pixel by pixel in the channel dimension, and a predicted image is output via 1 3 × 3 convolutional layer (Conv → Sigmoid) (the input of which is mapped to [0,1 ] by Sigmoid activation function]In between).
Mechanism 2: decoder nodeThe side outputs 1 up-sampling layer with bilinear interpolation and 1 convolution layer with 1 multiplied by 1, so that the output characteristic graph has the same nodeThe spatial resolution and the channel dimension are the same, and then multiplication or addition operation is carried out pixel by pixel; the fused feature map is output with nodes through 1 bilinear interpolation up-sampling layer and 1 x 1 convolution layerThe spatial resolution and the channel dimension are the same, and then multiplication or addition operation is carried out pixel by pixel; the fused feature graph is further processed by 1 bilinear interpolation upsampling layer and 1 convolution layer of 1 multiplied by 1, so that the output of the feature graph has a nodeThe same spatial resolution and channel dimensions, and then pixel-by-pixel multiplication or addition. Finally, a prediction image is output through 1 3 × 3 convolution layers (Conv → Sigmoid).
The image semantic segmentation network is a trained network. This embodiment defines a mixed segmentation loss function, optimized as a weighted average of the binary cross entropy (BCE) loss, the Dice similarity coefficient (DSC) loss, and the image average precision loss at different IoU thresholds.
The binary cross entropy loss of this embodiment is defined as:

$$L_{BCE}(y, \hat{y}) = -\frac{1}{M}\sum_{m=1}^{M}\left[ y_m \log \hat{y}_m + (1 - y_m)\log(1 - \hat{y}_m) \right]$$

where $y$ and $\hat{y}$ are the GT binary label and the model's predicted segmentation probability map, respectively, and $M$ is the number of pixels.
The Dice similarity coefficient loss of this embodiment is defined as:

$$L_{DSC}(y, \hat{y}) = 1 - \frac{2\sum_{m} y_m \hat{y}_m}{\sum_{m} y_m + \sum_{m} \hat{y}_m}$$

where $y$ and $\hat{y}$ are the GT binary label and the model's predicted segmentation probability map, respectively.
The present embodiment also evaluates using image average precision values for different IoU thresholds t, ranging from 0.5 to 0.95 with a step size of 0.05 (i.e., 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95). For example, at a threshold of 0.5, a predicted label is considered a hit if its IoU with the GT label is greater than 0.5. The image average precision loss of this embodiment is therefore defined as:

$$L_{mAP}(y, \hat{y}) = 1 - \frac{1}{|\mathrm{thresholds}|}\sum_{t} \mathrm{AP}\left(y, \hat{y}_t\right)$$

where t ranges over the different IoU thresholds, $\hat{y}_t$ denotes the prediction result at threshold t, and $|\mathrm{thresholds}|$ is the total number of different IoU thresholds.
Finally, combining all three loss terms, the mixed segmentation loss used in this embodiment is defined as:

$$L_{seg} = \alpha_{BCE}\, L_{BCE} + \alpha_{DSC}\, L_{DSC} + \alpha_{mAP}\, L_{mAP}$$

In all experiments, the weighting coefficients $\alpha_{BCE}$, $\alpha_{DSC}$, and $\alpha_{mAP}$ are set to 0.4, 0.2, and 0.4, respectively.
The present invention proposes to use full-scale and dense skip connections both within and between the encoder and decoder, forming the final UNet4+ architecture of this embodiment. With full-scale dense skip connections, each node in the encoder receives intermediate aggregated feature maps from encoder nodes of different scales, while each node in the decoder receives intermediate aggregated feature maps not only from encoder and decoder nodes of different scales but also from the encoder node of the same scale. Thus, the aggregation layer in each decoder node can learn to use all of the feature maps collected at that node. In contrast to UNet^e, none of UNet+, UNet++, UNet3+, and the proposed UNet4+ architectures require explicit deep supervision.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A full-scale dense connection-based image semantic segmentation method is characterized by comprising the following steps:
step 1: preprocessing an image to be segmented, and cutting or filling the image to be segmented into a preset size;
step 2: realizing semantic segmentation of an image to be segmented by using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense jump connection and full-scale deep supervision; the encoder consists of 5 coding convolution blocks, the 1st to 4th coding convolution blocks respectively comprise 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence and 1 downsampling layer MaxPoint, and the 5th coding convolution block only comprises 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence; the number of output channels of each coding convolution block is C, 2C, 4C, 8C and 16C respectively, the sizes of convolution kernels are 3 multiplied by 3, and the size of a maximum pooling kernel and the pooling step length are 2 multiplied by 2; the decoder consists of 4 decoding convolution blocks, each decoding convolution block comprises 1 upsampling layer Biliner, 1 fusion layer Conscatenate and 2 convolution layers, all the encoder feature diagrams or decoder feature diagrams positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 multiplied by 1 convolution layer, so that subsequent full-scale deep supervision is realized.
2. The image semantic segmentation method based on full-scale dense connection according to claim 1, wherein: in step 1, if the resolution of the image to be segmented is larger than the preset size, the image is cropped into image blocks of the preset size; if the resolution is smaller than the preset size, the image block boundaries are mirror-padded to the preset size.
3. The image semantic segmentation method based on full-scale dense connection according to claim 1, wherein: in step 2, the feature maps at the encoder side and the decoder side of the image semantic segmentation network are denoted by $x_{En}^{i}$ and $x_{De}^{i}$ respectively; the input enters the image semantic segmentation network through encoder node $X_{En}^{1}$; every other encoder node $X_{En}^{i}$ at layer i > 1 receives only i-1 downsampled inputs from all the encoder nodes above it; a decoder node $X_{De}^{i}$ at layer i < N receives N-i-1 upsampled inputs from the decoding side and N inputs from the encoding side; the superscript i is indexed along the downsampling layers of the encoder, and N denotes the depth of the network layers;
the full-scale deep supervision is performed at a decoder nodeAndthe side output ends of the nodes are added with 1 up-sampling layer of bilinear interpolation, so that the output characteristic graphs of the nodes have AND nodesThe same spatial resolution; then, the 4 side outputs are cascaded or added pixel by pixel in channel dimension, and then 1 is composed of Conv and SigmoidThe 3 x 3 convolutional layer outputs a predicted image.
4. The image semantic segmentation method based on full-scale dense connection according to claim 1, wherein: in step 2, the feature maps at the encoder side and the decoder side of the image semantic segmentation network are denoted by $x_{En}^{i}$ and $x_{De}^{i}$ respectively; the input enters the image semantic segmentation network through encoder node $X_{En}^{1}$; every other encoder node $X_{En}^{i}$ at layer i > 1 receives only i-1 downsampled inputs from all the encoder nodes above it; a decoder node $X_{De}^{i}$ at layer i < N receives N-i-1 upsampled inputs from the decoding side and N inputs from the encoding side; the superscript i is indexed along the downsampling layers of the encoder, and N denotes the depth of the network layers;
the full-scale deep supervision is performed at a decoder nodeThe side outputs 1 up-sampling layer with bilinear interpolation and 1 convolution layer with 1 multiplied by 1, so that the output characteristic graph has the same nodeThe spatial resolution and the channel dimension are the same, and then multiplication or addition operation is carried out pixel by pixel; the fused feature map is formed by 1An upsampled layer of bilinear interpolation and 1 convolutional layer of 1 × 1, with output having an AND nodeThe spatial resolution and the channel dimension are the same, and then multiplication or addition operation is carried out pixel by pixel; the fused feature graph is further processed by 1 bilinear interpolation upsampling layer and 1 convolution layer of 1 multiplied by 1, so that the output of the feature graph has a nodeThe spatial resolution and the channel dimension are the same, and then multiplication or addition operation is carried out pixel by pixel; finally, a predicted image is output through 1 3 × 3 convolutional layer composed of Conv and Sigmoid.
5. The image semantic segmentation method based on full-scale dense connection according to any one of claims 1 to 4, wherein: the image semantic segmentation network is a trained network; the loss function adopted in training is a mixed segmentation loss function, a weighted average of the binary cross entropy (BCE) loss, the Dice similarity coefficient (DSC) loss, and the image average precision loss at different IoU thresholds;
the binary cross-entropy BCE loss is defined as:
wherein y andthe image semantic segmentation network comprises GT binary labels and a prediction segmentation probability graph corresponding to the image semantic segmentation network;
the dice similarity coefficient DSC loss is defined as:
the average precision loss of the images under the different IoU thresholds is defined as:
wherein t is different IoU threshold values, the threshold value range is from 0.5 to 0.95, and the step length is 0.05;to representThe predicted result under the threshold t, | thresholds | is the total number of different IoU thresholds;
finally, by combining all three loss terms, a mixed partition loss is obtained as:
wherein alpha is BCE 、α DSC And alpha mAP Respectively, are weighting coefficients.
6. An image semantic segmentation system based on full-scale dense connection is characterized by comprising the following modules:
the module 1 is used for preprocessing an image to be segmented and cutting or filling the image to be segmented into a preset size;
the module 2 is used for realizing semantic segmentation of the image to be segmented by using an image semantic segmentation network;
the image semantic segmentation network comprises an encoder, a decoder, full-scale dense jump connection and full-scale deep supervision; the encoder consists of 5 coding convolution blocks, the 1st to 4th coding convolution blocks respectively comprise 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence and 1 downsampling layer MaxPoint, and the 5th coding convolution block only comprises 2 convolution layers consisting of Conv, instanceNorm and LeakyReLU which are connected in sequence; the number of output channels of each coding convolution block is respectively C, 2C, 4C, 8C and 16C, the sizes of convolution kernels are all 3 multiplied by 3, and the size of the maximum pooling kernel and the pooling step length are all 2 multiplied by 2; the decoder is composed of 4 decoding convolution blocks, each decoding convolution block comprises 1 upsampling layer Biliner, 1 fusion layer Conscatenate and 2 convolution layers, all encoder feature maps or decoder feature maps positioned in front of the decoding block are cascaded together through full-scale dense skip connection, and the side output of each decoding convolution block is subjected to channel number alignment by 1 x 1 convolution layer, so that subsequent full-scale deep supervision is realized.
7. An image semantic segmentation device based on full-scale dense connection is characterized by comprising:
one or more processors;
storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the full-scale dense-connectivity-based image semantic segmentation method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211229781.2A CN115601542B (en) | 2022-10-08 | 2022-10-08 | Image semantic segmentation method, system and equipment based on full-scale dense connection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115601542A true CN115601542A (en) | 2023-01-13 |
CN115601542B CN115601542B (en) | 2023-07-21 |
Family
ID=84846535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211229781.2A Active CN115601542B (en) | 2022-10-08 | 2022-10-08 | Image semantic segmentation method, system and equipment based on full-scale dense connection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115601542B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115909001A (en) * | 2023-03-09 | 2023-04-04 | 和普威视光电股份有限公司 | Target detection method and system fusing dense nested jump connection |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490884A (en) * | 2019-08-23 | 2019-11-22 | 北京工业大学 | A kind of lightweight network semantic segmentation method based on confrontation |
US20190385021A1 (en) * | 2018-06-18 | 2019-12-19 | Drvision Technologies Llc | Optimal and efficient machine learning method for deep semantic segmentation |
US20200380695A1 (en) * | 2019-05-28 | 2020-12-03 | Zongwei Zhou | Methods, systems, and media for segmenting images |
CN113807355A (en) * | 2021-07-29 | 2021-12-17 | 北京工商大学 | Image semantic segmentation method based on coding and decoding structure |
CN114220098A (en) * | 2021-12-21 | 2022-03-22 | 一拓通信集团股份有限公司 | Improved multi-scale full-convolution network semantic segmentation method |
CN114283164A (en) * | 2022-03-02 | 2022-04-05 | 华南理工大学 | Breast cancer pathological section image segmentation prediction system based on UNet3+ |
CN114332117A (en) * | 2021-12-23 | 2022-04-12 | 杭州电子科技大学 | Post-earthquake landform segmentation method based on UNET3+ and full-connection condition random field fusion |
CN114677671A (en) * | 2022-02-18 | 2022-06-28 | 深圳大学 | Automatic identifying method for old ribs of preserved szechuan pickle based on multispectral image and deep learning |
CN114863274A (en) * | 2022-04-26 | 2022-08-05 | 北京市测绘设计研究院 | Surface green net thatch cover extraction method based on deep learning |
Non-Patent Citations (6)
Title |
---|
HUANG等: "Unet 3+:A full-scale connected unet for medical image segmentation", 《ICASSP 2020》 * |
JUAN WANG等: "Image Semantic Segmentation Algorithm Based on Self-learning Super-Pixel Feature Extraction", 《EIDWT 2018》 * |
李万琦; 李克俭; 陈少波: "Semantic segmentation method for high-resolution remote sensing images based on multimodal fusion", Journal of South-Central Minzu University (Natural Science Edition), no. 04 *
田启川; 孟颖: "Image semantic segmentation techniques based on convolutional neural networks", Journal of Chinese Computer Systems, no. 06 *
郑凯; 李建胜: "A survey of image semantic segmentation based on deep neural networks", Geomatics & Spatial Information Technology, no. 10 *
马震环; 高洪举; 雷涛: "Semantic segmentation algorithm based on an enhanced feature fusion decoder", Computer Engineering, no. 05 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111210443B (en) | Deformable convolution mixing task cascading semantic segmentation method based on embedding balance | |
CN111325751B (en) | CT image segmentation system based on attention convolution neural network | |
CN110232394B (en) | Multi-scale image semantic segmentation method | |
CN111144329B (en) | Multi-label-based lightweight rapid crowd counting method | |
CN111179167B (en) | Image super-resolution method based on multi-stage attention enhancement network | |
CN110084234B (en) | Sonar image target identification method based on example segmentation | |
CN111091130A (en) | Real-time image semantic segmentation method and system based on lightweight convolutional neural network | |
CN114283164B (en) | Breast cancer pathological section image segmentation prediction system based on UNet3+ | |
CN115457498A (en) | Urban road semantic segmentation method based on double attention and dense connection | |
CN114549439A (en) | RGB-D image semantic segmentation method based on multi-modal feature fusion | |
CN113888547A (en) | Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network | |
CN112883887B (en) | Building instance automatic extraction method based on high spatial resolution optical remote sensing image | |
CN114048822A (en) | Attention mechanism feature fusion segmentation method for image | |
CN116469100A (en) | Dual-band image semantic segmentation method based on Transformer | |
CN115601723A (en) | Night thermal infrared image semantic segmentation enhancement method based on improved ResNet | |
CN115601542B (en) | Image semantic segmentation method, system and equipment based on full-scale dense connection | |
CN116129291A (en) | Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device | |
CN114693929A (en) | Semantic segmentation method for RGB-D bimodal feature fusion | |
CN117078930A (en) | Medical image segmentation method based on boundary sensing and attention mechanism | |
CN115527096A (en) | Small target detection method based on improved YOLOv5 | |
CN112149526A (en) | Lane line detection method and system based on long-distance information fusion | |
CN116630704A (en) | Ground object classification network model based on attention enhancement and intensive multiscale | |
CN115082928A (en) | Method for asymmetric double-branch real-time semantic segmentation of network for complex scene | |
CN112418229A (en) | Unmanned ship marine scene image real-time segmentation method based on deep learning | |
CN116542988A (en) | Nodule segmentation method, nodule segmentation device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||