CN110263833A - Image semantic segmentation method based on an encoder-decoder structure - Google Patents
Image semantic segmentation method based on an encoder-decoder structure
- Publication number
- CN110263833A (application number CN201910503595.5A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- feature map
- size
- information
- conv
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
Abstract
The invention discloses an image semantic segmentation method based on an encoder-decoder structure. The method comprises: first, extracting a set of feature maps from the input picture with a structurally improved ResNet-101 network; then, capturing multi-scale information from the extracted feature maps with a multi-scale information fusion module; in addition, extracting rich spatial information from the shallow layers of ResNet-101 with a spatial information capture module; next, fusing the deep multi-scale information with the shallow spatial information and refining the fused feature maps with a multi-kernel convolution block; and finally, obtaining the segmentation result through a data-dependent upsampling operation. The invention mainly aims to improve image segmentation accuracy, belongs to the technical field of image processing, and is especially suitable for medical image analysis, autonomous driving, virtual reality, driver assistance, robot perception, indoor environment reconstruction, unmanned aerial vehicles, and the like.
Description
Technical field
The invention belongs to the technical field of image processing, and more particularly relates to an image semantic segmentation method based on an encoder-decoder structure, especially suitable for tasks such as medical image analysis, autonomous driving, indoor environment reconstruction, and unmanned aerial vehicles.
Background art
Semantic segmentation is an important research field in image processing; its goal is to make a dense prediction for every pixel of an image and to label each pixel with the class of the object or region it belongs to. With the continuous development of deep convolutional neural networks, and especially with the appearance of the fully convolutional network, semantic segmentation technology has made a qualitative leap. To further improve segmentation results, researchers around the world have approached the problem from different angles and designed a wide variety of model architectures.
To prevent the reduction of spatial resolution caused by repeated downsampling and pooling operations, the Deeplabv2, Deeplabv3, and Deeplabv3+ models proposed by Chen et al. and the PSPNet model proposed by Zhao et al. use dilated convolution, which effectively enlarges the receptive field of the filters and reduces the loss of spatial detail. Encoder-decoder structures can also prevent the loss of spatial information. For example, the SegNet proposed by Badrinarayanan et al. uses an encoder-decoder structure to capture more spatial information. To capture more spatial information in the shallow layers and help the model recover object detail, DeepLabv3+ adds a simple and effective decoder module to the DeepLabv3 model. In addition, the GCN proposed by Chao et al., the DFN proposed by Yu et al., and the PAN model proposed by Li et al. apply a U-shaped structure that gradually fuses feature maps from different levels of the backbone to improve spatial resolution and make up for the loss of spatial detail. GCN uses a "large kernel" to enlarge the receptive field and preserve spatial information.
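The receptive-field arithmetic behind dilated convolution, mentioned above, can be sketched with the standard formulas; this is a minimal illustration (not from the patent), and the example layer stack is purely hypothetical:

```python
# Sketch (not from the patent): how dilation enlarges a filter's receptive
# field without adding parameters. Standard formulas; layer list is illustrative.

def effective_kernel(k, d):
    """Effective extent of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of a stack of (kernel, dilation, stride) layers."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# A 3x3 kernel with dilation 2 covers a 5x5 area:
assert effective_kernel(3, 2) == 5
# Three stacked 3x3 convs with dilations 1/2/4 and stride 1:
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]))  # -> 15
```

With strided or pooled layers in the stack, the `jump` term grows, which is exactly the resolution loss the dilated variants avoid.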
Much work has also been done to capture richer multi-scale contextual information. Deeplabv2 proposes an atrous spatial pyramid pooling module to capture multi-scale context. The OCNet model proposed by Yuan et al. captures multi-scale context by using a pyramid object context or a dilated spatial pyramid object context. In addition, the DenseASPP model proposed by Yang et al. uses a group of dilated convolutional layers to generate multi-scale feature maps. The RefineNet proposed by Lin et al. and the U-Net proposed by Ronneberger et al. fuse feature maps of different levels with an encoder-decoder structure to obtain rich contextual information. Byeon et al. propose a label-based model for capturing complex spatial dependencies, built on a two-dimensional LSTM network. To capture rich contextual dependencies on local features, Shuai et al. design a recurrent neural network over a directed acyclic graph. The SPN model proposed by Liu et al. designs a row/column linear propagation model that can extract dense global pairwise relationships in a scene image. In the PSANet model proposed by Zhao et al., point-wise context is learned adaptively through bidirectional information propagation.
Summary of the invention
To avoid the shortcomings and deficiencies of the prior art, the present invention proposes an image semantic segmentation method based on an encoder-decoder structure, so as to address two challenges in the image semantic segmentation task: 1) the presence of multi-scale objects leads to misclassification; 2) the loss of spatial information makes small objects unrecognizable.
To achieve the above objective, the present invention adopts the following technical scheme.
The image semantic segmentation method based on an encoder-decoder structure according to the present invention is carried out as follows:
Step 1: Produce a data set containing M pictures and divide it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are annotated accurately at the pixel level.
Step 2: Train the image semantic segmentation model based on the encoder-decoder structure.
Step 2.1: First apply data augmentation to the training pictures, namely random horizontal flipping, random rotation between 10 and -10 degrees, and random scaling between 0.5 and 2 times.
Step 2.2: Feed the augmented training pictures X ∈ {x1, x2, ..., xn} into the structurally improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em}.
Step 2.3: Feed the feature maps E ∈ {e1, e2, ..., em} into the multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that is highly discriminative and rich in multi-scale information.
Step 2.4: With the spatial information capture module, extract from the shallow layers of the improved ResNet-101 backbone a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information, to compensate for the loss of spatial resolution in the improved ResNet-101 backbone caused by repeated pooling and downsampling operations.
Step 2.5: Fuse the feature maps T ∈ {t1, t2, ..., ta}, rich in multi-scale information, with the feature maps Q ∈ {q1, q2, ..., qd}, rich in spatial information, to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refine P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; obtain the image segmentation result through a data-dependent upsampling operation; then obtain the output error with a Softmax regression classifier, evaluate the result with the cross-entropy loss function, and finally train the model by optimizing the error with the back-propagation algorithm, yielding the segmentation model.
Step 3: After training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluate the performance of the trained model on the validation set.
Step 4: For a test sample, the final image segmentation result is obtained after steps 2.2-2.5.
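The per-pixel Softmax classifier and cross-entropy loss named in step 2.5 can be sketched as follows; this is a minimal plain-Python illustration with made-up class scores, not the patent's implementation:

```python
# Minimal sketch of the per-pixel Softmax classifier and cross-entropy loss
# used in step 2.5; pure Python, illustrative class scores only.
import math

def softmax(scores):
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Negative log-likelihood of the true class for one pixel."""
    return -math.log(softmax(scores)[label])

probs = softmax([2.0, 1.0, 0.1])
assert abs(sum(probs) - 1.0) < 1e-9        # a valid probability distribution
# the loss is small when the true class has the highest score:
assert cross_entropy([2.0, 1.0, 0.1], 0) < cross_entropy([2.0, 1.0, 0.1], 2)
```

In training, this loss would be averaged over all labeled pixels and minimized by back-propagation, as the step describes.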
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the structurally improved ResNet-101 backbone is structured as follows.
The improved ResNet-101 backbone comprises 5 convolution groups. The first group r1 contains one convolution with 64 kernels of size 7 × 7 and stride 2. The second group r2 contains a pooling convolution with kernel size 2 × 2 and stride 2 and 3 identically structured convolutional layers; each layer has the following structure: conv2_1 has 64 kernels of size 1 × 1, conv2_2 has 64 kernels of size 3 × 3, and conv2_3 has 256 kernels of size 1 × 1. The third group r3 contains 4 identically structured convolutional layers; each has the following structure: conv3_1 has 128 kernels of size 1 × 1, conv3_2 has 128 kernels of size 3 × 3, and conv3_3 has 512 kernels of size 1 × 1. The fourth group r4 contains 23 identically structured convolutional layers, each with dilation rate 2 and stride 1; each has the following structure: conv4_1 has 256 kernels of size 1 × 1, conv4_2 has 256 kernels of size 3 × 3, and conv4_3 has 1024 kernels of size 1 × 1. The fifth group r5 contains 3 identically structured Kronecker convolutional layers, each Kronecker convolution having an inner expansion factor κ1 = 4 and an intra-sharing factor κ2 = 3; each layer has the following structure: conv5_1 has 512 kernels of size 1 × 1, conv5_2 has 512 kernels of size 3 × 3, and conv5_3 has 2048 kernels of size 1 × 1.
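The group structure above can be written out as a small bookkeeping table so the layer counts are easy to check; this is purely an accounting sketch of the configuration as described, not a network implementation:

```python
# The five convolution groups described above, written out as a config table.
# Channel tuples and block counts follow the text; bookkeeping only.
groups = {
    "r1": {"blocks": 1,  "convs_per_block": 1, "channels": (64,)},
    "r2": {"blocks": 3,  "convs_per_block": 3, "channels": (64, 64, 256)},
    "r3": {"blocks": 4,  "convs_per_block": 3, "channels": (128, 128, 512)},
    "r4": {"blocks": 23, "convs_per_block": 3, "channels": (256, 256, 1024)},
    "r5": {"blocks": 3,  "convs_per_block": 3, "channels": (512, 512, 2048)},
}
total_convs = sum(g["blocks"] * g["convs_per_block"] for g in groups.values())
print(total_convs)  # 1 + 9 + 12 + 69 + 9 = 100 convolutional layers; the
                    # final fully connected layer of the original ResNet-101
                    # accounts for the "101" in the name
```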
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the multi-scale information fusion module is structured, and the highly discriminative feature maps rich in multi-scale information are extracted, as follows.
The multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are sent into a module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. Then the feature maps are sent into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths, and each main path contains one Kronecker convolution block; each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. The different Kronecker convolutions have different inner expansion and intra-sharing factors, so as to enlarge the receptive field as much as possible and capture relatively rich scale information. In addition, there are three parallel branches, and each branch contains an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vector generated by the global attention module recalibrates the multi-scale feature maps extracted by the Kronecker convolution blocks, so as to select feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the channels of the feature maps selected from the three main paths, reducing computation and saving time. Finally, the feature maps of the three main paths are fused together, and a new set of feature maps T ∈ {t1, t2, ..., ta} is output.
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the spatial information capture module is structured, and the feature maps rich in spatial information are extracted, as follows.
The spatial information capture module contains three branches, and each branch contains a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone pass through the three 1 × 1 convolutions to obtain three new sets of feature maps Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are reshaped and multiplied as matrices, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector re-weights the feature maps ξ along the spatial dimension, and a scale factor is introduced to guide the model to gradually learn to assign the weights of local regions to global positions; the final output is a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information.
In formula (1), the attention coefficient represents the influence of position i on position j, and the scale parameter is initialized to 0.
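The computation described above can be sketched on a tiny example: an energy matrix (standing in for the product of the reshaped Ξ and Ψ) is normalized row-wise by Softmax into an attention matrix whose entry (j, i) is the influence of position i on position j, attention is applied to ξ, and a scale factor beta initialized to 0 gates the residual. This is an illustrative reconstruction under those assumptions, not the patent's formula (1) verbatim:

```python
# Sketch of the spatial attention step: row-wise Softmax over an energy
# matrix, attention applied over positions, and a beta-gated residual.
# Tiny 3-position, 1-channel example; all values illustrative.
import math

def row_softmax(mat):
    out = []
    for row in mat:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        t = sum(exps)
        out.append([e / t for e in exps])
    return out

def spatial_attention(xi_feat, energy, beta=0.0):
    s = row_softmax(energy)                   # attention over positions
    attended = [sum(s[j][i] * xi_feat[i] for i in range(len(xi_feat)))
                for j in range(len(xi_feat))]
    # beta is initialized to 0, so the output starts as the local features
    return [beta * attended[j] + xi_feat[j] for j in range(len(xi_feat))]

energy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = spatial_attention([1.0, 2.0, 3.0], energy, beta=0.0)
assert out == [1.0, 2.0, 3.0]   # with beta = 0 the input passes through
```

As beta grows during training, globally attended features contribute more, which matches the text's description of gradually assigning local weights to global positions.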
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the multi-kernel convolution block is structured as follows:
two convolutions in parallel, with kernel sizes of 3 × 3 and 5 × 5 respectively.
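The multi-kernel block can be sketched as two parallel same-padded convolutions whose outputs are fused; uniform averaging kernels and summation-based fusion are illustrative assumptions, since the patent does not specify the kernel weights or the fusion operator:

```python
# Sketch of the multi-kernel block: parallel 3x3 and 5x5 convolutions, each
# zero-padded by k//2 so spatial size is preserved, with outputs summed.
# Uniform averaging kernels stand in for learned weights.

def conv2d_same(img, k):
    """Convolve a 2-D list with a uniform k x k kernel, zero padding."""
    h, w, p = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-p, p + 1):
                for dj in range(-p, p + 1):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        acc += img[i + di][j + dj]
            out[i][j] = acc / (k * k)
    return out

def multi_kernel_block(img):
    a, b = conv2d_same(img, 3), conv2d_same(img, 5)     # parallel branches
    return [[a[i][j] + b[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]

img = [[1.0] * 6 for _ in range(6)]
out = multi_kernel_block(img)
assert len(out) == 6 and len(out[0]) == 6   # spatial size preserved
```

Mixing two kernel sizes lets the block refine the fused feature maps at two receptive-field scales at once, which is presumably why the patent pairs 3 × 3 with 5 × 5.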
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the invention;
Fig. 2 is a schematic diagram of the multi-scale information fusion module designed in the invention;
Fig. 3 is a schematic diagram of the spatial information capture module designed in the invention;
Fig. 4 is a schematic diagram of sample images output by the simulation experiments of the invention.
Specific embodiment
The technical scheme of the present invention is described clearly and completely below with reference to the attached drawings. In this embodiment, the image semantic segmentation method based on an encoder-decoder structure is carried out as follows:
Step 1: Produce a data set containing M pictures and divide it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are annotated accurately at the pixel level.
Step 2: Train the image semantic segmentation model based on the encoder-decoder structure.
Step 2.1: First apply data augmentation to the training pictures, namely random horizontal flipping, random rotation between 10 and -10 degrees, and random scaling between 0.5 and 2 times.
Step 2.2: Feed the augmented training pictures X ∈ {x1, x2, ..., xn} into the structurally improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em}, as shown in Fig. 1.
Step 2.3: Feed the feature maps E ∈ {e1, e2, ..., em} into the multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that is highly discriminative and rich in multi-scale information, as shown in Fig. 2.
Step 2.4: With the spatial information capture module, extract from the shallow layers of the improved ResNet-101 backbone a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information, to compensate for the loss of spatial resolution in the improved ResNet-101 backbone caused by repeated pooling and downsampling operations, as shown in Fig. 3.
Step 2.5: Fuse the feature maps T ∈ {t1, t2, ..., ta}, rich in multi-scale information, with the feature maps Q ∈ {q1, q2, ..., qd}, rich in spatial information, to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refine P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; obtain the image segmentation result through a data-dependent upsampling operation; then obtain the output error with a Softmax regression classifier, evaluate the result with the cross-entropy loss function, and finally train the model by optimizing the error with the back-propagation algorithm, yielding the segmentation model.
Step 3: After training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluate the performance of the trained model on the validation set.
Step 4: For a test sample, the final image segmentation result is obtained after steps 2.2-2.5, as shown in Fig. 4.
In this embodiment, the structurally improved ResNet-101 backbone is structured as follows.
The improved ResNet-101 backbone comprises 5 convolution groups. The first group r1 contains one convolution with 64 kernels of size 7 × 7 and stride 2. The second group r2 contains a pooling convolution with kernel size 2 × 2 and stride 2 and 3 identically structured convolutional layers; each layer has the following structure: conv2_1 has 64 kernels of size 1 × 1, conv2_2 has 64 kernels of size 3 × 3, and conv2_3 has 256 kernels of size 1 × 1. The third group r3 contains 4 identically structured convolutional layers; each has the following structure: conv3_1 has 128 kernels of size 1 × 1, conv3_2 has 128 kernels of size 3 × 3, and conv3_3 has 512 kernels of size 1 × 1. The fourth group r4 contains 23 identically structured convolutional layers, each with dilation rate 2 and stride 1; each has the following structure: conv4_1 has 256 kernels of size 1 × 1, conv4_2 has 256 kernels of size 3 × 3, and conv4_3 has 1024 kernels of size 1 × 1. The fifth group r5 contains 3 identically structured Kronecker convolutional layers, each Kronecker convolution having an inner expansion factor κ1 = 4 and an intra-sharing factor κ2 = 3; each layer has the following structure: conv5_1 has 512 kernels of size 1 × 1, conv5_2 has 512 kernels of size 3 × 3, and conv5_3 has 2048 kernels of size 1 × 1.
In this embodiment, the multi-scale information fusion module is structured, and the highly discriminative feature maps rich in multi-scale information are extracted, as follows.
As shown in Fig. 2, the multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are sent into a module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. Then the feature maps are sent into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths, and each main path contains one Kronecker convolution block; each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. The different Kronecker convolutions have different inner expansion and intra-sharing factors, so as to enlarge the receptive field as much as possible and capture relatively rich scale information. In addition, there are three parallel branches, and each branch contains an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vector generated by the global attention module recalibrates the multi-scale feature maps extracted by the Kronecker convolution blocks, so as to select feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the channels of the feature maps selected from the three main paths, reducing computation and saving time. Finally, the feature maps of the three main paths are fused together, and a new set of feature maps T ∈ {t1, t2, ..., ta} is output.
In this embodiment, the spatial information capture module is structured, and the feature maps rich in spatial information are extracted, as follows.
As shown in Fig. 3, the spatial information capture module contains three branches, and each branch contains a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone pass through the three 1 × 1 convolutions to obtain three new sets of feature maps Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are reshaped and multiplied as matrices, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector re-weights the feature maps ξ along the spatial dimension, and a scale factor is used to guide the model to gradually learn to assign the weights of local regions to global positions; the final output is a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information.
In formula (1), the attention coefficient represents the influence of position i on position j, and the scale parameter is initialized to 0.
In this embodiment, the multi-kernel convolution block is structured as follows:
two convolutions in parallel, with kernel sizes of 3 × 3 and 5 × 5 respectively.
Claims (5)
1. An image semantic segmentation method based on an encoder-decoder structure, characterized in that it is carried out as follows:
step 1: producing a data set containing M pictures and dividing it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are annotated accurately at the pixel level;
step 2: training an image semantic segmentation model based on the encoder-decoder structure;
step 2.1: first applying data augmentation to the training pictures, namely random horizontal flipping, random rotation between 10 and -10 degrees, and random scaling between 0.5 and 2 times;
step 2.2: feeding the augmented training pictures X ∈ {x1, x2, ..., xn} into a structurally improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em};
step 2.3: feeding the feature maps E ∈ {e1, e2, ..., em} into a multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that is highly discriminative and rich in multi-scale information;
step 2.4: with a spatial information capture module, extracting from the shallow layers of the improved ResNet-101 backbone a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information, to compensate for the loss of spatial resolution in the improved ResNet-101 backbone caused by repeated pooling and downsampling operations;
step 2.5: fusing the feature maps T ∈ {t1, t2, ..., ta}, rich in multi-scale information, with the feature maps Q ∈ {q1, q2, ..., qd}, rich in spatial information, to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refining P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; obtaining the image segmentation result through a data-dependent upsampling operation; then obtaining the output error with a Softmax regression classifier, evaluating the result with the cross-entropy loss function, and finally training by optimizing the error with the back-propagation algorithm to obtain the segmentation model;
step 3: after training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluating the performance of the trained model on the validation set;
step 4: for a test sample, obtaining the final image segmentation result after steps 2.2-2.5.
2. The image semantic segmentation method based on an encoder-decoder structure according to claim 1, characterized in that the structurally improved ResNet-101 backbone is structured as follows:
the improved ResNet-101 backbone comprises 5 convolution groups: the first group r1 contains one convolution with 64 kernels of size 7 × 7 and stride 2; the second group r2 contains a pooling convolution with kernel size 2 × 2 and stride 2 and 3 identically structured convolutional layers, each layer having the following structure: conv2_1 has 64 kernels of size 1 × 1, conv2_2 has 64 kernels of size 3 × 3, and conv2_3 has 256 kernels of size 1 × 1; the third group r3 contains 4 identically structured convolutional layers, each having the following structure: conv3_1 has 128 kernels of size 1 × 1, conv3_2 has 128 kernels of size 3 × 3, and conv3_3 has 512 kernels of size 1 × 1; the fourth group r4 contains 23 identically structured convolutional layers, each with dilation rate 2 and stride 1 and having the following structure: conv4_1 has 256 kernels of size 1 × 1, conv4_2 has 256 kernels of size 3 × 3, and conv4_3 has 1024 kernels of size 1 × 1; the fifth group r5 contains 3 identically structured Kronecker convolutional layers, each Kronecker convolution having an inner expansion factor κ1 = 4 and an intra-sharing factor κ2 = 3, and each layer having the following structure: conv5_1 has 512 kernels of size 1 × 1, conv5_2 has 512 kernels of size 3 × 3, and conv5_3 has 2048 kernels of size 1 × 1.
3. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that the multi-scale information fusion module is constructed, and a feature map set that is highly discriminative and rich in multi-scale information is extracted, by the following steps:
The multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are fed to a convolution module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. The feature maps are then fed into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths; each main path contains one Kronecker convolution block, and each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. Different Kronecker convolutions use different inter-dilating factors and intra-sharing factors, enlarging the receptive field as much as possible to capture relatively rich scale information. In addition, there are three parallel branches, each containing an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vector generated by the global attention module recalibrates the multi-scale feature maps extracted by the Kronecker convolution blocks, selecting feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the channels of the feature maps selected from the three main paths, lowering the computational cost and saving time. Finally, the feature maps from the three main paths are fused together, and a new feature map set T ∈ {t1, t2, ..., ta} is output.
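A minimal NumPy sketch of one branch's recalibration step, assuming (as the claim describes) a global attention module built from global average pooling followed by a Sigmoid. The Kronecker convolution blocks are replaced by identity stand-ins, so all shapes and names here are illustrative only:

```python
import numpy as np

def global_attention(feat):
    """Recalibrate a (C, H, W) feature map with a channel attention vector."""
    v = feat.mean(axis=(1, 2))               # global average pooling -> (C,)
    gate = 1.0 / (1.0 + np.exp(-v))          # Sigmoid -> attention vector
    return feat * gate[:, None, None]        # channel-wise recalibration

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))         # stand-in for one main path's output

# Three parallel main paths; the Kronecker convolution blocks and the 1 x 1
# channel-reduction convolutions are omitted for brevity.
recalibrated = [global_attention(x) for _ in range(3)]
fused = sum(recalibrated)                    # fusion of the three main paths
```

In the claimed module each path would first pass through its own Kronecker convolution block, and the 1 × 1 convolutions would shrink the channel count before fusion; the sketch only shows the attention-and-fuse skeleton.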
4. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that the spatial information capture module is constructed, and a feature map set rich in spatial information is extracted, as follows:
The spatial information capture module contains three branches, each containing a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone are processed by the three 1 × 1 convolutions to obtain three new feature map sets Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are each reshaped and then matrix-multiplied, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector recalibrates the feature maps ξ along the spatial dimension, and a scale factor is introduced to guide the model to gradually learn to assign the weights of local regions to global positions; the final output is a feature map set Q ∈ {q1, q2, ..., qd} rich in spatial information.
In formula (1), the attention weight denotes the influence of position i on position j, and the scale parameter is initialized to 0.
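The spatial attention computation can be sketched in NumPy as below. The steps follow the claim (Ξ and Ψ reshaped and matrix-multiplied, a Softmax producing the attention, a scale factor initialized to 0), while the residual form `beta * out + v` is an assumption modeled on common position-attention designs rather than taken from the patent's formula (1):

```python
import numpy as np

def spatial_attention(Xi, Psi, xi, beta=0.0):
    """Position-wise attention; beta is the learnable scale factor (init 0)."""
    C, H, W = Xi.shape
    q = Xi.reshape(C, H * W)                     # queries  (C, N)
    k = Psi.reshape(C, H * W)                    # keys     (C, N)
    v = xi.reshape(C, H * W)                     # values   (C, N)
    energy = q.T @ k                             # (N, N): position i vs position j
    energy -= energy.max(axis=0, keepdims=True)  # numerical stability
    s = np.exp(energy)
    s /= s.sum(axis=0, keepdims=True)            # Softmax -> spatial attention
    out = v @ s                                  # recalibrate the spatial dimension
    return (beta * out + v).reshape(C, H, W)     # beta grows from 0 during training

rng = np.random.default_rng(1)
Xi, Psi, xi = (rng.standard_normal((4, 6, 6)) for _ in range(3))
Q = spatial_attention(Xi, Psi, xi, beta=0.0)     # with beta = 0, Q equals xi
```

Initializing the scale factor to 0 means the module starts as an identity mapping and only gradually mixes in globally attended features, matching the claim's "gradually learns to assign the weight of local regions to global positions".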
5. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that the multi-kernel convolution block has the following structure:
Two convolutions in parallel, with convolution kernel sizes of 3 × 3 and 5 × 5, respectively.
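A naive single-channel NumPy sketch of the multi-kernel convolution block; the uniform averaging kernels stand in for learned weights, and 'same' zero-padding keeps the two parallel branches spatially aligned for fusion:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2-D cross-correlation on a single channel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))
y3 = conv2d_same(x, np.ones((3, 3)) / 9)     # 3 x 3 branch
y5 = conv2d_same(x, np.ones((5, 5)) / 25)    # 5 x 5 branch
multi = np.stack([y3, y5])                   # parallel branches, stacked as channels
```

Because both branches use 'same' padding, their outputs share the input's spatial size and can be stacked (or summed) directly, which is the point of running the two kernel sizes in parallel.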
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910503595.5A CN110263833A (en) | 2019-06-03 | 2019-06-03 | Based on coding-decoding structure image, semantic dividing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110263833A true CN110263833A (en) | 2019-09-20 |
Family
ID=67917688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910503595.5A Pending CN110263833A (en) | 2019-06-03 | 2019-06-03 | Based on coding-decoding structure image, semantic dividing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263833A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090565A (en) * | 2018-01-16 | 2018-05-29 | 电子科技大学 | Accelerated method is trained in a kind of convolutional neural networks parallelization |
CN108062754A (en) * | 2018-01-19 | 2018-05-22 | 深圳大学 | Segmentation, recognition methods and device based on dense network image |
Non-Patent Citations (1)
Title |
---|
TIANYI WU等: "Tree-structured Kronecker Convolutional Network for Semantic Segmentation", 《ARXIV》 * |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991617A (en) * | 2019-12-02 | 2020-04-10 | 华东师范大学 | Construction method of kaleidoscope convolution network |
CN110991617B (en) * | 2019-12-02 | 2020-12-01 | 华东师范大学 | Construction method of kaleidoscope convolution network |
CN111127470A (en) * | 2019-12-24 | 2020-05-08 | 江西理工大学 | Image semantic segmentation method based on context and shallow space coding and decoding network |
CN111127470B (en) * | 2019-12-24 | 2023-06-16 | 江西理工大学 | Image semantic segmentation method based on context and shallow space coding and decoding network |
CN111325093A (en) * | 2020-01-15 | 2020-06-23 | 北京字节跳动网络技术有限公司 | Video segmentation method and device and electronic equipment |
CN111242288B (en) * | 2020-01-16 | 2023-06-27 | 浙江工业大学 | Multi-scale parallel deep neural network model construction method for lesion image segmentation |
CN111242288A (en) * | 2020-01-16 | 2020-06-05 | 浙江工业大学 | Multi-scale parallel deep neural network model construction method for lesion image segmentation |
CN111373439A (en) * | 2020-02-10 | 2020-07-03 | 香港应用科技研究院有限公司 | Method for image segmentation using CNN |
CN111373439B (en) * | 2020-02-10 | 2023-05-02 | 香港应用科技研究院有限公司 | Method for image segmentation using CNN |
CN111369582A (en) * | 2020-03-06 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Image segmentation method, background replacement method, device, equipment and storage medium |
CN111369582B (en) * | 2020-03-06 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Image segmentation method, background replacement method, device, equipment and storage medium |
CN111461130A (en) * | 2020-04-10 | 2020-07-28 | 视研智能科技(广州)有限公司 | High-precision image semantic segmentation algorithm model and segmentation method |
CN111627055A (en) * | 2020-05-07 | 2020-09-04 | 浙江大学 | Scene depth completion method based on semantic segmentation |
CN111627055B (en) * | 2020-05-07 | 2023-11-24 | 浙江大学 | Scene depth completion method combining semantic segmentation |
CN111860386A (en) * | 2020-07-27 | 2020-10-30 | 山东大学 | Video semantic segmentation method based on ConvLSTM convolutional neural network |
CN111860386B (en) * | 2020-07-27 | 2022-04-08 | 山东大学 | Video semantic segmentation method based on ConvLSTM convolutional neural network |
CN112287940A (en) * | 2020-10-30 | 2021-01-29 | 西安工程大学 | Semantic segmentation method of attention mechanism based on deep learning |
CN112489061B (en) * | 2020-12-09 | 2024-04-16 | 浙江工业大学 | Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism |
CN112489061A (en) * | 2020-12-09 | 2021-03-12 | 浙江工业大学 | Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism |
CN112634289B (en) * | 2020-12-28 | 2022-05-27 | 华中科技大学 | Rapid feasible domain segmentation method based on asymmetric void convolution |
CN112634289A (en) * | 2020-12-28 | 2021-04-09 | 华中科技大学 | Rapid feasible domain segmentation method based on asymmetric void convolution |
CN112734715A (en) * | 2020-12-31 | 2021-04-30 | 同济大学 | Lung nodule segmentation method of lung CT image |
CN112967294A (en) * | 2021-03-11 | 2021-06-15 | 西安智诊智能科技有限公司 | Liver CT image segmentation method and system |
CN113392783B (en) * | 2021-06-18 | 2022-11-01 | 河南科技学院 | Improved ResNet-based transparent window object detection method |
CN113256609B (en) * | 2021-06-18 | 2021-09-21 | 四川大学 | CT picture cerebral hemorrhage automatic check out system based on improved generation Unet |
CN113392783A (en) * | 2021-06-18 | 2021-09-14 | 河南科技学院 | Improved ResNet-based transparent window object detection method |
CN113256609A (en) * | 2021-06-18 | 2021-08-13 | 四川大学 | CT picture cerebral hemorrhage automatic check out system based on improved generation Unet |
CN113658200A (en) * | 2021-07-29 | 2021-11-16 | 东北大学 | Edge perception image semantic segmentation method based on self-adaptive feature fusion |
CN113658200B (en) * | 2021-07-29 | 2024-01-02 | 东北大学 | Edge perception image semantic segmentation method based on self-adaptive feature fusion |
CN114140472A (en) * | 2022-02-07 | 2022-03-04 | 湖南大学 | Cross-level information fusion medical image segmentation method |
CN115423810A (en) * | 2022-11-04 | 2022-12-02 | 国网江西省电力有限公司电力科学研究院 | Blade icing form analysis method for wind generating set |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263833A (en) | Based on coding-decoding structure image, semantic dividing method | |
CN110298387A (en) | Incorporate the deep neural network object detection method of Pixel-level attention mechanism | |
CN107564025A (en) | A kind of power equipment infrared image semantic segmentation method based on deep neural network | |
CN109902798A (en) | The training method and device of deep neural network | |
CN108564097A (en) | A kind of multiscale target detection method based on depth convolutional neural networks | |
CN109815785A (en) | A kind of face Emotion identification method based on double-current convolutional neural networks | |
CN107818302A (en) | Non-rigid multiple dimensioned object detecting method based on convolutional neural networks | |
CN110111366A (en) | A kind of end-to-end light stream estimation method based on multistage loss amount | |
CN106446930A (en) | Deep convolutional neural network-based robot working scene identification method | |
CN106960206A (en) | Character identifying method and character recognition system | |
CN108256426A (en) | A kind of facial expression recognizing method based on convolutional neural networks | |
CN107506722A (en) | One kind is based on depth sparse convolution neutral net face emotion identification method | |
CN109800628A (en) | A kind of network structure and detection method for reinforcing SSD Small object pedestrian detection performance | |
CN105160310A (en) | 3D (three-dimensional) convolutional neural network based human body behavior recognition method | |
CN106981080A (en) | Night unmanned vehicle scene depth method of estimation based on infrared image and radar data | |
CN111626176B (en) | Remote sensing target rapid detection method and system based on dynamic attention mechanism | |
CN107679462A (en) | A kind of depth multiple features fusion sorting technique based on small echo | |
CN108629288A (en) | A kind of gesture identification model training method, gesture identification method and system | |
CN107085723A (en) | A kind of characters on license plate global recognition method based on deep learning model | |
CN107967474A (en) | A kind of sea-surface target conspicuousness detection method based on convolutional neural networks | |
CN108122003A (en) | A kind of Weak target recognition methods based on deep neural network | |
CN106372597A (en) | CNN traffic detection method based on adaptive context information | |
CN112288776B (en) | Target tracking method based on multi-time step pyramid codec | |
CN109492618A (en) | Object detection method and device based on grouping expansion convolutional neural networks model | |
CN111160294A (en) | Gait recognition method based on graph convolution network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20190920 |