CN110263833A - Image semantic segmentation method based on an encoder-decoder structure - Google Patents

Image semantic segmentation method based on an encoder-decoder structure

Info

Publication number
CN110263833A
CN110263833A (application CN201910503595.5A)
Authority
CN
China
Prior art keywords
convolution
feature map
size
information
conv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910503595.5A
Other languages
Chinese (zh)
Inventor
韩慧慧 (Han Huihui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201910503595.5A priority Critical patent/CN110263833A/en
Publication of CN110263833A publication Critical patent/CN110263833A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Abstract

The invention discloses an image semantic segmentation method based on an encoder-decoder structure. Its features include: first, a set of feature maps is extracted from the input picture by a ResNet-101 network with an improved structure; then, multi-scale information is captured from the extracted feature maps by a multi-scale information fusion module; in addition, a spatial information extraction module extracts rich spatial information from the shallow layers of ResNet-101; next, after the deep multi-scale information and the shallow spatial information are fused, the fused feature maps are refined by a multi-kernel convolution block; finally, the segmentation result is obtained by a data-dependent upsampling operation. The invention is mainly directed at improving image segmentation accuracy, belongs to the technical field of image processing, and is especially suitable for medical image analysis, automatic driving, virtual reality, driver assistance, robot perception, indoor environment reconstruction, unmanned aerial vehicles, and the like.

Description

Image semantic segmentation method based on an encoder-decoder structure
Technical field
The invention belongs to the technical field of image processing and more particularly relates to an image semantic segmentation method based on an encoder-decoder structure, especially suitable for tasks such as medical image analysis, automatic driving, indoor environment reconstruction, and unmanned aerial vehicles.
Background technique
Semantic segmentation is an important research field in image processing. Its goal is to make a dense prediction for every pixel of an image and to label each pixel with the class of the object or region it belongs to. With the continuous development of deep convolutional neural networks, and especially with the appearance of fully convolutional networks, semantic segmentation technology has achieved a qualitative leap. To further improve semantic segmentation results, researchers around the world have set out from different angles and designed a wide variety of model architectures.
To prevent the reduction of spatial resolution caused by successive downsampling and pooling operations, the Deeplabv2, Deeplabv3, and Deeplabv3+ models proposed by Chen et al. and the PSPNet model proposed by Zhao et al. use dilated convolution, which effectively enlarges the receptive field of the filters and reduces the loss of spatial detail. Moreover, an encoder-decoder structure can also prevent the loss of spatial information. For example, the SegNet proposed by Badrinarayanan et al. uses an encoder-decoder structure to capture more spatial information. To capture more spatial information in the shallow layers and help the model recover object details, DeepLabv3+ adds a simple and effective decoder module to the DeepLabv3 model. In addition, a U-shaped structure is applied in the GCN proposed by Chao et al., the DFN proposed by Yu et al., and the PAN model proposed by Li et al., gradually fusing feature maps from different levels of the backbone to improve spatial resolution and make up for the loss of spatial detail. GCN uses "large kernels" to enlarge the receptive field and preserve spatial information.
Many works have already achieved certain results in capturing richer multi-scale contextual information. Deeplabv2 proposes an atrous spatial pyramid pooling module to capture multi-scale contextual information. The OCNet model proposed by Yuan et al. captures multi-scale contextual information by using a pyramid object context or an atrous spatial pyramid object context. In addition, the DenseASPP model proposed by Yang et al. uses a group of dilated convolutional layers to generate multi-scale feature maps. The RefineNet proposed by Lin et al. and the U-Net proposed by Ronneberger et al. fuse feature maps from different levels using an encoder-decoder structure to obtain rich contextual information. Byeon et al. proposed a label-based model for capturing complex spatial dependencies, built on a two-dimensional LSTM network. To capture rich context dependencies over local features, Shuai et al. designed a recurrent neural network over a directed acyclic graph. The SPN model proposed by Liu et al. designs a row/column linear propagation model that can extract dense global pairwise relationships in a scene image. In the PSANet model proposed by Zhao et al., adaptive point-wise context is learned through bidirectional information propagation.
Summary of the invention
To avoid the shortcomings and deficiencies of the prior art, the present invention proposes an image semantic segmentation method based on an encoder-decoder structure, so as to address two challenges in the image semantic segmentation task: 1) the presence of multi-scale objects leads to misclassification; 2) the loss of spatial information makes small objects unrecognizable.
To achieve the above object, the present invention adopts the following technical scheme:
The image semantic segmentation method based on an encoder-decoder structure of the present invention is carried out as follows:
Step 1: build a data set containing M pictures and divide it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are accurately annotated at the pixel level;
Step 2: train the image semantic segmentation model based on the encoder-decoder structure:
Step 2.1: first apply data augmentation to the training pictures, namely random horizontal flipping, random rotation between -10 and 10 degrees, and random scaling by a factor of 0.5 to 2;
Step 2.2: feed the augmented training pictures X ∈ {x1, x2, ..., xn} into the improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em};
Step 2.3: feed the feature maps E ∈ {e1, e2, ..., em} into the multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that are highly discriminative and rich in multi-scale information;
Step 2.4: with the spatial information capture module, extract feature maps Q ∈ {q1, q2, ..., qd} with rich spatial information from the shallow layers of the improved ResNet-101 backbone, to compensate for the loss of spatial resolution caused by successive pooling and downsampling operations in the improved ResNet-101 backbone;
Step 2.5: fuse the feature maps T ∈ {t1, t2, ..., ta} containing rich multi-scale information with the feature maps Q ∈ {q1, q2, ..., qd} containing rich spatial information to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refine P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; then obtain the image segmentation result through a data-dependent upsampling operation; next, obtain the output error with a Softmax regression classifier, evaluate the result with the cross-entropy loss function, and finally train by optimizing the error with the back-propagation algorithm to obtain the segmentation model;
Step 3: after training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluate the performance of the trained model on the validation set;
Step 4: for a test sample, the final image segmentation result map is obtained after steps 2.2-2.5.
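Step 2.5 trains with a Softmax classifier and a cross-entropy loss over all pixels. As a rough illustrative sketch only (not the patented implementation), the per-pixel loss can be written in NumPy; the array shapes and the function name are assumptions:

```python
import numpy as np

def pixel_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels (illustrative sketch).

    logits: (H, W, C) raw class scores per pixel.
    labels: (H, W) integer class indices.
    """
    # numerically stable softmax over the class axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    h, w = labels.shape
    # probability assigned to the true class at each pixel
    true_p = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.log(true_p + 1e-12).mean())

logits = np.zeros((2, 2, 3))         # uniform scores -> probability 1/3 per class
labels = np.array([[0, 1], [2, 0]])
loss = pixel_softmax_cross_entropy(logits, labels)   # ln(3) for uniform scores
```

With uniform logits every pixel gets probability 1/3 for its true class, so the loss is ln 3; in training this scalar would be minimized by back-propagation as the patent describes.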
In the image semantic segmentation method based on an encoder-decoder structure of the present invention, the improved ResNet-101 backbone has the following structure:
The improved ResNet-101 backbone comprises 5 groups of convolutions. The first group r1 contains a convolution with kernel size 7 × 7 and 64 kernels, with stride = 2. The second group r2 contains a 2 × 2 pooling layer with stride = 2 and 3 convolutional layers of identical structure, each structured as follows: conv2_1 with kernel size 1 × 1 and 64 kernels, conv2_2 with kernel size 3 × 3 and 64 kernels, conv2_3 with kernel size 1 × 1 and 256 kernels. The third group r3 contains 4 convolutional layers of identical structure, each structured as follows: conv3_1 with kernel size 1 × 1 and 128 kernels, conv3_2 with kernel size 3 × 3 and 128 kernels, conv3_3 with kernel size 1 × 1 and 512 kernels. The fourth group r4 contains 23 convolutional layers of identical structure, each with dilation rate = 2 and stride = 1, structured as follows: conv4_1 with kernel size 1 × 1 and 256 kernels, conv4_2 with kernel size 3 × 3 and 256 kernels, conv4_3 with kernel size 1 × 1 and 1024 kernels. The fifth group r5 contains 3 Kronecker convolutional layers of identical structure, each Kronecker convolution having inter-dilation factor κ1 = 4 and intra-sharing factor κ2 = 3, structured as follows: conv5_1 with kernel size 1 × 1 and 512 kernels, conv5_2 with kernel size 3 × 3 and 512 kernels, conv5_3 with kernel size 1 × 1 and 2048 kernels.
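A Kronecker convolution enlarges a small kernel by taking its Kronecker product with a sparse pattern matrix, so the receptive field grows without adding parameters. The sketch below shows only the kernel expansion; the exact pattern matrix and the way the two factors (4 and 3) map onto it are illustrative assumptions, not the patent's precise construction:

```python
import numpy as np

def kronecker_kernel(w, k1=4, k2=3):
    """Expand a 2-D kernel w with a Kronecker pattern (illustrative sketch).

    k1: inter-dilation factor -- size of each pattern block.
    k2: intra-sharing factor  -- filled portion of each block.
    Each kernel weight is shared over a k2 x k2 patch and the patches
    are spaced k1 apart, enlarging the receptive field without new weights.
    """
    pattern = np.zeros((k1, k1))
    pattern[:k2, :k2] = 1.0           # shared sub-block; the rest stays zero
    return np.kron(w, pattern)        # shape: (w_h * k1, w_w * k1)

w = np.ones((3, 3))                   # a plain 3x3 kernel
big = kronecker_kernel(w)             # expanded to a sparse 12x12 kernel
```

With k2 = 1 this reduces to ordinary dilated convolution, which is why the text describes Kronecker convolution as a way to enlarge the receptive field while still capturing local detail.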
In the image semantic segmentation method based on an encoder-decoder structure of the present invention, the multi-scale information fusion module and the extraction of highly discriminative, multi-scale-information-rich feature maps are carried out as follows:
The multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are sent to a module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. Then, the feature maps are fed into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths, and each main path contains one Kronecker convolution block; each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. Different Kronecker convolutions have different inter-dilation and intra-sharing factors, so as to enlarge the receptive field as much as possible and capture relatively rich scale information. In addition, there are three parallel branches, and each branch contains an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vectors produced by the global attention modules recalibrate the multi-scale feature maps extracted by the Kronecker convolution blocks, so as to select feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the number of channels of the feature maps selected on the three main paths, reducing computation and saving time. Finally, the feature maps of the three main paths are fused together, and a new set of feature maps T ∈ {t1, t2, ..., ta} is output.
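The global attention module described above is a global average pooling followed by a Sigmoid, whose output recalibrates the feature maps channel by channel. A minimal NumPy sketch under an assumed (C, H, W) layout; the function name and shapes are illustrative assumptions:

```python
import numpy as np

def global_attention(feat):
    """Recalibrate feature maps with a global attention vector (sketch).

    feat: (C, H, W) feature maps from a Kronecker convolution block.
    Returns feature maps of the same shape, scaled per channel.
    """
    pooled = feat.mean(axis=(1, 2))            # global average pooling -> (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))       # Sigmoid -> attention vector
    return feat * gate[:, None, None]          # channel-wise recalibration

feat = np.ones((4, 8, 8))
out = global_attention(feat)                   # every value scaled by sigmoid(1)
```

Channels whose global response is strong receive gates near 1 and are kept; weakly responding channels are suppressed, which is the "selection" behavior the module description refers to.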
In the image semantic segmentation method based on an encoder-decoder structure of the present invention, the spatial information capture module and the extraction of feature maps with rich spatial information are carried out as follows:
The spatial information capture module contains three branches, and each branch contains a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone pass through the three 1 × 1 convolutions to obtain three new sets of feature maps Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are each reshaped and then matrix-multiplied, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector recalibrates the feature maps ξ along the spatial dimension, and a scale factor is introduced to guide the model to gradually learn to assign the weights of local regions to global positions; the final output is the set of feature maps Q ∈ {q1, q2, ..., qd} with rich spatial information.
In formula (1), the attention weight represents the influence of position i on position j, and the scale parameter is initialized to 0.
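The reshape, matrix-multiplication, and Softmax sequence described above is a position-attention computation. A minimal NumPy sketch under assumed shapes (C channels, N = H·W positions); the names xi, psi, and val only loosely mirror Ξ, Ψ, and ξ, and the residual combination with the scale factor gamma is an assumption about how the recalibrated features are merged:

```python
import numpy as np

def spatial_attention(xi, psi, val, gamma=0.0):
    """Position attention over N = H*W locations (illustrative sketch).

    xi, psi, val: (C, H, W) outputs of the three 1x1 convolutions.
    gamma: scale factor, initialized to 0 so attention is learned gradually.
    """
    c, h, w = xi.shape
    a = xi.reshape(c, h * w)                       # (C, N)
    b = psi.reshape(c, h * w)                      # (C, N)
    energy = a.T @ b                               # (N, N) pairwise affinities
    energy -= energy.max(axis=1, keepdims=True)    # numerically stable Softmax
    attn = np.exp(energy)
    attn /= attn.sum(axis=1, keepdims=True)        # each row sums to 1
    v = val.reshape(c, h * w)
    out = (v @ attn.T).reshape(c, h, w)            # re-weight features spatially
    return val + gamma * out                       # residual, scaled by gamma

x = np.random.default_rng(0).standard_normal((2, 3, 3))
q = spatial_attention(x, x, x)                     # gamma=0 -> output equals input
```

Because gamma starts at 0, the module initially passes features through unchanged and only gradually mixes in the globally attended features as gamma is learned, matching the "gradually learns" behavior in the description.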
In the image semantic segmentation method based on an encoder-decoder structure of the present invention, the multi-kernel convolution block furthermore has the following structure:
Two convolutions in parallel, with kernel sizes 3 × 3 and 5 × 5, respectively.
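The two parallel convolutions can be sketched with a naive NumPy correlation; zero "same" padding and element-wise summation of the two branches are assumptions about how the patent combines them:

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 2-D correlation with zero 'same' padding (illustrative sketch)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def multi_kernel_block(img, k3, k5):
    """Two parallel convolutions (3x3 and 5x5); summing the branches is assumed."""
    return conv2d_same(img, k3) + conv2d_same(img, k5)

img = np.ones((6, 6))
k3 = np.ones((3, 3)) / 9.0      # 3x3 averaging kernel
k5 = np.ones((5, 5)) / 25.0     # 5x5 averaging kernel
ref = multi_kernel_block(img, k3, k5)
```

Running the 3 × 3 and 5 × 5 kernels in parallel lets the block refine the fused feature maps at two receptive-field sizes at once, which is the stated purpose of the multi-kernel refinement step.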
Description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the present invention;
Fig. 2 is a schematic diagram of the multi-scale information fusion module designed in the present invention;
Fig. 3 is a schematic diagram of the spatial information capture module designed in the present invention;
Fig. 4 is a schematic diagram of some sample images output by the simulation experiments of the present invention.
Specific embodiment
The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings. The image semantic segmentation method based on an encoder-decoder structure in this embodiment is carried out as follows:
Step 1: build a data set containing M pictures and divide it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are accurately annotated at the pixel level;
Step 2: train the image semantic segmentation model based on the encoder-decoder structure:
Step 2.1: first apply data augmentation to the training pictures, namely random horizontal flipping, random rotation between -10 and 10 degrees, and random scaling by a factor of 0.5 to 2;
Step 2.2: feed the augmented training pictures X ∈ {x1, x2, ..., xn} into the improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em}, as shown in Figure 1;
Step 2.3: feed the feature maps E ∈ {e1, e2, ..., em} into the multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that are highly discriminative and rich in multi-scale information, as shown in Figure 2;
Step 2.4: with the spatial information capture module, extract feature maps Q ∈ {q1, q2, ..., qd} with rich spatial information from the shallow layers of the improved ResNet-101 backbone, to compensate for the loss of spatial resolution caused by successive pooling and downsampling operations in the improved ResNet-101 backbone, as shown in Figure 3;
Step 2.5: fuse the feature maps T ∈ {t1, t2, ..., ta} containing rich multi-scale information with the feature maps Q ∈ {q1, q2, ..., qd} containing rich spatial information to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refine P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; then obtain the image segmentation result through a data-dependent upsampling operation; next, obtain the output error with a Softmax regression classifier, evaluate the result with the cross-entropy loss function, and finally train by optimizing the error with the back-propagation algorithm to obtain the segmentation model;
Step 3: after training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluate the performance of the trained model on the validation set;
Step 4: for a test sample, the final image segmentation result map is obtained after steps 2.2-2.5, as shown in Figure 4.
In this embodiment, the improved ResNet-101 backbone has the following structure:
The improved ResNet-101 backbone comprises 5 groups of convolutions. The first group r1 contains a convolution with kernel size 7 × 7 and 64 kernels, with stride = 2. The second group r2 contains a 2 × 2 pooling layer with stride = 2 and 3 convolutional layers of identical structure, each structured as follows: conv2_1 with kernel size 1 × 1 and 64 kernels, conv2_2 with kernel size 3 × 3 and 64 kernels, conv2_3 with kernel size 1 × 1 and 256 kernels. The third group r3 contains 4 convolutional layers of identical structure, each structured as follows: conv3_1 with kernel size 1 × 1 and 128 kernels, conv3_2 with kernel size 3 × 3 and 128 kernels, conv3_3 with kernel size 1 × 1 and 512 kernels. The fourth group r4 contains 23 convolutional layers of identical structure, each with dilation rate = 2 and stride = 1, structured as follows: conv4_1 with kernel size 1 × 1 and 256 kernels, conv4_2 with kernel size 3 × 3 and 256 kernels, conv4_3 with kernel size 1 × 1 and 1024 kernels. The fifth group r5 contains 3 Kronecker convolutional layers of identical structure, each Kronecker convolution having inter-dilation factor κ1 = 4 and intra-sharing factor κ2 = 3, structured as follows: conv5_1 with kernel size 1 × 1 and 512 kernels, conv5_2 with kernel size 3 × 3 and 512 kernels, conv5_3 with kernel size 1 × 1 and 2048 kernels.
In this embodiment, the multi-scale information fusion module and the extraction of highly discriminative, multi-scale-information-rich feature maps are carried out as follows:
As shown in Fig. 2, the multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are sent to a module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. Then, the feature maps are fed into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths, and each main path contains one Kronecker convolution block; each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. Different Kronecker convolutions have different inter-dilation and intra-sharing factors, so as to enlarge the receptive field as much as possible and capture relatively rich scale information. In addition, there are three parallel branches, and each branch contains an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vectors produced by the global attention modules recalibrate the multi-scale feature maps extracted by the Kronecker convolution blocks, so as to select feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the number of channels of the feature maps selected on the three main paths, reducing computation and saving time. Finally, the feature maps of the three main paths are fused together, and a new set of feature maps T ∈ {t1, t2, ..., ta} is output.
In this embodiment, the spatial information capture module and the extraction of feature maps with rich spatial information are carried out as follows:
As shown in Fig. 3, the spatial information capture module contains three branches, and each branch contains a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone pass through the three 1 × 1 convolutions to obtain three new sets of feature maps Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are each reshaped and then matrix-multiplied, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector recalibrates the feature maps ξ along the spatial dimension, and a scale factor guides the model to gradually learn to assign the weights of local regions to global positions; the final output is the set of feature maps Q ∈ {q1, q2, ..., qd} with rich spatial information.
In formula (1), the attention weight represents the influence of position i on position j, and the scale parameter is initialized to 0.
In this embodiment, the multi-kernel convolution block has the following structure:
Two convolutions in parallel, with kernel sizes 3 × 3 and 5 × 5, respectively.

Claims (5)

1. An image semantic segmentation method based on an encoder-decoder structure, characterized by being carried out as follows:
Step 1: build a data set containing M pictures and divide it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are accurately annotated at the pixel level;
Step 2: train the image semantic segmentation model based on the encoder-decoder structure:
Step 2.1: first apply data augmentation to the training pictures, namely random horizontal flipping, random rotation between -10 and 10 degrees, and random scaling by a factor of 0.5 to 2;
Step 2.2: feed the augmented training pictures X ∈ {x1, x2, ..., xn} into the improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em};
Step 2.3: feed the feature maps E ∈ {e1, e2, ..., em} into the multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that are highly discriminative and rich in multi-scale information;
Step 2.4: with the spatial information capture module, extract feature maps Q ∈ {q1, q2, ..., qd} with rich spatial information from the shallow layers of the improved ResNet-101 backbone, to compensate for the loss of spatial resolution caused by successive pooling and downsampling operations in the improved ResNet-101 backbone;
Step 2.5: fuse the feature maps T ∈ {t1, t2, ..., ta} containing rich multi-scale information with the feature maps Q ∈ {q1, q2, ..., qd} containing rich spatial information to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refine P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; then obtain the image segmentation result through a data-dependent upsampling operation; next, obtain the output error with a Softmax regression classifier, evaluate the result with the cross-entropy loss function, and finally train by optimizing the error with the back-propagation algorithm to obtain the segmentation model;
Step 3: after training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluate the performance of the trained model on the validation set;
Step 4: for a test sample, the final image segmentation result map is obtained after steps 2.2-2.5.
2. The image semantic segmentation method based on an encoder-decoder structure according to claim 1, characterized in that
the improved ResNet-101 backbone has the following structure:
The improved ResNet-101 backbone comprises 5 groups of convolutions. The first group r1 contains a convolution with kernel size 7 × 7 and 64 kernels, with stride = 2. The second group r2 contains a 2 × 2 pooling layer with stride = 2 and 3 convolutional layers of identical structure, each structured as follows: conv2_1 with kernel size 1 × 1 and 64 kernels, conv2_2 with kernel size 3 × 3 and 64 kernels, conv2_3 with kernel size 1 × 1 and 256 kernels. The third group r3 contains 4 convolutional layers of identical structure, each structured as follows: conv3_1 with kernel size 1 × 1 and 128 kernels, conv3_2 with kernel size 3 × 3 and 128 kernels, conv3_3 with kernel size 1 × 1 and 512 kernels. The fourth group r4 contains 23 convolutional layers of identical structure, each with dilation rate = 2 and stride = 1, structured as follows: conv4_1 with kernel size 1 × 1 and 256 kernels, conv4_2 with kernel size 3 × 3 and 256 kernels, conv4_3 with kernel size 1 × 1 and 1024 kernels. The fifth group r5 contains 3 Kronecker convolutional layers of identical structure, each Kronecker convolution having inter-dilation factor κ1 = 4 and intra-sharing factor κ2 = 3, structured as follows: conv5_1 with kernel size 1 × 1 and 512 kernels, conv5_2 with kernel size 3 × 3 and 512 kernels, conv5_3 with kernel size 1 × 1 and 2048 kernels.
3. The image semantic segmentation method based on an encoder-decoder structure according to claim 1, characterized in that
the multi-scale information fusion module and the extraction of highly discriminative, multi-scale-information-rich feature maps are carried out as follows:
The multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are sent to a module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. Then, the feature maps are fed into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths, and each main path contains one Kronecker convolution block; each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. Different Kronecker convolutions have different inter-dilation and intra-sharing factors, so as to enlarge the receptive field as much as possible and capture relatively rich scale information. In addition, there are three parallel branches, and each branch contains an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vectors produced by the global attention modules recalibrate the multi-scale feature maps extracted by the Kronecker convolution blocks, so as to select feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the number of channels of the feature maps selected on the three main paths, reducing computation and saving time. Finally, the feature maps of the three main paths are fused together, and a new set of feature maps T ∈ {t1, t2, ..., ta} is output.
4. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that
the spatial information capture module is constructed, and the feature map set rich in spatial information is extracted, as follows:
The spatial information capture module contains three branches, and each branch contains a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone are processed by the three 1 × 1 convolutions to obtain three new feature map sets Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are reshaped and matrix-multiplied, after which the spatial attention vector is computed with a Softmax operation. The computed spatial attention vector recalibrates the feature maps ξ along the spatial dimension, and a scale factor is introduced to guide the model to gradually learn to assign the weights of local regions to global positions. The final output is the feature map set Q ∈ {q1, q2, ..., qd}, which is rich in spatial information.
In formula (1), the attention term expresses the influence of position i on position j, and the scale parameter is initialized to 0.
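The attention computation described above (three 1 × 1 convolutions, reshape, matrix multiplication, Softmax, and recalibration with a scale factor initialized to 0) can be sketched in NumPy. This is a hedged illustration only: the random projection matrices stand in for learned 1 × 1 convolution weights, and the reduced channel count, the residual addition, and the function name are assumptions, not the patented implementation.

```python
import numpy as np

def spatial_attention(G, c_reduced=4, gamma=0.0):
    # G: a C x H x W feature map set; gamma is the scale parameter, initialized to 0.
    C, H, W = G.shape
    rng = np.random.default_rng(0)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    Wq = rng.standard_normal((c_reduced, C))   # produces Xi
    Wk = rng.standard_normal((c_reduced, C))   # produces Psi
    Wv = rng.standard_normal((C, C))           # produces xi
    flat = G.reshape(C, H * W)                 # C x N, with N = H * W
    Xi = Wq @ flat                             # c_reduced x N
    Psi = Wk @ flat                            # c_reduced x N
    xi = Wv @ flat                             # C x N
    energy = Xi.T @ Psi                        # N x N: position i vs. position j
    energy -= energy.max(axis=1, keepdims=True)           # numerical stability
    S = np.exp(energy) / np.exp(energy).sum(axis=1, keepdims=True)  # Softmax
    out = xi @ S.T                             # attention-weighted features, C x N
    Q = gamma * out + flat                     # gamma starts at 0 -> identity at init
    return Q.reshape(C, H, W), S

Q, S = spatial_attention(np.ones((8, 4, 4)))
```

Because the scale parameter starts at 0, the module initially passes its input through unchanged and only gradually learns to mix in the globally attended features.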
5. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that
the multi-kernel convolution block has the following structure:
two convolutions in parallel, with kernel sizes of 3 × 3 and 5 × 5, respectively.
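A minimal sketch of such a block, assuming zero ("same") padding, single-channel input, fixed averaging kernels in place of learned weights, and element-wise summation as the fusion of the two parallel outputs (the claim does not specify the fusion operation):

```python
import numpy as np

def conv2d(img, kernel):
    # Naive "same"-padding 2-D convolution over a single-channel image.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    H, W = img.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def multi_kernel_block(img):
    # Two parallel convolutions with 3x3 and 5x5 kernels; averaging kernels
    # stand in for learned weights, and summation is an assumed fusion.
    k3 = np.ones((3, 3)) / 9.0
    k5 = np.ones((5, 5)) / 25.0
    return conv2d(img, k3) + conv2d(img, k5)

out = multi_kernel_block(np.ones((8, 8)))
```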
CN201910503595.5A 2019-06-03 2019-06-03 Image semantic segmentation method based on encoding-decoding structure Pending CN110263833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910503595.5A CN110263833A (en) 2019-06-03 2019-06-03 Image semantic segmentation method based on encoding-decoding structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910503595.5A CN110263833A (en) 2019-06-03 2019-06-03 Image semantic segmentation method based on encoding-decoding structure

Publications (1)

Publication Number Publication Date
CN110263833A true CN110263833A (en) 2019-09-20

Family

ID=67917688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910503595.5A Pending CN110263833A (en) 2019-06-03 2019-06-03 Image semantic segmentation method based on encoding-decoding structure

Country Status (1)

Country Link
CN (1) CN110263833A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062754A (en) * 2018-01-19 2018-05-22 深圳大学 Segmentation, recognition methods and device based on dense network image
CN108090565A (en) * 2018-01-16 2018-05-29 电子科技大学 Accelerated method is trained in a kind of convolutional neural networks parallelization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIANYI WU et al.: "Tree-structured Kronecker Convolutional Network for Semantic Segmentation", arXiv *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991617A (en) * 2019-12-02 2020-04-10 华东师范大学 Construction method of kaleidoscope convolution network
CN110991617B (en) * 2019-12-02 2020-12-01 华东师范大学 Construction method of kaleidoscope convolution network
CN111127470A (en) * 2019-12-24 2020-05-08 江西理工大学 Image semantic segmentation method based on context and shallow space coding and decoding network
CN111127470B (en) * 2019-12-24 2023-06-16 江西理工大学 Image semantic segmentation method based on context and shallow space coding and decoding network
CN111325093A (en) * 2020-01-15 2020-06-23 北京字节跳动网络技术有限公司 Video segmentation method and device and electronic equipment
CN111242288B (en) * 2020-01-16 2023-06-27 浙江工业大学 Multi-scale parallel deep neural network model construction method for lesion image segmentation
CN111242288A (en) * 2020-01-16 2020-06-05 浙江工业大学 Multi-scale parallel deep neural network model construction method for lesion image segmentation
CN111373439A (en) * 2020-02-10 2020-07-03 香港应用科技研究院有限公司 Method for image segmentation using CNN
CN111373439B (en) * 2020-02-10 2023-05-02 香港应用科技研究院有限公司 Method for image segmentation using CNN
CN111369582A (en) * 2020-03-06 2020-07-03 腾讯科技(深圳)有限公司 Image segmentation method, background replacement method, device, equipment and storage medium
CN111369582B (en) * 2020-03-06 2023-04-07 腾讯科技(深圳)有限公司 Image segmentation method, background replacement method, device, equipment and storage medium
CN111461130A (en) * 2020-04-10 2020-07-28 视研智能科技(广州)有限公司 High-precision image semantic segmentation algorithm model and segmentation method
CN111627055A (en) * 2020-05-07 2020-09-04 浙江大学 Scene depth completion method based on semantic segmentation
CN111627055B (en) * 2020-05-07 2023-11-24 浙江大学 Scene depth completion method combining semantic segmentation
CN111860386A (en) * 2020-07-27 2020-10-30 山东大学 Video semantic segmentation method based on ConvLSTM convolutional neural network
CN111860386B (en) * 2020-07-27 2022-04-08 山东大学 Video semantic segmentation method based on ConvLSTM convolutional neural network
CN112287940A (en) * 2020-10-30 2021-01-29 西安工程大学 Semantic segmentation method of attention mechanism based on deep learning
CN112489061B (en) * 2020-12-09 2024-04-16 浙江工业大学 Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism
CN112489061A (en) * 2020-12-09 2021-03-12 浙江工业大学 Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism
CN112634289B (en) * 2020-12-28 2022-05-27 华中科技大学 Rapid feasible domain segmentation method based on asymmetric void convolution
CN112634289A (en) * 2020-12-28 2021-04-09 华中科技大学 Rapid feasible domain segmentation method based on asymmetric void convolution
CN112734715A (en) * 2020-12-31 2021-04-30 同济大学 Lung nodule segmentation method of lung CT image
CN112967294A (en) * 2021-03-11 2021-06-15 西安智诊智能科技有限公司 Liver CT image segmentation method and system
CN113392783B (en) * 2021-06-18 2022-11-01 河南科技学院 Improved ResNet-based transparent window object detection method
CN113256609B (en) * 2021-06-18 2021-09-21 四川大学 CT picture cerebral hemorrhage automatic check out system based on improved generation Unet
CN113392783A (en) * 2021-06-18 2021-09-14 河南科技学院 Improved ResNet-based transparent window object detection method
CN113256609A (en) * 2021-06-18 2021-08-13 四川大学 CT picture cerebral hemorrhage automatic check out system based on improved generation Unet
CN113658200A (en) * 2021-07-29 2021-11-16 东北大学 Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN113658200B (en) * 2021-07-29 2024-01-02 东北大学 Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN114140472A (en) * 2022-02-07 2022-03-04 湖南大学 Cross-level information fusion medical image segmentation method
CN115423810A (en) * 2022-11-04 2022-12-02 国网江西省电力有限公司电力科学研究院 Blade icing form analysis method for wind generating set

Similar Documents

Publication Publication Date Title
CN110263833A (en) Image semantic segmentation method based on encoding-decoding structure
CN110298387A (en) Deep neural network object detection method incorporating pixel-level attention mechanism
CN107564025A (en) A kind of power equipment infrared image semantic segmentation method based on deep neural network
CN109902798A (en) The training method and device of deep neural network
CN108564097A (en) A kind of multiscale target detection method based on depth convolutional neural networks
CN109815785A (en) A facial emotion recognition method based on two-stream convolutional neural networks
CN107818302A (en) Non-rigid multiple dimensioned object detecting method based on convolutional neural networks
CN110111366A (en) An end-to-end optical flow estimation method based on multi-stage loss
CN106446930A (en) Deep convolutional neural network-based robot working scene identification method
CN106960206A (en) Character identifying method and character recognition system
CN108256426A (en) A kind of facial expression recognizing method based on convolutional neural networks
CN107506722A (en) A facial emotion recognition method based on deep sparse convolutional neural networks
CN109800628A (en) A kind of network structure and detection method for reinforcing SSD Small object pedestrian detection performance
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN106981080A (en) Night unmanned vehicle scene depth method of estimation based on infrared image and radar data
CN111626176B (en) Remote sensing target rapid detection method and system based on dynamic attention mechanism
CN107679462A (en) A wavelet-based deep multi-feature fusion classification method
CN108629288A (en) A kind of gesture identification model training method, gesture identification method and system
CN107085723A (en) A kind of characters on license plate global recognition method based on deep learning model
CN107967474A (en) A sea-surface target saliency detection method based on convolutional neural networks
CN108122003A (en) A kind of Weak target recognition methods based on deep neural network
CN106372597A (en) CNN traffic detection method based on adaptive context information
CN112288776B (en) Target tracking method based on multi-time step pyramid codec
CN109492618A (en) Object detection method and device based on grouping expansion convolutional neural networks model
CN111160294A (en) Gait recognition method based on graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190920