CN110263833A - Image semantic segmentation method based on an encoder-decoder structure - Google Patents
Image semantic segmentation method based on an encoder-decoder structure
- Publication number
- CN110263833A (application number CN201910503595.5A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- feature map
- size
- information
- conv
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
Abstract
The invention discloses an image semantic segmentation method based on an encoder-decoder structure. The method comprises: first, extracting a set of feature maps from the input picture with a structurally improved ResNet-101 network; then, capturing multi-scale information from the extracted feature maps with a multi-scale information fusion module; in addition, extracting rich spatial information from the shallow layers of ResNet-101 with a spatial information capture module; next, fusing the deep multi-scale information with the shallow spatial information and refining the fused feature maps with a multi-kernel convolution block; and finally, obtaining the segmentation result through a data-dependent upsampling operation. The invention mainly aims to improve image segmentation accuracy, belongs to the technical field of image processing, and is especially suitable for medical image analysis, autonomous driving, virtual reality, driver assistance, robot perception, indoor environment reconstruction, unmanned aerial vehicles, and the like.
Description
Technical field
The invention belongs to the technical field of image processing, and more particularly relates to an image semantic segmentation method based on an encoder-decoder structure, especially suitable for tasks such as medical image analysis, autonomous driving, indoor environment reconstruction, and unmanned aerial vehicles.
Background art
Semantic segmentation is an important research field in image processing; its goal is to make a dense prediction for every pixel of an image and to label each pixel with the class of the object or region it belongs to. With the continuous development of deep convolutional neural networks, and especially with the appearance of the fully convolutional network, semantic segmentation technology has made a qualitative leap. To further improve segmentation results, researchers around the world have approached the problem from different angles and designed a wide variety of model architectures.
To prevent the reduction of spatial resolution caused by repeated downsampling and pooling operations, the Deeplabv2, Deeplabv3, and Deeplabv3+ models proposed by Chen et al. and the PSPNet model proposed by Zhao et al. use dilated convolution, which effectively enlarges the receptive field of the filters and reduces the loss of spatial detail. Encoder-decoder structures can also prevent the loss of spatial information. For example, the SegNet proposed by Badrinarayanan et al. uses an encoder-decoder structure to capture more spatial information. To capture more spatial information in the shallow layers and help the model recover object detail, DeepLabv3+ adds a simple and effective decoder module to the DeepLabv3 model. In addition, the GCN proposed by Chao et al., the DFN proposed by Yu et al., and the PAN model proposed by Li et al. apply a U-shaped structure that gradually fuses feature maps from different levels of the backbone to improve spatial resolution and make up for the loss of spatial detail. GCN uses a "large kernel" to enlarge the receptive field and preserve spatial information.
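The receptive-field arithmetic behind dilated convolution, mentioned above, can be sketched with the standard formulas; this is a minimal illustration (not from the patent), and the example layer stack is purely hypothetical:

```python
# Sketch (not from the patent): how dilation enlarges a filter's receptive
# field without adding parameters. Standard formulas; layer list is illustrative.

def effective_kernel(k, d):
    """Effective extent of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of a stack of (kernel, dilation, stride) layers."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# A 3x3 kernel with dilation 2 covers a 5x5 area:
assert effective_kernel(3, 2) == 5
# Three stacked 3x3 convs with dilations 1/2/4 and stride 1:
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]))  # -> 15
```

With strided or pooled layers in the stack, the `jump` term grows, which is exactly the resolution loss the dilated variants avoid.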
Much work has also been done to capture richer multi-scale contextual information. Deeplabv2 proposes an atrous spatial pyramid pooling module to capture multi-scale context. The OCNet model proposed by Yuan et al. captures multi-scale context by using a pyramid object context or a dilated spatial pyramid object context. In addition, the DenseASPP model proposed by Yang et al. uses a group of dilated convolutional layers to generate multi-scale feature maps. The RefineNet proposed by Lin et al. and the U-Net proposed by Ronneberger et al. fuse feature maps of different levels with an encoder-decoder structure to obtain rich contextual information. Byeon et al. propose a label-based model for capturing complex spatial dependencies, built on a two-dimensional LSTM network. To capture rich contextual dependencies on local features, Shuai et al. design a recurrent neural network over a directed acyclic graph. The SPN model proposed by Liu et al. designs a row/column linear propagation model that can extract dense global pairwise relationships in a scene image. In the PSANet model proposed by Zhao et al., point-wise context is learned adaptively through bidirectional information propagation.
Summary of the invention
To avoid the shortcomings and deficiencies of the prior art, the present invention proposes an image semantic segmentation method based on an encoder-decoder structure, so as to address two challenges in the image semantic segmentation task: 1) the presence of multi-scale objects leads to misclassification; 2) the loss of spatial information makes small objects unrecognizable.
To achieve the above objective, the present invention adopts the following technical scheme.
The image semantic segmentation method based on an encoder-decoder structure according to the present invention is carried out as follows:
Step 1: Produce a data set containing M pictures and divide it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are annotated accurately at the pixel level.
Step 2: Train the image semantic segmentation model based on the encoder-decoder structure.
Step 2.1: First apply data augmentation to the training pictures, namely random horizontal flipping, random rotation between 10 and -10 degrees, and random scaling between 0.5 and 2 times.
Step 2.2: Feed the augmented training pictures X ∈ {x1, x2, ..., xn} into the structurally improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em}.
Step 2.3: Feed the feature maps E ∈ {e1, e2, ..., em} into the multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that is highly discriminative and rich in multi-scale information.
Step 2.4: With the spatial information capture module, extract from the shallow layers of the improved ResNet-101 backbone a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information, to compensate for the loss of spatial resolution in the improved ResNet-101 backbone caused by repeated pooling and downsampling operations.
Step 2.5: Fuse the feature maps T ∈ {t1, t2, ..., ta}, rich in multi-scale information, with the feature maps Q ∈ {q1, q2, ..., qd}, rich in spatial information, to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refine P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; obtain the image segmentation result through a data-dependent upsampling operation; then obtain the output error with a Softmax regression classifier, evaluate the result with the cross-entropy loss function, and finally train the model by optimizing the error with the back-propagation algorithm, yielding the segmentation model.
Step 3: After training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluate the performance of the trained model on the validation set.
Step 4: For a test sample, the final image segmentation result is obtained after steps 2.2-2.5.
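The per-pixel Softmax classifier and cross-entropy loss named in step 2.5 can be sketched as follows; this is a minimal plain-Python illustration with made-up class scores, not the patent's implementation:

```python
# Minimal sketch of the per-pixel Softmax classifier and cross-entropy loss
# used in step 2.5; pure Python, illustrative class scores only.
import math

def softmax(scores):
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Negative log-likelihood of the true class for one pixel."""
    return -math.log(softmax(scores)[label])

probs = softmax([2.0, 1.0, 0.1])
assert abs(sum(probs) - 1.0) < 1e-9        # a valid probability distribution
# the loss is small when the true class has the highest score:
assert cross_entropy([2.0, 1.0, 0.1], 0) < cross_entropy([2.0, 1.0, 0.1], 2)
```

In training, this loss would be averaged over all labeled pixels and minimized by back-propagation, as the step describes.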
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the structurally improved ResNet-101 backbone is structured as follows.
The improved ResNet-101 backbone comprises 5 convolution groups. The first group r1 contains one convolution with 64 kernels of size 7 × 7 and stride 2. The second group r2 contains a pooling convolution with kernel size 2 × 2 and stride 2 and 3 identically structured convolutional layers; each layer has the following structure: conv2_1 has 64 kernels of size 1 × 1, conv2_2 has 64 kernels of size 3 × 3, and conv2_3 has 256 kernels of size 1 × 1. The third group r3 contains 4 identically structured convolutional layers; each has the following structure: conv3_1 has 128 kernels of size 1 × 1, conv3_2 has 128 kernels of size 3 × 3, and conv3_3 has 512 kernels of size 1 × 1. The fourth group r4 contains 23 identically structured convolutional layers, each with dilation rate 2 and stride 1; each has the following structure: conv4_1 has 256 kernels of size 1 × 1, conv4_2 has 256 kernels of size 3 × 3, and conv4_3 has 1024 kernels of size 1 × 1. The fifth group r5 contains 3 identically structured Kronecker convolutional layers, each Kronecker convolution having an inner expansion factor κ1 = 4 and an intra-sharing factor κ2 = 3; each layer has the following structure: conv5_1 has 512 kernels of size 1 × 1, conv5_2 has 512 kernels of size 3 × 3, and conv5_3 has 2048 kernels of size 1 × 1.
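The group structure above can be written out as a small bookkeeping table so the layer counts are easy to check; this is purely an accounting sketch of the configuration as described, not a network implementation:

```python
# The five convolution groups described above, written out as a config table.
# Channel tuples and block counts follow the text; bookkeeping only.
groups = {
    "r1": {"blocks": 1,  "convs_per_block": 1, "channels": (64,)},
    "r2": {"blocks": 3,  "convs_per_block": 3, "channels": (64, 64, 256)},
    "r3": {"blocks": 4,  "convs_per_block": 3, "channels": (128, 128, 512)},
    "r4": {"blocks": 23, "convs_per_block": 3, "channels": (256, 256, 1024)},
    "r5": {"blocks": 3,  "convs_per_block": 3, "channels": (512, 512, 2048)},
}
total_convs = sum(g["blocks"] * g["convs_per_block"] for g in groups.values())
print(total_convs)  # 1 + 9 + 12 + 69 + 9 = 100 convolutional layers; the
                    # final fully connected layer of the original ResNet-101
                    # accounts for the "101" in the name
```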
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the multi-scale information fusion module is structured, and the highly discriminative feature maps rich in multi-scale information are extracted, as follows.
The multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are sent into a module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. Then the feature maps are sent into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths, and each main path contains one Kronecker convolution block; each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. The different Kronecker convolutions have different inner expansion and intra-sharing factors, so as to enlarge the receptive field as much as possible and capture relatively rich scale information. In addition, there are three parallel branches, and each branch contains an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vector generated by the global attention module recalibrates the multi-scale feature maps extracted by the Kronecker convolution blocks, so as to select feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the channels of the feature maps selected from the three main paths, reducing computation and saving time. Finally, the feature maps of the three main paths are fused together, and a new set of feature maps T ∈ {t1, t2, ..., ta} is output.
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the spatial information capture module is structured, and the feature maps rich in spatial information are extracted, as follows.
The spatial information capture module contains three branches, and each branch contains a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone pass through the three 1 × 1 convolutions to obtain three new sets of feature maps Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are reshaped and multiplied as matrices, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector re-weights the feature maps ξ along the spatial dimension, and a scale factor is introduced to guide the model to gradually learn to assign the weights of local regions to global positions; the final output is a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information.
In formula (1), the attention coefficient represents the influence of position i on position j, and the scale parameter is initialized to 0.
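The computation described above can be sketched on a tiny example: an energy matrix (standing in for the product of the reshaped Ξ and Ψ) is normalized row-wise by Softmax into an attention matrix whose entry (j, i) is the influence of position i on position j, attention is applied to ξ, and a scale factor beta initialized to 0 gates the residual. This is an illustrative reconstruction under those assumptions, not the patent's formula (1) verbatim:

```python
# Sketch of the spatial attention step: row-wise Softmax over an energy
# matrix, attention applied over positions, and a beta-gated residual.
# Tiny 3-position, 1-channel example; all values illustrative.
import math

def row_softmax(mat):
    out = []
    for row in mat:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        t = sum(exps)
        out.append([e / t for e in exps])
    return out

def spatial_attention(xi_feat, energy, beta=0.0):
    s = row_softmax(energy)                   # attention over positions
    attended = [sum(s[j][i] * xi_feat[i] for i in range(len(xi_feat)))
                for j in range(len(xi_feat))]
    # beta is initialized to 0, so the output starts as the local features
    return [beta * attended[j] + xi_feat[j] for j in range(len(xi_feat))]

energy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = spatial_attention([1.0, 2.0, 3.0], energy, beta=0.0)
assert out == [1.0, 2.0, 3.0]   # with beta = 0 the input passes through
```

As beta grows during training, globally attended features contribute more, which matches the text's description of gradually assigning local weights to global positions.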
In the image semantic segmentation method based on an encoder-decoder structure according to the present invention, the multi-kernel convolution block is structured as follows:
two convolutions in parallel, with kernel sizes of 3 × 3 and 5 × 5 respectively.
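The multi-kernel block can be sketched as two parallel same-padded convolutions whose outputs are fused; uniform averaging kernels and summation-based fusion are illustrative assumptions, since the patent does not specify the kernel weights or the fusion operator:

```python
# Sketch of the multi-kernel block: parallel 3x3 and 5x5 convolutions, each
# zero-padded by k//2 so spatial size is preserved, with outputs summed.
# Uniform averaging kernels stand in for learned weights.

def conv2d_same(img, k):
    """Convolve a 2-D list with a uniform k x k kernel, zero padding."""
    h, w, p = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-p, p + 1):
                for dj in range(-p, p + 1):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        acc += img[i + di][j + dj]
            out[i][j] = acc / (k * k)
    return out

def multi_kernel_block(img):
    a, b = conv2d_same(img, 3), conv2d_same(img, 5)     # parallel branches
    return [[a[i][j] + b[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]

img = [[1.0] * 6 for _ in range(6)]
out = multi_kernel_block(img)
assert len(out) == 6 and len(out[0]) == 6   # spatial size preserved
```

Mixing two kernel sizes lets the block refine the fused feature maps at two receptive-field scales at once, which is presumably why the patent pairs 3 × 3 with 5 × 5.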
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the invention;
Fig. 2 is a schematic diagram of the multi-scale information fusion module designed in the invention;
Fig. 3 is a schematic diagram of the spatial information capture module designed in the invention;
Fig. 4 is a schematic diagram of sample images output by the simulation experiments of the invention.
Specific embodiment
The technical scheme of the present invention is described clearly and completely below with reference to the attached drawings. In this embodiment, the image semantic segmentation method based on an encoder-decoder structure is carried out as follows:
Step 1: Produce a data set containing M pictures and divide it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are annotated accurately at the pixel level.
Step 2: Train the image semantic segmentation model based on the encoder-decoder structure.
Step 2.1: First apply data augmentation to the training pictures, namely random horizontal flipping, random rotation between 10 and -10 degrees, and random scaling between 0.5 and 2 times.
Step 2.2: Feed the augmented training pictures X ∈ {x1, x2, ..., xn} into the structurally improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em}, as shown in Fig. 1.
Step 2.3: Feed the feature maps E ∈ {e1, e2, ..., em} into the multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that is highly discriminative and rich in multi-scale information, as shown in Fig. 2.
Step 2.4: With the spatial information capture module, extract from the shallow layers of the improved ResNet-101 backbone a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information, to compensate for the loss of spatial resolution in the improved ResNet-101 backbone caused by repeated pooling and downsampling operations, as shown in Fig. 3.
Step 2.5: Fuse the feature maps T ∈ {t1, t2, ..., ta}, rich in multi-scale information, with the feature maps Q ∈ {q1, q2, ..., qd}, rich in spatial information, to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refine P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; obtain the image segmentation result through a data-dependent upsampling operation; then obtain the output error with a Softmax regression classifier, evaluate the result with the cross-entropy loss function, and finally train the model by optimizing the error with the back-propagation algorithm, yielding the segmentation model.
Step 3: After training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluate the performance of the trained model on the validation set.
Step 4: For a test sample, the final image segmentation result is obtained after steps 2.2-2.5, as shown in Fig. 4.
In this embodiment, the structurally improved ResNet-101 backbone is structured as follows.
The improved ResNet-101 backbone comprises 5 convolution groups. The first group r1 contains one convolution with 64 kernels of size 7 × 7 and stride 2. The second group r2 contains a pooling convolution with kernel size 2 × 2 and stride 2 and 3 identically structured convolutional layers; each layer has the following structure: conv2_1 has 64 kernels of size 1 × 1, conv2_2 has 64 kernels of size 3 × 3, and conv2_3 has 256 kernels of size 1 × 1. The third group r3 contains 4 identically structured convolutional layers; each has the following structure: conv3_1 has 128 kernels of size 1 × 1, conv3_2 has 128 kernels of size 3 × 3, and conv3_3 has 512 kernels of size 1 × 1. The fourth group r4 contains 23 identically structured convolutional layers, each with dilation rate 2 and stride 1; each has the following structure: conv4_1 has 256 kernels of size 1 × 1, conv4_2 has 256 kernels of size 3 × 3, and conv4_3 has 1024 kernels of size 1 × 1. The fifth group r5 contains 3 identically structured Kronecker convolutional layers, each Kronecker convolution having an inner expansion factor κ1 = 4 and an intra-sharing factor κ2 = 3; each layer has the following structure: conv5_1 has 512 kernels of size 1 × 1, conv5_2 has 512 kernels of size 3 × 3, and conv5_3 has 2048 kernels of size 1 × 1.
In this embodiment, the multi-scale information fusion module is structured, and the highly discriminative feature maps rich in multi-scale information are extracted, as follows.
As shown in Fig. 2, the multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are sent into a module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. Then the feature maps are sent into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths, and each main path contains one Kronecker convolution block; each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. The different Kronecker convolutions have different inner expansion and intra-sharing factors, so as to enlarge the receptive field as much as possible and capture relatively rich scale information. In addition, there are three parallel branches, and each branch contains an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vector generated by the global attention module recalibrates the multi-scale feature maps extracted by the Kronecker convolution blocks, so as to select feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the channels of the feature maps selected from the three main paths, reducing computation and saving time. Finally, the feature maps of the three main paths are fused together, and a new set of feature maps T ∈ {t1, t2, ..., ta} is output.
In this embodiment, the spatial information capture module is structured, and the feature maps rich in spatial information are extracted, as follows.
As shown in Fig. 3, the spatial information capture module contains three branches, and each branch contains a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone pass through the three 1 × 1 convolutions to obtain three new sets of feature maps Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are reshaped and multiplied as matrices, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector re-weights the feature maps ξ along the spatial dimension, and a scale factor is used to guide the model to gradually learn to assign the weights of local regions to global positions; the final output is a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information.
In formula (1), the attention coefficient represents the influence of position i on position j, and the scale parameter is initialized to 0.
In this embodiment, the multi-kernel convolution block is structured as follows:
two convolutions in parallel, with kernel sizes of 3 × 3 and 5 × 5 respectively.
Claims (5)
1. An image semantic segmentation method based on an encoder-decoder structure, characterized in that it is carried out as follows:
step 1: producing a data set containing M pictures and dividing it into three subsets: a training set, a validation set, and a test set, where the training set and the validation set are annotated accurately at the pixel level;
step 2: training an image semantic segmentation model based on the encoder-decoder structure;
step 2.1: first applying data augmentation to the training pictures, namely random horizontal flipping, random rotation between 10 and -10 degrees, and random scaling between 0.5 and 2 times;
step 2.2: feeding the augmented training pictures X ∈ {x1, x2, ..., xn} into a structurally improved ResNet-101 backbone to extract a rich set of feature maps E ∈ {e1, e2, ..., em};
step 2.3: feeding the feature maps E ∈ {e1, e2, ..., em} into a multi-scale information fusion module to capture a set of feature maps T ∈ {t1, t2, ..., ta} that is highly discriminative and rich in multi-scale information;
step 2.4: with a spatial information capture module, extracting from the shallow layers of the improved ResNet-101 backbone a set of feature maps Q ∈ {q1, q2, ..., qd} rich in spatial information, to compensate for the loss of spatial resolution in the improved ResNet-101 backbone caused by repeated pooling and downsampling operations;
step 2.5: fusing the feature maps T ∈ {t1, t2, ..., ta}, rich in multi-scale information, with the feature maps Q ∈ {q1, q2, ..., qd}, rich in spatial information, to obtain an information-rich set of feature maps P ∈ {p1, p2, ..., pz}; refining P ∈ {p1, p2, ..., pz} with a multi-kernel convolution block; obtaining the image segmentation result through a data-dependent upsampling operation; then obtaining the output error with a Softmax regression classifier, evaluating the result with the cross-entropy loss function, and finally training by optimizing the error with the back-propagation algorithm to obtain the segmentation model;
step 3: after training the image semantic segmentation model on the training set through steps 2.1-2.5, evaluating the performance of the trained model on the validation set;
step 4: for a test sample, obtaining the final image segmentation result after steps 2.2-2.5.
2. The image semantic segmentation method based on an encoder-decoder structure according to claim 1, characterized in that the structurally improved ResNet-101 backbone is structured as follows:
the improved ResNet-101 backbone comprises 5 convolution groups: the first group r1 contains one convolution with 64 kernels of size 7 × 7 and stride 2; the second group r2 contains a pooling convolution with kernel size 2 × 2 and stride 2 and 3 identically structured convolutional layers, each layer having the following structure: conv2_1 has 64 kernels of size 1 × 1, conv2_2 has 64 kernels of size 3 × 3, and conv2_3 has 256 kernels of size 1 × 1; the third group r3 contains 4 identically structured convolutional layers, each having the following structure: conv3_1 has 128 kernels of size 1 × 1, conv3_2 has 128 kernels of size 3 × 3, and conv3_3 has 512 kernels of size 1 × 1; the fourth group r4 contains 23 identically structured convolutional layers, each with dilation rate 2 and stride 1 and having the following structure: conv4_1 has 256 kernels of size 1 × 1, conv4_2 has 256 kernels of size 3 × 3, and conv4_3 has 1024 kernels of size 1 × 1; the fifth group r5 contains 3 identically structured Kronecker convolutional layers, each Kronecker convolution having an inner expansion factor κ1 = 4 and an intra-sharing factor κ2 = 3, and each layer having the following structure: conv5_1 has 512 kernels of size 1 × 1, conv5_2 has 512 kernels of size 3 × 3, and conv5_3 has 2048 kernels of size 1 × 1.
3. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that the multi-scale information fusion module is constructed, and a feature map set that is highly discriminative and rich in multi-scale information is extracted, by the following steps:
The multi-scale information fusion module has an input layer, a multi-scale information extraction layer, and an output layer. First, the feature maps E ∈ {e1, e2, ..., em} extracted from the backbone are fed to a convolution module containing batch normalization (BN), a rectified linear unit (ReLU), and a 1 × 1 convolution to reduce the number of feature maps. The feature maps are then fed into the multi-scale information extraction layer to extract multi-scale information. The multi-scale information extraction layer contains three parallel main paths; each main path contains one Kronecker convolution block, and each Kronecker convolution block consists of a Kronecker convolution, BN, and ReLU. Different Kronecker convolutions use different inter-dilating factors and intra-sharing factors, enlarging the receptive field as much as possible to capture relatively rich scale information. In addition, there are three parallel branches, each containing an identical global attention module. The global attention module consists of a global average pooling layer and a Sigmoid activation function. The attention vector generated by the global attention module recalibrates the multi-scale feature maps extracted by the Kronecker convolution blocks, selecting feature maps that are highly discriminative and rich in multi-scale information. Three 1 × 1 convolutions then reduce the channels of the feature maps selected from the three main paths, lowering the computational cost and saving time. Finally, the feature maps from the three main paths are fused together, and a new feature map set T ∈ {t1, t2, ..., ta} is output.
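A minimal NumPy sketch of one branch's recalibration step, assuming (as the claim describes) a global attention module built from global average pooling followed by a Sigmoid. The Kronecker convolution blocks are replaced by identity stand-ins, so all shapes and names here are illustrative only:

```python
import numpy as np

def global_attention(feat):
    """Recalibrate a (C, H, W) feature map with a channel attention vector."""
    v = feat.mean(axis=(1, 2))               # global average pooling -> (C,)
    gate = 1.0 / (1.0 + np.exp(-v))          # Sigmoid -> attention vector
    return feat * gate[:, None, None]        # channel-wise recalibration

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))         # stand-in for one main path's output

# Three parallel main paths; the Kronecker convolution blocks and the 1 x 1
# channel-reduction convolutions are omitted for brevity.
recalibrated = [global_attention(x) for _ in range(3)]
fused = sum(recalibrated)                    # fusion of the three main paths
```

In the claimed module each path would first pass through its own Kronecker convolution block, and the 1 × 1 convolutions would shrink the channel count before fusion; the sketch only shows the attention-and-fuse skeleton.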
4. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that the spatial information capture module is constructed, and a feature map set rich in spatial information is extracted, as follows:
The spatial information capture module contains three branches, each containing a 1 × 1 convolution to reduce the number of feature maps. The feature maps G ∈ {g1, g2, ..., gl} obtained from the second convolution group of the improved ResNet-101 backbone are processed by the three 1 × 1 convolutions to obtain three new feature map sets Ξ ∈ {μ1, μ2, ..., μs}, Ψ, and ξ ∈ {η1, η2, ..., ηk}. Ξ and Ψ are each reshaped and then matrix-multiplied, after which a Softmax operation computes the spatial attention vector. The computed spatial attention vector recalibrates the feature maps ξ along the spatial dimension, and a scale factor is introduced to guide the model to gradually learn to assign the weights of local regions to global positions; the final output is a feature map set Q ∈ {q1, q2, ..., qd} rich in spatial information.
In formula (1), the attention weight denotes the influence of position i on position j, and the scale parameter is initialized to 0.
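The spatial attention computation can be sketched in NumPy as below. The steps follow the claim (Ξ and Ψ reshaped and matrix-multiplied, a Softmax producing the attention, a scale factor initialized to 0), while the residual form `beta * out + v` is an assumption modeled on common position-attention designs rather than taken from the patent's formula (1):

```python
import numpy as np

def spatial_attention(Xi, Psi, xi, beta=0.0):
    """Position-wise attention; beta is the learnable scale factor (init 0)."""
    C, H, W = Xi.shape
    q = Xi.reshape(C, H * W)                     # queries  (C, N)
    k = Psi.reshape(C, H * W)                    # keys     (C, N)
    v = xi.reshape(C, H * W)                     # values   (C, N)
    energy = q.T @ k                             # (N, N): position i vs position j
    energy -= energy.max(axis=0, keepdims=True)  # numerical stability
    s = np.exp(energy)
    s /= s.sum(axis=0, keepdims=True)            # Softmax -> spatial attention
    out = v @ s                                  # recalibrate the spatial dimension
    return (beta * out + v).reshape(C, H, W)     # beta grows from 0 during training

rng = np.random.default_rng(1)
Xi, Psi, xi = (rng.standard_normal((4, 6, 6)) for _ in range(3))
Q = spatial_attention(Xi, Psi, xi, beta=0.0)     # with beta = 0, Q equals xi
```

Initializing the scale factor to 0 means the module starts as an identity mapping and only gradually mixes in globally attended features, matching the claim's "gradually learns to assign the weight of local regions to global positions".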
5. The image semantic segmentation method based on an encoding-decoding structure according to claim 1, characterized in that the multi-kernel convolution block has the following structure:
Two convolutions in parallel, with convolution kernel sizes of 3 × 3 and 5 × 5, respectively.
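A naive single-channel NumPy sketch of the multi-kernel convolution block; the uniform averaging kernels stand in for learned weights, and 'same' zero-padding keeps the two parallel branches spatially aligned for fusion:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2-D cross-correlation on a single channel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))
y3 = conv2d_same(x, np.ones((3, 3)) / 9)     # 3 x 3 branch
y5 = conv2d_same(x, np.ones((5, 5)) / 25)    # 5 x 5 branch
multi = np.stack([y3, y5])                   # parallel branches, stacked as channels
```

Because both branches use 'same' padding, their outputs share the input's spatial size and can be stacked (or summed) directly, which is the point of running the two kernel sizes in parallel.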
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910503595.5A CN110263833A (en) | 2019-06-03 | 2019-06-03 | Based on coding-decoding structure image, semantic dividing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110263833A true CN110263833A (en) | 2019-09-20 |
Family
ID=67917688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910503595.5A Pending CN110263833A (en) | 2019-06-03 | 2019-06-03 | Based on coding-decoding structure image, semantic dividing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263833A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090565A (en) * | 2018-01-16 | 2018-05-29 | 电子科技大学 | Accelerated method is trained in a kind of convolutional neural networks parallelization |
CN108062754A (en) * | 2018-01-19 | 2018-05-22 | 深圳大学 | Segmentation, recognition methods and device based on dense network image |
Non-Patent Citations (1)
Title |
---|
TIANYI WU等: "Tree-structured Kronecker Convolutional Network for Semantic Segmentation", 《ARXIV》 * |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991617A (en) * | 2019-12-02 | 2020-04-10 | 华东师范大学 | Construction method of kaleidoscope convolution network |
CN110991617B (en) * | 2019-12-02 | 2020-12-01 | 华东师范大学 | Construction method of kaleidoscope convolution network |
CN111127470A (en) * | 2019-12-24 | 2020-05-08 | 江西理工大学 | Image semantic segmentation method based on context and shallow space coding and decoding network |
CN111127470B (en) * | 2019-12-24 | 2023-06-16 | 江西理工大学 | Image semantic segmentation method based on context and shallow space coding and decoding network |
CN111325093A (en) * | 2020-01-15 | 2020-06-23 | 北京字节跳动网络技术有限公司 | Video segmentation method and device and electronic equipment |
CN111242288B (en) * | 2020-01-16 | 2023-06-27 | 浙江工业大学 | Multi-scale parallel deep neural network model construction method for lesion image segmentation |
CN111242288A (en) * | 2020-01-16 | 2020-06-05 | 浙江工业大学 | Multi-scale parallel deep neural network model construction method for lesion image segmentation |
CN111373439A (en) * | 2020-02-10 | 2020-07-03 | 香港应用科技研究院有限公司 | Method for image segmentation using CNN |
CN111373439B (en) * | 2020-02-10 | 2023-05-02 | 香港应用科技研究院有限公司 | Method for image segmentation using CNN |
CN111369582A (en) * | 2020-03-06 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Image segmentation method, background replacement method, device, equipment and storage medium |
CN111369582B (en) * | 2020-03-06 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Image segmentation method, background replacement method, device, equipment and storage medium |
CN111461130A (en) * | 2020-04-10 | 2020-07-28 | 视研智能科技(广州)有限公司 | High-precision image semantic segmentation algorithm model and segmentation method |
CN111627055A (en) * | 2020-05-07 | 2020-09-04 | 浙江大学 | Scene depth completion method based on semantic segmentation |
CN111627055B (en) * | 2020-05-07 | 2023-11-24 | 浙江大学 | Scene depth completion method combining semantic segmentation |
CN111860386A (en) * | 2020-07-27 | 2020-10-30 | 山东大学 | Video semantic segmentation method based on ConvLSTM convolutional neural network |
CN111860386B (en) * | 2020-07-27 | 2022-04-08 | 山东大学 | Video semantic segmentation method based on ConvLSTM convolutional neural network |
CN112287940A (en) * | 2020-10-30 | 2021-01-29 | 西安工程大学 | Semantic segmentation method of attention mechanism based on deep learning |
CN112489061B (en) * | 2020-12-09 | 2024-04-16 | 浙江工业大学 | Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism |
CN112489061A (en) * | 2020-12-09 | 2021-03-12 | 浙江工业大学 | Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism |
CN112634289B (en) * | 2020-12-28 | 2022-05-27 | 华中科技大学 | Rapid feasible domain segmentation method based on asymmetric void convolution |
CN112634289A (en) * | 2020-12-28 | 2021-04-09 | 华中科技大学 | Rapid feasible domain segmentation method based on asymmetric void convolution |
CN112734715A (en) * | 2020-12-31 | 2021-04-30 | 同济大学 | Lung nodule segmentation method of lung CT image |
CN112967294A (en) * | 2021-03-11 | 2021-06-15 | 西安智诊智能科技有限公司 | Liver CT image segmentation method and system |
CN113392783B (en) * | 2021-06-18 | 2022-11-01 | 河南科技学院 | Improved ResNet-based transparent window object detection method |
CN113256609B (en) * | 2021-06-18 | 2021-09-21 | 四川大学 | CT picture cerebral hemorrhage automatic check out system based on improved generation Unet |
CN113392783A (en) * | 2021-06-18 | 2021-09-14 | 河南科技学院 | Improved ResNet-based transparent window object detection method |
CN113256609A (en) * | 2021-06-18 | 2021-08-13 | 四川大学 | CT picture cerebral hemorrhage automatic check out system based on improved generation Unet |
CN113658200A (en) * | 2021-07-29 | 2021-11-16 | 东北大学 | Edge perception image semantic segmentation method based on self-adaptive feature fusion |
CN113658200B (en) * | 2021-07-29 | 2024-01-02 | 东北大学 | Edge perception image semantic segmentation method based on self-adaptive feature fusion |
CN114140472A (en) * | 2022-02-07 | 2022-03-04 | 湖南大学 | Cross-level information fusion medical image segmentation method |
CN115423810A (en) * | 2022-11-04 | 2022-12-02 | 国网江西省电力有限公司电力科学研究院 | Blade icing form analysis method for wind generating set |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263833A (en) | Based on coding-decoding structure image, semantic dividing method | |
CN110298387A (en) | Incorporate the deep neural network object detection method of Pixel-level attention mechanism | |
CN107564025A (en) | A kind of power equipment infrared image semantic segmentation method based on deep neural network | |
CN109902798A (en) | The training method and device of deep neural network | |
CN108564097A (en) | A kind of multiscale target detection method based on depth convolutional neural networks | |
CN109815785A (en) | A kind of face Emotion identification method based on double-current convolutional neural networks | |
CN107818302A (en) | Non-rigid multiple dimensioned object detecting method based on convolutional neural networks | |
CN110111366A (en) | A kind of end-to-end light stream estimation method based on multistage loss amount | |
CN106446930A (en) | Deep convolutional neural network-based robot working scene identification method | |
CN106960206A (en) | Character identifying method and character recognition system | |
CN108256426A (en) | A kind of facial expression recognizing method based on convolutional neural networks | |
CN107506722A (en) | One kind is based on depth sparse convolution neutral net face emotion identification method | |
CN109800628A (en) | A kind of network structure and detection method for reinforcing SSD Small object pedestrian detection performance | |
CN105160310A (en) | 3D (three-dimensional) convolutional neural network based human body behavior recognition method | |
CN106981080A (en) | Night unmanned vehicle scene depth method of estimation based on infrared image and radar data | |
CN111626176B (en) | Remote sensing target rapid detection method and system based on dynamic attention mechanism | |
CN107679462A (en) | A kind of depth multiple features fusion sorting technique based on small echo | |
CN108629288A (en) | A kind of gesture identification model training method, gesture identification method and system | |
CN107085723A (en) | A kind of characters on license plate global recognition method based on deep learning model | |
CN107967474A (en) | A kind of sea-surface target conspicuousness detection method based on convolutional neural networks | |
CN108122003A (en) | A kind of Weak target recognition methods based on deep neural network | |
CN106372597A (en) | CNN traffic detection method based on adaptive context information | |
CN112288776B (en) | Target tracking method based on multi-time step pyramid codec | |
CN109492618A (en) | Object detection method and device based on grouping expansion convolutional neural networks model | |
CN111160294A (en) | Gait recognition method based on graph convolution network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20190920 |