CN108717569B - Expansion full-convolution neural network device and construction method thereof - Google Patents

Expansion full-convolution neural network device and construction method thereof

Info

Publication number
CN108717569B
Authority
CN
China
Prior art keywords
expansion
layer
convolutional
feature
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810470228.5A
Other languages
Chinese (zh)
Other versions
CN108717569A (en)
Inventor
曹铁勇
方正
张雄伟
杨吉斌
孙蒙
李莉
赵斐
洪施展
项圣凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA filed Critical Army Engineering University of PLA
Priority to CN201810470228.5A priority Critical patent/CN108717569B/en
Publication of CN108717569A publication Critical patent/CN108717569A/en
Application granted granted Critical
Publication of CN108717569B publication Critical patent/CN108717569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Abstract

The invention discloses an expansion fully convolutional neural network and a construction method thereof. The neural network comprises a convolutional neural network, a feature extraction module, and a feature fusion module connected in sequence. The construction method comprises the following steps. Selecting a convolutional neural network: remove the fully connected layers and classification layer used for classification, leaving only the intermediate convolution and pooling layers, from which feature maps are extracted. Constructing a feature extraction module: the feature extraction module comprises several expansion up-sampling modules connected in series, each consisting of a feature map merging layer, an expansion convolution layer, and a deconvolution layer. Constructing a feature fusion module: the feature fusion module comprises a dense expansion fusion convolution block and a deconvolution layer. The method effectively solves the problems of feature extraction and fusion in convolutional neural networks and can be applied to pixel-level image labeling tasks.

Description

Expansion full-convolution neural network device and construction method thereof
Technical Field
The invention belongs to the technical field of image signal processing, and particularly relates to an expansion full convolution neural network device and a construction method thereof.
Background
Convolutional neural networks (CNNs) are the most widely used deep learning networks in image processing and computer vision. CNNs were originally designed for image recognition and classification: an input image passes through the CNN and a class label is output. However, in some areas of image processing, identifying the category of the whole image is far from sufficient. In image semantic segmentation, for example, the category of every pixel in the image must be labeled, so the output is not a class label but a map of the same size as the original image, in which each pixel is labeled with the semantic category of the corresponding pixel in the original image. A plain CNN cannot complete this task; its structure must be modified. The earliest network adapting the CNN to pixel-level labeling tasks was the fully convolutional network (FCN) (J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440), which replaces the classification layers at the tail of a conventional CNN with convolution and deconvolution layers to obtain an output map of the same size as the original image. The FCN was first used for semantic segmentation and has since been applied to other kinds of pixel-level labeling tasks. FCNs have two main applications:
(1) Image saliency detection: saliency detection aims to find the salient foreground object in an image, i.e., an algorithm separates the foreground object from the background. When a saliency detection model is learned with an FCN, the loss function of the network is generally the Euclidean distance or the cross entropy between the annotation map and the generated map.
(2) Image semantic segmentation: unlike salient object detection, semantic segmentation must find and label all semantic content in each image, segmenting the background as well as the foreground, and must also classify the labeled regions. When training a semantic segmentation model with an FCN, the loss function generally consists of cross entropy with a softmax classification function.
However, the result maps obtained from conventional fully convolutional networks often fail to preserve object edge information well and tend to be coarse, so a post-processing stage is usually added to improve labeling precision. Post-processing not only increases the complexity of the labeling model; because it splits the labeling process into separate stages, the obtained results are often unsmooth, contain many discontinuous pixels, and the final output is strongly affected. These drawbacks arise mainly because earlier FCNs do not extract and exploit the image features inside the network well, which degrades performance. In addition, conventional FCNs have a large number of parameters, which hampers model portability and miniaturization.
Disclosure of Invention
The invention aims to provide an expansion fully convolutional neural network device and a construction method thereof, so that the image features inside the network can be accurately extracted and exploited for pixel-level image segmentation tasks.
The technical solution that realizes the purpose of the invention is as follows: an expansion fully convolutional neural network device comprises a convolutional neural network, a feature extraction module, and a feature fusion module connected in sequence, wherein:
the convolutional neural network is the network backbone and comprises convolution layers and pooling layers, from which feature maps are extracted;
the feature extraction module comprises several expansion up-sampling modules connected in series, each consisting of a feature map merging layer, an expansion convolution layer, and a deconvolution layer; the feature map merging layer merges feature maps of the same size by stacking them; the expansion convolution layer enlarges the receptive field, defined as the size of the region of the original image onto which a pixel of the feature map output by a given layer of the convolutional neural network is mapped; the deconvolution layer up-samples the feature map, so that its output feature map is twice the size of its input feature map;
the feature fusion module comprises a dense expansion fusion convolution block, a deconvolution layer, and an activation function; the dense expansion fusion convolution block fuses all feature maps from the feature extraction module, the deconvolution layer restores the output image to the size of the original input image, and the activation function is selected according to the specific task.
Further, the feature extraction module comprises M expansion up-sampling modules connected in series: the first expansion up-sampling module extracts a feature map from the output of the first convolution layer before the third down-sampling in the convolutional neural network, the second extracts one from the output of the first convolution layer before the fourth down-sampling, and so on, until the Mth expansion up-sampling module extracts a feature map from the output of the first convolution layer before the (M+2)th down-sampling. From the Mth expansion up-sampling module to the 1st, the expansion factors of the expansion convolution layers decrease successively and are all smaller than 16; the up-sampling factor of the deconvolution layer in every expansion up-sampling module is 2.
Further, the dense expansion fusion convolution block fuses all feature maps from the feature extraction module; the feature map size is unchanged after each convolution layer in the block, and the number of channels of the block's output feature map is 1.
Further, the dense expansion fusion convolution block comprises 5 convolution layers, the input of each convolution layer coming from the outputs of all convolution layers before it; the first 3 are expansion convolution layers whose expansion factors increase successively and are all smaller than 16; the last 2 are ordinary convolution layers; the feature map size is unchanged after passing through the 5 convolution layers.
A construction method of an expansion fully convolutional neural network comprises the following steps:
Step 1, selecting a convolutional neural network: remove the fully connected layers and classification layer used for classification, leaving only the intermediate convolution and pooling layers, from which feature maps are extracted.
Step 2, constructing a feature extraction module: the feature extraction module comprises several expansion up-sampling modules connected in series, each consisting of a feature map merging layer, an expansion convolution layer, and a deconvolution layer.
Step 3, constructing a feature fusion module: the feature fusion module comprises a dense expansion fusion convolution block, a deconvolution layer, and an activation function; the dense expansion fusion convolution block fuses all feature maps from the feature extraction module, the deconvolution layer restores the output image to the size of the original input image, and the result map is output after the activation function is applied.
Further, the feature extraction module in step 2 comprises M expansion up-sampling modules connected in series: the first expansion up-sampling module extracts a feature map from the output of the first convolution layer before the third down-sampling in the convolutional neural network, the second extracts one from the output of the first convolution layer before the fourth down-sampling, and so on, until the Mth expansion up-sampling module extracts a feature map from the output of the first convolution layer before the (M+2)th down-sampling. From the Mth expansion up-sampling module to the 1st, the expansion factors of the expansion convolution layers decrease successively and are all smaller than 16; the up-sampling factor of the deconvolution layer in every expansion up-sampling module is 2.
Further, the dense expansion fusion convolution block in step 3 fuses all feature maps from the feature extraction module; the feature map size is unchanged after each convolution layer in the block, and the number of channels of the block's output feature map is 1.
Further, the activation function in step 3 is selected according to the specific task: if the network is trained for image semantic segmentation, the activation function is a softmax classification function; if it is trained for saliency detection, the activation function is a sigmoid function.
Further, the dense expansion fusion convolution block comprises 5 convolution layers, the input of each convolution layer coming from the outputs of all convolution layers before it; the first 3 are expansion convolution layers whose expansion factors increase successively and are all smaller than 16; the last 2 are ordinary convolution layers; the feature map size is unchanged after passing through the 5 convolution layers.
Compared with the prior art, the invention has the following notable advantages: (1) the constructed feature extraction and fusion modules effectively solve the problem of feature extraction and fusion in the FCN and better handle pixel-level labeling problems in image processing; (2) good results are obtained without any additional post-processing stage; (3) the model structure is simple, the final model has few parameters, and it runs fast.
Drawings
Fig. 1 is the overall structure diagram of the expansion fully convolutional neural network device of the present invention.
Fig. 2 is a schematic diagram of the expansion up-sampling module in the expansion fully convolutional neural network device of the present invention.
Fig. 3 is a schematic diagram of the dense fusion module in the expansion fully convolutional neural network device of the present invention.
Fig. 4 is an example of an expansion network constructed from a dense convolutional network by the method of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings.
An expansion fully convolutional neural network device comprises a convolutional neural network, a feature extraction module, and a feature fusion module connected in sequence, wherein:
the convolutional neural network is the network backbone and comprises convolution layers and pooling layers, from which feature maps are extracted;
the feature extraction module comprises several expansion up-sampling modules connected in series, each consisting of a feature map merging layer, an expansion convolution layer, and a deconvolution layer; the feature map merging layer merges feature maps of the same size by stacking them; the expansion convolution layer enlarges the receptive field, defined as the size of the region of the original image onto which a pixel of the feature map output by a given layer of the convolutional neural network is mapped; the deconvolution layer up-samples the feature map, so that its output feature map is twice the size of its input feature map;
the feature fusion module comprises a dense expansion fusion convolution block, a deconvolution layer, and an activation function; the dense expansion fusion convolution block fuses all feature maps from the feature extraction module, the deconvolution layer restores the output image to the size of the original input image, and the activation function is selected according to the specific task.
Further, the feature extraction module comprises M expansion up-sampling modules connected in series: the first expansion up-sampling module extracts a feature map from the output of the first convolution layer before the third down-sampling in the convolutional neural network, the second extracts one from the output of the first convolution layer before the fourth down-sampling, and so on, until the Mth expansion up-sampling module extracts a feature map from the output of the first convolution layer before the (M+2)th down-sampling. From the Mth expansion up-sampling module to the 1st, the expansion factors of the expansion convolution layers decrease successively and are all smaller than 16; the up-sampling factor of the deconvolution layer in every expansion up-sampling module is 2.
Further, the dense expansion fusion convolution block fuses all feature maps from the feature extraction module; the feature map size is unchanged after each convolution layer in the block, and the number of channels of the block's output feature map is 1.
Further, the dense expansion fusion convolution block comprises 5 convolution layers, the input of each convolution layer coming from the outputs of all convolution layers before it; the first 3 are expansion convolution layers whose expansion factors increase successively and are all smaller than 16; the last 2 are ordinary convolution layers; the feature map size is unchanged after passing through the 5 convolution layers.
A construction method of an expansion fully convolutional neural network comprises the following steps:
Step 1, selecting a convolutional neural network: remove the fully connected layers and classification layer used for classification, leaving only the intermediate convolution and pooling layers, from which feature maps are extracted.
Step 2, constructing a feature extraction module: the feature extraction module comprises several expansion up-sampling modules connected in series, each consisting of a feature map merging layer, an expansion convolution layer, and a deconvolution layer.
Step 3, constructing a feature fusion module: the feature fusion module comprises a dense expansion fusion convolution block, a deconvolution layer, and an activation function; the dense expansion fusion convolution block fuses all feature maps from the feature extraction module, the deconvolution layer restores the output image to the size of the original input image, and the result map is output after the activation function is applied.
Further, the feature extraction module in step 2 comprises M expansion up-sampling modules connected in series: the first expansion up-sampling module extracts a feature map from the output of the first convolution layer before the third down-sampling in the convolutional neural network, the second extracts one from the output of the first convolution layer before the fourth down-sampling, and so on, until the Mth expansion up-sampling module extracts a feature map from the output of the first convolution layer before the (M+2)th down-sampling. From the Mth expansion up-sampling module to the 1st, the expansion factors of the expansion convolution layers decrease successively and are all smaller than 16; the up-sampling factor of the deconvolution layer in every expansion up-sampling module is 2.
Further, the dense expansion fusion convolution block in step 3 fuses all feature maps from the feature extraction module; the feature map size is unchanged after each convolution layer in the block, and the number of channels of the block's output feature map is 1.
Further, the activation function in step 3 is selected according to the specific task: if the network is trained for image semantic segmentation, the activation function is a softmax classification function; if it is trained for saliency detection, the activation function is a sigmoid function.
Further, the dense expansion fusion convolution block comprises 5 convolution layers, the input of each convolution layer coming from the outputs of all convolution layers before it; the first 3 are expansion convolution layers whose expansion factors increase successively and are all smaller than 16; the last 2 are ordinary convolution layers; the feature map size is unchanged after passing through the 5 convolution layers.
Example 1
FIG. 1 shows the structure of the disclosed expansion fully convolutional network. The network consists of 3 parts: a convolutional neural network, a feature extraction module, and a feature fusion module. In the figure, "Conv" denotes a convolution layer and "Pooling" denotes a pooling layer.
(1) A convolutional neural network:
Any existing convolutional neural network can be selected, including VGG-Net, ResNet, DenseNet, and the like. Such a network is designed for image classification and generally consists of convolution layers, pooling layers, and fully connected layers. When constructing the fully convolutional network, the fully connected layers and classification layer at the end of the convolutional network are removed, leaving only the intermediate convolution and pooling layers, and output feature maps are extracted from these intermediate layers. As shown in Fig. 1, the feature map after each pooling layer is generally the one extracted: because every pooling layer down-samples its input, the feature maps after different pooling layers have different sizes. The specific analysis is given in the feature extraction module construction section below.
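As a concrete illustration of this step, the following is a minimal PyTorch sketch, assuming torchvision's VGG-16 as the backbone; the backbone choice and variable names are assumptions for illustration, not prescribed by the invention.

```python
# Minimal sketch: truncating a classification CNN into a feature-extracting
# backbone as described above. Assumes torchvision's VGG-16; the classifier
# head is simply discarded, leaving only convolution and pooling layers.
import torch
import torchvision

vgg = torchvision.models.vgg16(weights=None)   # classifier head will be ignored
features = vgg.features                        # only conv + pooling layers remain

x = torch.randn(1, 3, 224, 224)                # dummy N x N input image
feature_maps = []
for layer in features:
    x = layer(x)
    if isinstance(layer, torch.nn.MaxPool2d):  # grab the map after each pooling
        feature_maps.append(x)

for i, f in enumerate(feature_maps, 1):
    print(f"after pooling {i}: {tuple(f.shape)}")
# VGG-16 halves the spatial size at each of its 5 pooling layers, so the
# extracted maps come out at 112, 56, 28, 14, 7 for a 224 x 224 input.
```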
(2) The feature extraction module is constructed as follows:
the feature extraction is composed of a series of expansion up-sampling modules, and fig. 2 is a diagram of the expansion up-sampling module provided by the invention, which is composed of a layer for merging feature maps, an expansion convolution layer and a deconvolution layer. The following briefly describes deconvolution and dilation convolution, followed by the feature extraction module construction process.
Assume F is an N × N two-dimensional image and K is a k × k filter. The convolution of F and K is defined as:

S(x, y) = (F ⊛ K)(x, y) = Σ_i Σ_j F(x − i, y − j) K(i, j)    (1)

where ⊛ denotes the convolution operator and S(x, y) is the resulting convolution value.

Let l be the expansion factor. The convolution with expansion factor l, denoted ⊛_l, is defined as:

S_l(x, y) = (F ⊛_l K)(x, y) = Σ_i Σ_j F(x − l·i, y − l·j) K(i, j)    (2)
the expansion convolution can effectively enlarge the receptive field, and the definition of the receptive field is the size of the area of the pixel points on the characteristic diagram output by each layer of the convolution neural network, which are mapped on the original image. The larger the expansion factor is, the larger the receptive field in the feature map is, and more detailed information in the original image can be captured. When designing a dense dilation full convolution network, the smaller feature maps require a larger receptive field, so the dilation factors in the 4 dilation upsampling modules decrease sequentially from the fourth to the first and are smaller than 16.
Deconvolution is the inverse operation of convolution. In an FCN, deconvolution is used to up-sample the feature maps, because the original CNN structure is a series of down-sampling operations (convolution and pooling). In a convolutional neural network, the size relation between the input and output images of each convolution layer can be expressed as:

O_conv = (I_conv − K + 2P) / S + 1    (3)

where O_conv is the length or width of the output image, I_conv is the length or width of the input image, K is the convolution kernel size, P is the zero-padding amount, and S is the convolution stride.

The size relation between deconvolution input and output is:

O_deconv = (I_deconv − 1)·S + K − 2P    (4)

where O_deconv is the length or width of the output image, I_deconv is the length or width of the input image, K is the convolution kernel size, P is the zero-padding amount, and S is the convolution stride. The output size of a pooling layer is half its input size.
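Relations (3) and (4) can be verified arithmetically; a small sketch with kernel, stride, and padding values taken from Table 1 below, noting that for an expansion convolution K in relation (3) is the effective kernel size k + (k − 1)(l − 1):

```python
# Sketch: checking the size relations (3) and (4) with Table 1's settings.
def conv_out(i, k, p, s):
    return (i - k + 2 * p) // s + 1                  # relation (3)

def deconv_out(i, k, p, s):
    return (i - 1) * s + k - 2 * p                   # relation (4)

# Deconvolution with K=4, S=2, P=1 exactly doubles the map (Deconv1-5).
assert deconv_out(16, k=4, p=1, s=2) == 32
# A 3x3 expansion convolution with factor 5 has effective kernel 11;
# with P=5 (Conv1 in Table 1) the feature map size is preserved.
assert conv_out(64, k=11, p=5, s=1) == 64
# Pooling halves the size, matching "half of the input size" above.
assert conv_out(64, k=2, p=0, s=2) == 32
print("all size relations hold")
```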
The construction of the feature extraction module is described below, taking the dense expansion convolution network as an example, as shown in Fig. 4. In a convolutional neural network, data flows as 4-dimensional tensors. Assume the input image is N × N, so the input tensor is 1 × 3 × N × N. Each convolution outputs feature maps with different channel counts; according to the network structure, the feature map tensor extracted from dense convolution block 1 is 1 × n × (N/4) × (N/4), where n is the number of channels of the feature map. The channel counts can be chosen freely as circumstances require; in general, the larger n is, the more parameters the final model has, but the better the performance. In designing the feature extraction module of the invention, the main concern is the size relation among the feature maps output by the intermediate layers.
As stated above, the size of the feature map extracted from dense convolution block 1 is (N/4) × (N/4), that of dense convolution block 2 is (N/8) × (N/8), that of dense convolution block 3 is (N/16) × (N/16), and that of dense convolution block 4 is (N/32) × (N/32). However, a pixel-level labeling task requires an output map of the same size as the original image; at the same time, the feature maps of different layers carry different information, and exploiting the features of all layers requires all output feature maps to be up-sampled. For this purpose a cascaded up-sampling structure is constructed, and the feature maps of all layers are up-sampled to (N/2) × (N/2).
A single up-sampling building block is shown in Fig. 2: the deconvolution layer is parameterized to up-sample the feature map by a factor of 2, the expansion convolution keeps the input and output feature maps the same size, and the merging layer merges feature maps of the same size by stacking them. At construction time, the smallest feature map (the one output by dense convolution block 4 in Fig. 4) is (N/32) × (N/32), and no smaller feature map follows it, so the up-sampling module for this last layer needs no merging layer; its feature map is simply output at (N/16) × (N/16) after the expansion convolution and deconvolution. The feature map extracted from dense convolution block 3 is also (N/16) × (N/16), so the second up-sampling structure adds a merging layer, whose role is to merge these same-sized feature maps into one tensor. Suppose the data tensor extracted from dense convolution block 4 is 1 × n4 × (N/32) × (N/32), where n4 is its channel count; after the first up-sampling unit the output tensor becomes 1 × n4 × (N/16) × (N/16). The data tensor extracted from dense convolution block 3 is 1 × n3 × (N/16) × (N/16), and after the merging layer the combined tensor is 1 × (n3 + n4) × (N/16) × (N/16). Proceeding forward in this way, the output tensor of the final, fourth up-sampling structure is 1 × (n3 + n4 + n2 + n1) × (N/2) × (N/2).
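The cascade just described can be sketched in PyTorch as follows; a non-authoritative sketch in which the channel counts, expansion factors, and names are illustrative assumptions:

```python
# Sketch of one expansion up-sampling module (Fig. 2) and a two-step cascade.
import torch
import torch.nn as nn

class ExpansionUpsample(nn.Module):
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        # padding = dilation keeps the 3x3 expansion convolution size-preserving
        self.dilated = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)
        # K=4, S=2, P=1 doubles the spatial size, per relation (4)
        self.deconv = nn.ConvTranspose2d(out_ch, out_ch, 4, stride=2, padding=1)

    def forward(self, x, skip=None):
        if skip is not None:                         # merging layer: stack the
            x = torch.cat([x, skip], dim=1)          # same-sized maps channel-wise
        return self.deconv(self.dilated(x))

# The module for the smallest map has no merging layer; the next module
# merges the up-sampled tensor with the feature map extracted from the backbone.
m4 = ExpansionUpsample(in_ch=512, out_ch=512, dilation=2)
m3 = ExpansionUpsample(in_ch=512 + 256, out_ch=256, dilation=3)

up = m4(torch.randn(1, 512, 8, 8))                   # -> 1 x 512 x 16 x 16
out = m3(up, skip=torch.randn(1, 256, 16, 16))       # -> 1 x 256 x 32 x 32
print(tuple(out.shape))
```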
Table 1 gives example parameter settings of the feature extraction module in the dense expansion convolution network; kernel size, zero padding, convolution stride, and expansion factor are the parameters of the convolution operations. Conv1-4 are the expansion convolution layers of the 4 up-sampling structures and Deconv1-4 are their deconvolution layers. The parameters may be chosen as circumstances require, provided that the input and output feature maps of each expansion convolution are the same size and that each deconvolution doubles the image size. The expansion factor of an expansion convolution is chosen according to the size of its input feature map: a large factor for a large feature map and a small factor for a small one. As the expansion factor grows, the effective convolution kernel grows with it, and if the feature map being convolved is too small, information in the feature map is lost.
For a different network, only the parameters or the number of up-sampling units need to be changed, following the steps above.
TABLE 1

Type    | Kernel size | Zero padding | Stride | Expansion factor | Output
------- | ----------- | ------------ | ------ | ---------------- | --------------------
Conv4   | 3×3         | 2            | 1      | 2                | n1 × (N/32) × (N/32)
Conv3   | 3×3         | 3            | 1      | 3                | n2 × (N/16) × (N/16)
Conv2   | 3×3         | 4            | 1      | 4                | n3 × (N/8) × (N/8)
Conv1   | 3×3         | 5            | 1      | 5                | n4 × (N/4) × (N/4)
Deconv4 | 4×4         | 1            | 2      | /                | n1 × (N/16) × (N/16)
Deconv3 | 4×4         | 1            | 2      | /                | n2 × (N/8) × (N/8)
Deconv2 | 4×4         | 1            | 2      | /                | n3 × (N/4) × (N/4)
Deconv1 | 4×4         | 1            | 2      | /                | n4 × (N/2) × (N/2)
Conv9_1 | 3×3         | 2            | 1      | 2                | 100 × (N/2) × (N/2)
Conv9_2 | 3×3         | 4            | 1      | 4                | 50 × (N/2) × (N/2)
Conv9_3 | 3×3         | 8            | 1      | 8                | 30 × (N/2) × (N/2)
Conv9_4 | 1×1         | 0            | 1      | /                | 20 × (N/2) × (N/2)
Conv9_5 | 1×1         | 0            | 1      | /                | 1 × (N/2) × (N/2)
Deconv5 | 4×4         | 1            | 2      | /                | 1 × N × N
(3) The feature fusion module structure:
the feature fusion module is composed of a dense expansion fusion volume block, an deconvolution layer and an activation function, wherein the structure of the dense expansion fusion volume block is shown in fig. 3, wherein Conv9-1, Conv9-2 and Conv9-3 are expansion convolutions, the expansion factors are respectively 2,4,8, Conv9-4 and the Conv9-5 convolution layer is 1 × 1 convolution layer. Conv9-1-Conv9-5 is densely connected, i.e., the input to each layer is from the output of all layers before this layer, such as Conv9-5 in FIG. 3, whose input is from all the outputs of the previous 4 convolutional layers. The dense dilation fusion convolution block is used to fuse all feature maps from the feature extraction module to get the corresponding result. Also illustrated by the example of the dense dilation convolution network of fig. 4, the input of the dense dilation fusion convolution module comes from the feature extraction module, and the input tensor is 1 × (n)3+n4+n2+n1) (N/2) × (N/2), after the dense convolution block goes through a series of convolution operations, a tensor of 1 × 1 (N/2) × (N/2) is output, and in design, it should be ensured that the size of the eigenmap passing through 5 convolution layers is not changed, and the number of channels of the output eigenmap of the Conv9-5 convolution layer must be 1, and an example of parameter design in the dense dilation convolution fusion block in fig. 4 is shown in table 1. The expansion factor of Conv9-1-Conv9-3 can be selected according to the situation, and the number of output characteristic diagrams of Conv9-1-Conv9-4 can also be selected according to the situation. Deconv5 functions to reset the output image to the input image size. And selecting the activation function according to a specific task, for example, training an image semantic segmentation task by using the network, wherein the activation function is a softmax classification function, and if the training of a saliency detection task is performed, the activation function is a sigmoid function.
(4) Network training: once the network is constructed, it can be trained for a specific task, with a different loss function selected for each task. For a saliency detection task, for example, a set of training images and their corresponding annotation maps is selected first, and the loss function is generally the Euclidean distance between the annotation maps and the generated maps, as in the following formula:

L = Σ_{i=1..N1} ‖f(Z_i) − M_i‖²    (5)

where Z = {Z_i} (i = 1, ..., N1) are the training images, f(Z_i) is the network output for image Z_i, and M_i (i = 1, ..., N1) is the annotation map corresponding to training image Z_i. The parameters of the network are updated by minimizing Eq. (5) with gradient descent.
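In PyTorch the objective and one gradient step might look as follows; a minimal sketch in which the tiny stand-in network and all names are illustrative assumptions, not the patent's architecture:

```python
# Sketch of the saliency objective (5): squared Euclidean distance between
# network outputs and annotation maps, minimized by gradient descent.
import torch
import torch.nn as nn

net = nn.Conv2d(3, 1, 3, padding=1)                  # stand-in for the full device
opt = torch.optim.SGD(net.parameters(), lr=1e-3)

z = torch.randn(4, 3, 64, 64)                        # stand-in training images Z_i
m = torch.rand(4, 1, 64, 64)                         # their annotation maps M_i

pred = torch.sigmoid(net(z))                         # sigmoid: saliency task
loss = ((pred - m) ** 2).sum(dim=(1, 2, 3)).mean()   # mean of ||f(Z_i) - M_i||^2
opt.zero_grad()
loss.backward()
opt.step()                                           # one gradient-descent update
```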
Different loss functions and parameter-update methods can be selected for different training tasks. The network effectively solves the problems of feature extraction and fusion in convolutional neural networks and can be applied to pixel-level image labeling tasks.

Claims (7)

1. An expansion fully convolutional neural network device, characterized in that it comprises a convolutional neural network, a feature extraction module, and a feature fusion module connected in sequence, wherein:
the convolutional neural network is the network backbone and comprises convolution layers and pooling layers, from which feature maps are extracted;
the feature extraction module comprises several expansion up-sampling modules connected in series, each consisting of a feature map merging layer, an expansion convolution layer, and a deconvolution layer; the feature map merging layer merges feature maps of the same size by stacking them; the expansion convolution layer enlarges the receptive field, defined as the size of the region of the original image onto which a pixel of the feature map output by a given layer of the convolutional neural network is mapped; the deconvolution layer up-samples the feature map, so that its output feature map is twice the size of its input feature map;
the feature fusion module comprises a dense expansion fusion convolution block, a deconvolution layer, and an activation function; the dense expansion fusion convolution block fuses all feature maps from the feature extraction module, the deconvolution layer restores the output image to the size of the original input image, and the activation function is selected according to the specific task;
the dense expansion fusion convolution block comprises 5 convolution layers, the input of each convolution layer coming from the outputs of all convolution layers before it; the first 3 are expansion convolution layers whose expansion factors increase successively and are all smaller than 16; the last 2 are ordinary convolution layers; the feature map size is unchanged after passing through the 5 convolution layers.
2. The expansion fully convolutional neural network device according to claim 1, characterized in that the feature extraction module comprises M expansion up-sampling modules connected in series: the first expansion up-sampling module extracts a feature map from the output of the first convolution layer before the third down-sampling in the convolutional neural network, the second extracts one from the output of the first convolution layer before the fourth down-sampling, and so on, until the Mth expansion up-sampling module extracts a feature map from the output of the first convolution layer before the (M+2)th down-sampling; from the Mth expansion up-sampling module to the 1st, the expansion factors of the expansion convolution layers decrease successively and are all smaller than 16; the up-sampling factor of the deconvolution layer in every expansion up-sampling module is 2.
3. The expansion fully convolutional neural network device according to claim 1, characterized in that the dense expansion fusion convolution block fuses all feature maps from the feature extraction module, the feature map size is unchanged after each convolution layer in the block, and the number of channels of the block's output feature map is 1.
4. A construction method of an expansion fully convolutional neural network, characterized by comprising the following steps:
Step 1, selecting a convolutional neural network: remove the fully connected layers and classification layer used for classification, leaving only the intermediate convolution and pooling layers, from which feature maps are extracted;
Step 2, constructing a feature extraction module: the feature extraction module comprises several expansion up-sampling modules connected in series, each consisting of a feature map merging layer, an expansion convolution layer, and a deconvolution layer;
Step 3, constructing a feature fusion module: the feature fusion module comprises a dense expansion fusion convolution block, a deconvolution layer, and an activation function; the dense expansion fusion convolution block fuses all feature maps from the feature extraction module, the deconvolution layer restores the output image to the size of the original input image, and the result map is output after the activation function is applied; the dense expansion fusion convolution block comprises 5 convolution layers, the input of each convolution layer coming from the outputs of all convolution layers before it; the first 3 are expansion convolution layers whose expansion factors increase successively and are all smaller than 16; the last 2 are ordinary convolution layers; the feature map size is unchanged after passing through the 5 convolution layers.
5. The construction method according to claim 4, characterized in that the feature extraction module in step 2 comprises M expansion up-sampling modules connected in series: the first expansion up-sampling module extracts a feature map from the output of the first convolution layer before the third down-sampling in the convolutional neural network, the second extracts one from the output of the first convolution layer before the fourth down-sampling, and so on, until the Mth expansion up-sampling module extracts a feature map from the output of the first convolution layer before the (M+2)th down-sampling; from the Mth expansion up-sampling module to the 1st, the expansion factors of the expansion convolution layers decrease successively and are all smaller than 16; the up-sampling factor of the deconvolution layer in every expansion up-sampling module is 2.
6. The construction method according to claim 4, characterized in that the dense expansion fusion convolution block in step 3 fuses all feature maps from the feature extraction module, the feature map size is unchanged after each convolution layer in the block, and the number of channels of the block's output feature map is 1.
7. The construction method according to claim 4, characterized in that the activation function in step 3 is selected according to the specific task: if the network is trained for image semantic segmentation, the activation function is a softmax classification function; if it is trained for saliency detection, the activation function is a sigmoid function.
CN201810470228.5A 2018-05-16 2018-05-16 Expansion full-convolution neural network device and construction method thereof Active CN108717569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810470228.5A CN108717569B (en) 2018-05-16 2018-05-16 Expansion full-convolution neural network device and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810470228.5A CN108717569B (en) 2018-05-16 2018-05-16 Expansion full-convolution neural network device and construction method thereof

Publications (2)

Publication Number Publication Date
CN108717569A CN108717569A (en) 2018-10-30
CN108717569B 2022-03-22

Family

ID=63900129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810470228.5A Active CN108717569B (en) 2018-05-16 2018-05-16 Expansion full-convolution neural network device and construction method thereof

Country Status (1)

Country Link
CN (1) CN108717569B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615059B (en) * 2018-11-06 2020-12-25 海南大学 Edge filling and filter expansion operation method and system in convolutional neural network
CN109522966B (en) * 2018-11-28 2022-09-27 中山大学 Target detection method based on dense connection convolutional neural network
CN109492612A (en) * 2018-11-28 2019-03-19 平安科技(深圳)有限公司 Fall detection method and its falling detection device based on skeleton point
CN111292301A (en) * 2018-12-07 2020-06-16 北京市商汤科技开发有限公司 Focus detection method, device, equipment and storage medium
US10762393B2 (en) * 2019-01-31 2020-09-01 StradVision, Inc. Learning method and learning device for learning automatic labeling device capable of auto-labeling image of base vehicle using images of nearby vehicles, and testing method and testing device using the same
CN109961095B (en) * 2019-03-15 2023-04-28 深圳大学 Image labeling system and method based on unsupervised deep learning
CN110110782A (en) * 2019-04-30 2019-08-09 南京星程智能科技有限公司 Retinal fundus images optic disk localization method based on deep learning
CN110189282A (en) * 2019-05-09 2019-08-30 西北工业大学 Based on intensive and jump connection depth convolutional network multispectral and panchromatic image fusion method
CN110464611A (en) * 2019-07-23 2019-11-19 苏州国科视清医疗科技有限公司 A kind of digitlization amblyopia enhancing training device and system and its related algorithm
CN110473173A (en) * 2019-07-24 2019-11-19 熵智科技(深圳)有限公司 A kind of defect inspection method based on deep learning semantic segmentation
CN110956194A (en) * 2019-10-10 2020-04-03 深圳先进技术研究院 Three-dimensional point cloud structuring method, classification method, equipment and device
CN111047569B (en) * 2019-12-09 2023-11-24 北京联合大学 Image processing method and device
CN111144269B (en) * 2019-12-23 2023-11-24 威海北洋电气集团股份有限公司 Signal correlation behavior recognition method and system based on deep learning
CN111415000B (en) * 2020-04-29 2024-03-22 Oppo广东移动通信有限公司 Convolutional neural network, and data processing method and device based on convolutional neural network
CN111738338B (en) * 2020-06-23 2021-06-18 征图新视(江苏)科技股份有限公司 Defect detection method applied to motor coil based on cascaded expansion FCN network
CN112101214B (en) * 2020-09-15 2022-08-26 重庆市农业科学院 Network structure for rapidly counting tea plant bugs based on thermodynamic diagrams
CN112381131A (en) * 2020-11-10 2021-02-19 中国地质大学(武汉) Rock slice identification method, device, equipment and storage medium
CN112541878A (en) * 2020-12-24 2021-03-23 北京百度网讯科技有限公司 Method and device for establishing image enhancement model and image enhancement
CN112686377B (en) * 2021-03-18 2021-07-02 北京地平线机器人技术研发有限公司 Method and device for carrying out deconvolution processing on feature data by utilizing convolution hardware
CN113098862A (en) * 2021-03-31 2021-07-09 昆明理工大学 Intrusion detection method based on combination of hybrid sampling and expansion convolution
CN112800691B (en) * 2021-04-15 2021-07-30 中国气象局公共气象服务中心(国家预警信息发布中心) Method and device for constructing precipitation level prediction model
CN113139543B (en) * 2021-04-28 2023-09-01 北京百度网讯科技有限公司 Training method of target object detection model, target object detection method and equipment
CN113762476B (en) * 2021-09-08 2023-12-19 中科院成都信息技术股份有限公司 Neural network model for text detection and text detection method thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934456A (en) * 2017-03-16 2017-07-07 山东理工大学 A kind of depth convolutional neural networks model building method
CN107016664A (en) * 2017-01-18 2017-08-04 华侨大学 A kind of bad pin flaw detection method of large circle machine
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107203999A (en) * 2017-04-28 2017-09-26 北京航空航天大学 A kind of skin lens image automatic division method based on full convolutional neural networks
CN107844795A (en) * 2017-11-18 2018-03-27 中国人民解放军陆军工程大学 Convolutional neural networks feature extracting method based on principal component analysis
CN107977968A (en) * 2017-12-22 2018-05-01 长江勘测规划设计研究有限责任公司 The building layer detection method excavated based on buildings shadow information
CN108021923A (en) * 2017-12-07 2018-05-11 维森软件技术(上海)有限公司 A kind of image characteristic extracting method for deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9933264B2 (en) * 2015-04-06 2018-04-03 Hrl Laboratories, Llc System and method for achieving fast and reliable time-to-contact estimation using vision and range sensor data for autonomous navigation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016664A (en) * 2017-01-18 2017-08-04 华侨大学 A kind of bad pin flaw detection method of large circle machine
CN106934456A (en) * 2017-03-16 2017-07-07 山东理工大学 A kind of depth convolutional neural networks model building method
CN107203999A (en) * 2017-04-28 2017-09-26 北京航空航天大学 A kind of skin lens image automatic division method based on full convolutional neural networks
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107844795A (en) * 2017-11-18 2018-03-27 中国人民解放军陆军工程大学 Convolutional neural networks feature extracting method based on principal component analysis
CN108021923A (en) * 2017-12-07 2018-05-11 维森软件技术(上海)有限公司 A kind of image characteristic extracting method for deep neural network
CN107977968A (en) * 2017-12-22 2018-05-01 长江勘测规划设计研究有限责任公司 The building layer detection method excavated based on buildings shadow information

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Densely Connected Convolutional Networks;Gao Huang et al;《2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》;20170726;2261-2269 *
RDFNet: RGB-D Multi-Level Residual Feature Fusion for Indoor Semantic Segmentation;Seong-Jin Park et al;《Proceedings of the IEEE International Conference on Computer Vision (ICCV)》;20171231;4980-4989 *
Research on semantic salient region detection based on fully convolutional networks; 曹铁勇 et al.; Acta Electronica Sinica (电子学报); 2017-11-15; Vol. 45, No. 11; 2593-2601 *
Crowd counting based on multi-scale fully convolutional network feature fusion; 彭山珍 et al.; Journal of Wuhan University (Natural Science Edition) (武汉大学学报(理学版)); 2018-03; Vol. 64, No. 3; 249-254 *
Spatio-temporal multi-task neural network for video action detection; 刘垚; China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑); 2018-01-15; No. 1; I138-1171 *

Also Published As

Publication number Publication date
CN108717569A (en) 2018-10-30

Similar Documents

Publication Publication Date Title
CN108717569B (en) Expansion full-convolution neural network device and construction method thereof
CN108596330B (en) Parallel characteristic full-convolution neural network device and construction method thereof
CN110232394B (en) Multi-scale image semantic segmentation method
CN108549893B (en) End-to-end identification method for scene text with any shape
CN109461157B (en) Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field
CN112446383B (en) License plate recognition method and device, storage medium and terminal
Jiao et al. A configurable method for multi-style license plate recognition
CN111428781A (en) Remote sensing image ground object classification method and system
CN108805874B (en) Multispectral image semantic cutting method based on convolutional neural network
CN112528976B (en) Text detection model generation method and text detection method
CN114118124B (en) Image detection method and device
CN107564009B (en) Outdoor scene multi-target segmentation method based on deep convolutional neural network
CN113569865B (en) Single sample image segmentation method based on class prototype learning
CN106682628B (en) Face attribute classification method based on multilayer depth feature information
Narang et al. Devanagari ancient documents recognition using statistical feature extraction techniques
CN110298841B (en) Image multi-scale semantic segmentation method and device based on fusion network
CN109492640A (en) Licence plate recognition method, device and computer readable storage medium
van den Brand et al. Instance-level segmentation of vehicles by deep contours
CN114782705A (en) Method and device for detecting closed contour of object
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN111275732B (en) Foreground object image segmentation method based on depth convolution neural network
CN113313162A (en) Method and system for detecting multi-scale feature fusion target
CN112767280A (en) Single image raindrop removing method based on loop iteration mechanism
CN111914947A (en) Image instance segmentation method, device and equipment based on feature fusion and storage medium
Edan Cuneiform symbols recognition based on k-means and neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant