CN110458841B - Method for improving image segmentation running speed

Method for improving image segmentation running speed

Info

Publication number: CN110458841B (application CN201910535642.4A)
Authority: CN (China)
Prior art keywords: convolution, size, kernel, image, convolution kernel
Legal status: Active (granted)
Other versions: CN110458841A (application publication)
Other languages: Chinese (zh)
Inventors: 张烨, 樊一超, 郭艺玲
Assignee (current and original): Zhejiang University of Technology ZJUT
Priority/filing date: 2019-06-20
Application publication date: 2019-11-15 (CN110458841A)
Grant publication date: 2021-06-08 (CN110458841B)

Classifications

    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/21 Design or setup of recognition systems or techniques)
    • G06N3/045 Combinations of networks (G PHYSICS › G06 COMPUTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology)
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining (G PHYSICS › G06 COMPUTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T1/00 General purpose image data processing)
    • G06T7/10 Segmentation; Edge detection (G PHYSICS › G06 COMPUTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis)

Abstract

A method for improving the running speed of image segmentation, comprising: designing a multi-scale hole convolution kernel; designing a channel convolution network; and designing a full convolution connection and deconvolution network. Through the deconvolution and full convolution operations, the network can accept any image size, performs semantic analysis on each pixel of the image, achieves the aim of rapidly segmenting the image, and locates image features quickly and accurately.

Description

Method for improving image segmentation running speed
Technical Field
The invention relates to a method for improving the running speed of image segmentation.
Background Art
In recent years, with the rapid development of computer science and technology, computer-based image processing and image target detection have also developed at an unprecedented pace. Deep learning, which learns from massive digital image data and extracts key target features, has surpassed human performance in target detection and brought further surprises to the industry. With the renewed rise of neural networks, video image methods based on convolutional neural networks have become the mainstream technology for image segmentation and recognition, achieving accurate image recognition by means of template matching, edge feature extraction, gradient histograms and the like. Although image feature detection based on neural networks can effectively identify target features in complex scenes, with results far better than those of traditional methods, it still has the following shortcomings: (1) the noise immunity is weak; (2) the Dropout method alleviates overfitting and improves the convolutional neural network model and its parameters, but the precision decreases slightly; (3) variable convolution and separable convolution structures improve the generalization of the model and strengthen its feature extraction capability, but target recognition performance in complex scenes remains poor; (4) the newer end-to-end image segmentation methods directly predict pixel classification information and achieve pixel-level positioning of the target object, but such models suffer from large parameter quantities, low efficiency and rough segmentation. In short, both the traditional detection methods and the video image methods suffer from complex operation, low recognition precision, low recognition efficiency and rough segmentation.
Disclosure of Invention
To overcome the above-mentioned deficiencies of the prior art, the present invention provides a full-convolution method for improving the running speed of image segmentation. The invention adopts a deep learning framework and optimizes and improves the convolutional neural network: a channel convolution method reduces the parameter quantity of the model, and multi-scale hole convolution (dilated convolution) enriches the extracted image features and solves the problem of the small receptive field of traditional networks.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for improving the running speed of image segmentation comprises the following steps:
designing a multi-scale hole convolution kernel;
In order to avoid relying on the traditional convolution and maximum pooling method to increase the receptive field, the invention adopts the hole convolution kernel: the sampling rate is increased on the basis of the traditional convolution kernel, expanding the original convolution kernel by inserting holes between its sampling points.
Thus, while the original calculation amount is kept, the receptive field is increased so that the image segmentation information is accurate enough. The calculation formula for the receptive field size based on the hole convolution kernel is
$$F_i = F_{i-1} + (k_i - 1) \cdot rate \cdot \prod_{j=1}^{i-1} s_j \qquad (1)$$
In the formula: f is the size of the current layer receptive field; the rate is the sampling rate of the hole convolution kernel, i.e. the number of intervals, and the rate of the conventional convolution kernel can be regarded as 1, and the sampling rate of the hole convolution can be regarded as 2. The traditional convolution receptive field calculation formula is
$$F_i = F_{i-1} + (k_i - 1) \cdot \prod_{j=1}^{i-1} s_j, \quad i = 1, 2, \ldots, n \qquad (2)$$
In the formula: fi-1The size of the receptive field of the previous layer; k is a radical ofiIs convolution of ith layerNuclear or pooled nuclear size; n is the total number of layers of convolution; siIs the convolution step size Stride of the i-th layer convolution kernel.
The multi-scale hole convolution is designed using the idea of multi-scale image transformation: the sampling rate and the convolution kernel size are diversified so that the kernel can adapt to the feature extraction process for targets of different sizes. The multi-scale hole convolution is calculated as
$$y[i] = \sum_{k \in K} x[i + rate \cdot k] \cdot w[k] \qquad (3)$$
In the formula: y[i] is the convolution summation result at the ith sliding position; x is the input feature; K is the convolution kernel; k is the coordinate position of a parameter within the kernel, with k ∈ K; w[k] is the convolution kernel weight; rate is the sampling rate and takes the values 1, 2 and 3.
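A minimal PyTorch sketch of this multi-scale hole convolution follows: three parallel 3 × 3 branches with sampling rates 1, 2 and 3. The channel counts, and the choice to sum rather than concatenate the branches, are assumptions made for illustration; the patent text does not fix them.

```python
import torch
import torch.nn as nn

class MultiScaleHoleConv(nn.Module):
    """Parallel 3x3 hole (dilated) convolutions with sampling rates 1, 2, 3."""

    def __init__(self, in_ch=64, out_ch=64, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            # padding = rate keeps the spatial size unchanged for a 3x3 kernel
            nn.Conv2d(in_ch, out_ch, kernel_size=3, dilation=r, padding=r)
            for r in rates
        ])

    def forward(self, x):
        # Each output position aggregates features sampled at several
        # receptive-field scales, applying formula (3) per branch.
        return sum(branch(x) for branch in self.branches)

x = torch.randn(1, 64, 56, 56)
print(MultiScaleHoleConv()(x).shape)  # torch.Size([1, 64, 56, 56])
```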
Designing a channel convolution network;
Since the conventional convolution mode is a dimension-increasing operation, a channel convolution mode can be adopted to reduce the dimension of the feature convolution. The traditional convolution is first split into two convolution layers, similar to the group operation in ResNet. On the premise of not affecting accuracy, the new structure shortens the calculation time to about 1/8 and reduces the parameter quantity to about 1/9; it can therefore be applied well on mobile terminals for real-time target detection, and the model compression effect is obvious.
For the conventional convolution, assume that the number of input feature channels is M, that the width and height of the convolution kernel are both D_k, and that the number of convolution kernels is N. Then each position the convolution slides over involves N kernels with M·D_k·D_k parameters each, and the sliding step size is set to s. The calculation formula for the image size after sliding is
$$h' = \frac{h - D_k + 2 \cdot pad}{s} + 1 \qquad (4)$$

$$w' = \frac{w - D_k + 2 \cdot pad}{s} + 1 \qquad (5)$$
In the formula: h and w are the height and width before convolution; h' and w' are the height and width after convolution; pad is the boundary filling of the width and height. Each point of the h'·w' feature map after convolution therefore corresponds to N × M·D_k·D_k parameters, so the total parameter quantity is

$$N \cdot M \cdot D_k \cdot D_k \cdot h' \cdot w' \qquad (6)$$
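As a quick numeric check of formulas (4) to (6); every concrete value here (input size 224, D_k = 3, s = 2, pad = 1, M = 32, N = 64) is an illustrative assumption:

```python
# Hypothetical sizes: 224x224 input, 3x3 kernel, stride 2, padding 1,
# M = 32 input channels, N = 64 kernels.
h, w, Dk, s, pad, M, N = 224, 224, 3, 2, 1, 32, 64

h_out = (h - Dk + 2 * pad) // s + 1      # formula (4) -> 112
w_out = (w - Dk + 2 * pad) // s + 1      # formula (5) -> 112
total = N * M * Dk * Dk * h_out * w_out  # formula (6)

print(h_out, w_out, total)  # 112 112 231211008
```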
The improved channel convolution mode divides the convolution into two steps:
1) A D_k·D_k convolution is applied to each of the M channels separately. Sliding with the same step size s, the dimension after convolution is h' × w', and the parameter quantity generated by this step is

$$D_k \cdot D_k \cdot M \cdot h' \cdot w' \qquad (7)$$

2) N convolution kernels of size 1 × 1 are set to perform the dimension-raising feature extraction. The feature map obtained in step 1) is then convolved again with step size 1, the original M channel features being processed by the N convolution kernels respectively, and the calculated total parameter quantity is

$$M \cdot N \cdot h' \cdot w' \cdot 1 \cdot 1 \qquad (8)$$
Integrating the convolution structures of the two steps, the final parameter quantity of the channel convolution is

$$D_k \cdot D_k \cdot M \cdot h' \cdot w' + M \cdot N \cdot h' \cdot w' \qquad (9)$$
As previously mentioned, comparing the parameter quantity of the conventional convolution kernel with that of the improved channel convolution gives

$$\frac{D_k \cdot D_k \cdot M \cdot h' \cdot w' + M \cdot N \cdot h' \cdot w'}{N \cdot M \cdot D_k \cdot D_k \cdot h' \cdot w'} = \frac{1}{N} + \frac{1}{D_k^{2}} \qquad (10)$$
From the analysis of equation (10), if a convolution kernel size of 3 × 3 is used, the channel convolution operation reduces the parameter quantity to roughly 1/9, since the 1/N term is negligible for large N.
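The 1/9 figure can be verified with a minimal PyTorch sketch of the two-step channel convolution, i.e. a depthwise convolution followed by a 1 × 1 pointwise convolution; M = 32 and N = 64 are assumed channel counts chosen only for illustration:

```python
import torch.nn as nn

M, N, Dk = 32, 64, 3  # assumed channel counts; 3x3 kernel as in the text

standard = nn.Conv2d(M, N, Dk, padding=1, bias=False)             # formula (6)
depthwise = nn.Conv2d(M, M, Dk, padding=1, groups=M, bias=False)  # step 1, (7)
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)            # step 2, (8)

def params(m):
    return sum(p.numel() for p in m.parameters())

ratio = (params(depthwise) + params(pointwise)) / params(standard)
print(ratio, 1 / N + 1 / Dk**2)  # both 0.1267..., roughly 1/9 for a 3x3 kernel
```

The h'·w' factor cancels in the ratio, which is why the parameter counts of the layers alone reproduce formula (10).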
Designing a full convolution connection and deconvolution network;
the final layer of the traditional network structure adopts a fixed size, so that an input picture needs to be converted into a fixed size in advance, and the acquisition of the vehicle length coordinate of the logistics vehicle is not facilitated; in addition, the traditional full-connection layer network has the defects that the determined digit space coordinate is lost, so that the image space information is distorted, and the target cannot be effectively and accurately positioned. In order to solve the problem of information loss, the invention adopts a full convolution connection mode to accurately position the position coordinates of the features in the picture.
In a traditional network, the full connection converts the convolution feature map [b, c, h, w] of the preceding part into [b, c·h·w], i.e. [b, 4096], and then into [b, cls], where b denotes the batch size and cls the number of classes. The full convolutional network instead follows the convolution layers with corresponding 1 × 1 convolutions and has no fully connected layer, hence the name full convolutional network. The calculation method of the full convolution is
$$y_n[i][j] = f_{k_n,\,s}\left(x\left[s_i \cdot i + \delta_i\right]\left[s_j \cdot j + \delta_j\right]\right) \qquad (11)$$
In the formula: 1 ≤ n ≤ N; y_n[i][j] is the convolution result of the nth convolution kernel at position (i, j); f_{k_n,s}(·) denotes convolving with the nth kernel k_n at step size s; s_i is the convolution step size in the horizontal direction; s_j is the convolution step size in the vertical direction; k_n is the nth convolution kernel; D_k is the convolution kernel width and height, the kernel size corresponding to D_k·D_k in step 2; δ_i and δ_j are positions within the convolution kernel, with 0 ≤ δ_i, δ_j ≤ D_k, the layer having N different types of convolution kernels in total. The sliding convolution operation of the kernel can thus be converted into a multiplication of two matrices. The result of convolving the corresponding image pixels can be expressed as
$$\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_N \end{bmatrix} \cdot \begin{bmatrix} I_{11} & I_{12} & \cdots \\ I_{21} & I_{22} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} \qquad (12)$$
Wherein: the matrix dimension on the left is [N, M·D_k·D_k]; the matrix dimension on the right is [M·D_k·D_k, w'·h']; the dimension after convolution is [N, w'·h']. In the matrix on the right, I denotes the image (img), and its subscripts are the image width and image height in turn, i.e. I_{wh}.
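The matrix form can be checked numerically with an im2col-style unfolding: the kernel matrix of dimension [N, M·D_k·D_k] multiplied by the unrolled image matrix of dimension [M·D_k·D_k, w'·h'] reproduces the sliding convolution. All sizes below are illustrative assumptions, a sketch rather than the patent's own implementation:

```python
import torch
import torch.nn.functional as F

M, N, Dk, h, w = 3, 8, 3, 10, 10  # assumed channels, kernels, kernel/image size
x = torch.randn(1, M, h, w)
weight = torch.randn(N, M, Dk, Dk)

# Right matrix: the image unrolled into [M*Dk*Dk, w'*h'] patch columns.
cols = F.unfold(x, kernel_size=Dk)[0]       # shape [M*Dk*Dk, w'*h']
# Left matrix: the N kernels flattened to [N, M*Dk*Dk], as in formula (12).
out_mm = weight.view(N, -1) @ cols          # shape [N, w'*h']

out_conv = F.conv2d(x, weight).view(N, -1)  # ordinary sliding convolution
print(torch.allclose(out_mm, out_conv, atol=1e-5))  # True
```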
Finally, through the deconvolution operation, [N, w'·h'] is converted back to the size of the input image, so the specific semantic information represented by each pixel is accurately identified and the loss of spatial information is avoided. The deconvolution is equivalent to the inverse operation of the convolution, i.e. the convolution is inverted with the transposed kernels:
$$X' = \begin{bmatrix} k_1^{\top} & k_2^{\top} & \cdots & k_N^{\top} \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \qquad (13)$$
In the formula: k_1, …, k_N are the weights corresponding to each convolution kernel, and y_n is the nth row of the convolution result. The weight matrix is changed from the original

$$\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_N \end{bmatrix} \quad \left(\text{dimension } [N,\; M \cdot D_k \cdot D_k]\right)$$

into

$$\begin{bmatrix} k_1^{\top} & k_2^{\top} & \cdots & k_N^{\top} \end{bmatrix} \quad \left(\text{dimension } [M \cdot D_k \cdot D_k,\; N]\right)$$
The weights are adjusted through training and carry the image semantic information characteristics.
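A minimal sketch of the deconvolution step: a transposed convolution maps the [N, w'·h'] score map back to the input image size, with trainable weights playing the role of the transposed kernel matrix in formula (13). The 28 × 28 feature size, the stride of 8 and cls = 4 output channels (matching the four vehicle classes of the embodiment below) are assumptions:

```python
import torch
import torch.nn as nn

cls = 4                              # assumed number of classes
feat = torch.randn(1, cls, 28, 28)   # [N, w', h'] class scores, N = cls here

# Upsample 28 -> 224: output size = (28 - 1) * 8 - 2 * 4 + 16 = 224, so every
# pixel of the original image receives a semantic score.
deconv = nn.ConvTranspose2d(cls, cls, kernel_size=16, stride=8, padding=4)
print(deconv(feat).shape)  # torch.Size([1, 4, 224, 224])
```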
Therefore, the network built from the deconvolution and full convolution operations can accommodate any image size, performs semantic analysis on every pixel of the image, achieves the purpose of rapidly segmenting the image, and locates image features quickly and accurately.
The invention has the advantages that:
aiming at the sample problem, the invention adopts a full convolution method to improve the image segmentation running speed, and has the most prominent characteristics that the image is subjected to lightweight processing, the segmentation efficiency of the model is improved under the condition of ensuring the segmentation precision, and the parameter quantity of the model is reduced in a channel convolution mode; and a multi-scale cavity convolution kernel is arranged, so that the receptive field of the model is reasonably and simply improved, and the generalization of the model is enhanced. The algorithm can be widely applied to the field of image positioning identification, such as logistics park vehicle identification and the like.
Drawings
FIG. 1 is a diagram illustrating a conventional convolution kernel convolution operation;
FIG. 2 is a schematic diagram of the convolution operation of the improved hole convolution kernel of the present invention;
FIGS. 3a to 3c show the multi-scale hole convolution kernels of the present invention, with FIG. 3a being a hole convolution kernel with a sampling rate of 1, FIG. 3b a sampling rate of 2, and FIG. 3c a sampling rate of 3;
FIG. 4 is a prior art convolution scheme;
FIG. 5 is a channel convolution scheme of the present invention;
FIG. 6 is a channel convolution structure of the present invention;
FIG. 7 is a full convolution network design structure of the present invention;
FIG. 8 is a schematic diagram of a full convolution matrix calculation process according to the present invention.
Note: in FIG. 6, DW is a channel convolution group, representing the fixed collocation formed by channel convolution kernels; BN is the batch normalization operation, which alleviates the shifting distribution of intermediate-layer data during training; Conv is the convolution layer operation; ReLU is a rectified linear unit, used as the activation function.
Note: in FIG. 8, k_1, …, k_N are the convolution kernels, and w_n^{δ_i δ_j} (notation reconstructed from the context of formula (11)) is the position weight of the nth convolution kernel at kernel position (δ_i, δ_j).
Detailed Description
The invention is implemented according to the technical scheme set forth above in the disclosure of the invention, namely by designing the multi-scale hole convolution kernel, the channel convolution network, and the full convolution connection and deconvolution network.
In order to verify the superiority of the invention, a logistics park vehicle is taken as an example, the following network model is constructed, and a comparison experiment is carried out:
firstly, network construction is carried out: four types of logistics vehicles, namely van trucks, traction trucks, dump trucks and tank trucks, are collected from the logistics park and are divided into 8000 training sets, 2000 testing sets and 1000 testing sets. The configuration of each parameter of the constructed network model structure is shown in the following table 1.
In Table 1: k is the convolution kernel size; s is the step size; p is the padding size; DW is a channel convolution group, representing the fixed collocation formed by channel convolution kernels; residual summation is used to facilitate gradient transfer in the large network; the per-layer activations and batch normalization (BN) help accelerate network training; ReLU is a rectified linear unit, used as the activation function.
TABLE 1 design of parameters of network model architecture
[Table 1 appears as an image in the original publication; the layer-by-layer parameter values are not recoverable from the source.]
The computer used in this example is configured with an NVIDIA GTX 1080 Ti graphics card with 11 GB of video memory at 1607 MHz.
Finally, the model test performance of the example network and the conventional network were compared, and the results are shown in table 2.
TABLE 2 comparison of lightweight segmentation model Performance
[Table 2 appears as an image in the original publication; the per-model metric values are not recoverable from the source, apart from the improvements summarized below.]
The evaluation index MPA in Table 2 denotes the mean pixel accuracy; MA denotes the ratio of the foreground area to the label area (mean accuracy); MIOU denotes the mean intersection over union, i.e. the ratio of the correctly predicted region to the union of the predicted area and the label area; the unit M·pic^-1 denotes the video memory, in megabytes (M), occupied by training one picture; the unit ms·iter^-1 denotes the time required per iteration, in milliseconds (ms). After the channel convolution is adopted, the occupied video memory is reduced by 51%, the training speed is improved by 78%, the testing speed is improved by 79%, every evaluation index of segmentation and positioning is greatly improved, and MIOU improves by the largest margin.
This embodiment verifies that the improved method raises the test performance of the model, namely the running speed of image segmentation.
The embodiments described in this specification merely illustrate the inventive concept; the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments, but also covers the equivalents that those skilled in the art may conceive on the basis of the inventive concept.

Claims (1)

1. A method for improving the running speed of image segmentation comprises the following steps:
designing a multi-scale hole convolution kernel;
in order to avoid relying on the traditional convolution and maximum pooling method to increase the receptive field, a hole convolution kernel is adopted: the sampling rate is increased on the basis of the traditional convolution kernel, and the original convolution kernel is expanded by inserting holes between its sampling points;
thus, while the original calculation amount is kept, the receptive field is increased so that the image segmentation information is accurate enough; the calculation formula for the receptive field size based on the hole convolution kernel is
$$F_i = F_{i-1} + (k_i - 1) \cdot rate \cdot \prod_{j=1}^{i-1} s_j \qquad (1)$$
In the formula: f is the size of the current layer receptive field; the rate is the sampling rate of the void convolution kernel, i.e. the number of intervals, and can be regarded as 1 for the rate of the conventional convolution kernel and 2 for the rate of the void convolution; the traditional convolution receptive field calculation formula is
$$F_i = F_{i-1} + (k_i - 1) \cdot \prod_{j=1}^{i-1} s_j, \quad i = 1, 2, \ldots, n \qquad (2)$$
In the formula: fi-1The size of the receptive field of the previous layer; k is a radical ofiIs the convolution or pooling kernel size of the ith layer; n is the total number of layers of convolution; siConvolution step size Stride being the i-th layer of convolution kernel;
the multi-scale cavity convolution is designed by using the thought of multi-scale image change, and the sampling rate and the convolution kernel size are subjected to diversified processing, so that the method can adapt to the characteristic extraction process of targets with different sizes; the calculation of the multi-scale hole convolution is
$$y[i] = \sum_{k \in K} x[i + rate \cdot k] \cdot w[k] \qquad (3)$$
In the formula: y[i] is the convolution summation result at the ith sliding position; x is the input feature; K is the convolution kernel; k is the coordinate position of a parameter within the kernel, with k ∈ K; w[k] is the convolution kernel weight; rate is the sampling rate and takes the values 1, 2 and 3;
designing a channel convolution network;
because the traditional convolution mode is a dimension-increasing operation, a channel convolution mode is first adopted to achieve feature-convolution dimension reduction; the traditional convolution is split into two convolution layers, similar to the group operation in ResNet; on the premise of not affecting accuracy, the new structure shortens the calculation time to about 1/8 and reduces the parameter quantity to about 1/9, so it can be applied well on mobile terminals for real-time target detection, and the model compression effect is obvious;
for the conventional convolution, assume that the number of input feature channels is M, that the width and height of the convolution kernel are both D_k, and that the number of convolution kernels is N; then each position the convolution slides over involves N kernels with M·D_k·D_k parameters each, and the sliding step size is set to s; the calculation formula for the image size after sliding is
$$h' = \frac{h - D_k + 2 \cdot pad}{s} + 1 \qquad (4)$$

$$w' = \frac{w - D_k + 2 \cdot pad}{s} + 1 \qquad (5)$$
In the formula: h and w are the height and width before convolution; h' and w' are the height and width after convolution; pad is the boundary filling of the width and height; each point of the h'·w' feature map after convolution therefore corresponds to N × M·D_k·D_k parameters, so the total parameter quantity is

$$N \cdot M \cdot D_k \cdot D_k \cdot h' \cdot w' \qquad (6)$$
the improved channel convolution mode divides the convolution into two steps:
1) a D_k·D_k convolution is applied to each of the M channels separately; sliding with the same step size s, the dimension after convolution is h' × w', and the parameter quantity generated by this step is

$$D_k \cdot D_k \cdot M \cdot h' \cdot w' \qquad (7)$$
2) N convolution kernels of size 1 × 1 are set to perform the dimension-raising feature extraction; with step size 1, the feature map obtained in step 1) is subjected to feature extraction again, the original M channel features being processed by the N convolution kernels, and the calculated total parameter quantity is

$$M \cdot N \cdot h' \cdot w' \cdot 1 \cdot 1 \qquad (8)$$
integrating the convolution structures of the two steps, the final parameter quantity of the channel convolution is

$$D_k \cdot D_k \cdot M \cdot h' \cdot w' + M \cdot N \cdot h' \cdot w' \qquad (9)$$
as previously mentioned, comparing the parameter quantity of the conventional convolution kernel with that of the improved channel convolution gives

$$\frac{D_k \cdot D_k \cdot M \cdot h' \cdot w' + M \cdot N \cdot h' \cdot w'}{N \cdot M \cdot D_k \cdot D_k \cdot h' \cdot w'} = \frac{1}{N} + \frac{1}{D_k^{2}} \qquad (10)$$
From the analysis of equation (10), the channel convolution operation reduces the parameter amount;
designing a full convolution connection and deconvolution network;
the final layer of the traditional network structure adopts a fixed size, so an input picture must be converted to a fixed size in advance, which is unfavorable for acquiring the vehicle-length coordinates of logistics vehicles; in addition, the traditional fully connected layers discard spatial coordinates, so image spatial information is distorted and the target cannot be located effectively and accurately; to solve this information-loss problem, a full convolution connection mode is adopted to accurately locate the position coordinates of features in the picture;
the full connection of a traditional network converts the convolution feature map [b, c, h, w] of the preceding part into [b, c·h·w], i.e. [b, 4096], and then into [b, cls], where b denotes the batch size and cls the number of classes; the full convolutional network instead follows the convolution layers with corresponding 1 × 1 convolutions and has no fully connected layer, hence the term full convolutional network; the calculation method of the full convolution is
$$y_n[i][j] = f_{k_n,\,s}\left(x\left[s_i \cdot i + \delta_i\right]\left[s_j \cdot j + \delta_j\right]\right) \qquad (11)$$
In the formula: 1 ≤ n ≤ N; y_n[i][j] is the convolution result of the nth convolution kernel at position (i, j); f_{k_n,s}(·) denotes convolving with the nth kernel k_n at step size s; s_i is the convolution step size in the horizontal direction; s_j is the convolution step size in the vertical direction; k_n is the nth convolution kernel; D_k is the convolution kernel width and height, the kernel size corresponding to D_k·D_k in the second step; δ_i and δ_j are positions within the convolution kernel, the full convolution connection network layer having N convolution kernels of different types in total, with 0 ≤ δ_i, δ_j ≤ D_k; the sliding convolution operation of the kernel can thus be converted into two matrix multiplication operations; the result of convolving the corresponding image pixels can be expressed as
$$\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_N \end{bmatrix} \cdot \begin{bmatrix} I_{11} & I_{12} & \cdots \\ I_{21} & I_{22} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} \qquad (12)$$
Wherein: the matrix dimension on the left is [N, M·D_k·D_k]; the matrix dimension on the right is [M·D_k·D_k, w'·h']; the dimension after convolution is [N, w'·h']; in the matrix on the right, I denotes the image (img), and its subscripts are the image width and image height in turn, i.e. I_{wh};
finally, through the deconvolution operation, [N, w'·h'] is converted back to the size of the input image, so the specific semantic information represented by each pixel is accurately identified and the loss of spatial information is avoided; the deconvolution is equivalent to the inverse operation of the convolution, i.e. the convolution is inverted with the transposed kernels:
$$X' = \begin{bmatrix} k_1^{\top} & k_2^{\top} & \cdots & k_N^{\top} \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \qquad (13)$$
In the formula: k_1, …, k_N are the weights corresponding to each convolution kernel, and y_n is the nth row of the convolution result; the weight matrix is changed from the original

$$\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_N \end{bmatrix} \quad \left(\text{dimension } [N,\; M \cdot D_k \cdot D_k]\right)$$

into

$$\begin{bmatrix} k_1^{\top} & k_2^{\top} & \cdots & k_N^{\top} \end{bmatrix} \quad \left(\text{dimension } [M \cdot D_k \cdot D_k,\; N]\right);$$
the weights are adjusted through training and carry the image semantic information characteristics.
Priority and publication data

Application: CN201910535642.4A, filed 2019-06-20 (priority date 2019-06-20), by Zhejiang University of Technology ZJUT
Family ID: 68480779
Publications: CN110458841A, published 2019-11-15; CN110458841B, granted 2021-06-08
Country: China (CN)
Status: Active




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant