CN109461157A - Image semantic segmentation method based on multi-level feature fusion and Gaussian conditional random field - Google Patents

Image semantic segmentation method based on multi-level feature fusion and Gaussian conditional random field

Info

Publication number
CN109461157A
Authority
CN
China
Prior art keywords
image
random field
convolution
layer
conditional random field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811218436.2A
Other languages
Chinese (zh)
Other versions
CN109461157B (en)
Inventor
周鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201811218436.2A priority Critical patent/CN109461157B/en
Publication of CN109461157A publication Critical patent/CN109461157A/en
Application granted granted Critical
Publication of CN109461157B publication Critical patent/CN109461157B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20016 - Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Abstract

The invention discloses an image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field, comprising: 1) constructing an image pyramid; 2) keeping the feature-map resolution unchanged using dilated convolution; 3) a layer-by-layer multi-level feature fusion and refinement architecture; 4) upsampling with bilinear interpolation; 5) defining the loss function; 6) optimizing the output with a Gaussian conditional random field. By building an image pyramid, the invention realizes a fully convolutional architecture in which multi-level features are fused layer by layer. A top-down refinement framework replaces the previously popular parallel pooling module, so that features of different scales are fused progressively as they are obtained. This guarantees that the features of adjacent pyramid levels are fused first and captures contextual information to the greatest extent. The front-end output is further optimized with a Gaussian conditional random field, which captures more spatial detail and makes the object boundaries in the segmentation maps more accurate, so that the output of the overall architecture achieves an optimal semantic segmentation result.

Description

Image semantic segmentation method based on multi-level feature fusion and Gaussian conditional random field
Technical field
The present invention relates to an image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field.
Background technique
Early image segmentation only divided the content of an image roughly into several regions. As research progressed, such coarse segmentation could no longer meet the needs of various applications, and semantic segmentation was proposed. The semantics of an image refer to semantic information such as the category of the objects or entities contained in the image or an image region, and segmentation guided by these semantics is called semantic segmentation. Image semantic segmentation can separate the foreground from the background in a single frame and identify the category of each foreground object, which is equivalent to assigning a semantic label to every pixel. Semantic segmentation is therefore a major upgrade of image segmentation in both accuracy and granularity.
The goal of image semantic segmentation is to classify each pixel of an image into one of a set of predefined class labels, so that regions can not only be delineated but also annotated with their content. In practice, more and more vision applications need to infer relevant knowledge or semantics from images, i.e., to go from the concrete to the abstract. These applications include autonomous driving, human-computer interaction, computational photography, image search engines and augmented reality, and the practical demand behind them is for accurate and efficient segmentation techniques.
The current state-of-the-art image semantic segmentation models combine deep learning with probabilistic graphical models. The model front end is based on a deep convolutional neural network originally designed for image classification, with the final fully connected layers replaced by convolutional layers; such a network is therefore called a fully convolutional network and can perform semantic segmentation on images of arbitrary size. The model back end uses a fully connected conditional random field to optimize the coarse semantic features extracted by the convolutional network and to strengthen the model's ability to capture spatial detail. Broadly speaking, the fully convolutional network is more of a technique whose performance keeps improving with the underlying base network. Deep learning plus probabilistic graphical models is a general trend: deep learning performs feature extraction, while probabilistic graphical models can explain the essential connections between things well from a mathematical point of view.
Specifically, image semantic segmentation methods based on fully convolutional networks use an off-the-shelf convolutional neural network as one of their modules to generate hierarchical features and convert existing well-known classification models into fully convolutional ones: the fully connected layers are replaced by convolutional layers, and the output is a spatial map rather than classification scores. These maps are upsampled by small-stride convolution (deconvolution) to produce dense pixel-level labels. The pioneering work of Long et al. is regarded as a milestone, because it showed how convolutional neural networks can be trained end to end for semantic segmentation and how to efficiently learn dense pixel-level label predictions from inputs of arbitrary size. Vijay et al. proposed SegNet, which introduces more skip connections and reuses max-pooling indices instead of the encoder features of the fully convolutional network, so SegNet is more memory-efficient than the fully convolutional network. Fisher et al. proposed dilated convolution as the convolutional layer for pixel-level prediction, which lets the receptive field grow exponentially without reducing the spatial dimensions. Zhao et al. argued that the global scene provides category-distribution information for image semantic segmentation and proposed a pyramid pooling module that obtains this information with pooling layers of larger kernels.
Methods based on deep convolutional neural networks seem inherently unable to achieve both good classification performance and good localization accuracy. The strong invariance of convolutional networks to spatial transformations of the image allows them to accurately predict the presence and rough location of objects, but the segmentation results are often poorly coordinated: some small regions may be labeled incorrectly and inconsistently with their surrounding pixels, so object boundaries cannot be rendered accurately. Likewise, the cross-entropy loss is not the optimal loss function for semantic segmentation, because the final loss of an image is the sum of the per-pixel losses and cross-entropy cannot guarantee continuity between pixels. The common way to refine the output of a segmentation architecture and strengthen its ability to capture fine-grained information is to introduce a conditional random field as a back-end processing module, i.e., to apply a fully connected conditional random field model as an independent post-processing step that refines the segmentation result. Although fully connected models are usually inefficient, this model can be approximated with probabilistic inference and can therefore be made relatively efficient. The conditional random field exploits the similarity between pixels in the original image.
Modern architectures for image semantic segmentation mainly suffer from two problems:
1. Front-end feature extraction loses contextual information. In current semantic segmentation based on fully convolutional networks, the output image has the same resolution as the input. However, successive pooling and striding operations reduce the resolution of the resulting feature maps, which become smaller and smaller, so the shrunken feature maps must be upsampled back to the original size by deconvolution to restore the spatial resolution. This process inevitably loses information that cannot be recovered, and frequent deconvolution also costs additional memory and time. Image semantic segmentation therefore needs to integrate information at multiple scales and to balance local and global information. On the one hand, fine-grained, i.e., local, information is crucial for the correctness of pixel-level labeling; on the other hand, the global contextual information of the whole image is also important for resolving local ambiguities. Standard convolutional network architectures are not good at handling this trade-off between local and global information: pooling layers give the network a certain degree of spatial invariance and maintain computational efficiency, but they lose global contextual information, while convolutional networks without pooling layers are also limited, since the receptive field of their neurons can only grow linearly with the number of layers.
2. Back-end model optimization is hard to converge. The back end introduces a conditional random field to optimize the output of the segmentation architecture and strengthen its ability to capture detail. However, both the common Gibbs conditional random field and the Markov random field have complicated formulations that are not straightforward to solve, which often leads to slow convergence or even failure to converge during model training.
Summary of the invention
The technical problem to be solved by the present invention is to provide an image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field.
The technical scheme of the present invention is an image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field, comprising the following steps:
1) Construct an image pyramid
A four-level image pyramid is built for a single frame. The levels are numbered from bottom to top, with the i-th level denoted G_i. To generate level i+1 of the image pyramid, a Gaussian kernel k_Gaussian is convolved with G_i and every even-numbered row and column is then removed; the Gaussian kernel k_Gaussian is expressed as formula (1):
This process is then iterated on the input original image G_0 to generate the whole image pyramid;
2) Keep the feature-map resolution unchanged using dilated convolution
Viewing an image as a 2D signal from the filtering point of view, a filter w applied to an input feature map x by dilated convolution produces the output y at position i:
where the dilation rate r corresponds to the stride with which the input signal is sampled; this is equivalent to convolving the input x with an upsampled filter obtained by inserting r-1 zeros between two consecutive filter taps along each spatial dimension, and standard convolution is the special case with dilation rate r=1;
3) Layer-by-layer multi-level feature fusion and refinement architecture
An image pyramid containing images of different resolutions is generated from the original image. Starting from the lowest-resolution image at the top level, several fully convolutional operations produce a feature map containing local information, which is then upsampled to the same resolution as the adjacent level and stacked with the initial feature map of that level to take part in its subsequent fully convolutional operations, i.e., the fusion yields new local features. Refinement proceeds level by level from top to bottom in this way, and at the last level a 1 × 1 convolution and fine-tuning are applied to the feature map to obtain the final segmentation map;
4) Upsample with bilinear interpolation
The low-resolution feature maps containing local features are upsampled by bilinear interpolation so that they can be fused into the higher-resolution feature maps containing relatively global features;
5) Define the loss function
A minimized cross-entropy loss is used to compute the sum of the cross-entropy terms at every spatial position of the convolutional network output map, or in other words the distance between the predicted probability distribution of each pixel and its true probability distribution:
where L is the cross-entropy loss for mislabeled classes and R is a regularization term; the function L is usually decomposed into the sum of the losses of individual pixels:
The network model is then fine-tuned on the segmentation task of the PASCAL VOC 2012 dataset by running stochastic gradient descent on the cross-entropy loss;
6) Optimize the output with a Gaussian conditional random field
The back end is optimized with a Gaussian conditional random field whose energy function E(x) is:
When A + λI is a symmetric positive definite matrix, minimizing E(x) is equivalent to solving the equation:
(A + λI)x = B    (12)
Further, in the present invention, in step 2), dilated convolution allows the size of the receptive field to be adapted by changing the dilation rate r; the size of the receptive field can be expressed as:
F = (k_size + 1) × r_rate − 1    (3)
where k_size is the actual size of the convolution kernel and r_rate is the dilation rate.
Further, in the present invention, in step 4), bilinear interpolation computes the target pixel value by interpolating from four points of the original image. Given the values of the function f at the four points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1) and Q22(x2, y2), the value of the unknown function f at the point P(x, y) can be calculated, specifically comprising:
In the first step, linear interpolation in the x direction gives:
where R1 = (x, y1) and R2 = (x, y2);
In the second step, linear interpolation in the y direction through R1 and R2 computed in the first step yields the value at P:
where P = (x, y).
Compared with the prior art, the present invention has the following advantages:
1) The invention proposes a layer-by-layer multi-level feature fusion and refinement strategy. An image pyramid performs multi-scale pixel sampling on the original image to generate several images of different resolutions. The lowest-resolution image at the top level is fed into the network front end for coarse extraction of semantic features, the result is upsampled to match the slightly higher-resolution image of the next level, and it then enters the front end of the next level together with the image of the new resolution. This layer-by-layer fusion, in which the preceding level learns hierarchical abstractions and the following level captures high-precision information, effectively improves the model's ability to capture contextual information.
2) In the present invention, the back end uses the Gaussian conditional random field, a member of the family of fully connected conditional random fields that clearly enjoys advantages such as a global minimum of its quadratic energy; this effectively alleviates problems such as slow or even failed convergence during model training.
Description of the drawings
The invention will be further described with reference to the accompanying drawings and embodiments:
Fig. 1 is the basic framework diagram of the method of the present invention;
Fig. 2 is a structural schematic diagram of the image pyramid in the present invention;
Fig. 3(a) is a schematic diagram of 3 × 3 dilated convolution with dilation rate 1 in the present invention;
Fig. 3(b) is a schematic diagram of 3 × 3 dilated convolution with dilation rate 2 in the present invention;
Fig. 3(c) is a schematic diagram of 3 × 3 dilated convolution with dilation rate 4 in the present invention;
Fig. 4 is a schematic diagram of the layer-by-layer multi-level feature fusion architecture in the present invention;
Fig. 5 is a schematic diagram of bilinear interpolation in the present invention;
Fig. 6 is a schematic diagram of the fully connected conditional random field in the present invention;
Fig. 7(a) is a schematic diagram of the iterative process of the regularization penalty term in the present invention;
Fig. 7(b) is a schematic diagram of the regression loss of the segmentation model in the present invention;
Fig. 7(c) is a schematic diagram of the sum of the regularization loss and the regression loss in the present invention;
Fig. 8 is a comparison of scores and speeds on PASCAL VOC 2012 in the present invention;
Fig. 9 is a visual comparison on PASCAL VOC 2012 data in the present invention.
Specific embodiment
Embodiment:
A specific embodiment of the image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field of the present invention is described with reference to the drawings. The basic framework of the method is shown in Fig. 1. By building an image pyramid, the method realizes a fully convolutional architecture in which multi-level features are fused layer by layer, and is intended to replace the previously popular parallel pooling module with a top-down refinement framework: features of different scales are fused as they are obtained, which guarantees that the features of adjacent pyramid levels are fused first and captures contextual information to the greatest extent. In addition, the front-end output is further optimized with a Gaussian conditional random field, which captures more spatial detail, so that the object boundaries in the segmentation maps are more accurate and the output of the overall architecture achieves an optimal semantic segmentation result.
The present invention computes the initial feature map of every level of the image pyramid, concatenates it with the upsampled feature map of the level above, and feeds the result into the remaining fully convolutional operations of that level; the final semantic segmentation map is obtained by the last fully convolutional operation at the bottom of the pyramid. Specifically, the method is divided into the following steps: the front end builds the image pyramid, keeps the feature-map resolution unchanged with dilated convolution, fuses and refines multi-level features layer by layer, upsamples with bilinear interpolation, and defines the loss function; the back end optimizes the output with a Gaussian conditional random field.
1. The front-end fully convolutional network extracts coarse semantic features
1) Construct an image pyramid
The SIFT algorithm obtains feature information at different scales by building an image pyramid, and this method likewise builds a four-level image pyramid for a single frame. Imagine the pyramid as a stack of layers: the higher the layer, the smaller its size.
With reference to Fig. 2, the levels are numbered from bottom to top and the i-th level is denoted G_i, so the resolution of level i+1 (denoted G_{i+1}) is lower than that of level i (denoted G_i). To generate level i+1 of the image pyramid, we convolve G_i with a Gaussian kernel k_Gaussian and then remove every even-numbered row and column; the Gaussian kernel k_Gaussian is expressed as formula (1):
It is easy to see that the generated image is one quarter the size of its predecessor. Iterating this process on the input original image G_0 finally generates the whole image pyramid.
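For illustration, the pyramid construction described above can be sketched in a few lines of Python; this is a minimal example that assumes the standard 5 × 5 Gaussian smoothing kernel used by OpenCV's pyrDown (formula (1) itself is not reproduced on this page) and the four levels used by the method.

```python
import cv2
import numpy as np

def build_image_pyramid(image: np.ndarray, levels: int = 4) -> list:
    """Build a Gaussian image pyramid G_0 ... G_{levels-1}.

    Each level is produced by smoothing the previous one with a Gaussian
    kernel and dropping every even row and column, so every level is a
    quarter of the size (half the width and height) of its predecessor.
    """
    pyramid = [image]                        # G_0 is the original image
    for _ in range(levels - 1):
        # cv2.pyrDown blurs with a 5x5 Gaussian kernel and downsamples by 2
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid

if __name__ == "__main__":
    img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
    for i, level in enumerate(build_image_pyramid(img)):
        print(f"G_{i}: {level.shape}")       # 480x640, 240x320, 120x160, 60x80
```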
2) Keep the feature-map resolution unchanged using dilated convolution
Dilated convolution, a method originally used for wavelet transform analysis in signal processing, extracts dense features by removing the downsampling operations of the last few layers of the fully convolutional network and the corresponding upsampling operations. In this way the resolution of the feature maps in a convolutional neural network can be controlled effectively without learning additional parameters. This avoids the problem in standard convolutional networks that the feature maps become smaller and smaller, so that the shrunken feature maps must be upsampled back to the original size by deconvolution to restore the spatial resolution, which inevitably loses information, while frequent deconvolution also requires extra memory and time.
Viewing an image as a 2D signal from the filtering point of view, a filter w applied to an input feature map x by dilated convolution produces the output y at position i:
where the dilation rate r corresponds to the stride with which the input signal is sampled; this is equivalent to convolving the input x with an upsampled filter obtained by inserting r-1 zeros between two consecutive filter taps along each spatial dimension, hence the name dilated (atrous) convolution. Standard convolution is the special case with dilation rate r=1.
Dilated convolution allows us to adapt the size of the receptive field by changing the dilation rate r, as shown in Fig. 3(a), Fig. 3(b) and Fig. 3(c). Fig. 3(a) shows a standard 3 × 3 kernel, whose receptive field is the same as that of ordinary convolution. Fig. 3(b) shows 3 × 3 dilated convolution with rate = 2: the actual kernel size is still 3 × 3, but the receptive field grows to 7 × 7. Fig. 3(c) shows 3 × 3 dilated convolution with r = 4: the actual kernel size is again 3 × 3, while the receptive field grows to 15 × 15. The size of the receptive field can be expressed as:
F = (k_size + 1) × r_rate − 1    (3)
where k_size is the actual size of the convolution kernel and r_rate is the dilation rate. It can be seen that applying k_size × k_size kernels with different dilation rates to the same input yields features with different receptive fields.
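To make the dilated-convolution output and the receptive-field formula (3) concrete, the sketch below implements a 1-D dilated convolution in NumPy; the 1-D form and the example kernel are illustrative assumptions, while the method itself applies the 2-D analogue inside the fully convolutional network.

```python
import numpy as np

def dilated_conv1d(x: np.ndarray, w: np.ndarray, rate: int) -> np.ndarray:
    """y[i] = sum_k x[i + rate*k] * w[k], 1-D dilated convolution with 'valid' padding."""
    k = len(w)
    span = rate * (k - 1) + 1                 # footprint of the dilated kernel on the input
    out_len = len(x) - span + 1
    y = np.empty(out_len)
    for i in range(out_len):
        y[i] = sum(x[i + rate * j] * w[j] for j in range(k))
    return y

def receptive_field(k_size: int, r_rate: int) -> int:
    """Receptive-field size as in formula (3): F = (k_size + 1) * r_rate - 1."""
    return (k_size + 1) * r_rate - 1

if __name__ == "__main__":
    x = np.arange(20, dtype=float)
    w = np.array([1.0, 2.0, 1.0])
    print(dilated_conv1d(x, w, rate=2))       # samples x with stride 2 under the kernel
    for r in (1, 2, 4):
        print(f"3x3 kernel, rate {r}: receptive field {receptive_field(3, r)}")  # 3, 7, 15
```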
3) Layer-by-layer multi-level feature fusion and refinement architecture
The currently most prominent parallel pooling idea fuses local and global information at all levels well, but for the final aggregation it simply upsamples everything uniformly to the same resolution to form the final feature map. Our method argues that fusing feature maps of neighbouring sizes in advance, as relatively local and relatively global information, can better recover the information lost due to the reduction of resolution, and therefore proposes a new feature fusion strategy.
With reference to Fig. 4, an image pyramid containing images of different resolutions is generated from the original image. Starting from the lowest-resolution image at the top level, several fully convolutional operations produce a feature map containing local information, which is then upsampled to the same resolution as the adjacent level and stacked with the initial feature map of that level to take part in its subsequent fully convolutional operations, i.e., the fusion yields new local features, and refinement proceeds level by level from top to bottom. This stepwise procedure obtains semantic information that is as strong as possible while preserving good detail, and incorporates the context of different regions more efficiently to obtain global contextual information. At the last level, a 1 × 1 convolution and fine-tuning are applied to the feature map to obtain the final segmentation map, as sketched in the code below.
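The following TensorFlow 1.x sketch of the top-down fusion loop is given for orientation only; the number of convolutions per level, the channel widths and the dilation rates are placeholder assumptions, not the patent's exact configuration.

```python
import tensorflow as tf  # TensorFlow 1.x style, matching the reported environment

def conv_block(x, filters, rate=1, name=None):
    """A fully convolutional block; dilated 3x3 convolutions keep the spatial size."""
    return tf.layers.conv2d(x, filters, 3, padding="same",
                            dilation_rate=rate, activation=tf.nn.relu, name=name)

def fuse_pyramid(pyramid_levels, num_classes=21):
    """Top-down fusion: pyramid_levels[0] is the lowest-resolution (top) image,
    pyramid_levels[-1] the original resolution; returns per-pixel class logits."""
    fused = None
    for i, level in enumerate(pyramid_levels):
        feat = conv_block(level, 64, rate=2, name=f"init_conv_{i}")   # initial features of this level
        if fused is not None:
            # upsample the coarser result to this level's resolution and concatenate
            size = tf.shape(feat)[1:3]
            upsampled = tf.image.resize_bilinear(fused, size)
            feat = tf.concat([feat, upsampled], axis=-1)
        # remaining fully convolutional operations of this level
        fused = conv_block(feat, 64, rate=2, name=f"refine_conv_{i}")
    # final 1x1 convolution produces the segmentation logits
    return tf.layers.conv2d(fused, num_classes, 1, name="logits")
```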
4) Upsample with bilinear interpolation
Our method restores a feature map to the same size as the adjacent level using bilinear interpolation and fuses features on this basis, which keeps the network architecture simple and clear and reduces the computational complexity of training.
Bilinear interpolation is a fairly common upsampling method in current semantic segmentation. Its advantages are that it requires no learning, runs fast and is easy to implement; only fixed parameter values need to be set.
Specifically, with reference to Fig. 5, suppose the value of an unknown function f at the point P(x, y) is to be computed, and the known conditions are the values of f at the four points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1) and Q22(x2, y2).
In the first step, linear interpolation in the x direction gives:
where R1 = (x, y1) and R2 = (x, y2);
In the second step, linear interpolation in the y direction through R1 and R2 computed in the first step yields the value at P:
where P = (x, y).
In short, bilinear interpolation computes the target pixel value by interpolating from four points of the original image. Our network model upsamples the low-resolution feature maps containing local features by bilinear interpolation so that they can be fused into the higher-resolution feature maps containing relatively global features. Using bilinear interpolation instead of complicated deconvolution operations for upsampling reduces the complexity of the model.
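The two interpolation steps can be written out directly as below; this is a self-contained sketch of standard bilinear interpolation between the four known points (the patent's numbered formulas are not reproduced here), not the framework routine actually used to upsample feature maps.

```python
def bilinear_interpolate(x, y, x1, y1, x2, y2, f11, f12, f21, f22):
    """Standard bilinear interpolation of f at P(x, y).

    f11 = f(Q11) at (x1, y1), f12 = f(Q12) at (x1, y2),
    f21 = f(Q21) at (x2, y1), f22 = f(Q22) at (x2, y2).
    """
    # Step 1: linear interpolation along x at y1 and y2
    f_r1 = (x2 - x) / (x2 - x1) * f11 + (x - x1) / (x2 - x1) * f21   # R1 = (x, y1)
    f_r2 = (x2 - x) / (x2 - x1) * f12 + (x - x1) / (x2 - x1) * f22   # R2 = (x, y2)
    # Step 2: linear interpolation along y between R1 and R2
    return (y2 - y) / (y2 - y1) * f_r1 + (y - y1) / (y2 - y1) * f_r2

if __name__ == "__main__":
    # interpolate at the centre of a unit square with corner values 0, 1, 2, 3
    print(bilinear_interpolate(0.5, 0.5, 0, 0, 1, 1, 0.0, 1.0, 2.0, 3.0))  # 1.5
```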
5) Define the loss function
Like most semantic segmentation models, our model uses a minimized cross-entropy loss, which computes the sum of the cross-entropy terms at every spatial position of the convolutional network output map, or in other words the distance between the predicted probability distribution of each pixel and its true probability distribution:
where L is the cross-entropy loss for mislabeled classes and R is a regularization term; the function L is usually decomposed into the sum of the losses of individual pixels:
The loss function defined above gives every pixel of the image equal weight by default, so the learning algorithm is biased towards regions that occupy a large share of the image and ignores regions with a small share. We fine-tune the network model on the segmentation task of the PASCAL VOC 2012 dataset by running stochastic gradient descent on the cross-entropy loss.
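A minimal TensorFlow 1.x sketch of this training objective is given below, assuming per-pixel softmax cross-entropy plus an L2 regularization term for R and momentum SGD as in the experiments; the weight-decay value and variable handling are illustrative assumptions.

```python
import tensorflow as tf  # TensorFlow 1.x style

def segmentation_loss(logits, labels, weight_decay=1e-4):
    """Per-pixel cross-entropy averaged over spatial positions plus L2 regularization.

    logits: [batch, H, W, num_classes] network output map
    labels: [batch, H, W] integer class labels
    """
    pixel_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)                  # one loss term per pixel
    cross_entropy = tf.reduce_mean(pixel_losses)       # L: sum/mean over pixels
    reg_terms = [tf.nn.l2_loss(v) for v in tf.trainable_variables()
                 if "bias" not in v.name]
    reg = weight_decay * tf.add_n(reg_terms) if reg_terms else 0.0   # R: regularizer
    return cross_entropy + reg

# training with momentum SGD, matching the reported batch size 14 and momentum 0.9:
# train_op = tf.train.MomentumOptimizer(learning_rate, momentum=0.9).minimize(total_loss)
```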
2. The back-end conditional random field optimizes the output
With reference to Fig. 6, a fully connected conditional random field model, with pairwise terms between every pair of pixels, is used as an independent post-processing step that refines the segmentation result. The model treats each pixel as a node of the field, and no matter how far apart two pixels are, their pairwise relationship is taken into account; this model is therefore also called a dense or fully connected graph. With this model, both short-range and long-range pixel dependencies are evaluated, so the system can take into account the detailed information needed in the segmentation process.
Each pixel i has a class label x_i and a corresponding observation y_i. Each pixel thus acts as a node, the relationships between pairs of pixels act as edges, and together they constitute a conditional random field. We then infer the class label x_i of pixel i from the observed variable y_i.
The energy function E(x) of the fully connected conditional random field is:
where the unary potential Σ_i Ψ_u(x_i) comes from the output of the front-end fully convolutional network, and the pairwise potential is as follows:
The pairwise potential describes the relationship between pixels: it encourages similar pixels to take the same label and pixels that differ greatly to take different labels, where this notion of distance depends on the colour values and the actual relative distance. Such a conditional random field therefore makes the segmentation follow object boundaries as much as possible. What distinguishes the fully connected conditional random field is that the pairwise potential relates each pixel to all other pixels, hence the name fully connected.
6) Optimize the output with a Gaussian conditional random field
The discrete fully connected conditional random field has a complicated formulation and is not straightforward to solve, which leads to slow or even no convergence during model training. We therefore use a Gaussian conditional random field for the back-end optimization; its energy function differs from the one above:
When A + λI is a symmetric positive definite matrix, minimizing E(x) is equivalent to solving the equation:
(A + λI)x = B    (12)
It can be seen that the quadratic energy function of the Gaussian conditional random field has a well-defined global minimum, and solving a linear system is much simpler than optimizing a complicated energy function.
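As a sketch of this back-end step, the example below minimizes a quadratic energy of the form x^T(A + λI)x/2 − B^T x by solving the linear system (A + λI)x = B; how A and B are built from the unary and pairwise terms is not shown, and the matrices used here are synthetic.

```python
import numpy as np

def gaussian_crf_inference(A, B, lam=1.0):
    """Minimize the quadratic Gaussian-CRF energy by solving (A + lam*I) x = B.

    A: [n, n] symmetric matrix collecting the pairwise terms,
    B: [n] vector collecting the unary terms (front-end network output),
    lam: shift that makes A + lam*I symmetric positive definite.
    """
    n = A.shape[0]
    return np.linalg.solve(A + lam * np.eye(n), B)

if __name__ == "__main__":
    np.random.seed(0)
    n = 50
    S = np.random.randn(n, n)
    A = S @ S.T                     # symmetric positive semi-definite pairwise matrix
    B = np.random.randn(n)          # synthetic unary scores
    x = gaussian_crf_inference(A, B, lam=1.0)
    print(np.allclose((A + np.eye(n)) @ x, B))   # True: x solves the linear system
```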
The dataset used in our experiments is PASCAL VOC 2012, currently the most common dataset in the field of image semantic segmentation, on which the advantage of the image semantic segmentation effect of the invention can be verified. The dataset contains 20 foreground object classes and 1 background class, with 1464 images for training, 1449 images for validation and 1456 images for testing. The additional annotations provided by the dataset of Hariharan et al. increase the number of training images to 10582. We measure the precision of the method by the mean pixel intersection over union of these 21 classes:
where k is the number of foreground object classes and p_ij is the number of pixels that belong to class i but are classified into class j.
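For reference, the mean intersection-over-union over the k + 1 = 21 classes can be computed from a confusion matrix as in the standard formulation sketched below (consistent with the definition of p_ij above, but not code from the patent).

```python
import numpy as np

def mean_iou(pred, gt, num_classes=21):
    """Mean intersection-over-union over num_classes = k + 1 classes.

    pred, gt: integer label arrays of the same shape.
    confusion[i, j] = p_ij = number of pixels of true class i predicted as class j.
    """
    mask = (gt >= 0) & (gt < num_classes)
    confusion = np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)                       # p_ii
    union = confusion.sum(1) + confusion.sum(0) - intersection
    iou = intersection / np.maximum(union, 1)
    return iou.mean()

if __name__ == "__main__":
    gt = np.random.randint(0, 21, (4, 64, 64))
    pred = gt.copy()
    pred[:, ::7, ::7] = 0                                    # perturb some pixels
    print(f"mIoU: {mean_iou(pred, gt):.4f}")
```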
Experimental hardware environment: Ubuntu 16.04, Core i7 processor at 3.6 GHz, 48 GB of RAM, and an NVIDIA GTX 1080 GPU. The code runs on Python 3.6.5 + TensorFlow 1.8.0.
Our experiments are based on TensorFlow. Stochastic gradient descent is run with a mini-batch size of 14 and momentum 0.9 for 600000 iterations, and the learning rate follows the iteration schedule:
where power = 0.9.
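The schedule formula itself is not reproduced on this page; a common choice consistent with the power = 0.9 setting is the "poly" policy, sketched here purely as an assumption.

```python
def poly_learning_rate(base_lr, iteration, max_iterations=600000, power=0.9):
    """'Poly' learning-rate decay: lr = base_lr * (1 - iteration / max_iterations) ** power."""
    return base_lr * (1.0 - iteration / max_iterations) ** power

# example: decay from 0.001 over the 600000 training iterations
for it in (0, 300000, 599999):
    print(it, poly_learning_rate(1e-3, it))
```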
As shown in Fig. 7(a), Fig. 7(b) and Fig. 7(c): Fig. 7(a) is the iterative process of the regularization penalty term added to prevent overfitting; Fig. 7(b) is the regression loss of the segmentation model, which gradually converges to a better region after 600000 iterations; Fig. 7(c) is the sum of the regularization loss and the regression loss, i.e., the optimization process of the whole objective function. It can be seen from the figures that the optimization of the objective function does not proceed smoothly: a clear global convergence trend only appears gradually after a fairly large number of iterations.
The algorithm of the invention is compared with FCN (2015, Fully Convolutional Networks for Semantic Segmentation), DeepLab (2017, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs) and PSPNet (2017, Pyramid Scene Parsing Networks); the results are presented in Fig. 9. It can be seen from the figure that our method does segment the target objects well, with comparatively more accurate pixel localization, and has a clear advantage over FCN and DeepLab. Because our method is an improvement on the parallel pooling multi-scale fusion idea of PSPNet, its segmentation results are closer to those of PSPNet, yet it is still found to be better than PSPNet in several subtle places and has a certain advantage overall.
As shown in Fig. 8, the FCN network structure is the simplest and has no deconvolution layers, so its processing speed is the fastest; DeepLab adds conditional random field back-end optimization on top of FCN, so its processing is much slower. Our method has a slight advantage in precision over PSPNet, and because the Gaussian conditional random field performs a second back-end processing pass, its processing speed is slower than that of PSPNet; however, the quadratic energy function of the Gaussian conditional random field has a concise form and is convenient to compute, so it is faster than DeepLab.
Table 2: Scores on the PASCAL VOC 2012 dataset
We finally run the evaluation on the PASCAL VOC 2012 test set. As shown in the table, in order to analyse the segmentation of different objects, we list the segmentation precision of every object class in the PASCAL VOC 2012 dataset. It can be seen that the best method is not optimal for every object, and some objects that are partially occluded or disconnected are particularly difficult to segment. In terms of the overall mIoU, however, our method performs as predicted, and the layer-by-layer multi-level feature fusion scheme has an advantage over many other methods to a certain extent.
The above embodiments only illustrate the technical concept and features of the present invention; their purpose is to allow those familiar with the art to understand the content of the invention and implement it accordingly, and they are not intended to limit the scope of protection of the invention. All modifications made according to the spirit and essence of the main technical scheme of the invention shall fall within the scope of protection of the present invention.

Claims (3)

1. An image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field, characterized by comprising the following steps:
1) Construct an image pyramid
A four-level image pyramid is built for a single frame. The levels are numbered from bottom to top, with the i-th level denoted G_i. To generate level i+1 of the image pyramid, a Gaussian kernel k_Gaussian is convolved with G_i and every even-numbered row and column is then removed; the Gaussian kernel k_Gaussian is expressed as formula (1):
This process is then iterated on the input original image G_0 to generate the whole image pyramid;
2) Keep the feature-map resolution unchanged using dilated convolution
Viewing an image as a 2D signal from the filtering point of view, a filter w applied to an input feature map x by dilated convolution produces the output y at position i:
where the dilation rate r corresponds to the stride with which the input signal is sampled; this is equivalent to convolving the input x with an upsampled filter obtained by inserting r-1 zeros between two consecutive filter taps along each spatial dimension, and standard convolution is the special case with dilation rate r=1;
3) Layer-by-layer multi-level feature fusion and refinement architecture
An image pyramid containing images of different resolutions is generated from the original image. Starting from the lowest-resolution image at the top level, several fully convolutional operations produce a feature map containing local information, which is then upsampled to the same resolution as the adjacent level and stacked with the initial feature map of that level to take part in its subsequent fully convolutional operations, i.e., the fusion yields new local features. Refinement proceeds level by level from top to bottom in this way, and at the last level a 1 × 1 convolution and fine-tuning are applied to the feature map to obtain the final segmentation map;
4) Upsample with bilinear interpolation
The low-resolution feature maps containing local features are upsampled by bilinear interpolation so that they can be fused into the higher-resolution feature maps containing relatively global features;
5) Define the loss function
A minimized cross-entropy loss is used to compute the sum of the cross-entropy terms at every spatial position of the convolutional network output map, or in other words the distance between the predicted probability distribution of each pixel and its true probability distribution:
where L is the cross-entropy loss for mislabeled classes and R is a regularization term; the function L is usually decomposed into the sum of the losses of individual pixels:
The network model is then fine-tuned on the segmentation task of the PASCAL VOC 2012 dataset by running stochastic gradient descent on the cross-entropy loss;
6) Optimize the output with a Gaussian conditional random field
The back end is optimized with a Gaussian conditional random field whose energy function E(x) is:
When A + λI is a symmetric positive definite matrix, minimizing E(x) is equivalent to solving the equation:
(A + λI)x = B    (12).
2. The image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field according to claim 1, characterized in that: in step 2), dilated convolution allows the size of the receptive field to be adapted by changing the dilation rate r, and the size of the receptive field can be expressed as:
F = (k_size + 1) × r_rate − 1    (3)
where k_size is the actual size of the convolution kernel and r_rate is the dilation rate.
3. The image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field according to claim 1, characterized in that: in step 4), bilinear interpolation computes the target pixel value by interpolating from four points of the original image. Given the values of the function f at the four points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1) and Q22(x2, y2), the value of the unknown function f at the point P(x, y) can be calculated, specifically comprising:
In the first step, linear interpolation in the x direction gives:
where R1 = (x, y1) and R2 = (x, y2);
In the second step, linear interpolation in the y direction through R1 and R2 computed in the first step yields the value at P:
where P = (x, y).
CN201811218436.2A 2018-10-19 2018-10-19 Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field Active CN109461157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811218436.2A CN109461157B (en) 2018-10-19 2018-10-19 Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811218436.2A CN109461157B (en) 2018-10-19 2018-10-19 Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field

Publications (2)

Publication Number Publication Date
CN109461157A true CN109461157A (en) 2019-03-12
CN109461157B CN109461157B (en) 2021-07-09

Family

ID=65607897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811218436.2A Active CN109461157B (en) 2018-10-19 2018-10-19 Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field

Country Status (1)

Country Link
CN (1) CN109461157B (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104637045A (en) * 2013-11-14 2015-05-20 重庆理工大学 Image pixel labeling method based on super pixel level features
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VEMULAPALLI R et al.: "Gaussian Conditional Random Field Network for Semantic Segmentation", 2016 IEEE Conference on Computer Vision and Pattern Recognition *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113597613A (en) * 2019-03-22 2021-11-02 辉达公司 Shape fusion for image analysis
CN110070022A (en) * 2019-04-16 2019-07-30 西北工业大学 A kind of natural scene material identification method based on image
CN110047047A (en) * 2019-04-17 2019-07-23 广东工业大学 Method, apparatus, equipment and the storage medium of three-dimensional appearance image information interpretation
CN110210485A (en) * 2019-05-13 2019-09-06 常熟理工学院 The image, semantic dividing method of Fusion Features is instructed based on attention mechanism
CN110246084B (en) * 2019-05-16 2023-03-31 五邑大学 Super-resolution image reconstruction method, system and device thereof, and storage medium
CN110246084A (en) * 2019-05-16 2019-09-17 五邑大学 A kind of super-resolution image reconstruction method and its system, device, storage medium
US11244196B2 (en) 2019-06-05 2022-02-08 Boe Technology Group Co., Ltd. Method of semantically segmenting input image, apparatus for semantically segmenting input image, method of pre-training apparatus for semantically segmenting input image, training apparatus for pre-training apparatus for semantically segmenting input image, and computer-program product
CN110188765A (en) * 2019-06-05 2019-08-30 京东方科技集团股份有限公司 Image, semantic parted pattern generation method, device, equipment and storage medium
CN110188765B (en) * 2019-06-05 2021-04-06 京东方科技集团股份有限公司 Image semantic segmentation model generation method, device, equipment and storage medium
CN110263732A (en) * 2019-06-24 2019-09-20 京东方科技集团股份有限公司 Multiscale target detection method and device
CN110348447A (en) * 2019-06-27 2019-10-18 电子科技大学 A kind of multiple-model integration object detection method with rich space information
CN110348447B (en) * 2019-06-27 2022-04-19 电子科技大学 Multi-model integrated target detection method with abundant spatial information
CN110490842A (en) * 2019-07-22 2019-11-22 同济大学 A kind of steel strip surface defect detection method based on deep learning
CN110705344B (en) * 2019-08-21 2023-03-28 中山大学 Crowd counting model based on deep learning and implementation method thereof
CN110705344A (en) * 2019-08-21 2020-01-17 中山大学 Crowd counting model based on deep learning and implementation method thereof
CN110633715A (en) * 2019-09-27 2019-12-31 深圳市商汤科技有限公司 Image processing method, network training method and device and electronic equipment
CN110738647A (en) * 2019-10-12 2020-01-31 成都考拉悠然科技有限公司 Mouse detection method integrating multi-receptive-field feature mapping and Gaussian probability model
CN110837811A (en) * 2019-11-12 2020-02-25 腾讯科技(深圳)有限公司 Method, device and equipment for generating semantic segmentation network structure and storage medium
CN110969166A (en) * 2019-12-04 2020-04-07 国网智能科技股份有限公司 Small target identification method and system in inspection scene
WO2021115061A1 (en) * 2019-12-11 2021-06-17 中国科学院深圳先进技术研究院 Image segmentation method and apparatus, and server
CN111274995A (en) * 2020-02-13 2020-06-12 腾讯科技(深圳)有限公司 Video classification method, device, equipment and computer readable storage medium
CN111274995B (en) * 2020-02-13 2023-07-14 腾讯科技(深圳)有限公司 Video classification method, apparatus, device and computer readable storage medium
CN111738902A (en) * 2020-03-12 2020-10-02 超威半导体(上海)有限公司 Large convolution kernel real-time approximate fitting method based on bilinear filtering image hierarchy
CN111539458B (en) * 2020-04-02 2024-02-27 咪咕文化科技有限公司 Feature map processing method and device, electronic equipment and storage medium
CN111539458A (en) * 2020-04-02 2020-08-14 咪咕文化科技有限公司 Feature map processing method and device, electronic equipment and storage medium
CN111523546B (en) * 2020-04-16 2023-06-16 湖南大学 Image semantic segmentation method, system and computer storage medium
CN111523546A (en) * 2020-04-16 2020-08-11 湖南大学 Image semantic segmentation method, system and computer storage medium
CN112381020A (en) * 2020-11-20 2021-02-19 深圳市银星智能科技股份有限公司 Video scene identification method and system and electronic equipment
CN113159038A (en) * 2020-12-30 2021-07-23 太原理工大学 Coal rock segmentation method based on multi-mode fusion
CN112837320B (en) * 2021-01-29 2023-10-27 华中科技大学 Remote sensing image semantic segmentation method based on parallel hole convolution
CN112837320A (en) * 2021-01-29 2021-05-25 武汉善睐科技有限公司 Remote sensing image semantic segmentation method based on parallel hole convolution
CN113033570A (en) * 2021-03-29 2021-06-25 同济大学 Image semantic segmentation method for improving fusion of void volume and multilevel characteristic information
CN113033570B (en) * 2021-03-29 2022-11-11 同济大学 Image semantic segmentation method for improving void convolution and multilevel characteristic information fusion
CN112948952A (en) * 2021-04-08 2021-06-11 郑州航空工业管理学院 Evolution prediction method for shield tunnel lining back cavity
CN113034371A (en) * 2021-05-27 2021-06-25 四川轻化工大学 Infrared and visible light image fusion method based on feature embedding
CN113744169A (en) * 2021-09-07 2021-12-03 讯飞智元信息科技有限公司 Image enhancement method and device, electronic equipment and storage medium
CN114549913B (en) * 2022-04-25 2022-07-19 深圳思谋信息科技有限公司 Semantic segmentation method and device, computer equipment and storage medium
CN114549913A (en) * 2022-04-25 2022-05-27 深圳思谋信息科技有限公司 Semantic segmentation method and device, computer equipment and storage medium
CN116777768A (en) * 2023-05-25 2023-09-19 珠海移科智能科技有限公司 Robust and efficient scanned document image enhancement method and device

Also Published As

Publication number Publication date
CN109461157B (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN109461157A (en) Image semantic segmentation method based on multi-level feature fusion and Gaussian conditional random field
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
CN111259905B (en) Feature fusion remote sensing image semantic segmentation method based on downsampling
Sun et al. Lattice long short-term memory for human action recognition
Mohanty et al. Deep learning for understanding satellite imagery: An experimental survey
CN104376326B (en) A kind of feature extracting method for image scene identification
CN110428428A (en) A kind of image, semantic dividing method, electronic equipment and readable storage medium storing program for executing
CN106407986B (en) A kind of identification method of image target of synthetic aperture radar based on depth model
CN108549893A (en) A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109409371A (en) The system and method for semantic segmentation for image
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN110176027A (en) Video target tracking method, device, equipment and storage medium
CN110619369A (en) Fine-grained image classification method based on feature pyramid and global average pooling
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN106909924A (en) A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN107909015A (en) Hyperspectral image classification method based on convolutional neural networks and empty spectrum information fusion
CN110363201A (en) Weakly supervised semantic segmentation method and system based on Cooperative Study
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN114937151A (en) Lightweight target detection method based on multi-receptive-field and attention feature pyramid
Anderson et al. Fuzzy choquet integration of deep convolutional neural networks for remote sensing
CN109165743A (en) A kind of semi-supervised network representation learning algorithm based on depth-compression self-encoding encoder
CN110334718A (en) A kind of two-dimensional video conspicuousness detection method based on shot and long term memory
CN107506792A (en) A kind of semi-supervised notable method for checking object
Liu et al. Coastline extraction method based on convolutional neural networks—A case study of Jiaozhou Bay in Qingdao, China
Mohanty et al. Deep learning for understanding satellite imagery: An experimental survey. Front

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant