CN109829929A - A hierarchical scene semantic segmentation model based on deep edge detection - Google Patents
A hierarchical scene semantic segmentation model based on deep edge detection - Download PDF
- Publication number
- CN109829929A CN109829929A CN201811649016.XA CN201811649016A CN109829929A CN 109829929 A CN109829929 A CN 109829929A CN 201811649016 A CN201811649016 A CN 201811649016A CN 109829929 A CN109829929 A CN 109829929A
- Authority
- CN
- China
- Prior art keywords
- network
- edge
- filtering
- domain
- semantic feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention relates to a hierarchical scene semantic segmentation model based on deep edge detection, comprising the following steps: (1) extract semantic features using PSPNet, with the parameters of the PSPNet network set using the Adam algorithm; (2) feed the semantic features obtained by the PSPNet network into a domain-transform model as input and train the whole network, continually adjusting the network parameters by controlling the magnitude of the loss; (3) filter the regions close to edges using the Fully-CRF algorithm to obtain the final result. By applying a filtering step targeted at edge regions, the model achieves higher-precision edge detection and further improves the accuracy of scene semantic segmentation.
Description
Technical field
The present invention relates to a hierarchical scene semantic segmentation model based on deep edge detection, and belongs to the field of scene segmentation technology.
Background technique
With the rapid development of computer vision, the various algorithms built on it are constantly being improved and refined, and scene segmentation is an indispensable and widely used part of the field. Scene segmentation algorithms are frequently applied wherever the current environment must be modeled, for example in autonomous driving, where scene segmentation acts as the vehicle's eyes: it is responsible for reconstructing the environment around the vehicle as faithfully as possible so that the vehicle's subsequent decisions can be made. As segmentation accuracy continues to improve, scene segmentation will play an increasingly important role in more and more fields.
In the prior art, a classical approach to semantic segmentation is to take an image patch centered on each pixel and use the features of that patch as a training sample for a classifier. At test time, a patch is likewise taken around every pixel of the test image and classified, and the classification result serves as the predicted label for that pixel; classifying every pixel in this way achieves scene segmentation. However, this approach introduces considerable noise into the segmentation and is prone to errors in edge regions.
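The patch-based prior-art procedure described above can be sketched as follows. This is a minimal illustration rather than any particular prior system, and `classifier` stands in for whatever trained patch classifier is used:

```python
def extract_patch(image, row, col, size):
    """Extract a size x size patch centered on (row, col), clamping indices
    at the image border. `image` is an H x W list of lists of pixel values."""
    half = size // 2
    h, w = len(image), len(image[0])
    patch = []
    for r in range(row - half, row + half + 1):
        rr = min(max(r, 0), h - 1)  # clamp the row index at the border
        patch.append([image[rr][min(max(c, 0), w - 1)]
                      for c in range(col - half, col + half + 1)])
    return patch

def classify_pixelwise(image, size, classifier):
    """Prior-art style segmentation: classify the patch around every pixel
    and use the result as that pixel's predicted label."""
    return [[classifier(extract_patch(image, r, c, size))
             for c in range(len(image[0]))]
            for r in range(len(image))]
```

Because each label is decided from a small local window, predictions near object boundaries mix pixels from both sides of the edge, which is exactly the noise and edge error this invention targets.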
Summary of the invention
The purpose of the present invention is to provide a hierarchical scene semantic segmentation model based on deep edge detection that, by applying a filtering step targeted at edge regions, achieves higher-precision edge detection and further improves the accuracy of scene semantic segmentation.
To achieve the above object, the technical scheme of the present invention is realized as follows. A hierarchical scene semantic segmentation model based on deep edge detection comprises the following steps:
(1) Extract semantic features using PSPNet, with the parameters of the PSPNet network set using the Adam algorithm.
(2) Feed the semantic features obtained by the PSPNet network into the domain-transform model as input and train the whole network. By controlling the magnitude of the loss, the network parameters are continually adjusted so that the domain-transform density, i.e. the edge network's estimate of edge strength, reaches a certain accuracy. The parameters of the whole network are tuned using the SGD algorithm. Positions far from edges are thereby filtered.
To improve the detection of semantic edges, an additional convolutional layer is added to the edge network, and its number of output channels is set to 10. All convolution kernels in the edge network are of size 1. A 1x1 convolution kernel convolves each pixel of the feature map individually and thus functions like a fully connected layer: it can capture global information to some extent and performs an encoding role. Compared with a fully connected layer, however, the convolutional layer has far fewer parameters, which simplifies the network model and effectively suppresses overfitting. A convolutional layer with kernel size 1 is therefore better suited than a fully connected layer to the feature-fusion operation.
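As an illustration of why a 1x1 convolution behaves like a per-pixel fully connected layer with far fewer parameters, here is a minimal plain-Python sketch (not the patent's network code):

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution: the same linear map over channels is applied
    independently at every pixel, like a per-pixel fully connected layer.
    feature_map: H x W x C_in nested lists; weights: C_out x C_in; bias: C_out.
    Parameter count is C_out * C_in + C_out, independent of H and W."""
    out = []
    for row_pixels in feature_map:
        out_row = []
        for pixel in row_pixels:  # pixel is a length-C_in channel vector
            out_row.append([sum(wk * v for wk, v in zip(w_row, pixel)) + b
                            for w_row, b in zip(weights, bias)])
        out.append(out_row)
    return out
```

A fully connected layer acting on the flattened H x W x C_in map would need on the order of (H·W·C_in)·(H·W·C_out) weights, while the 1x1 convolution needs only C_out·C_in + C_out, which is the parameter saving the text describes.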
(3) Filter the regions close to edges using the Fully-CRF algorithm.
Fully-CRF combines characteristics of the maximum entropy model and the hidden Markov model. It is an undirected graphical model that builds its energy function with the image pixels as nodes. The connectivity of a Fully-CRF is global: its binary (pairwise) potential function describes the relationship between each pixel and every other pixel.
The positive effect of the present invention is to achieve higher-precision detection of edge regions, further improving the accuracy of scene semantic segmentation.
Detailed description of the invention
Fig. 1(a) is part one of the overall flowchart of the hierarchical scene semantic segmentation model based on deep edge detection according to the present invention.
Fig. 1(b) is part two of the overall flowchart of the hierarchical scene semantic segmentation model based on deep edge detection according to the present invention.
Fig. 2 is the network flowchart of the PSPNet algorithm in the present invention.
Fig. 3 is a schematic diagram of the domain transform operation in the present invention.
Fig. 4(a) shows one propagation mode of the domain transform in the present invention.
Fig. 4(b) shows another propagation mode of the domain transform in the present invention.
Fig. 5 is a schematic diagram of the filtering effect of the Fully-CRF algorithm in the present invention.
Specific embodiment
A specific embodiment of the invention is described below with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the invention.
Fig. 1(a) and (b) show the overall flowchart of the hierarchical scene semantic segmentation model based on deep edge detection according to the present invention.
In the present embodiment, as shown in Fig. 1(a) and (b), a hierarchical scene semantic segmentation model based on deep edge detection according to the present invention comprises the following steps:
S1. Extract semantic features using PSPNet, with the parameters of the PSPNet network set using the Adam algorithm.
As shown in Fig. 2, in the final decoder PSPNet downsamples the encoder's semantic features to several different scales, then upsamples these features back to the same size as the input image, and finally fuses them. PSPNet's greatest advantage is therefore that it extracts and fuses both local and global semantic features.
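The pooling-and-fusion idea behind PSPNet can be sketched as follows. This is a simplified single-channel illustration with nearest-neighbor upsampling; the actual network pools learned multi-channel features and uses learned convolutions and bilinear interpolation:

```python
def avg_pool_to_grid(feature, bins):
    """Average-pool an H x W single-channel map onto a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    grid = []
    for gr in range(bins):
        r0, r1 = gr * h // bins, (gr + 1) * h // bins
        row = []
        for gc in range(bins):
            c0, c1 = gc * w // bins, (gc + 1) * w // bins
            vals = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        grid.append(row)
    return grid

def pyramid_pool(feature, bin_sizes=(1, 2)):
    """PSPNet-style pyramid pooling: pool the map onto several coarse grids,
    upsample each grid back to the input size (nearest neighbor here), and
    stack the results with the original map as extra feature channels."""
    h, w = len(feature), len(feature[0])
    channels = [feature]
    for bins in bin_sizes:
        grid = avg_pool_to_grid(feature, bins)
        channels.append([[grid[r * bins // h][c * bins // w] for c in range(w)]
                         for r in range(h)])
    return channels
```

The 1-bin channel carries purely global context (the scene-wide average), while the finer grids carry increasingly local context; concatenating them is the local-plus-global fusion the text describes.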
S2. Feed the semantic features obtained by the PSPNet network into the domain-transform model as input and train the whole network. By controlling the magnitude of the loss, the network parameters are continually adjusted so that the domain-transform density, i.e. the edge network's estimate of edge strength, reaches a certain accuracy. The parameters of the whole network are tuned using the SGD algorithm. Positions far from edges are thereby filtered.
As shown in Fig. 4(a), the filtering proceeds as follows. Assume the one-dimensional input signal x has length N, x = {x_1, x_2, x_3, ..., x_N}. Let the output satisfy y_1 = x_1; then for indices i = 2, ..., N the processing is:

y_i = (1 - w_i) x_i + w_i y_{i-1}    (1.1)

where the weight w_i is obtained from the domain-transform density d_i:

w_i = exp(-√2 d_i / σ_k)    (1.2)

However, the filtering performed by formula (1.1) is asymmetric: because the output at the current position depends only on the output at the previous position, the filtering result is biased toward one direction, and such asymmetric processing propagates poorer segmentation results downstream. To solve this problem, the domain transform applies filtering passes in four directions in turn: left to right, right to left, top to bottom, and bottom to top. As shown in Fig. 3, the domain transform processes a 2D signal in a separable manner, performing independent one-dimensional filtering along each spatial dimension: first the horizontal passes (left to right and right to left), then the vertical passes (top to bottom and bottom to top). The domain transform reduces the standard deviation of the filter kernel at each iteration and requires the total variance to equal the desired variance σ_s², that is:

σ_k = σ_s √3 · 2^(K-k) / √(4^K - 1)    (1.3)

At the k-th iteration, σ_k is used instead of σ_s to compute the weight w_i. The domain-transform density d_i is defined as:

d_i = 1 + (σ_s / σ_r) g_i    (1.4)

where the variable g_i > 0 is the output of the edge network, and σ_r denotes the standard deviation of the filter kernel over the edge-detection feature map. Note that the larger the value of g_i, the higher the probability that position i belongs to an edge. Accordingly, when g_i is large, the output of the domain transform depends mostly on the original input signal x_i (the semantic feature); when g_i is small, the output depends mostly on the previous result y_{i-1}, which realizes the filtering of semantic features in places far from edges.
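A runnable sketch of one left-to-right filtering pass follows. The forms of d_i and w_i are taken from the published domain-transform literature (Gastal and Oliveira; Chen et al.), which matches the quantities described here; the patent's own equation images did not survive extraction, so the exact constants are assumptions:

```python
import math

def dt_density(g, sigma_s, sigma_r):
    """Domain-transform density d_i = 1 + (sigma_s / sigma_r) * g_i,
    where g_i is the edge network's response at position i."""
    return [1.0 + (sigma_s / sigma_r) * gi for gi in g]

def dt_filter_pass(x, d, sigma_k):
    """One left-to-right pass of y_i = (1 - w_i) x_i + w_i y_{i-1},
    with weight w_i = exp(-sqrt(2) * d_i / sigma_k)."""
    y = [x[0]]
    for i in range(1, len(x)):
        w = math.exp(-math.sqrt(2.0) * d[i] / sigma_k)
        y.append((1.0 - w) * x[i] + w * y[i - 1])
    return y
```

Where g_i is large (an edge), d_i is large, so w_i is close to 0 and the output follows the input x_i; far from edges w_i stays large and the output follows y_{i-1}, smoothing the semantic features exactly as the text describes.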
As shown in Fig. 4(b), suppose node y_i influences not only the next node y_{i+1} but also a subsequent layer; then in the backpropagation process of the convolutional network, y_i also receives a gradient value from the current layer. The gradient propagation formulas are:

∂L/∂x_i += (1 - w_i) ∂L/∂y_i    (1.5)
∂L/∂w_i += (y_{i-1} - x_i) ∂L/∂y_i    (1.6)
∂L/∂y_{i-1} += w_i ∂L/∂y_i    (1.7)

where ∂L/∂x_i and ∂L/∂w_i are initialized to 0, and ∂L/∂y_i is initialized to the value passed down by the subsequent layer. The weight w_i is shared across all filtering passes (horizontal and vertical) and across all iterations.
Using these partial derivatives, the derivative with respect to the edge signal g_i can be obtained. Substituting formula (1.4) into formula (1.2) gives:

w_i = exp(-(√2/σ_k)(1 + (σ_s/σ_r) g_i))    (1.8)

Then, by the rules of partial differentiation, substituting formula (1.8) into (1.6) yields the derivative of the edge signal:

∂L/∂g_i = -(√2 σ_s)/(σ_k σ_r) w_i ∂L/∂w_i    (1.9)

At this point, the loss computed at the loss layer can be propagated to both the edge network and the semantic-feature extraction network.
To improve the detection of semantic edges, an additional convolutional layer is added to the edge network, and its number of output channels is set to 10. All convolution kernels in the edge network are of size 1. A 1x1 convolution kernel convolves each pixel of the feature map individually and thus functions like a fully connected layer: it can capture global information to some extent and performs an encoding role. Compared with a fully connected layer, however, the convolutional layer has far fewer parameters, which simplifies the network model and effectively suppresses overfitting. A convolutional layer with kernel size 1 is therefore better suited than a fully connected layer to the feature-fusion operation.
S3. Filter the regions close to edges using the Fully-CRF algorithm.
As shown in Fig. 5, the energy function constructed by the Fully-CRF is given by formula (1.10):

E(x) = Σ_i θ_i(x_i) + Σ_{i<j} θ_ij(x_i, x_j)    (1.10)

where x is the class prediction for the pixels. The unary term is θ_i(x_i) = -log P(x_i), where P(x_i) is the class probability computed by the convolutional network at position i. The pairwise potential function is:

θ_ij(x_i, x_j) = μ(x_i, x_j) Σ_m w_m k_m(f_i, f_j)

where μ(x_i, x_j) = 1 when x_i ≠ x_j, and μ(x_i, x_j) = 0 otherwise. As shown in Fig. 5, no matter how far apart pixels i and j are in the image, there is a connection between them, so the graphical model is fully connected. Each k_m corresponds to features extracted between pixels i and j, with weight w_m, and takes the position and color information of the pixels into account. It is expressed as:

k(f_i, f_j) = w_1 exp(-|p_i - p_j|²/(2σ_α²) - |I_i - I_j|²/(2σ_β²)) + w_2 exp(-|p_i - p_j|²/(2σ_γ²))

where the variables p and I denote the position and RGB value of a pixel, respectively. The first Gaussian kernel depends on both the position and color of the pixels, while the second depends only on position. σ_α, σ_β, and σ_γ are the parameters of the corresponding Gaussian kernels.
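The two Gaussian kernels and the label-compatibility term can be sketched as follows. This is the standard fully connected CRF form (Krähenbühl and Koltun); the parameter values in the usage below are illustrative, not the patent's settings:

```python
import math

def pairwise_kernel(p_i, p_j, I_i, I_j, w1, w2, s_alpha, s_beta, s_gamma):
    """Pairwise feature kernel of a fully connected CRF: an appearance kernel
    on position + color plus a smoothness kernel on position only.
    p_* are (row, col) positions, I_* are RGB tuples."""
    d_pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))  # squared position distance
    d_col = sum((a - b) ** 2 for a, b in zip(I_i, I_j))  # squared color distance
    appearance = w1 * math.exp(-d_pos / (2 * s_alpha ** 2)
                               - d_col / (2 * s_beta ** 2))
    smoothness = w2 * math.exp(-d_pos / (2 * s_gamma ** 2))
    return appearance + smoothness

def pairwise_potential(x_i, x_j, kernel_value):
    """theta_ij = mu(x_i, x_j) * k(f_i, f_j), with Potts compatibility:
    mu = 1 when the labels differ and 0 otherwise."""
    return kernel_value if x_i != x_j else 0.0
```

Nearby pixels with similar color thus pay a high penalty for taking different labels, while distant or dissimilar pixels pay almost nothing, which is what sharpens the segmentation along edges.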
Although an illustrative specific embodiment of the present invention has been described above so that those skilled in the art may understand the invention, it should be clear that the invention is not limited to the scope of that specific embodiment. To those of ordinary skill in the art, various changes that remain within the spirit and scope of the invention as defined and determined by the appended claims are apparent, and all innovations that make use of the inventive concept fall within the scope of protection.
Claims (2)
1. A hierarchical scene semantic segmentation model based on deep edge detection, characterized by comprising the following steps:
(1) extracting semantic features using PSPNet, with the parameters of the PSPNet network set using the Adam algorithm;
(2) feeding the semantic features obtained by the PSPNet network into a domain-transform model as input and training the whole network; by controlling the magnitude of the loss, the network parameters are continually adjusted so that the domain-transform density d_i, i.e. the edge network's estimate of edge strength, reaches a certain accuracy; the parameters of the whole network are tuned using the SGD algorithm, and positions far from edges are thereby filtered;
wherein, to improve the detection of semantic edges, an additional convolutional layer is added to the edge network, and its number of output channels is set to 10; all convolution kernels in the edge network are of size 1; a 1x1 convolution kernel convolves each pixel of the feature map individually and thus functions like a fully connected layer: it can capture global information to some extent and performs an encoding role, and compared with a fully connected layer its parameter count is far smaller, which simplifies the network model and effectively suppresses overfitting; a convolutional layer with kernel size 1 is therefore better suited than a fully connected layer to the feature-fusion operation;
(3) filtering the regions close to edges using the Fully-CRF algorithm to obtain the final result.
2. The hierarchical scene semantic segmentation model based on deep edge detection according to claim 1, characterized in that the domain-transform filtering operation is as follows:
assume the one-dimensional input signal x has length N, x = {x_1, x_2, x_3, ..., x_N}; let the output satisfy y_1 = x_1; then for indices i = 2, ..., N the processing is:

y_i = (1 - w_i) x_i + w_i y_{i-1}    (1.1)

where the weight w_i is obtained from the domain-transform density d_i:

w_i = exp(-√2 d_i / σ_k)    (1.2)

however, the filtering performed by formula (1.1) is asymmetric: because the output at the current position depends only on the output at the previous position, the filtering result is biased toward one direction, and such asymmetric processing propagates poorer segmentation results downstream; to solve this problem, the domain transform applies filtering passes in four directions in turn: left to right, right to left, top to bottom, and bottom to top; the processing of a 2D signal is performed in a separable manner, i.e. independent one-dimensional filtering is applied along each spatial dimension: first the horizontal passes, left to right and right to left, then the vertical passes, top to bottom and bottom to top; the domain transform reduces the standard deviation of the filter kernel at each iteration and requires the total variance to equal the desired variance σ_s², that is:

σ_k = σ_s √3 · 2^(K-k) / √(4^K - 1)    (1.3)

at the k-th iteration, σ_k is used instead of σ_s to compute the weight w_i; the domain-transform density d_i is defined as:

d_i = 1 + (σ_s / σ_r) g_i    (1.4)

where the variable g_i > 0 is the output of the edge network, and σ_r denotes the standard deviation of the filter kernel over the edge-detection feature map; note that the larger the value of g_i, the higher the probability that position i belongs to an edge; accordingly, when g_i is large, the output of the domain transform depends mostly on the original input signal x_i (the semantic feature), and when g_i is small it depends mostly on the previous result y_{i-1}, which realizes the filtering of semantic features in places far from edges;
suppose node y_i influences not only the next node y_{i+1} but also a subsequent layer; then in the backpropagation process of the convolutional network, y_i also receives a gradient value from the current layer; the gradient propagation formulas are:

∂L/∂x_i += (1 - w_i) ∂L/∂y_i    (1.5)
∂L/∂w_i += (y_{i-1} - x_i) ∂L/∂y_i    (1.6)
∂L/∂y_{i-1} += w_i ∂L/∂y_i    (1.7)

where ∂L/∂x_i and ∂L/∂w_i are initialized to 0, and ∂L/∂y_i is initialized to the value passed down by the subsequent layer; the weight w_i is shared across all filtering passes, horizontal and vertical, and across all iterations;
using these partial derivatives, the derivative with respect to the edge signal g_i can be obtained; substituting formula (1.4) into formula (1.2) gives:

w_i = exp(-(√2/σ_k)(1 + (σ_s/σ_r) g_i))    (1.8)

then, by the rules of partial differentiation, substituting formula (1.8) into (1.6) yields the derivative of the edge signal:

∂L/∂g_i = -(√2 σ_s)/(σ_k σ_r) w_i ∂L/∂w_i    (1.9)

at this point, the loss computed at the loss layer can be propagated to both the edge network and the semantic-feature extraction network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811649016.XA CN109829929A (en) | 2018-12-30 | 2018-12-30 | A kind of level Scene Semantics parted pattern based on depth edge detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109829929A true CN109829929A (en) | 2019-05-31 |
Family
ID=66861471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811649016.XA Pending CN109829929A (en) | 2018-12-30 | 2018-12-30 | A kind of level Scene Semantics parted pattern based on depth edge detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109829929A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062756A (en) * | 2018-01-29 | 2018-05-22 | 重庆理工大学 | Image, semantic dividing method based on the full convolutional network of depth and condition random field |
Non-Patent Citations (1)
Title |
---|
GUO Zhihao: "Research on semantic scene segmentation algorithms for intelligent vehicles based on convolutional neural networks", China Master's Theses Full-text Database *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110502998A (en) * | 2019-07-23 | 2019-11-26 | 平安科技(深圳)有限公司 | Car damage identification method, device, equipment and storage medium |
CN110502998B (en) * | 2019-07-23 | 2023-01-31 | 平安科技(深圳)有限公司 | Vehicle damage assessment method, device, equipment and storage medium |
CN111666945A (en) * | 2020-05-11 | 2020-09-15 | 深圳力维智联技术有限公司 | Storefront violation identification method and device based on semantic segmentation and storage medium |
CN114882091A (en) * | 2022-04-29 | 2022-08-09 | 中国科学院上海微系统与信息技术研究所 | Depth estimation method combined with semantic edge |
CN114882091B (en) * | 2022-04-29 | 2024-02-13 | 中国科学院上海微系统与信息技术研究所 | Depth estimation method combining semantic edges |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||