CN107180430A - Deep learning network construction method and system for semantic segmentation

Info

Publication number: CN107180430A
Application number: CN201710342354.8A
Authority: CN
Inventors: 陶文兵; 张灿; 李坤乾
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2017-05-16
Filing date: 2017-05-16
Publication date: 2017-09-19

Abstract

The invention discloses a deep learning network construction method and system for semantic segmentation. Building on semantic segmentation with a deconvolution network, and exploiting the fact that a conditional random field (CRF) refines edges well, the method formulates the CRF as a recurrent network and merges it into the deconvolution network for end-to-end training, so that the parameters of the convolutional network and of the recurrent network are learned jointly and a better integrated network is obtained. The joint training of the deconvolution network and the CRF proposed by the invention recovers more detail and shape information and addresses inaccurate segmentation at image edges. Combined with multi-scale input and multi-scale pooling, it also addresses the over-segmentation of large targets and the missed segmentation of small targets that a single receptive field causes in semantic segmentation. The invention extends the classical deconvolution network with CRF joint training and multi-feature information fusion, improving the accuracy of semantic segmentation.

Description

A deep learning network construction method and system for semantic segmentation

Technical field

The invention belongs to the technical field of computer vision and, more particularly, relates to a deep learning network construction method and system for semantic segmentation.

Background art

With the explosive growth of data on the Internet, big-data image and video processing has become a popular research direction, and deep learning has become an indispensable tool for big-data research. Although deep learning has not been developed for long and its theoretical foundations are still incomplete, new methods of building deep networks keep emerging, and their results in computer vision are remarkable. Visual perception with deep learning is modeled on the visual mechanism of the human brain: a multi-layer network is designed as a hierarchical information processing system analogous to the visual system. Human visual processing proceeds in stages: the pupil takes in pixels, the cerebral cortex then finds edges and orientations, the shapes of objects are abstracted from the edges, and finally the category of the object is abstracted. A deep network is similar: the lower layers extract edge features, the middle layers extract shape features and abstract them further, and the higher-level features of the whole target or of its behavior are finally obtained and classified. As a new milestone in machine learning, deep learning has attracted more and more image researchers. The theoretical problems it is applied to include computer vision problems such as image classification, object recognition and semantic segmentation, and its applications include intelligent driver assistance systems, face recognition and image retrieval.

Image recognition with machine learning generally proceeds as follows: a sensor first acquires the image data; preprocessing and feature extraction follow; features are then selected; and finally recognition and prediction are performed on the selected features. The purpose of preprocessing, feature extraction and feature selection is to find a feature representation suitable for the classifier. The effectiveness of the feature representation is often the most critical factor for the accuracy of the final recognition. Early feature representations were all extracted by hand; manual feature selection is complicated and laborious, and years of research brought little improvement in recognition accuracy. The most representative of these, the scale-invariant feature transform (SIFT), is invariant to rotation, scaling and brightness changes, yet it still cannot achieve good recognition results on highly variable images. Deep learning, as an unsupervised feature learning process, learns useful features automatically; supported by large amounts of training data and powerful computing capability, it has undoubtedly become a focus of computer vision research. A two-dimensional image can be fed directly into the network, which avoids the manual feature extraction and data reconstruction required by traditional algorithms. Convolutional neural networks have two main advantages. First, features are extracted by convolutional layers that are trained directly, which avoids manual feature extraction, and the trained feature extractor is more robust. Second, the neurons of a convolutional layer share weights, so the network can learn in parallel and the number of parameters to be trained is reduced. Convolutional neural networks recognize two-dimensional patterns well under shift, scaling and other forms of distortion, and have become the preferred model of many computer vision researchers.

Among the many computer vision problems, image semantic segmentation is an important and complex one. Semantic segmentation differs from image classification and detection: classification and detection understand an image at the image level, whereas semantic segmentation understands it at the pixel level. Given a picture, the goal of semantic segmentation is to classify every pixel in it. Traditional segmentation algorithms mainly address problems such as foreground-background segmentation and clustering of image content, in which object categories carry no semantic labels, so the segmented blocks must be processed further before they can be used in practice. Convolutional neural networks can be trained and used for prediction end to end: it suffices to provide a dataset for the semantic segmentation task and to design and train a network structure in order to obtain the semantic segmentation result.

Since convolutional neural networks were first applied to computer vision tasks, many researchers have taken an interest in image semantic segmentation and proposed numerous convolutional neural networks for it. Compared with earlier conventional methods, deep learning frameworks perform semantic segmentation much better. Although good networks can already be designed for semantic segmentation, the results still do not hold up for all kinds of images: the diversity of images means that a very large amount of training data must be prepared, and interference between categories keeps pixel-level prediction from being especially accurate.

Summary of the invention

In view of the above defects or needs for improvement of the prior art, the object of the invention is to provide a deep learning network construction method and system for semantic segmentation, thereby solving the technical problem that existing convolutional neural networks for semantic segmentation achieve relatively low segmentation accuracy.

To achieve the above object, according to one aspect of the invention, a deep learning network construction method for semantic segmentation is provided, comprising:

S1, performing a multi-scale transform on the images in a dataset, wherein every image in the dataset has been labeled by category;

S2, taking the multi-scale-transformed images and their corresponding labels as the input of the deep learning network, and then modifying the network structure file and the network solver file in the Caffe framework of the deep learning network, wherein the deep learning network comprises, in order, a convolutional network, a deconvolution network and a mean-field iteration layer, the modification of the network structure file includes the network settings for multi-scale pooling, and the modification of the network solver file includes the training parameter settings;

S3, in the mean-field iteration layer, iteratively optimizing the output of the deconvolution network with a mean-field iterative algorithm;

S4, according to the modified network structure file and network solver file, jointly training the deconvolution network and the conditional random field to obtain a target deep learning network, the target deep learning network being able to perform semantic segmentation on multi-scale-transformed images to be tested.

Preferably, step S2 specifically comprises the following sub-steps:

S2.1, feeding the multi-scale-transformed images and their corresponding labels into a leveldb-building program, which converts them into files that Caffe can use directly;

S2.2, setting, in the network structure file in Caffe, the types of the convolutional layers and pooling layers and their operating parameters, and applying a multi-scale pooling operation to the last pooling layer: the input image is divided into the regions corresponding to the multi-scale pooling, and the value obtained for each region is filled into the last pooling layer;

S2.3, adding the implementation of the mean-field algorithm to the Caffe framework of the deep learning network;

S2.4, updating the IDs (M, N) in caffe.proto and setting the parameters SIMPLE_FAST_MEANFIELD=M and MULTI_STAGE_MEANFIELD=N, where M and N are positive integers;

S2.5, modifying the training file and the test file among the network structure files in the Caffe framework of the deep learning network, adding the corresponding mean-field iteration layer;

S2.6, configuring, in the training file, the network model, the base learning rate, the learning-rate update policy, the weight of the previous gradient update (momentum), the maximum number of iterations and the run mode.

Preferably, step S3 specifically comprises the following sub-steps:

S3.1, obtaining the feedback input of the mean-field iteration from

$$V_1(t) = \begin{cases} \mathrm{softmax}(U), & t = 0 \\ V_2(t-1), & 0 < t \le T \end{cases}$$

where $V_2(t) = f_\theta(U, V_1(t), I)$, $0 \le t \le T$, denotes the output of the mean-field iteration;

S3.2, obtaining the final output from

$$Y(t) = \begin{cases} 0, & 0 \le t < T \\ V_2(t), & t = T \end{cases}$$

where softmax performs probability normalization, $U$ is the output of the deconvolution network, $t$ is the current iteration, $T$ is the total number of iterations, $V_1$ and $V_2$ are intermediate variables of the iteration, $I$ is the input two-dimensional image after the multi-scale transform, $f_\theta$ is the computation of the mean-field iterative algorithm, $\theta$ denotes the parameters of the conditional random field to be trained, specifically the weight coefficient of each Gaussian kernel function and the coefficients between pairwise relations, and $Y(t)$ is the final semantic segmentation output.
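The recurrence in S3.1 and S3.2 can be read as an unrolled loop. The following is a minimal Python sketch under stated assumptions: `run_mean_field_rnn` and `toy_step` are hypothetical names, and `toy_step` is only a stand-in for $f_\theta$ (it ignores the image and is not the conditional random field update itself).

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_mean_field_rnn(U, image, f_theta, T):
    """Unroll the recurrence for t = 0..T and return Y(T) = V2(T)."""
    V2 = None
    for t in range(T + 1):
        # S3.1: start from softmax(U), afterwards feed the previous output back.
        V1 = [softmax(u) for u in U] if t == 0 else V2
        V2 = f_theta(U, V1, image)
    # S3.2: only the output of the last iteration is emitted.
    return V2


def toy_step(U, V1, image):
    """Hypothetical stand-in for f_theta; it ignores the image entirely."""
    return [softmax([u + q for u, q in zip(us, qs)]) for us, qs in zip(U, V1)]


U = [[2.0, 0.0], [0.0, 1.0]]  # two pixels, two classes
Y = run_mean_field_rnn(U, image=None, f_theta=toy_step, T=3)
```

Passing $f_\theta$ in as a callable keeps the loop structure separate from the update rule, so the same skeleton could wrap a real mean-field step.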

Preferably, the final mean-field iteration output $V_2(t)$ is computed as follows:

A1, initializing the unary potential function $U_i(l)$ with the coarse semantic segmentation result of the deconvolution network, and obtaining the normalized probability from $Q_i(l) = \frac{1}{Z_i}\exp(U_i(l))$, where $Z_i = \sum_l \exp(U_i(l))$, $l$ is the category label and $U_i(l)$ is the probability that pixel $i$ belongs to category $l$;

A2, passing the mutual influence between pixel class labels through the Gaussian kernel functions $k^{(m)}(p_i, p_j)$ and taking their weighted sum with the coefficients $\omega^{(m)}$, expressed by the following formulas:

$$Q_i^{(m)}(l) = \sum_{j \ne i} k^{(m)}(p_i, p_j)\, Q_j(l)$$

$$Q_i(l) = \sum_m \omega^{(m)} Q_i^{(m)}(l)$$

where $i$ and $j$ denote pixels, $p_i$ and $p_j$ denote the pixel values of the corresponding pixels, and $k^{(m)}(p_i, p_j)$ denotes the $m$-th Gaussian kernel function;

A3, obtaining the mutual influence between pixel class labels according to the coefficients $\mu(l, l')$ that balance the pairwise relations: $Q_i(l) = \sum_{l' \in L} \mu(l, l')\, Q_i(l')$, where $l'$ denotes a category different from $l$ and $L$ denotes the set of all categories;

A4, adding the unary potential function $U_i(l)$, specifically: $Q_i(l) = U_i(l) - Q_i(l)$;

A5, updating $Q_i(l)$ by $Q_i(l) = \frac{1}{Z_i}\exp(Q_i(l))$, taking the output as the new input and returning to step A2 until convergence or until the maximum number of iterations is reached, where $Z_i = \sum_l \exp(U_i(l))$; the $Q_i(l)$ finally obtained is the mean-field iteration output $V_2(t)$.
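Steps A1 to A5 can be traced on a toy example. The sketch below is a pure-Python illustration under several assumptions that are not taken from the text: the function name `mean_field`, a precomputed kernel matrix in place of real Gaussian kernels, a compatibility matrix with -1 on the diagonal and 0 elsewhere, a fixed number of iterations instead of a convergence test, and a normalization in A5 taken over the updated values so that each pixel's probabilities sum to 1.

```python
import math


def softmax(scores):
    """Normalize a list of scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field(U, kernels, weights, mu, max_iter=10):
    """U[i][l]: unary scores, kernels[m][i][j]: kernel values, mu[l][lp]: compatibility."""
    n, num_labels = len(U), len(U[0])
    Q = [softmax(u) for u in U]  # A1: normalized initialization
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            # A2: message passing per kernel, then the weighted sum over kernels.
            msg = [
                sum(w * sum(ker[i][j] * Q[j][l] for j in range(n) if j != i)
                    for w, ker in zip(weights, kernels))
                for l in range(num_labels)
            ]
            # A3: compatibility transform over the label set.
            pair = [sum(mu[l][lp] * msg[lp] for lp in range(num_labels))
                    for l in range(num_labels)]
            # A4: combine with the unary term, then A5: normalize.
            updated.append(softmax([U[i][l] - pair[l] for l in range(num_labels)]))
        Q = updated  # the output becomes the input of the next iteration
    return Q


# Three pixels in a row, two classes; the middle pixel weakly prefers class 1.
U = [[3.0, 0.0], [0.0, 0.5], [3.0, 0.0]]
kernel = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # assumed kernel values
mu = [[-1.0, 0.0], [0.0, -1.0]]  # assumed compatibility, values within [-1, 0]
Q = mean_field(U, [kernel], [1.0], mu)
labels = [row.index(max(row)) for row in Q]
```

In this toy setting the two confident neighbors pull the middle pixel toward class 0, which is the edge-smoothing behavior the iteration is meant to provide.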

Preferably, step S4 specifically comprises the following sub-steps:

S4.1, feeding the multi-scale-transformed images and their corresponding labels into the deep learning network as input;

S4.2, extracting the features of the target regions of the image with the convolutional network, and restoring the detail and shape information of the target regions with the deconvolution network, to obtain the actual output probability of each category;

S4.3, computing the difference between the actual output probabilities and the labels;

S4.4, according to the difference, adjusting the convolution kernel parameters, the bias vector parameters and the parameters of the conditional random field by back-propagation so as to minimize the error.
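Steps S4.3 and S4.4 can be illustrated for a single pixel. The text does not name the loss, so this sketch assumes a softmax cross-entropy loss, and it treats the pixel's class scores themselves as the trainable parameters, which is a simplification of back-propagation through the whole network. The names `cross_entropy_and_grad` and `sgd_step` are hypothetical.

```python
import math


def softmax(scores):
    """Normalize a list of scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_and_grad(scores, label):
    """S4.3: measure the gap between the output probabilities and the label."""
    probs = softmax(scores)
    loss = -math.log(probs[label])
    # Gradient of the loss with respect to the scores: probs minus one-hot label.
    grad = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return loss, grad


def sgd_step(params, grad, lr):
    """S4.4: move the parameters against the gradient to reduce the error."""
    return [p - lr * g for p, g in zip(params, grad)]


scores = [0.2, 1.0, -0.5]  # class scores for one pixel
label = 0                  # ground-truth class of that pixel
loss_before, grad = cross_entropy_and_grad(scores, label)
scores_after = sgd_step(scores, grad, lr=0.5)
loss_after, _ = cross_entropy_and_grad(scores_after, label)
```

In joint training the same gradient would be propagated further back, through the mean-field iteration layer into the deconvolution and convolutional networks.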

Preferably, the multi-scale transform uses 3 scales, 0.5, 1 and 1.5, each denoting the factor by which the original image is scaled.

Preferably, the multi-scale pooling uses 3 different scales, 1 × 1, 2 × 2 and 4 × 4, which divide the image into 1 region, 4 regions and 16 regions respectively.
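The division into 1, 4 and 16 regions can be sketched as pyramid pooling over a small feature map. The text only says that a value is obtained for each region; taking the maximum of each region is an assumption of this sketch, as is the function name `pyramid_pool`.

```python
def pyramid_pool(feature_map, grid_sizes=(1, 2, 4)):
    """Pool a 2-D feature map over 1x1, 2x2 and 4x4 grids of regions.

    Returns one value per region, ordered grid by grid and row by row.
    Assumes the map is at least as large as the finest grid in each dimension.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for g in grid_sizes:
        for gy in range(g):
            y0, y1 = gy * height // g, (gy + 1) * height // g
            for gx in range(g):
                x0, x1 = gx * width // g, (gx + 1) * width // g
                # Max pooling inside the region (assumed pooling type).
                pooled.append(max(feature_map[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled


# A 4x4 feature map holding the values 0..15 in row-major order.
feature_map = [[row * 4 + col for col in range(4)] for row in range(4)]
pooled = pyramid_pool(feature_map)
```

The output has 1 + 4 + 16 = 21 values regardless of the input size, which is what lets regions of different extent contribute to the same fixed-length description.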

According to another aspect of the invention, a deep learning network construction system for semantic segmentation is provided, comprising:

an image transform module, configured to perform a multi-scale transform on the images in a dataset, wherein every image in the dataset has been labeled by category;

a setup module, configured to take the multi-scale-transformed images and their corresponding labels as the input of the deep learning network, and then to modify the network structure file and the network solver file in the Caffe framework of the deep learning network, wherein the deep learning network comprises, in order, a convolutional network, a deconvolution network and a mean-field iteration layer, the modification of the network structure file includes the network settings for multi-scale pooling, and the modification of the network solver file includes the training parameter settings;

an optimization module, configured to iteratively optimize, in the mean-field iteration layer, the output of the deconvolution network with a mean-field iterative algorithm;

a joint training module, configured to jointly train the deconvolution network and the conditional random field according to the modified network structure file and network solver file to obtain a target deep learning network, the target deep learning network being able to perform semantic segmentation on multi-scale-transformed images to be tested.

In general, compared with the prior art, the method of the invention achieves the following beneficial effects:

(1) Building on the deconvolution-network approach to semantic segmentation, and considering that a conditional random field refines edges well, the invention formulates the conditional random field as a recurrent network and merges it into the deconvolution network for end-to-end training, so that the parameters of the convolutional network and of the recurrent network are learned jointly and a better deep learning network is obtained.

(2) The invention proposes a way of jointly training the deconvolution network and the conditional random field. The resulting parameters are more robust, more detail and shape information is recovered, and the problem of inaccurate segmentation at image edges is addressed.

(3) By inputting pictures at multiple scales and using multi-scale pooling, the invention varies the receptive field of the neural network. The varied receptive field preserves the integrity of both large and small targets, so the trained deep learning network can handle the cases in which, because of a single receptive field, large targets are over-segmented or small targets are missed.

(4) The invention extends the classical deconvolution network with conditional random field joint training and multi-feature information fusion, so that the trained deep learning network segments images with higher accuracy.

Brief description of the drawings

Fig. 1 is a network architecture diagram of a deep learning network for semantic segmentation disclosed in an embodiment of the invention;

Fig. 2 is a flow chart of a deep learning network construction method for semantic segmentation disclosed in an embodiment of the invention;

Fig. 3 is a schematic diagram of a mean-field iteration method disclosed in an embodiment of the invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. Furthermore, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.

Fig. 1 shows the network architecture of a deep learning network for semantic segmentation disclosed in an embodiment of the invention. The architecture comprises, in order, a convolutional network, a deconvolution network and a mean-field iteration layer (i.e. the CRF-RNN layer).

Fig. 2 shows the flow of a deep learning network construction method for semantic segmentation disclosed in an embodiment of the invention. The method mainly comprises the following steps: 1) data collection and preprocessing; 2) modification of the relevant files of the network framework (Convolutional Architecture for Fast Feature Embedding, Caffe); 3) the mean-field iteration process; 4) joint training of the conditional random field and the deconvolution network. It is implemented as follows:

S1, performing a multi-scale transform on the images in a dataset, wherein every image in the dataset has been labeled by category;

Here the multi-scale transform uses 3 scales, 0.5, 1 and 1.5, each denoting the factor by which the original image is scaled.
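The three-scale transform of S1 can be sketched in a few lines. The interpolation method is not specified in the text; nearest-neighbor resampling is an assumption here, and `rescale_nearest` and `multi_scale` are hypothetical names.

```python
def rescale_nearest(image, scale):
    """Rescale a 2-D list by `scale` with nearest-neighbor sampling (assumed method)."""
    height, width = len(image), len(image[0])
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    return [
        [image[min(height - 1, int(y / scale))][min(width - 1, int(x / scale))]
         for x in range(new_w)]
        for y in range(new_h)
    ]


def multi_scale(image, scales=(0.5, 1, 1.5)):
    """Return one rescaled copy of the image per scale factor."""
    return [rescale_nearest(image, s) for s in scales]


image = [[row * 4 + col for col in range(4)] for row in range(4)]  # toy 4x4 image
pyramid = multi_scale(image)
sizes = [(len(img), len(img[0])) for img in pyramid]
```

Nearest-neighbor sampling never invents new values, so the same function can be applied to the label map of an image without producing invalid category labels.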

S2, taking the multi-scale-transformed images and their corresponding labels as the input of the deep learning network, and then modifying the network structure file and the network solver file in the Caffe framework of the deep learning network, wherein the deep learning network comprises, in order, a convolutional network, a deconvolution network and a mean-field iteration layer, the modification of the network structure file includes the network settings for multi-scale pooling, and the modification of the network solver file includes the training parameter settings;

Step S2 specifically comprises the following sub-steps:

S2.2, setting, in the network structure file in Caffe, the types of the convolutional layers and pooling layers and their operating parameters, and applying a multi-scale pooling operation to the last pooling layer: the input image is divided into the regions corresponding to the multi-scale pooling, and the value obtained for each region is filled into the pooling layer;

S2.4, updating the IDs (M, N) in caffe.proto and setting the parameters SIMPLE_FAST_MEANFIELD=M and MULTI_STAGE_MEANFIELD=N, where M and N are positive integers; preferably, M is 54 and N is 55;

S2.5, modifying the training file and the test file in the Caffe framework of the deep learning network, adding the corresponding mean-field iteration layer meanfield;

The multi-scale operation is added to the network architecture part of the training file. The multi-scale pooling uses 3 different scales, 1 × 1, 2 × 2 and 4 × 4, dividing the image into 1 region, 4 regions and 16 regions respectively, and the value obtained for each region is filled into the pooling layer. The core training file solver.prototxt must be configured; the settings mainly include the network model name (train.prototxt, used during training), the base learning rate (base_lr=0.01), the learning-rate update policy (lr_policy: "step"), the weight of the previous gradient update (momentum: 0.9), the maximum number of iterations (max_iter: 20000) and the run mode (GPU).

S2.6, configuring, in the training file, the network model, the base learning rate, the learning-rate update policy, the weight of the previous gradient update (momentum), the maximum number of iterations and the run mode.
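The solver settings listed above can be collected in one place and rendered as solver.prototxt text. This is a small Python sketch with a hypothetical helper `render_solver`; the field names are standard Caffe solver fields and the values are those given in the text. Caffe's "step" policy also expects `stepsize` and `gamma`, which the text does not state, so they are left out here rather than guessed.

```python
def render_solver(settings):
    """Render solver settings as prototxt-style `key: value` lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, str) and key != "solver_mode":
            lines.append(f'{key}: "{value}"')  # string fields are quoted
        else:
            lines.append(f"{key}: {value}")    # numbers and the enum are bare
    return "\n".join(lines)


solver = {
    "net": "train.prototxt",  # network model used during training
    "base_lr": 0.01,          # base learning rate
    "lr_policy": "step",      # learning-rate update policy
    "momentum": 0.9,          # weight of the previous gradient update
    "max_iter": 20000,        # maximum number of iterations
    "solver_mode": "GPU",     # run mode (an enum, so written without quotes)
}
text = render_solver(solver)
```

Keeping the values in a dictionary makes it easy to write one solver file per experiment without editing the text by hand.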

S3, in the mean-field iteration layer, iteratively optimizing the output of the deconvolution network with a mean-field iterative algorithm;

Fig. 3 is a schematic diagram of a mean-field iteration method disclosed in an embodiment of the invention, comprising:

S3.2, obtaining the final output from

$$Y(t) = \begin{cases} 0, & 0 \le t < T \\ V_2(t), & t = T \end{cases}$$

where softmax performs probability normalization, $U$ is the output of the deconvolution network (i.e. the coarse semantic segmentation result), $t$ is the current iteration, $T$ is the total number of iterations, $V_1$ and $V_2$ are intermediate variables of the iteration, $I$ is the input two-dimensional image after the multi-scale transform, $f_\theta$ is the computation of the mean-field iterative algorithm, $\theta$ denotes the parameters of the conditional random field to be trained, specifically the weight coefficient of each Gaussian kernel function and the coefficients between pairwise relations, and $Y(t)$ is the final semantic segmentation output.

The final mean-field iteration output $V_2(t)$ is computed as follows:

A1, initializing the unary potential function $U_i(l)$ with the coarse semantic segmentation result of the deconvolution network, and obtaining the normalized probability from $Q_i(l) = \frac{1}{Z_i}\exp(U_i(l))$, where $Z_i = \sum_l \exp(U_i(l))$, $l$ is the category label and $U_i(l)$ is the probability that pixel $i$ belongs to category $l$;

where $i$ and $j$ denote pixels, $p_i$ and $p_j$ denote the pixel values of the corresponding pixels, and $k^{(m)}(p_i, p_j)$ denotes the $m$-th Gaussian kernel function; the number of Gaussian kernel functions can be chosen according to actual needs;

For example, two different Gaussian kernel functions are used, i.e. $m$ takes the values 1 and 2; $\theta_\alpha$, $\theta_\beta$ and $\theta_\gamma$ are set to 160, 3 and 3 respectively; in the kernel functions, $p_i$ and $p_j$ are the color values of pixels $i$ and $j$, and $I_i$ and $I_j$ are the position coordinates of pixels $i$ and $j$.

A3, obtaining the mutual influence between pixel class labels according to the coefficients $\mu(l, l')$ that balance the pairwise relations: $Q_i(l) = \sum_{l' \in L} \mu(l, l')\, Q_i(l')$, where $l'$ denotes a category different from $l$ and $L$ denotes the set of all categories;

Here the dissimilarity between categories is the main consideration: the smaller the difference between two classes, the smaller their coefficient $\mu(l, l')$, whose values range from -1 to 0.

A5, updating $Q_i(l)$ by $Q_i(l) = \frac{1}{Z_i}\exp(Q_i(l))$, taking the output as the new input and returning to step A2 until convergence or until the maximum number of iterations is reached, where $Z_i = \sum_l \exp(U_i(l))$; the $Q_i(l)$ finally obtained is the mean-field iteration output $V_2(t)$.

Preferably, the maximum number of iterations is 10.
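The two Gaussian kernels mentioned in the example above can be sketched as follows. The text gives the values 160, 3 and 3 but not the kernel formulas, so this sketch assumes the form commonly used for dense conditional random fields: an appearance kernel over position and color, and a smoothness kernel over position only. Pairing $\theta_\alpha$ and $\theta_\gamma$ with position and $\theta_\beta$ with color is likewise an assumption, and both function names are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def appearance_kernel(pos_i, pos_j, col_i, col_j, theta_alpha=160.0, theta_beta=3.0):
    """Assumed first kernel: nearby pixels with similar color interact strongly."""
    return math.exp(-squared_distance(pos_i, pos_j) / (2 * theta_alpha ** 2)
                    - squared_distance(col_i, col_j) / (2 * theta_beta ** 2))


def smoothness_kernel(pos_i, pos_j, theta_gamma=3.0):
    """Assumed second kernel: depends on the distance between positions only."""
    return math.exp(-squared_distance(pos_i, pos_j) / (2 * theta_gamma ** 2))


# Two pixels three columns apart, once with equal and once with different colors.
same_color = appearance_kernel((0, 0), (0, 3), (120, 80, 40), (120, 80, 40))
diff_color = appearance_kernel((0, 0), (0, 3), (120, 80, 40), (126, 80, 40))
smooth = smoothness_kernel((0, 0), (0, 3))
```

Under these assumptions a small color difference already weakens the appearance kernel sharply, which is what lets label agreement stop at object edges.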

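The softmax regression model generalizes the logistic regression model for binary classification to multi-class problems. Specifically, for a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(l)}, y^{(l)})\}$, $x^{(i)}$ is a training sample (here, the pixel value of each pixel) and $y^{(i)} \in \{1, 2, \ldots, k\}$ is the label of the corresponding pixel. For each input $x$ passed through the convolutional neural network, the probability that it belongs to each class must be obtained. Denoting simply by $\theta$ the parameters of the whole network to be trained, this can be expressed by the hypothesis function:

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ p(y^{(i)} = 2 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)} = k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ e^{\theta_2^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix}$$

where $\theta_1, \theta_2, \ldots, \theta_k$ are the model parameters to be trained, and the factor $\frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}}$ normalizes the output probabilities so that they sum to 1.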
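The hypothesis function can be evaluated directly for a small example. In this Python sketch the name `hypothesis` and the parameter values are illustrative assumptions; subtracting the largest score before exponentiating is a standard numerical safeguard that leaves the probabilities unchanged.

```python
import math


def hypothesis(thetas, x):
    """Softmax regression: probability of each of the k classes for input x."""
    # One linear score theta_j^T x per class.
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    peak = max(scores)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)   # the normalizing denominator
    return [e / total for e in exps]


thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # illustrative parameters, k = 3
probs = hypothesis(thetas, [2.0, 0.5])
predicted = probs.index(max(probs))
```

The class with the largest linear score always receives the largest probability, so the predicted label can be read off with an argmax.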

S4, according to the modified network structure file and network solver file, jointly training the deconvolution network and the conditional random field to obtain a target deep learning network, the target deep learning network being able to perform semantic segmentation on multi-scale-transformed images to be tested.

Step S4 is the joint training: the whole network is trained as one. The collected dataset and its corresponding labels are input to the network; features are extracted stage by stage through the convolutional layers, pooling layers and so on; the deconvolution network restores the detail and shape information of the targets, yielding a probability map for each class in the region; the probability maps and the original image are fed into the mean-field iteration, which outputs the optimized probability maps; finally, the difference between the actual output and the labels is computed, the convolution kernel parameters, bias vector parameters and conditional random field parameters are adjusted by back-propagation so as to minimize the error, and the network parameters are saved for testing. This specifically includes the following sub-steps:

Once the network has been trained, a test picture can be transformed to the three scales 0.5, 1 and 1.5 and fed in turn into the trained deep network. The probability maps of the multi-scale pictures are summed and normalized to give the final probability map, from which the final semantic segmentation result is obtained.
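The test-time fusion described above can be sketched as follows. The sketch assumes that the probability maps from the three scales have already been resized to a common resolution, a step the text does not spell out, and that the final label of a pixel is the class with the highest fused probability. The name `fuse_probability_maps` is hypothetical.

```python
def fuse_probability_maps(maps):
    """Sum per-pixel class probabilities over scales, renormalize, take the argmax.

    maps[s][y][x] is a list of class probabilities; all maps share one size.
    """
    height, width = len(maps[0]), len(maps[0][0])
    num_classes = len(maps[0][0][0])
    fused, labels = [], []
    for y in range(height):
        fused_row, label_row = [], []
        for x in range(width):
            summed = [sum(m[y][x][c] for m in maps) for c in range(num_classes)]
            total = sum(summed)
            probs = [v / total for v in summed]  # summation normalization
            fused_row.append(probs)
            label_row.append(probs.index(max(probs)))
        fused.append(fused_row)
        labels.append(label_row)
    return fused, labels


# Toy 1x2 image with two classes, one probability map per scale (0.5, 1, 1.5).
map_half = [[[0.6, 0.4], [0.2, 0.8]]]
map_one = [[[0.7, 0.3], [0.6, 0.4]]]
map_one_half = [[[0.4, 0.6], [0.3, 0.7]]]
fused, labels = fuse_probability_maps([map_half, map_one, map_one_half])
```

In the toy data the scale-1 map alone would label the second pixel as class 0, while the fused map assigns it class 1, which shows how the scales vote.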

Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the invention and does not limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A deep learning network construction method for semantic segmentation, characterized by comprising:

S1, performing a multi-scale transform on the images in a dataset, wherein every image in the dataset has been labeled by category;

S2, taking the multi-scale-transformed images and their corresponding labels as the input of the deep learning network, and then modifying the network structure file and the network solver file in the Caffe framework of the deep learning network, wherein the deep learning network comprises, in order, a convolutional network, a deconvolution network and a mean-field iteration layer, the modification of the network structure file includes the network settings for multi-scale pooling, and the modification of the network solver file includes the training parameter settings;

S3, in the mean-field iteration layer, iteratively optimizing the output of the deconvolution network with a mean-field iterative algorithm;

S4, according to the modified network structure file and network solver file, jointly training the deconvolution network and the conditional random field to obtain a target deep learning network, the target deep learning network being able to perform semantic segmentation on multi-scale-transformed images to be tested.

2. The method according to claim 1, characterized in that step S2 specifically comprises the following sub-steps:

S2.1, feeding the multi-scale-transformed images and their corresponding labels into a leveldb-building program, which converts them into files that Caffe can use directly;

S2.2, setting, in the network structure file in Caffe, the types of the convolutional layers and pooling layers and their operating parameters, and applying a multi-scale pooling operation to the last pooling layer: the input image is divided into the regions corresponding to the multi-scale pooling, and the value obtained for each region is filled into the last pooling layer;

S2.4, updating the IDs (M, N) in caffe.proto and setting the parameters SIMPLE_FAST_MEANFIELD=M and MULTI_STAGE_MEANFIELD=N, where M and N are positive integers;

S2.5, modifying the training file and the test file among the network structure files in the Caffe framework of the deep learning network, adding the corresponding mean-field iteration layer;

S2.6, configuring, in the training file, the network model, the base learning rate, the learning-rate update policy, the weight of the previous gradient update (momentum), the maximum number of iterations and the run mode.

3. The method according to claim 1, characterized in that step S3 specifically comprises the following sub-steps:

S3.1, obtaining the feedback input of the mean-field iteration from

$$V_1(t) = \begin{cases} \mathrm{softmax}(U), & t = 0 \\ V_2(t-1), & 0 < t \le T \end{cases}$$

where $V_2(t) = f_\theta(U, V_1(t), I)$, $0 \le t \le T$, denotes the output of the mean-field iteration;

S3.2, obtaining the final output from

$$Y(t) = \begin{cases} 0, & 0 \le t < T \\ V_2(t), & t = T \end{cases}$$

where softmax performs probability normalization, $U$ is the output of the deconvolution network, $t$ is the current iteration, $T$ is the total number of iterations, $V_1$ and $V_2$ are intermediate variables of the iteration, $I$ is the input two-dimensional image after the multi-scale transform, $f_\theta$ is the computation of the mean-field iterative algorithm, $\theta$ denotes the parameters of the conditional random field to be trained, including the weight coefficient of each Gaussian kernel function and the coefficients between pairwise relations, and $Y(t)$ is the final semantic segmentation output.

4. The method according to claim 3, characterized in that the final mean-field iteration output $V_2(t)$ is computed as follows:

A2, passing the mutual influence between pixel class labels through the Gaussian kernel functions $k^{(m)}(p_i, p_j)$ and taking their weighted sum with the coefficients $\omega^{(m)}$, expressed by the following formulas:

$$Q_i^{(m)}(l) = \sum_{j \ne i} k^{(m)}(p_i, p_j)\, Q_j(l)$$

$$Q_i(l) = \sum_m \omega^{(m)} Q_i^{(m)}(l)$$

where $i$ and $j$ denote pixels, $p_i$ and $p_j$ denote the pixel values of the corresponding pixels, and $k^{(m)}(p_i, p_j)$ denotes the $m$-th Gaussian kernel function;

A3, obtaining the mutual influence between pixel class labels according to the coefficients $\mu(l, l')$ that balance the pairwise relations: $Q_i(l) = \sum_{l' \in L} \mu(l, l')\, Q_i(l')$, where $l'$ denotes a category different from $l$ and $L$ denotes the set of all categories;

A5, updating $Q_i(l)$ by $Q_i(l) = \frac{1}{Z_i}\exp(Q_i(l))$, taking the output as the new input and returning to step A2 until convergence or until the maximum number of iterations is reached, where $Z_i = \sum_l \exp(U_i(l))$; the $Q_i(l)$ finally obtained is the mean-field iteration output $V_2(t)$.

5. The method according to claim 4, characterized in that step S4 specifically comprises the following sub-steps:

S4.1, feeding the multi-scale-transformed images and their corresponding labels into the deep learning network as input;

S4.2, extracting the features of the target regions of the image with the convolutional network, and restoring the detail and shape information of the target regions with the deconvolution network, to obtain the actual output probability of each category;

S4.4, according to the difference, adjusting the convolution kernel parameters, the bias vector parameters and the parameters of the conditional random field by back-propagation so as to minimize the error.

6. The method according to any one of claims 1 to 5, characterized in that the multi-scale transform uses 3 scales, 0.5, 1 and 1.5, each denoting the factor by which the original image is scaled.

7. The method according to any one of claims 1 to 5, characterized in that the multi-scale pooling uses 3 different scales, 1 × 1, 2 × 2 and 4 × 4, which divide the image into 1 region, 4 regions and 16 regions respectively.

8. A deep learning network construction system for semantic segmentation, characterized by comprising:

an image transform module, configured to perform a multi-scale transform on the images in a dataset, wherein every image in the dataset has been labeled by category;

a setup module, configured to take the multi-scale-transformed images and their corresponding labels as the input of the deep learning network, and then to modify the network structure file and the network solver file in the Caffe framework of the deep learning network, wherein the deep learning network comprises, in order, a convolutional network, a deconvolution network and a mean-field iteration layer, the modification of the network structure file includes the network settings for multi-scale pooling, and the modification of the network solver file includes the training parameter settings;

an optimization module, configured to iteratively optimize, in the mean-field iteration layer, the output of the deconvolution network with a mean-field iterative algorithm;

a joint training module, configured to jointly train the deconvolution network and the conditional random field according to the modified network structure file and network solver file to obtain a target deep learning network, the target deep learning network being able to perform semantic segmentation on multi-scale-transformed images to be tested.