CN110390251A - An image text semantic segmentation method based on multi-neural-network model fusion - Google Patents

An image text semantic segmentation method based on multi-neural-network model fusion

Info

Publication number
CN110390251A
CN110390251A (application CN201910403196.1A)
Authority
CN
China
Prior art keywords
semantic segmentation
model
convolutional neural
neural networks
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910403196.1A
Other languages
Chinese (zh)
Other versions
CN110390251B (en)
Inventor
刘晋
张鑫
李云辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University
Priority to CN201910403196.1A
Publication of CN110390251A
Application granted
Publication of CN110390251B
Active legal status
Anticipated expiration legal status


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: Physics
    • G06: Computing; calculating or counting
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: Physics
    • G06: Computing; calculating or counting
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: Physics
    • G06: Computing; calculating or counting
    • G06V: Image or video recognition or understanding
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: Physics
    • G06: Computing; calculating or counting
    • G06V: Image or video recognition or understanding
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: Physics
    • G06: Computing; calculating or counting
    • G06V: Image or video recognition or understanding
    • G06V 30/00: Character recognition; recognising digital ink; document-oriented image-based pattern recognition
    • G06V 30/40: Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an image text semantic segmentation method based on the fusion of multiple neural network models, comprising two parts: a training method for multiple semantic segmentation models and a multi-model fusion processing method. The invention uses several semantic segmentation network models (the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN) to semantically locate text regions in an image, but is not limited to these four models: they may be adjusted or replaced with other global or region-based semantic segmentation neural network models. Using deep neural network techniques, the invention can, while effectively excluding interference from complex non-text regions, semantically segment text regions containing various text sizes, colors, fonts, and languages, and has wide applicability and strong robustness.

Description

An image text semantic segmentation method based on multi-neural-network model fusion
Technical field
The present invention relates to the field of image recognition and processing, and in particular to a semantic segmentation method for text in images.
Background technique
Text, as a vital tool of everyday human communication, has a profound influence on the development of society as a whole. As the times progress, people must process more and more text and information, and it has become increasingly difficult to identify and analyze the growing volume of data and documents by hand alone. Studying methods for recognizing text characters has therefore become an urgent current demand.
Character segmentation is both a difficult point and a hot topic in character recognition. The number of characters is large; the commonly used Chinese characters alone number more than 3,000. Current character segmentation methods fall into three main types: (1) segmentation based on structural analysis; (2) knowledge-based methods; (3) holistic segmentation strategies. Before performing segmentation, these methods require the image to be presented in a particular format so as to simplify subsequent processing. Preprocessing includes digitization, denoising, binarization, and normalization. However, various factors hinder text-based image segmentation, including image quality, the positioning of the text content, texture, and the type of text and file.
Segmentation of text in images must also take the interference of other information into account, and some rule-based segmentation methods cannot perform segmentation effectively. Meanwhile, in recent years there has been a desire to improve the accuracy of character recognition. Applying deep neural network techniques such as the fully convolutional network FCN and the region-based convolutional neural network R-CNN to the semantic segmentation of image text can therefore remedy the deficiencies of conventional methods. At the same time, multi-model fusion processing can resolve cases where a single model performs poorly on certain classes of test objects. In practical applications, multi-model fusion processing can break through this limitation of a single model and achieve a better detection effect.
The fully convolutional network (FCN, Fully Convolutional Networks), proposed by a research team at the University of California, generalizes the original convolutional neural network (CNN, Convolutional Neural Networks) to classify pictures of arbitrary size at the pixel level in an end-to-end manner, thereby solving the image segmentation problem at the semantic level.
MSFCN (Multi-scale Fully Convolutional Neural Networks) is a multi-scale fully convolutional neural network model.
The U-shaped fully convolutional network U-Net is an improvement on the fully convolutional network FCN and uses data augmentation so that datasets with few samples can still be used for training.
R-FCN (Region-based Fully Convolutional Network) is a region-based fully convolutional network model that introduces position-sensitive score maps to solve the position-sensitivity problem of object detection.
Faster R-CNN (Faster Region-CNN) is the faster region-based convolutional neural network.
Summary of the invention
To overcome the deficiencies of the prior art, a method for semantic segmentation of text in images is provided.
Specifically, the present invention provides an image text semantic segmentation method based on the fusion of multiple neural network models, comprising:
A training method for multiple semantic segmentation models: preprocess the sample images for semantic segmentation with grayscale conversion, normalization, and the like; perform multi-scale feature extraction on the sample images; generate labels for semantic segmentation for each model through semantic annotation, obtaining a dataset for deep learning; based on semantic segmentation neural network techniques, construct the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN; construct a convolutional neural network model CNN based on the assessment of text and non-text regions; and train each of the above deep neural network models.
A multi-model fusion processing method: preprocess the image to be semantically segmented with grayscale conversion, normalization, and the like; perform multi-scale feature extraction on it; apply the processed image to the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN respectively to obtain the prediction result map of each single model; and use the convolutional neural network model CNN to assess and fuse the single-model predictions, obtaining the final semantic segmentation result.
Unlike traditional approaches such as morphological processing, the above image text semantic segmentation method based on multi-neural-network model fusion can, while effectively excluding interference from complex non-text regions, semantically segment text regions containing various text sizes, colors, fonts, and languages, and has wide applicability and strong robustness.
The above is only an overview of the technical solution of the present invention. Specific embodiments of the invention are given below so that its technical means may be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features, and advantages of the invention become more comprehensible.
Description of the drawings
Fig. 1 is a schematic diagram of the steps of the method of the present invention
Fig. 2 shows feature images at three scales
Fig. 3 is a schematic diagram of a semantically annotated image
Fig. 4 is a schematic diagram of annotated semantic bounding-box information in XML form
Fig. 5 is a schematic diagram of rectangular text-region images
Fig. 6 is a flowchart for constructing the multi-scale fully convolutional network model
Fig. 7 is a flowchart for constructing the U-shaped fully convolutional network model
Fig. 8 is a flowchart for constructing the region-based fully convolutional network model
Fig. 9 is a flowchart for constructing the faster region-based convolutional network model
Fig. 10 is a schematic diagram of multi-model voting-strategy processing
Fig. 11 is a sample image for text semantic segmentation
Fig. 12 is the semantic localization map produced by the neural network processing
Fig. 13 is the semantic segmentation box diagram produced by the neural network processing
Fig. 14 shows sample sub-images produced by semantic segmentation
Specific embodiment
The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the present invention fall within its scope.
The implementation steps of the image text semantic segmentation method based on multi-neural-network model fusion provided by the invention are shown in Fig. 1.
In step S110, preprocessing such as grayscale conversion and normalization, together with a multi-scale feature extraction operation, is performed on the semantic segmentation sample images.
In a specific embodiment of the invention, the multi-scale feature extraction algorithm may work as follows. To make the semantic segmentation model more sensitive to retained detail, more spacing information needs to be supplied to the model; the multi-scale feature extraction algorithm can extract such gap features efficiently. By adding multi-scale feature maps to the semantic segmentation model, more spacing information can be provided, for example, to the fully convolutional network model. The number of scales can take an appropriate value depending on the semantic segmentation model; in one embodiment of the invention it is 3, but values such as 4, 5, or 6 are also possible. Fig. 2 shows the feature images at the 3 scales obtained by the multi-scale feature extraction algorithm. The three image sizes shown in Fig. 2 may be preset values: 512 × 376 × 1, 256 × 188 × 1, and 128 × 94 × 1, where the first two figures denote the width and height of the image and the third denotes the number of channels of the feature image. Other image sizes, such as 32 × 32 or 64 × 128, and other channel counts, such as 3 or 4, are of course also possible.
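The scale progression above (512 × 376 → 256 × 188 → 128 × 94) is consistent with repeated 2× downsampling. As a hedged sketch, assuming simple 2 × 2 average pooling (the patent does not specify the exact downsampling operator), the pyramid could be built as:

```python
def downsample2x(img):
    """Average-pool a 2-D grayscale image (list of rows) by a factor of 2."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def multiscale_pyramid(img, num_scales=3):
    """Return the image at num_scales scales: Scale1x, Scale2x, Scale4x, ..."""
    scales = [img]
    for _ in range(num_scales - 1):
        scales.append(downsample2x(scales[-1]))
    return scales
```

A 512 × 376 input then yields exactly the 256 × 188 and 128 × 94 maps named in the text; `num_scales` plays the role of the adjustable scale count (3, 4, 5, or 6).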
In step S120, semantic annotation is performed manually or semi-manually on the semantic segmentation sample images, generating either semantic label images or semantic bounding-box annotations. Specifically, for neural networks of the fully convolutional type, such as the described MSFCN, U-Net, and R-FCN, semantically annotated images are constructed as shown in Fig. 3; for neural networks of the region-based convolutional type, such as the mentioned Faster R-CNN, XML files of annotated semantic bounding-box information are constructed as shown in Fig. 4. For the convolutional neural network CNN used to assess text and non-text regions, multiple rectangular text-region and non-text-region images are generated by cropping according to the semantically annotated text and non-text regions, as shown in Fig. 5.
In step S130, the above dataset is extended by data augmentation methods such as translation, rotation, mirroring, and reflection transformations.
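As an illustrative sketch of this augmentation step, using horizontal mirroring and 90-degree rotation on images stored as nested lists (the exact transform set and data format here are assumptions, and a label image must be transformed in lockstep with its sample):

```python
def mirror(img):
    """Horizontal mirror of a 2-D image given as a list of rows."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(dataset):
    """Extend a list of (image, label) pairs with mirrored and rotated copies."""
    out = []
    for img, lbl in dataset:
        out.append((img, lbl))
        out.append((mirror(img), mirror(lbl)))      # same transform on the label
        out.append((rotate90(img), rotate90(lbl)))
    return out
```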
In step S140, multiple neural network structures for semantic segmentation are constructed. According to a specific embodiment of the invention, four models, the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN, can be used to semantically locate text in images.
In step S150, according to a specific embodiment of the invention, a convolutional neural network model CNN for assessing text and non-text regions can be used.
Preferred examples of each of the above neural network models provided by the invention are given below.
As shown in Fig. 6, the flowchart for constructing the multi-scale fully convolutional network model in the present invention is as follows. In one embodiment of the invention, the image to be segmented is preprocessed, and the multi-scale feature extraction algorithm generates images at 3 scales, 512 × 376 × 1, 256 × 188 × 1, and 128 × 94 × 1, denoted Scale1x, Scale2x, and Scale4x respectively. The result of passing the Scale1x feature image through 2 convolutional layers and 1 pooling layer is combined, by 1 fusion layer, with the result of passing the Scale2x feature image through 2 convolutional layers. That fusion result is passed through 2 convolutional layers and 1 pooling layer and combined, by 1 fusion layer, with the result of passing the Scale4x feature image through 2 convolutional layers. This fusion result is passed through 2 convolutional layers, 1 pooling layer, and 2 further convolutional layers, and then processed by 3 deconvolution layers to obtain the network output. The loss is computed against the corresponding semantic segmentation label map and the model parameters are updated until training is complete, after which the model result is saved.
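The layer sequence above can be sanity-checked by tracing spatial sizes. A minimal bookkeeping sketch, assuming 'same'-padded convolutions, 2 × 2 pooling, and stride-2 deconvolutions (none of which are stated explicitly in the patent), confirms that each fusion layer receives operands of matching size and that 3 deconvolutions restore the full 512 × 376 resolution:

```python
def conv(shape):    # 'same' convolution: spatial size unchanged
    return shape

def pool(shape):    # 2x2 pooling halves width and height
    return (shape[0] // 2, shape[1] // 2)

def deconv(shape):  # stride-2 deconvolution doubles width and height
    return (shape[0] * 2, shape[1] * 2)

scale1x, scale2x, scale4x = (512, 376), (256, 188), (128, 94)

x = pool(conv(conv(scale1x)))        # main branch: 2 convs + 1 pool
assert x == conv(conv(scale2x))      # matches the Scale2x branch, fusion ok
x = pool(conv(conv(x)))              # 2 convs + 1 pool after the first fusion
assert x == conv(conv(scale4x))      # matches the Scale4x branch, fusion ok
x = conv(conv(pool(conv(conv(x)))))  # 2 convs, 1 pool, 2 convs
for _ in range(3):                   # 3 deconv layers restore full size
    x = deconv(x)
print(x)  # (512, 376)
```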
As shown in Fig. 7, the flowchart for constructing the U-shaped fully convolutional network model in the present invention is as follows. In one embodiment of the invention, the image to be segmented is preprocessed. The feature map obtained by passing the preprocessed result through 2 convolutional layers is denoted Fa1; the feature map obtained by passing Fa1 through 1 pooling layer and 2 convolutional layers is denoted Fa2; and the feature map obtained by passing Fa2 through 1 pooling layer and 2 convolutional layers is denoted Fa3. Fa3 is processed by 1 pooling layer and 2 convolutional layers and then 1 deconvolution layer, and the result is combined with Fa3 by 1 fusion layer; the fusion result is passed through 2 convolutional layers and 1 deconvolution layer and combined with Fa2 by 1 fusion layer; that fusion result is passed through 2 convolutional layers and 1 deconvolution layer and combined with Fa1 by 1 fusion layer. The final fusion result is processed by 2 convolutional layers and 1 convolutional layer with a 1 × 1 kernel to obtain the network output. The loss is computed against the corresponding semantic segmentation label map and the model parameters are updated until training is complete, after which the model result is saved.
As shown in Fig. 8, the flowchart for constructing the region-based fully convolutional network model in the present invention is as follows. In one embodiment of the invention, a base convolutional network similar to ResNet-101, a region proposal network RPN, a convolutional layer for the position-sensitive score maps, a final ROI pooling layer, and a decision layer for voting are used. The network similar to ResNet-101 contains 15 convolutional layers, 1 global average pooling layer, and 1 fully connected layer. R-FCN updates the neural network model parameters using an ROI loss function computed over the two steps of region proposal and region classification. The model result is saved after repeated training.
As shown in Fig. 9, the flowchart for constructing the faster region-based convolutional network model in the present invention is as follows. In one embodiment of the invention, a region proposal network RPN and a Fast R-CNN network are constructed. The RPN is built on the VGG16 network structure, and the RPN and Fast R-CNN share 13 VGG convolutional layers. Network parameters are initialized from a pretrained model, and the RPN and Fast R-CNN are trained separately. The candidate regions output by the RPN pass through multiple convolution and pooling operations, then through ROI pooling and fully connected layers; one output result is used for object classification and the other for bounding-box regression. The RPN is then trained again, updating only the parameters exclusive to the RPN, and the Fast R-CNN network is fine-tuned with the RPN's results, updating only the parameters exclusive to Fast R-CNN. The model result is saved when training is complete.
In one embodiment of the invention, the convolutional neural network model CNN for assessing text and non-text regions uses 6 convolutional layers and 2 fully connected layers, and finally assesses the region with a regression value. The model result is saved when training is complete.
In one embodiment of the invention, 3 × 3 convolution kernels are used throughout the above deep neural networks. Kernels of other scales, such as 5 × 5 or 7 × 7, or dilated convolution kernels, may also be used in the present invention.
In step S170, the prediction result map produced by each single model is obtained. This includes processing the results of models whose predictions are semantic bounding-box information, such as the faster region-based convolutional network Faster R-CNN: from the semantic boxes obtained by the model, a semantic segmentation image of the same form as the fully convolutional network FCN prediction results is constructed. The semantic segmentation image marks the likelihood of being a text region with pixel values from 0 to 255, where black, with pixel value 0, indicates the highest likelihood of a non-text region, and white, with pixel value 255, indicates the highest likelihood of a text region. Each pixel inside a semantic box in the prediction result is set to the product of 255 and the predicted score that Faster R-CNN assigns to that box, representing the likelihood that it is a text region. All pixel values outside the semantic boxes are marked 0. The semantic segmentation image is generated in this way.
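A hedged sketch of converting box predictions into the described 0-255 likelihood map; the `(x0, y0, x1, y1, score)` box format and the choice to keep the stronger value where boxes overlap are assumptions not stated in the patent:

```python
def boxes_to_segmap(width, height, boxes):
    """Render (x0, y0, x1, y1, score) boxes into a 0-255 likelihood map.

    Pixels inside a box get round(score * 255); pixels outside any box stay 0.
    """
    seg = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1, score in boxes:
        value = int(round(score * 255))
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                seg[y][x] = max(seg[y][x], value)  # keep the stronger box
    return seg
```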
It is also necessary to perform a pseudo-binarization operation on the pixels of the prediction result maps of the fully convolutional models, such as the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, and the region-based fully convolutional network R-FCN.
The pseudo-binarization operation screens the pixels of a prediction result map against an appropriate threshold: every pixel value below the threshold is marked 0, while every pixel value above the threshold retains its original magnitude.
This processing preserves how likely each pixel is to belong to each class. For example, for text versus non-text regions, each remaining pixel value in the prediction result indicates the likelihood of being a text region.
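The pseudo-binarization described above could be sketched as follows; the threshold value of 128 is an assumed placeholder for the "appropriate threshold":

```python
def pseudo_binarize(pred, threshold=128):
    """Zero out pixels below the threshold; keep stronger responses as-is."""
    return [[v if v >= threshold else 0 for v in row] for row in pred]
```

Unlike a full binarization, the surviving pixels keep their original magnitudes, so the map still expresses graded likelihood.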
In step S180, the prediction result map of each single model is assessed. The specific method is as follows: for each text region marked in the prediction result map of each single model, the trained convolutional neural network model CNN for assessing text and non-text regions is applied. Its assessment result is a value from 0 to 1, where 0 indicates the highest likelihood of a non-text region and 1 indicates the highest likelihood of a text region. The entire prediction result map is then updated by multiplying each pixel value within a text region by that region's assessed value.
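A minimal sketch of this assessment update, assuming text regions are given as (x0, y0, x1, y1) rectangles paired with the CNN's score in [0, 1] (the pairing format is an assumption):

```python
def apply_assessment(pred, regions):
    """Scale each (x0, y0, x1, y1) text region of the map by its CNN score."""
    out = [row[:] for row in pred]  # leave the input map untouched
    for (x0, y0, x1, y1), score in regions:
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = int(out[y][x] * score)
    return out
```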
In step S190, the fusion algorithm that generates the final semantic segmentation map can use the following strategy:
First, a binarization with an appropriate threshold is applied so that every pixel value of a prediction map is either 0 or 255; then the final semantic segmentation result is generated for each pixel by a voting strategy. The voting strategy can be stated as follows: suppose there are N semantic segmentation models. For each pixel of the image to be detected, the predictions of the N models at point (i, j) are denoted S(i, j) = {S1, S2, S3, …, SN-1, SN}, where each Sk takes the value 0 or 255 and k ranges from 1 to N. One voting strategy is then S(i, j) = Max{Num(Sk = 0), Num(Sk = 255)}, where Num(Sk = 0) denotes the number of Sk in S(i, j) whose value is 0, Num(Sk = 255) likewise denotes the number whose value is 255, and Max{·} selects the value with the greater count. On the premise that the segmentation result agreed on by the majority of models is credible, the prediction made by most models at point (i, j) is taken as the final prediction there. Repeating this operation for the pixels at all positions fuses the predictions into a single semantic segmentation image.
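The voting strategy can be sketched directly from the formula above; the tie-breaking choice (an even split counts as non-text) is an assumption, since Max{·} is undefined for ties:

```python
def binarize(pred, threshold=128):
    """Force every pixel of a prediction map to 0 or 255."""
    return [[255 if v >= threshold else 0 for v in row] for row in pred]

def majority_vote(preds):
    """Fuse N binarized prediction maps by per-pixel majority vote."""
    height, width = len(preds[0]), len(preds[0][0])
    fused = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            votes = [p[i][j] for p in preds]  # S(i, j) = {S1, ..., SN}
            fused[i][j] = 255 if votes.count(255) > votes.count(0) else 0
    return fused
```

Replacing the vote count with a score-weighted sum would give the weighted-average variant mentioned later in the text.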
The multi-model voting processing strategy is shown in Fig. 10: the results of multiple single models are optimized and fused, and this processing yields a better effect than any single-model processing.
Those skilled in the art will appreciate that the per-pixel voting strategy mentioned in the above multi-model fusion processing method is the specific strategy of one embodiment of the invention. On the basis of this idea, step S190 can also be modified into other multi-model fusion processing methods, such as a similar per-pixel multi-model weighted-average strategy or a weighted multi-model voting strategy.
In a specific embodiment of the invention, a preprocessed text image to be semantically segmented is received. During image text semantic segmentation with multi-model fusion processing:
Fig. 11 is the original image containing text. Through the preprocessing operations on the image to be segmented, the size of the picture to be processed is converted to the 512 × 376 × 1 scale, and then the multi-scale feature extraction algorithm generates pictures at the other two scales, 256 × 188 × 1 and 128 × 94 × 1. Using an adaptive binarization method, an adaptive threshold is chosen for the 256-level grayscale image to obtain a binarized image that still reflects both the global and local features of the image, and the resulting pictures are input as the multi-scale features of the neural networks.
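As a hedged sketch of the adaptive binarization step, using a simple local-mean threshold (the patent does not specify which adaptive thresholding method is used, so the window size and offset here are assumptions):

```python
def adaptive_binarize(img, window=3, offset=0):
    """Binarize a grayscale image against the mean of a local window."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [img[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(block) / len(block)  # local threshold for this pixel
            out[y][x] = 255 if img[y][x] >= mean - offset else 0
    return out
```

Because each pixel is compared against its own neighborhood mean, the result adapts to local contrast, which is what lets the binarized image reflect both global and local features.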
When semantic segmentation is performed on the image using the multiple semantic segmentation models, the following 2 steps are included:
Step 1: the trained semantic segmentation models each perform prediction on the input images of the three scales of features generated by the processing of Fig. 6, and each single model marks out the semantic text regions with different pixel values.
Step 2: text-region assessment is performed by the trained convolutional neural network, and the multi-model fusion algorithm processes the predictions obtained from all the single models, finally producing a semantic segmentation result map such as Fig. 12.
According to the final semantic segmentation result map, the regions marked out on the image at its original size are shown in Fig. 13, and the sub-images obtained by cropping the text-region blocks are shown in Fig. 14.
By repeatedly applying the semantic region segmentation module described above, the sub-images generated by cropping the text semantic regions out of the whole image can be obtained.
Those skilled in the art will appreciate that the above specification provides a large number of implementation details. Embodiments of the invention can, of course, be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.

Claims (10)

1. An image text semantic segmentation method based on multi-neural-network model fusion, characterized by comprising the following steps:
Step 1: receiving a text image to be semantically segmented;
Step 2: preprocessing the text image to be semantically segmented, such as by grayscale conversion and normalization;
Step 3: inputting the preprocessed image into the multiple trained semantic segmentation models, each of which performs prediction;
Step 4: using the trained convolutional neural network CNN to assess and process the text and non-text regions of the obtained single-model prediction results;
Step 5: using a multi-model fusion method on the assessed and processed results to generate the final semantic segmentation result.
2. The image text semantic segmentation method based on multi-neural-network model fusion according to claim 1, characterized in that the training method of the multiple semantic segmentation models involved in step 3 comprises the following steps:
Step 21: performing grayscale and normalization preprocessing on the semantic segmentation sample images and carrying out multi-scale feature extraction;
Step 22: generating labels for semantic segmentation for each model through semantic annotation, obtaining a dataset for deep learning;
Step 23: based on semantic segmentation neural network techniques, constructing the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN;
Step 24: training and saving each of the above deep neural network models.
3. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 2, characterized in that the preprocessing and multi-scale feature extraction applied to the semantic segmentation sample images in Step 21 comprises the following steps:
Step 31: converting the semantic segmentation sample images to grayscale, with each pixel represented by a value from 0 to 255;
Step 32: normalizing the grayscale images, scaling their length and width to a preset image size;
Step 33: applying a multi-scale-transform-based feature extraction algorithm to the normalized and scaled images to generate multi-scale feature sample images.
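Steps 31–33 can be sketched as below. The luminance weights, nearest-neighbour scaling, and simple image pyramid are illustrative assumptions standing in for the unspecified normalization and multi-scale transform of the claim.

```python
import numpy as np

def preprocess(image_rgb, target=256):
    # Step 31: grayscale with standard luminance weights, values 0-255.
    gray = (0.299 * image_rgb[..., 0] + 0.587 * image_rgb[..., 1]
            + 0.114 * image_rgb[..., 2]).astype(np.uint8)
    # Step 32: scale length and width to the preset size
    # (nearest-neighbour index mapping keeps the sketch dependency-free).
    h, w = gray.shape
    rows = np.arange(target) * h // target
    cols = np.arange(target) * w // target
    return gray[np.ix_(rows, cols)]

def multiscale_samples(gray, scales=(1, 2, 4)):
    # Step 33: a simple subsampling pyramid as a stand-in for the
    # multi-scale-transform feature extraction of the claim.
    return [gray[::s, ::s] for s in scales]
```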
4. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 2, characterized in that obtaining the deep learning dataset in Step 22 comprises the following steps:
Step 41: performing manual or semi-manual semantic annotation on the semantic segmentation sample images to generate semantic label images or semantic annotation box information;
Step 42: cropping the annotated text regions and non-text regions to generate multiple rectangular images of text regions and non-text regions;
Step 43: extending the above dataset through data augmentation methods such as translation, rotation, mirroring, and reflection transforms, and converting it into the training dataset formats suitable for the multiple semantic segmentation models.
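The augmentation of Step 43 can be sketched as below; the particular shift amount and the set of transforms are illustrative choices for the translation, rotation, mirror, and reflection operations named in the claim.

```python
import numpy as np

def augment(patch):
    """Step 43 sketch: expand one square region image with the geometric
    transforms named in the claim."""
    out = [patch]
    out.append(np.roll(patch, shift=2, axis=1))        # translation
    out.extend(np.rot90(patch, k) for k in (1, 2, 3))  # rotations
    out.append(np.fliplr(patch))                       # horizontal mirror
    out.append(np.flipud(patch))                       # vertical reflection
    return out
```

Each input patch thus yields seven training samples before formatting for the individual models.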
5. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the prediction is performed separately with the multiple trained semantic segmentation models, including the multi-scale fully convolutional network model MSFCN, the U-shaped fully convolutional network model U-net, the region-based fully convolutional network model R-FCN, and the Faster R-CNN model, but not limited to these network structures, further including:
single-input single-output fully convolutional networks FCN based on a basic multilayer structure and their improved variants, such as single-input multi-output, multi-input single-output, and multi-input multi-output fully convolutional network structures for semantic segmentation;
neural network structures for semantic segmentation based on global and local processing, such as the various variants of the region-based convolutional neural network model R-CNN.
6. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the method for assessing the text regions and non-text regions in Step 4 includes using a convolutional neural network model CNN.
7. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the process by which the multiple semantic segmentation models in Step 3 each produce a prediction result map is as follows: the preprocessed image to be segmented is fed to the multi-scale fully convolutional network model MSFCN, the U-shaped fully convolutional network model U-net, the region-based fully convolutional network model R-FCN, and the Faster R-CNN model to obtain each single model's prediction result map; the prediction result map uses pixel values from 0 to 255 to indicate the likelihood of a text region, e.g. a black pixel value of 0 indicates the highest likelihood of a non-text region, and a white pixel value of 255 indicates the highest likelihood of a text region.
8. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the method in Step 4 of using the convolutional neural network CNN to assess and process the text regions and non-text regions of the multiple single-model prediction results is as follows: for the prediction result map generated by each single semantic segmentation model, the marked text regions are cropped out individually and assessed by the convolutional neural network model CNN; for each region the CNN produces an assessment value between 0 and 1, where 0 indicates the highest likelihood of a non-text region and 1 indicates the highest likelihood of a text region; this assessment is applied to every text region in every single-model prediction result map, and each assessment value is then applied as a weight to all pixel values of the corresponding text region in its prediction result map; finally, all assessed prediction result maps are fused to generate the final semantic segmentation result.
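The weighting step of this claim can be sketched as below; `assess` is a stand-in for the trained assessment CNN, and the rectangular region coordinates are an assumed representation of the cropped text regions.

```python
import numpy as np

def apply_assessment(pred_map, regions, assess):
    """Claim 8 sketch: crop each marked text region from a 0-255 prediction
    map, score it with `assess` (stand-in for the trained CNN, returning a
    value in [0, 1]), and apply the score as a weight to the region's pixels."""
    out = pred_map.astype(float)
    for r0, r1, c0, c1 in regions:
        weight = assess(pred_map[r0:r1, c0:c1])
        out[r0:r1, c0:c1] *= weight
    return out
```

A region judged unlikely to contain text (weight near 0) is thereby suppressed in the fused result, while a confidently assessed region keeps its pixel values.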
9. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the multi-model fusion method applied to the assessed and processed results in Step 5 is: binarization with an appropriate threshold, followed by a per-pixel voting strategy, ultimately generating the final semantic segmentation result.
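This fusion can be sketched as below; the claim leaves the threshold and the exact voting rule open, so the value 128 and the strict majority vote here are assumptions.

```python
import numpy as np

def vote_fuse(assessed_maps, threshold=128):
    # Claim 9 sketch: binarize each assessed prediction map at an assumed
    # threshold, then take a per-pixel majority vote over all models.
    binary = [m >= threshold for m in assessed_maps]
    votes = np.sum(binary, axis=0)
    return np.where(votes > len(assessed_maps) / 2, 255, 0).astype(np.uint8)
```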
10. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 6, characterized in that it further includes the following algorithms for classification: the naive Bayes model, the support vector machine model, and related improved classification algorithms.
CN201910403196.1A 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing Active CN110390251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910403196.1A CN110390251B (en) 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910403196.1A CN110390251B (en) 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing

Publications (2)

Publication Number Publication Date
CN110390251A true CN110390251A (en) 2019-10-29
CN110390251B CN110390251B (en) 2022-09-30

Family

ID=68285296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910403196.1A Active CN110390251B (en) 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing

Country Status (1)

Country Link
CN (1) CN110390251B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178405A (en) * 2019-12-18 2020-05-19 浙江工业大学 Similar object identification method fusing multiple neural networks
CN111192248A (en) * 2019-12-30 2020-05-22 山东大学 Multi-task relation learning method for positioning, identifying and segmenting vertebral body in nuclear magnetic resonance imaging
CN111199539A (en) * 2019-12-25 2020-05-26 汕头大学 Crack detection method based on integrated neural network
CN111275034A (en) * 2020-01-19 2020-06-12 世纪龙信息网络有限责任公司 Method, device, equipment and storage medium for extracting text region from image
CN111582263A (en) * 2020-05-12 2020-08-25 上海眼控科技股份有限公司 License plate recognition method and device, electronic equipment and storage medium
CN111612799A (en) * 2020-05-15 2020-09-01 中南大学 Face data pair-oriented incomplete reticulate pattern face repairing method and system and storage medium
CN111626357A (en) * 2020-05-27 2020-09-04 北京微智信业科技有限公司 Image identification method based on neural network model
CN112183549A (en) * 2020-10-26 2021-01-05 公安部交通管理科学研究所 Foreign driving license layout character positioning method based on semantic segmentation
CN112270370A (en) * 2020-11-06 2021-01-26 北京环境特性研究所 Vehicle apparent damage assessment method
CN112489054A (en) * 2020-11-27 2021-03-12 中北大学 Remote sensing image semantic segmentation method based on deep learning
CN112966691A (en) * 2021-04-14 2021-06-15 重庆邮电大学 Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN113435271A (en) * 2021-06-10 2021-09-24 中国电子科技集团公司第三十八研究所 Fusion method based on target detection and instance segmentation model
CN113792742A (en) * 2021-09-17 2021-12-14 北京百度网讯科技有限公司 Semantic segmentation method of remote sensing image and training method of semantic segmentation model
CN116665114A (en) * 2023-07-28 2023-08-29 广东海洋大学 Multi-mode-based remote sensing scene identification method, system and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008114618A1 (en) * 2007-03-19 2008-09-25 Tanaka, Jiro Character searching method
CN106709924A * 2016-11-18 2017-05-24 中国人民解放军信息工程大学 Image semantic segmentation method based on deep convolutional neural network and superpixels

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008114618A1 (en) * 2007-03-19 2008-09-25 Tanaka, Jiro Character searching method
CN106709924A * 2016-11-18 2017-05-24 中国人民解放军信息工程大学 Image semantic segmentation method based on deep convolutional neural network and superpixels

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG YUNFEI ET AL.: "Lecture 6: Applications of Deep Convolutional Neural Networks in Image Segmentation", MILITARY COMMUNICATIONS TECHNOLOGY *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178405A (en) * 2019-12-18 2020-05-19 浙江工业大学 Similar object identification method fusing multiple neural networks
CN111199539A (en) * 2019-12-25 2020-05-26 汕头大学 Crack detection method based on integrated neural network
CN111192248A (en) * 2019-12-30 2020-05-22 山东大学 Multi-task relation learning method for positioning, identifying and segmenting vertebral body in nuclear magnetic resonance imaging
CN111192248B (en) * 2019-12-30 2023-05-05 山东大学 Multi-task relation learning method for positioning, identifying and segmenting vertebral body in nuclear magnetic resonance imaging
CN111275034A (en) * 2020-01-19 2020-06-12 世纪龙信息网络有限责任公司 Method, device, equipment and storage medium for extracting text region from image
CN111275034B (en) * 2020-01-19 2023-09-12 天翼数字生活科技有限公司 Method, device, equipment and storage medium for extracting text region from image
CN111582263A (en) * 2020-05-12 2020-08-25 上海眼控科技股份有限公司 License plate recognition method and device, electronic equipment and storage medium
CN111612799A (en) * 2020-05-15 2020-09-01 中南大学 Face data pair-oriented incomplete reticulate pattern face repairing method and system and storage medium
CN111626357A (en) * 2020-05-27 2020-09-04 北京微智信业科技有限公司 Image identification method based on neural network model
CN112183549B (en) * 2020-10-26 2022-05-27 公安部交通管理科学研究所 Foreign driving license layout character positioning method based on semantic segmentation
CN112183549A (en) * 2020-10-26 2021-01-05 公安部交通管理科学研究所 Foreign driving license layout character positioning method based on semantic segmentation
CN112270370A (en) * 2020-11-06 2021-01-26 北京环境特性研究所 Vehicle apparent damage assessment method
CN112270370B (en) * 2020-11-06 2023-06-02 北京环境特性研究所 Vehicle apparent damage assessment method
CN112489054A (en) * 2020-11-27 2021-03-12 中北大学 Remote sensing image semantic segmentation method based on deep learning
CN112966691A (en) * 2021-04-14 2021-06-15 重庆邮电大学 Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN113435271A (en) * 2021-06-10 2021-09-24 中国电子科技集团公司第三十八研究所 Fusion method based on target detection and instance segmentation model
CN113792742A (en) * 2021-09-17 2021-12-14 北京百度网讯科技有限公司 Semantic segmentation method of remote sensing image and training method of semantic segmentation model
CN116665114A (en) * 2023-07-28 2023-08-29 广东海洋大学 Multi-mode-based remote sensing scene identification method, system and medium
CN116665114B (en) * 2023-07-28 2023-10-10 广东海洋大学 Multi-mode-based remote sensing scene identification method, system and medium

Also Published As

Publication number Publication date
CN110390251B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
CN110390251A Image and character semantic segmentation method based on multi-neural-network model fusion processing
CN112036335B Deconvolution-guided semi-supervised plant leaf disease identification and segmentation method
CN109583425A Integrated deep-learning-based recognition method for ships in remote sensing images
US20190180154A1 (en) Text recognition using artificial intelligence
CN103984959B Data- and task-driven image classification method
CN108182454A Security check identification system and control method thereof
CN112966684A (en) Cooperative learning character recognition method under attention mechanism
CN109711448A Fine-grained plant image classification method based on discriminative key regions and deep learning
CN110363201A Weakly supervised semantic segmentation method and system based on collaborative learning
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
US20120087556A1 (en) Digital image analysis using multi-step analysis
CN107844740A Offline handwritten and printed Chinese character recognition method and system
CN106815604A Fixation point detection method based on multi-layer information fusion
CN108629367A Method for enhancing clothing attribute recognition accuracy based on deep networks
CN105279519A (en) Remote sensing image water body extraction method and system based on cooperative training semi-supervised learning
CN109886335A Classification model training method and device
CN112580507B (en) Deep learning text character detection method based on image moment correction
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
Liu et al. SemiText: Scene text detection with semi-supervised learning
Maryan et al. Machine learning applications in detecting rip channels from images
Sihang et al. Precise detection of Chinese characters in historical documents with deep reinforcement learning
CN116645592B (en) Crack detection method based on image processing and storage medium
CN107958219A Image scene classification method based on multiple models and multi-scale features
Song et al. Occluded offline handwritten Chinese character inpainting via generative adversarial network and self-attention mechanism
Silva et al. Superpixel-based online wagging one-class ensemble for feature selection in foreground/background separation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant