CN110390251A - An image text semantic segmentation method based on multi-neural-network model fusion - Google Patents

An image text semantic segmentation method based on multi-neural-network model fusion

Info

Publication number
CN110390251A
CN110390251A (application CN201910403196.1A)
Authority
CN
China
Prior art keywords
semantic segmentation
model
convolutional neural
neural networks
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910403196.1A
Other languages
Chinese (zh)
Other versions
CN110390251B (en)
Inventor
刘晋
张鑫
李云辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University
Priority to CN201910403196.1A
Publication of CN110390251A
Application granted
Publication of CN110390251B
Active legal status
Anticipated expiration legal status


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: Physics
    • G06: Computing; calculating or counting
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: Physics
    • G06: Computing; calculating or counting
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: Physics
    • G06: Computing; calculating or counting
    • G06V: Image or video recognition or understanding
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: Physics
    • G06: Computing; calculating or counting
    • G06V: Image or video recognition or understanding
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: Physics
    • G06: Computing; calculating or counting
    • G06V: Image or video recognition or understanding
    • G06V 30/00: Character recognition; recognising digital ink; document-oriented image-based pattern recognition
    • G06V 30/40: Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an image text semantic segmentation method based on the fusion of multiple neural network models, comprising two parts: a training method for multiple semantic segmentation models and a multi-model fusion processing method. The invention uses several semantic segmentation network models (the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN) to semantically locate text regions in an image, but is not limited to these four models: they may be adjusted or replaced with other global or region-based semantic segmentation neural network models. Using deep neural network techniques, the invention can, while effectively excluding interference from complex non-text regions, semantically segment text regions containing various text sizes, colors, fonts, and languages, and has wide applicability and strong robustness.

Description

An image text semantic segmentation method based on multi-neural-network model fusion
Technical field
The present invention relates to the field of image recognition and processing, and in particular to a semantic segmentation method for text in images.
Background technique
Text, as a vital tool of everyday human communication, has a profound influence on the development of society as a whole. As the times progress, people must process more and more text and information, and it has become increasingly difficult to identify and analyze the growing volume of data and documents by hand alone. Studying methods for recognizing text characters has therefore become an urgent current demand.
Character segmentation is both a difficult point and a hot topic in character recognition. The number of characters is large; the commonly used Chinese characters alone number more than 3,000. Current character segmentation methods fall into three main types: (1) segmentation based on structural analysis; (2) knowledge-based methods; (3) holistic segmentation strategies. Before performing segmentation, these methods require the image to be presented in a particular format so as to simplify subsequent processing. Preprocessing includes digitization, denoising, binarization, and normalization. However, various factors hinder text-based image segmentation, including image quality, the positioning of the text content, texture, and the type of text and file.
Segmentation of text in images must also take the interference of other information into account, and some rule-based segmentation methods cannot perform segmentation effectively. Meanwhile, in recent years there has been a desire to improve the accuracy of character recognition. Applying deep neural network techniques such as the fully convolutional network FCN and the region-based convolutional neural network R-CNN to the semantic segmentation of image text can therefore remedy the deficiencies of conventional methods. At the same time, multi-model fusion processing can resolve cases where a single model performs poorly on certain classes of test objects. In practical applications, multi-model fusion processing can break through this limitation of a single model and achieve a better detection effect.
The fully convolutional network (FCN, Fully Convolutional Networks), proposed by a research team at the University of California, generalizes the original convolutional neural network (CNN, Convolutional Neural Networks) to classify pictures of arbitrary size at the pixel level in an end-to-end manner, thereby solving the image segmentation problem at the semantic level.
MSFCN (Multi-scale Fully Convolutional Neural Networks) is a multi-scale fully convolutional neural network model.
The U-shaped fully convolutional network U-Net is an improvement on the fully convolutional network FCN and uses data augmentation so that datasets with few samples can still be used for training.
R-FCN (Region-based Fully Convolutional Network) is a region-based fully convolutional network model that introduces position-sensitive score maps to solve the position-sensitivity problem of object detection.
Faster R-CNN (Faster Region-CNN) is the faster region-based convolutional neural network.
Summary of the invention
To overcome the deficiencies of the prior art, a method for semantic segmentation of text in images is provided.
Specifically, the present invention provides an image text semantic segmentation method based on the fusion of multiple neural network models, comprising:
A training method for multiple semantic segmentation models: preprocess the sample images for semantic segmentation with grayscale conversion, normalization, and the like; perform multi-scale feature extraction on the sample images; generate labels for semantic segmentation for each model through semantic annotation, obtaining a dataset for deep learning; based on semantic segmentation neural network techniques, construct the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN; construct a convolutional neural network model CNN based on the assessment of text and non-text regions; and train each of the above deep neural network models.
A multi-model fusion processing method: preprocess the image to be semantically segmented with grayscale conversion, normalization, and the like; perform multi-scale feature extraction on it; apply the processed image to the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN respectively to obtain the prediction result map of each single model; and use the convolutional neural network model CNN to assess and fuse the single-model predictions, obtaining the final semantic segmentation result.
Unlike traditional approaches such as morphological processing, the above image text semantic segmentation method based on multi-neural-network model fusion can, while effectively excluding interference from complex non-text regions, semantically segment text regions containing various text sizes, colors, fonts, and languages, and has wide applicability and strong robustness.
The above is only an overview of the technical solution of the present invention. Specific embodiments of the invention are given below so that its technical means may be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features, and advantages of the invention become more comprehensible.
Description of the drawings
Fig. 1 is a schematic diagram of the steps of the method of the present invention
Fig. 2 shows feature images at three scales
Fig. 3 is a schematic diagram of a semantically annotated image
Fig. 4 is a schematic diagram of annotated semantic bounding-box information in XML form
Fig. 5 is a schematic diagram of rectangular text-region images
Fig. 6 is a flowchart for constructing the multi-scale fully convolutional network model
Fig. 7 is a flowchart for constructing the U-shaped fully convolutional network model
Fig. 8 is a flowchart for constructing the region-based fully convolutional network model
Fig. 9 is a flowchart for constructing the faster region-based convolutional network model
Fig. 10 is a schematic diagram of multi-model voting-strategy processing
Fig. 11 is a sample image for text semantic segmentation
Fig. 12 is the semantic localization map produced by the neural network processing
Fig. 13 is the semantic segmentation box diagram produced by the neural network processing
Fig. 14 shows sample sub-images produced by semantic segmentation
Specific embodiment
The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the present invention fall within its scope.
The implementation steps of the image text semantic segmentation method based on multi-neural-network model fusion provided by the invention are shown in Fig. 1.
In step S110, preprocessing such as grayscale conversion and normalization, together with a multi-scale feature extraction operation, is performed on the semantic segmentation sample images.
In a specific embodiment of the invention, the multi-scale feature extraction algorithm may work as follows. To make the semantic segmentation model more sensitive to retained detail, more spacing information needs to be supplied to the model; the multi-scale feature extraction algorithm can extract such gap features efficiently. By adding multi-scale feature maps to the semantic segmentation model, more spacing information can be provided, for example, to the fully convolutional network model. The number of scales can take an appropriate value depending on the semantic segmentation model; in one embodiment of the invention it is 3, but values such as 4, 5, or 6 are also possible. Fig. 2 shows the feature images at the 3 scales obtained by the multi-scale feature extraction algorithm. The three image sizes shown in Fig. 2 may be preset values: 512 × 376 × 1, 256 × 188 × 1, and 128 × 94 × 1, where the first two figures denote the width and height of the image and the third denotes the number of channels of the feature image. Other image sizes, such as 32 × 32 or 64 × 128, and other channel counts, such as 3 or 4, are of course also possible.
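The scale progression above (512 × 376 → 256 × 188 → 128 × 94) is consistent with repeated 2× downsampling. As a hedged sketch, assuming simple 2 × 2 average pooling (the patent does not specify the exact downsampling operator), the pyramid could be built as:

```python
def downsample2x(img):
    """Average-pool a 2-D grayscale image (list of rows) by a factor of 2."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def multiscale_pyramid(img, num_scales=3):
    """Return the image at num_scales scales: Scale1x, Scale2x, Scale4x, ..."""
    scales = [img]
    for _ in range(num_scales - 1):
        scales.append(downsample2x(scales[-1]))
    return scales
```

A 512 × 376 input then yields exactly the 256 × 188 and 128 × 94 maps named in the text; `num_scales` plays the role of the adjustable scale count (3, 4, 5, or 6).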
In step S120, semantic annotation is performed manually or semi-manually on the semantic segmentation sample images, generating either semantic label images or semantic bounding-box annotations. Specifically, for neural networks of the fully convolutional type, such as the described MSFCN, U-Net, and R-FCN, semantically annotated images are constructed as shown in Fig. 3; for neural networks of the region-based convolutional type, such as the mentioned Faster R-CNN, XML files of annotated semantic bounding-box information are constructed as shown in Fig. 4. For the convolutional neural network CNN used to assess text and non-text regions, multiple rectangular text-region and non-text-region images are generated by cropping according to the semantically annotated text and non-text regions, as shown in Fig. 5.
In step S130, the above dataset is extended by data augmentation methods such as translation, rotation, mirroring, and reflection transformations.
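As an illustrative sketch of this augmentation step, using horizontal mirroring and 90-degree rotation on images stored as nested lists (the exact transform set and data format here are assumptions, and a label image must be transformed in lockstep with its sample):

```python
def mirror(img):
    """Horizontal mirror of a 2-D image given as a list of rows."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(dataset):
    """Extend a list of (image, label) pairs with mirrored and rotated copies."""
    out = []
    for img, lbl in dataset:
        out.append((img, lbl))
        out.append((mirror(img), mirror(lbl)))      # same transform on the label
        out.append((rotate90(img), rotate90(lbl)))
    return out
```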
In step S140, multiple neural network structures for semantic segmentation are constructed. According to a specific embodiment of the invention, four models, the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN, can be used to semantically locate text in images.
In step S150, according to a specific embodiment of the invention, a convolutional neural network model CNN for assessing text and non-text regions can be used.
Preferred examples of each of the above neural network models provided by the invention are given below.
As shown in Fig. 6, the flowchart for constructing the multi-scale fully convolutional network model in the present invention is as follows. In one embodiment of the invention, the image to be segmented is preprocessed, and the multi-scale feature extraction algorithm generates images at 3 scales, 512 × 376 × 1, 256 × 188 × 1, and 128 × 94 × 1, denoted Scale1x, Scale2x, and Scale4x respectively. The result of passing the Scale1x feature image through 2 convolutional layers and 1 pooling layer is combined, by 1 fusion layer, with the result of passing the Scale2x feature image through 2 convolutional layers. That fusion result is passed through 2 convolutional layers and 1 pooling layer and combined, by 1 fusion layer, with the result of passing the Scale4x feature image through 2 convolutional layers. This fusion result is passed through 2 convolutional layers, 1 pooling layer, and 2 further convolutional layers, and then processed by 3 deconvolution layers to obtain the network output. The loss is computed against the corresponding semantic segmentation label map and the model parameters are updated until training is complete, after which the model result is saved.
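The layer sequence above can be sanity-checked by tracing spatial sizes. A minimal bookkeeping sketch, assuming 'same'-padded convolutions, 2 × 2 pooling, and stride-2 deconvolutions (none of which are stated explicitly in the patent), confirms that each fusion layer receives operands of matching size and that 3 deconvolutions restore the full 512 × 376 resolution:

```python
def conv(shape):    # 'same' convolution: spatial size unchanged
    return shape

def pool(shape):    # 2x2 pooling halves width and height
    return (shape[0] // 2, shape[1] // 2)

def deconv(shape):  # stride-2 deconvolution doubles width and height
    return (shape[0] * 2, shape[1] * 2)

scale1x, scale2x, scale4x = (512, 376), (256, 188), (128, 94)

x = pool(conv(conv(scale1x)))        # main branch: 2 convs + 1 pool
assert x == conv(conv(scale2x))      # matches the Scale2x branch, fusion ok
x = pool(conv(conv(x)))              # 2 convs + 1 pool after the first fusion
assert x == conv(conv(scale4x))      # matches the Scale4x branch, fusion ok
x = conv(conv(pool(conv(conv(x)))))  # 2 convs, 1 pool, 2 convs
for _ in range(3):                   # 3 deconv layers restore full size
    x = deconv(x)
print(x)  # (512, 376)
```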
As shown in Fig. 7, the flowchart for constructing the U-shaped fully convolutional network model in the present invention is as follows. In one embodiment of the invention, the image to be segmented is preprocessed. The feature map obtained by passing the preprocessed result through 2 convolutional layers is denoted Fa1; the feature map obtained by passing Fa1 through 1 pooling layer and 2 convolutional layers is denoted Fa2; and the feature map obtained by passing Fa2 through 1 pooling layer and 2 convolutional layers is denoted Fa3. Fa3 is processed by 1 pooling layer and 2 convolutional layers and then 1 deconvolution layer, and the result is combined with Fa3 by 1 fusion layer; the fusion result is passed through 2 convolutional layers and 1 deconvolution layer and combined with Fa2 by 1 fusion layer; that fusion result is passed through 2 convolutional layers and 1 deconvolution layer and combined with Fa1 by 1 fusion layer. The final fusion result is processed by 2 convolutional layers and 1 convolutional layer with a 1 × 1 kernel to obtain the network output. The loss is computed against the corresponding semantic segmentation label map and the model parameters are updated until training is complete, after which the model result is saved.
As shown in Fig. 8, the flowchart for constructing the region-based fully convolutional network model in the present invention is as follows. In one embodiment of the invention, a base convolutional network similar to ResNet-101, a region proposal network RPN, a convolutional layer for the position-sensitive score maps, a final ROI pooling layer, and a decision layer for voting are used. The network similar to ResNet-101 contains 15 convolutional layers, 1 global average pooling layer, and 1 fully connected layer. R-FCN updates the neural network model parameters using an ROI loss function computed over the two steps of region proposal and region classification. The model result is saved after repeated training.
As shown in Fig. 9, the flowchart for constructing the faster region-based convolutional network model in the present invention is as follows. In one embodiment of the invention, a region proposal network RPN and a Fast R-CNN network are constructed. The RPN is built on the VGG16 network structure, and the RPN and Fast R-CNN share 13 VGG convolutional layers. Network parameters are initialized from a pretrained model, and the RPN and Fast R-CNN are trained separately. The candidate regions output by the RPN pass through multiple convolution and pooling operations, then through ROI pooling and fully connected layers; one output result is used for object classification and the other for bounding-box regression. The RPN is then trained again, updating only the parameters exclusive to the RPN, and the Fast R-CNN network is fine-tuned with the RPN's results, updating only the parameters exclusive to Fast R-CNN. The model result is saved when training is complete.
In one embodiment of the invention, the convolutional neural network model CNN for assessing text and non-text regions uses 6 convolutional layers and 2 fully connected layers, and finally assesses the region with a regression value. The model result is saved when training is complete.
In one embodiment of the invention, 3 × 3 convolution kernels are used throughout the above deep neural networks. Kernels of other scales, such as 5 × 5 or 7 × 7, or dilated convolution kernels, may also be used in the present invention.
In step S170, the prediction result map produced by each single model is obtained. This includes processing the results of models whose predictions are semantic bounding-box information, such as the faster region-based convolutional network Faster R-CNN: from the semantic boxes obtained by the model, a semantic segmentation image of the same form as the fully convolutional network FCN prediction results is constructed. The semantic segmentation image marks the likelihood of being a text region with pixel values from 0 to 255, where black, with pixel value 0, indicates the highest likelihood of a non-text region, and white, with pixel value 255, indicates the highest likelihood of a text region. Each pixel inside a semantic box in the prediction result is set to the product of 255 and the predicted score that Faster R-CNN assigns to that box, representing the likelihood that it is a text region. All pixel values outside the semantic boxes are marked 0. The semantic segmentation image is generated in this way.
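A hedged sketch of converting box predictions into the described 0-255 likelihood map; the `(x0, y0, x1, y1, score)` box format and the choice to keep the stronger value where boxes overlap are assumptions not stated in the patent:

```python
def boxes_to_segmap(width, height, boxes):
    """Render (x0, y0, x1, y1, score) boxes into a 0-255 likelihood map.

    Pixels inside a box get round(score * 255); pixels outside any box stay 0.
    """
    seg = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1, score in boxes:
        value = int(round(score * 255))
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                seg[y][x] = max(seg[y][x], value)  # keep the stronger box
    return seg
```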
It is also necessary to perform a pseudo-binarization operation on the pixels of the prediction result maps of the fully convolutional models, such as the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, and the region-based fully convolutional network R-FCN.
The pseudo-binarization operation screens the pixels of a prediction result map against an appropriate threshold: every pixel value below the threshold is marked 0, while every pixel value above the threshold retains its original magnitude.
This processing preserves how likely each pixel is to belong to each class. For example, for text versus non-text regions, each remaining pixel value in the prediction result indicates the likelihood of being a text region.
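The pseudo-binarization described above could be sketched as follows; the threshold value of 128 is an assumed placeholder for the "appropriate threshold":

```python
def pseudo_binarize(pred, threshold=128):
    """Zero out pixels below the threshold; keep stronger responses as-is."""
    return [[v if v >= threshold else 0 for v in row] for row in pred]
```

Unlike a full binarization, the surviving pixels keep their original magnitudes, so the map still expresses graded likelihood.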
In step S180, the prediction result map of each single model is assessed. The specific method is as follows: for each text region marked in the prediction result map of each single model, the trained convolutional neural network model CNN for assessing text and non-text regions is applied. Its assessment result is a value from 0 to 1, where 0 indicates the highest likelihood of a non-text region and 1 indicates the highest likelihood of a text region. The entire prediction result map is then updated by multiplying each pixel value within a text region by that region's assessed value.
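A minimal sketch of this assessment update, assuming text regions are given as (x0, y0, x1, y1) rectangles paired with the CNN's score in [0, 1] (the pairing format is an assumption):

```python
def apply_assessment(pred, regions):
    """Scale each (x0, y0, x1, y1) text region of the map by its CNN score."""
    out = [row[:] for row in pred]  # leave the input map untouched
    for (x0, y0, x1, y1), score in regions:
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = int(out[y][x] * score)
    return out
```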
In step S190, the fusion algorithm that generates the final semantic segmentation map can use the following strategy:
First, a binarization with an appropriate threshold is applied so that every pixel value of a prediction map is either 0 or 255; then the final semantic segmentation result is generated for each pixel by a voting strategy. The voting strategy can be stated as follows: suppose there are N semantic segmentation models. For each pixel of the image to be detected, the predictions of the N models at point (i, j) are denoted S(i, j) = {S1, S2, S3, …, SN-1, SN}, where each Sk takes the value 0 or 255 and k ranges from 1 to N. One voting strategy is then S(i, j) = Max{Num(Sk = 0), Num(Sk = 255)}, where Num(Sk = 0) denotes the number of Sk in S(i, j) whose value is 0, Num(Sk = 255) likewise denotes the number whose value is 255, and Max{·} selects the value with the greater count. On the premise that the segmentation result agreed on by the majority of models is credible, the prediction made by most models at point (i, j) is taken as the final prediction there. Repeating this operation for the pixels at all positions fuses the predictions into a single semantic segmentation image.
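The voting strategy can be sketched directly from the formula above; the tie-breaking choice (an even split counts as non-text) is an assumption, since Max{·} is undefined for ties:

```python
def binarize(pred, threshold=128):
    """Force every pixel of a prediction map to 0 or 255."""
    return [[255 if v >= threshold else 0 for v in row] for row in pred]

def majority_vote(preds):
    """Fuse N binarized prediction maps by per-pixel majority vote."""
    height, width = len(preds[0]), len(preds[0][0])
    fused = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            votes = [p[i][j] for p in preds]  # S(i, j) = {S1, ..., SN}
            fused[i][j] = 255 if votes.count(255) > votes.count(0) else 0
    return fused
```

Replacing the vote count with a score-weighted sum would give the weighted-average variant mentioned later in the text.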
The multi-model voting processing strategy is shown in Fig. 10: the results of multiple single models are optimized and fused, and this processing yields a better effect than any single-model processing.
Those skilled in the art will appreciate that the per-pixel voting strategy mentioned in the above multi-model fusion processing method is the specific strategy of one embodiment of the invention. On the basis of this idea, step S190 can also be modified into other multi-model fusion processing methods, such as a similar per-pixel multi-model weighted-average strategy or a weighted multi-model voting strategy.
In a specific embodiment of the invention, a preprocessed text image to be semantically segmented is received. During image text semantic segmentation with multi-model fusion processing:
Fig. 11 is the original image containing text. Through the preprocessing operations on the image to be segmented, the size of the picture to be processed is converted to the 512 × 376 × 1 scale, and then the multi-scale feature extraction algorithm generates pictures at the other two scales, 256 × 188 × 1 and 128 × 94 × 1. Using an adaptive binarization method, an adaptive threshold is chosen for the 256-level grayscale image to obtain a binarized image that still reflects both the global and local features of the image, and the resulting pictures are input as the multi-scale features of the neural networks.
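As a hedged sketch of the adaptive binarization step, using a simple local-mean threshold (the patent does not specify which adaptive thresholding method is used, so the window size and offset here are assumptions):

```python
def adaptive_binarize(img, window=3, offset=0):
    """Binarize a grayscale image against the mean of a local window."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [img[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(block) / len(block)  # local threshold for this pixel
            out[y][x] = 255 if img[y][x] >= mean - offset else 0
    return out
```

Because each pixel is compared against its own neighborhood mean, the result adapts to local contrast, which is what lets the binarized image reflect both global and local features.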
When semantic segmentation is performed on the image using the multiple semantic segmentation models, the following 2 steps are included:
Step 1: the trained semantic segmentation models each perform prediction on the input images of the three scales of features generated by the processing of Fig. 6, and each single model marks out the semantic text regions with different pixel values.
Step 2: text-region assessment is performed by the trained convolutional neural network, and the multi-model fusion algorithm processes the predictions obtained from all the single models, finally producing a semantic segmentation result map such as Fig. 12.
According to the final semantic segmentation result map, the regions marked out on the image at its original size are shown in Fig. 13, and the sub-images obtained by cropping the text-region blocks are shown in Fig. 14.
By repeatedly applying the semantic region segmentation module described above, the sub-images generated by cropping the text semantic regions out of the whole image can be obtained.
Those skilled in the art will appreciate that the above specification provides a large number of implementation details. Embodiments of the invention can, of course, be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.

Claims (10)

1. An image text semantic segmentation method based on multi-neural-network model fusion, characterized by comprising the following steps:
Step 1: receiving a text image to be semantically segmented;
Step 2: preprocessing the text image to be semantically segmented, such as by grayscale conversion and normalization;
Step 3: inputting the preprocessed image into the multiple trained semantic segmentation models, each of which performs prediction;
Step 4: using the trained convolutional neural network CNN to assess and process the text and non-text regions of the obtained single-model prediction results;
Step 5: using a multi-model fusion method on the assessed and processed results to generate the final semantic segmentation result.
2. The image text semantic segmentation method based on multi-neural-network model fusion according to claim 1, characterized in that the training method of the multiple semantic segmentation models involved in step 3 comprises the following steps:
Step 21: performing grayscale and normalization preprocessing on the semantic segmentation sample images and carrying out multi-scale feature extraction;
Step 22: generating labels for semantic segmentation for each model through semantic annotation, obtaining a dataset for deep learning;
Step 23: based on semantic segmentation neural network techniques, constructing the multi-scale fully convolutional network MSFCN, the U-shaped fully convolutional network U-Net, the region-based fully convolutional network R-FCN, and the faster region-based convolutional network Faster R-CNN;
Step 24: training and saving each of the above deep neural network models.
3. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 2, characterized in that the preprocessing and multi-scale feature extraction applied to the semantic segmentation sample images in Step 21 comprises the following steps:
Step 31: converting the semantic segmentation sample images to grayscale, with each pixel represented by a value from 0 to 255;
Step 32: normalizing the grayscale images, scaling their length and width to a preset image size;
Step 33: applying a multi-scale-transform-based feature extraction algorithm to the normalized and scaled images to generate multi-scale feature sample images.
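Steps 31–33 can be sketched as below. The luminance weights, nearest-neighbour scaling, and simple image pyramid are illustrative assumptions standing in for the unspecified normalization and multi-scale transform of the claim.

```python
import numpy as np

def preprocess(image_rgb, target=256):
    # Step 31: grayscale with standard luminance weights, values 0-255.
    gray = (0.299 * image_rgb[..., 0] + 0.587 * image_rgb[..., 1]
            + 0.114 * image_rgb[..., 2]).astype(np.uint8)
    # Step 32: scale length and width to the preset size
    # (nearest-neighbour index mapping keeps the sketch dependency-free).
    h, w = gray.shape
    rows = np.arange(target) * h // target
    cols = np.arange(target) * w // target
    return gray[np.ix_(rows, cols)]

def multiscale_samples(gray, scales=(1, 2, 4)):
    # Step 33: a simple subsampling pyramid as a stand-in for the
    # multi-scale-transform feature extraction of the claim.
    return [gray[::s, ::s] for s in scales]
```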
4. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 2, characterized in that obtaining the deep learning dataset in Step 22 comprises the following steps:
Step 41: performing manual or semi-manual semantic annotation on the semantic segmentation sample images to generate semantic label images or semantic annotation box information;
Step 42: cropping the annotated text regions and non-text regions to generate multiple rectangular images of text regions and non-text regions;
Step 43: extending the above dataset through data augmentation methods such as translation, rotation, mirroring, and reflection transforms, and converting it into the training dataset formats suitable for the multiple semantic segmentation models.
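The augmentation of Step 43 can be sketched as below; the particular shift amount and the set of transforms are illustrative choices for the translation, rotation, mirror, and reflection operations named in the claim.

```python
import numpy as np

def augment(patch):
    """Step 43 sketch: expand one square region image with the geometric
    transforms named in the claim."""
    out = [patch]
    out.append(np.roll(patch, shift=2, axis=1))        # translation
    out.extend(np.rot90(patch, k) for k in (1, 2, 3))  # rotations
    out.append(np.fliplr(patch))                       # horizontal mirror
    out.append(np.flipud(patch))                       # vertical reflection
    return out
```

Each input patch thus yields seven training samples before formatting for the individual models.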
5. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the prediction is performed separately with the multiple trained semantic segmentation models, including the multi-scale fully convolutional network model MSFCN, the U-shaped fully convolutional network model U-net, the region-based fully convolutional network model R-FCN, and the Faster R-CNN model, but not limited to these network structures, further including:
single-input single-output fully convolutional networks FCN based on a basic multilayer structure and their improved variants, such as single-input multi-output, multi-input single-output, and multi-input multi-output fully convolutional network structures for semantic segmentation;
neural network structures for semantic segmentation based on global and local processing, such as the various variants of the region-based convolutional neural network model R-CNN.
6. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the method for assessing the text regions and non-text regions in Step 4 includes using a convolutional neural network model CNN.
7. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the process by which the multiple semantic segmentation models in Step 3 each produce a prediction result map is as follows: the preprocessed image to be segmented is fed to the multi-scale fully convolutional network model MSFCN, the U-shaped fully convolutional network model U-net, the region-based fully convolutional network model R-FCN, and the Faster R-CNN model to obtain each single model's prediction result map; the prediction result map uses pixel values from 0 to 255 to indicate the likelihood of a text region, e.g. a black pixel value of 0 indicates the highest likelihood of a non-text region, and a white pixel value of 255 indicates the highest likelihood of a text region.
8. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the method in Step 4 of using the convolutional neural network CNN to assess and process the text regions and non-text regions of the multiple single-model prediction results is as follows: for the prediction result map generated by each single semantic segmentation model, the marked text regions are cropped out individually and assessed by the convolutional neural network model CNN; for each region the CNN produces an assessment value between 0 and 1, where 0 indicates the highest likelihood of a non-text region and 1 indicates the highest likelihood of a text region; this assessment is applied to every text region in every single-model prediction result map, and each assessment value is then applied as a weight to all pixel values of the corresponding text region in its prediction result map; finally, all assessed prediction result maps are fused to generate the final semantic segmentation result.
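The weighting step of this claim can be sketched as below; `assess` is a stand-in for the trained assessment CNN, and the rectangular region coordinates are an assumed representation of the cropped text regions.

```python
import numpy as np

def apply_assessment(pred_map, regions, assess):
    """Claim 8 sketch: crop each marked text region from a 0-255 prediction
    map, score it with `assess` (stand-in for the trained CNN, returning a
    value in [0, 1]), and apply the score as a weight to the region's pixels."""
    out = pred_map.astype(float)
    for r0, r1, c0, c1 in regions:
        weight = assess(pred_map[r0:r1, c0:c1])
        out[r0:r1, c0:c1] *= weight
    return out
```

A region judged unlikely to contain text (weight near 0) is thereby suppressed in the fused result, while a confidently assessed region keeps its pixel values.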
9. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 1, characterized in that the multi-model fusion method applied to the assessed and processed results in Step 5 is: binarization with an appropriate threshold, followed by a per-pixel voting strategy, ultimately generating the final semantic segmentation result.
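This fusion can be sketched as below; the claim leaves the threshold and the exact voting rule open, so the value 128 and the strict majority vote here are assumptions.

```python
import numpy as np

def vote_fuse(assessed_maps, threshold=128):
    # Claim 9 sketch: binarize each assessed prediction map at an assumed
    # threshold, then take a per-pixel majority vote over all models.
    binary = [m >= threshold for m in assessed_maps]
    votes = np.sum(binary, axis=0)
    return np.where(votes > len(assessed_maps) / 2, 255, 0).astype(np.uint8)
```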
10. The image and character semantic segmentation method based on multi-neural-network model fusion processing according to claim 6, characterized in that it further includes the following algorithms for classification: the naive Bayes model, the support vector machine model, and related improved classification algorithms.
CN201910403196.1A 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing Active CN110390251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910403196.1A CN110390251B (en) 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910403196.1A CN110390251B (en) 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing

Publications (2)

Publication Number Publication Date
CN110390251A true CN110390251A (en) 2019-10-29
CN110390251B CN110390251B (en) 2022-09-30

Family

ID=68285296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910403196.1A Active CN110390251B (en) 2019-05-15 2019-05-15 Image and character semantic segmentation method based on multi-neural-network model fusion processing

Country Status (1)

Country Link
CN (1) CN110390251B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178405A (en) * 2019-12-18 2020-05-19 浙江工业大学 Similar object identification method fusing multiple neural networks
CN111192248A (en) * 2019-12-30 2020-05-22 山东大学 Multi-task relation learning method for positioning, identifying and segmenting vertebral body in nuclear magnetic resonance imaging
CN111199539A (en) * 2019-12-25 2020-05-26 汕头大学 Crack detection method based on integrated neural network
CN111275034A (en) * 2020-01-19 2020-06-12 世纪龙信息网络有限责任公司 Method, device, equipment and storage medium for extracting text region from image
CN111582263A (en) * 2020-05-12 2020-08-25 上海眼控科技股份有限公司 License plate recognition method and device, electronic equipment and storage medium
CN111612799A (en) * 2020-05-15 2020-09-01 中南大学 Face data pair-oriented incomplete reticulate pattern face repairing method and system and storage medium
CN111626357A (en) * 2020-05-27 2020-09-04 北京微智信业科技有限公司 Image identification method based on neural network model
CN112183549A (en) * 2020-10-26 2021-01-05 公安部交通管理科学研究所 Foreign driving license layout character positioning method based on semantic segmentation
CN112270370A (en) * 2020-11-06 2021-01-26 北京环境特性研究所 Vehicle apparent damage assessment method
CN112489054A (en) * 2020-11-27 2021-03-12 中北大学 Remote sensing image semantic segmentation method based on deep learning
CN112966691A (en) * 2021-04-14 2021-06-15 重庆邮电大学 Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN113435271A (en) * 2021-06-10 2021-09-24 中国电子科技集团公司第三十八研究所 Fusion method based on target detection and instance segmentation model
CN113792742A (en) * 2021-09-17 2021-12-14 北京百度网讯科技有限公司 Semantic segmentation method of remote sensing image and training method of semantic segmentation model
CN116665114A (en) * 2023-07-28 2023-08-29 广东海洋大学 Multi-mode-based remote sensing scene identification method, system and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008114618A1 (en) * 2007-03-19 2008-09-25 Tanaka, Jiro Character searching method
CN106709924A * 2016-11-18 2017-05-24 中国人民解放军信息工程大学 Image semantic segmentation method based on deep convolutional neural network and superpixels

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008114618A1 (en) * 2007-03-19 2008-09-25 Tanaka, Jiro Character searching method
CN106709924A * 2016-11-18 2017-05-24 中国人民解放军信息工程大学 Image semantic segmentation method based on deep convolutional neural network and superpixels

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG YUNFEI ET AL.: "Lecture 6: Applications of Deep Convolutional Neural Networks in Image Segmentation", MILITARY COMMUNICATIONS TECHNOLOGY *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178405A (en) * 2019-12-18 2020-05-19 浙江工业大学 Similar object identification method fusing multiple neural networks
CN111199539A (en) * 2019-12-25 2020-05-26 汕头大学 Crack detection method based on integrated neural network
CN111192248A (en) * 2019-12-30 2020-05-22 山东大学 Multi-task relation learning method for positioning, identifying and segmenting vertebral body in nuclear magnetic resonance imaging
CN111192248B (en) * 2019-12-30 2023-05-05 山东大学 Multi-task relation learning method for positioning, identifying and segmenting vertebral body in nuclear magnetic resonance imaging
CN111275034A (en) * 2020-01-19 2020-06-12 世纪龙信息网络有限责任公司 Method, device, equipment and storage medium for extracting text region from image
CN111275034B (en) * 2020-01-19 2023-09-12 天翼数字生活科技有限公司 Method, device, equipment and storage medium for extracting text region from image
CN111582263A (en) * 2020-05-12 2020-08-25 上海眼控科技股份有限公司 License plate recognition method and device, electronic equipment and storage medium
CN111612799A (en) * 2020-05-15 2020-09-01 中南大学 Face data pair-oriented incomplete reticulate pattern face repairing method and system and storage medium
CN111626357A (en) * 2020-05-27 2020-09-04 北京微智信业科技有限公司 Image identification method based on neural network model
CN112183549B (en) * 2020-10-26 2022-05-27 公安部交通管理科学研究所 Foreign driving license layout character positioning method based on semantic segmentation
CN112183549A (en) * 2020-10-26 2021-01-05 公安部交通管理科学研究所 Foreign driving license layout character positioning method based on semantic segmentation
CN112270370A (en) * 2020-11-06 2021-01-26 北京环境特性研究所 Vehicle apparent damage assessment method
CN112270370B (en) * 2020-11-06 2023-06-02 北京环境特性研究所 Vehicle apparent damage assessment method
CN112489054A (en) * 2020-11-27 2021-03-12 中北大学 Remote sensing image semantic segmentation method based on deep learning
CN112966691A (en) * 2021-04-14 2021-06-15 重庆邮电大学 Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN113435271A (en) * 2021-06-10 2021-09-24 中国电子科技集团公司第三十八研究所 Fusion method based on target detection and instance segmentation model
CN113792742A (en) * 2021-09-17 2021-12-14 北京百度网讯科技有限公司 Semantic segmentation method of remote sensing image and training method of semantic segmentation model
CN116665114A (en) * 2023-07-28 2023-08-29 广东海洋大学 Multi-mode-based remote sensing scene identification method, system and medium
CN116665114B (en) * 2023-07-28 2023-10-10 广东海洋大学 Multi-mode-based remote sensing scene identification method, system and medium

Also Published As

Publication number Publication date
CN110390251B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
CN110390251A Image and character semantic segmentation method based on multi-neural-network model fusion processing
CN112036335B Deconvolution-guided semi-supervised plant leaf disease identification and segmentation method
CN109583425A Integrated deep-learning-based recognition method for ships in remote sensing images
US20190180154A1 (en) Text recognition using artificial intelligence
CN103984959B Data- and task-driven image classification method
CN108182454A Security check identification system and control method thereof
CN112966684A (en) Cooperative learning character recognition method under attention mechanism
CN109711448A Fine-grained plant image classification method based on discriminative key regions and deep learning
CN110363201A Weakly supervised semantic segmentation method and system based on collaborative learning
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
US20120087556A1 (en) Digital image analysis using multi-step analysis
CN107844740A Offline handwritten and printed Chinese character recognition method and system
CN106815604A Fixation point detection method based on multi-layer information fusion
CN108629367A Method for enhancing clothing attribute recognition accuracy based on deep networks
CN105279519A (en) Remote sensing image water body extraction method and system based on cooperative training semi-supervised learning
CN109886335A Classification model training method and device
CN112580507B (en) Deep learning text character detection method based on image moment correction
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
Liu et al. SemiText: Scene text detection with semi-supervised learning
Maryan et al. Machine learning applications in detecting rip channels from images
Sihang et al. Precise detection of Chinese characters in historical documents with deep reinforcement learning
CN116645592B (en) Crack detection method based on image processing and storage medium
CN107958219A Image scene classification method based on multiple models and multi-scale features
Song et al. Occluded offline handwritten Chinese character inpainting via generative adversarial network and self-attention mechanism
Silva et al. Superpixel-based online wagging one-class ensemble for feature selection in foreground/background separation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant